html2dom

I originally blogged about html2dom on the Mozilla Security Blog.

Having spent significant time reviewing the source code of some Firefox OS core apps, I noticed that a lot of developers like to use innerHTML (or insertAdjacentHTML). It is indeed a useful API for inserting HTML from a given string without hand-crafting objects for each and every node you want to add to the DOM. The dilemma begins, however, when this is not a hardcoded string but something that is constructed dynamically. If the string contains user input (or something from a malicious third party, be it an app or a website), that input may inject markup and change the application's logic (Cross-Site Scripting). The typical example would be a <script> tag (or an element with an inline event handler) that runs code on the attacker's behalf and reads, modifies or forwards the current content to a third party. CSP, which we use in Firefox OS, can only mitigate some of these attacks, but certainly not all.
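To make the risk concrete, here is a minimal sketch of the pattern. The helper name buildGreetingMarkup and the payload are made up for illustration and are not taken from any Firefox OS app. The payload uses an event handler attribute because browsers do not execute <script> elements that are inserted through innerHTML.

```javascript
// Hypothetical helper: builds markup by string concatenation,
// the pattern that makes innerHTML dangerous.
function buildGreetingMarkup(name) {
  return '<p>Hello ' + name + '</p>';
}

// Benign input produces the markup the developer had in mind.
var harmless = buildGreetingMarkup('World');

// Attacker-controlled input smuggles in an element with an event handler.
var payload = '<img src="x" onerror="alert(document.cookie)">';
var dangerous = buildGreetingMarkup(payload);

// Assigning `dangerous` to element.innerHTML would make the parser create
// the <img> element. Its onerror handler then runs in the page's context
// unless something else (such as a strict CSP) blocks inline handlers.
console.log(harmless);
console.log(dangerous);
```

Nothing in the concatenation step distinguishes the developer's markup from the attacker's, which is why every value flowing into such a string has to be traced by hand.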

Using innerHTML is bad (Hint: DOM XSS)

What's also frustrating about this kind of code is that analyzing it requires you to manually trace every function call and variable back to its definition to see whether it is indeed tainted by user input.

With code changing frequently, those reviews don't really scale. One possible approach is to avoid innerHTML for good. Even though this idea sounds a bit naive, I dived into the world of automated HTML parsing and code generation to see how feasible it is.

Enter html2dom

For the sake of experimentation (and solving this neatly self-contained problem), I have created html2dom. html2dom is a tiny library that accepts an HTML string and returns alternative JavaScript source code. For example, this input:

<p id="greeting">Hello <b>World</b></p>

will yield this (as a string):

var docFragment = document.createDocumentFragment();
// this fragment contains all DOM nodes
var greeting = document.createElement('P');
greeting.setAttribute("id", "greeting");
docFragment.appendChild(greeting);
var text = document.createTextNode("Hello ");
greeting.appendChild(text);
var b = document.createElement('B');
greeting.appendChild(b);
var text_0 = document.createTextNode("World");
b.appendChild(text_0);

As you can see, html2dom tries to use meaningful variable names to make the code readable. If you want, you can try the demo here. Now we could also just replace the "World" string with a JavaScript variable. It cannot do any harm, because createTextNode never parses its argument as HTML: the value is always rendered as text.
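As a sketch of that substitution, the generated code can be wrapped in a function by hand. The function name buildGreeting and its doc and name parameters are assumptions of this example, not something html2dom emits.

```javascript
// Hand-edited variant of the generated code. `doc` is passed in
// (an assumption of this sketch) so the function is not tied to a global.
function buildGreeting(doc, name) {
  var docFragment = doc.createDocumentFragment();
  var greeting = doc.createElement('P');
  greeting.setAttribute('id', 'greeting');
  docFragment.appendChild(greeting);
  greeting.appendChild(doc.createTextNode('Hello '));
  var b = doc.createElement('B');
  greeting.appendChild(b);
  // The only real change: a variable instead of the literal "World".
  // createTextNode stores the value as text and never parses it as HTML.
  b.appendChild(doc.createTextNode(String(name)));
  return docFragment;
}

// In a browser:
// document.body.appendChild(
//   buildGreeting(document, '<img src=x onerror=alert(1)>'));
// The page then shows the payload as literal text inside the <b> element.
```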

When it comes to HTML parsers, you also don't want to write your own.

Luckily, there are several very useful APIs that made the development of html2dom fairly easy. First, there is the DOMParser API, which took care of all the HTML parsing. Using the DOM tree it returns, I could just iterate over all nodes and their children and emit a specific piece of JavaScript depending on each node's type (e.g., element or text). For this, the NodeIterator API turned out to be really valuable.
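The parse-then-emit idea can be sketched roughly as follows. This is not html2dom's actual implementation: the function name emitDomCode and the el_N / text_N variable naming are invented for the example, and it walks childNodes recursively instead of using a NodeIterator so that it works on any DOM-like tree.

```javascript
// Sketch of a code emitter: walks a DOM-like tree and returns JavaScript
// source that rebuilds it. Not html2dom's actual implementation.
var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
var TEXT_NODE = 3;    // same value as Node.TEXT_NODE

function emitDomCode(root) {
  var lines = ['var docFragment = document.createDocumentFragment();'];
  var counter = 0;

  function walk(node, parentVar) {
    var name;
    if (node.nodeType === ELEMENT_NODE) {
      name = 'el_' + (counter++);
      lines.push('var ' + name + ' = document.createElement(' +
                 JSON.stringify(node.nodeName) + ');');
      for (var i = 0; i < node.attributes.length; i++) {
        var attr = node.attributes[i];
        lines.push(name + '.setAttribute(' + JSON.stringify(attr.name) +
                   ', ' + JSON.stringify(attr.value) + ');');
      }
      lines.push(parentVar + '.appendChild(' + name + ');');
      for (var j = 0; j < node.childNodes.length; j++) {
        walk(node.childNodes[j], name);
      }
    } else if (node.nodeType === TEXT_NODE) {
      name = 'text_' + (counter++);
      lines.push('var ' + name + ' = document.createTextNode(' +
                 JSON.stringify(node.nodeValue) + ');');
      lines.push(parentVar + '.appendChild(' + name + ');');
    }
    // Other node types (comments and so on) are skipped in this sketch.
  }

  for (var k = 0; k < root.childNodes.length; k++) {
    walk(root.childNodes[k], 'docFragment');
  }
  return lines.join('\n');
}

// Browser usage: let DOMParser do the HTML parsing, then emit code.
if (typeof DOMParser !== 'undefined') {
  var parsed = new DOMParser().parseFromString(
    '<p id="greeting">Hello <b>World</b></p>', 'text/html');
  console.log(emitDomCode(parsed.body));
}
```

JSON.stringify is used to turn names and values into quoted string literals, so quotes and backslashes in the input cannot break out of the generated code.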

I have also written a few unit tests, so if you want to start messing with my code, I suggest you start by checking them out right away.

Known Bugs & Security

This tool doesn't save you from all of your troubles. But if you can make sure that user input always ends up in a text node, html2dom can save you from a great deal of harm. Give it a try!

On the horizon

I have also been looking at attempts to rewrite potentially dangerous JavaScript automatically. This is at an early stage and still experimental, but you can look at a prototype here.


