Origami.htmlParse(html)

This parses the given HTML text into a DOM structure and returns that structure as a plain JavaScript object. The object has the same format as the one returned by the related Origami.xmlParse builtin.

Given the following HTML file, adams.html:

<p>
  A learning experience is one of those things that say,
  <i>“You know that thing you just did? Don't do that.”</i>
  –Douglas Adams, <cite>The Salmon of Doubt</cite>
</p>

Passing this file to Origami.htmlParse returns the document structure as a plain object. By default, the console renders that object as YAML:

$ ori Origami.htmlParse adams.html
name: p
children:
  - name: "#text"
    text: " A learning experience is one of those things that say, "
  - name: i
    text: “You know that thing you just did? Don't do that.”
  - name: "#text"
    text: " –Douglas Adams, "
  - name: cite
    text: The Salmon of Doubt
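
Because the result is a plain object, it can be traversed with ordinary JavaScript. The following is a minimal sketch, assuming every node has the shape shown in the output above (a name, plus optional text and children properties). The helpers collectText and findAll are hypothetical names used only for illustration and are not part of Origami, and the object literal is copied by hand from the YAML output instead of being produced by a call to Origami.htmlParse.

```javascript
// Hypothetical helper (not part of Origami): gather all text in a node,
// depth-first. Assumes a node may have `text`, `children`, both, or neither.
function collectText(node) {
  const own = node.text ?? "";
  const nested = (node.children ?? []).map(collectText).join("");
  return own + nested;
}

// Hypothetical helper (not part of Origami): return every node in the tree
// whose `name` matches, including the root itself.
function findAll(node, name) {
  const matches = node.name === name ? [node] : [];
  for (const child of node.children ?? []) {
    matches.push(...findAll(child, name));
  }
  return matches;
}

// Object literal transcribed by hand from the YAML output above.
const parsed = {
  name: "p",
  children: [
    {
      name: "#text",
      text: " A learning experience is one of those things that say, ",
    },
    { name: "i", text: "“You know that thing you just did? Don't do that.”" },
    { name: "#text", text: " –Douglas Adams, " },
    { name: "cite", text: "The Salmon of Doubt" },
  ],
};

// Titles of all cited works in the parsed fragment.
const citedTitles = findAll(parsed, "cite").map((node) => node.text);
console.log(citedTitles);

// The paragraph's full text with all markup removed.
console.log(collectText(parsed).trim());
```

Both helpers treat missing text or children as empty, so they should tolerate nodes that carry only one of the two properties.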