site:

crawl(tree, [baseHref])

Crawls the indicated live site and returns the complete tree of reachable resources. This includes following links and references in HTML pages, CSS stylesheets, and JavaScript files.

(If you already have a specific set of known resources you want to fetch from a site, you can extract specific resources from a site.)

Crawl an existing site

You can use crawl to crawl an existing website and copy the resulting crawled tree for local inspection.

In this case, the tree parameter is typically a SiteTree. A convenient way to wrap an existing site is with the httpstree protocol (or httptree for non-secure HTTP sites) in a URL.

For example, you can copy the original Space Jam website to a local folder called spacejam via:

$ ori "copy crawl(httpstree://www.spacejam.com/1996/), files:spacejam"

On a machine that doesn’t have Origami installed, you can invoke ori via npm’s npx command:

$ npx ori "copy crawl(httpstree://www.spacejam.com/1996/), files:spacejam"

Crawling is a network-intensive operation, so a command to crawl a site like the (surprisingly large!) site above can take a long time to complete – on the order of minutes.

If the crawl operation finds links to internal references that do not exist, it will return those in a crawl-errors.json entry at the top level of the returned tree.

You can also use the related audit builtin to audit a site for broken internal links.