Crawls the indicated tree, following links and references in HTML pages, CSS stylesheets, and JavaScript files to construct and return the complete tree of reachable resources.
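Since the result is itself a tree, you can evaluate @crawl directly and let the ori CLI display what it finds. A minimal sketch, using the tree protocol described below and example.com as a placeholder host:

$ ori "@crawl(tree://example.com/)"

This displays the tree of resources reachable from that site's home page.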
Crawl an existing site
You can use @crawl to crawl an existing website and copy the resulting crawled tree for local inspection.
In this case, the tree parameter is typically a SiteTree. A convenient way to wrap an existing site is with the tree protocol (or treehttp for non-secure HTTP sites) in a URL.
For example, you can copy the original Space Jam website to a local folder called spacejam via:
$ ori "@copy @crawl(tree://www.spacejam.com/1996/), @files/spacejam"
Crawling is a network-intensive operation, so a command to crawl a site like the one above (which is surprisingly large!) can take a long time to complete, on the order of minutes.
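If the site you want to copy is only served over plain HTTP, you can use the treehttp protocol mentioned above in the same way. A hypothetical example, with example.com standing in for the real host:

$ ori "@copy @crawl(treehttp://example.com/), @files/example"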
Shorthand: If the first parameter to @crawl is a string, it will be interpreted as the host of an HTTPS site, so in cases where you want to crawl the top level of a domain like example.com, you can use a simpler form:
$ ori @copy @crawl/example.com, @files/example
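This shorthand is roughly equivalent to spelling out the tree protocol yourself:

$ ori "@copy @crawl(tree://example.com/), @files/example"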
Check an Origami site for broken links
If the crawl operation finds references that do not exist, it will return those in a crawl-errors.json entry at the top level of the returned tree. You can use this to crawl a site you're creating in Origami to find broken links.
Give @crawl a reference to the .ori or .js file that defines your site's root. For example, if you define your site in a file src/site.ori:
$ ori "@copy @crawl(src/site.ori), @files/crawl"
Then inspect the local file crawl/crawl-errors.json (if it exists) for paths that were referenced by pages in your site but which your site does not actually define.
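Since ori displays the value of whatever expression you give it, one quick way to view the error report (if one was produced) is:

$ ori crawl/crawl-errors.json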