Crawl an existing site
You can use @crawl to crawl an existing website and copy the resulting crawled tree for local inspection. In this case, the tree parameter is typically a SiteTree. A convenient way to wrap an existing site is with the tree protocol (or treehttp for non-secure HTTP sites) in a URL.
For example, you can copy the original Space Jam website to a local folder called spacejam:
$ ori "@copy @crawl(tree://www.spacejam.com/1996/), @files/spacejam"
Crawling is a network-intensive operation, so a command to crawl a site like the (surprisingly large!) site above can take a long time to complete – on the order of minutes.
Shorthand: If the first parameter to @crawl is a string, it will be interpreted as the host of an HTTPS site. So in cases where you want to crawl the top level of a domain like example.com, you can use a simpler form:
$ ori "@copy @crawl/example.com, @files/example"
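Since a plain string is treated as the host of an HTTPS site, the shorthand above should behave the same as spelling out the tree protocol explicitly (example.com here is just a placeholder domain):

$ ori "@copy @crawl(tree://example.com/), @files/example"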
Check an Origami site for broken links
If the crawl operation finds references that do not exist, it will return those in a crawl-errors.json entry at the top level of the returned tree. You can use this to crawl a site you’re creating in Origami to find broken links.
To do this, apply @crawl to a reference to the .ori file that defines your site’s root. For example, if you define your site in a file src/site.ori:
$ ori "@copy @crawl(src/site.ori), @files/crawl"
Then inspect the local file crawl/crawl-errors.json (if it exists) for paths that were referenced by pages in your site but which your site does not actually define.
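Because the errors entry only appears when broken links were found, a quick shell check after copying might look like this (nothing Origami-specific; crawl is the output folder from the command above):

```shell
# Print the broken-link report from the crawl, if one was produced;
# the file is absent when the crawl found no missing references.
if [ -f crawl/crawl-errors.json ]; then
  cat crawl/crawl-errors.json
else
  echo "No broken links found"
fi
```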