Helen Grimbly, Support Lead
This week, Helen Grimbly, Support Lead at Sitemorse, looks at why assessments of the same size (e.g. 125-page assessments) sometimes cover different pages.
We often get asked about how our crawler works, how it finds pages and how it knows where to go. So let's take a look.
When the scanner spiders a site, it follows the clickable links on each page to further pages, provided they are within the same domain and under the initial directory of the starting URL (unless inclusions or exclusions have been added). However, the order in which the links on an individual page are followed is random, so the scan is not necessarily the same each time.
Once the links on one page have been assessed, the next pages followed are taken from that page's links. Because the order in which those links are followed is not always the same, different pages can be assessed each time.
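To make this concrete, here is a minimal sketch of a page-limited crawl in which the links on each page are followed in a random order. It is a toy model only, not the actual scanner: the `crawl` function, the in-memory `site` dictionary and the breadth-first traversal are all assumptions made for illustration.

```python
import random
from collections import deque


def crawl(site, start, page_limit, rng):
    """Return the pages reached, in order, stopping at page_limit.

    `site` is a hypothetical stand-in for a website: it maps each page
    to the list of in-scope links found on that page.
    """
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < page_limit:
        page = queue.popleft()
        links = list(site.get(page, []))
        rng.shuffle(links)  # the order links are followed in is not fixed
        for link in links:
            if link in seen:
                continue
            seen.add(link)
            visited.append(link)
            queue.append(link)
            if len(visited) >= page_limit:
                break
    return visited


# Hypothetical toy site: a home page, three sections, two pages per section.
site = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1", "/b/2"],
    "/c": ["/c/1", "/c/2"],
}

# Two runs with the same page limit but a different random order.
run_1 = crawl(site, "/", page_limit=5, rng=random.Random(1))
run_2 = crawl(site, "/", page_limit=5, rng=random.Random(2))
print(run_1)
print(run_2)
```

With a limit of 5, both runs in this sketch reach the home page and all three sections, but the fifth page depends on which section happened to be followed first, so the two runs can end up with different page sets.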
The structure of the website affects how much the assessed pages can vary between runs. For example:
Structure 1: one starting page contains 100 links to other pages (which are within the same domain / initial directory path):
- If we run two separate 50-page assessments on that page, each assessment could follow a different set of 50 out of those 100 links. The assessments could therefore cover different pages and produce different scores.
Structure 2: one starting page contains 50 links to other pages (which are within the same domain / initial directory path):
- If we run two separate 50-page assessments on that starting page, each assessment is likely to cover the same linked pages, and therefore produce a similar score.
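The two structures above can be compared with a small simulation. This is a sketch under simplifying assumptions: each assessment is modelled as picking up to 50 links uniformly at random from the starting page, and the `pick_links` helper and the `/page-N` URLs are hypothetical.

```python
import random


def pick_links(links, limit, rng):
    """Model one assessment: follow up to `limit` links, chosen at random."""
    return set(rng.sample(links, min(limit, len(links))))


# Hypothetical link lists for the two structures described above.
structure_1 = [f"/page-{i}" for i in range(100)]  # 100 links on the start page
structure_2 = [f"/page-{i}" for i in range(50)]   # 50 links on the start page

rng = random.Random(0)

# Pages that two separate 50-page assessments have in common.
overlap_1 = len(pick_links(structure_1, 50, rng) & pick_links(structure_1, 50, rng))
overlap_2 = len(pick_links(structure_2, 50, rng) & pick_links(structure_2, 50, rng))

print("Structure 1 pages in common:", overlap_1)
print("Structure 2 pages in common:", overlap_2)
```

Under this model, Structure 2 always gives a full overlap of 50 pages, because there are only 50 links to choose from. For Structure 1 the overlap varies from run to run; on average about 25 of the 50 pages are shared, since each page picked by the first assessment has a 50 in 100 chance of being picked by the second.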
This explains why different pages are sometimes assessed. Sites also change over time: pages are removed or added, or the structure of the site is changed, and this too can lead to different pages being assessed. If a page appears in one audit but not another, we recommend reviewing the report in which the link in question is found (for example, by using the "linked to" information of a diagnostic) and then checking whether that page or link still appears on the live site.