Using the JSON examples

To illustrate how to use the detailed JSON output, this document works through a series of practical examples, each framed as a typical question.

Were all my pages tested and when?

When a report is run, it starts from a supplied URL and follows links from page to page until the specified number of pages has been reached. Although the term pages is used, content other than HTML pages can also count as a page. For example, a PDF counts as one page, regardless of how many pages the PDF itself contains, and each MS Office document also counts as one page. So if a 20-page report is requested, the results might be 15 HTML pages, 4 PDFs, and an MS Word document.

When specifying the number of pages to test, keep in mind you might need to specify more pages than the number of HTML pages you think exist.

The summary section of the report contains information about how many pages were requested and how many pages were tested, along with information about when this was done.

{
  "summary": {
    ...
    "start": "2022-10-28T09:12:01.850Z",
    "finish": "2022-10-28T09:13:19.062Z",
    "limits": ["pages"],
    "pages": 138,
    "requestedPages": 150,
    "urls": 686
  },
  ...
}

Comparing requestedPages and pages indicates if the scan ran out of pages to find before the requested number of pages was reached.

The start and finish times show when the report was run.
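
As a rough sketch, the comparison and the run duration could be computed as follows. The function name summariseRun and the field names it returns are hypothetical; only the summary values are taken from the example above.

```javascript
// Hypothetical helper: summarises how complete a scan was and how long it took.
function summariseRun(summary) {
  const started = new Date(summary.start);
  const finished = new Date(summary.finish);
  return {
    // fewer pages tested than requested suggests the scan ran out of pages to find
    shortfall: summary.requestedPages - summary.pages,
    ranOutOfPages: summary.pages < summary.requestedPages,
    durationSeconds: (finished - started) / 1000
  };
}

// Values copied from the summary example above
const exampleSummary = {
  start: "2022-10-28T09:12:01.850Z",
  finish: "2022-10-28T09:13:19.062Z",
  pages: 138,
  requestedPages: 150
};

console.log(summariseRun(exampleSummary));
```

For the example summary, the shortfall is 12 pages and the run took a little over 77 seconds.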

What items have been tested?

The detailed JSON has a pages section.

{
  ...
  "pages": [
    "https://mycarda.info/",
    "https://mycarda.info/about.html",
    "https://mycarda.info/contact.html",
    "https://mycarda.info/index.html",
    ...
  ],
  ...
}

Listing the pages tested is a simple matter of iterating through the pages.

var report = JSON.parse(DetailJSON);
var pages = report.pages;
pages.forEach( page => {
  console.log(`Page tested ${page}`);
});

Note that not all items in the pages section will be HTML pages if the report has been configured to test additional types of content, such as PDFs. See the Were all my pages tested and when? section for more details about this.
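
To see how the tested pages split between HTML and other content, one option is to group them by MIME type. The sketch below assumes that each page's entry in the urls section carries a mimeType value (as in the redirect example later in this document). The function name countPagesByType and the sample data are made up for illustration.

```javascript
// Hypothetical helper: counts tested pages by MIME type.
// Assumes each page's entry in urls has a mimeType value; pages without one are counted as "unknown".
function countPagesByType(report) {
  const counts = {};
  report.pages.forEach(page => {
    const entry = report.urls[page];
    const type = (entry && entry.mimeType) || "unknown";
    counts[type] = (counts[type] || 0) + 1;
  });
  return counts;
}

// Made-up sample data for illustration only
const samplePagesReport = {
  pages: [
    "https://example.com/",
    "https://example.com/about.html",
    "https://example.com/guide.pdf"
  ],
  urls: {
    "https://example.com/": { mimeType: "text/html" },
    "https://example.com/about.html": { mimeType: "text/html" },
    "https://example.com/guide.pdf": { mimeType: "application/pdf" }
  }
};

console.log(countPagesByType(samplePagesReport));
```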

The next example locates the broken links for each page: first assuming that none of the links are redirects, and then following redirects to see whether the ultimate destination of the link is broken. It also looks at finding only the broken links that are clickable.

The detailed JSON has a urls section. In the following example, the detailed information for each URL has been omitted for clarity.

{
  ...
  "urls": {
    "https://mycarda.info/": { ... },
    "https://mycarda.info/tests/PDF.html": { ... },
    "https://mycarda.info/assets/img/post-bg.jpg": { ... },
    "https://mycarda.info/contact.html": { ... },
    "https://mycarda.info/css/styles.css": { ... },
    "https://mycarda.info/index.html": { ... },
    "https://mycarda.info/js/scripts.js": { ... },
    "https://mycarda.info/tests/": { ... },
    "https://mycarda.info/tests/302.php": { ... },
    "https://mycarda.info/tests/302Blank.php": { ... },
    ...
  }
}

Each item in the urls section contains information about one URL in the report. Within each URL, the important values to consider for broken links are contentTested and ok.

{
  ...
  "urls": {
    ...
    "https://mycarda.info/tests/links.html": {
      "contentTested": true,
      ...
      "ok": true,
      ...
    }
    ...
  }
  ...
}

An ok value of true indicates that the URL was not broken. The value of contentTested also needs to be considered, because it is possible to request that some URLs known to the report are ignored. This can be specified using custom configuration options. See the configuration documentation for more details.

Here is an example of a URL for a broken link where ok is false.

{
  ...
  "urls": {
    ...
    "https://www.mycarda2.info/media/2017-12-06-KentISD.pdf": {
      "diagnostics": [
        {
          "category": "links",
          "level": "serious",
          "message": "Address not found in DNS",
          "module": "core",
          "name": "dns-not-found",
          "type": "url"
        }
      ],
      "finish": "2022-10-23T13:59:58.318Z",
      "ok": false,
      "start": "2022-10-23T13:59:58.315Z"
    },
    ...
  }
  ...
}

Note in the above example, the diagnostics section contains details of why the link was broken (message) and the severity of the issue (level).

Ignoring redirects

If you are only interested in how many links are broken, you could simply count the URLs with ok set to false. However, it is usually more useful to know which pages contain the broken links. For this, we can dig into the JSON a little deeper.
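
Before digging deeper, the simple count could be sketched as follows. The function name findBrokenUrls and the sample data are hypothetical. The sketch also assumes that URLs excluded by configuration are marked with contentTested set to false, and skips them rather than reporting them as broken.

```javascript
// Hypothetical helper: returns the URLs whose ok value is false.
// Assumption: ignored URLs are marked contentTested: false, so they are skipped here.
function findBrokenUrls(urls) {
  return Object.keys(urls).filter(url =>
    urls[url].ok === false && urls[url].contentTested !== false
  );
}

// Made-up sample data for illustration only
const sampleUrls = {
  "https://example.com/": { contentTested: true, ok: true },
  "https://example.com/missing.html": { ok: false },
  "https://example.com/ignored.html": { contentTested: false, ok: false }
};

const brokenUrls = findBrokenUrls(sampleUrls);
console.log(`${brokenUrls.length} broken URL(s)`);
brokenUrls.forEach(url => console.log(`  broken ${url}`));
```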

For each page tested, the corresponding entry in the urls section has a links section, as in the following example.

{
  ...
  "urls": {
    ...
    "https://mycarda.info/tests/links.html": {
      ...
      "links": {
        "https://fonts.googleapis.com/css?family=Open+Sans:300italic": [ ... ],
        "https://fonts.gstatic.com/MoFoq92mQ.ttf": [ ... ],
        "https://fonts.gstatic.com/s/opensans/v34/me1x4gaVc.ttf": [ ... ],
        "https://fonts.gstatic.com/s/opensans/v34/mex4gaVc.ttf": [ ... ],
        "https://mycarda.info/assets/favicon.ico": [ ... ],
        "https://mycarda.info/assets/img/post-bg.jpg": [ ... ],
        "https://mycarda.info/contact.html": [ ... ],
        "https://mycarda.info/css/styles.css": [ ... ],
        "https://mycarda.info/index.html": [ ... ],
        "https://mycarda.info/js/scripts.js": [ ... ],
        "https://mycarda.info/tests/": [ ... ],
        ...
      }
      ...
    }
    ...
  }
  ...
}

The links section lists all the links on each page.

Putting this all together, we can iterate through all the pages tested, as in the What items have been tested? section, and for each page iterate through its links section and check each link.

var report = JSON.parse(DetailJSON);
var urls = report.urls;
var pages = report.pages;

pages.forEach( page => {
  console.log(`Page tested ${page}`);
  var linkObjects = Object.keys( urls[page].links || {} );  // in case a page has no links section
  linkObjects.forEach( link => {
    if ( typeof urls[link] !== 'undefined' ) {  // check the link has an entry in the urls section
      if ( urls[link].ok === false ) {
        console.log( `  broken ${link}` );
      }
    }
  });
});

Example output.

Page tested https://mycarda.info/
Page tested https://mycarda.info/about.html
  broken http://aircraft.mycarda.co.uk:1090/gmap.html
Page tested https://mycarda.info/contact.html
Page tested https://mycarda.info/index.html
Page tested https://mycarda.info/post.html
Page tested https://mycarda.info/shakespeare/
  broken https://mycarda.info/shakespeare/anthony-and-cleopatra.html
  broken https://mycarda.info/shakespeare/post.html
Page tested https://mycarda.info/tests/
Page tested https://mycarda.info/shakespeare/alls-well-that-ends-well.html
Page tested https://mycarda.info/tests/PDF.html
  broken https://mycarda.info/tests/PDF1.pdf
  broken https://mycarda.info/tests/PDF2.pdf
...
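
When a broken link appears on many pages, it can be more convenient to turn the listing around and group the pages by broken link. The following is a minimal sketch of that idea. The function name pagesByBrokenLink and the sample data are hypothetical, and the contents of each links entry are left empty because they are elided in the examples above.

```javascript
// Hypothetical helper: maps each broken link to the list of pages that contain it.
function pagesByBrokenLink(report) {
  const result = {};
  report.pages.forEach(page => {
    const links = (report.urls[page] && report.urls[page].links) || {};
    Object.keys(links).forEach(link => {
      const target = report.urls[link];
      if (target && target.ok === false) {
        (result[link] = result[link] || []).push(page);
      }
    });
  });
  return result;
}

// Made-up sample data for illustration only
const sampleLinksReport = {
  pages: ["https://example.com/", "https://example.com/about.html"],
  urls: {
    "https://example.com/": {
      ok: true,
      links: { "https://example.com/missing.html": [], "https://example.com/about.html": [] }
    },
    "https://example.com/about.html": {
      ok: true,
      links: { "https://example.com/missing.html": [] }
    },
    "https://example.com/missing.html": { ok: false }
  }
};

console.log(pagesByBrokenLink(sampleLinksReport));
```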

With redirects

If the response to requesting a page link is a redirect, then this is not a broken link, although the final destination of the redirect might be a broken link.

The report process follows redirects and, if a redirect is detected, additional location information is returned in the urls section.

{
  ...
  "urls": {
    ...

    "https://mycarda.info/tests/302.php": {
      "mimeType": "text/html",
      "mimeParams": { "charset": "UTF-8" },
      "finish": "2022-10-25T08:11:38.459Z",
      "start": "2022-10-25T08:11:38.210Z",
      "ok": true,
      "links": { ... },
      "location": "https://mycarda.info/tests/test.html"
    },
    ...
  }
  ...
}

The location value contains the link to the final destination of the redirect, that is, if the link to page A redirects to page B that redirects to page C, the final destination, page C, will be the location value.

To cater for redirects, we need an additional step in the code to check for a location, and if one exists, check if the location is a broken link. Here is the (verbose for clarity) addition to the code.

var report = JSON.parse(DetailJSON);
var urls = report.urls;
var pages = report.pages;

pages.forEach( page => {
  console.log(`Page tested ${page}`);
  var linkObjects = Object.keys( urls[page].links || {} );  // in case a page has no links section
  linkObjects.forEach( link => {
    if ( typeof urls[link] !== 'undefined' ) {  // check the link has an entry in the urls section
      if ( urls[link].ok === false ) {
        console.log( `  broken ${link}` );
      }
      else {
        if ( typeof urls[link].location !== 'undefined' ) { // check for redirect
          var redirectUrl = urls[link].location;
          if ( typeof urls[redirectUrl] !== 'undefined' && urls[redirectUrl].ok === false ) {
            console.log( `  broken ${link} -> ${redirectUrl}` );
          }
        }
      }
    }
  });
});

Example output

...
Page tested https://mycarda.info/tests/links.html
  broken https://mycarda.info:1234/media/2017-12-04-AccessByCountry.pdf
  broken https://mycarda.info/tests/teapot.php
  broken https://mycarda.info/tests/774.php
  broken https://mycarda.info/tests/500.php
  broken https://www.mycarda2.info/media/2017-12-06-KentISD.pdf
  broken https://mycarda.info/tests/error.php
  broken https://mycarda.info/tests/302Broken.php -> https://mycarda.info/tests/broken.html
  broken https://mycarda.info/tests/broken.pdf
  broken https://mycarda.info/tests/403.php
...

The report will test all links on a page (other than specifically excluded pages specified in a custom configuration). This includes links to CSS, JavaScript, images, etc. as well as links within anchor tags that a site visitor can click. In some cases, it might be more important to prioritise the links a site visitor can click, as clickable broken links contribute more to the perceived quality of a site than broken links the site visitor never sees.

To enable this, we need an additional check that the interaction value exists and is true.

if ( typeof urls[link].interaction !== 'undefined' ) {
  if ( urls[link].interaction == true ) {
    console.log( ... );
  }
}
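
Combined with the broken link check, the interaction test might look like the following sketch. It follows the snippet above in reading interaction from the link's entry in the urls section. The function name clickableBrokenLinks and the sample data are hypothetical.

```javascript
// Hypothetical helper: lists broken links that a site visitor can click.
// Reads ok and interaction from the link's entry in the urls section.
function clickableBrokenLinks(report) {
  const found = [];
  report.pages.forEach(page => {
    const links = (report.urls[page] && report.urls[page].links) || {};
    Object.keys(links).forEach(link => {
      const target = report.urls[link];
      if (target && target.ok === false && target.interaction === true) {
        found.push({ page, link });
      }
    });
  });
  return found;
}

// Made-up sample data for illustration only
const sampleClickReport = {
  pages: ["https://example.com/"],
  urls: {
    "https://example.com/": {
      ok: true,
      links: {
        "https://example.com/missing.html": [],
        "https://example.com/css/missing.css": [],
        "https://example.com/about.html": []
      }
    },
    "https://example.com/missing.html": { ok: false, interaction: true },
    "https://example.com/css/missing.css": { ok: false },
    "https://example.com/about.html": { ok: true, interaction: true }
  }
};

clickableBrokenLinks(sampleClickReport).forEach(item =>
  console.log(`  clickable broken ${item.link} on ${item.page}`)
);
```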

Are my pages accessible?

To comply with the equality standards required in most countries, the reports include WCAG 2.2 technique tests and best practice recommendations.

In the urls section of the report, each page that is tested includes a diagnostics section containing information about all the issues found for that page.

{
  ...
  "urls": {
    "https://mycarda.info/": {
      ...
      "diagnostics": [
        {
          "category": "accessibility",
          "subcategory": "wcag",
          "detail": "Ensures the contrast between foreground and background colors meets WCAG 2 AA contrast ratio thresholds",
          "extract": "<span class=\"desktop\">For research, inspiration and enjoyment</span>",
          "level": "moderate",
          "message": "Elements must have sufficient color contrast",
          "name": "color-contrast",
          "selector": "span>span:nth-of-type(1)",
          "tag": "span",
          "type": "file",
          "parameters": { "wcag": { "techniques": ["G18"], "level": "AA" } },
          "coords": [{ "x": 16, "y": 330, "width": 473, "height": 26 }],
          "module": "axe"
        },
        ...
      ]
      ...
    }
    ...
  }
  ...
}

Diagnostics specific to accessibility have "category": "accessibility". The values in the JSON are self-explanatory to those familiar with WCAG techniques and best practice.

For information on other values, such as extract, selector, and coords, see the section identifying issue locations.

Listing the accessibility diagnostics per page is very much the same as listing the links except the message can contain values that must be replaced with values from the parameters. See how do I display messages? for more detail.

The following code example shows the accessibility diagnostics for each page.

var report = JSON.parse(DetailJSON);
var urls = report.urls;
var pages = report.pages;

pages.forEach( page => {
  console.log(`Page tested ${page}`);
  if ( typeof urls[page].diagnostics !== 'undefined' ) {  // check the page has some diagnostics
    urls[page].diagnostics.forEach( diagnostic => {
      if ( diagnostic.category == "accessibility" ) {
        console.log( `    WCAG message ${ReplaceValues(diagnostic.message, diagnostic.parameters )}` );
      }
    });
  }
});

// see section on displaying messages for more information
function ReplaceValues( message, parameters ) {

  if ( typeof parameters == 'undefined' ) {
    return message;
  }

  var parameterObjects = Object.keys( parameters );
  parameterObjects.forEach( parameter => {
    if ( message.includes(`{${parameter}}`) ) {
      message = message.replace(`{${parameter}}`,parameters[parameter]);
    }
  });

  return message;
}

Example output.

Page tested https://www.bl.uk/
  WCAG message HTML page does not validate.
  WCAG message Elements must have sufficient color contrast
  WCAG message Ensures landmarks are unique
  WCAG message All page content should be contained by landmarks
  WCAG message All page content should be contained by landmarks
Page tested https://www.bl.uk/20th-century-literature/activities/black-literature-timeline
  WCAG message HTML page does not validate.
  WCAG message Elements must have sufficient color contrast
  WCAG message Ensures landmarks are unique
Page tested https://www.bl.uk/about-us
  WCAG message Stray end tag 'a'.
  WCAG message HTML page does not validate.
  ...
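
For a quick overview, the accessibility diagnostics can also be tallied by their level value instead of being listed one by one. This is a minimal sketch: the function name accessibilityCountsByLevel and the sample data are hypothetical, and the level values used are the ones that appear in the examples in this document.

```javascript
// Hypothetical helper: counts accessibility diagnostics by level across all tested pages.
function accessibilityCountsByLevel(report) {
  const counts = {};
  report.pages.forEach(page => {
    const diagnostics = (report.urls[page] && report.urls[page].diagnostics) || [];
    diagnostics
      .filter(diagnostic => diagnostic.category === "accessibility")
      .forEach(diagnostic => {
        counts[diagnostic.level] = (counts[diagnostic.level] || 0) + 1;
      });
  });
  return counts;
}

// Made-up sample data for illustration only
const sampleAccessibilityReport = {
  pages: ["https://example.com/"],
  urls: {
    "https://example.com/": {
      diagnostics: [
        { category: "accessibility", level: "moderate", message: "Elements must have sufficient color contrast" },
        { category: "accessibility", level: "serious", message: "Ensures landmarks are unique" },
        { category: "accessibility", level: "moderate", message: "All page content should be contained by landmarks" },
        { category: "code", level: "serious", message: "HTML page does not validate." }
      ]
    }
  }
};

console.log(accessibilityCountsByLevel(sampleAccessibilityReport));
```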

Are any of my email addresses bad?

In terms of annoying site visitors, bad email addresses are much worse than broken links. Organisations spend a lot of money trying to interact with their customers so something as simple as a broken email address can negate all that effort.

The urls section of the report contains a list of all the links and filtering this by links that start with mailto: will provide a list of all the emails in the report.

var report = JSON.parse(data);
var urls = report.urls;
var links = Object.keys( urls );

links.forEach( link => {
  if ( link.startsWith("mailto:") ) { // check link is an email address
    console.log(`email address ${link}`);
  }
});

Example output

email address mailto:mcarter@sitemorse.com
email address mailto:mcarter@sitemo0se.com
email address mailto:mycarda@sitemorse.com
email address mailto:mcarter@sitemorse.com?subject=Good+subject&cc=MCarterSitemorse%40gmail.com
email address mailto:MCarterSitemorse@gmail.com
...

This provides a list of email addresses, which might be useful for auditing, but it does not provide information about which of these have issues. As with links, any issues are provided in the diagnostics section, but for email they have "category": "email".

...
"mailto:mycardaSitemorse@mycardamail.com": {
  "start": "2022-11-10T13:15:05.942Z",
  "diagnostics": [
    {
      "category": "email",
      "type": "url",
      "level": "serious",
      "name": "dns-notfound",
      "message": "Domain '{domain}' is not found",
      "parameters": {
        "domain": "mycardamail.com",
        "error": "queryMx ENOTFOUND mycardamail.com"
      },
      "module": "mailto"
    }
  ],
...

Like accessibility, the diagnostic messages contain replacement values.

var report = JSON.parse(data);
var urls = report.urls;
var links = Object.keys( urls );

links.forEach( link => {
  if ( link.startsWith("mailto:") ) { // check link is an email address
    console.log(`email address ${link}`);
    if ( typeof urls[link].diagnostics !== 'undefined' ) {  // check the email link has some diagnostics
      urls[link].diagnostics.forEach( diagnostic => {
        if ( diagnostic.category == "email" ) {
          console.log( `    email message ${ReplaceValues(diagnostic.message, diagnostic.parameters )}` );
        }
      });
    }
  }
});

The ReplaceValues() function is omitted for clarity. For more information see how do I display messages?.

Example output

email address mailto:mcarter@sitemorse.com
email address mailto:mcarter@sitemo0se.com
    email message Domain 'sitemo0se.com' is not found
email address mailto:mycarda@sitemorse.com
email address mailto:mcarter@sitemorse.com?s0bject=Good+subject&cc=MCarterSitemorse%40gmail.com
    email message Invalid mailto option 's0bject'
email address mailto:mcarter@sitemorse.com?subject=Good+subject&cc=mycardaSitemorse%40mycardamail.com
email address mailto:mycardaSitemorse@mycardamail.com
    email message Domain 'mycardamail.com' is not found
...

The final part you might require is knowing which pages contain the faulty email addresses. This could be done by iterating through the links on each page finding a match to the faulty email address. Alternatively, if you are already iterating through each page in the report, faulty email addresses could be checked whilst checking broken links. As this has been covered already in the links section, the example code is left as an exercise to the reader.
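
For readers who would like a starting point, here is one possible sketch of the first approach. It assumes that mailto: links appear as keys in each page's links section in the same way as other links. The function name faultyEmailsByPage and the sample data are hypothetical.

```javascript
// Hypothetical helper: lists, per page, the mailto links that have email diagnostics.
// Assumption: mailto links are keys in a page's links section like any other link.
function faultyEmailsByPage(report) {
  const result = {};
  report.pages.forEach(page => {
    const links = (report.urls[page] && report.urls[page].links) || {};
    const faulty = Object.keys(links).filter(link => {
      if (!link.startsWith("mailto:")) {
        return false;
      }
      const diagnostics = (report.urls[link] && report.urls[link].diagnostics) || [];
      return diagnostics.some(diagnostic => diagnostic.category === "email");
    });
    if (faulty.length > 0) {
      result[page] = faulty;
    }
  });
  return result;
}

// Made-up sample data for illustration only
const sampleEmailReport = {
  pages: ["https://example.com/contact.html", "https://example.com/"],
  urls: {
    "https://example.com/contact.html": {
      links: { "mailto:info@example.invalid": [], "mailto:sales@example.com": [] }
    },
    "https://example.com/": {
      links: { "mailto:sales@example.com": [] }
    },
    "mailto:info@example.invalid": {
      diagnostics: [{ category: "email", level: "serious", message: "Domain '{domain}' is not found" }]
    },
    "mailto:sales@example.com": {}
  }
};

console.log(faultyEmailsByPage(sampleEmailReport));
```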

How good is my code quality?

This API uses the W3C code quality checker with a few enhancements to make identifying and finding the issues clearer.

Code quality diagnostics are indicated by "category": "code" and, as described in other sections of this document, the messages contain replacement values.

"diagnostics": [
  {
    "category": "code",
    "extract": ":0.1;\">\n\t\t<span id=\"loadingUrlInfo\" alt=\"\" class=\"sts-dn\">&nbsp;",
    "level": "serious",
    "message": "Attribute '{attribute}' not allowed on element '{tag}' at this point.",
    "name": "bad-attribute",
    "parameters": { "attribute": "alt", "tag": "span" },
...

Typically, you would want to identify the code quality issues for each page rather than a list of all the issues.

var report = JSON.parse(data);
var urls = report.urls;
var pages = report.pages;

pages.forEach( page => {
  console.log(`Page tested ${page} for code quality`);
  if ( typeof urls[page].diagnostics !== 'undefined' ) {  // check the page has some diagnostics
    urls[page].diagnostics.forEach( diagnostic => {
      if ( diagnostic.category == "code" ) {
        console.log( `    ${ReplaceValues(diagnostic.message, diagnostic.parameters )}` );
      }
    });
  }
});

The ReplaceValues() function is omitted for clarity. For more information see how do I display messages?.

Example output.

...
Page tested https://www.bl.uk/20th-century-literature/activities/black-literature-timeline for code quality
    A 'meta' element with an 'http-equiv' attribute whose value is 'X-UA-Compatible' must have a 'content' attribute with the value 'IE=edge'.
    ReferenceError: jQuery is not defined
Page tested https://www.bl.uk/about-us for code quality
    A parser-blocking, cross site (i.e. different eTLD+1) script, https://cdnjs.cloudflare.com/ajax/libs/jquery-migrate/3.4.0/jquery-migrate.min.js, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.
    Element 'script' must not have attribute 'async' unless attribute 'src' is also specified or unless attribute 'type' is specified with value 'module'.
    Element 'script' must not have attribute 'defer' unless attribute 'src' is also specified.
    A 'charset' attribute on a 'meta' element found after the first 1024 bytes.
    Attribute 'xmlns:xsi' not allowed here.
    The 'scheme' attribute on the 'meta' element is obsolete. Use only one scheme per field, or make the scheme declaration part of the value.
    The 'scheme' attribute on the 'meta' element is obsolete. Use only one scheme per field, or make the scheme declaration part of the value.
...

Are there any spelling mistakes?

Note that the API does not check spelling directly but it does extract the text so you can check spelling in your own post-processing operations. There are many reasons why spelling is not checked by the API but the main reason is knowing which dictionary to use and what would be false positives, such as placenames or product names which are not found in a standard dictionary.

To post-process with your own spell checker requires something simpler than the HTML but richer than just plain text. For this reason, markdown is used.

By default, markdown output is not enabled. To enable the output, you pass htmltext.enabled in the configuration options. The resulting JSON detail contains a textContent item for each page tested. This is a link to the markdown detail.

The following shows an example snippet of the textContent markdown.

[tag-box-selector]: # (h2 122,497,556,34 article>div>div>div>h2)

## Why test for brand?

[tag-box-selector]: # (p 122,563,556,192 div>p:nth-of-type(1\))

The most basic form of brand testing is to detect text that does not contain the correct spelling, capitalisation, word order or spacing associated with a branded word or phrase. The following examples make this clear. Our brand is to use "Jane Austen" as the brand name and all accidental variations must be caught and flagged as brand violations. We also need to take into consideration false positives when testing.

The additional metadata in the markdown allows your post-processing spell check to locate the relevant text. For more details, see the text content section of the reference API.
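
As an illustration only, the metadata lines in the snippet above could be parsed along these lines. The sketch is based solely on the two example lines shown: it assumes the four numbers are x, y, width and height in that order, and that a closing bracket inside the selector is escaped with a backslash. The function name parseTagBoxSelector is hypothetical; check the reference API for the definitive format.

```javascript
// Hypothetical helper: parses one [tag-box-selector] metadata line.
// Assumption: the four numbers are x, y, width and height, in that order.
function parseTagBoxSelector(line) {
  const pattern = /^\[tag-box-selector\]: # \((\S+) (\d+),(\d+),(\d+),(\d+) (.*)\)$/;
  const match = pattern.exec(line);
  if (match === null) {
    return null; // not a metadata line (or not in the assumed format)
  }
  const [, tag, x, y, width, height, selector] = match;
  return {
    tag,
    box: { x: Number(x), y: Number(y), width: Number(width), height: Number(height) },
    // closing brackets inside the selector appear escaped as \) in the markdown
    selector: selector.replace(/\\\)/g, ")")
  };
}

// The two metadata lines from the snippet above
console.log(parseTagBoxSelector("[tag-box-selector]: # (h2 122,497,556,34 article>div>div>div>h2)"));
console.log(parseTagBoxSelector("[tag-box-selector]: # (p 122,563,556,192 div>p:nth-of-type(1\\))"));
```

Lines that do not match, such as the text itself, return null, so a spell checker can keep the most recent metadata and associate it with the text that follows.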

How do I identify the location of an issue on the page?

Selector and extract

These two values are provided for all diagnostics. The extract provides a snippet of HTML where the issue occurs, and the selector allows you to search for the issue in the document model (for example, in the Chrome developer tools).

For example, this is a section of a code quality diagnostic.

{
  "category": "code",
  "extract": "          <span class='st_sharethis_hcount' displaytext='ShareThis'></span",
  ...
  "selector": "div#sociallinkstop>div>span:nth-of-type(1)",
  ...
},

Using selector and extract is particularly useful if content is loaded dynamically by JavaScript so the code issue would not be present in a view source of the HTML.

Line numbers

Information about the location of the issue within the HTML can be found in the source section of the diagnostics. This section will only appear if the issue can be identified within the HTML source.

{
  "category": "code",
  ...
  "source": {
    "firstLine": 1385,
    "lastLine": 1385,
    "firstColumn": 21,
    "lastColumn": 78,
    "firstByte": 81696,
    "lastByte": 81753
  },
  ...
},

The values in the source section are self-explanatory, so they are not discussed in detail. These values provide an easy way to highlight the issue in a source code view.

Coordinates

It is useful to have a visual representation of where the issue occurs on the page. To do this, you need an image of the page and coordinates identifying part of the page.

A screenshot of each page tested is provided in the urls section for each page. The storage value is a link to the image.

"urls": {
  ...
  "https://mycarda.info/tests/email.html": {
    ...
    "screenshot": {
      "height": 1294,
      "storage": "https://dxtfs.com/d09f1355fd9498918385e036a674eb4c49e196d02e5966e01f90709c1133f9a3(image,png)",
      "width": 800
    }
    ...

Where applicable, a diagnostic can contain a coords section that defines where on the page screenshot the issue occurs.

{
  "category": "accessibility",
  ...
  "message": "Elements must have sufficient color contrast",
  ...
  "coords": [{ "x": 16, "y": 330, "width": 473, "height": 26 }],
  ...
},

Along with the message value, the screenshot and coords can be used to highlight a page image with the message appearing, for example, as a mouseover.
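
One way to do this is to build a small HTML fragment that positions a box over the screenshot for each set of coordinates. The sketch below assumes that coords are pixel offsets from the top-left corner of the screenshot at its native size. The function names and the storage URL in the sample are hypothetical.

```javascript
// Escapes text for safe use inside an HTML attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: overlays one highlight box per coords entry on the screenshot.
// Assumption: coords are pixel offsets from the screenshot's top-left corner.
function buildHighlightHtml(screenshot, diagnostic) {
  const boxes = (diagnostic.coords || []).map(c =>
    `<div title="${escapeHtml(diagnostic.message)}" style="position:absolute;` +
    `left:${c.x}px;top:${c.y}px;width:${c.width}px;height:${c.height}px;` +
    `outline:2px solid red"></div>`
  );
  return `<div style="position:relative;width:${screenshot.width}px;height:${screenshot.height}px">` +
    `<img src="${escapeHtml(screenshot.storage)}" width="${screenshot.width}" height="${screenshot.height}" alt="">` +
    boxes.join("") +
    `</div>`;
}

// Sample values: coords and message from the example above, storage URL made up
const exampleScreenshot = { height: 1294, storage: "https://example.com/screenshot.png", width: 800 };
const exampleDiagnostic = {
  message: "Elements must have sufficient color contrast",
  coords: [{ x: 16, y: 330, width: 473, height: 26 }]
};

console.log(buildHighlightHtml(exampleScreenshot, exampleDiagnostic));
```

The title attribute gives the mouseover text. If the message contains replacement values, they would need to be substituted before the fragment is built.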

As an aside, downloading all the screenshots might be a handy way of keeping a visual archive of your website.

How do I display messages?

The message value can contain placeholders that must be replaced with values from the parameters. For example:

"message": "The '{attribute}' on the '{tag}' element is obsolete.",
"parameters": { "attribute": "scheme", "tag": "meta" },

This provides the future option to translate the messages into multiple languages.

The following code iterates through the parameters and replaces each related item in the message text.

function ReplaceValues( message, parameters ) {
  if ( typeof parameters == 'undefined' ) {
    return message;
  }
  var parameterObjects = Object.keys( parameters );
  parameterObjects.forEach( parameter => {
    if ( message.includes(`{${parameter}}`) ) {
      message = message.replace(`{${parameter}}`,parameters[parameter]);
    }
  });
  return message;
}
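
The function above replaces only the first occurrence of each placeholder. Where a message might repeat a placeholder, a variant such as the following sketch could be used instead. The name formatMessage is hypothetical, and the sketch assumes that placeholder names consist only of letters, digits and underscores, as in the examples in this document.

```javascript
// Hypothetical variant of ReplaceValues: replaces every {name} placeholder in one pass.
// Placeholders with no matching parameter are left untouched.
function formatMessage(message, parameters) {
  const values = parameters || {};
  return message.replace(/\{(\w+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : placeholder
  );
}

// Message and parameters from the example above
console.log(formatMessage(
  "The '{attribute}' on the '{tag}' element is obsolete.",
  { attribute: "scheme", tag: "meta" }
));
// prints: The 'scheme' on the 'meta' element is obsolete.
```

Using a callback for the replacement also avoids any special handling of $ sequences that can occur when the replacement is passed as a string.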

I just want to know the number of issues

Much of the example code in this document has iterated through the pages to find specific types of issue on each page. Sometimes, when preparing a summary dashboard, you just want to know how many broken links or accessibility issues there are. In this case, it might be easier to query the JSON using something like JSONata.

The following code example uses a JSONata library to obtain just the totals for each category of test.

var jsonata = require("jsonata");

var report = JSON.parse(data);

// evaluate() returns a Promise in JSONata 2.x; await also works with the synchronous 1.x result
async function countDiagnostics( label, query ) {
  var count = await jsonata(query).evaluate(report);
  console.log(`  ${label} ${count}`);
}

(async () => {
  console.log(`Total diagnostics`);
  await countDiagnostics("accessibility", "$count(**.diagnostics[category='accessibility' and subcategory='wcag'])");
  await countDiagnostics("code quality ", "$count(**.diagnostics[category='code'])");
  await countDiagnostics("email        ", "$count(**.diagnostics[category='email'])");
  await countDiagnostics("links        ", "$count(**.diagnostics[category='links'])");
})();

Example output

Total diagnostics
  accessibility 166
  code quality  144
  email         1
  links         73
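
If adding a query library is not an option, similar totals can be produced with plain JavaScript. The following is a minimal sketch with a hypothetical function name and made-up sample data. Unlike the JSONata queries above, it only looks at the diagnostics of each entry in the urls section and does not filter accessibility diagnostics by subcategory.

```javascript
// Hypothetical helper: totals diagnostics per category across every entry in urls.
function countDiagnosticsByCategory(report) {
  const totals = {};
  Object.values(report.urls).forEach(entry => {
    (entry.diagnostics || []).forEach(diagnostic => {
      totals[diagnostic.category] = (totals[diagnostic.category] || 0) + 1;
    });
  });
  return totals;
}

// Made-up sample data for illustration only
const sampleTotalsReport = {
  urls: {
    "https://example.com/": {
      diagnostics: [{ category: "accessibility" }, { category: "code" }, { category: "code" }]
    },
    "https://example.com/missing.html": { diagnostics: [{ category: "links" }] },
    "mailto:info@example.invalid": { diagnostics: [{ category: "email" }] },
    "https://example.com/about.html": {}
  }
};

console.log(countDiagnosticsByCategory(sampleTotalsReport));
```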