We've recently received many questions about historical PDF documents. In this week's tech update, Helen Grimbly, Support Lead at Sitemorse, looks at best practices for archiving old content, including PDF documents.
Helen Grimbly, Support Lead
Websites sometimes contain many rarely viewed historical PDF documents. For example, these could be the minutes of a meeting held a few years ago, containing email addresses that no longer work but which should be kept for reference, or a document about a past event with links to websites that may no longer exist. In these situations, it can be preferable not to edit the original documents, even though they may now contain broken links and email addresses, and may not comply with modern standards and specifications, either those of the organisation publishing them or those of the wider web.
A standards-compliant way of marking pages or documents as archived is to include the value 'archived' in the 'rel' attribute of a link to the target resource (such as a link to an HTML page or PDF document). This indicates that the resource is archived and kept solely for historical reasons. During an assessment, when Sitemorse finds this value on a link to a resource, the contents of that page or document will not be assessed and the Sitemorse score will not be affected. The same applies to INDEX assessments: their scores will not be affected either.
Search engines will continue to index these resources as they normally do.
The following HTML shows the attribute value rel="archived" applied to a link to a PDF document:
<a href="example.pdf" rel="archived">Example document (PDF)</a>
Please note that if one link to a URL has rel="archived" and another link to the same URL does not, the URL will still be assessed. You need to ensure that every link to the URL is marked as 'archived'.
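One way to catch links that have been missed is to scan your pages before an assessment. The Python sketch below is only an illustration and is not part of Sitemorse: the names LinkCollector and inconsistently_archived are hypothetical, it uses only the standard library, and it assumes you already have the HTML of each page to hand. It reports any URL that some links mark as archived and others do not.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Record, for each link target on a page, whether the link has rel="archived"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        # Maps absolute URL -> list of booleans, one entry per link found.
        self.links = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel holds space-separated tokens; compare them case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        target = urljoin(self.base_url, href)
        self.links[target].append("archived" in rel_tokens)


def inconsistently_archived(pages):
    """Return URLs marked as archived by some links but not by all of them.

    `pages` maps each page URL to that page's HTML source (hypothetical input).
    """
    merged = defaultdict(list)
    for page_url, html in pages.items():
        collector = LinkCollector(page_url)
        collector.feed(html)
        collector.close()
        for url, flags in collector.links.items():
            merged[url].extend(flags)
    return sorted(
        url for url, flags in merged.items() if any(flags) and not all(flags)
    )
```

Calling inconsistently_archived with a dictionary of page URLs and their HTML returns a sorted list of the targets that need attention. The sketch resolves relative links against the page URL but does no further URL normalisation (fragments or trailing slashes, for example), so the way Sitemorse itself matches URLs may differ.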