Joe Clark wrote a canonical list of document types that may be delivered as PDF. There is a simple way to ensure that these documents are as accessible as can be: start from a structured Microsoft Office document.
From Word to accessible PDF
Most organisations require certain branding on documents – logos, copyright notices, fonts, whatever. Most content producers in an organisation understand the need for consistency, but hate having to do it: remembering what point size headings should be, and so on. So I always advocate distributing a Word document template that has all the rules defined in it:
- The logo is in the right place, and has “alt text” defined (right-click the picture, choose “Format Picture”, then the “Web” tab, and enter the alt text)
- The footer rules are in place – a copyright notice, page numbering etc
- The Word styles are set up: that is, there is a definition for headings one through six, table formatting, bullets and numbering. Essentially, this is a stylesheet for a Word document.
The great thing about this template is that giving it to staff means that they can double-click on it and be presented with a new, blank document with all the logos and branding pre-defined; all they need to do is choose the right heading from the Word formatting toolbar and the styles are applied automatically. (There’s more on making good source documents at WebAIM.)
What it means for you is that when you use Adobe Acrobat Professional on a Windows machine to make your PDF, all of this structural information (the level of heading, the fact that you’re in a list) gets passed to the resulting PDF document. Unfortunately, most Assistive Technologies can’t access this structural information. (Even Adobe’s own “Read Out Loud” facility can’t use it, although it’s rumoured that a JAWS beta can navigate headings in PDF.)
PDFs that shouldn’t be PDFs
As well as the categories of PDFs that Joe lists as being “good” PDFs, there are two further categories that I’d add for purely pragmatic reasons, not because they should really be PDF, but because in the insane world of bureaucracy, they have to be. Think of these not as categories, but crapegories.
The first crapegory is the massive amount of shite documents that people have “put on the web” by turning them into PDF format because it’s easy to do and because somebody, somewhere might just want to read the Chief Exec’s 1998 Christmas message again. It’s indicative of the sheer rubbishness of Content Management Systems that it’s almost impossible for a non-technical user to publish an accessible html document, but a piece of cake to upload a PDF via a CMS and “put it on the Web”. This category is the “dark matter” of the Web; a huge weight of uselessness that no-one ever sees and which interacts with nothing else.
The second crapegory is the “bad” PDFs that are published in the format for one or both of these reasons:
- The user needs a printable version
- The user wants to be able to save and email the document
I used to mock these reasons (being Mr Web Standards smartarse purist, and all). But I’ve come to realise that they’re very good reasons for wanting a non-html format. The first makes sense because websites traditionally haven’t printed well; they usually won’t fit properly on a portrait-oriented page, and are full of extraneous crap (advertising, navigation etc).
Now, modern web designers will deliver a print stylesheet, but the majority of users don’t know that. (As an aside, wouldn’t it be jolly if browsers showed a large print button on the toolbar if they found a print stylesheet on a site? That way, users would be alerted to the fact that you can print this webpage, because the designer was a professional).
Similarly, wanting to save a webpage is common, but browsers don’t have the nice “floppy disc” save icon that people are used to from other desktop applications. Even if you know how to save a webpage, it generally gets saved in two lumps: a .html file and a folder of subfiles, which makes it difficult to attach to an email. (Microsoft’s “bagged-up” .mht format is a notable exception, but it requires that you know to choose “File” then “Save as”, then “Web Archive, single file” in Internet Explorer. People don’t know that.)
In such a case, the most accessible method of delivering the content is as html, but the most usable method may be to offer a PDF. It’s version control hell to publish twice, though, so a great way to square the circle is to generate the PDF on demand from the xhtml.
There are lots of ways to generate PDF, but I’m going to focus on one: Extensible Stylesheet Language Formatting Objects, or XSL-FO. This allows you to take xml (which is why your content should be valid xhtml, as that is xml) and manipulate and format it any blumming way you please: you can make PDFs with footers, page counts, headers etc on the fly, from your content. (Example)
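To make the idea a bit more concrete, here is a rough sketch (mine, not a production recipe) of the first half of that pipeline: turning xhtml into a formatting-objects document with a page-numbered footer. It uses only Python’s standard library; the font sizes, the A4 page setup and the `xhtml_to_fo` name are all assumptions made up for illustration, and it only handles a few elements. The resulting markup would still need to be handed to an XSL-FO processor (Apache FOP is one) to become an actual PDF.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical house style: font size per xhtml element (an assumption).
FONT_SIZES = {"h1": "18pt", "h2": "14pt", "p": "11pt"}


def fo(tag):
    """Return a namespace-qualified XSL-FO element name."""
    return f"{{{FO_NS}}}{tag}"


def xhtml_to_fo(xhtml_source):
    """Build XSL-FO markup from the headings and paragraphs of an xhtml string."""
    html = ET.fromstring(xhtml_source)
    root = ET.Element(fo("root"))

    # Page geometry: one A4 master with a body region and a footer region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-height": "29.7cm",
        "page-width": "21cm", "margin": "2cm"})
    ET.SubElement(page, fo("region-body"), {"margin-bottom": "1.5cm"})
    ET.SubElement(page, fo("region-after"), {"extent": "1cm"})

    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "A4"})

    # Footer repeated on every page, with an automatic page number.
    footer = ET.SubElement(sequence, fo("static-content"),
                           {"flow-name": "xsl-region-after"})
    footer_block = ET.SubElement(footer, fo("block"), {"text-align": "center"})
    footer_block.text = "Page "
    ET.SubElement(footer_block, fo("page-number"))

    # Main content: one fo:block per heading or paragraph, in document order.
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    body = html.find(f"{{{XHTML_NS}}}body")
    for element in body.iter():
        name = element.tag.split("}")[-1]
        if name in FONT_SIZES:
            block = ET.SubElement(flow, fo("block"), {
                "font-size": FONT_SIZES[name], "space-after": "6pt"})
            block.text = "".join(element.itertext()).strip()

    return ET.tostring(root, encoding="unicode")


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Annual report</title></head>
  <body>
    <h1>Annual report</h1>
    <p>Printable and saveable, generated from the <em>same</em> source.</p>
  </body>
</html>"""

print(xhtml_to_fo(SAMPLE))
```

Because the xhtml stays the single source, the PDF can be regenerated whenever the page changes, which is the whole point of doing it on demand. Note that this sketch flattens inline markup such as emphasis to plain text.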
What this won’t (yet) do is produce a fully-tagged PDF, to the same level of semantic sexiness that you’d get yourself from your carefully-crafted Word document. (There’s no real reason why a good program shouldn’t be able to tag a PDF from the structure of the source xml; after all, all Adobe Acrobat does is translate the structural markup from Word markup to PDF markup; it’s a mechanical transformation job.)
Even so, that doesn’t mean the document is totally inaccessible, as Alastair Campbell notes in his article The four levels of PDF accessibility. Listening to an on-the-fly generated PDF in a screen reader isn’t much worse than listening to a PDF “hand-made” from a Word document; essentially, you can’t navigate either, but can only listen to it linearly. The “hand-made” version, however, does carry with it the “alt text” if you added it in Word.
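As a hedged illustration of how mechanical that job is, the sketch below maps a handful of xhtml elements onto the standard structure types that tagged PDF defines (H1 to H6, P, L, LI, Table, Figure and so on). It doesn’t write a PDF at all; it only prints the tag outline a tagging tool would need to produce, and the `TAG_MAP` table and `structure_outline` function are hypothetical names of my own.

```python
import xml.etree.ElementTree as ET

# xhtml element -> PDF standard structure type (a partial, illustrative map).
TAG_MAP = {
    "h1": "H1", "h2": "H2", "h3": "H3", "h4": "H4", "h5": "H5", "h6": "H6",
    "p": "P", "ul": "L", "ol": "L", "li": "LI",
    "table": "Table", "tr": "TR", "th": "TH", "td": "TD",
    "blockquote": "BlockQuote", "img": "Figure",
}


def structure_outline(xhtml_source):
    """Return (depth, pdf_tag, alt_text) for each mappable element, in order."""
    root = ET.fromstring(xhtml_source)
    outline = []

    def walk(element, depth):
        name = element.tag.split("}")[-1]  # drop the xhtml namespace prefix
        pdf_tag = TAG_MAP.get(name)
        if pdf_tag is not None:
            # Carry the alt text across, as the Word route does for pictures.
            outline.append((depth, pdf_tag, element.get("alt")))
            depth += 1
        for child in element:
            walk(child, depth)

    walk(root, 0)
    return outline


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Report</h1>
<ul><li>First point</li><li>Second point</li></ul>
<p><img src="logo.png" alt="Company logo" /></p>
</body></html>"""

for depth, tag, alt in structure_outline(SAMPLE):
    print("  " * depth + tag + (f" (alt: {alt})" if alt else ""))
```

The hard part of real tagging is elsewhere, in tying each of these tags to the right marked content on the page, which is the formatter’s job rather than the source document’s.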
But, the main question is, does it matter? If you’re offering the content primarily as accessible, well-structured, semantic xhtml, does it matter if your icing-on-the-cake, printable and saveable PDF isn’t perfectly accessible?
I’d like to thank Jim O’Donnell and others on the Accessify forum for conversations that led to my writing this. If there are any errors or typos, blame those bastards. Not me.