New E-Commerce Checkout Research Why 68% of Users Abandon Their Cart – 2/3 of shopping carts are abandoned. The main reasons: deceptive practices (not showing total cost upfront, or adding high “taxes” or delivery charges), stupidly requiring customers to set up an account, and bad UX (too complex to do).
The Web Can’t Survive a Monoculture – “Web developers are right to fear a monoculture, but they don’t seem to appreciate their agency in the solution. They have the power to protect the platform if only they take the call.”
Love them or hate them, PDFs are a fact of life for many organisations. If you produce PDFs, you should make them accessible to people with disabilities. With Prince, it’s easy to produce accessible, tagged PDFs from semantic HTML, CSS and SVG.
It’s an enduring myth that PDF is an inaccessible format. In 2012, the PDF profile PDF/UA (for ‘Universal Accessibility’) was standardised. It’s the U.S. Library of Congress’ preferred format for page-oriented content and the International Standard for accessible PDF technology, ISO 14289.
Let’s look at how to make accessible PDFs with Prince. Even if you already have Prince installed, grab Prince 13 and install it; it comes with a free license for non-commercial use. Prince is available for Windows, Mac, Linux and FreeBSD desktops, and wrappers are available for Java, C#/.NET, ActiveX/COM, PHP, Ruby on Rails and Node/JavaScript for integrating Prince into websites and applications.
Here’s a trivial HTML file, which I’ve called prince1.html.
Prince has produced prince1.pdf in the same folder. (There are many command line switches to choose the name of the output file, combine files into a single PDF etc., but that’s not relevant here. Windows fans can also use a GUI.)
Using Adobe Acrobat Pro, I can inspect the tag structure of the PDF produced:
As you can see, Acrobat reports “No Tags available”. This is because it’s perfectly legitimate to make inaccessible PDFs – documents intended only for printing, for example. So let’s tell Prince to make a tagged PDF:
$ prince prince1.html --tagged-pdf
Inspecting this file in Acrobat shows the tag structure:
Now we can see that under the <Document> tag (PDF’s equivalent of a <body> element), we have an <H1> and a <P>. Yes, PDF tags often, but not always, have the same name as their HTML counterparts. As Adobe says:
PDF tags are similar to tags used in HTML to make Web pages more accessible. The World Wide Web Consortium (W3C) did pioneering work with HTML tags to incorporate the document structure that was needed for accessibility as the HTML standard evolved.
However, the fact that the PDF now has structural tags doesn’t mean it’s accessible. Let’s try making a PDF with the PDF/UA profile:
$ prince prince1.html --pdf-profile="PDF/UA-1"
Prince aborts, giving the error “prince: error: PDF/UA-1 requires language specification”. This is because our HTML page is missing the lang attribute on the <html> element, which tells assistive technologies which language the text is written in. This is very important to screen reader users, for example; the pronunciation of the word “six” is very different in English and French.
Unfortunately, this is a very common error on the Web; WebAIM recently analysed the accessibility of the top 1,000,000 home pages and discovered that a whopping 97.8% of home pages had detectable accessibility failures. A missing language specification was the fifth most common error, affecting 33% of sites.
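Because the mistake is so common, a quick pre-flight check can catch it before Prince does. What follows is only a minimal sketch using Python’s standard html.parser module; the names LangFinder and document_language are my own inventions for illustration and have nothing to do with Prince itself.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Record the lang attribute of the first <html> start tag, if any."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def document_language(markup):
    """Return the declared language, or None if it is missing or empty."""
    finder = LangFinder()
    finder.feed(markup)
    finder.close()
    return finder.lang or None
```

Calling document_language on a page whose <html> element has no lang attribute returns None, which is the cue to fix the markup before attempting a PDF/UA-1 build.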
Let’s fix our web page by amending the HTML element to read <html lang=en>.
Now it princifies without errors. Inspecting it in Acrobat Pro, we see a new <Annot> tag has appeared. Right-clicking on it in the tag inspector reveals it to be the small Prince logo image (that all free licenses generate), with alternate text “This document was created with Prince, a great way of getting web content onto paper”:
Generating the <Annot> with alternate text and checking that the document’s language is specified are what allow us to produce a fully accessible PDF, which is why we generally advise using the --pdf-profile="PDF/UA-1" command line switch rather than --tagged-pdf.
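If you drive Prince from a script rather than typing the command each time, the choice between the two switches can be wrapped up in a small helper. This is a rough sketch, not an official wrapper: prince_command and convert are hypothetical names of mine, and the only Prince options used are the two switches shown above plus -o for naming the output file.

```python
import shutil
import subprocess


def prince_command(source, accessible=True, output=None):
    """Build the argument list for a Prince run.

    accessible=True selects the PDF/UA-1 profile; False only asks for tags.
    """
    args = ["prince", source]
    args.append("--pdf-profile=PDF/UA-1" if accessible else "--tagged-pdf")
    if output is not None:
        args += ["-o", output]  # -o names the output file
    return args


def convert(source, **options):
    """Run Prince if it is on PATH; return True when it exits successfully."""
    if shutil.which("prince") is None:
        return False  # Prince is not installed, or not on PATH
    return subprocess.run(prince_command(source, **options)).returncode == 0
```

Defaulting to the PDF/UA-1 profile means a missing language specification stops the build, instead of quietly producing a tagged but inaccessible PDF.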
Adobe maintains a list of Standard PDF tags, most of which can easily be mapped by Prince to HTML counterparts.
Customising Prince’s default mappings
Prince can’t always map HTML directly to PDF tags. This could be because there isn’t a direct counterpart in HTML, or it could be because the source markup has conflicting markup and styling.
Let’s look at the first scenario. HTML has a <main> element, which doesn’t have a one-to-one correspondence with a single PDF tag. On many sites, there is one article per document (a Wikipedia entry, for example), and it’s wrapped by a <main> element, or some other element serving to wrap the main content.
We can see from browser developer tools that this article’s content is wrapped with <div id=bodyContent>. We can tell Prince to map this to the PDF <Art> tag, defined as “Article element. A self-contained body of text considered to be a single narrative”, by adding a declaration in our stylesheet:
#bodyContent { prince-pdf-tag-type: Art; }
On another site, we might want to map the <main> element to <Art>. The same method applies:
main { prince-pdf-tag-type: Art; }
Different authors’ conventions over the years are one reason why Prince can’t necessarily map everything automatically (although, by default, HTML <article> gets mapped to <Art>).
Therefore, in this new build of PrinceXML, much of the mapping of HTML elements to PDF tags has been moved out of the logic of Prince and into the default stylesheet html.css in the style sub-folder. This makes it clearer how Prince maps HTML elements to PDF tags, and allows the author to override or customise it if necessary.
Here is the relevant section of the default mappings:
There are also two new properties, prince-alt-text and prince-expansion-text, which can be overridden to support the relevant ARIA attributes.
Uncle Håkon shouting at me last month in Paris
Taking our lead from Wikipedia again, we might want to produce a PDF table of contents from the ‘Contents’ box. Here is the Contents for the entry about otters (which are the best non-dinosaurs):
The box is an unordered list inside a <div id="toc">. To make this into a PDF Table of Contents (<TOC>), I add these lines to Prince’s html.css (because obviously I can’t touch the Wikipedia source files):
#toc ul {prince-pdf-tag-type: TOC;} /*Table of Contents */
#toc li {prince-pdf-tag-type: TOCI;} /* TOC item */
This produces the following tag structure:
On one of my personal sites, I use HTML <nav> as the wrapper for my internal navigation, so I would use these declarations instead:
nav ul {prince-pdf-tag-type: TOC;}
nav li {prince-pdf-tag-type: TOCI;}
Only internal links are appropriate for a PDF Table of Contents, which is why Prince can’t automatically map <nav> to <TOC> but makes it easy for you to do so, either by editing html.css directly, or by pulling in a supplementary stylesheet.
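If you would rather not edit html.css by hand, a supplementary stylesheet can be generated from a table of selectors and PDF tags. The sketch below is only one way to do it; tag_rules and write_stylesheet are hypothetical helper names, and the file name you choose is up to you. The resulting file can then be handed to Prince with its --style option.

```python
def tag_rules(mapping):
    """Render selector -> PDF tag pairs as prince-pdf-tag-type declarations."""
    return "\n".join(
        f"{selector} {{ prince-pdf-tag-type: {tag}; }}"
        for selector, tag in mapping.items()
    )


def write_stylesheet(path, mapping):
    """Write the generated rules to a supplementary stylesheet file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(tag_rules(mapping) + "\n")
```

For example, tag_rules({"nav ul": "TOC", "nav li": "TOCI"}) yields the same two declarations as above, one per line, so per-site conventions can live in data rather than in a patched copy of html.css.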
Mapping when semantics and styling conflict
There are a number of tricky questions about tagging when markup and style conflict. For example, consider this markup, which is used to “fake” a bulleted list visually:
But this merely looks like a bulleted list; structurally, it isn’t anything other than three meaningless <div>s. If you need this to be tagged in the output PDF as a list (so a screen reader user can use a keyboard shortcut to jump from list to list, for example), you can use these lines of CSS:
body>div {prince-pdf-tag-type: UL;}
div div {prince-pdf-tag-type: LI;}
Prince creates custom OL-L and UL-L tags which are role-mapped to PDF’s list structure tag <L>. Prince also sets the ListNumbering attribute when it can infer it.
Mapping ARIA roles
Often, developers supplement their HTML with ARIA roles. This can be particularly useful when retrofitting legacy markup to be accessible, especially when that markup contains few semantic elements. The usual example is adding role=button to a set of nested <div>s that are styled to look like a button.
Prince does not do anything special with ARIA roles, partly because, as WebAIM reports,
they are often used to override correct HTML semantics and thus present incorrect information or interactions to screen reader users
But by supplementing Prince’s html.css, an author can map elements with specific ARIA roles to PDF tags. For example, if your webpage has many <div role=article> elements, you can map these to PDF <Art> tags thus:
div[role="article"] {prince-pdf-tag-type: Art;}
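Before writing such rules for legacy markup, it helps to know which roles the page actually uses. Here is a small, assumption-laden sketch with Python’s standard html.parser: RoleCounter and role_selectors are made-up names, and it treats the whole role attribute as a single value, so space-separated fallback roles are not split.

```python
from collections import Counter
from html.parser import HTMLParser


class RoleCounter(HTMLParser):
    """Count (element, role) pairs for every start tag carrying a role."""

    def __init__(self):
        super().__init__()
        self.roles = Counter()

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if role:  # skip elements with no role, or an empty one
            self.roles[(tag, role)] += 1


def role_selectors(markup):
    """Map attribute selectors such as div[role="article"] to their counts."""
    counter = RoleCounter()
    counter.feed(markup)
    counter.close()
    return {
        f'{tag}[role="{role}"]': count
        for (tag, role), count in counter.roles.items()
    }
```

The keys of the returned dictionary are ready-made selectors; deciding which PDF tag each one deserves is still a job for a human.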
Conclusion
As with HTML, the more structured and semantic the markup is, the better the output will be. But of course, Prince cannot verify that alternate text is an accurate description of the function of an image. Ultimately, claiming that a document meets the PDF/UA-1 profile requires some human review, so Prince has to trust that the author has done their part in making the input intelligible. Using Prince, it’s very easy to turn long documents, even whole books, into accessible and attractive PDFs.
Link Of The Week!! The Accessibility of Styled Form Controls – “A repository of styled and “styled” form control elements and markup patterns, and how they are announced by screen readers” by Scott O’Hara.
How to design a voice experience – “Inspired by our work on the BBC Kids skill – a voice experience for three to seven year olds – we’ve come up with 12 design principles for voice.”
Uber’s Path of Destruction – “Silicon Valley investors amassed staggering riches while inflicting enormous damage on the rest of society (e.g., creating an uncontrollable surveillance apparatus, poisoning public discourse, exploiting massive anticompetitive power)”
This conference season I’ve spoken at some events for non-frontenders, suggesting that people invest time in learning the semantics of HTML. After all, there are only 120(ish) elements; the average two year old knows 100 words and by the time a child is three will have a vocabulary of over 300 words.
A few people asked me the difference between <article> and <section>. My reply: don’t worry. Simply, don’t use <section>. Its only use is in the HTML Document Outline Algorithm, which isn’t implemented anywhere, and seemingly never will be. For the same reason, don’t worry about the <hgroup> element.
But do use <article>, and not just for blog posts or news stories. It’s not just for articles in the news sense; it’s for discrete, self-contained things. Think “article of clothing”, not “magazine article”. So a list of videos should have each one (and its description) wrapped in an <article>. A list of products, similarly. Consider adding microdata from schema.org, as that will give you better search engine results and look better on Apple watches.
And, of course, do use <main>, <nav>, <header> and <footer>. It’s really useful for screen reader users – see my article The practical value of semantic HTML.