A Minimal HTML5 Document
There seems to be confusion about the minimal set of elements that make a valid HTML5 page.
(Amended on prompting from Tab Atkins and Mathias Bynens in comments below.)
The simplest valid document is
<!doctype html>
The title element is required in most situations, but when a higher-level protocol provides title information, e.g. in the Subject line of an e-mail when HTML is used as an e-mail authoring format, the title element can be omitted.
Assuming you’re writing a web page rather than an HTML email, you need the title element, although technically it can be blank.
<!doctype html>
<title></title>
However, you shouldn’t do that. Failing to specify a character encoding can introduce an obscure but real security vulnerability. So, the simplest valid and secure document looks like this:
<!doctype html>
<meta charset=utf-8>
<title>blah</title>
<p>I'm the content
(You don’t actually need the content, of course, but it’s a pretty rubbish web page without it, and an empty title isn’t much good.)
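To see why an undeclared encoding matters, here is a minimal Python sketch. It assumes the vulnerability is of the UTF-7 kind, where a consumer that guesses the wrong encoding turns harmless-looking ASCII into markup. The payload and variable names are illustrative only and are not taken from the post.

```python
# Illustrative payload (hypothetical): plain ASCII bytes with no angle brackets.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Read as UTF-8 (what <meta charset=utf-8> declares), the bytes stay inert text.
as_utf8 = payload.decode("utf-8")

# Read as UTF-7 (what a guessing consumer might pick), they become a script element.
as_utf7 = payload.decode("utf-7")

print("utf-8:", as_utf8)
print("utf-7:", as_utf7)
```

Declaring the charset removes the guesswork, so the same bytes can only be read one way.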
However, for accessibility reasons, you should declare the natural language of the document (English/French/Swahili) on the html element, which therefore means you need that element (note that you don’t need to close it, though):
<!doctype html>
<html lang=en>
<meta charset=utf-8>
<title>blah</title>
<p>I'm the content
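The unclosed tags above work because HTML parsing rules define where omitted end tags are implied. As a rough illustration only, the Python sketch below uses the standard library's html.parser, which reports tags as written and is not a spec-compliant HTML5 tree builder. It applies one simplified rule by hand: a new p start tag, or the end of input, closes an open paragraph. The names ParagraphCollector and paragraphs are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect paragraph text, treating a missing </p> as implied (simplified)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph texts
        self._current = None   # text of the open paragraph, or None

    def _close_paragraph(self):
        if self._current is not None:
            self.paragraphs.append(self._current.strip())
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close_paragraph()  # a new <p> implies </p> for the open one
            self._current = ""

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_paragraph()

    def handle_data(self, data):
        if self._current is not None:
            self._current += data

    def close(self):
        super().close()
        self._close_paragraph()      # end of input also closes the paragraph


def paragraphs(markup):
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()
    return collector.paragraphs


terse = ("<!doctype html><html lang=en><meta charset=utf-8>"
         "<title>blah</title><p>I'm the content")
verbose = ("<!doctype html><html lang=en><head><meta charset=utf-8>"
           "<title>blah</title></head><body><p>I'm the content</p></body></html>")

# Both spellings yield the same single paragraph under this simplified rule.
print(paragraphs(terse) == paragraphs(verbose))
```

A real browser applies many more such rules (implied head and body, for instance), which this sketch deliberately ignores.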
If you’re planning to use AppCache to enable offline applications, you’ll need the html element as the manifest attribute goes there.
Internet Explorer 9 Developer Preview 3 and its antecedents can’t apply CSS to new HTML5 elements without a body element. (Try it without body and with body.)
So if you’re attempting to do that, the smallest valid, secure, screenreader-accessible and stylable-in-IE HTML5 page you can have is
<!doctype html>
<html lang=en>
<meta charset=utf-8>
<title>blah</title>
<body>
<p>I'm the content
Just because you can do this doesn’t mean you should, of course. Depending on your colleagues, it could be confusing and thus a maintainability nightmare.
I use the head element, and close those tags that need closing (although I don’t bother with trailing slashes on self-closing elements).
So the minimal valid, secure, screenreader-accessible and stylable-in-IE HTML5 page (not email) that is easily readable and maintainable (subjective, of course) is probably
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>blah</title>
</head>
<body>
<p>I'm the content</p>
</body>
</html>
Enjoy.
(PS, I co-wrote a book!)
38 Responses to “A Minimal HTML5 Document”
I guess I’m old-old-skool then (from back before tags started being self-closed in XHTML).
I don’t think it matters either way, but I like to leave off the " /" at the end as it means less code to look and scroll through, plus I think it looks neater.
Good mention about the tag though, what’s up with that? Is it really not needed anymore??
what happens if you don’t specify the language? does it default to Browser language or some such? a means of dropping to OS or Browser lang might be preferable in some cases. an incorrect lang is much much worse than none at all. many’s the time that my screenreader switches to reading english in a french, german, or other accent because it believes what it’s reading is the language specified in the page rather than that which it actually is. of course if Devs didn’t allow incorrect languages to be set it would never happen but they do, so it does.
This is such a step back we might as well hunch over and grunt again. The whole concept of a HTML document as a parseable document regardless of software that checks it is out of the window that way again.
How would this continue? Do I have to go through the file byte by byte to find the next opening tag bracket?
In terms of maintainability this is just moronic.
The <meta charset> isn’t required for a valid document. If you’re transmitting the charset of the page via your response headers, or just sticking with pure ascii-range characters, there’s no need for it.
The actual bare-bones minimal valid page is
<!DOCTYPE html><title></title>. 30 bytes.
Ha ha! Now my first comment makes me look like a git 🙂
I got the intent, and I didn’t mean to say you’d missed something, but it might be good to link to (or show) a *good* base template. Your Google-juice is powerful (young Jedi), I can see many people looking for a starting HTML5 template landing here.
@Christian:
this is nothing new. I was playing with minimal valid HTML4.01 documents years ago, and that was mostly the same.
And there is nothing moronic in it.
@christian
but the parsing algorithm for how to unambiguously interpret the markup into the same DOM across all browsers is documented in painfully dull detail (as opposed to what happened previously, which was the source of incompatibilities with non-validating markup giving funky results in different browsers). so any software that wants to digest html5 can do so by implementing the algorithm and be guaranteed a sane result, no?
This is tag soup rubbish from the last century all over again. Moronic is an understatement.
There seem to be two issues here:
– What should be the minimum for a valid document, and
– Whether closing tags should be required.
The post was about the former, but we’re reacting to the latter. (Sorry Bruce)
Although I would assume Patrick is correct (the algorithm is defined), it definitely feels like a step back, as it allows for less ordered (orderly?) markup.
It’s fair enough the spec defines how things work now, but shouldn’t we be promoting well-formed markup? Then in future the bar to create a browser isn’t quite so high.
Also, it’s not just browsers; other things consume HTML (e.g. YQL, WYSIWYG editors, assistive technology), and allowing for ill-formed markup makes their job harder.
> “shouldn’t we be promoting well-formed markup”
No, speed is far more important for most websites.
Speed! That is the point of well formed markup, so browsers can reliably render an incoming stream in a single pass using an XML compliant engine. Ill-formed markup leads to multi-pass quirks mode. That was the whole point of the XHTML “fad” a decade ago and now you want to go back to the rubbish that existed before that!
Actually, the shortest valid HTML5 document is 15 bytes:
<!doctype html>
However, this is only valid in the case of an HTML e-mail or similar.
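The 15-byte and 30-byte figures quoted in these comments are easy to check. A small Python sketch, assuming the documents are served as ASCII with no trailing newline (the variable names are mine):

```python
# Documents quoted in the comments, encoded as ASCII to count bytes.
doctype_only = "<!doctype html>"
doctype_and_title = "<!DOCTYPE html><title></title>"

sizes = {doc: len(doc.encode("ascii")) for doc in (doctype_only, doctype_and_title)}

for doc, size in sizes.items():
    print(f"{size:>2} bytes: {doc}")
```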
@alastairc
“shouldn’t we be promoting well-formed markup?”
on what grounds, though? is there any advantage, other than readability of the source?
“Then in future the bar to create a browser isn’t quite so high.”
the bar is leveled and low for anybody wishing to create their own browser now, as – compared to the partially grey and wooly areas of html 4 parsing and particularly error correction – the algorithm is clearly defined, fully documented, and freely available without the need to reverse-engineer how some browsers cope with things (and even available, from what i remember, as actual running code examples etc in different languages).
Patrick, are you really saying that it’s as easy to parse (and build a parser for) HTML which doesn’t close tags? (Or put attributes in quotes, as a related aspect.)
It isn’t just browsers, the advantages (apart from readability of source, which is a valid one), are that it is easier for other parsers as well. As I said above, YQL, editors, assistive tech etc.
Isn’t that why the microformat guys insisted on XHTML?
There is a big difference between something that parses HTML for content, and something that renders it. The level of complexity of a parser is (and should be) much less than a browser, it doesn’t have to worry about CSS/JS for starters.
Browsers have to render dodgy content, because it’s reporting directly to a user, but a parser shouldn’t have to know such complex rules. What you’re talking about is essentially like using regex on HTML.
Christian might be able to speak to YQL, but WYSIWYG editors tend to convert the different forms of tag soup (from each browser that does contentEditable) to XHTML. The server-side microformat processors I’ve used rely on a valid (XHTML) DOM.
Parsing content out of the browser context is hard, things like beautifulsoup.py exist because it’s so hard to get right.
Logically, a well-formed DOM should be easier to read than tag soup, in the same way that this JavaScript is ambiguous:
if ( a === 1 )
    b = 2;
    c = 3;
(I hope that comes out ok!)
I had thought the HTML group was creating this compendium of current behaviours (the spec) because it was necessary. I did not think that authors should be encouraged to use poorly formed markup.
Well specified tag soup is still tag soup.
Letter to some people
If you like to get angry about this completely correct and accessible document, then an obsessive–compulsive disorder might also be fun for you.
Eat shi… XHTML!
And by the way: I really love it when it’s not announced what will happen to my input.
<!DOCTYPE html>
<html lang=en>
<meta http-equiv=content-type content="text/html;charset=utf-8">
<title>Letter to some people</title>
<p>If you like to get angry about this completely correct and accessible document, then an obsessive–compulsive disorder might also be fun for you.
<p>Eat shi… XHTML!
@markc: The “whole point” of XHTML was to be “ready for the future”, for wildly improbable values of “future”.
Parser speed is not a performance bottleneck. Download speed is.
Oh, well if we’re going that far and allowing special-purpose documents like emails in this competition, then the shortest HTML5 document is “” (0 bytes), which is a valid input to @srcdoc.
I’d like to see anyone produce a valid negative-length HTML5 document!
Also, response to markc, comment 14:
> “Speed! That is the point of well formed markup, so browsers can reliably render an incoming stream in a single pass using an XML compliant engine. Ill-formed markup leads to multi-pass quirks mode. That was the whole point of the XHTML “fad” a decade ago and now you want to go back to the rubbish that existed before that!”
I can assure you that parsing XML is, in fact, generally slower than parsing HTML. “XML is faster” is a persistent myth. Just ask any browser dev who’s worked on their browser’s parsing engine.
Bruce, I heard that you are able to have a valid website without using . I know no wouldn’t for a huge website. But for a simple blog is the really necessary?
Oops. I’m talking about the body element.
“is there any advantage [to well-formed markup] other than readability of the source?”
Given that if you do this for a living, you’ll always spend significantly more time maintaining code than writing it, isn’t that enough?
Bruce, I would love to use @charset, but Lynx doesn’t understand it. And accessibility is my personal obsessive–compulsive disorder. (But I always throw the first stone.)
And I am sorry for being so rude about the missing info about your transforms on replies – maybe I have too few other things to care about. But I really like it now.
Interesting discussion. To me the most valid point is readability and maintainability of my code. If I rely on a browser engine to make something useful out of this then that doesn’t sound safe to me – I have been fooled by them far too often.
How does this “document” look in a text editor with colour coding? Can I collapse parts of it when I don’t want to be distracted by them? I edit HTML code – if speed is a real concern I write a build script that concatenates, minifies and changes the code to live code. If people really think that a few closing P tags would make their page slower they have not understood gzip on the server.
We’re developers and should be allowed to write and maintain code that is predictable and follows a clean convention. The last example above is totally fine by me and this is how I write my HTML5 except that I put ” around the attributes as that helps my colour coding, too.
The whole argument that “less code is better” leads to unmaintainable code. If you want to do speedcoding, enter a 64k intro contest in the demo scene – don’t expect future maintainers to be as excited as you are about the things you do as they will not read up why your code is so short and just add random stuff at the end of it. Want proof of that? Compare any CSS document after it went through a few rounds of maintenance.
@bruce for any browser needs there is the evolt archive: http://browsers.evolt.org/download.php?/lynx/2.81/32bit/lynx2-8-1.zip
FWIW, I’ve compiled a list of the smallest possible (X)HTML documents for each and every HTML and XHTML version.
From your article:
“””The title element is required in most situations, but when a higher-level protocol provides title information, e.g. in the Subject line of an e-mail when HTML is used as an e-mail authoring format, the title element can be omitted.
Assuming you’re writing a web page rather than an HTML email, you need the title element, although technically it can be blank.”””
That is odd, if you look at the WHATWG specs for the title element itself it says something different:
If it’s reasonable for the Document to have no title, then the title element is probably not required. See the head element’s content model for a description of when the element is required.
Okay that’s just a little bit ambiguous, and in fact I misread it at first to mean the title element is pretty much optional as long as you can make a reasonable argument for the document to have no (need for a) title. But the description in the head element’s content model (that you linked the changelog of) is pretty strict about when it is required or not.
When in doubt, the stricter option is probably the way to go, but it could have been worded more clearly IMO. However the next bit flat-out contradicts you, also from the WHATWG title element section:
The title element must not be empty.
So in your example, your title element actually isn’t empty, but for the wrong reasons 🙂 It’s actually not allowed to be empty.
Sorry if this is nit-picking, but that’s kind of the name of the game when web standards are concerned, heh 😉
Call me old-skool, but I’ll still use closing tags, and probably the head tag unless that’s deprecated.