The three semantics of HTML

by Tomasz Jakut published on

There is always that one elephant in the room alongside HTML – semantics. You can ignore it for quite a while, but at the end of the day, you'll need to acknowledge that it's there, cluttering half of the space.

Our beloved elephant

So let's examine this exotic beast! According to the Longman Dictionary of Contemporary English, semantics is the meaning of a word or expression. In our case, a word is an HTML element. And how can we check what the meaning of a word is? Well, we've already done that – we've checked the semantics of the semantics (inception!). Fortunately, there is also a dictionary for HTML – it's called the HTML Living Standard. It even says it contains meanings:

Elements, attributes, and attribute values in HTML are defined (by this specification) to have certain meanings (semantics).

Detailed descriptions of all HTML elements are in chapter 4, called The elements of HTML.

Note: That's why I'm not keen on the semantic elements concept. For me, they are all semantic – even <div> – since their meanings are described in the specification.

But what if I told you there are actually three elephants in the room?

First semantics: for users

This is the most straightforward of them all, and the most basic role of semantics: to provide the end user with all the information necessary to correctly understand the content.

This type of semantics can be compared to formatting a text document in Word. You can always select some text and make it bigger and bolder, pretending it's a heading. But then you won't be able to generate a Table of Contents for your 100+ page document. The same is true for HTML – except it's the user who will suffer. Let's look at an example – my short bio:

<div>
  <div>Comandeer</div>
  <img src="comandeer.avif" alt="">
  <div>Proud HTML Expert</div>
</div>

It's still "semantic HTML" (because every HTML element has a meaning!), yet it's not especially useful. And yes, we can style it in any imaginable way (I bet we can even make it do the can-can with a bit of CSS magic) – but it's still just a bunch of <div>s without any intrinsic meaning:

The div element has no special meaning at all.

Just like this big, bold text in Word – if it's not a heading, it's not a heading.

Let's focus on raw HTML, forgetting about CSS. If we load the above code in the browser, it will look like this:

"Comandeer" is followed by a logo and "Proud HTML Expert" – all text is unstyled.

It's plain text. And in terms of HTML semantics, that's the truth – we used all these HTML elements to mark up plain text. Good job!

Fortunately, it can be easily fixed:

<article>
  <h1>Comandeer</h1>
  <img src="comandeer.avif" alt="A photo of a smiling Comandeer.">
  <p>Proud HTML Expert</p>
</article>
  • The wrapper <div> element was replaced with the <article> one – a bio can be treated as a complete, or self-contained, composition in a document.
  • The <div> element with my name was replaced with the <h1> element to indicate the purpose of the <article>.
  • The image got a descriptive alt attribute, describing its content.
  • The last <div> was replaced with the <p> element – after all, it's a paragraph of text.

If we open the document in the browser, it looks better:

"Comandeer" is now bigger and bolder, indicating that it is a heading, and "Proud HTML Expert" has a margin around it.

In other words, we formatted our text!

Note: Each HTML element has its own default styles, used to display it when there are no stylesheets provided by the website owner. This is the purest achievable form of HTML in a browser.

But it isn't particularly useful on its own. After all, there's always some CSS to make things prettier – why should we care about choosing appropriate HTML elements? Because the Web should not be only visual. One of the main principles of the Web Content Accessibility Guidelines (WCAG) is Perceivable:

Information and user interface components must be presentable to users in ways they can perceive.

In other words, if someone can't perceive visual information, it must be available for them in another form. A bunch of styled <div>s do not convey any meaning without their visuals. And that's problematic for assistive technology, like screen readers. Screen readers use an accessibility tree – a DOM-like structure, built from the page's HTML and containing information about roles and purposes of all HTML elements. Let's see what that tree looks like for the <div> version of the page:

Both "Comandeer" and "Proud HTML Expert" are marked up as static text.

It's just text. More importantly, the image is missing because of its empty alt attribute. The <article> version is much better:

The whole bio is wrapped in the article, "Comandeer" is correctly identified as a heading, the image has a description of "A photo of a smiling Comandeer." and "Proud HTML Expert" is marked up as a paragraph.

Thanks to that, the screen reader can successfully transform the page content into voice or braille and inform the user where the articles, headings, and other elements are. This not only allows the user to fully understand the page (much as a sighted user perceives the visual version) but also lets the screen reader provide better ways of navigating the page, e.g., by jumping between headings.
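The difference between the two trees can be annotated directly on the markup. The comments below are only a rough sketch of the implicit roles these elements map to; the exact labels differ between browsers and inspection tools, so treat them as an approximation rather than the output of any particular browser.

```html
<!-- The <div> version: nothing for assistive technology to hold on to -->
<div>                                <!-- generic container, no meaning -->
  <div>Comandeer</div>               <!-- plain text -->
  <img src="comandeer.avif" alt="">  <!-- empty alt: treated as decorative and left out -->
  <div>Proud HTML Expert</div>       <!-- plain text -->
</div>

<!-- The <article> version: every element carries a role -->
<article>                            <!-- role: article -->
  <h1>Comandeer</h1>                 <!-- role: heading -->
  <!-- role: img, with the accessible name taken from the alt attribute -->
  <img src="comandeer.avif" alt="A photo of a smiling Comandeer.">
  <p>Proud HTML Expert</p>           <!-- role: paragraph -->
</article>
```

None of these roles had to be written by hand: they come for free with the right element.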

In other words, caring about semantics is caring for the user!

Second semantics: for user agents

Semantics can also be used to provide additional information for user agents. I deliberately use this term in quite a broad sense, to mean any machine or software acting on behalf of its user. The most obvious example of a user agent is a browser (which is the term's typical meaning), but I would also extend the name to search engines. With the long history of SEO (Search Engine Optimization), it's no secret that search engines can and do understand HTML – just think of the everlasting discussions about the importance of heading elements. Browsers can also use semantics to provide affordances for users, like using the <article> element to extract content in a reader mode. But there are also other ways to provide yet another layer of semantics just for machines.

The most ambitious take on semantics for user agents is Web 3.0. And no, I don't mean Web3, based on blockchain. Long before that, back in the 2000s, the W3C imagined a new type of web, which they called the Web of Data and Services, aka the Semantic Web. The idea itself was quite simple yet ingenious: the Web is full of data, but there was no way to enable effective communication between different websites and apps. Back then, machines were also far worse at discovering and understanding data. So they needed us, humans, to mark up data for them.

The most basic way of marking stuff up is using HTML. But several additional technologies were developed to allow embedding additional metadata into websites. One of the first was the Resource Description Framework (RDF). It's a huge standard, divided into several specifications. Yet it has one pretty serious downside: it's not built on top of HTML, as it has its own syntaxes (yes, plural). That's why RDFa was developed. It's a "lighter" version of RDF and can be used inside HTML via attributes:

<article vocab="https://schema.org/" typeof="Person">
  <h1 property="name">Comandeer</h1>
  <img src="comandeer.avif" property="image" alt="A photo of a smiling Comandeer.">
  <p>Proud <span property="jobTitle">HTML Expert</span></p>
</article>

The vocab attribute points to the vocabulary we want to use and the typeof one – a type of data we try to mark up. In our example, we want to spice up my bio with some additional metadata. Due to that, we decided to use the Person type (as I'm personally a person). Each property of, well, me is marked up with the property attribute, e.g., the heading with the name now has the property="name" attribute.

But of course, there is always more than one way to do something. HTML has its own way of marking up additional metadata, so-called microdata:

<article itemscope itemtype="https://schema.org/Person">
  <h1 itemprop="name">Comandeer</h1>
  <img src="comandeer.avif" itemprop="image" alt="A photo of a smiling Comandeer.">
  <p>Proud <span itemprop="jobTitle">HTML Expert</span></p>
</article>

It's pretty similar to RDFa. Instead of the vocab + typeof attributes, it uses the itemscope + itemtype ones, and the property attribute is replaced with the itemprop one.

And if we want something more like RDF (so a separate thing from HTML), there is always JSON-LD – a way to express metadata in JSON. I also need to make one honorable mention – microformats. They are the OGs of metadata in HTML. They use the class attribute to provide additional info about the content:

<article class="h-card">
  <h1 class="p-name">Comandeer</h1>
  <img src="comandeer.avif" class="u-photo" alt="A photo of a smiling Comandeer.">
  <p>Proud <span class="p-job-title">HTML Expert</span></p>
</article>

The h-card microformat (the successor to the classic hCard) is used for marking up information about people. What I love about microformats is the fact that they are Just HTML™ without any fancy attributes.
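For comparison, the JSON-LD approach mentioned above keeps the metadata out of the markup entirely. Here is a minimal sketch, assuming the same Schema.org Person properties (name, image, jobTitle) as in the RDFa and microdata examples:

```html
<article>
  <h1>Comandeer</h1>
  <img src="comandeer.avif" alt="A photo of a smiling Comandeer.">
  <p>Proud HTML Expert</p>
</article>

<!-- The same Person data, expressed as a separate JSON-LD block
     instead of attributes on the visible elements. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Comandeer",
  "image": "comandeer.avif",
  "jobTitle": "HTML Expert"
}
</script>
```

The trade-off is that the data now lives in two places, so the visible content and the metadata can drift apart if only one of them is updated.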

Note: Personally, I'm a fan of the RDFa format, so I'll stick to it for the rest of the article.

But what are vocabularies? In short, they are catalogs of things that can be described, like people, events, books, and places. Each of these things has its own unique set of properties (people have names, places have addresses, events have dates…). Things can also be mixed together (events are organized by people, etc.). Thanks to that, even more complex pieces of reality can be described in a manner understandable by machines.
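Mixing things together could look roughly like this in RDFa. The event itself (its name and date) is made up for illustration; Event, Person, name, startDate, and organizer are Schema.org terms. Putting property and typeof on the same element makes the nested Person the value of the organizer property.

```html
<!-- A hypothetical event, used only to show nesting one thing inside another -->
<article vocab="https://schema.org/" typeof="Event">
  <h1 property="name">HTML Semantics Meetup</h1>
  <p>
    Starts on
    <time property="startDate" datetime="2030-01-15T18:00">15 January 2030, 6 PM</time>.
  </p>
  <!-- property + typeof together: the organizer is itself a Person -->
  <p property="organizer" typeof="Person">
    Organized by <span property="name">Comandeer</span>.
  </p>
</article>
```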

In our example, the Schema.org vocabulary is used. Schema.org is a community effort, founded by Google, Microsoft, Yahoo, and Yandex, with a goal to create an open standard for structured data on the Web. This data is then used by search engines to provide additional information about the websites' content. Google provides a gallery of how it uses the structured data to prepare more detailed search results.

The only problem with Web 3.0 is that it never happened. Granted, we have several great technologies to embed additional metadata. But at the end of the day, they're used mainly by search engines. The ambitious vision of the Web where machines talk to each other and exchange data to provide better services to their users is still just a vision.

Third semantics: for web developers

And finally, we can leave the vast land of user-centered semantics and enter the cozy territory of developer-centered semantics. Because easily understandable HTML is also easily maintainable HTML. And if we use HTML elements as intended (like creating lists with <ul> and <li> elements instead of <div>s), we're already halfway there!
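A small sketch of that difference, with made-up content and a hypothetical skills class name:

```html
<!-- Nothing here says "this is a list"; the reader has to infer it from the class name -->
<div class="skills">
  <div>HTML</div>
  <div>CSS</div>
  <div>Accessibility</div>
</div>

<!-- The elements themselves state the structure -->
<ul class="skills">
  <li>HTML</li>
  <li>CSS</li>
  <li>Accessibility</li>
</ul>
```

The second version is easier to scan in the source, and it also serves the first two semantics: screen readers can announce the list and its items.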

But sometimes additional info is needed, e.g., a page contains several <article> elements and we want to easily distinguish between them. In that case, we can go back to webdev roots and use the class attribute.

<article vocab="https://schema.org/" typeof="Person" class="bio">
  <h1 property="name" class="bio__name">Comandeer</h1>
  <img src="comandeer.avif" property="image" alt="A photo of a smiling Comandeer." class="bio__image">
  <p class="bio__description">Proud <span property="jobTitle" class="bio__job-title">HTML Expert</span></p>
</article>

I'm using a BEM-like naming convention here to indicate that this particular <article> is a bio. Each of the elements inside the <article> also has its own class name, indicating its purpose.

Note: It can be argued that most of these classes are redundant, repeating the information already provided by RDFa. Yet I tend to treat BEM not as a naming convention but as a DSL (Domain Specific Language) for creating component-based applications. However, that's a story for another time.

There are also other ways to make HTML more developer-friendly. One of the better ones (at least in my opinion) is to use custom elements to replace plain old <div>s:

<div class="card"></div>
<!-- vs -->
<the-card></the-card>

We would also need to add a little bit of CSS:

:not(:defined) {
  display: block;
}

It ensures that custom elements are displayed as block elements (that is, like <div>s) because by default, they’re displayed inline (like <span>s).
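One caveat is worth sketching. The :not(:defined) rule only matches custom elements that have not been registered with JavaScript. Assuming <the-card> is later given behavior through customElements.define() (the empty class below is purely a placeholder), the element starts matching :defined and falls back to inline display, so an explicit rule for it is the safer choice:

```html
<style>
  /* Covers custom elements that are never registered with JavaScript */
  :not(:defined) {
    display: block;
  }

  /* Still applies after registration, when the element matches :defined */
  the-card {
    display: block;
  }
</style>

<the-card>Comandeer, Proud HTML Expert</the-card>

<script>
  // Placeholder registration: from this point on, <the-card> is :defined.
  customElements.define('the-card', class extends HTMLElement {});
</script>
```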

Fourth semantics?

Yet another elephant looms on the horizon… There is no doubt that we live in the AI era. LLMs (Large Language Models) are everywhere around us, and they are yet another kind of machine that can understand HTML. And hey, they are language models, so our technology has finally advanced enough to fully grasp the beauty of Web 3.0!

Except it doesn't seem to be the case. Instead of using what we already have, new standards emerge [insert XKCD strip about standards here]. The most popular of them is the /llms.txt file. It's a Markdown file containing detailed instructions for LLMs on how to handle the website and where particular info is located.

It’s quite ironic that after years of striving toward a semantic Web, we finally have the technology capable of understanding it – and it still doesn’t work. Oh, well, maybe in another 20 years…

However, there is a glimpse of change! OpenAI recently released their AI browser, Atlas. Supposedly, it understands ARIA. In theory, the more accessible a site is, the more suitable it is for ChatGPT. In practice, it could make the Web far worse. ARIA should be the last resort when native HTML is not enough to express a complex UI pattern. Adding it solely to impress a machine sounds disturbingly similar to the ancient arcane art of stuffing web pages with invisible links to fool crawlers. On the other hand, Atlas's reliance on ARIA could be a visible symptom of The Great Disease of the Web: LLMs are powerful enough to understand HTML semantics, but there is no semantics to be understood…
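To illustrate the "last resort" point with a minimal, made-up example: ARIA can change how an element is announced, but it doesn't add any behavior, while the native element brings both.

```html
<!-- ARIA bolted onto a generic element: announced as a button and focusable,
     but activation with Enter or Space still has to be scripted by hand -->
<div role="button" tabindex="0">Save</div>

<!-- Native element: role, focusability, and keyboard activation are built in -->
<button type="button">Save</button>
```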

Managing the menagerie

Let me ask you a fundamental question: do you pay attention to HTML semantics?

If the answer is "yes", then good job, your elephants are happy and thriving 🐘!

However, if the answer is "no"… Don't make them sad, find a little bit of time and enhance your HTML. Throw away extraneous <div>s, add a couple of headings, embed some metadata! There is no need to do everything at once; do it step by step. Every little improvement matters and can make someone's day a little better. A screen reader user will be grateful for that additional heading, your fellow web developer will thank you for that extra class, and a crawler… Well, it won't feel anything, but hey, it will understand your content better!

If you do not know how to start, I suggest removing all distractions – including CSS – and focusing on the raw content. Copy it to your favourite word processor and think about the role of each element and its relationship with other elements. And then format all of them according to your findings. That's your content's semantics, waiting to be ported back to HTML!

Do it, for your users, for your fellow web developers (not necessarily for crawlers…), but most importantly – for the elephants!

About Tomasz Jakut

Tomasz is a senior software engineer with 10+ years of experience in accessibility and web standards. As a W3C Invited Expert, he contributed to text editing standards for browsers. In his spare time, he likes to read the HTML specification.

Blog (in Polish): blog.comandeer.pl
GitHub: Comandeer
