Front-end for the Middle: Semantics and Schemas

This post continues an earlier one, Front-end for the Middle: Focus and Design. There, we established that a front-end developer is better off focusing on content than on design. In this post, we’re going to discuss how HTML semantics can influence what happens in the CMS. If your CMS of choice is Tridion, then the operative term is schema.

Earlier versions of HTML were semantic, but not as semantic as HTML can be today. When a front-end developer understands the content, he can choose the markup that describes it.

What does semantic mean?

of, relating to, or arising from the different meanings of words or other symbols…

Okay, neat. So how does that impact front end?

Look at the source code of the above quotation. You’ll see that it’s marked up using the <blockquote> element. Doing so means that I am quoting someone, or something. Visually, I can make anything look italicized and indented, but by marking up that quotation in a <blockquote>, I am telling the browser, a search engine, and anyone relying on an assistive device (like a screen reader), “hey, this text isn’t like the other text, because Frank didn’t write it.” Not only that, I used a semantic element called <cite> to wrap my link, so that anyone looking at my quote knows that the link has a special relationship to that quote.
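As a rough sketch of that pattern (not the exact markup of this post), a quotation and its source could be marked up something like this. The URL and the link text are placeholders invented for illustration.

```html
<!-- A quotation and its source. The URL and title below are placeholders. -->
<blockquote cite="https://example.com/semantic-definition">
  <p>of, relating to, or arising from the different meanings of words or other symbols…</p>
</blockquote>
<!-- <cite> wraps the title of the quoted work, here as a link -->
<p><cite><a href="https://example.com/semantic-definition">Title of the quoted source</a></cite></p>
```

None of this says anything about italics or indentation. That part is left to the stylesheet.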

Consider that a front-end developer who is trying to do his job well wants to describe the content, and he cannot do that when all he is given is a picture of how the content must be decorated.

What does semantic content look like?

There are quite a few semantic elements available, and a front-end developer should want to use each in an appropriate and relevant way. A particular piece of content that has been edited by a content author could use any of the following semantic HTML elements:

<em>
Text that is being emphasized
<strong>
Text that’s important, urgent, or serious
<small>
Small print: disclaimers, caveats, legal restrictions, copyrights, license attributions
<q>
Inline quotations
<dfn>
A term that is about to be defined. In Tridion, this could apply to a given field.
<time>
Time, regardless of the format that it’s in
<samp>
A sample output from a program or computing system
<kbd>
User input! It could be from a keyboard, or it could be any other kind of command, such as a voice command
<var>
An instance of a variable, or some programming argument
<p>
An oldie, but a goodie. A paragraph of text
<ul>, <ol>
Unordered and ordered lists
<dl>
A definition list. Like, as in, the thing you’re reading right now
<dt>, <dd>
A definition term, and a definition description. The former is the word that is defined by the latter

Oh, keep in mind, most everything in the aforementioned list was there in HTML4, too.
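To make a few of those concrete, here is a small sketch of what one authored paragraph might look like once the phrasing elements are applied. All of the wording in it is invented for illustration.

```html
<!-- Hypothetical content; every phrase here is made up -->
<p>
  A <dfn>schema</dfn> describes the fields that a content author fills in.
  You <em>really</em> should review yours, and
  <strong>never publish without a preview</strong>.
  As one editor put it, <q>structure first, decoration second</q>.
</p>
<!-- Small print lives in <small>, not in a smaller font size -->
<p><small>Example text only. No rights reserved.</small></p>
```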

Semantics and Schema Fields

Now, a few of those elements are a bit odd. Things like <samp> and <var> are definitely elements that a developer would use. But what about <kbd>? Wouldn’t a content author, from time to time, try to write instructions for someone on how to do something? If we wanted to create a schema where the content author could create instructions, and those instructions were wrapped in <kbd>, and the results were wrapped in <samp>, we could create a pretty well-defined schema that allowed the content author to dictate input, result, and next step. All within an ordered list.
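A minimal sketch of the output such an instructions schema could produce, assuming each step has an “input” field and a “result” field. The field names, commands, and messages are all hypothetical.

```html
<!-- Each <li> is one step: the input field goes in <kbd>, the result field in <samp> -->
<ol>
  <li>
    Type <kbd>help</kbd> and press <kbd>Enter</kbd>.
    The screen shows <samp>Available commands: open, save, quit</samp>.
  </li>
  <li>
    Type <kbd>save</kbd> and press <kbd>Enter</kbd>.
    The screen shows <samp>File saved</samp>.
  </li>
</ol>
```

The content author never has to know that <kbd> or <samp> exist. The template applies them because the schema already says which field is which.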

Think of how many times, in a content management system, you’ve had to implement a glossary of some sort. Did you use a definition list? If the front-end developer has this idea that some set of content is a definition list, then perhaps we need a definition-list schema, which contains links to definition components. And maybe each definition component has two fields: definition term, and definition description.
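Here is roughly what that could render to, assuming one definition-list component that links to two definition components. The terms and descriptions are placeholder content.

```html
<!-- The definition-list component renders the <dl> -->
<dl>
  <!-- First linked definition component: term field, then description field -->
  <dt>Schema</dt>
  <dd>The set of fields that a content author fills in.</dd>
  <!-- Second linked definition component -->
  <dt>Component</dt>
  <dd>A piece of content that is based on a schema.</dd>
</dl>
```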

How many times does a content author need to write content where he needs to provide a definition of a thing? Doesn’t that merit the use of <dfn> so that he can tell the reader, this is a term that I’m defining?

How many times have we published something that needed to show a time or a date? That certainly merits the use of the <time> element. If something is a date, and it needs the <time> element, then maybe this should be a distinct field in Tridion so that you can guarantee it gets a proper semantic wrapper.
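As a sketch, a dedicated date field lets the template write a machine-readable value into the datetime attribute while the visible text stays in whatever format the site prefers. The dates and times below are made up.

```html
<!-- datetime holds the machine-readable value; the text is for humans -->
<p>Published on <time datetime="2013-06-01">June 1, 2013</time></p>
<!-- A time of day works the same way -->
<p>Doors open at <time datetime="18:30">half past six</time></p>
```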

A front-end developer could make choices in their semantics that could drive decisions in how certain fields are formatted in your CMS. Not only that, good semantic markup could help you define how schemas should be structured. So don’t look at the aforementioned list and think, “man, the RTF editor doesn’t support those!” Instead, look at it and think, “how can I make discrete schemas that can support the right semantics?”

Semantics and Embedded Schemas

Keeping in mind that HTML5 didn’t introduce semantics, but improved them, it is important to zero in on a few of the sectioning elements, both new and old:

<aside>
Content that is tangentially related to the content in its parent section.
<header>
Introductory content for a section. This could be banners, at the top of 2nd tier pages. In Tridion, this could be an embedded schema for something that equated to an article, or a standalone item that could be reused on multiple pages, or the actual header for the entire site.
<footer>
Closing information for its parent section. Again, in Tridion this could be an embedded schema for something equivalent to an article, a standalone item on multiple pages, or the actual footer for the entire site.
<address>
Contact Information! More specifically, contact information for the nearest sectioning ancestor. If an author wrote an <article>, his email address will go in here. Or, if your website has problems, and you still have an “email the webmaster” link, then this element is for you.
<figure> and <figcaption>
Self-contained content that improves the understanding of the sectioning container that it’s in.

When you’re looking at the <aside>, it’s the tangential relationship that makes it interesting. If a component has an optional callout, or a link to another product that is only loosely related, that ‘related content’ is what goes in your <aside>. A lot of your cross-promoted content, or content that came from Smart Target, could be in an <aside>.
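A minimal sketch of a product component with an optional callout, assuming the callout is its own field or linked component. The product names and the URL are placeholders.

```html
<article>
  <h1>Product name</h1>
  <p>Main product description…</p>
  <!-- Optional callout field: related to the article, but not part of its main content -->
  <aside>
    <h2>You might also like</h2>
    <p><a href="/products/another-product">Another product</a></p>
  </aside>
</article>
```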

It’s entirely possible that <header> could be an embedded schema, with a field for the <h1>, another for the <h2>, and possibly an image, an author name, and a date. Heck, consider that the <header> of a component is an embedded schema, and that this component, when viewed in another template, might only show what’s in the header, as its own article. As an embedded schema, you could guarantee that many different kinds of content which generate <article> could contain the same <header>.

Treat the <footer> with much the same attitude as the <header>; this could simply be a pattern that’s reused in many schemas throughout the site.
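Putting the two together, here is a sketch of an <article> whose header and footer each come from an embedded schema. The field names in the comments, along with the title, author, date, and email address, are assumptions for illustration.

```html
<article>
  <!-- Embedded "header" schema: title, subtitle, author, and date fields -->
  <header>
    <h1>Article title</h1>
    <h2>Article subtitle</h2>
    <p>By Author Name, <time datetime="2013-06-01">June 1, 2013</time></p>
  </header>

  <p>Body of the article…</p>

  <!-- Embedded "footer" schema: contact information for this article's author -->
  <footer>
    <address>
      Contact the author at <a href="mailto:author@example.com">author@example.com</a>
    </address>
  </footer>
</article>
```

A summary template could render only the <header> block of the same component, which is the reuse described above.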

Without a doubt, <figure> and <figcaption> have the most potential uses, because together they cover any “captionable media”. A figure can be a video, an image, a carousel, or a graph. It may carry a caption, but its position within its container doesn’t change the meaning of the content. In Tridion, this could be a content component that links to a multimedia component. Your figure component also contains a caption and, if necessary, an author attribution. Then another component that wants to use this figure has a component link to it, and possibly “positioning” fields that allow the content author to select left, right, or center.
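As a sketch, the figure component and the positioning field might come out like this. The class name, image path, alt text, and caption are all hypothetical, and mapping the positioning field to a CSS class is just one possible choice.

```html
<!-- Positioning field ("left", "right", or "center") mapped to a class by the template -->
<figure class="figure-left">
  <!-- Linked multimedia component -->
  <img src="/media/sales-chart.png" alt="Bar chart of sales by quarter">
  <!-- Caption and attribution fields of the figure component -->
  <figcaption>Sales by quarter. Chart by Author Name.</figcaption>
</figure>
```

The position stays in the class, so moving the figure left or right never changes what the markup says about the content.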

The Big Idea

There are semantic elements in HTML4 and in HTML5 that provide meaning for a piece of content that the text alone could not. Front-end developers are often very intentional about the semantic elements that they choose. Don’t look at the “phrasing-content” elements and think, “crap, the RTF or WYSIWYG editor doesn’t support those elements”. When front-end developers code patterns of semantic elements, that should drive a discussion on schema design.