Source:  http://www.sitepoint.com/article/html-37-steps-perfect-markup/2

Bulletproof HTML: 37 Steps to Perfect Markup

18. What are block-level and inline elements?

HTML employs two main categories of element types: block-level elements and inline elements. The differences between them are mainly semantic and grammatical.

Block-level elements are usually "containers" for other elements. Examples of block-level elements are div, p, form and table. Some block-level elements (e.g., p) can only contain text and inline elements. Others (e.g., form) can contain only block-level elements (in the Strict DTD). And some, like div, can contain text, inline elements and block-level elements. By default, block-level elements are rendered with an implicit line break before and after; in other words, we cannot have two block-level elements side by side using only Strict HTML. (To do so would require CSS.)

Inline elements are elements that can exist "inline" within text. Examples include a, em, q and span. An inline element can contain only text and other inline elements. An inline element cannot contain a block-level element, with one exception: object (which is known as a replaced inline element, the same as img). Inline elements, when rendered, do not have any implied line breaks before or after.

In some cases, additional restrictions are placed on child element types. For instance, anchor links (a) can contain text and inline elements, but not other a elements; you cannot nest links.

The rules are somewhat different between the Strict and Transitional DTDs. In the Strict DTD, some block-level elements, including body, blockquote and form, can only have block-level children. In the Transitional DTD, they can also contain text and inline elements as immediate children.
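A minimal sketch of these nesting rules, assuming the Strict DTD. The action URI, the ids and the text are made up for illustration:

<form action="search.php" method="get">
 <!-- Strict: form may only have block-level children, so a div wraps the controls -->
 <div>
  <label for="query">Search for:</label>
  <input type="text" id="query" name="q">
 </div>
</form>
<blockquote>
 <!-- Strict: the quoted text sits inside a block-level child such as p -->
 <p>An <em>inline</em> element may appear here, inside the paragraph.</p>
</blockquote>

Under the Transitional DTD, the label and input could be immediate children of form, and the text could sit directly inside blockquote.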

19. Can I make an inline element block-level with CSS?

No. This is a common misconception. Beginners sometimes think that by applying the display: block declaration to an a element, they will be able to put a block-level h1 inside the link. That is not the case.

HTML has block-level and inline elements. CSS has block and inline boxes (plus a few others). These are very different things. The distinction in HTML has to do with semantics and syntax, while the distinction in CSS has to do with rendering and presentation. By default, block-level elements generate block boxes, and inline elements generate inline boxes (this is a grossly simplified explanation, but is generally true). The display property can change the type of the generated box, but CSS cannot change the grammatical or syntactical rules of HTML.
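As an illustration (the class name and URI are hypothetical, and the style element would normally live in the head of the document), the link below is rendered as a block box, yet in the markup it is still an inline element and may only contain text and other inline elements:

<style type="text/css">
 /* changes the generated box only, not the HTML content model */
 a.teaser { display: block; }
</style>
<!-- valid: the block-level h1 contains the inline a -->
<h1><a class="teaser" href="/news/">Latest news</a></h1>
<!-- invalid in HTML 4.01, whatever the CSS says: <a class="teaser" href="/news/"><h1>Latest news</h1></a> -->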

20. Why are external CSS and JavaScript files a good idea?

From a maintenance perspective, a full separation of content, presentation and behaviour is something to strive for. If all styling lives in an external style sheet, we can change the colours of our site by editing a single file instead of updating possibly thousands of HTML documents. If we use style attributes and write inline CSS, every one of those documents has to be edited when we redesign the site.

There is also another issue: both CSS and JavaScript often contain characters that have special meanings in HTML. If the CSS code or JavaScript code is embedded into the HTML document, these characters need to be escaped. If we have embedded JavaScript, and use the archaic practice of "hiding" the script code within SGML comments (<!--...-->), we cannot use the decrement operator (--), because the double hyphen will terminate the comment.
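A sketch of how this might look in the head of a document. The file names styles.css and behaviour.js are placeholders, not names taken from any real site:

<head>
 <title>Example page</title>
 <!-- presentation lives in one shared style sheet -->
 <link rel="stylesheet" type="text/css" href="styles.css">
 <!-- behaviour lives in one shared script file, so nothing needs escaping or "hiding" -->
 <script type="text/javascript" src="behaviour.js"></script>
</head>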

21. Should I use p or br?

The p element marks up a paragraph of text. A paragraph is one or more sentences that deal with a single thought.

A line break (br) is mostly a presentational tool, and should be handled by CSS rather than HTML. However, there are a few cases where line breaks can be said to have semantic meaning, for instance in poetry, song lyrics, postal addresses and computer code samples. These can constitute legitimate uses for br, but using br to separate paragraphs is definitely an abuse of the br element.

On the other hand, p has a very clear semantic meaning: it denotes a paragraph. Sometimes web authors tend to treat p as a generic block-level container element, but that's not correct. It's not uncommon to see a label and an input field wrapped inside a p within a form, but I would argue that it's semantically wrong. A label and an input field do not constitute a "paragraph".
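For example, br can separate the lines of a short verse, while a form field is arguably better wrapped in a neutral div than in a p. The field name and id are invented for this sketch:

<!-- the line breaks are part of the content of a poem -->
<p>
 Roses are red,<br>
 Violets are blue.
</p>
<!-- a label and an input are not a paragraph, so a div is used instead -->
<div>
 <label for="email">Email:</label>
 <input type="text" id="email" name="email">
</div>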

22. What does "semantic" mean?
se-man-tic [si-'man-tik]
adj. Of, pertaining to, or arising from the different meanings of words or other symbols.

(definition from dictionary.com)

When we talk about "semantic markup", we mean the proper use of element types -- based on their meaning -- to mark up content. The opposite is "presentational markup" or "tag soup", where authors choose element types because of their default rendering, rather than their semantic meanings.

An example: This is a semantically correct way to mark up the top-level heading of a web page:

<h1>Heading Text</h1>

This is an unsemantic (presentational) way to do it:

<br><font size="7"><b>Heading Text</b></font><br>

The semantic richness of HTML is quite limited. HTML was originally used by physicists to exchange scientific documents, and that shows quite clearly in the set of available element types. HTML would probably have had a very different set of element types if it had been invented by accountants or librarians.

HTML has two semantically neutral element types as well: the block-level div and the inline-level span. Neither of those two implies any particular semantics about its content; div is just a "division of the document", while span is a "span of characters". On the other side of the spectrum we have element types with clearly defined semantics: p (paragraph of text), table (tabular data), ul (unordered list), and so on.

The purpose of HTML is to mark up the semantics of a document, and -- to some extent -- to show the structure of its content. HTML has nothing at all to do with the way this document looks in a browser (although browsers have a default style for each element type).

23. Should I replace b and i with strong and em?

Only if you really mean to emphasise something. These notations are not interchangeable.

In the Bad Old Days, authors would use b and i to emphasise words.

In the Equally Bad Modern Days, authors use strong and em to make text boldfaced or italic.

em signifies semantic emphasis. The content to which it is applied should have some sort of emphasis when read out loud (louder, more slowly). strong indicates even stronger emphasis, but is now often considered redundant (you could nest em elements to indicate increasing emphasis). Some experts recommend that strong be used only for certain page elements that should be clearly indicated (like a "current page" indicator), and not to mark up words or phrases in the body copy.

b and i have no semantics; they only indicate bold or italics. They are useful for adhering to typographic conventions that do not have a semantically correct element type in HTML. For instance, ship names are traditionally written in italics, but there is no ship element type in HTML. Thus we can use <i>Titanic</i>.
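A short illustration of the difference, using made-up sentences:

<!-- semantic emphasis: the word would be stressed when read aloud -->
<p>You <em>must</em> save the file before closing the window.</p>
<!-- typographic convention only: a ship name, with no emphasis intended -->
<p>The <i>Titanic</i> sank in 1912.</p>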

24. Why are layout tables considered harmful?

  • It is semantically wrong to mark up non-tabular information as a table.
  • They can cause accessibility or usability problems (especially with some assistive technologies), particularly when nested several levels deep.
  • They mix presentational issues with the content, making it difficult or impossible to achieve alternate styling and output device independence.
  • They bloat the document markup with lots of unnecessary HTML tags, which can be detrimental for low-bandwidth users (those using dial-up connections or mobile devices) as well as for the web server's load and bandwidth.

25. Should I use divs instead of layout tables?

No, we should use semantically correct element types as far as possible, and only resort to divs when there are no other options.

Abusing divs is no better than abusing tables. We can set id and class attributes on virtually any element type. We can assign CSS rules to virtually any element type, not only to divs.
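As a sketch, a navigation menu is a list of links, so it can be marked up as a list and still be styled through its id. The id and the URIs here are hypothetical:

<!-- no wrapper div needed: the ul itself carries the id used by the style sheet -->
<ul id="menu">
 <li><a href="/">Home</a></li>
 <li><a href="/articles/">Articles</a></li>
 <li><a href="/contact/">Contact</a></li>
</ul>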

26. Are tables deprecated?

Not at all. table is the proper, semantically correct element type to use for marking up tabular data: information that has relationships in two or more dimensions. Tables are not deprecated, but layout tables are an issue.
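A small example of a genuine data table, where each cell relates to both a row heading and a column heading. The branches and opening hours are invented for illustration:

<table summary="Opening hours for each branch on weekdays and weekends">
 <caption>Opening hours</caption>
 <tr>
  <th scope="col">Branch</th>
  <th scope="col">Weekdays</th>
  <th scope="col">Weekends</th>
 </tr>
 <tr>
  <th scope="row">North</th>
  <td>9-17</td>
  <td>10-14</td>
 </tr>
 <tr>
  <th scope="row">South</th>
  <td>8-18</td>
  <td>Closed</td>
 </tr>
</table>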

27. What is the correct use of the address element type?

address is used to mark up contact information for the page (or for a part of a page). This can be a postal address, an email address, a telephone number, or virtually any contact details. address is a block-level element that can only contain text and inline elements. The default rendering is italic in most browsers, but that can be changed easily with CSS.

A common misconception is that address is meant to be used to mark up only postal addresses, but that is not the case.
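A possible use, with a placeholder email address and telephone number:

<!-- contact details for the page, not necessarily a postal address -->
<address>
 Page maintained by <a href="mailto:webmaster@example.com">the webmaster</a>,
 telephone 555 0100
</address>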

28. What is the correct use of the dfn element type?

dfn is used to mark up the "defining instance" of a term. It's a typographic convention, especially common in scientific documents, to italicise a new term -- one with which the reader cannot be expected to be familiar -- the first time it appears in the text. The default rendering of dfn is thus italic.

A common misconception is that dfn means "definition", and many authors use it in the same way they use abbr or acronym (by using the title attribute to provide an explanation of the term). A certain term should only be marked up with dfn once in a document (where it is first used and explained).
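For instance, only the first, explained occurrence of a term would be marked up. The sentences are examples written for this sketch:

<!-- defining instance: the term is introduced and explained here -->
<p>An <dfn>initialism</dfn> is a set of initials in which each letter is pronounced separately.</p>
<!-- later occurrences of the same term are left unmarked -->
<p>"FBI" is an initialism, while "NATO" is not.</p>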

29. What is the correct use of the var element type?

var is used to mark up a variable, or replaceable, part of some text. It's a typographic convention to italicise such variables, which will be replaced by actual data in real life. For instance, in a telephone system manual, the instruction for relaying incoming calls to another extension could look something like this:

<kbd>* 21 * <var>extension</var> #</kbd>

Here, a var element is used to mark up "extension" (which will be italic by default). Someone trying to program the telephone system to relay his incoming calls to extension 942 would type "*21*942#". Thus the var element indicates that you shouldn't actually type "e-x-t-e-n-s-i-o-n", but enter the actual extension number instead. The word "extension" is a variable.

A common misconception is that var should be used for marking up variables in programming code samples.

30. Should I use quotation marks within or around a q element?

No, the specification clearly says that it is the responsibility of the user agent to add quotation marks to inline quotations. Unfortunately, some older browsers (such as Internet Explorer 6) do not comply with the specification and will not add quotation marks. An option is to insert the quotation marks with JavaScript, and use some special styling with CSS to set quotations apart for IE users who have JavaScript disabled. Some CSS-only solutions have been proposed, but they will fail in non-CSS browsers such as Lynx.
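In markup, that means the q element holds only the quoted words. The sentence is invented for illustration:

<!-- recommended: the user agent is expected to supply the quotation marks -->
<p>She said, <q>I'll be there at noon</q>, and hung up.</p>
<!-- not recommended: <q>"I'll be there at noon"</q> would show doubled marks in compliant browsers -->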

31. What is the difference between abbr and acronym?

No one really seems to know the answer to this one! Even the HTML specification contradicts itself on this point.

abbr was a Netscape extension to HTML during the "browser wars". acronym was Microsoft's extension. Both meant the same thing, more or less. Both element types were incorporated into the HTML specification, with different semantics. The problem is that no one seems to be able to explain what those semantics are.

Let us look at a couple of dictionary definitions, then:

ab-bre-vi-a-tion [uh-bree-vee-'ey-shuhn]
n. A shortened or contracted form of a word or phrase, used to represent the whole.
ac-ro-nym ['ak-ruh-nim]
n. A word formed from the initial letters or groups of letters of words in a set phrase or series of words.

The definition for acronym says that it is a word, i.e., it can be pronounced. Thus, "NATO" would be an acronym, formed from the initial letters in the phrase "North Atlantic Treaty Organization". "FBI", however, would not be an acronym according to the dictionary definition, because it is not pronounced as a word, but rather spelled out (eff bee eye). And this is where the problems begin. "FBI" is technically known as an initialism, about which the dictionary has the following to say:

in-i-tial-ism [i-'nish-uh-liz-uhm]
n. 1. A name or term formed from the initial letters of a group of words and pronounced as a separate word.
2. A set of initials representing a name, organization, or the like, with each letter pronounced separately.

The first definition is almost the same as for acronym, but the second is more relaxed. However, there is no initialism element type in HTML, and the confusion is exacerbated by the fact that "acronym" in normal American parlance is used as a synonym for "initialism".

The HTML specification offers the following definitions:

abbr: Indicates an abbreviated form (e.g., WWW, HTTP, URI, Mass., etc.).
acronym: Indicates an acronym (e.g., WAC, radar, etc.).

So far it looks like the specification is adhering to the dictionary definitions, which means that "FBI" should be marked up with abbr since it can't be pronounced as a word. However, a few paragraphs further down, the specification says,

Western languages make extensive use of acronyms such as "GmbH", "NATO", and "F.B.I.", as well as abbreviations like "M.", "Inc.", "et al.", "etc."

Are you confused yet? I am. The safe thing to do then should be to always use abbr, since all acronyms are abbreviations, but not vice versa. However, there's a slight problem with that approach. Microsoft was so miffed when the W3C decided to use abbr for abbreviations and initialisms instead of their acronym, that they actually refused to support abbr! (They've started supporting abbr in Internet Explorer 7, though.)

So, what's a poor web author to do? Why should we even bother? It might be nice to have an element to attach a title attribute to, but we could use span for that. The idea, allegedly, is that marking up abbreviations and acronyms would be beneficial for assistive technologies, especially screen readers. But screen readers tend to ignore abbr and acronym, since no one knows how to use them properly and Microsoft doesn't support abbr. It's a catch-22.

The answer to this frequently asked question is: I don't know! I, personally, use abbr for obvious abbreviations like "Inc." and for initialisms like "FBI", and I use acronym for things that can be pronounced as words, like "GIF". But due to the ambiguity of the specification, I cannot fault anyone for marking up "FBI" as an acronym (although "Inc." certainly is not an acronym). And what about "SQL", which some spell out and others pronounce as "sequel"? (I'd use abbr.)
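Following that personal convention (one possible reading of the specification, not a definitive rule), the markup could look like this:

<!-- initialism, spelled out letter by letter: abbr -->
<p>The <abbr title="Federal Bureau of Investigation">FBI</abbr> is based in Washington.</p>
<!-- pronounced as a word: acronym -->
<p><acronym title="North Atlantic Treaty Organization">NATO</acronym> was founded in 1949.</p>
<!-- plain abbreviation: abbr -->
<p>The invoice was issued by Example <abbr title="Incorporated">Inc.</abbr></p>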

32. Why is <feature X> deprecated?

The most common "feature" that beginners ask about is the target attribute for links. This attribute is left out of HTML 4.01 Strict (strictly speaking, the specification does not flag it as deprecated, it only excludes it from the Strict DTD), but it's still valid in HTML 4.01 Transitional. Many other element types and attributes that are allowed in Transitional are removed from Strict.

The reason for deprecating those items is that the W3C wants to promote the separation between content (HTML), presentation (CSS) and behaviour (JavaScript). Making an element centred within the viewport is a presentational issue; thus it should be handled by CSS instead of a center element. Opening a new browser window is a behavioural issue; thus it should be handled by JavaScript rather than a target attribute.
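A sketch of the presentational case, with a hypothetical class name. In a real page the rule would go in an external style sheet rather than a style element:

<!-- deprecated, presentational markup -->
<center><h2>Welcome</h2></center>
<!-- preferred: plain markup, with the centring moved to CSS -->
<style type="text/css">
 h2.intro { text-align: center; }
</style>
<h2 class="intro">Welcome</h2>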

Typically, the deprecated features are those that arose during the browser war era of the late 1990s, when browser vendors were competing by adding various extensions to HTML to make it into some sort of page layout language. These features were included in HTML 3.2 to bring some sort of order to the chaos, but this is not what HTML was intended for. When HTML 4 was released, the authors tried to "reclaim the Web" by deprecating what they saw as "harmful" parts of HTML 3.2, at least in the Strict DTD.

In other words, things are deprecated for a reason. Don't use those features unless you absolutely have to.

33. Must I have an alt attribute for every image?

Yes, the alt attribute is required for the img element type. Why? Well, not all users can see images and not all user agents can understand or display images. For example:

  • A person who is blind or has very low vision cannot see an image. A screen reader cannot describe an image.
  • Users with slow connections (dial-up or mobile) sometimes disable images for faster surfing.
  • Text browsers like Lynx do not support images.
  • Search engine bots cannot understand images.

Thus we have to provide a text equivalent for each image, using the alt attribute. This text equivalent should not describe the image; it should convey the equivalent information. Writing good text equivalents is not easy, and it takes a lot of practice. Remember that the text equivalent is displayed instead of the image.

So what is a good text equivalent for a given image? That depends on the context in which the image is used! It's not like there is a single "perfect" text equivalent for each image. Let us look at an example: say we have an image of a grazing cow. This particular cow happens to be an Aberdeen Angus. Let us then consider a few use cases for this image.

  • In the first case, this image is used as a generic illustration for an article about beef cattle farming in Scotland. The actual cow isn't germane to the article; it's just an illustration, a decorative design element that draws the reader's eye and relieves the monotony of the text. In this case, the image doesn't convey any relevant information. Therefore it should have an empty text equivalent: alt="".
  • In the second case, the image is used on a children's web site about farm animals. The page shows pictures of various animals: a horse, a sheep, a pig, a cow, etc. Next to each image is a block of text that presents some facts about each species. In this case, alt="Cow:" could be appropriate. It's not important that it's an Aberdeen Angus; the picture represents bovine quadrupeds in general.
  • In the third case, the image is used on a site about different breeds of cattle. Here it is used to illustrate what an Aberdeen Angus looks like, and how it is different from other breeds. The page comprises a number of images, each with a caption that identifies the breed, but no other textual information. In this case, the text equivalent should describe the particular attributes and traits that are specific to an Aberdeen Angus: the robust build, the massive chest, the relatively short legs, the buffalo-like hump behind the head, etc.
  • In the fourth case, the image is used on a photographer's portfolio page. It's one image among several others, with very different motifs. This is one of the few cases where the alt attribute might actually include a description of the image itself, e.g., "A black Aberdeen Angus grazing in the sunshine with Ben Nevis in the background."

As we can see, the appropriate text equivalent depends on the context. Sometimes (often, actually) it should be empty, because the image doesn't convey any information that isn't available in the accompanying text. Some claim that such images should be background images specified via CSS, but there are many cases where that is impractical and where the image is really part of the content -- even though it doesn't convey any useful information to those who cannot see it.
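Three of the use cases above could be marked up roughly as follows. The file name cow.jpg is a placeholder, and the same image is reused in each context:

<!-- first case, decorative illustration: empty text equivalent -->
<img src="cow.jpg" alt="">
<!-- second case, children's page about farm animals -->
<img src="cow.jpg" alt="Cow:">
<!-- fourth case, photographer's portfolio: a description of the image itself -->
<img src="cow.jpg" alt="A black Aberdeen Angus grazing in the sunshine with Ben Nevis in the background.">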

For images that contain text, the text equivalent should of course replicate the text in the image. For things like pie charts, the text equivalent should convey information about the percentages -- the same information as the image conveys.

The alternate text shouldn't be too long. Some browsers don't word-wrap text equivalents, and they cannot be formatted in any way. If we need a longer text equivalent, we should put it somewhere else and link to it via the longdesc attribute.

Internet Explorer and old Netscape browsers display the alt attribute in a tool-tip when the user hovers the mouse pointer over the image. This constitutes an incorrect use of text equivalent data. We should use the title attribute for tool-tip information. To stop the alternate text from appearing in tool-tips, we can use an empty title: title="".
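For example (the file name and the title text are invented):

<!-- empty title: no tool-tip, while the text equivalent stays available -->
<img src="cow.jpg" alt="Cow:" title="">
<!-- separate tool-tip text, distinct from the text equivalent -->
<img src="cow.jpg" alt="Cow:" title="Click to enlarge">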

34. What is the difference between class and id?

An id uniquely identifies a particular element in an HTML document. It's like a social security number, providing a unique handle for that element. Just as two people cannot have the same social security number, no two elements in a document can have the same id. ids must be unique within the page.

A class says that an element has some traits which it (possibly) shares with other elements. An element can belong to more than one class. An analogy could be professions: a person could be both a carpenter and a nurse, and there are many carpenters and many nurses. (They all have unique social security numbers, though.)

Both ids and classes are mainly used with CSS and/or JavaScript. In CSS, an id has higher specificity than a class, making it easy to specify special rules for a specific element. With JavaScript we can look up an element using its id (document.getElementById()).

We assign ids to page elements that can occur, at most, once per page, like a navigation menu, a footer, a sidebar, etc. We can also assign ids to specific elements that only occur once in the whole site, like a specific image, if we want to have certain CSS rules for it, or manipulate it with JavaScript.

We assign classes to elements that share some common traits, usually display properties via CSS rules.

ids and class names should be as "semantic" as possible. They should describe what something is, not what it looks like. Thus, id="menu" is much better than id="left"; especially if we redesign and move the menu to the right-hand side.

ids and class names are case sensitive, even in HTML. We shouldn't rely on case-sensitivity, though (i.e., we should not use names that differ only in case).
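A compact sketch that pulls these points together. All ids and class names are hypothetical, and the style and script elements would normally be external files:

<style type="text/css">
 .note { color: gray; }                     /* class: shared by any number of elements */
 #footer { border-top: 1px solid black; }   /* id: at most one element per page */
</style>
<p class="note">First note.</p>
<p class="note important">Second note, which belongs to two classes.</p>
<div id="footer">Footer text</div>
<script type="text/javascript">
 // look up the single element that has this id
 var footer = document.getElementById("footer");
</script>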

35. Why doesn't id="123" work?

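Values for the id attribute must start with a letter (A-Z or a-z), which may be followed by letters, digits, hyphens, underscores, colons and periods. It is wise to follow the same rule for name and class values; a class name that starts with a digit, for instance, cannot be used in a CSS selector without escaping.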
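For instance, prefixing the number with a letter is enough to make the value valid. The prefix chosen here is arbitrary:

<!-- invalid: the id starts with a digit -->
<p id="123">Some text</p>
<!-- valid: the id starts with a letter -->
<p id="item123">Some text</p>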

36. Why doesn't <a href=My Cool Page.html> work?

There are two reasons in this case.

Attribute values that contain characters other than letters, digits and a few other characters must be enclosed in quotation marks (double or single). Any attribute value that needs to contain a space, for instance, must be quoted. The easiest and safest solution is to always quote attribute values. To include quotation marks in a quoted value, we have two options. We can either use the "other" quotation mark to enclose the value (alt='My "new" car', alt="Jane's car"), or use an entity or reference (alt="My &quot;new&quot; car", alt='Jane&#39;s car'). (Note that the &apos; entity cannot be used with HTML.)

The second reason is that there are spaces in the URI. These need to be encoded as shown here:

<a href="My%20Cool%20Page.html">

"%20" means "a character with code point 0x20". 0x20 is the code point for the space character. This applies to URIs only, not to attribute values in general.

37. How can I include an HTML page in another HTML page?

With a Strict DTD, there is only one valid option: the object element type:

<object type="text/html" data="http://example.com/foo.html">
 Alternate content here for browsers that don't support OBJECT.
</object>

Unfortunately, support for object is all but non-existent in Internet Explorer.

With a Transitional DTD, we can also use the iframe element type:

<iframe src="http://example.com/foo.html">
 Alternate content here for browsers that don't support IFRAME.
</iframe>

A much better approach is to handle inclusion on the server-side. Using server-side includes (SSI) is the simplest way to include a file into another, as long as they are from the same domain:

<!--#include virtual="/foo.shtml" -->

Note that this technique cannot be used to include a complete HTML document within another, though; it can be used only with fragments of HTML.

Other server-side technologies allow us to perform more advanced tasks. Your web server must support those technologies, of course. Often, shared servers with free hosting don't provide any such technologies -- not even SSI.

Recent Comments

*Comment

In that case, you shouldn't have to worry about dashes at all. You don't 'escape' any CSS or JavaScript with comments? Like this
<script type="text/javascript">
<!--
... script code ...
// -->
</script>

  Posted by: AutisticCuckoo Apr 23rd, 2007 @ 1:09 PM EDT

 

*Comment

Very nicely written, refreshing article.

  Posted by: croatiankid May 6th, 2007 @ 5:44 PM EDT

 

*Comment

Really enjoyed your article. Although much should be standard knowledge for any webdesigner, some implementations were new to me and quite educational. Nice job!

  Posted by: Roy May 10th, 2007 @ 1:31 PM EDT

 
