Using HTML

by Gisle Hannemyr

This chapter discusses the concept of text formats and their security implications. It also provides the recipe for stepping outside the comforts of Drupal's WYSIWYG editor and using HTML in your content.

Introduction

HTML (HyperText Markup Language) is a markup language that is used to create rich text on the World Wide Web. You briefly encountered HTML earlier in this ebook, in chapter 5, when learning how to toggle between the WYSIWYG and <source> representation of text in the editor.

For the record: You don't need to know HTML to create web pages with Drupal. Rich text features (e.g. boldface and italics) can be inserted by clicking the buttons above the body area. This is what most of us need to create nice-looking content. The editor even provides a button to insert a hyperlink into the text.

Choosing a text format

If you know HTML, you can use it in your posts. Drupal offers you three choices: Restricted HTML, Basic HTML or Full HTML. Choose one from the Text format section when you create or edit content (see figure 21-1).

Figure 21-1: The text format settings available when you create or edit content.

By default, Basic HTML is selected.

Table 21-1 shows all three formats and the lowest role required to use each of them. For instance, users with the role Authenticated user are not allowed to use the Full HTML text format.

Name | Role | WYSIWYG
Full HTML | Administrator | ✓
Basic HTML | Authenticated | ✓
Restricted HTML | Anonymous | -
Table 21-1: Summary of text format default settings and the lowest role required to use each format.

Anonymous users are only permitted to use the Restricted HTML text format, which is not set up to use the Drupal WYSIWYG editor.

Note: There are three text formats for the three roles in order to maintain the site's security while still letting users and visitors create content on the site. Only give permission to use Full HTML to users you trust completely. The Basic HTML format restricts the range of available HTML tags for good reasons: many HTML tags can be abused to inject dangerous scripts and other hacking tools into a website, and remotely linked images may also be a source of abuse. The same caution applies to anonymous visitors: make sure this role is only permitted to use the Restricted HTML text format.

Tables 21-2 to 21-5 show the HTML tags that are permitted in the Restricted HTML and the Basic HTML text formats respectively. Both formats allow the same set of tags, with the exception of the <img> tag, which is only permitted in Basic HTML.

Table 21-2: Fundamental tags.
HTML tag | Function | Restricted | Basic
<a> | Anchor. This creates a link from words in your text to another location. | ✓ | ✓
<p> | Paragraph. Use this to enclose a paragraph of text. | ✓ | ✓
<br> | Break. This creates a hard line break in the text. | ✓ | ✓
<img> | Image. This embeds a linked image in the page. | - | ✓

To use the anchor, embed it in HTML constructs like these:

Hyperlink to a remote website: <a href="https://drupalprimer.com">Drupal Primer</a>.
Hyperlink to <a href="#fragmentid">some fragment within the same webpage</a>
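For the second kind of link to work, some element on the same page must carry the matching identifier. The following is a minimal sketch, assuming the text format you use lets the id attribute through on headings (check the configuration of your own site). The identifier fragmentid, the link text and the heading text are placeholders.

```html
<!-- The link: jumps to the element whose id is "fragmentid". -->
<a href="#fragmentid">Jump to the details</a>

<!-- The target: placed elsewhere on the same page. -->
<h2 id="fragmentid">Details</h2>
```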

Table 21-3: Semantic tags.
HTML tag | Function | Restricted | Basic
<blockquote> | Block quote. This indicates that the enclosed text is an extended quotation. | ✓ | ✓
<cite> | Citation. This should be used for inline citations. By default it italicises the text. | ✓ | ✓
<code> | Display as computer code. By default uses a monospace font. | ✓ | ✓
<h2> | Heading, level 2 | ✓ | ✓
<h3> | Heading, level 3 | ✓ | ✓
<h4> | Heading, level 4 | ✓ | ✓
<h5> | Heading, level 5 | ✓ | ✓
<h6> | Heading, level 6 | ✓ | ✓

In computer science, semantics refers to the meaning of a construct.

Semantic HTML tags are tags that identify the meaning of the tagged element, rather than exactly how it should be rendered. The interpretation of the semantics is by default left to the browser's user agent stylesheet, but can be overridden by the website's own stylesheets. By default, most browsers' user agent stylesheets will style an element tagged <h2> with a large font size to make it look like a level two heading.

Please note that the semantic tag <h1> is not allowed in Basic HTML. This level is reserved for the main heading of the web page, and is set by Drupal.
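As an illustration, here is a short sketch that combines several of the semantic tags from table 21-3. The wording, the title inside <cite> and the file name inside <code> are invented for the example. How the result looks depends on the stylesheets in use.

```html
<h2>Reader feedback</h2>
<p>One reader wrote:</p>
<blockquote>
  <p>The chapter on text formats saved me a lot of trouble.</p>
</blockquote>
<p>
  The quote is taken from <cite>An Example Review</cite>.
  The file name <code>example.txt</code> is marked up as computer code.
</p>
```

Note that the markup only says what each piece of text is (a heading, a quotation, a citation, code). It says nothing about fonts or sizes.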

Table 21-4: List tags.
HTML tag | Function | Restricted | Basic
<dl> | Start a definition list. | ✓ | ✓
<dt> | Put a defined term into a definition list. | ✓ | ✓
<dd> | Put a definition of a term into a definition list. | ✓ | ✓
<ol> | Start an ordered (numbered) list of items. | ✓ | ✓
<ul> | Start an unordered (bulleted) list of items. | ✓ | ✓
<li> | Identify an item in an ordered or unordered list. | ✓ | ✓

The <ol>, <ul> and <li> tags can be used in the Drupal WYSIWYG editor, but the definition list tags cannot. To create definition lists, you must use the HTML tags.
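Since definition lists must be typed by hand, a minimal sketch may help. It uses only the three tags from table 21-4. The terms and definitions are examples written for this illustration.

```html
<dl>
  <dt>Text format</dt>
  <dd>A set of rules that decides which HTML tags are permitted in content.</dd>

  <dt>WYSIWYG</dt>
  <dd>Short for "what you see is what you get".</dd>
</dl>
```

Each <dt> holds a term and the <dd> that follows it holds the definition of that term. The whole list is enclosed in a single <dl> element.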

Table 21-5: Typographic tags.
HTML tag | Function | Restricted | Basic
<em> | Italicise text. | ✓ | ✓
<strong> | Boldface text. | ✓ | ✓

Using Basic HTML

The Basic HTML text format should be the default format on your website, and should be used for most of the content created, both by the authenticated users on the website, and by the administrators.

The list of permitted HTML tags for this text format is shown in tables 21-2 to 21-5 above.

Embedding images into the body of content is allowed, but if you navigate to Manage » Configuration » Content Authoring » Text formats and editors and press "Configure" in the "Operations" column for the Basic HTML text format, you'll notice that in the section "Enabled filters", the filter "Restrict images to this site" is checked. This means that a tag sourcing the image remotely, like the first tag below, will not be allowed.

<img alt="Image example" src="https://image-cdn.com/image.png" />

Instead, the image has to be stored on the same server, like this:

<img alt="Image example" src="/sites/default/files/inline-images/image.png" />

If you want to host images remotely, use the Full HTML format instead.

Using Restricted HTML

The Restricted HTML text format provides a plain text input field, not a WYSIWYG one. It is only meant for content created by people who visit your website without creating an account.

The Restricted HTML text format has some nice touches. If it discovers web URLs and email addresses embedded in the text, it automatically converts them into hyperlinks.

HTML usually requires tags to be inserted to create paragraph and line breaks. However, with Restricted HTML, this is handled by Drupal, behind the scenes. This means that there is no need for the user to insert <p> or <br> tags in the content to create paragraphs and line breaks.

This automatic formatting takes some of the work out of content creation. When a visitor creates content using the Restricted HTML text format, it will look nice without any HTML.

But for the record: The permitted HTML tags for this text format are shown in tables 21-2 to 21-5 above. Look for a checkmark in the column "Restricted".
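The effect of the automatic formatting can be sketched as follows. This is an approximation and not output copied from Drupal, so the exact markup produced on your site may differ. The email address is a placeholder.

```html
<!-- What the visitor types (plain text, no tags):

Thanks for the article. More details at https://drupalprimer.com

Questions? Write to editor@example.com

-->

<!-- Roughly what ends up on the page: -->
<p>Thanks for the article. More details at
  <a href="https://drupalprimer.com">https://drupalprimer.com</a></p>
<p>Questions? Write to
  <a href="mailto:editor@example.com">editor@example.com</a></p>
```

The blank line between the two typed lines becomes a paragraph break, and the URL and the email address become links.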

Using Full HTML

Unlike Restricted HTML and Basic HTML, Full HTML does not restrict the tags you can use. It allows you to use any HTML tag you wish. You can select this text format for a piece of content from a pulldown menu below the body (see figure 21-1).
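As an example of markup that calls for Full HTML, consider a small table. The <table> tag and its companions are chosen here only because they do not appear in tables 21-2 to 21-5. The cell contents are invented for the illustration.

```html
<table>
  <tr>
    <th>Text format</th>
    <th>Typical user</th>
  </tr>
  <tr>
    <td>Basic HTML</td>
    <td>Authenticated user</td>
  </tr>
  <tr>
    <td>Full HTML</td>
    <td>Administrator</td>
  </tr>
</table>
```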

Restricting users to Basic HTML also prevents inadvertent code mistakes or the inclusion of tags that give the content an appearance inconsistent with other content on your site. You should only use the Full HTML text format if there is a very good reason for doing so.

Another consideration: If you set up a content type so that all logged-in users can edit the content (often called a wiki), setting the text format of a piece of content to Full HTML will block most users from editing that content, because their role does not permit them to use the Full HTML text format.


Last update: 2022-06-08.