Content inventory

by Gisle Hannemyr

This chapter first defines the term “content type” from the perspective of an information architect and a site builder. It then describes how to create a content inventory in order to identify the content types that will make up the website, and how to make sure those content types are supported by the website's software. By way of example, this chapter is about conducting a content survey in order to create a Drupal version 10 website. However, the more general principles may be adapted for any WCMS platform.

Introduction

A content inventory is an inventory of content that exists on a website. It is usually registered in some tabular format (e.g. in a spreadsheet, or in a suitable database).

There are two main types of content inventory: one created for a content survey and one created for a content audit.

In this chapter, we distinguish between design and redesign. Design is a process that is carried out when no previous version of the website exists, and an outline of what types of content it shall contain must be created. Redesign is when a website is converted from some existing platform to another platform. In many cases, the starting point for a redesign may be static HTML, another WCMS (e.g. WordPress) or a legacy version of the target WCMS (e.g. Drupal version 7).

A content inventory created for the purposes of doing a content survey will only analyse pages that contain representative content types.

A content inventory created for the purposes of doing a content audit will often include all pages of an existing website. It is usually at least partially created by means of automatic tools.

Content types

To understand the purpose of doing a content survey as part of the process of building a Drupal website, you need to understand the concept of content types in the context of a web content management system (WCMS) in general, and in the context of Drupal version 10 in particular.

A WCMS is a computer system that allows an organisation or a group of authors to manage and present content on a website.

ia_circles_en.png
The infamous three circles of information architecture.

From the perspective of an information architect, content is structured data objects. From the perspective of a site builder, structured data objects must be defined in a way that can be represented internally in the website's database. In order to reconcile these two perspectives, we use content types.

A content type is a pre-defined collection of data types that relate to each other by being components of “something” that from a content creator's perspective should be considered as a correlated whole.

In Drupal, the term entity is used to refer to a structured data object that is managed by the WCMS. The particular collection of data types that make up an entity is called a bundle, and the container for a single data type in the bundle is called a field. The main entity used to hold content in Drupal is simply called Content.

Note: The content entity is Drupal's main container for content. Since “content” is a generic term, and the content entity is stored in the {node} base table in the database, the term “node content type” is sometimes used to refer to types of the content entity in order to distinguish them from other types of entities, such as users, comments, taxonomy terms and files. However, in the Drupal community, when you see the term “content type” it always means “node content type” (i.e. a type of the content entity – not a type of some other entity). Also, in the Drupal community, the term “node” always means “instance of node content type”. However, when surveying content, an information architect may use the term “content type” to refer to managed content that is not a “node content type”. For instance, a “user” (i.e. a user profile) may also be viewed as a content type by an information architect.

When a content creator creates content for a website, he or she does so by creating an instance of content of a particular content type.
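The relationship between content type, bundle, field and content instance can be sketched in a few lines of Python. This is only an illustration of the concepts, not of how Drupal stores entities internally. The class names `FieldDefinition` and `ContentType`, and the two fields given to Basic page, are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldDefinition:
    """Container for a single data type in a bundle (hypothetical model)."""
    name: str
    data_type: str  # e.g. "text", "long text", "date", "taxonomy term"


@dataclass
class ContentType:
    """A named bundle: the collection of fields that belong together."""
    name: str
    fields: list[FieldDefinition] = field(default_factory=list)

    def create_instance(self, **values):
        """Create one content instance, rejecting values for unknown fields."""
        known = [f.name for f in self.fields]
        unknown = set(values) - set(known)
        if unknown:
            raise ValueError(f"Unknown fields for {self.name}: {sorted(unknown)}")
        # Fields that were not supplied are kept, but left empty (None).
        return {"type": self.name, **{name: values.get(name) for name in known}}


# Illustrative bundle for a simple text page.
basic_page = ContentType(
    "Basic page",
    [FieldDefinition("title", "text"), FieldDefinition("body", "long text")],
)
page = basic_page.create_instance(title="About us", body="Who we are.")
```

The content creator only ever supplies values. The bundle, i.e. which fields exist and what data type each holds, is decided once by the site builder.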

When content is presented to users, it may be in a format that corresponds closely to what the content creator created. However, a WCMS lets the site builder arrange for a practically unlimited number of other ways to present content (e.g. as aggregate lists, or by extracting the content of an individual field from one content type and joining it into another).

Not everything that is visible on a web page is a content type. When doing a content survey to identify the content types on a particular website, the information architect must use discretion to distinguish between instances of content types and auxiliary content.

This list of things that are not instances of content types may be handy until you have learnt from experience how to identify instances of content types:

Note: You also should not do a content survey of teasers. A teaser is a rendering of a content type instance that shows only an extract or a summary, with a link to an expanded rendering of the same instance. Do not bother surveying the teaser. Click through the teaser to the expanded rendering, and survey that. Most teasers are part of an aggregation and serve as contextual navigation. Both these criteria are grounds for excluding teasers from the content survey.

If there is only a single instance of “something” on a website, it makes no sense for an information architect to create a content type for it. It is much simpler to create singular content as a block or as a decorative element. If you are only able to find a single instance of “something” on a website, it is very unlikely that this is an instance of a content type.

Predefined content types in Drupal

Drupal core comes with some predefined content types.

Predefined content types in Drupal version 7

Six different node content types are predefined by the Drupal version 7 core. These are listed below, along with a comment that mentions some of their characteristics. Note that the first five are very similar, with long text as their main field. What makes them different are the default settings and the aggregation structure they belong to by default.

Article
Main field: long text. Default settings: byline/date visible, comments open, promoted to front page. Aggregation: Main content.
Basic page
Main field: long text. Default settings: byline/date hidden, comments closed, not promoted to front page. Aggregation: None.
Blog entry
Main field: long text. Default settings: byline/date visible, comments open, promoted to front page. Aggregation: blog.
Book
Main field: long text. Default settings: byline/date visible, comments open. Aggregation: book.
Forum topic
Main field: long text. Default settings: byline/date visible, comments open. Aggregation: forum.
Poll
This is different from all the other content types. It allows you to create multiple choice questions and to capture votes on these.

Of those six, only Article and Basic page are enabled by default. When doing a content survey of a website, expect to come across these two content types a lot. If a content item mainly consists of text with inline images, without a visible byline and date, it is very likely that it is a Basic page. If it carries a byline and date, it is very likely that it is an Article.
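The rule of thumb above can be written down as a small decision function. This is a sketch of the heuristic only. The function name and the two boolean inputs are assumptions made for this example, and the result is a first guess that the information architect still needs to confirm by looking at the page.

```python
def guess_core_content_type(has_long_text: bool, has_byline_and_date: bool) -> str:
    """First guess at the content type of a surveyed content item.

    Mirrors the rule of thumb: long text without byline/date suggests
    Basic page, long text with byline/date suggests Article.
    """
    if not has_long_text:
        # Not one of the two common types; needs a closer look.
        return "unknown"
    return "Article" if has_byline_and_date else "Basic page"


guesses = [
    guess_core_content_type(has_long_text=True, has_byline_and_date=False),
    guess_core_content_type(has_long_text=True, has_byline_and_date=True),
]
```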

Tip: Long text is a text field that consists of several lines of text, and may also contain embedded markup for inline images and inline (integrated) links. Just text is a field that consists of a single line of text without embedded markup.

Predefined content types in Drupal version 10

In Drupal 10, there are only three node content types:

The three other node content types that were available in Drupal 7 are provided by extensions:

A custom content type

When doing a content survey, expect to come across the content types Article and Basic page frequently.

One of the tasks of a site builder is to create custom content types that use specific fields to enrich the information architecture of the site. By breaking out specific data types from a catch-all long text field, the information architect can make content more findable by providing faceted search for the site, and use specific fields to create contextual navigation links.

The screenshot below shows an instance of a content type that is taken from the website of a community of practice. This community is interested in keeping track of events (workshops, conferences) and publications (journals, monographs) where members of the community may want to participate and contribute. This particular content instance is about a conference named “NordiCHI 2018”. The date when papers are due is the 15th of April 2018, and the conference is scheduled to take place from October 1st to October 3rd the same year. The venue location is Oslo in Norway, and there is a link (shown in blue) to the conference website. There is also some text describing the conference.

ct_cfc.png
The fields that make up this instance of a custom content type.

The screenshot above shows a content instance, and an information architect has (with red pen) deconstructed all specific fields that make up the bundle of this particular content type.

Tip: Deconstruction of the fields that make up the content is done when you are doing a high fidelity content survey. Early in the design process, you will probably only do a low fidelity content survey, and you will not have to break down the content type into individual fields.

The deconstruction tells the information architect that the bundle for this particular content type consists of the following six fields:

The information architect names this content type “Call for Contributions” and creates a sketch that roughly shows all the fields that make up its bundle:

cm_wireframe.png
Sketch of the fields in this custom content type.

Given that we want to create a website for another community of practice that provides its members with a similar content type, it is now fairly simple for the site builder to “translate” this sketch into an actual content type on the Drupal website being built for this community, using Drupal's built-in function to create content types (see the chapter with the title Creating content types using fields for a description of this). The result of the site builder's work will look like this when the newly created content type is examined in Drupal's administrative GUI:

cm_fields.png
The fields as they appear in the Drupal GUI.

Notice that the column with the heading “Field type” assigns a specific data type to the field. The column with the heading “Widget” specifies what user interaction widget to use when the content creator creates new content instances of the content type.
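To make the pairing of field, field type and widget concrete, here is a sketch in Python that records a “Call for Contributions” bundle as a small table. The six fields are inferred from the description of the NordiCHI 2018 example earlier in this section, and the field type and widget labels are illustrative. They may not match the labels shown in Drupal's GUI.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    label: str
    field_type: str  # the data type assigned to the field
    widget: str      # the input control the content creator sees


# Hypothetical bundle, inferred from the example's description.
CALL_FOR_CONTRIBUTIONS = [
    FieldSpec("Title", "Text (plain)", "Text field"),
    FieldSpec("Papers due", "Date", "Date picker"),
    FieldSpec("Event dates", "Date range", "Date range picker"),
    FieldSpec("Location", "Text (plain)", "Text field"),
    FieldSpec("Website", "Link", "Link"),
    FieldSpec("Description", "Long text", "Text area"),
]


def format_field_table(specs):
    """Render the specs as an aligned plain-text table with a header row."""
    rows = [FieldSpec("Label", "Field type", "Widget"), *specs]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


table = format_field_table(CALL_FOR_CONTRIBUTIONS)
```

Note that the same field type (here, plain text) can be reused for several fields, while the widget only affects how the value is entered, not how it is stored.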

Separating content from clutter

When starting out surveying content, it is easy to be confused by the clutter of content that appears on most web pages. The ability to separate the content type instance from the clutter comes with experience.

In the example below, two different pages taken from the website of “Netfonds Bank” are surveyed. The first of those pages is shown below:

cm_netfonds1.png
A web page that is simple to survey.

Even an inexperienced information architect should manage to call this one correctly. They know that the navigation at the top and bottom of the page should be excluded from the content. What remains is a long text field. I have highlighted this content by putting a red frame around it. The long text does not have a visible byline or date, so it should be obvious that the content type of the web page above is Basic page.

The second page in this example is much more cluttered. When surveying its contents, you know that you should exclude the navigation at the top and bottom when doing a content survey – but what about the image of the girl and the speech bubble? What about the other “stuff”, such as the links to other geographical regions and links to most traded shares (in the left sidebar), the graphs and the key numbers (“nøkkeltall”) in the right sidebar?

cm_netfonds2.png
Cluttered web page whose content is of the type basic page.

In fact, all the stuff in the two sidebars, as well as the speech bubble that extends over the first sidebar and the main content region, is not “content” in the context of a content survey:

What remains of the page is an area filled with long text. It carries no byline or date, so we recognize the content as an instance of your old friend Basic page. I have highlighted it by putting a red box around it.

Aggregates

Some web pages simply display an aggregate in which several instances of a content type are shown.

For instance, the main content region of the web page shown below is an aggregate that shows teasers linking to all the players on Manchester United's first team:

cm_mufcfirstteam.png
The main content region gives a view of an aggregate of all the players on the first team.

Since the page is made up of global navigation (header region), local navigation (left sidebar) and an aggregate (main content region), there is no content to survey on the web page shown above.

The teaser showing an image of the player, his jersey number and his last name is a link, linking to the profile of the individual player. This means that this page also provides contextual navigation to help visitors navigate to individual player profiles. When you click through to an individual player profile, you see a page like this:

cm_mufcplayerprofile.png
An individual player profile.

On this page the main content area holds an instance of a content type which we can name “player profile”. The information architect is now able to deconstruct the fields that constitute the player profile content type. There are:

The team-class (the taxonomy term “first team”) does not appear as a field in the player's profile, but it must be part of the player bundle to let the site builder create the view of all players of the team-class “first team” on the aggregate page shown above.

Note that four of the fields are of the data type “taxonomy term” rather than “text”. This is because the site builder may want to use these to create contextual navigation. We've already seen that by making “Team-class” a taxonomy term, the site builder is able to create a page to view all players on a team and provide contextual navigation to individual player profiles. By making “Position” a taxonomy term, the site builder can provide a page to view all players that play at a particular position in this sports club. By making “International” a taxonomy term, the site builder can provide a page to view all players of a particular nationality.
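The reason a taxonomy term is more useful than free text can be sketched as follows: because every profile draws its value from the same vocabulary, one generic grouping function can build an aggregate page for any vocabulary. The player names and term values below are placeholders, not data from the website, and `aggregate_by_term` is a hypothetical helper rather than a Drupal function.

```python
from collections import defaultdict

# Placeholder player profiles; the keys mirror the vocabularies named above.
PLAYERS = [
    {"name": "Player A", "team_class": "first team", "position": "goalkeeper"},
    {"name": "Player B", "team_class": "first team", "position": "defender"},
    {"name": "Player C", "team_class": "reserves", "position": "defender"},
]


def aggregate_by_term(profiles, vocabulary):
    """Group profile names by the term they carry in the given vocabulary."""
    pages = defaultdict(list)
    for profile in profiles:
        pages[profile[vocabulary]].append(profile["name"])
    return dict(pages)


by_team = aggregate_by_term(PLAYERS, "team_class")
by_position = aggregate_by_term(PLAYERS, "position")
```

Each key in the result corresponds to one aggregate page, and each name in its list to one teaser linking to a player profile.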

When deconstructing fields, always consider whether it would be useful to use a taxonomy term as the data type for a text or image field.

Also, as an information architect, you should understand that the links to the right (“Order your De Gea shirt”, “Read exclusive player interviews”, “Browse player photo galleries”, “Download free united wallpapers”) and the Adidas advert are not part of the player profile. The links provide contextual navigation; the advert is just decoration.

In the example above, it is obvious that the aggregate player page is not a content type because it lets visitors navigate to individual player profiles. So the main content area of the aggregate page is made up entirely of teasers that are used for contextual navigation. You've already learnt that neither a teaser nor navigation should be surveyed, so it should be obvious that there is no point in including this particular page in a content survey.

Now, look at a less clear-cut case. Should the web page below, showing a list of fixtures and results, be surveyed?

cm_mufcfixtures.png
Aggregate of fixtures and results.

The main content region lists four fixtures. There is a link associated with each fixture. It goes to the foundation page for match reports if the fixture is in the past, and to the online ticket office if the match is in the future. But neither of those links leads to something that can be regarded as a content type from the perspective of an information architect. So unlike the previous example, where following links took you to a player profile page which you could deconstruct as a content type, this page offers no similar link. In other words, there are no teasers on this web page.

However, you should recognize this page as an aggregate of an underlying content type, named “fixture”. Below is how an information architect would deconstruct this type.

There are a few things to note here:

First, note that the information architect has chosen to use a taxonomy term to designate both the home team and the away team. By making this choice, the site builder can use the taxonomy term to pull the team's logo from a logo database, add the textual representation for the team's name, and render the field showing the logos and the text. This saves the content editor from having to create a juxtaposition of the logos and the text by writing HTML markup.

Secondly, note that two of the columns in the aggregate show different fields depending upon context (i.e. whether the fixture is in the past or in the future). For fixtures in the past, the fifth column shows the result and the last column links to match reports. For fixtures in the future, the fifth column shows the kickoff time, and the last column links to the ticket office.

These two notes explain why the information architect is able to deconstruct ten separate fields from seven columns.
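The second note, that one column can be backed by two different fields, can be sketched like this. The `Fixture` class models only the fields mentioned in the notes above, not the full bundle, and the team names and URL paths are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fixture:
    home_team: str  # taxonomy term
    away_team: str  # taxonomy term
    kickoff: datetime
    result: Optional[str] = None            # only set for past fixtures
    match_report_url: Optional[str] = None  # only set for past fixtures
    ticket_url: Optional[str] = None        # only set for future fixtures


def context_columns(fixture: Fixture, now: datetime):
    """Return the values for the fifth and the last column of the aggregate."""
    if fixture.kickoff < now:
        # Past fixture: show the result and link to the match reports.
        return fixture.result, fixture.match_report_url
    # Future fixture: show the kickoff time and link to the ticket office.
    return fixture.kickoff.strftime("%H:%M"), fixture.ticket_url


now = datetime(2018, 10, 1, 12, 0)
played = Fixture("Home FC", "Away FC", datetime(2018, 9, 22, 15, 0),
                 result="2-1", match_report_url="/match-reports/1")
upcoming = Fixture("Home FC", "Other FC", datetime(2018, 10, 6, 17, 30),
                   ticket_url="/tickets/2")
```

Two columns on the page are thus fed by four fields in the bundle, which is why the number of fields exceeds the number of columns.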

Doing a content survey

The purpose of doing a content survey is to identify the content types that need to be supported on the website you are designing, redesigning or upgrading.

This content survey can be low fidelity or high fidelity. Both involve going on a mind-bogglingly detailed odyssey through the website and/or physical documents you are surveying. A low fidelity content survey only identifies the content types. A high fidelity content survey deconstructs the individual fields that constitute the content type's bundle.

The process of creating a content survey, in the case of a redesign project, is the relatively straightforward process of clicking through the legacy website and recording what you find in a simple spreadsheet.

If you are designing a new website, or if the legacy website is very sparse and does not have the content your client wants to put on the new website, your content survey needs to take a slightly different route:

To do a content survey for a website, start at the home page. Identify the major sections of the site, and dip down into those sections. See what's linked from them. At first, you will probably just see subsite foundation pages and aggregates, and very few actual instances of content types (see the list above to learn how to identify “stuff” that you may come across when you click through a website and that is not an instance of a content type).

For each page that you visit where you are able to identify distinct instances of content types, make a note of the characteristics of the content type in the spreadsheet you use to record your content survey. Follow links and navigate through the website until you have a fairly good overview of what content types are used to populate the site.

Doing a complete walkthrough of a complete website is usually not practical (unless the website is very sparse). For the purposes of content surveying, find, examine, analyse and record a representative sample of the system's content (e.g. two of each kind). This is sometimes referred to as the “Noah's ark” approach. As for how many pages to analyse, and how much time to spend, there are no clear rules. You need to use some intuition and judgment, balancing the size of your sample against the time and budget constraints of the project.
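If you already have a list of pages with a provisional content type noted for each, the “Noah's ark” selection can be sketched as below. The function name, the example paths and the default of two pages per type are assumptions for this illustration. Which pages are truly representative remains a judgment call.

```python
def noahs_ark_sample(pages, per_type=2):
    """Keep the first `per_type` pages found for each content type.

    `pages` is an iterable of (url, content_type) pairs in the order
    the pages were visited.
    """
    sample = {}
    for url, content_type in pages:
        chosen = sample.setdefault(content_type, [])
        if len(chosen) < per_type:
            chosen.append(url)
    return sample


visited = [
    ("/about", "Basic page"),
    ("/news/1", "Article"),
    ("/news/2", "Article"),
    ("/news/3", "Article"),
    ("/contact", "Basic page"),
]
sample = noahs_ark_sample(visited)
```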

Links to a spreadsheet with content inventory templates:

Low fidelity content survey

When an information architect is designing, redesigning or upgrading a website, he or she needs to tell the site builder what content types the site is going to need. A low fidelity content survey is a design document (typically a spreadsheet) that represents the first step towards defining the content types the website will require.

Example spreadsheet for a low fidelity content survey:

contentsurvey_lofi.png
Spreadsheet for a low fidelity content survey.
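If you prefer to record the survey in a file that any spreadsheet program can open, a CSV file is enough. The sketch below uses Python's standard `csv` module. The three column names are hypothetical and are not taken from the template shown above, so adapt them to the columns you actually use.

```python
import csv
import io

# Hypothetical columns for a low fidelity survey.
COLUMNS = ["content_type", "example_url", "notes"]


def survey_to_csv(rows):
    """Serialise survey rows (dicts keyed by COLUMNS) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = survey_to_csv([
    {"content_type": "Basic page", "example_url": "/about",
     "notes": "Long text, no byline or date"},
    {"content_type": "Article", "example_url": "/news/1",
     "notes": "Long text, byline, date"},
])
```

Write `csv_text` to a file with the extension `.csv` to open it in a spreadsheet. Values that contain commas are quoted automatically by the `csv` module.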

Here's a description of the things to consider putting in the inventory when you create a content survey:

High fidelity content survey

A high fidelity content survey deconstructs the fields that make up the bundle of each content type surveyed.

Example spreadsheet for a high fidelity content survey:

contentsurvey_hifi.png
Spreadsheet for a high fidelity content survey.

Content audit

A content audit is only done when some website already exists. The purpose is threefold:

  1. To have a complete inventory of all existing content, to make sure that all useful content is migrated.
  2. To prune out from migration content that is redundant, outdated or trivial (ROT-analysis).
  3. To refactor the migrated content, to improve the information architecture of the website.

Similar to a content survey, a content audit is conducted by inventorying content in a spreadsheet or database.

ROT-analysis

Identifying ROT is an essential part of a content audit. It helps spot obvious content problems. When creating the spreadsheet to identify page titles, links, content types, keywords and other facts about your content (nodes), add a column for ROT in your content audit spreadsheet or database.

ROT are, for instance:

Fix and prune ROT before migrating the website.

Source: MeetContent.com.
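Some ROT candidates can be flagged mechanically before a human reviews them. The sketch below fills a ROT column from three simple signals: a title that has been seen before (redundant), an old last-updated date (outdated) and a very low word count (trivial). These signals and the thresholds of 730 days and 50 words are assumptions made for this example, not established rules, and a flag is only a prompt to look closer.

```python
from datetime import date


def flag_rot(pages, today, max_age_days=730, min_words=50):
    """Return {url: [reasons]} with possible ROT reasons for each page."""
    seen_titles = set()
    flags = {}
    for page in pages:
        reasons = []
        title_key = page["title"].strip().lower()
        if title_key in seen_titles:
            reasons.append("redundant")
        seen_titles.add(title_key)
        if (today - page["last_updated"]).days > max_age_days:
            reasons.append("outdated")
        if page["word_count"] < min_words:
            reasons.append("trivial")
        flags[page["url"]] = reasons
    return flags


audit = flag_rot(
    [
        {"url": "/a", "title": "Opening hours",
         "last_updated": date(2022, 6, 1), "word_count": 300},
        {"url": "/b", "title": "opening hours ",
         "last_updated": date(2019, 1, 1), "word_count": 20},
    ],
    today=date(2023, 1, 15),
)
```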

Automatic tools

There exist automatic tools that will crawl a website and report the URL of every page that makes up the site. See, for example:

This is usually what you want to start with if you are conducting a content audit. Using an automatic crawler will ensure that you don't miss any pages. After the site has been crawled, transfer the result to a spreadsheet or database for analysis.

You may also use one of these tools to generate a list of URLs and use that as a starting point for the content survey, rather than embarking on a manual odyssey through the website. However, as you've already discovered, not all pages on a website are instances of content types. To translate the result of an automated crawl into a content survey, you must go through the pages found by the automated tool by hand, and identify those that contain instances of content types.
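To show what such a tool does in principle, here is a minimal breadth-first crawler sketch that uses only Python's standard library. It is not a replacement for a real crawler: it ignores robots.txt, redirects and rate limiting. The `fetch` argument is an assumption of this sketch. It is any function that returns the HTML for a URL (or `None` if the page cannot be retrieved), which also makes it possible to try the code on a small fake site without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch):
    """Breadth-first crawl that stays on the host of `start_url`."""
    host = urlparse(start_url).netloc
    seen, queue, found = {start_url}, deque([start_url]), []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        found.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return found


# A fake three-page site, used instead of real HTTP requests.
FAKE_SITE = {
    "https://example.org/": '<a href="/about">About</a> <a href="/news#top">News</a>',
    "https://example.org/about": '<a href="/">Home</a> <a href="https://other.example/">Elsewhere</a>',
    "https://example.org/news": "<p>No links here.</p>",
}
urls = crawl("https://example.org/", FAKE_SITE.get)
```

The resulting list of URLs is the raw material for the audit spreadsheet. For a content survey, it still has to be filtered by hand as described above.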

Final word

Surveying content is a human task. In fact, you will find that the process can often be as valuable as the final spreadsheet. If you invest the time in analyzing the website and deconstructing each page into the content inventory (or at least an inventory of a representative selection of pages – i.e. the “Noah's ark” approach), you will gain invaluable insight into how it all fits together. That's important knowledge to possess when designing, redesigning or upgrading a website.


Last update: 2023-01-15 [gh].