Feeds

by Gisle Hannemyr

The Feeds module is designed for bulk import and aggregation of nodes, users, taxonomy terms and database records. It provides both an API for developers and a UI for administrators. In this chapter, we shall only look at using the latter to bulk-import a csv-file.


Drupal projects discussed in this chapter: Feeds.

Introduction

This chapter is a tutorial explaining how to use Feeds to bulk import a csv-file.

Installing and enabling the module

The Feeds module depends on Chaos Tools and Job Scheduler, so you need to install and enable these as well, unless you have already done so. However, if you use drush to download and enable Feeds, these dependencies are downloaded and enabled automatically as part of the process.

Since we're going to use Feeds by means of the user interface, we must enable the Feeds Admin UI and Feeds Import modules as well.

The required sequence of commands is:

$ drush dl feeds -y
$ drush en feeds -y
$ drush en feeds_ui -y
$ drush en feeds_import -y
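
If you want to confirm that everything came up as expected, you can list the enabled modules and filter on “feeds” (the exact listing format depends on your drush version):

$ drush pm-list --status=enabled | grep -i feeds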

File format

A csv-file is a file with character-separated values: a plain text format used to hold tabular data (numbers and text).

There is no official specification for csv-files, but Feeds expects the data to conform to the syntax exported by MS Excel. You may specify the character set of the source in the parser settings (the default is UTF-8). In most environments, the default character set of a file exported from MS Excel is Windows-1252.
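
If you prefer to convert a Windows-1252 export to UTF-8 before importing, rather than changing the parser's character set setting, a standard tool such as iconv will do the conversion; the file names below are just placeholders:

$ iconv -f WINDOWS-1252 -t UTF-8 celebrities-1252.csv > celebrities-utf8.csv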

A csv-file is a plain text file with one record per line. Each record consists of a sequence of fields separated by a delimiter character, most commonly a comma or a semicolon. Any field containing the delimiter must be enclosed in double quotes ("). A double quote character inside a field must be represented by two double quote characters, and the field itself must be quoted. The first line usually holds the names of the fields.

Note: Using a non-conforming csv format is the main reason for missing or munged fields. To learn more about the csv-file format expected by Feeds, as well as a basic list of Drupal terms related to feeds with a brief definition of each, see the Feeds glossary.

Below is an example of a csv-file where the separation character is a comma. The first line tells us that each record consists of four fields named “uid”, “lastname”, “givennames”, and “profession”. The next four lines are the four records. Each record consists of a unique identifier, followed by the name of a celebrity and the person's profession or professions.

uid,lastname,givennames,profession
1,Bjørgen,Marit,skier
2,Clooney,George Timothy,"actor, director, producer"
3,Hall,"Jon ""maddog""",hacker
4,Nicklaus,Jack,golfer

When preparing the csv-file for import, you should edit the first line to use field names that can be found in the entity that is the target of the import. In this case, we designate the integer in the first field as the unique identifier (uid), and want the celebrity's last name to go in the title field and the profession to go in the body field.

Create the entity

To hold these fields, create a content type (node bundle) named “Celebrity”. The bundle consists of four fields corresponding to the four fields in the csv-file. In this example, the “uid” field will be handled by the special Feeds target “GUID (guid)” (described below), and the “title” and “body” fields exist by default. There is only one extra field: “givennames”. Create this extra field using the core fields feature.

feeds_bundle.png

Make sure the default settings for the content type make sense for imported content (for example, whether new nodes should be published, and whether they should be promoted to the front page).
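
If you prefer to script the setup of the extra field instead of using the Field UI, Drupal 7's core Field API can create it programmatically. The sketch below is only an illustration: it assumes the content type's machine name is “celebrity” and uses “field_givennames” as the field's machine name (adjust both to match your site). It could be run from a hook_install() implementation in a small custom module, or via drush php-eval.

// Create the field storage for the extra “givennames” field, unless it already exists.
if (!field_info_field('field_givennames')) {
  field_create_field(array(
    'field_name' => 'field_givennames',
    'type' => 'text',
    'cardinality' => 1,
  ));
}

// Attach an instance of the field to the “Celebrity” bundle.
// Assumption: the content type “Celebrity” has the machine name “celebrity”.
if (!field_info_instance('node', 'field_givennames', 'celebrity')) {
  field_create_instance(array(
    'field_name' => 'field_givennames',
    'entity_type' => 'node',
    'bundle' => 'celebrity',
    'label' => 'Given names',
    'widget' => array('type' => 'text_textfield'),
  ));
}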

Importing the csv-file

A single configuration for importing is called an importer. As many importers as desired can be created. An importer defines the following for a particular configuration: basic settings, a fetcher (how the source data is retrieved), a parser (how the raw data is parsed into records), a processor (what kind of entity the records are turned into), and the mappings from source fields to target fields.

To create an importer, navigate to Structure » Feed importers.

The Feeds Import module has already created two sample configurations: Node import and User import. These are good starting points for creating new configurations. Since we're going to import a feed into nodes, we clone the Node import configuration. Clicking Clone brings up a screen where we're prompted for a name and description. We're going to name it “Celebrities csv feed” and add the description: “Imports list of celebrities”. When done, click Create.

This brings up the following screen:

feeds01.png

We're now going to go through all the settings screens. First, click Settings for Basic settings and review the following form:

feeds02.png

Unless you have some very advanced use case and know what you are doing, set the importer to “Use standalone form” (i.e. do not attach the importer to a content type). You will specify the content type to feed into when you specify the settings for the node processor.

Attaching an importer to a content type means that you need to create a node to specify a source to import. This is useful if you have multiple sources to import, all with the same data structure, but it only creates confusion for simple feed imports.

After looking at the “Basic settings” form, proceed to the next step.

You may change the “Fetcher” to download from a URL rather than from an uploaded local file, but for this tutorial, we'll stick with the default (“File upload”).

You may also change the “File upload” settings, but for this tutorial, we'll stick with the default settings.

You may change the “Parser”, but for this tutorial, we'll stick with the default (“CSV parser”).

The settings for the CSV parser let you select the delimiter, whether there is a header line with field names, and the character set encoding of the file. In this example, the defaults are fine.

Three processors come with Feeds: node, taxonomy term and user. This example imports nodes, so use the “Node processor”.

In the settings for the node processor, we need to change the bundle to the one we created specifically for this import: “Celebrity”. The rest of the settings may be left at their defaults.

feeds03.png

The “Update existing nodes” setting determines how duplicates with the same unique target are treated. If it is set to “Do not update existing nodes”, the first instance with a given unique target is retained and later instances are ignored. The two other options will update the node with the last instance encountered. This setting applies both within a single import and across separate imports.
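
To illustrate the difference (this second file is a made-up example, not part of the tutorial's sample data): if “uid” is mapped to a unique target, re-importing a file like the one below, where uid 2 has gained an extra profession, will either be ignored or update the existing node for George Clooney, depending on this setting:

uid,lastname,givennames,profession
2,Clooney,George Timothy,"actor, director, producer, screenwriter"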

The mappings determine which field in the csv-file (the source) goes into which field in the bundle (the target). For example:

feeds04.png
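
In text form, the mapping used in this tutorial looks roughly like this (“unique target” on the first line is explained below):

uid        → GUID (guid)   [unique target]
lastname   → Title (title)
givennames → the extra “givennames” field
profession → Body (body)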

While the text on the screen says: “Make sure that at least one definition has a Unique target”, having a unique target is not really required. Without a unique target, all the records in the csv-file are imported into the database. A typical use case for allowing duplicates is to use Feeds for a raw import and then, once the data are inside the database, consolidate duplicates manually or algorithmically.

You need a unique target if you want to update items and/or run multiple imports, and you want Feeds to reject or update duplicates. If you don't specify a unique target, then when you run a second import you may get duplicate items, because Feeds will not check whether these items were already imported.

However, if you have a unique target, it will be used to eliminate duplicates. You designate one of your targets as unique by ticking a checkbox in the target configuration to request that it is processed as a unique target. You should have only one unique target.

Feeds provides a special target named “GUID (guid)”, but as far as I am able to tell, any other target works equally well.

If you use the predefined “GUID (guid)” target, it will not be visible as an ordinary field when you manage fields for the bundle.

Note: If you do not use the predefined “GUID (guid)” target, make sure it is deleted from the mapping. If it is left in the mapping but not populated, all records except the first will be treated as duplicates and ignored.

You can map a single source field to multiple targets. For example: if your items have a unique serial ID named “uid”, you can map uid both to the special “GUID (guid)” target and to a separate, visible field named “uid”.

After setting up the mappings, we're now ready to import the csv-file. To do so, for all types of entities, navigate to Content » Import. Alternatively, navigate to Structure » Feed importers and click on the link to the Import page.

Both routes end up on the same page, showing a list of importers, including the importer just created, “Celebrities csv feed”.

feeds05.png

Click on the link to the relevant importer to activate it.

This brings up the importer page. To import, first click the Browse button to select the csv-file to upload. Then click Import to upload and import it.

feeds06.png

This should result in the file being uploaded and nodes with its content being inserted into the database.

And again, the on-screen text “Column guid is mandatory …” is not correct; as explained above, a guid column is not required.

Final word

The Feeds module is the main workhorse in the Drupal universe for migrating content from some other source (e.g. a legacy WCMS) into Drupal. In addition to the csv-file format used in this chapter, it supports OPML and some XML formats. There are contributed modules that may be installed as plugins to provide parsers for even more formats.

In addition to being used for one-time imports, as described in this tutorial, Feeds can be set up to poll an external site and import content at regular intervals.
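
For completeness: the developer API mentioned in the introduction can also trigger an import without going through the UI. The fragment below is only a rough sketch of how this is commonly done with Feeds for Drupal 7; the importer id “celebrities_csv_feed”, the file path, and the exact behaviour of the FeedsSource methods used here are assumptions you should verify against the module's own documentation before relying on them.

// Load the standalone source for our importer (feed nid 0 = standalone form).
$source = feeds_source('celebrities_csv_feed', 0);

// Point the file fetcher at a csv-file that is already on the server.
// Assumption: public://import/celebrities.csv exists.
$source->addConfig(array(
  'FeedsFileFetcher' => array('source' => 'public://import/celebrities.csv'),
));
$source->save();

// Kick off the import (batched or background, depending on context).
$source->startImport();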


Last update: 2018-09-02 for D7 [gh].