Preventing spam

by Gisle Hannemyr

This chapter discusses user roles, spam-fighting strategies and Drupal modules in the context of protecting a Drupal website against bogus registrations and spam.

Note: This chapter contains links to Wikipedia. While Wikipedia is often helpful by providing definitions, explanations and other useful information in plain language, it is not an authoritative source. Content on Wikipedia may be misleading or erroneous. Use Wikipedia to read up on and get acquainted with an unfamiliar subject, but do not rely on it being factual. Citing Wikipedia in essays and other scholarly writings is deprecated.

Projects mentioned in this chapter that have at least a Drupal 9/10 dev-release: Akismet, Anonymous Publishing, CAPTCHA, Cloudflare, Honeypot, http:BL, Mother May I, reCAPTCHA, Registration Role, Spam Master, Spambot, Spamicide, Webform.

Projects mentioned in this chapter that have not yet been upgraded beyond Drupal 7 (checked 2023-09-07): Auto Assign Role, AntiSpam, BlogSpam, Block anonymous links, BOTCHA, Confident Captcha, Hidden CAPTCHA, Hashcash.

Introduction

Spam has become a huge problem for any site that allows users to publish content or fill in a form on the site. Spammers will try to submit on anything that has a submit button. Their favourite targets are guestbooks, blogs, forums, comments, contact forms and other webforms.

Human spammers exist, but they are fortunately few and far between. Most of the spam that appears on websites is created by so-called spam-bots (often abbreviated to just 'bots). Spam prevention usually boils down to identifying and blocking 'bots.

Tip: One of the most notorious 'bots is named XRumer (Wikipedia). It bypasses most types of CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), and is capable of using email to confirm account registration on most websites, including Hotmail and Gmail.

In the context of spam protection, users come in just two flavours: trusted and untrusted. In this particular context, trusted just means: Can be trusted to not create spam, so a trusted user will not be subject to any of the spam-fighting provisions described below. An untrusted user, however, may be subject to some provisions, whether that user is a 'bot or a real person.

In Drupal-speak anyone that visits your website without being logged in is known as the Anonymous user. The Anonymous user is always untrusted. A Drupal website also has at least one Administrator. The Administrator is always trusted. That leaves logged in users (Authenticated users in Drupal-speak) that are not Administrators. Such users may be either trusted or untrusted. Authenticated users may have additional roles (e.g. Content editor). See the next section for further discussion.

The final decisions about user roles, spam-fighting strategies and deployment of Drupal contributed modules to prevent spam should be taken by the site's Administrator, who by definition must be a trusted person. However, the Administrator may not be conversant with information architecture, website design and Drupal itself. Therefore, the site builder must analyse the task at hand (i.e. spam prevention) and lay out the available options and the consequences of certain design choices, so that the Administrator is in a position to make an informed decision.

User roles

Before working out the anti-spam strategy and deciding on what modules to use to implement that strategy, the first design decision is to identify the user roles that will be required, and what restrictions to put in place for all the untrusted user roles that exist.

Tip: On a typical website, there may be user roles with higher level access than trusted users, such as Administrators and Content editors. These must always be trusted, and they may have access to content creation forms that untrusted users do not have access to. When you create the form inventory for the website (described below), put down “No access” for forms that are reserved for these special high level roles.

In addition to the built-in Administrator, only three user roles – Anonymous user, Authenticated user and Content editor – are predefined in Drupal's Standard profile.

Technical: The Content editor role is a late arrival, and first appeared in Drupal version 9.3 in October 2021. See DO: Add new "Content Editor" role to Standard Profile for background. All users that register on the website automatically get the Authenticated user role. The role cannot be removed from the account, but blocking an Authenticated user prevents that user from logging in. The Content editor role must be granted by an administrator, and the role can be retracted without affecting the user's Authenticated user status.

Since the Anonymous user should always be considered untrusted, access to content creation forms on the site is contingent on the anonymous user passing a spam protection measure, such as completing email verification or successfully meeting a CAPTCHA challenge. The Administrator is always trusted (this cannot be changed), and will never be subject to any anti-spam provisions. This leaves the Authenticated user and Content editor, who may be either trusted or untrusted. As noted in the introduction, many community websites assume that these can be trusted. This means that they are given access to content creation without being subject to anti-spam measures.

  1. Administrator (Drupal built-in): Trusted.
  2. Content editor (Drupal built-in): Trusted.
  3. Authenticated user (Drupal built-in): Trusted.
  4. Anonymous user (Drupal built-in): Untrusted. Posting requires meeting a CAPTCHA challenge or completing email verification.

Any Authenticated user that misbehaves will have his or her account blocked, and will in the future only be allowed to use the site as an anonymous user.

To avoid blocking users that misbehave, many community websites reserve the Content editor role for trusted users and treat Authenticated users that do not also have this role as untrusted. These sites have the following four roles:

  1. Administrator: Trusted.
  2. Content editor: Trusted.
  3. Authenticated user: Untrusted. Typically subject to posting limits and/or delays due to moderation.
  4. Anonymous user: Untrusted. No access to forms other than account registration form.

In a typical set-up, authenticated users are automatically promoted to the “Content editor” role as soon as they register, but shall lose this role if they later are caught misbehaving. To implement this, the contributed projects Auto Assign Role (D7) or Registration Role (D9) are helpful.

An even more granular approach requires the site-builder to create more user roles, where untrusted users of various roles are subject to various types of restrictions. The example below is created with some inspiration from the website named Stack Overflow. It is built around seven user roles. For documentation purposes, all roles are recorded in a role inventory. The inventory also records the trust level, and the tasks and restrictions associated with each role. An example of a role inventory is shown below:

Role | Trust level | Tasks and policy
Administrator | Trusted | Provides general oversight and administration of site configuration, content and users.
Site moderator | Trusted | Monitors the spam flags queue, and evaluates flagged content. Blocks or demotes users whose transgressions merit it.
Content editor | Trusted | Has individualized CRUD access to content, but for editorial, not spam control purposes.
Community user | Trusted | Is expected to review content created by Authenticated users and flag spam content. Is also able to promote an Authenticated user to a Confirmed user if the user is seen to be well-behaved over some time.
Confirmed user | Trusted | Has access to some content creation forms. Not subject to automated spam filtering or other restrictions. May have the Confirmed user role revoked if caught spamming.
Authenticated user | Untrusted | Has access to some content creation forms, but is subject to automated spam filtering and other restrictions such as CAPTCHA. May also be subject to delays due to moderation. Will be instantly blocked if caught spamming.
Anonymous user | Untrusted | No access to content creation forms.

The CAPTCHA module and its plugins let the administrator declare a role as trusted by giving it the permission Skip CAPTCHA. It treats all roles that do not have this permission as untrusted.

Anti-spam modules that are not plugins for CAPTCHA may have other criteria for trust. For instance: Anonymous Publishing CL and Block anonymous links treat the anonymous user as untrusted, and trust everyone else.
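
The permission-based notion of trust described above can be sketched in a few lines of Python. This is only an illustrative model, not code from the CAPTCHA module: the `Role` class, the `is_trusted` function and the permission label are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical permission label, modelled on the "Skip CAPTCHA" permission.
SKIP_CAPTCHA = "skip CAPTCHA"


@dataclass
class Role:
    name: str
    permissions: set = field(default_factory=set)


def is_trusted(user_roles):
    """A user is trusted if at least one of their roles may skip the challenge."""
    return any(SKIP_CAPTCHA in role.permissions for role in user_roles)


anonymous = Role("Anonymous user")
authenticated = Role("Authenticated user")
content_editor = Role("Content editor", {SKIP_CAPTCHA})

print(is_trusted([anonymous]))                      # False
print(is_trusted([authenticated, content_editor]))  # True
```

Because trust follows from a permission rather than from a role name, demoting a user only requires removing the role that carries the permission.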

Spam-fighting strategies

Most community websites want their users to contribute by creating and publishing content. Usually, a community website tries to provide a positive UX by being welcoming and friendly, and by letting the user register and contribute with as little hassle as possible. But a community website also needs to be protected. An overly heavy-handed spam-fighting strategy will detract from the UX.

This section discusses specific spam-fighting strategies, listing upsides and downsides, as well as how they are suitable for different types of users.

In these discussions, some statements are followed by the four letters “YMMV” (Your Mileage May Vary). This means that while the preceding judgement is my opinion about the proposed solution, this opinion may not be shared by everyone. As always, please make your own judgement.

Content moderation

Content moderation is a strategy that makes use of the fact that content (i.e. nodes and comments) on a Drupal website is not visible to non-privileged users unless it is published. So to have content moderation on a Drupal site, you allow visitors and untrusted users to create content, but only trusted users (known as “moderators”) are granted the permission to publish it.

In Drupal, the comment entity comes with built-in settings for delaying publication until the comment has been vetted by a moderator. There is no similar setting for nodes, but it is easy to configure this by leveraging the built-in node permission system. How to do this is described below.

Drupal allows node moderation to be set up per content type. For each content type you want to moderate, navigate to Structure » Content types » type » edit » Publishing options and untick Published. This will make any new content of the type unpublished by default.

You may grant the anonymous user and/or authenticated users permission to Create new content for the content type. If you want to enable iterative editing for authenticated users, you may also grant them the permissions View own unpublished content and Edit own content. However, do not enable these for the anonymous user.

Grant the moderator role permission to Access the content overview page, Administer content, and to Bypass content access control. Note that these permissions (in particular the last one) have security implications, so only give the moderator role to people you trust. Out of the box, Bypass content access control is the only way to access other users' unpublished content by means of the Drupal core.

Also note that the admin content link doesn't show up by default, so you need to tell your moderators to use this path: /admin/content (or put a link to it on a page only visible to moderators).

If you want your authenticated users to be able to find their own unpublished content (e.g. to allow iterative editing of content before publication), you also need to create a View that lists unpublished content.
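
The visibility rules that this strategy relies on can be summarised as a small Python model. It is a simplified sketch for illustration, not Drupal's access code: the `Node` class and `can_view` function are hypothetical, the permission strings are plain labels, and the model assumes everyone may view published content.

```python
from dataclasses import dataclass

ANONYMOUS_UID = 0  # Drupal uses user ID 0 for the anonymous user


@dataclass
class Node:
    author_id: int
    published: bool


def can_view(node, user_id, permissions):
    """Simplified model of who can see a node under content moderation."""
    if node.published:
        return True  # assumption: everyone may view published content
    if "bypass content access control" in permissions:
        return True  # moderators see all unpublished content
    # Authors may see their own drafts only if granted the permission.
    return (
        user_id != ANONYMOUS_UID
        and user_id == node.author_id
        and "view own unpublished content" in permissions
    )


draft = Node(author_id=7, published=False)
print(can_view(draft, 7, {"view own unpublished content"}))   # True
print(can_view(draft, 8, {"view own unpublished content"}))   # False
print(can_view(draft, 3, {"bypass content access control"}))  # True
```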

Note: This strategy comes at a cost. Content posted on the website is not published until it has been reviewed by a moderator. If the moderator is a slacker, using this strategy may result in very long delays before content is published, creating a frustrating UX for legitimate users. Also, if the moderator does not clean out the unpublished spam on a regular basis, the backlog may fill up the database and in the worst case make the website slow and backups impossible.

My experience with the content moderation strategy is that it works well with untrusted authenticated users provided:

  1. Visitor registration is disabled or restricted (see below).
  2. Anonymous publishing is not allowed (i.e. only authenticated users can post).
  3. The site has diligent moderators, so that legitimate users do not experience significant delays to have their content published.

To disallow anonymous publishing, navigate to People » Permissions and set the permissions to create comments and content as shown below.

perms.png
Settings to disallow anonymous posting.

The setting shown for the Article content type should be repeated for all enabled content types.

These settings will of course not prevent an authenticated user from creating spam, but you can block the user and remove all associated content as soon as you spot the undesirable behaviour.

If you simply allow anonymous users to create content, the site will probably be flooded with unpublished spam, putting a huge burden on the moderators. To use content moderation to control spam from anonymous users, install the module Anonymous Publishing CL (see below). It has been designed specifically with this use case in mind.

Posting limit

One of the downsides of content moderation is that it just places contributions from untrusted users in a moderation queue. If the moderator is not diligent, they may remain there for an indefinite period of time.

Some websites therefore allow untrusted users to create and instantly publish a limited number of posts within some time frame (for example maximum three posts per day). This allows untrusted users to create some content on the site, while not opening the floodgates to unlimited amounts of spam.

For the anonymous user role, the count is usually tied to an IP address or a cookie. For authenticated users, the count is usually tied to the user ID.

Most sites that use limited postings to control spam lift the posting limit for authenticated users after some criterion (e.g. the user has contributed 10 legitimate postings) has been met.
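
A posting limit of this kind can be sketched as a sliding-window counter. The Python below is a minimal illustration, not code from any Drupal module: the `PostingLimiter` class, its parameter names and the key format (`uid:42` or `ip:203.0.113.7`) are assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class PostingLimiter:
    """Allow at most max_posts per window, lifted after enough legitimate posts."""

    def __init__(self, max_posts=3, window_seconds=86400, trusted_after=10):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self.trusted_after = trusted_after
        self._recent = defaultdict(deque)  # key -> timestamps of recent posts
        self._approved = defaultdict(int)  # key -> number of legitimate posts

    def mark_legitimate(self, key):
        """Record that a moderator judged one post from this key legitimate."""
        self._approved[key] += 1

    def allow(self, key, now=None):
        """Return True and record the post if the key is within its limit."""
        now = time.time() if now is None else now
        if self._approved[key] >= self.trusted_after:
            return True  # the limit has been lifted for this key
        recent = self._recent[key]
        # Forget posts that have fallen out of the time window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_posts:
            return False
        recent.append(now)
        return True


limiter = PostingLimiter()
for attempt in range(4):
    print(limiter.allow("uid:42", now=attempt))  # True, True, True, False
```

The key is what ties the count to a user ID for authenticated users, or to an IP address or cookie value for the anonymous user.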

For anonymous publishers, the Anonymous Publishing CL module lets the administrator set a posting limit. I am currently not aware of any production grade Drupal module that can do the same for authenticated users. Some development versions exist (use Google to search for “node limit site:drupal.org”), but none of them smell good. YMMV.

Content review

Unlike content moderation, content review does not restrict the ability of untrusted users to publish. However, moderators are expected to review content after it has been published, and take appropriate action (e.g. remove the content if it is spam and relegate the user if the spam was created by an Authenticated user).

Currently, this strategy must be carried out manually, by having the moderators monitor newly created content and take action if required. I am currently not aware of any production grade Drupal tool to manage the workflow for content review.

See also: The website stackoverflow.com has managed to successfully crowdsource content review. How it works is explained in a blogpost by Stack Exchange co-founder Jeff Atwood: A theory of moderation.

Disabling or restricting visitor registration

Many community websites trust all authenticated users. In order for this strategy to be successful, the website needs to disable or restrict visitor registration. The reason is obvious: If you trust authenticated users by default, spammers will first register at the website, and then create content. In other words, without registration restrictions in place, your site will be just as exposed as if there were no restrictions on anonymous publishing.

When creating a community website, you need to think carefully about the registration requirements, including who can register accounts and how accounts are verified.

Drupal lets you set who can register accounts to one of the following:

  • Administrators only
  • Visitors
  • Visitors, but administrator approval is required

Both of the last two settings for account registration may be set up with a requirement that the email address used when registering must be verified. This means that the user must be able to confirm that they have access to this email address by clicking on a link embedded in an email sent to that address. The concept has some merit as it ensures that the user is reachable on the email address she or he used for registration, and it also filters out a lot of (but not all) spam-bot registrations because most 'bots are not smart enough to complete the email verification step. However, it also makes it more difficult for humans to register: It imposes a delay and an extra step, and may even prevent the user from completing the registration process because spam filters at the user end sometimes hide the verification email from the user by placing it in the spam folder. As a site-builder, I believe that the inconveniences of email verification outweigh the benefits. I always allow visitors to register without any involvement from the administrator. YMMV.
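
To make the email verification step concrete, here is a generic Python sketch of a signed, expiring verification token of the kind that could be embedded in such a link. It is not how Drupal generates its links: the function names, the token layout and `SECRET_KEY` are all assumptions made for the example.

```python
import hashlib
import hmac
import time

# Hypothetical site secret; a real site would use a long random value.
SECRET_KEY = b"replace-with-a-long-random-site-secret"


def make_token(email, issued_at, key=SECRET_KEY):
    """Return 'timestamp.signature' binding the token to one email address."""
    message = f"{email}|{issued_at}".encode("utf-8")
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{issued_at}.{signature}"


def verify_token(email, token, max_age_seconds=86400, now=None, key=SECRET_KEY):
    """Check that the token was issued for this email and has not expired."""
    now = int(time.time()) if now is None else now
    try:
        issued_text, signature = token.split(".", 1)
        issued_at = int(issued_text)
    except ValueError:
        return False  # malformed token
    if issued_at > now or now - issued_at > max_age_seconds:
        return False  # issued in the future, or expired
    expected = make_token(email, issued_at, key).split(".", 1)[1]
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


token = make_token("visitor@example.org", issued_at=1000)
print(verify_token("visitor@example.org", token, now=2000))  # True
print(verify_token("someone@example.org", token, now=2000))  # False
```

A 'bot that never reads the mailbox never sees the token, which is why this step filters out many automated registrations.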

To configure this, navigate to: Configuration » Account settings. The settings in the Registration and cancellation panel determine who can register accounts. The settings below allow visitors to register, and get access to the newly created account as soon as they verify that the email address they entered when registering belongs to them.

regsettings.png
Allowing self-service registration.

Skipping the step where an administrator has to create or approve the account and not requiring email verification speeds up the registration process and lessens the burden for the administrator, but it also increases the probability of spam-bots and human spammers registering on the site. This means that you must protect the Drupal registration form with some anti-spam module. There are two use cases:

  1. It is a community website where registration is by invitation only. Solution: Protect the user registration form with Mother May I and distribute the secret word required to register along with the invitation.
  2. It is an open website where members of the public are allowed to register without any prior invitation. Solution: Use some sort of CAPTCHA to protect the user registration form.

Anti-spam modules

A lot of anti-spam modules exist for Drupal. However, in 2023, it looks as if most of them are getting long in the tooth. As far as I am able to tell, they have not been updated to match the more evolved spam-bots, making them not very effective spam-guards. For instance, the “distorted letters” CAPTCHA displayed on the Voight-Kampff Couples Testing screenshot shown in the xkcd cartoon below was cracked long ago by XRumer and other state-of-the-art spam-bots.

I have not tested every anti-spam module that exists, but those that I have tested (with a few exceptions) have left me less than impressed. So I've ended up relying on just three anti-spam modules that I've found work well enough for all my use cases on different websites. YMMV. Skip ahead to the final section in this chapter to learn what those three modules are.

See also: Others may have a more generous view of these anti-spam modules than I do. For different opinions, visit these web pages: Acquia: CAPTCHA-solutions-and-alternatives, BeFused: Eliminate comment spam and BeFused: Alternatives to Mollom.

User registration form protection

If you give visitors access to the user registration form, you'll probably receive a lot of registrations. The vast majority of those registrations will be from spam-bots that want to register in order to log in and create spam on your website.

Even if you've set up the site to allow visitors to register only with administrator approval before an account is enabled, processing a lot of account applications – most of them bogus – will create a lot of work for the administrator.

Most of the spam-protection modules for Drupal allow protection to be put in place for any form, so they can also be used to protect the user registration form. However, there exists a module designed specifically to protect the user registration form. It is named Mother May I. This module only permits an account to be created if the registrant enters a string, known as a “secret word” as part of the registration process.

The use cases for Mother May I are invitation-only websites, and websites for very specific communities. The “secret word” can then be disclosed in the invitation, or a hint may be placed in the registration form that will be meaningful to community members but not to others. For instance, a website for a social club can have the name of the street where the club is located as the “secret word” with this hint visible in the registration form: “Type the name of the street where our club is located”. Even if the address can be found by looking it up on the About us or similar page on the club's website, no current spam-bot is capable of connecting the dots like this.
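
The “secret word” idea can be illustrated with a short Python sketch. This is not the Mother May I implementation: the function names are hypothetical, and the choice to ignore case and extra whitespace is an assumption made so that honest typing differences do not lock out legitimate members.

```python
import hmac
import unicodedata


def normalize(answer):
    """Fold case and collapse whitespace so 'Elm  Street ' equals 'elm street'."""
    text = unicodedata.normalize("NFKC", answer)
    return " ".join(text.casefold().split())


def secret_word_matches(submitted, secret):
    """Compare the registrant's answer with the configured secret word."""
    return hmac.compare_digest(
        normalize(submitted).encode("utf-8"),
        normalize(secret).encode("utf-8"),
    )


print(secret_word_matches("  Elm   Street ", "elm street"))  # True
print(secret_word_matches("Oak Street", "elm street"))       # False
```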

I tend to use this module to protect the user registration form on every website where the use case allows it. YMMV.

Protecting forms using CAPTCHA challenges

Most spam comes from so-called spam-bots, and the most popular approach for preventing spam from these is to use a CAPTCHA challenge, such as words written using distorted letters as is shown in the last frame in this xkcd cartoon:

suspicion.png
Source: xkcd - Suspicion, used under CC BY-NC 2.5.

Some of the modules to provide CAPTCHA challenges are described below.

The most popular Drupal module for setting up challenges is named CAPTCHA. It comes with two built-in challenges, but is really a framework that can be used to manage other challenges. See the project page for a list of these.

I've evaluated the following challenges:

Of these, the only one I believe is effective is reCAPTCHA. YMMV. On a website I've set up to test anti-spam modules, no spam-bot was able to post anything during the five week test period. The two others let respectively 113 and 242 spam messages through in the same period. Unfortunately, the fingerprinting techniques employed by reCAPTCHA are very problematic in terms of privacy.

See also: Jeff Geerling, the author and owner of Honeypot (see the next section), does not like CAPTCHA challenges – including reCAPTCHA – and thinks CAPTCHA challenges should only be used “the very, very rare times when they're necessary”. According to Kevin Davis: You (probably) don’t need ReCAPTCHA. Before using reCAPTCHA, you may also want to look at this issue in the Drupal GDPR Compliance team issue thread.

The Drupal reCAPTCHA bridge module is compatible with version 2 of an external third-party service, also named reCAPTCHA, that is provided by Google. You need a Google API key to use the service, but the service is free.

Hidden CAPTCHAs

A second set of modules use a (usually hidden) field that spam-bots tend to fill out, but humans leave empty. These are less intrusive than CAPTCHA challenges, so they will provide a better UX. Projects using this approach include:

Modules using third party services

Some anti-spam modules provide an interface to an external third-party service. The most sophisticated of these implement proprietary filtering and use data from all sites monitored to identify spam. Others are just externally hosted CAPTCHAs.

These are (third-party service in parentheses):

Some of these require you to pay a subscription fee for the service.

Miscellaneous modules

Finally, I describe some Drupal modules that use miscellaneous techniques to identify and block spam. They are:

Of these, I've only used Anonymous Publishing CL. It will allow visitors to publish content without first registering on the site, provided they verify their email address. Spammers seldom provide a working email address, and even if the email address is real, they do not bother to verify. This means it works well for keeping spam at bay. However, some administrators may consider the email verification requirement too heavy-handed, since it will deter some users from posting. YMMV.

Tip: Disclosure: I am the owner of the Anonymous Publishing project and Hannemyr Nye Medier AS sponsors the project.

Elsewhere in this ebook, there is a case study that explains how to use Anonymous Publishing CL with a max post limit to create a spam-resistant content type for guestbooks and similar use cases.

Note: Anonymous Publishing CL can only protect nodes and comments. If you need to protect anything else (e.g. a webform) you need to use a more general form protection module.

Webform protection

A common use case for Webform is to create registration forms, booking forms, surveys and other forms where anonymous users are invited to fill in and submit the form. I recommend using Honeypot to protect webforms. YMMV.

If Honeypot is enabled along with Webform, you can navigate to the Honeypot configuration page and turn it on for all webform nodes, or for webform comments.

Tip: In the default configuration, Honeypot will not be visible to humans. To test that it actually is there, set the “Honeypot time limit” to more than 5 seconds (e.g. 60 seconds). This will “trap” humans, so you can verify that it is there and working. Make sure you set the limit short enough not to “trap” human visitors to your site before moving to production.
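
The two checks a honeypot-style guard combines, a hidden trap field and a minimum time before submission, can be sketched as follows. This Python is a generic illustration rather than the Honeypot module's code: the function name, the trap field name `url` and the parameter names are assumptions.

```python
def looks_like_bot(form_data, rendered_at, submitted_at,
                   trap_field="url", time_limit=5.0):
    """Flag a submission that filled the hidden field or arrived too quickly.

    form_data    -- dict of submitted field names and values
    rendered_at  -- time (seconds) when the form was shown to the visitor
    submitted_at -- time (seconds) when the form came back
    """
    # Humans never see the trap field, so any value in it is suspicious.
    if (form_data.get(trap_field) or "").strip():
        return True
    # 'Bots tend to submit much faster than a human can type.
    return (submitted_at - rendered_at) < time_limit


print(looks_like_bot({"name": "Ann", "url": "http://spam.example"}, 0, 30))  # True
print(looks_like_bot({"name": "Ann", "url": ""}, 0, 2))                       # True
print(looks_like_bot({"name": "Ann", "url": ""}, 0, 30))                      # False
```

Raising `time_limit` to 60 in this model shows why the test setting traps humans too: a genuine submission after 30 seconds would be flagged.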

Tip: Using a CAPTCHA to protect webforms is not recommended. However, if you want to do this, you'll discover that form identifiers for webforms do not by default appear on the configuration form for the CAPTCHA module. To add one, first determine the form id of the webform you want to protect. It will be the node id of the webform instance, prefixed with “webform_client_form_”. If the node id is “42”, the webform form id will be “webform_client_form_42”. Now, navigate to Configuration » People » Captcha and insert the webform form id in the blank field at the bottom of the column headed “FORM_ID”. Then, in the column headed “CHALLENGE TYPE”, make a selection from the pulldown menu. Finally, scroll to the bottom of the page and press “Save configuration”.
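
The naming rule for webform form ids described in the tip above is simple enough to express as a one-line helper. The Python function below is a hypothetical illustration of that rule only; it does not talk to Drupal.

```python
def webform_form_id(node_id):
    """Build the form id for a webform from the node id of its instance."""
    node_id = int(node_id)
    if node_id <= 0:
        raise ValueError("node id must be a positive integer")
    return f"webform_client_form_{node_id}"


print(webform_form_id(42))  # webform_client_form_42
```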

Final word

To sum up, creating spam prevention measures while also making sure that the UX for registration and content creation is acceptable is not an easy task. Below is a very specific checklist to assist the administrator and site-builder in completing this task, based upon my belief in what are the best practices and best tools for the task. YMMV.

  1. Decide on what roles are required.

  2. Carefully consider the need for having a self-service registration form on the website. If there is no such need, just disable the user registration form and instruct the administrator to only register trusted users.
  3. If a self-service registration form is a requirement, make sure it is protected with a suitable anti-spam module. Depending on the use case (see below), use Mother May I or Honeypot.
  4. Make an inventory of all other content creation forms that exist on the website. The most common forms on a Drupal website are:
    • The user registration form.
    • The user login form.
    • The user password recovery form.
    • Node creation forms.
    • Comment creation forms
    • Webforms
    An example of the form I use is shown below. The column with the heading “Protection legacy” is left blank because it is only used when re-designing the spam-protection of a legacy website.
form_inventory.png
Example of a form inventory for a new project.
  5. Go through this inventory and decide on a strategy for every form. In the example form shown above, the following is planned. YMMV:
    • Configure Honeypot as the default spam protection method.
    • The user login block and user login form do not need additional protection, as they are already protected by a strong user password or passphrase.
    • The user password recovery form needs some protection. It is by definition available to the anonymous user and is routinely attacked by spam-bots. It is annoying for the user to receive emails with bogus password reset links, so it should be protected. In the example, the “Default” spam-guard (i.e. “Honeypot”) is selected.
    • The user registration form requires strong protection. Use Mother May I or Honeypot, depending on the use case. In the example, “Mother May I” is selected.
    • Node creation forms exposed to untrusted users should be protected. The site administrator should select what spam-guard to use for each form. The choices are Anonymous Publishing CL (a bit heavy handed, but powerful) and Honeypot (more lightweight, but may let more spam slip through). However, in the example, not even trusted users are allowed to create nodes. The planned protection for all the applicable node creation forms is set to “No access”. This means that only the site's administrators and content editors will be allowed to create content nodes.
    • Comment creation forms exposed to untrusted users should be protected, using the same strategy as for node creation forms. In the example, untrusted users are allowed to post reviews in the form of comments about content posted on the website for review. The review form is protected by the “Default” spam-guard (i.e. “Honeypot”).
  6. Implement it.
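
A form inventory like the one in the checklist can also be kept as plain data, which makes it easy to see which anti-spam modules the plan requires. The Python below is a hypothetical sketch: the `FormEntry` fields and the `protection_planned` column name are assumptions (only “Protection legacy” is named in the text), and the entries mirror the example decisions in the checklist.

```python
from dataclasses import dataclass

DEFAULT_GUARD = "Honeypot"  # the "Default" spam-guard in the example


@dataclass(frozen=True)
class FormEntry:
    form: str
    protection_legacy: str   # blank for a new project
    protection_planned: str  # hypothetical column name


INVENTORY = [
    FormEntry("User login form", "", "None"),
    FormEntry("User password recovery form", "", "Default"),
    FormEntry("User registration form", "", "Mother May I"),
    FormEntry("Node creation forms", "", "No access"),
    FormEntry("Comment creation forms (reviews)", "", "Default"),
]


def resolve(entry):
    """Translate 'Default' into the site-wide default spam-guard."""
    if entry.protection_planned == "Default":
        return DEFAULT_GUARD
    return entry.protection_planned


def modules_to_install(inventory):
    """List the anti-spam modules the planned protections require."""
    guards = {resolve(entry) for entry in inventory}
    return sorted(guards - {"None", "No access"})


print(modules_to_install(INVENTORY))  # ['Honeypot', 'Mother May I']
```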

On a production website, I recommend installing only these three anti-spam modules:

As for other contributed anti-spam modules that are in use on a legacy website, they should be replaced by one of the above three, or removed if the form they are set up to protect is not exposed to untrusted users.


Last update: 2023-09-07 [gh].