Solving Content Profile Privacy Problems on Drupal 6
On Drupal 6 it was difficult to add fields to user profiles – the built-in Profile module offered only a minimal set of field types, and normal CCK fields could not be used. The most common workaround was the Content Profile module, which allowed an actual content type to house the user profile fields that were needed.
The Content Profile module provided a way for normal CCK fields to be shown during registration, which would be visible on the user's normal profile page (user/123) and were then available for other uses throughout the site. Furthermore, Content Profile allowed multiple content types to be used for different types of profiles, so the site could have one set of profile fields for regular visitors, other fields for staff, another for the company’s board members, etc. That said, most sites just use the fields for simple things like first & last name, shipping addresses, phone numbers, etc.
There is a catch, though: all of the site’s personal user profile data, every field added to the user profile content type(s), is publicly visible by default.
Furthermore, if any of the site's users live in the USA and are under 13, the site may also be violating the Children's Online Privacy Protection Act (COPPA).
I’ll let that soak in for a minute.
Content Profile creates a new node for each profile type that someone enters, and by default all nodes are both publicly visible and indexable by search engines. Furthermore, Drupal's default "/node" page is always available and automatically lists all content that has the "Promoted to front page" option set. Because this option defaults to 'on' for all content types, all user profile nodes will be listed on the “/node” page, allowing the content to be indexed by any search engine that happens to find it. Even worse, the same content will be, by default, published to the site’s default RSS feed (rss.xml), so any third-party sites and services pulling in that feed will automatically pull in the private information.
Putting all of this together, unless steps have been taken to resolve the above problem, the default configuration for Drupal and Content Profile leaves all user profile data public and visible to anyone. For sites that do *not* want user data publicly visible, this could become an absolute disaster from a content privacy perspective and potentially open the site up to serious legal issues.
Thankfully there are steps that can be taken to solve the ongoing problem.
The vast majority of Drupal sites use a custom homepage, so the default /node page is no longer needed and just adds confusion for search engines, which shouldn’t be indexing content this way. To stop the /node page being indexed, and indeed being viewed by anyone, add a redirect rule to the site’s .htaccess file so that the /node page redirects back to the main homepage, like this:
    # Redirect the default node page to the homepage.
    RewriteRule ^node$ /? [L,R=301]
    RewriteRule ^node/$ /? [L,R=301]
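To see which request paths the two rules catch, the patterns can be tried outside Apache. The Python sketch below is only an illustration: it assumes the .htaccess file sits in the site's document root, where mod_rewrite matches against the path without its leading slash, and the function name is hypothetical.

```python
import re

# The two patterns from the .htaccess rules above.
HOMEPAGE_REDIRECT_PATTERNS = [re.compile(r"^node$"), re.compile(r"^node/$")]


def is_redirected_to_home(request_path):
    """Return True if either rule would match this request path.

    Assumption: the rules live in the document root's .htaccess, so the
    leading slash is removed before the pattern is applied.
    """
    if request_path.startswith("/"):
        relative = request_path[1:]
    else:
        relative = request_path
    return any(p.search(relative) for p in HOMEPAGE_REDIRECT_PATTERNS)


# Try a few sample paths to see which ones the rules would catch.
for path in ["/node", "/node/", "/node/123", "/nodes"]:
    print(path, is_redirected_to_home(path))
```

Only the bare listing page is caught. Individual node pages such as /node/123 are left alone, so the profile nodes themselves still need to be handled separately.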
The next step is to block access to the profile node pages themselves. The profile nodes are, by default, included as part of each user’s normal user profile page at /user/123, so the separate profile node page is rarely, if ever, needed. Most of our sites use Panels to display nodes, so the simplest solution for us is to add a new page display (variant) for the Profile content type that redirects from the node page to the user’s profile page. To add this, follow these steps:
- Go to the Pages admin page (admin/build/pages).
- Click ‘Edit’ for the Node template / node_view page.
- Click ‘Add variant’ to add a new variant for the Profile content type.
- On the ‘Add variant’ page, fill in “Profile” as the title, choose “HTTP response code” as the variant type, click the “Selection rules” checkbox and click “Create variant”.
- On the ‘Selection rules’ page, change the selector that says “Context exists” to say “Node: type” and click “Add”.
- In the “Add criteria” popup select the Profile content type for the “Node being viewed”, then click ‘Save’.
- Click ‘Continue’.
- On the next page change the ‘Response code’ to “301 Redirect” and then fill in “user/%node:uid” as the ‘Redirection destination’.
- Click the ‘Create variant’ button.
- Click the ‘Save’ button to finish the configuration.
Now anytime someone visits a user profile node they’ll be redirected to the profile owner’s user page instead. If the site still shows the Content Profile node, another node_view variant is probably being triggered before the new one; these additional steps should resolve it:
- Go back to the ‘Edit’ page for the Node template configuration in the Pages admin.
- Click the ‘Reorder variants’ link near the top of the page.
- Drag the new Profile variant to the top of the list.
- Click the ‘Update and save’ button.
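Once the variant is saved and ordered correctly, the redirect can be spot-checked from outside the site. The Python sketch below is one possible way to do that and is not part of Drupal or Panels: both function names are hypothetical, and it assumes user pages are reachable at the unaliased path user/<uid> (a site with URL aliases on user pages would need to compare against the alias instead).

```python
import http.client
from urllib.parse import urlsplit


def fetch_status_and_location(url):
    """Request a URL without following redirects.

    Returns the HTTP status code and the Location header (or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirects_to_owner(status, location, uid):
    """Check that a response is a 301 pointing at user/<uid>."""
    if status != 301 or not location:
        return False
    path = urlsplit(location).path.rstrip("/")
    return path == f"user/{uid}" or path.endswith(f"/user/{uid}")


# Example (hypothetical URL and uid; needs network access, so left commented):
# status, location = fetch_status_and_location("http://example.com/node/123")
# print(redirects_to_owner(status, location, 42))
```

Keeping the check separate from the fetch means it can be run against responses collected by any other tool as well.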
Once the profile node pages are inaccessible, it’s also worth adjusting the display settings for other view modes to further hide details. To prevent the profile data being visible via the default RSS feed (rss.xml), change the display settings (admin/content/node-type/profile/display/rss) so that each of the fields is excluded from output. Please note that this only hides the field details, not the nodes themselves - if a view is not already used to override the built-in RSS feed then one needs to be created with the path “rss.xml” that specifically filters out the Profile content type. Also change the ‘teaser’ field settings (admin/content/node-type/profile/display), just in case the content is being listed somewhere else on the site.
One of Drupal’s most useful features is that it provides a quite functional search engine out of the box. By default, however, all content is indexed and accessible via the search results page. Thankfully there’s a contributed module to help: the Search Config module allows full control over which content types are indexed, thus completely removing any chance that the profiles show up via searching. Furthermore, sites that use ApacheSolr or other alternative search engines should also be able to customize their module settings to block the Profile content type.
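To confirm that profile nodes really are gone from the feed, a saved copy of rss.xml can be compared against the known profile URLs. The Python sketch below is a rough illustration under a few assumptions: the feed is standard RSS 2.0 with one link element per item, the function name and sample data are made up, and the list of profile URLs has to be supplied by whoever runs it.

```python
import xml.etree.ElementTree as ET


def leaked_feed_links(rss_xml, profile_urls):
    """Return the item links in an RSS document that point at profile nodes.

    rss_xml is the feed content as a string; profile_urls is an iterable
    of absolute profile node URLs. Trailing slashes are ignored.
    """
    known = {url.rstrip("/") for url in profile_urls}
    root = ET.fromstring(rss_xml)
    leaked = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link.rstrip("/") in known:
            leaked.append(link)
    return leaked


# Hypothetical sample feed with one profile node and one normal page.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><link>http://example.com/content/joe-bloggs-profile</link></item>
  <item><link>http://example.com/news/site-launch</link></item>
</channel></rss>"""

print(leaked_feed_links(SAMPLE_FEED, ["http://example.com/content/joe-bloggs-profile"]))
```

An empty result for the live feed suggests the filtering view is doing its job. Any links returned are profile nodes that are still being syndicated.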
There are a few other items that are worth looking into. One item to examine is the XMLSitemap module, which is used to inform search engines that there is new content to be indexed on the site. While it actually needs to be enabled for each content type and doesn’t automatically publish any content, it is possible that someone working on the site enabled the option for the Profile content type, so double-check the settings on the admin/content/node-type/profile/edit page.
Lastly, take a look through the custom Views that have been added, along with custom pages built from hook_menu in custom modules, and make sure all content lists specifically do not include the Profile content type – after all of your work to secure the site you don’t want a single errant query ruining your day.
If a site has used the Content Profile module for any amount of time then it is highly likely that search engines have crawled the site and already indexed private content - it’s time for damage control.
First off, to make other steps easier, compile a list (spreadsheets are good) of all profile node URLs that would have been publicly available. This should be quite simple to do: use Views to build a table display of all content filtered by the “profile” content type and display the node’s URL – make sure that the list includes the absolute URLs, i.e. “http://example.com/content/joe-bloggs-profile” rather than just “node/123”. Once that list has been compiled, add to the list the standard “/node”, “/rss.xml” and any other one-off pages that might have been available and displaying content lists. In addition to this list of URLs, it may help to compile a list of all of the profile fields which had been visible and examples of the types of data that would have been exposed. Putting this information together will help ensure that the true extent of the problem can be ascertained.
Once a list of the problem data has been compiled, seek legal counsel. Most countries, especially in Europe, have laws and regulations regarding private data, so speaking with professional counsel will be the best approach to working out what needs to be done to keep everything above-board.
Lastly, it is really important to take steps to remove the private content pages from search engine indexes. Google, Bing and other search engines have ways that allow webmasters to request that pages be purged, so a quick search on each of the major search engines will help identify the proper procedures for each one. Once the main search engines are handled it is then important to do some searches for some sample user profile data to see if it may have been scraped and published elsewhere, e.g. RSS feed aggregator sites, etc; if anything is found contact the sites’ owners to have the content removed. This part of the cleanup effort will take time and may take several iterations to ensure all of the private content has been purged, so make this a priority.
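If the Views export only yields relative paths, the absolute URL list can be assembled with a few lines of scripting. The Python sketch below is a minimal example and not a Drupal tool: the function name is hypothetical, and the default extra pages are simply the “/node” and “/rss.xml” pages mentioned above.

```python
def build_audit_urls(base_url, profile_paths, extra_paths=("node", "rss.xml")):
    """Combine a site base URL with relative paths into absolute URLs.

    Profile paths come first, then the extra listing pages. Duplicates
    are dropped while keeping the original order.
    """
    base = base_url.rstrip("/") + "/"
    urls = []
    seen = set()
    for path in list(profile_paths) + list(extra_paths):
        url = base + path.strip().lstrip("/")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical paths, as they might come out of a Views table export.
for url in build_audit_urls(
    "http://example.com",
    ["content/joe-bloggs-profile", "/node/123"],
):
    print(url)
```

The resulting list can be pasted straight into the spreadsheet, with further one-off pages passed in through extra_paths.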
But what about Drupal 7?
Thankfully in Drupal 7 none of this is necessary. Drupal now allows for regular fields to be directly added to the user object rather than doing weird hacks like using public nodes, so the user’s profile data is never automatically exposed. Put another way, thanks to how Drupal 7 works it would take a good deal of work to replicate the same level of data “sharing” as is automatically available on Drupal 6 with Content Profile.
For any D6 sites looking to upgrade, there’s a set of Drush commands available via the CP2P2: Content Profile Converter module. These commands provide options to migrate user profile nodes from the Content Profile module into either standard user fields or the more powerful Profile2 system if needed – either one will suffice.
While having a great website can be an extremely rewarding experience for both companies and visitors alike, it is always extremely important to pay attention to the setup of visitors’ private data to ensure there are no data “leaks” – a pound of prevention could save your business.