Drupal at the Pulitzer Prizes

Last modified: October 17, 2008 - 08:24

The Pulitzer Prizes, awarded annually since 1917, are regarded as among the highest honors in U.S. print journalism, literary achievement, and musical composition. The Openflows Community Technology Lab has recently finished re-implementing their 15-year-old website in Drupal on a Linux, Apache, and MySQL platform. The move to free software makes their existing data more accessible and allows for continued expansion of the website's features and content.


Diagram of site structure (PNG)

The organization's site was first built in 1993. Two years later it moved to its own domain name and was updated to a system using custom Perl scripts and a Sybase database. This hand-rolled solution, while leading edge when it was built, provided little support for special cases or metadata, and contained a large amount of archived prize-winning content stored as flat HTML. Over the years, this HTML had been authored with many different tools, based on templates that followed the best practices of their time but could not easily be updated to reflect evolving standards.

The new Drupal system, which has just launched, stores and presents all this content in a consistent and intuitive format for both editors and end users. Project specifications were largely documented via HTML mockups rather than a long functional specification document. We also handled the time-consuming data conversion from the existing database and HTML files via a series of Perl scripts, which helped give us a thorough understanding of the site requirements.

Basic Site Architecture

We decided to base the site on Views and CCK, which meant using Drupal 5 with the 1.x branches of these modules. Despite all the great new features of the 2.x branches under Drupal 6, they were not yet ready for professional production sites when we began development.

Diagram of page structure (PNG)

Content is entered using one of eighteen separate node types, containing about forty CCK fields between them, and tagged using six vocabularies. Virtually all content is presented via one of twenty-two Views.

So that editors can see unpublished draft content in place, the Views do not filter on node published status. Instead, the Views templates check the roles of the current user to determine what to display. This required adding a new Views field for a node's published status.
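The idea of deciding visibility at display time rather than in the query can be sketched as follows. This is a minimal illustration in Python, not the site's PHP template code; the role names and the `Row` structure are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical role names; the article does not list the site's actual roles.
EDITOR_ROLES = {"editor", "administrator"}


@dataclass
class Row:
    title: str
    published: bool  # the extra "published status" field exposed to the view


def visible_rows(rows, user_roles):
    """Return the rows the current user may see.

    The query itself does not filter on status; the decision is made
    at display time from the user's roles.
    """
    can_see_drafts = bool(EDITOR_ROLES & set(user_roles))
    return [row for row in rows if row.published or can_see_drafts]


rows = [Row("Published citation", True), Row("Draft biography", False)]
anonymous_view = [r.title for r in visible_rows(rows, ["anonymous user"])]
editor_view = [r.title for r in visible_rows(rows, ["authenticated user", "editor"])]
```

Because the draft rows are present in the result set, an editor sees them in exactly the position they will occupy once published.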

Since the Panels module didn't give us enough flexibility, we created some complex pages by loading up to six views directly from a PHP node, passing View arguments parsed from the URL. This can be seen at the landing pages for each year, as shown in the attached diagram.
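As a rough sketch of this pattern, the Python below parses arguments from a URL path and hands the same arguments to several views in turn. The view names, the path layout, and the `render_view` stand-in are all hypothetical; on the real site this logic lives in PHP inside a node body.

```python
# Hypothetical view names; the article does not name the views on the year pages.
YEAR_PAGE_VIEWS = ["winners_by_year", "finalists_by_year", "jurors_by_year"]


def parse_view_args(path):
    """Split a path such as '/year/1995' into a list of view arguments."""
    parts = [part for part in path.strip("/").split("/") if part]
    return parts[1:]  # drop the leading page identifier, keep the arguments


def render_view(name, args):
    """Stand-in for embedding a view; returns a placeholder HTML fragment."""
    return f"<div class='view-{name}'>{name}({', '.join(args)})</div>"


def build_page(path):
    """Render every view for the page, passing each the same URL arguments."""
    args = parse_view_args(path)
    return "\n".join(render_view(name, args) for name in YEAR_PAGE_VIEWS)


page = build_page("/year/1995")
```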

We also had to filter Views by taxonomy terms from multiple vocabularies, a problem that has been discussed widely in the Views issue queue. Doing this requires custom argument handling code. Not surprisingly, every view has a custom template; after the PHP nodes and argument handling code, most of the heavy lifting is done there.
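The filtering logic itself is simple to state: a node matches only if it carries the requested term in every requested vocabulary. The Python sketch below shows that rule on in-memory data. The vocabulary names and sample nodes are invented for illustration, and it makes no attempt to reproduce the Views argument handling API.

```python
def matches(node_terms, required):
    """True if the node carries the required term in every listed vocabulary."""
    return all(
        term in node_terms.get(vocabulary, set())
        for vocabulary, term in required.items()
    )


def filter_nodes(nodes, required):
    """Keep only the nodes that satisfy every vocabulary constraint."""
    return [node for node in nodes if matches(node["terms"], required)]


# Hypothetical sample data: two vocabularies, "category" and "year".
nodes = [
    {"title": "A", "terms": {"category": {"Fiction"}, "year": {"1995"}}},
    {"title": "B", "terms": {"category": {"Fiction"}, "year": {"1996"}}},
    {"title": "C", "terms": {"category": {"Drama"}, "year": {"1995"}}},
]

fiction_1995 = [n["title"] for n in filter_nodes(nodes, {"category": "Fiction", "year": "1995"})]
all_1995 = [n["title"] for n in filter_nodes(nodes, {"year": "1995"})]
```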

Notable Modules

As with most Drupal sites, we relied on a large number of contributed modules to create the final site. In particular, we would like to thank the Drupal community for the following, listed in no particular order:

A special mention is due to the Faceted Search module by David Lesieur. This powerful tool is ideal for searching the Prizes' extensive archives, and David graciously helped us solve several search-related issues during development. Faceted searching lets users refine their searches by taxonomy terms and content types in addition to keywords, thus taking advantage of the structured nature of our dataset.

Example of faceted search results (PNG)

We use Views to present search results via a custom template. Since it is not possible to sort simultaneously on terms from multiple taxonomies, we elected to sort this View by node type and creation date. We used the 'authored on' value to represent the date the content was first created, as opposed to the date on which it was entered into the CMS. Since our archives go back before the Unix timestamp epoch, we had to patch node.module to allow dates before 01 Jan 1970 (this patch has been submitted for Drupal 7). The problem was simplified by the fact that the Pulitzer Prizes were founded after 13 Dec 1901, the earliest date a signed 32-bit timestamp can represent.
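Dates before the epoch are simply negative timestamps, and the practical question is whether they still fit in a signed 32-bit integer. The Python sketch below computes the earliest representable moment and checks that a date in the Prizes' first year fits. It illustrates the arithmetic only and is not the node.module patch; the helper names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN = -2**31  # smallest value a signed 32-bit timestamp can hold


def to_timestamp(moment):
    """Seconds since the epoch; negative for moments before 1970."""
    return int((moment - EPOCH).total_seconds())


def from_timestamp(seconds):
    """Inverse of to_timestamp, using plain arithmetic so negatives work everywhere."""
    return EPOCH + timedelta(seconds=seconds)


def fits_in_int32(seconds):
    return INT32_MIN <= seconds <= 2**31 - 1


earliest = from_timestamp(INT32_MIN)
first_year = to_timestamp(datetime(1917, 1, 1, tzinfo=timezone.utc))
```

Here `first_year` is negative but well inside the 32-bit range, which is why no wider storage type was needed for the archive dates.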

We also sort the search results by several other criteria important to the Prize administrators. To do this, an external script assigns 'authored on' dates within each year by modifying the database directly. There are other, more elegant ways to handle this, but since faceted searching is database intensive, we chose this approach so that the site could handle heavy traffic. For the same reason, we backported a patch for database.mysqli.inc from Drupal 6 to create temporary tables in memory rather than on disk.
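One way to encode an editorial ordering in a date column is to give each item in a year a distinct date inside that year, in the desired order, so that a plain sort on the date reproduces the ordering. The Python sketch below shows the idea on in-memory data. The one-day spacing and the function name are assumptions, and the actual script, which writes to the database, is not shown in the article.

```python
from datetime import datetime, timedelta, timezone


def assign_authored_dates(year, ordered_titles):
    """Map each title to a date inside `year`, one day apart, preserving order."""
    if len(ordered_titles) > 365:
        raise ValueError("too many items to fit one per day within a single year")
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    return {
        title: start + timedelta(days=position)
        for position, title in enumerate(ordered_titles)
    }


# Hypothetical ordering chosen by the administrators for one year.
desired_order = ["Public Service", "Reporting", "Editorial Writing"]
dates = assign_authored_dates(1917, desired_order)

# Sorting on the date column alone now reproduces the editorial order.
sorted_titles = sorted(dates, key=dates.get)
```

The appeal of this design is that the ordering costs nothing at query time: the database sorts on a single indexed column instead of joining against extra ranking data.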

Credits

This project is the result of an intensive collaboration between the staff of the Pulitzer Prizes (notably website manager Claudia Stone-Weissberg and administrator Sig Gissler); Alan Brooks and Cheryl Taylor of Chips & Inc, who handled site design and project management; Jim Pietrangelo of WebCampOne, who provided CSS consulting and help with the inevitable IE display problems; and the Columbia University IT Department, who host the site on a shared LAMP system with two load-balanced webservers and a dedicated database server. This partnership allowed us at Openflows to focus on site implementation and Drupal development. Work was primarily done by Eric Goldhagen, Matt Corks, and Nat Meysenburg.

 
 

Drupal is a registered trademark of Dries Buytaert.