Helping search engines and robots.txt

Last modified: January 9, 2006 - 08:27

Drupal by itself is very search engine friendly. For example, it is not uncommon for Drupal-based sites to have a Google ranking of 5 or higher (out of 10), where the same content on another CMS would score much lower.

Still, there are several settings you can change from their defaults to make Drupal even more search engine friendly:

  1. First of all, you might want to enable friendly URLs.
  2. Then, if you are using version 4.5.x, make sure that you get rid of the session ID in the URL by changing the .htaccess file. On 4.6, session IDs in URLs are disabled by default.
  3. Optionally, use URL aliasing for some or all nodes. You can use the pathauto module to automatically create aliases for new nodes.
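
To illustrate step 3, here is a minimal Python sketch of how an alias could be derived from a node title. It is not pathauto's actual algorithm; the function name make_alias and the "content" prefix are assumptions made for the example.

```python
import re
import unicodedata


def make_alias(title, prefix="content"):
    """Build a lowercase, hyphen-separated URL alias from a node title.

    Hypothetical helper for illustration; not pathauto's implementation.
    """
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return f"{prefix}/{slug}"


print(make_alias("Helping search engines and robots.txt"))
# prints: content/helping-search-engines-and-robots-txt
```

An alias like this puts the title's keywords into the URL, which is the reason for aliasing nodes in the first place.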

"Google ranking"

Shnapoo - March 3, 2008 - 15:33

I'm sure that by "Google ranking" (see paragraph 1) you actually mean the Google Page Rank (tm). Page Rank runs from 0 to 10. There is no other value that it would make sense to refer to as "Google ranking" in that way. Actually, PR is a value out of 11 (it's from 0 to 10), but that's not the only mistake on this tiny book page.

It is wrong to assume the "Page Rank would have been much lower with the same content on another CMS". Page Rank doesn't depend on content or on a specific CMS at all. PR depends solely on how much the URL is linked to from other URLs. Trust me, there is absolutely no other factor than the so-called link structure. Even the URL format doesn't matter. You could build a URL with lots of parameters and receive a good PR if you point some fat links to it.
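
The point can be illustrated with a small Python sketch of the PageRank iteration as originally published. How Google derives the coarse 0 to 10 value discussed here is not public, so this only shows the underlying idea; the four-page graph, the damping factor of 0.85, and the function name are assumptions made for the example. Note that the only input is the link graph: page content and CMS never appear.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank scores for a small link graph.

    links maps each page to the list of pages it links to; every
    link target is assumed to be a key of links as well.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same base share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page without outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: every other page links to "home".
graph = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
    "contact": ["home"],
}
scores = pagerank(graph)
print(max(scores, key=scores.get))
# prints: home
```

Swapping the text on any of these pages, or the software that serves them, changes nothing in the result; only adding or removing links does.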

There is also an interesting false conclusion. A "PR of 5 or higher" is quite a common value for popular sites, and popular sites often use a solid back end like Drupal. From this relation the authors conclude that Drupal sites often get a PR of 5 or higher, and that Drupal thus proves to be very search engine friendly. To me that seems like hearing birds sing when the sun rises and concluding that the birds' song makes the sun rise.

I'm sorry to say so (as I'm an absolute fan of Drupal), but it's absolutely for sure: an arbitrary CMS with arbitrary content would get the same PR if it is linked in exactly the same way.

However, not all information of this page is wrong or bad. Follow the three points at the bottom to make sure search engines can easily crawl the whole site and extract keywords from the URLs.

3y

bertboerland@ww... - March 3, 2008 - 19:23

this page is 3 years old, it needs love and i'll add your comment some time next week

--
groets
bert boerland

I'm glad to see it's still

Shnapoo - March 4, 2008 - 07:25

I'm glad to see it's still maintained. You're great!

"Google ranking"

TheMule - June 5, 2008 - 23:18

"Page Rank doesn't depend on content or on a specific CMS at all."

I'm not 100% sure about that. According to Google's "Webmaster Guidelines", there may be sites whose ranking is penalized just for violating the guidelines.

"Webmasters who spend their energies upholding the spirit of the basic principles will provide a much better user experience and subsequently enjoy better ranking than those who spend their time looking for loopholes they can exploit."

(emphasis mine)

Among others, the listed guidelines include:

  • Make sure that your TITLE tags and ALT attributes are descriptive and accurate.
  • Check for broken links and correct HTML.
  • If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
  • Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. (...)
  • If your company buys a content management system, make sure that the system can export your content so that search engine spiders can crawl your site.
  • Avoid tricks intended to improve search engine rankings.
  • Don't load pages with irrelevant keywords.
  • Don't create multiple pages, subdomains, or domains with substantially duplicate content.

It seems to me that you may receive a worse ranking because of bad (by their definition, of course) content, bad metadata, bad site structure, or bad URLs. CMSes can do little about content (despite the name), but they do a lot about the page layout, metadata, and site structure. The "don't create multiple pages" guideline, for example, always scared me: what if I use some shortcut alias paths but forget to change all the links? I end up with two different paths to the same page (and use them both, so that Google will see them). Does Google recognize that as an attempt to create "multiple pages"?

Anyway, I think that a "good" CMS should play nice with HTML, with URLs, with metadata. The less cluttered the HTML is, the easier the crawlers pick up the right text from your pages and index it correctly.

"an arbitrary CMS with arbitrary content would get the same PR if it is linked in exactly the same way"

might be true, except when the CMS gets in the way by imposing a weird site structure (200-byte query strings embedding session IDs, multiple paths to the same pages), or if it produces bad HTML (even if the content looks the same), or does anything else that doesn't smell right to Google.
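
The two URL problems named here, session IDs in the query string and several paths to one page, can be sketched in Python. This is only an illustration of the idea, not something Drupal or Google does in this form: PHPSESSID is PHP's default session name, while the alias table, the example.com URLs, and the function name are made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical alias table: system path -> the alias we want indexed.
ALIASES = {"/node/12": "/about-us"}
# Query parameters that only carry a session identifier.
SESSION_PARAMS = {"PHPSESSID"}


def canonical_url(url):
    """Return the URL with session parameters dropped and the alias applied."""
    parts = urlsplit(url)
    # Keep every query parameter except the session identifiers.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    # Map a system path to its alias; unknown paths pass through unchanged.
    path = ALIASES.get(parts.path, parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))


urls = [
    "http://example.com/node/12?PHPSESSID=abc123",
    "http://example.com/about-us",
]
print({canonical_url(u) for u in urls})
# prints: {'http://example.com/about-us'}
```

Both URLs collapse to a single address, which is what a crawler needs to see in order to treat them as one page.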

Admittedly, Drupal is quite good at not imposing anything (even the default query-string-based URLs look pretty good, especially compared to those produced by some other CMSes you see around the web).

.TM.

Drupal is a registered trademark of Dries Buytaert.