Using a robots.txt File to Allow or Deny Search Engines

This article briefly covers how a robots.txt file can allow or deny search engines access to certain files for indexing.

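If there are portions of your website that you do not want search engines to index, you can place a "robots.txt" file in the root of the site that tells search engine crawlers which folders or files they may or may not visit. Compliance is voluntary: well-behaved crawlers honor the file, but it does not restrict access, so it should not be relied on to secure sensitive content.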

A robots.txt file supports several directives that explicitly deny or allow specific search bots access to certain folders or files.

The simplest robots.txt file uses two rules:

User-agent: the robot the following rule applies to
Disallow: the URL path you want to block

These two lines are considered a single entry in the file. You can include as many entries as you want. You can include multiple Disallow lines and multiple user-agents in one entry.
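
As a rough illustration of the entries described above, the sketch below embeds a small sample robots.txt in a Python string and checks it with the standard library's urllib.robotparser module. The bot names (ExampleBot, AnotherBot, GenericCrawler) and the paths are hypothetical and chosen only for demonstration. The first entry shows multiple user-agents and multiple Disallow lines in one entry; the second applies to every other robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: bot names and paths are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
User-agent: AnotherBot
Disallow: /private/
Disallow: /tmp/

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it from a site."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# ExampleBot matches the first entry, so /private/ is blocked for it.
print(parser.can_fetch("ExampleBot", "/private/report.html"))      # False
print(parser.can_fetch("ExampleBot", "/index.html"))               # True

# A robot not named in any entry falls back to the "*" entry.
print(parser.can_fetch("GenericCrawler", "/admin/login"))          # False
print(parser.can_fetch("GenericCrawler", "/private/report.html"))  # True
```

In this sketch a robot that matches a named entry follows only that entry, which is why GenericCrawler may still fetch /private/ while ExampleBot may not. A real crawler would typically load the file from the site itself, for example with set_url() followed by read().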

Please see the following articles, which explain how a robots.txt file works and how to configure one.

http://www.robotstxt.org/

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360

http://en.wikipedia.org/wiki/Robots_Exclusion_Standard