Robots.txt File - No Website Should Be Without One
by Jerry West
Updated April 15, 2005
The Robots Exclusion Protocol -- Robots.txt File
When a search engine spider or robot
visits a web site, it first checks for the presence
of a robots.txt file. If this file is found, the
spider or robot analyzes its contents for records
such as the following, which tells every robot to
stay out of the entire site:
User-agent: *
Disallow: /
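To see how a well-behaved robot might read that record, here is a
minimal sketch using Python's standard urllib.robotparser module.
The helper name is_allowed, the robot name "ExampleBot" and the
host www.example.com are placeholders invented for illustration;
real search engine spiders use their own parsers.

```python
from urllib.robotparser import RobotFileParser

# The record shown above: every robot, whole site off limits.
RULES = """\
User-agent: *
Disallow: /
"""

def is_allowed(rules_text, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)

# "ExampleBot" and www.example.com are hypothetical placeholders.
print(is_allowed(RULES, "ExampleBot", "http://www.example.com/"))
print(is_allowed(RULES, "ExampleBot", "http://www.example.com/products/index.html"))
```

Both calls print False, because "Disallow: /" matches every path
on the site.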
The Robots Exclusion
Protocol is a method that allows website administrators
to indicate which parts of their site should NOT be
visited by a search engine robot.
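There can only be one robots.txt file
per host, and it must sit in the root directory of the
site. If you have users with their own sub-directories,
you must either merge all information into the one
robots.txt file or instruct your users to use the Robots
Meta Tag. A sub-domain counts as a separate host and
is served by its own robots.txt file.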
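As a rough sketch of why user pages cannot carry their own file,
the snippet below derives the address a robot would request from
any page address. The function name robots_txt_url, the host
www.example.com and the ~alice and ~bob directories are made up
for this example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt address a robot would request for page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the file always lives at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical user directories on a hypothetical host.
print(robots_txt_url("http://www.example.com/~alice/photos/index.html"))
print(robots_txt_url("http://www.example.com/~bob/"))
```

Both lines print http://www.example.com/robots.txt, so rules for
either user's pages have to go into that single file.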
The robots.txt file name is case sensitive:
it must be named robots.txt, in all lowercase letters.
The paths listed inside the file are case sensitive
as well.
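Case sensitivity also shows up when paths are matched. The short
check below uses Python's standard urllib.robotparser as a
stand-in for a real robot, which is an assumption; "ExampleBot"
and www.example.com are placeholders.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /images/"])

# Same folder name, different capitalization in the URL.
lower = parser.can_fetch("ExampleBot", "http://www.example.com/images/logo.gif")
mixed = parser.can_fetch("ExampleBot", "http://www.example.com/Images/logo.gif")
print(lower, mixed)
```

With this parser the lowercase path is blocked and the
capitalized one is not, so the spelling in robots.txt should
match the folder name on the server exactly.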
What To Put Into the robots.txt file
The "robots.txt" file usually
contains a record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /images/
In the above example, three directories
are excluded. You need a separate "Disallow"
line for each directory.
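Before uploading a record like the one above, it can help to
test it against a few addresses. This is a minimal sketch with
Python's standard urllib.robotparser; the sample paths, the robot
name "ExampleBot" and the host www.example.com are invented for
illustration.

```python
from urllib.robotparser import RobotFileParser

# The three-directory record from the example above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths to test against the record.
paths = [
    "/index.html",
    "/cgi-bin/form.cgi",
    "/temp/draft.html",
    "/images/logo.gif",
    "/about/team.html",
]
blocked = [
    p for p in paths
    if not parser.can_fetch("ExampleBot", "http://www.example.com" + p)
]
print(blocked)
```

Only the paths inside the three listed directories end up in
blocked; /index.html and /about/team.html remain open to robots.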
A good source is: The
Robots Text Pages.
If you wish to check the syntax of your
robots.txt file, visit:
The
Robot.txt Syntax Checker
Robots.txt File Facts
- if it is present, reputable search engines will obey it
- without a robots.txt file Google will not index
your site as deeply
- you cannot exclude "bad" robots using
a robots.txt file, as bad robots ignore the file
- exclude your images folder to keep the search
engines (like Yahoo! and Google) from grabbing your images
for their image directories
------
© 2000 - 2005, WebMarketingNow.com
Jerry West is the Director of Internet Marketing for
WebMarketingNow. He has been consulting on the web since
1996 and has helped hundreds of companies gain the
upper hand over their competition. Visit Web
Marketing Now for the latest in marketing tips that
are tested and proven.
The above article can be reproduced
on your site or e-zine as long as the signature file
is included.