4/21/2007  Sitemap auto discovery via the robots.txt protocol

By: Melanie Prough

      All the major crawlers currently recognize the robots.txt protocol, so auto discovery was the natural evolution.  The top three engines, Yahoo, Google, and Ask.com, have announced their support for the sitemap inclusion protocol.  So, supposedly, there is no more need to submit sitemaps manually, but I would still submit new sitemaps for a few months to be safe.  You can read the 4/11/07 post from Vanessa Fox concerning the development and the protocol.  I played around with this for several hours and, to my dismay, could not validate the robots file after adding the sitemap.  After much searching, posting, and reading, I found some help and suggestions.  Putting everything I read into practice, below is how to add your sitemap without a syntax error.

Sitemap: http://www.your_domain.com/sitemap.xml

User-agent: *
Disallow: /cgi-bin/

OK, first thing: if your robots.txt file begins with a title comment such as

# Robots.txt file for www.your_domain.com

     Then leave a blank line under it before adding the sitemap line.  The sitemap line above follows the sitemaps.org protocol.  If you do not leave a blank line between the title comment and the Sitemap directive, the file will not validate in Google's Webmaster Tools.  To avoid any other possible syntax issues, I also left a blank line after the Sitemap directive.  In theory, the blank lines mean nothing to a robot.
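
     For anyone who generates the file with a script, here is a minimal Python sketch of the layout described above: title comment, blank line, Sitemap directive, blank line, then the rules.  The helper name build_robots_txt is my own invention for illustration, and the domain and paths are the same placeholders used in this article.  The check at the end uses Python's standard urllib.robotparser (its site_maps() method needs Python 3.8 or later); it only shows that one parser reads the file, and it is not a substitute for validating in Google's Webmaster Tools.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(domain, sitemap_url, disallow_paths):
    """Assemble robots.txt text (hypothetical helper, for illustration only)."""
    lines = [
        f"# Robots.txt file for {domain}",  # title comment
        "",                                 # blank line before the Sitemap directive
        f"Sitemap: {sitemap_url}",
        "",                                 # blank line after the Sitemap directive
        "User-agent: *",
    ]
    # Disallow rules follow the User-agent line directly, with no blank line between.
    lines += [f"Disallow: {path}" for path in disallow_paths]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    "www.your_domain.com",
    "http://www.your_domain.com/sitemap.xml",
    ["/cgi-bin/"],
)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
sitemaps = parser.site_maps()
blocked = not parser.can_fetch("*", "http://www.your_domain.com/cgi-bin/test")

print(robots_txt)
print("Sitemaps found:", sitemaps)
print("cgi-bin blocked:", blocked)
```

     Write the returned text to robots.txt at the root of your site; swap in your own domain, sitemap URL, and disallowed paths.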

      I went ahead and got on board with this, and I will keep this article up to date as the stats change in either direction.  Going forward at this early stage is a risk, but also an opportunity for a lower-PR site to get a leg up.

Melanie Prough [Federation of Webmasters]

Feel free to reprint as long as credit & links remain intact.

 © 2007-2008 Webchronic.com