August 23, 2007

Sitemap Protocol | Deep Crawling and Faster Indexing


I recently took part in a discussion regarding the use of sitemaps. The debate was whether they are necessary or beneficial on smaller sites. So today I am going to spell the process out, as I understand it.

 

First of all, I will admit there are two downsides to the sitemap protocol as I see it.

  1. Your sitemap can provide an easy road map for scrapers. I recommend hiding your sitemap: put it in an odd directory somewhere. This will only help a little if your sitemap is listed in your robots.txt, as the new auto-discovery feature calls for. The fact is, if a scraper still wants your content, they will just crawl your site using one of the many free programs available, and they will have it.
  2. The second problem with the protocol is that only 50,000 URLs are allowed per sitemap file. If your site has over 50,000 pages, I'm jealous, but look into nested sitemaps (a sitemap index file that points to several sitemaps).
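
To make the nested sitemap idea concrete, here is a minimal Python sketch, not a finished tool. It splits a URL list into files of at most 50,000 entries and builds a sitemap index that points to them. The site address, the file names (sitemap1.xml and so on), and the helper names are all assumptions made up for illustration.

```python
from datetime import date
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemap protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_sitemap_index(sitemap_urls, lastmod):
    """Return sitemap index XML listing each child sitemap with a lastmod date."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    for url in sitemap_urls:
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        lines.append(f"    <lastmod>{lastmod.isoformat()}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Hypothetical site with 120,000 pages (example.com is a placeholder).
pages = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
chunks = split_urls(pages)

# Hypothetical file names for the child sitemaps.
child_sitemaps = [f"http://www.example.com/sitemap{n}.xml"
                  for n in range(1, len(chunks) + 1)]
index_xml = build_sitemap_index(child_sitemaps, date(2007, 8, 23))

print(len(chunks), [len(c) for c in chunks])  # 3 [50000, 50000, 20000]
```

Each chunk would be written out as its own sitemap file, and only the index needs to be submitted to the engines.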

 

There are far more benefits to using the sitemap protocol, which is now supported by Google, Ask.com, Yahoo, and MSN / Live.

  1. Sites with complicated structure or deep clicks to internal pages are more easily crawled.
  2. You will receive a more complete crawl, and you can save bandwidth by setting a low priority on the pages that spiders don't need to visit as often. You can suggest a different change frequency for every page, though the engines treat these values as hints rather than commands.
  3. When you update or add new pages, you can ping all of the above engines except MSN / Live with a new sitemap. In my experience this has brought a fairly quick crawl.
  4. The use and submission of your sitemap makes many of the Google Webmaster Tools reports work, such as 404s, broken links, and crawl errors.
  5. You can report a "last modified" date for each URL.
  6. If you have a huge number of URLs, you can submit a sitemap of just the changed URLs by creating a sitemap index file and using a lastmod tag to spell out when each sitemap in it was last modified. On a really large site this saves a ton of bandwidth.
  7. There are many programs to convert your sitemap.xml to an HTML version suitable for visitors. The real benefit of this is that you can link to it from each page, perhaps in the footer, and it will seriously reinforce your internal linking structure.
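
For points 2 and 5 above, this is roughly what the per-URL settings look like when a sitemap is generated by hand. It is a small Python sketch under my own assumptions: the URLs, dates, and the build_urlset helper are invented for illustration, and a real generator would do more (file size limits, gzip, and so on).

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build sitemap XML from dicts with 'loc' and optional
    'lastmod', 'changefreq', and 'priority' keys."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    for entry in entries:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(entry['loc'])}</loc>")
        # Optional tags are only written when the entry supplies them.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                lines.append(f"    <{tag}>{escape(str(entry[tag]))}</{tag}>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)


# Hypothetical pages: a busy home page and a rarely changing archive page.
entries = [
    {"loc": "http://www.example.com/", "lastmod": "2007-08-23",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/archive/2006.html",
     "changefreq": "yearly", "priority": 0.2},
]
sitemap_xml = build_urlset(entries)
print(sitemap_xml)
```

The archive page gets a low priority and a yearly change frequency, which is the kind of setting that can keep spiders from spending bandwidth on it.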

 

So if you are considering making yourself a sitemap, let's explore that. The first thing is to find a service suitable for making your map. Here are some generators; check them out. You want them to create sitemap protocol 0.9, and make sure they will spider enough URLs for your site. If your site is too large for these, then look at the Google sitemap generator. It is also nice to have a generator that pings the engines. Anyhow, explore them.

  1. Wordpress sitemap plugin
  2. xml-sitemaps.com
  3. Audit My PC - Has directions for encoding to ping too (click the webmaster tool image)
  4. GSite Crawler - PC Platform

 

Now that you have a sitemap, let's put it in your robots.txt. The syntax is a single line: the word Sitemap, a colon, and then the full URL of your sitemap.
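
Here is a minimal Python sketch of adding that line to an existing robots.txt. The existing rules, the sitemap URL, and the add_sitemap_line helper are placeholders of my own; swap in your real file and location.

```python
def add_sitemap_line(robots_txt, sitemap_url):
    """Append a Sitemap line to robots.txt text, with a blank line on each side."""
    return robots_txt.rstrip("\n") + "\n\nSitemap: " + sitemap_url + "\n\n"


# Hypothetical existing rules and sitemap location.
existing = "User-agent: *\nDisallow: /cgi-bin/\n"
robots_txt = add_sitemap_line(existing, "http://www.example.com/sitemap.xml")
print(robots_txt)
```

The Sitemap line is not tied to any User-agent record, so it applies to every engine that reads the file.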

 

 

Leave a blank line above and below the command, and note that the Sitemap command is capitalized. Before we go any further, validate your robots.txt and sitemap.
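
If you would like a quick sanity check before reaching for an online validator, something like the following Python sketch catches the most basic mistakes. It is not a full schema validation, and the check_sitemap helper and sample data are my own assumptions: it only checks that the XML parses, that the root element is a urlset in the 0.9 namespace, that the file stays under 50,000 URLs, and that every entry has a loc.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def check_sitemap(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element: {root.tag}")
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit")
    for i, url in enumerate(urls, start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"entry {i} has no <loc>")
    return problems


# Hypothetical samples: one valid entry, one entry missing its <loc>.
good = (f'<urlset xmlns="{SITEMAP_NS}">'
        "<url><loc>http://www.example.com/</loc></url></urlset>")
bad = (f'<urlset xmlns="{SITEMAP_NS}">'
       "<url><lastmod>2007-08-23</lastmod></url></urlset>")

print(check_sitemap(good))  # []
print(check_sitemap(bad))   # ['entry 1 has no <loc>']
```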

Now let's tell Google, Yahoo, and Ask.com about your new sitemap. Google has its own Webmaster Tools; just add your site and verify it. Then, in the dashboard, click "add sitemap", enter the path to your sitemap, and submit. Yahoo Site Explorer is pretty much the same process as Google. Now ping Ask.com with its location. MSN / Live Tools are in beta, to be released in the fall.
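
A ping is just an HTTP request to the engine's ping address with your sitemap location passed as a URL-encoded sitemap parameter. Here is a small Python sketch of building that request URL. The ping endpoint shown is a made-up placeholder, not a real engine address, so look up the current address for each engine, and the build_ping_url helper is my own.

```python
from urllib.parse import quote


def build_ping_url(ping_endpoint, sitemap_url):
    """Return the ping URL with the sitemap location URL-encoded."""
    return f"{ping_endpoint}?sitemap={quote(sitemap_url, safe='')}"


# Hypothetical endpoint and sitemap location, for illustration only.
ping_url = build_ping_url("http://searchengine.example/ping",
                          "http://www.example.com/sitemap.xml")
print(ping_url)
# http://searchengine.example/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml
```

Requesting that URL in a browser, or from a script, is all a ping amounts to. The encoding step matters because the sitemap address itself contains colons and slashes.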

I have some helpful articles on this subject you might find interesting. There is a little-known fact about a type of sitemap Yahoo still uses; read Yahoo Site Explorer. To ROR or not to ROR: ROR sitemaps are currently supported by Yahoo and Google.

I would be interested to hear your sitemap experiences...comment below.

Peace and SEO

Melanie Prough
"Baby"


 
SEOCog.com

   

  Creative Commons License
This work is licensed under a

Creative Commons Attribution-No Derivative Works 3.0 License.

 

 © 2007-2008 Webchonic.com

Feel free to reprint as long as credit & links remain intact.