Search Engine Spiders
October 28th, 2008 by J.K.
Also known as web crawlers or web robots, a search engine spider is a program that a search engine uses to scour the web, find sites, and read and record the information on their pages. A spider can read the text on a page, follow links, and read meta tags and other code on the site. It then sends all of this information back to the search engine’s database, which crunches it, indexes it and adds it to the search engine’s results.
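To make that a little more concrete, here is a minimal sketch in Python of the "read one page" step. It is not how any real search engine's spider is written; the `PageReader` class, the `read_page` function and the sample page are all made up for illustration, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects what a very simple spider might record from one page."""

    def __init__(self):
        super().__init__()
        self.links = []   # href values of <a> tags
        self.meta = {}    # meta tag name -> content
        self.text = []    # visible text fragments
        self._skip = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only text a visitor would see, not script or style content.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def read_page(html):
    """Return the links, meta tags and text found in one HTML document."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    return {
        "links": reader.links,
        "meta": reader.meta,
        "text": " ".join(reader.text),
    }


# Hypothetical page: one plain text link and one JavaScript-only "link".
SAMPLE = """<html><head><title>Widgets</title>
<meta name="description" content="Hand-made widgets">
</head><body><h1>Our widgets</h1>
<a href="/catalog.html">Catalog</a>
<span onclick="location.href='/specials.html'">Specials</span>
</body></html>"""

record = read_page(SAMPLE)
print(record["links"])
print(record["meta"])
```

In this sketch the reader finds `/catalog.html` because it is an ordinary link, but it never discovers `/specials.html`, because that address only exists inside a script that the reader does not run.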
While this is an oversimplified version of how things work, the spider is very important to webmasters, who want to make sure that their sites are “spider friendly.” The easier it is for the spider to crawl your site, the more information it can send back to the database. Here are a few things to keep in mind to help make your sites as spider friendly as possible.
- Spiders can’t click. They can read, but they can’t click. If you are using fancy JavaScript buttons or Flash menus, the spider will not be able to follow the links that those menus and buttons provide. This could leave entire sections of your website uncrawled. If you use these types of menus or buttons, it is best to also offer a text version of the links on the page so the spider has something to follow. There have been recent articles saying that newer spiders will be able to read these JavaScript and Flash menus and buttons, but it is always better to be safe than sorry, and good old text is still safe.
- Site depth can cause problems for spiders. Many SEOs say that the perfect depth of a site is three levels, for example: Splash Page, then Main Page, then Content Page. When you start going deeper than that, the spider could get lost and miss pages.
- Spiders can use direction. You can control spiders with a robots.txt file on your site, which tells them what to crawl and what not to crawl. You can also use internal links within your site to guide them. If you link to and from your pages, the spider will be more likely to find and crawl them all.
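The last two points can be checked with a short script. The sketch below, again in Python, is only an illustration under stated assumptions: the robots.txt rules, the `SITE_LINKS` map, the `crawl_depths` function and the `ExampleBot` user agent are all hypothetical. It follows links from the home page the way a polite spider might, skips anything robots.txt disallows, and counts how many levels down each page sits, with the first page counted as level one to match the three-level example above.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every spider is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/main.html"],
    "/main.html": ["/articles.html", "/private/admin.html"],
    "/articles.html": ["/articles/spiders.html"],
    "/articles/spiders.html": ["/articles/archive/2008.html"],
    "/articles/archive/2008.html": [],
    "/private/admin.html": [],
    "/orphan.html": [],  # no page links here
}


def crawl_depths(links, robots_txt, start="/", agent="ExampleBot"):
    """Breadth-first walk of the link map, honoring robots.txt.

    Returns a dict of page -> level, where the start page is level 1.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    depths = {start: 1}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Skip pages already seen and pages the rules disallow.
            if target in depths or not rules.can_fetch(agent, target):
                continue
            depths[target] = depths[page] + 1
            queue.append(target)
    return depths


depths = crawl_depths(SITE_LINKS, ROBOTS_TXT)
too_deep = sorted(page for page, level in depths.items() if level > 3)
unreached = sorted(set(SITE_LINKS) - set(depths))

print("Deeper than three levels:", too_deep)
print("Never reached:", unreached)
```

On this made-up site, the two pages under `/articles/` sit deeper than three levels, and two pages are never reached: `/private/admin.html` because robots.txt disallows it, and `/orphan.html` because nothing links to it. Adding an internal link to the orphan page, or linking the deep pages from the main page, would change the result.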
Keeping these few basic things in mind can help search engine spiders fully crawl your pages and return plenty of good data about your site to the search engine. The last thing you want is to spend a ton of time optimizing a site, only to have it be unfriendly to spiders so that much of it never gets indexed.