Friday, November 4, 2011

Get Indexed by Google's Googlebot Right Away, the Right Way

Everyone in the online world knows that the most sought-after traffic to a site comes from Google searches. Folks, 80% of searches on the internet are done in Google.

In theory, it is simple - if you have something interesting to someone else, if you build a website with the honest to goodness goal to provide something useful for someone else, that someone else will find you. That is also how the creators of Google describe their main goal, to more or less have a great repository of information, and help people of our planet find useful stuff.

In practice, it is not that simple. There are thousands, possibly even millions, of sites like yours. You might be running a very honest online business and selling a very useful product, yet not have unique, exceptionally grand 'content'. If your site is listed on page 265 of a set of search results, you can be sure you will never get any visitors that way.

Unlike Yahoo and others, who rely on human involvement, Google does everything through automation. Websites are indexed (or crawled, or spidered; all terms refer to the same process) by Google's indexing software, called Googlebot. Googlebot looks at websites daily, and rules programmed into the software decide which of your pages make it into the main Google index and which don't. Once your site has been indexed, whether it was submitted for indexing by a human or the robot just stumbled upon it, your pages are ranked. The ranking tells Google which search phrases your site should appear for, and on which page of the results to put it.

The Googlebot is very smart and works really well. Keep in mind, however, that it is just a piece of software: a very sophisticated one, but still just a computer program. Consequently, it has a set of algorithms (rules) it uses to index web site content (information), a set of capabilities (as I said before, Googlebot is really intelligent) and a set of limitations. As such, there is an impressive number of ways in which one can trip up the Googlebot and make it impossible for it to index your content. Alternatively, the Googlebot can index your site well, and then people will find it when searching for words it contains.

This article will try to teach you all the basics necessary to achieve consistency and persistency in Google, starting with the very basic step: getting indexed by Googlebot, Google's indexing robot.

1. Read Google's own Webmaster Guidelines

The people behind Google seem to have two main things down to a science. One, most of their algorithms (rules) are so secret that all we non-Google employees can do is speculate. Two, their guidelines are very simple, direct and precise. Following their guidelines will never hurt your site's ranking. Disregarding their guidelines can and probably will hurt you in the long run. So go to http://www.google.com/webmasters/guidelines.html and read what Google has to say about itself.

2. Have text links.

Make every single page on your site accessible via a text-based link, as opposed to JavaScript, Flash, DHTML (Dynamic HTML), etc. Googlebot's native language is text.
Google says: "Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link."

This is probably the number one key to your site's existence in Google. Googlebot reads a page much the way a text-only browser such as the venerable Lynx does. The reasoning behind this approach is that the creators are trying to get as close as possible to emulating human browsing, making sure your website is actually human friendly. Consequently, by downloading Lynx (http://lynx.isc.org) and looking at your site through it, you will see more or less the information Googlebot can read and index and the links Googlebot can follow. You will also see HTML errors on your pages and places where a robot would get stuck and could not reach the rest of your site.

I know it is very unfair to those of us who understand and love the potential of websites built completely in Flash, or other engines. However, until the nice folks who run Google figure out a good way to crawl inside a Flash file and extract the appropriate information, we are stuck with standard HTML.

This is not to say that you cannot make your site really pretty and fill it with JavaScript and Flash eye candy. But you must have regular text and standard text links. Usually you can achieve the desired effect by adding extra navigation menus based on standard text links.
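If you would rather not install Lynx, a few lines of Python can give a rough idea of which links a text-only reader could follow. This is only a sketch under my own assumptions: it uses the standard html.parser module, the names TextLinkCollector and static_text_links are made up for this example, and it treats any link that has an href, is not a "javascript:" link and contains readable anchor text as a static text link. It says nothing about what Googlebot itself really does.

```python
from html.parser import HTMLParser


class TextLinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs for plain <a href="..."> links.

    Hypothetical helper for illustration only, not Googlebot's logic.
    """

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Assumption: links that only work through script do not count.
            if href and not href.lower().startswith("javascript:"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            if text:  # an image-only link has no anchor text to read
                self.links.append((self._href, text))
            self._href = None


def static_text_links(html):
    """Return the (href, text) pairs a text-only reader could follow."""
    parser = TextLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Feed it the HTML of your front page and compare the result with your real navigation. Any page that is missing from the list is a candidate for an extra text link.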

3. Avoid frames.

Avoid frames at all costs. If you must use them (for example to make someone else's page look like it's part of your site), do not use them on your front page.

Frames are like the plague: they sneak up on you. It is incredibly easy for Googlebot to lose its way inside a badly formatted frameset. You might hear that some of the robots, including Google's Googlebot and Yahoo's Slurp, are quickly gaining the ability to go inside frames properly. My philosophy is that until a feature becomes ubiquitous, if you're uncertain, leave it in the closet.

4. Keep the number of links on a given page under 100.

This comes straight from Google's Webmaster Guidelines: "Keep the links on a given page to a reasonable number (fewer than 100)."

This looks more like a suggestion and I am not 100% sure if you get penalized in any way or if Googlebot just stops reading your links after 100. I can however tell you from personal experience that I tried a page with 700 links and it seemed fine. Then one day I tried to view the page from my Blackberry PDA and I got this strange error message saying my page is illegally formatted. After I split the page into several ones with 80 links each, the pages worked on the PDA also.

Who cares about the Blackberry? Well, if you're reading this and your goal is to get visitors, then your main concern should be not to alienate anyone. Remember, today more than ever, people use different devices and different software to access the web. Every visitor is a potential customer. Many employees at major US law firms, and plenty of other corporate people, use a Blackberry.

Lastly, why would you need that many links on one page anyway? Let's say, for example, that you specialize in promotional products: corporate branded gifts, such as pens, caps, mints and other products (sometimes called 'premiums') imprinted with one's logo. Your name is John Doe, and you decided to name your company JDPromos (not very imaginative, but it will do for our examples). You would want to have every item in your catalog as a text link, so every item gets indexed as a link and as a keyword. Also, those who run forums, ezines or blogs might want to have standard links to their articles, as the software they use might create dynamic links that are invisible to certain robots.
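Splitting a long catalog into several pages is easy to automate. Here is a minimal sketch in Python. The function name paginate_links is made up for this example, and the default of 80 links per page is simply the number that worked for me above, not a value Google publishes.

```python
def paginate_links(links, per_page=80):
    """Split a list of links into consecutive pages of at most per_page items.

    per_page=80 is an assumption taken from the experiment described above;
    Google's own guideline only says "fewer than 100".
    """
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [links[i:i + per_page] for i in range(0, len(links), per_page)]


# Example: a 700-item catalog (hypothetical URLs) becomes 9 pages.
catalog = [f"/item{i}.html" for i in range(700)]
pages = paginate_links(catalog)
```

Each inner list can then be written out as its own page of plain text links.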

5. Give every page a meaningful title.

Give every single page on the site a complete and meaningful title. This is also directly from Google's Webmaster Guidelines. See Rule #1.

Incidentally, for those who are fascinated by the debates on the death of the Meta Tags, the <title></title> tag is not a Meta Tag, but a required element for every page.

The "title" tag is supported by every web creation tool out there, and goes in the header of a web page (between the "head" and the "/head" tags).

Google offers the 'allintitle' syntax, which lets users search only text that appears in a page title. A lot of people who integrate a Google bar into their websites allow users to get results only by title. To see how often titles are neglected, search for Untitled Document: over 29 million results are returned.

Most of us - myself included - copy and paste template pages, out of the convenience of not having to recreate all design elements from scratch. If you do so, do not forget to change the title.

Make sure your title is not just a list of keywords and that it is related to the actual content of the page. Google can and will check that, before deciding on your page's 'relevance'.
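If you copy and paste templates as much as I do, a small script can catch the pages where the title was forgotten. The sketch below uses Python's standard html.parser module. The names TitleReader, title_problem and PLACEHOLDER_TITLES are made up for this example, and the list of placeholder titles is my own guess at what editing tools leave behind, so extend it to match yours.

```python
from html.parser import HTMLParser

# Assumption: titles that web editors commonly leave in place by default.
PLACEHOLDER_TITLES = {"untitled document", "untitled"}


class TitleReader(HTMLParser):
    """Remember the text of the first <title> element on a page."""

    def __init__(self):
        super().__init__()
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def title_problem(html):
    """Return a short description of the title problem, or None if it looks fine."""
    reader = TitleReader()
    reader.feed(html)
    reader.close()
    title = " ".join((reader.title or "").split())
    if not title:
        return "missing or empty title"
    if title.lower() in PLACEHOLDER_TITLES:
        return "placeholder title left in place"
    return None
```

This only checks that a title exists and is not a leftover default. Whether the title actually matches the content of the page is still something a human has to judge.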

6. Do not place important text inside images.

Google says: "Try to use text instead of images to display important names, content, or links. The Google crawler doesn't recognize text contained in images."

It is very tempting to create images with text inside them, for the very simple reason that as designers, we are not limited to the very few font (type) options that basic HTML allows. Also, different browsers tend to display things differently nowadays, so it is much easier to create a text image, which will be shown consistently and not worry about styles, operating systems, etc.

7. Use descriptive "ALT" tags.

The "ALT" tag (strictly speaking, the alt attribute of an image tag) is used as a text alternative (hence the name) for images and image links and was designed so that text browsers (such as Lynx) do not just display a generic 'Image' for every picture link you might have. If all your links say 'Image', how would a potential visitor know what they are?

Make sure that the text description is meaningful and accurate. Take our promotional items company as an example. Let's say they have a picture of a tradeshow display, as an example of a service they provide outside the ordinary imprinted mint boxes, calculators and keychains. If the "ALT" tag only says "display", that is what Googlebot will see and index. If the tag says something like "example of a tradeshow display design", that is certainly more useful and more Googlebot friendly.

Please note that although the "ALT" tag does count, and Google seems to place a high value on it, it ranks lower than plain text.
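To find images whose "ALT" text is missing or too thin, something like the following Python sketch could help. It relies on the standard html.parser module. The class name AltChecker is made up for this example, and the rule that a useful description has at least three words is purely my assumption, not anything Google has stated.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """List images whose alt text is missing or very short.

    min_words is an arbitrary threshold chosen for illustration.
    """

    def __init__(self, min_words=3):
        super().__init__()
        self.min_words = min_words
        self.problems = []  # (src, description of the problem) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src", "(no src)")
        alt = (attrs.get("alt") or "").strip()
        if not alt:
            self.problems.append((src, "missing alt text"))
        elif len(alt.split()) < self.min_words:
            self.problems.append((src, "alt text too short: " + alt))


def alt_problems(html, min_words=3):
    """Return the list of (src, problem) pairs found in the given HTML."""
    checker = AltChecker(min_words)
    checker.feed(html)
    checker.close()
    return checker.problems
```

With the tradeshow example above, an image tagged only "display" would be reported, while "example of a tradeshow display design" would pass.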

8. Use meaningful descriptions for links

At the risk of sounding like a scratched CD, I'll say this again: whether you use picture links or text links, please use meaningful text inside your link tags so that Googlebot can associate that text with the linked page.

In other words, let's pretend again that we are designing that website for that imaginary promotional items company we called JDPromos. If you intend to put a link to a set of sample coffee mugs promos, say something like "link to JDPromos samples of branded coffee mugs", not just "coffee mugs", or even worse, "click here for pictures". Never use link text like "read more" or "go here" or "download it", "click here", "don't click here", you get the picture - I hope.

Don't try to fool the Googlebot with hidden links or duplicate content or irrelevant pages of words like "sex" and "hot girls." The Googlebot doesn't like being played and you will be penalized, one way or another, in the long run.
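A quick way to police your own link text is to keep a list of the phrases you never want to see and check every link against it. Here is a small Python sketch. The name is_generic_link_text and the phrase list are my own, built from the examples above, so adjust them to taste.

```python
# Assumption: a hand-picked list based on the examples in this article.
GENERIC_LINK_TEXT = {"read more", "go here", "download it", "here", "more"}


def is_generic_link_text(text):
    """Return True if the link text tells a reader (or robot) nothing useful."""
    # Lower-case, collapse whitespace and drop trailing punctuation.
    normalized = " ".join(text.lower().split()).strip(" .!:>")
    # Any variation on "click here" is treated as generic.
    return "click here" in normalized or normalized in GENERIC_LINK_TEXT
```

For instance, is_generic_link_text("Read more...") is True, while the JDPromos coffee mug link text passes.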

9. Use a "description" tag for every page

Include a

<meta name="description" content="[insert your site's description here]">

tag in your page header to summarize your site. Use a meaningful one- or two-sentence description, and do not stuff it with keywords.

Even better, include descriptive text on the site's front page where users can actually read it. This text may appear as the description for your site in Google results.

Place more important content higher on the page than less important content. Google does categorize text based on its position on the page: text at the bottom is considered less important, or less 'relevant', to use one of Google's own terms.

10. Use short query strings

Use URLs with query strings sparingly, if at all. Pages whose URLs carry query strings are also called dynamic pages, and you can usually recognize them by the presence of the "?" character. Be aware that not every search engine robot can crawl dynamic pages as well as static pages. If you must use them, keep the parameters short and few.

11. Never use the "&id=" parameter

If you must use query strings, or dynamic pages, never use the "&id=" parameter as part of the string.

I know this might sound ridiculous, as it might be hard or impossible for you not to use the "&id=" parameter, but if you are a programmer and you can change the variable's name, replace "id" with something else. Otherwise, Googlebot will just skip that page.

Google says: "Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index."
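Rules 10 and 11 are easy to check mechanically. The Python sketch below uses urlsplit and parse_qsl from the standard urllib.parse module. The function name query_string_warnings is made up, the limit of two parameters is an arbitrary number I picked for illustration, and I flag a parameter named "id" wherever it appears in the query string, which is a slightly stricter reading than Google's "&id=" wording.

```python
from urllib.parse import parse_qsl, urlsplit


def query_string_warnings(url, max_params=2):
    """Return a list of warnings about the query string of a URL.

    max_params=2 is an assumption for illustration, not a Google limit.
    """
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    # Assumption: any parameter named "id" is flagged, even if it comes first.
    if any(name.lower() == "id" for name, _ in params):
        warnings.append('uses an "id" parameter')
    if len(params) > max_params:
        warnings.append(f"has {len(params)} parameters (more than {max_params})")
    return warnings
```

A static URL such as http://www.example.com/mugs.html produces no warnings at all, which is one more argument for static pages.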

12. Use robots.txt

Use robots.txt to show the Googlebot around your site. This ancient and very standard mechanism for directing well-behaved robots like the Googlebot will allow you to specify places where the robot is not welcome, whether for privacy reasons or to avoid Google penalties. You might want to keep the robot away from your cgi-bin directory and other places you maybe don't want available to the entire searching population of the globe. Remember that this is a guideline, not a barrier: robots that are not programmed to comply will simply disregard it. Bottom line, use robots.txt to guide Googlebot, but not to enforce strict security.

Google says: "Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled."
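Before you upload a robots.txt file, you can test what it allows with Python's standard urllib.robotparser module. In the sketch below the file contents, the /private/ directory, the example.com URLs and the helper name allowed are all made up for illustration. The parser only tells you how a well-behaved robot should read the rules, not how any particular robot actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every robot out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The cgi-bin script is off limits, the product page is not.
blocked = allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/cgi-bin/order.pl")
open_page = allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/products/mugs.html")
```

The file itself goes in the root directory of your site, so that it is reachable as /robots.txt.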

13. Make a sitemap

A site map is just a page on your website where you guide your users through the structure of your site. The most basic form of sitemap is a page that lists all of your pages, with a brief description and a link - all text, of course. When you make the sitemap, follow all the rules above and don't forget that the purpose of the sitemap is to guide your human visitor.

Google says: "Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages."

14. Use the Google Sitemaps project

At the time of this writing, the fastest, best and most accurate way to make sure your site is properly crawled and indexed by Googlebot is to participate in the Google Sitemaps project.

In a nutshell, you make a sitemap as an XML page and submit it directly to Google. Google then sends Googlebot to index your site. Besides the speedy free submission, you also get a good amount of statistics and the opportunity to fix potential errors in your site.

Please note that the XML sitemap needed for the Google Sitemap project is intended specifically for Googlebot, and is different from the sitemap described in the previous Rule, which is intended solely for human users.

Also, do not be afraid of XML. Google's sitemap is a very simple text file, and they give you all the necessary information and directions at: https://www.google.com/webmasters/sitemaps
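To show how simple the file is, here is a minimal Python sketch that writes one using the standard xml.etree.ElementTree module. Treat it as an illustration only: the function name build_sitemap and the example.com URLs are made up, and I am assuming the namespace of the public sitemaps.org 0.9 protocol, so check Google's directions at the address above for the exact format they expect. Only the required loc element is written for each page.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return the text of an XML sitemap listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages for the JDPromos example site.
sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/mugs.html",
])
```

Save the result as a plain text file on your server and submit its address to Google.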

Good luck!

---
Andrei co-owns Bsleek - a company that specializes in web design, hosting, promotional items, printing, tradeshow displays, logos, CD presentations, SEO and more. Andrei has amassed an extensive technical knowledge and experience through his career as the CIO for a major travel management company and through his past careers in military research, data acquisition and airspace engineering. He also consults for Trinity Investigations, a New York based PI firm.



---
Bsleek - Redefining cheap web hosting [http://www.bsleek.com/hosting/]

Article Source: http://EzineArticles.com/?expert=Andrei_Smith


