Wednesday, November 29, 2006

Steps to Take for Optimal Google Indexing of Your Site



The cornerstone of any good search engine is highly relevant results. Google's unprecedented success has been due to its uncanny ability to match quality information with a user's search terms. The core of Google's search results is based upon a patented algorithm called PageRank.
There is an entire industry focused on getting sites listed near the top of search engines. Google has proven to be the toughest search engine for a site to do well on. Even so, it isn't all that difficult for a new web site to get listed and begin receiving some traffic from Google.
It can be a daunting task to learn the ins and outs of getting your site listed with any search engine. There is a vast array of information about search engines on the Web, and not all of it is useful or proper. This discussion of getting your site into the Google database focuses on long-term techniques for successfully promoting your site through Google. It will stay well away from some of the common misconceptions and problems that a new site owner faces.
Search Engine Basics
When you type in a search term at a search engine, it looks up potential matches in its database. It then presents the best web page matches first. How those web pages get into the database, and consequently, how you can get yours in there too, is a three-step process:
1. A search engine visits a site with an automated program called a spider (sometimes they're also called robots). A spider is just a program similar to a web browser that downloads your site's pages. It doesn't actually display the page anywhere; it just downloads the page data.
2. After the spider has acquired the page, the search engine passes the page to a program called an indexer. An indexer is another robotic program that extracts most of the visible portions of the page. The indexer also analyzes the page for keywords, the title, links, and other important information contained in the code.
3. The search engine adds your site to its database and makes it available to searchers. The greatest difference between search engines is in this final step where rankings or results positions under a particular keyword are determined.
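The three steps above can be sketched in a few lines of Python. This is only a toy, not how Google's spider or indexer is built: the "Web" is a hypothetical in-memory dictionary called fake_web, the class and function names are my own, and the database is nothing more than a word-to-URL lookup with no ranking at all.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class PageIndexer(HTMLParser):
    """Step 2: pull the title, visible words, and links out of raw HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.words = []
        self.links = []
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>, which are not visible

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return
        if self._in_title:
            self.title += data
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl_and_index(fake_web, start_url):
    """Steps 1-3: fetch pages, index them, and build a word -> URLs database."""
    database = defaultdict(set)
    queue, seen = [start_url], set()
    while queue:
        url = queue.pop(0)
        if url in seen or url not in fake_web:
            continue
        seen.add(url)
        indexer = PageIndexer()
        indexer.feed(fake_web[url])      # step 1: "download" the page
        for word in indexer.words:       # step 2: extract the keywords
            database[word].add(url)      # step 3: store them for searchers
        queue.extend(indexer.links)      # follow links to find more pages
    return database


# Hypothetical two-page site standing in for the real Web.
fake_web = {
    "/": "<html><head><title>Topsoil Shop</title></head>"
         "<body><p>We sell bulk topsoil.</p>"
         "<a href='/about.html'>About</a></body></html>",
    "/about.html": "<html><head><title>About</title></head>"
                   "<body><p>Family topsoil business.</p></body></html>",
}
database = crawl_and_index(fake_web, "/")
print(sorted(database["topsoil"]))
```

Note that the second page gets into the database only because the first page links to it, which is the same reason inbound links matter for getting found.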
Submitting Your Site to Google
For the site owner, the first step is to get your pages listed in the database. There are two ways to get added. The first is direct submission of your site's URL to Google via its add URL or Submission page. To counter programmed robots, search engines routinely move submission pages around on their sites. You can currently find Google's submission page linked from their Help pages or Webmaster Info pages (http://www.google.com/addurl.html).
Just visit the add URL page and enter the main index page for your site into the Google submission page form, and press submit. Google's spider (called GoogleBot) will visit your page usually within four weeks. The spider will traverse all pages on your site and add them to its index. Within eight weeks, you should be able to find your site listed in Google.
The second way to get your site listed in Google is to let Google find you. It does this based upon links that may be pointing to your site. Once GoogleBot finds a link to your site from a page it already has in its index, it will visit your site.
Google has been updating its database on a monthly basis for three years. It sends its spider out in crawler mode once a month too. Crawler mode is a special mode for a spider when it traverses or crawls the entire Web. As it runs into links to pages, it then indexes those pages in a never ending attempt to download all the pages it can. Once your pages are listed in Google, they are revisited and updated on a monthly basis. If you frequently update your content, Google may index your search terms more often.
Once you are indexed and listed in Google, the next natural question for a site owner is, "How can I rank better under my applicable search terms?"
The Search Engine Optimization Template
This is my general recipe for the ubiquitous Google. It is generic enough that it works well everywhere. It's as close as I have come to a "one-size-fits-all" SEO—that's Search Engine Optimization—template.
Use your targeted keyword phrase:
- In META keywords. It's not necessary for Google, but a good habit. Keep your META keywords short (128 characters max, or 10).
- In META description. Keep keyword close to the left but in a full sentence.
- In the title at the far left but possibly not as the first word.
- In the top portion of the page in first sentence of first full bodied paragraph (plain text: no bold, no italic, no style).
- In an H3 or larger heading.
- In bold—second paragraph if possible and anywhere but the first usage on page.
- In italic—anywhere but the first usage.
- In subscript/superscript.
- In URL (directory name, filename, or domain name). Do not duplicate the keyword in the URL.
- In an image filename used on the page.
- In the ALT tag of that same image.
- In the title attribute of that image.
- In link text to another site.
- In an internal link's text.
- In title attribute of all links targeted in and out of page.
- In the filename of your external CSS (Cascading Style Sheet) or JavaScript file.
- In an inbound link on site (preferably from your home page).
- In an inbound link from off site (if possible).
- In a link to a site that has a PageRank of 8 or better.
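If you want to check a page against part of this template without eyeballing the source, something like the following Python sketch could help. It covers only a handful of the placements (title, META description, H1 to H3 headings, bold, image ALT, and link text), and the names PlacementParser and keyword_placements are my own inventions, not part of any SEO tool.

```python
from html.parser import HTMLParser


class PlacementParser(HTMLParser):
    """Collect the text of a few places where the template wants the keyword."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.found = {"title": "", "meta description": "", "heading": "",
                      "bold": "", "image alt": "", "link text": ""}
        self._open = []  # currently open tags that we care about

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.found["meta description"] += " " + (attrs.get("content") or "")
        elif tag == "img":
            self.found["image alt"] += " " + (attrs.get("alt") or "")
        elif tag in self.HEADINGS or tag in ("title", "b", "strong", "a"):
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            self._open.remove(tag)

    def handle_data(self, data):
        for tag in self._open:
            if tag in self.HEADINGS:
                self.found["heading"] += " " + data
            elif tag == "title":
                self.found["title"] += " " + data
            elif tag in ("b", "strong"):
                self.found["bold"] += " " + data
            elif tag == "a":
                self.found["link text"] += " " + data


def keyword_placements(html, phrase):
    """Return {placement: True/False} for a keyword phrase in one page."""
    parser = PlacementParser()
    parser.feed(html)
    phrase = phrase.lower()
    return {place: phrase in text.lower() for place, text in parser.found.items()}


# A made-up page for illustration.
page = """<html><head><title>Bulk topsoil delivered to your door</title>
<meta name="description" content="Order bulk topsoil online today.">
</head><body><h2>Why our soil?</h2>
<p>We sell <b>bulk topsoil</b> by the truckload.</p>
<img src="bulk-topsoil.jpg" alt="A pile of bulk topsoil">
<a href="/prices.html">See prices</a></body></html>"""

report = keyword_placements(page, "bulk topsoil")
for place, present in report.items():
    print(f"{place:18} {'yes' if present else 'MISSING'}")
```

On the sample page this flags the heading and the link text as the two places where the phrase is missing.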
Other search engine optimization things to consider include:
- Use "last modified" headers if you can.
- Validate that HTML. Some feel Google's parser has become stricter at parsing instead of milder. It will miss an entire page because of a few simple errors—we have tested this in depth.
- Use an HTML template throughout your site. Google can spot the template and parse it off. (Of course, this also means they are pretty good at spotting duplicate content.)
- Keep the page as .html or .htm extension. Any dynamic extension is a risk.
- Keep the HTML below 20K. 5-15K is the ideal range.
- Keep the ratio of text to HTML very high. Text should outweigh HTML by significant amounts.
- Double check your page in Netscape, Opera, Mozilla Firefox, and IE. Use Lynx if you have it.
- Use only raw HREFs for links. Keep JavaScript far, far away from links. The simpler the link code the better.
- The traffic comes when you figure out that 1 referral a day to 10 pages is better than 10 referrals a day to 1 page.
- Don't assume that keywords in your site's navigation template will be worth anything at all. Google looks for full sentences and paragraphs. Keywords just lying around orphaned on the page are not worth as much as when used in a sentence.
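The page size and text-to-HTML suggestions in this list are easy to measure. Here is a rough Python sketch, assuming that "text" means whatever is left once tags, scripts, and style blocks are thrown away; the 20K limit comes from the list above, while the function name page_weight and the way the ratio is computed are my own choices.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Accumulate text that is not inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def page_weight(html, max_bytes=20 * 1024):
    """Report page size and the share of it that is visible text."""
    parser = VisibleText()
    parser.feed(html)
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    total = len(html.encode("utf-8"))
    text_bytes = len(text.encode("utf-8"))
    return {
        "html_bytes": total,
        "text_bytes": text_bytes,
        "text_ratio": text_bytes / total if total else 0.0,
        "under_limit": total < max_bytes,
    }


sample = ("<html><body><p>Hello world</p>"
          "<script>var x = 1;</script></body></html>")
stats = page_weight(sample)
print(stats)
```

A page where text_ratio is low is mostly markup, which is the situation the list warns against.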
Five don'ts and one do for getting your site indexed by Google.
A high ranking in Google can mean a great deal of traffic. Because of that, there are lots of people spending lots of time trying to figure out the infallible way to get a high ranking from Google. Add this. Remove that. Get a link from this. Don't post a link to that.
Submitting your site to Google to be indexed is simple enough. Google's got a site submission form (http://www.google.com/addurl.html), though they say if your site has at least a few inbound links (other sites that link to you), they should find you that way. In fact, Google encourages URL submitters to get listed on The Open Directory Project (DMOZ, http://www.dmoz.org/) or Yahoo! (http://www.yahoo.com/).
Nobody knows the holy grail secret of high page rank without effort. Google uses a variety of elements, including page popularity, to determine page rank. Page rank is one of the factors determining how high up a page appears in search results. But there are several things you should not be doing combined with one big thing you absolutely should.
Does breaking one of these rules mean that you're automatically going to be thrown out of Google's index? No; there are over 2 billion pages in Google's index at this writing, and it's unlikely that they'll find out about your rule-breaking immediately. But there's a good chance they'll find out eventually. Is it worth having your site removed from the most popular search engine on the Internet?
Thou shalt not:
Cloak. "Cloaking" is when your web site is set up such that search engine spiders get different pages from those human surfers get. How does the web site know which are the spiders and which are the humans? By identifying the spider's User Agent or IP—the latter being the more reliable method.
An IP (Internet Protocol) address is the computer address from which a spider comes. Everything that connects to the Internet has an IP address. Sometimes the IP address is always the same, as with web sites. Sometimes the IP address changes; that's called a dynamic address. (If you use a dial-up modem, chances are good that every time you log on to the Internet your IP address is different. That's a dynamic IP address.)
A "User Agent" is a way a program that surfs the Web identifies itself. Internet browsers like Mozilla use User Agents, as do search engine spiders. There are literally dozens of different kinds of User Agents; see the Web Robots Database (http://www.robotstxt.org/wc/active.html) for an extensive list.
Advocates of cloaking claim that cloaking is useful to absolutely optimize content for spiders. Anticloaking critics claim that cloaking is an easy way to misrepresent site content—feeding a spider a page that's designed to get the site hits for pudding cups when actually it's all about baseball bats. You can get more details about cloaking and different perspectives on it at http://pandecta.com/, http://www.apromotionguide.com/cloaking.html, and http://www.webopedia.com/TERM/C/cloaking.html.
Hide text. Text is hidden by putting words or links in a web page that are the same color as the page's background—putting white words on a white background, for example. This is also called "fontmatching." Why would you do this? Because a search engine spider could read the words you've hidden on the page while a human visitor couldn't. Again, doing this and getting caught could get you banned from Google's index, so don't.
That goes for other page content tricks too, like title stacking (putting multiple copies of a title tag on one page), putting keywords in comment tags, keyword stuffing (putting multiple copies of keywords in very small font on page), putting keywords not relevant to your site in your META tags, and so on. Google doesn't provide an exhaustive list of these types of tricks on their site, but any attempt to circumvent or fool their ranking system is likely to be frowned upon. Their attitude is more like: "You can do anything you want to with your pages, and we can do anything we want to with our index—like exclude your pages."
Use doorway pages. Sometimes doorway pages are called "gateway pages." These are pages that are aimed very specifically at one topic, which don't have a lot of their own original content, and which lead to the main page of a site (thus the name doorway pages).
For example, say you have a page devoted to cooking. You create doorway pages for several genres of cooking—French cooking, Chinese cooking, vegetarian cooking, etc. The pages contain terms and META tags relevant to each genre, but most of the text is a copy of all the other doorway pages, and all it does is point to your main site.
This is against Google's rules and annoying to the Google user; don't do it. You can learn more about doorway pages at http://searchenginewatch.com/webmasters/bridge.html or http://www.searchengineguide.com/whalen/2002/0530_jw1.html.
Check your link rank with automated queries. Using automated queries (except for the sanctioned Google API) is against Google's Terms of Service anyway. Using an automated query to check your PageRank every 12 seconds is triple bad; it's not what the search engine was built for and Google probably considers it a waste of their time and resources.
Link to "bad neighborhoods". Bad neighborhoods are those sites that exist only to propagate links. Because link popularity is one aspect of how Google determines PageRank, some sites have set up "link farms"—sites that exist only for the purpose of building site popularity with bunches of links. The links are not topical, like a specialty subject index, and they're not well-reviewed, like Yahoo!; they're just a pile of links. Another example of a "bad neighborhood" is a general FFA page. FFA stands for "free for all"; it's a page where anyone can add their link. Linking to pages like that is grounds for a penalty from Google.
Now, what happens if a page like that links to you? Will Google penalize your page? No. Google accepts that you have no control over who links to your site.
Thou shalt:
Create great content. All the HTML contortions in the world will do you little good if you've got lousy, old, or limited content. If you create great content and promote it without playing search engine games, you'll get noticed and you'll get links. Remember Sturgeon's Law ("Ninety percent of everything is crud"). Why not make your web site an exception?
What Happens if You Reform?
Maybe you've got a site that's not exactly the work of a good search engine citizen. Maybe you've got 500 doorway pages, 10 title tags per page, and enough hidden text to make an O'Reilly Pocket Guide. But maybe now you want to reform. You want to have a clean lovely site and leave the doorway pages to Better Homes and Gardens. Are you doomed? Will Google ban your site for the rest of its life?
No. The first thing you need to do is clean up your site—remove all traces of rule breaking. Next, send a note about your site changes and the URL to help@google.com. Note that Google really doesn't have the resources to answer every email about why they did or didn't index a site—otherwise, they'd be answering emails all day—and there's no guarantee that they will reindex your kinder, gentler site. But they will look at your message.
What Happens if You Spot Google Abusers in the Index?
What if some other site that you come across in your Google searching is abusing Google's spider and pagerank mechanism? You have two options. You can send an email to spamreport@google.com or fill out the form at http://www.google.com/contact/spamreport.html. (I'd fill out the form; it reports the abuse in a standard format that Google's used to seeing.)
Before you submit your site to Google, make sure you've cleaned it up to make the most of your indexing.
You clean up your house when you have important guests over, right? Google's crawler is one of the most important guests your site will ever have if you want visitors. A high Google ranking can lead to incredible numbers of referrals, both from Google's main site and from those sites that have search powered by Google.
To make the most of your listing, step back and look at your site. By making some adjustments, you can make your site both more Google-friendly and more visitor-friendly.
If you must use a splash page, have a text link from it. If I had a dollar for every time I went to the front page of a site and saw no way to navigate besides a Flash movie, I'd be able to nap for a living. Google doesn't index Flash files, so unless you have some kind of text link on your splash page (a "Skip This Movie" link, for example, that leads into the heart of your site) you're not giving Google's crawler anything to work with. You're also making it difficult for surfers who don't have Flash or are visually impaired.
Make sure your internal links work. Sounds like a no-brainer, doesn't it? Make sure your internal page links work so the Google crawler can get to all your site's pages. You'll also make sure your visitors can navigate.
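One way to check internal links is to resolve every link on every page and see whether it lands on a page you actually have. The Python sketch below does that against a hypothetical in-memory copy of a site (a dictionary of URL to HTML) rather than fetching anything over the network; the function name and the example.com addresses are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def broken_internal_links(pages, site_root):
    """pages maps absolute URL -> HTML. Returns (page, target) pairs
    where an internal link points at a URL that is not in pages."""
    root_host = urlparse(site_root).netloc
    broken = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            target = urljoin(page_url, href).split("#")[0]  # resolve, drop fragment
            if urlparse(target).netloc != root_host:
                continue  # external or mailto link; out of scope here
            if target not in pages:
                broken.append((page_url, target))
    return broken


pages = {
    "http://example.com/": '<a href="about.html">About</a> '
                           '<a href="/prices.html">Prices</a> '
                           '<a href="http://other.example.org/">Friend</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
}
print(broken_internal_links(pages, "http://example.com/"))
```

Here the only broken link is the one to /prices.html, which no page in the dictionary provides.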
Check your title tags. There are few things sadder than getting a page of search results and finding "Insert Your Title Here" as the title for some of them. Not quite as bad is getting results for the same domain and seeing the exact same title tag over and over and over and over.
Look. Google makes it possible to search just the title tags in its index. Further, the title tags are very easy to read on Google's search results and are an easy way for a surfer to quickly get an idea of what a page is all about. If you're not making the most of your title tag you're missing out on a lot of attention on your site.
The perfect title tag, to me, says something specific about the page it heads, and is readable to both spiders and surfers. That means you don't stuff it with as many keywords as you can. Make it a readable sentence, or—and I've found this useful for some pages—make it a question.
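Missing and repeated titles are simple to spot mechanically. The following is a small Python sketch that works on a hypothetical dictionary of URL to HTML; it uses a plain regular expression to find the title tag, which is good enough for a quick check but is not a full HTML parser.

```python
import re
from collections import defaultdict

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_problems(pages):
    """pages maps URL -> HTML. Returns (missing, duplicates), where
    missing is a list of URLs with no title text and duplicates maps
    a lowercased title to every URL that shares it."""
    by_title = defaultdict(list)
    missing = []
    for url, html in pages.items():
        match = TITLE_RE.search(html)
        title = " ".join(match.group(1).split()) if match else ""
        if title:
            by_title[title.lower()].append(url)
        else:
            missing.append(url)
    duplicates = {t: urls for t, urls in by_title.items() if len(urls) > 1}
    return missing, duplicates


site = {
    "/": "<title>Insert Your Title Here</title>",
    "/a.html": "<title>Insert Your Title Here</title>",
    "/b.html": "<html><body>No title at all</body></html>",
    "/c.html": "<title>How do I order bulk topsoil?</title>",
}
missing, duplicates = title_problems(site)
print("missing:", missing)
print("duplicates:", duplicates)
```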
Check your META tags. Google sometimes relies on META tags for a site description when there's a lot of navigation code that wouldn't make sense to a human searcher. I'm not crazy about META tags, but I'd make sure that at least the front page of my web site had a description and keyword META tag set, especially if your site relies heavily on code-based navigation (like from JavaScript).
Check your ALT tags. Do you use a lot of graphics on your pages? Do you have ALT tags for them so that visually impaired surfers and the Google spider can figure out what those graphics are? If you have a splash page with nothing but graphics on it, do you have ALT tags on all those graphics so a Google spider can get some idea of what your page is all about? ALT tags are perhaps the most neglected aspect of a web site. Make sure yours are set up.
By the way, just because ALT tags are a good idea, don't go crazy. You don't have to explain in your ALT tags that a list bullet is a list bullet. You can just mark it with a *.
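To find the graphics you've neglected, you can list every image that has no ALT attribute at all. This Python sketch is one way to do it; the class and function names are mine, and it deliberately accepts a short ALT such as "*" for a list bullet, as suggested above.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Record the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if "alt" not in attrs:
                self.missing_alt.append(attrs.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images that lack an alt attribute."""
    audit = AltAudit()
    audit.feed(html)
    return audit.missing_alt


splash = ('<img src="logo.gif" alt="Reemar logo">'
          '<img src="bullet.gif" alt="*">'
          '<img src="splash.jpg">')
print(images_missing_alt(splash))
```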
Check your frames. If you use frames, you might be missing out on some indexing. Google recommends you read Danny Sullivan's article, "Search Engines and Frames," at http://www.searchenginewatch.com/webmasters/frames.html. Be sure that Google can either handle your frame setup or that you've created an alternative way for Google to visit, such as using the NOFRAMES tag.
Consider your dynamic pages. Google says they "limit the number of amount of dynamic pages" they index. Are you using dynamic pages? Do you have to?
Consider how often you update your content. There is some evidence that Google indexes popular pages with frequently updated content more often. How often do you update the content on your front page?
Make sure you have a robots.txt file if you need one. If you want Google to index your site in a particular way, make sure you've got a robots.txt file for the Google spider to refer to. You can learn more about robots.txt in general at http://www.robotstxt.org/wc/norobots.html.
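Before you publish a robots.txt file, you can test how a well-behaved spider would read it. The sketch below uses Python's standard urllib.robotparser module; the robots.txt rules, the directory names, and the example.com address are all hypothetical, and real spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: keep every spider out of /drafts/,
# and keep Googlebot out of /print/ as well.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/
Disallow: /print/

User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """True if the named spider may fetch the given path on the site."""
    return parser.can_fetch(agent, "http://example.com" + path)


for agent in ("Googlebot", "SomeOtherBot"):
    for path in ("/index.html", "/print/page.html", "/drafts/idea.html"):
        print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```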
If you don't want Google to cache your pages, you can add a line to every page that you don't want cached. Add this line to the <HEAD> section of your page:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
This will tell all robots that archive content, including engines like Daypop and Gigablast, not to cache your page. If you want to exclude just the Google spider from caching your page, you'd use this line:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

Getting the Most Out of AdWords

Scrape the AdWords from a saved Google results page into a form suitable for importing into a spreadsheet or database.
Google's AdWords™ (the text ads that appear to the right of the regular search results) are delivered on a cost-per-click basis, and purchasers of the AdWords are allowed to set a ceiling on the amount of money they spend on their ad. This means that even if you run a search for the same query word multiple times, you won't necessarily get the same set of ads each time.
If you're considering using Google AdWords to run ads, you might want to gather up and save the ads that are running for the query words you're interested in. Google AdWords are not provided by the Google API; of course you can't automatically scrape Google's results outside the Google API, because it's against Google's Terms of Service.
AdWords (https://adwords.google.com/select/?hl=en) is just about the sort of advertising program you might expect to roll out of the big brains at Google. The designers of the advertising system have innovated thoroughly to provide precise targeting at low cost with less work—it really is a good deal. The flipside is that it takes a fair bit of savvy to get a campaign to the point where it stops failing and starts working.
For larger advertisers, AdWords Select is a no-brainer. Within a couple of weeks, a larger advertiser will have enough data to decide whether to significantly expand their ad program on AdWords Select or perhaps to upgrade to a premium sponsor account.
I'm going to assume you have a basic familiarity with how cost-per-click advertising works. AdWords Select ads currently appear next to search results on Google.com (and some international versions of the search engine) and near search results on AOL and a few other major search destinations. There are a great many quirks and foibles to this form of advertising. My focus here will be on some techniques that can turn a mediocre, nonperforming campaign into one that actually makes money for the advertiser while conforming to Google's rules and guidelines.
One thing I should make crystal clear is that advertising with Google bears no relationship to having your web site's pages indexed in Google's search engine. The search engine remains totally independent of the advertising program. Ad results never appear within search results.
I'm going to offer four key tips for maximizing AdWords Select campaign performance, but before I do, I'll start with four basic assumptions:
High CTRs (click-through rates) save you money, so that should be one of your main goals as an AdWords Select advertiser. Google has set up the keyword bidding system to reward high-CTR advertisers. Why? It's simple. If two ads are each shown 100 times, the ad that is clicked on eight times generates revenue for Google twice as often as the ad that is clicked on four times over the same stretch of 100 search queries served. So if your CTR is 4% and your competitor's is only 2%, Google factors this into your bid. Your bid is calculated as if it were "worth" twice as much as your competitor's bid.
Very low CTRs are bad. Google disables keywords that fall below a minimum CTR threshold ("0.5% normalized to ad position," which is to say, 0.5% for position 1, and a more forgiving threshold for ads as they fall further down the page). Entire campaigns will be gradually disabled if they fall below 0.5% CTR on the whole.
Editorial disapprovals are a fact of life in this venue. Your ad copy or keyword selections may violate Google's editorial guidelines from time to time. Again, it's very difficult to run a successful campaign when large parts of it are disabled. You need to treat this as a normal part of the process rather than giving up or getting flustered.
The AdWords Select system is set up like an advertising laboratory; that is to say, it makes experimenting with keyword variations and small variations in ad copy a snap. No guru can prejudge for you what will be your "magical ad copy secrets," and it would be irresponsible to do so, because Google offers such detailed real-time reporting that can tell you very quickly what does and does not catch people's attention.
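The arithmetic behind the first two assumptions can be written out in a few lines of Python. Treat this as a sketch under a stated assumption: I'm modeling the "worth" of a bid as the bid multiplied by the CTR, which matches the twice-as-much example above but is not a formula Google has published here. The 0.5% floor is the position 1 figure quoted above; the function names and the $0.50 bids are made up.

```python
def ctr(clicks, impressions):
    """Click-through rate as a fraction (0.04 means 4%)."""
    return clicks / impressions if impressions else 0.0


def weighted_bid(max_cpc, clicks, impressions):
    """Assumed ranking score: the bid multiplied by the ad's CTR."""
    return max_cpc * ctr(clicks, impressions)


MIN_CTR = 0.005  # the 0.5% floor quoted for position 1


def at_risk(clicks, impressions, min_ctr=MIN_CTR):
    """True if a keyword's CTR is under the minimum threshold."""
    return ctr(clicks, impressions) < min_ctr


# Two advertisers bid the same $0.50, but one ad is clicked twice as often.
yours = weighted_bid(0.50, clicks=4, impressions=100)
theirs = weighted_bid(0.50, clicks=2, impressions=100)
print("your bid counts for", round(yours / theirs, 2), "times theirs")

# A keyword with 1 click in 1,000 impressions is below the floor.
print("keyword at risk:", at_risk(clicks=1, impressions=1000))
```

Under this model, doubling your CTR does as much for your position as doubling your bid, and it costs nothing per click.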
Now onto four tips to get those CTRs up and to keep your campaign from straying out of bounds.
Matching Can Make a Dramatic Difference
You'll likely want to organize your campaign's keywords and phrases into several distinct "ad groups" (made easy by Google's interface). This will help you more closely match keywords to the actual words that appear in the title of your ad. Writing slightly different ads to closely correspond to the words in each group of keywords you've put together is a great way to improve your clickthrough rates. You'd think that an ad title (say, "Deluxe Topsoil in Bulk") would match equally well to a range of keywords that mean essentially the same thing. That is, you'd think this ad title would create about the same CTR with the phrase "bulk topsoil" as it would with a similar phrase ("fancy dirt wholesaler"). Not so. Exact matches tend to get significantly higher CTRs. Being diligent about matching your keywords reasonably closely to your ad titles will help you outperform your less diligent competition.
If you have several specific product lines, you should consider better matching different groups of key phrases to an ad written expressly for each product line. If your clients like your store because you offer certain specialized wine varieties, for example, have an ad group with "ice wine" and related keywords in it, with "ice wine" in the ad title. Don't expect the same generic ad to cover all your varieties. Someone searching for an "ice wine" expert will be thrilled to find a retailer who specializes in this area. They probably won't click on or buy from a retailer who just talks about wine in general. Search engine users are passionate about something, and their queries are highly granular. Take advantage of this passion and granularity.
The other benefit of getting more granular and matching keywords to ad copy is that you don't pay for clicks from unqualified buyers, so your sales conversion rate is likely to be much higher.
Copywriting Tweaks Generally Involve Improving Clarity and Directness
By and large, I don't run across major copywriting secrets. Psychological tricks to entice more people to click, after all, may wind up attracting unqualified buyers. But there are times when the text of an ad falls outside the zone of "what works reasonably well." In such cases, excessively low CTRs kill any chance your web site might have had to close the sale.
Consider using the Goldilocks method to diagnose poor-performing ads. Many ads lean too far to the "too cold" side of the equation. Overly technical jargon may be unintelligible and uninteresting even to specialists, especially given that this is still an emotional medium and that people are looking at search results first and glancing at ad results as a second thought.
The following example is "too cold":
Faster DWMGT Apps
Build GMUI modules 3X more secure than KLT. V. 2.0 rated as
"best pligtonferg" by WRSS Mag.
No one clicks. Campaign limps along. Web site remains world's best kept secret.
So then a "hotshot" (the owner's nephew) grabs the reins and tries to put some juice into this thing. Unfortunately, this new creative genius has been awake for the better part of a week, attending raves, placing second in a snowboarding competition, and tending to his various piercings. His agency work for a major Fortune 500 client's television spots once received rave reviews. Of course, those were rave reviews from industry pundits and his best friends, because the actual ROI on the big client's TV "branding" campaign was untrackable.
The hotshot's copy reads:
Reemar's App Kicks!
Reemar ProblemSolver 2.0 is the real slim shady. Don't trust
your Corporate security to the drones at BigCorp.
Unfortunately, in a non-visual medium with only a few words to work with, the true genius of this ad is never fully appreciated. Viewers don't click and may be offended by the ad and annoyed with Google.
The simple solution is something unglamorous but clear, such as:
Easy & Powerful Firewall
Reemar ProblemSolver 2.0 outperforms BigCorp
Exacerbator 3 to 1 in industry tests.
You can't say it all in a short ad. This gets enough specific (and true) info out there to be of interest to the target audience. Once they click, there will be more than enough info on your web site. In short, your ads should be clear. How's that for a major copywriting revelation?
The nice thing is, if you're bent on finding out for yourself, you can test the performance of all three styles quickly and cheaply, so you don't have to spend all week agonizing about this.
Be Inquisitive and Proactive with Editorial Policies (But Don't Whine)
Editorial oversight is a big task for Google AdWords staff, a task that often gets them in hot water with advertisers, who don't like to be reined in. For the most part, the rules are in the long-term best interest of this advertising medium, because they're aimed at maintaining consumer confidence in the quality of what appears on the page when that consumer types something into a search engine. Human error, however, may mean that your campaign is being treated unfairly because of a misunderstanding. Or maybe a rule is ambiguous and you just don't understand it.
Reply to the editorial disapproval messages (they generally come from adwords-support@google.com). Ask questions until you are satisfied that the rule makes sense as it applies to your business. The more Google knows about your business, in turn, the more they can work with you to help you improve your results, so don't hesitate to give a bit of brief background in your notes to them. The main thing is, don't let your campaign just sit there disabled because you're confused or angry about being "disapproved." Make needed changes, make the appropriate polite inquiries, and move on.
Avoid the Trap of "Insider Thinking" and Pursue the Advantage of Granular Thinking
Using lists of specialized keywords will likely help you to reach interested consumers at a lower cost per click and convert more sales than using more general industry keywords. Running your ad on keywords from specialized vocabularies is a sound strategy.
A less successful strategy, though, is to get lost in your own highly specialized social stratum when considering how to pitch your company. Remember that this medium revolves around consumer search engine behavior. You won't win new customers by generating a list of different ways of stating terminology that only management, competitors, or partners might actually use, unless your ad campaign is just being run for vanity's sake.
Break things down into granular pieces and use industry jargon where it might attract a target consumer, but when you find yourself listing phrases that only your competitors might know or buzzwords that came up at the last interminable management meeting, stop! You've started down the path of insider thinking! By doing so, you may have forgotten about the customer and about the role market research must play in this type of campaign.
It sounds simple to say it, but in your AdWords Select keyword selection, you aren't describing your business. You're trying to use phrases that consumers would use when trying to describe a problem they're having, a specific item they're searching for, or a topic that they're interested in. Mission statements from above versus what customers and prospects actually type into search engines. Big difference. (At this point, if you haven't yet done so, you'd better go back and read over The Cluetrain Manifesto to get yourself right out of this top-down mode of thinking.)
One way to find out about what consumers are looking for is to use Wordtracker (http://www.wordtracker.com) or other keyword research tools (such as the one that Google offers as part of the AdWords Select interface, a keyword research tool Google promises it's working on). However, these tools are not in themselves enough for every business; because more businesses are using these "keyphrase search frequency reports," the frequently searched terms eventually become picked over by competing advertisers—just what you want to avoid if you're trying to sneak along with good response rates at a low cost per click.
You'll need to brainstorm as well. In the future, there will be more sophisticated software-driven market research available in this area. Search technology companies like Ask Jeeves Enterprise Solutions are already collecting data about the hundreds of thousands of customer questions typed into the search boxes on major corporate sites, for example. This kind of market research is underused by the vast majority of companies today.
There are currently many low-cost opportunities for pay-per-click advertisers. As more and larger advertisers enter the space, prices will rise, but with a bit of creativity, granular thinking, and diligent testing, the smaller advertiser will always have a fighting chance on AdWords Select. Good luck!
Google, PageRank, AdWords, and I'm Feeling Lucky are trademarks of Google Technology, Inc.
