Google Penalty Myths

November 19, 2009

Google has always used penalties to remove certain sites entirely from its search results or to artificially place them so far down the rankings that they will never be found. These penalties used to be reserved for spam, or sites caught attempting to cheat Google’s algorithms. Recently, however, Google has quietly introduced a new breed of ‘blameless’ penalty, targeted at legitimate vertical search and directory services.

Unfortunately, the secrecy surrounding these new penalties coupled with the stigma attached to any Google penalty has given rise to several myths and misunderstandings that suppress meaningful debate.

The following is a rebuttal of some of the most common and recurring myths and misunderstandings:

1. You shouldn’t put all of your eggs in one basket

The Myth: Websites need to have diverse sources of traffic. Any company that puts itself in a position of dependence on a single search engine is asking to go out of business.

The Rebuttal: It is reasonable to suggest that a business ought to be able to survive without Google, but it is unreasonable to suggest that a business ought to be able to compete without Google. Search engines are the undisputed gateway to the Internet, and Google’s 85% share of the global search market means that no business (especially an internet-based business) can possibly compete when it is barred from this channel while its competitors are not.

Even the survival aspect of the argument is flawed. No business, big or small, can be immune to the devastating effect of a Google search penalty. Even Amazon, one of the Internet’s strongest brands, would be placed in serious jeopardy if Google removed Amazon’s pages from its search results, or even just bumped them down a few tens of places.

2. It’s just bad SEO; give me 5 minutes, and I’ll fix it

The Myth: The guys running Site X clearly don’t know what they’re doing. I’ve had a cursory glance, and I’m certain I could fix all their problems in five minutes with some basic SEO (search engine optimisation).

The Rebuttal: There is some confusion here between Google search rankings and Google search penalties. This isn’t surprising, as Google does much to encourage this confusion.

A Google search penalty takes a page’s natural ranking (after any SEO) and artificially lowers it by some amount. Penalties vary in severity, but they generally make any SEO futile. A penalty lowers a page’s ranking by thirty, fifty, or even hundreds of places. Whether a page ranked position 1 or position 20 before the penalty makes no material difference, because being listed on page 3 or below of a search engine’s results is very nearly equivalent to not being listed at all.
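The arithmetic behind this can be sketched in a few lines. This is an illustration only, not a description of how Google implements penalties: the ten-results-per-page figure, the 30-place penalty, and the function names are all assumptions made for the example.

```python
# Illustrative sketch only: models a penalty as a fixed offset added to a
# page's natural position. All names and numbers here are assumptions.
RESULTS_PER_PAGE = 10  # assumed: ten results per results page


def page_of(position: int) -> int:
    """Return the 1-based results page on which a 1-based position appears."""
    return (position - 1) // RESULTS_PER_PAGE + 1


def apply_penalty(natural_position: int, penalty_places: int, enabled: bool = True) -> int:
    """Push a page down by penalty_places; leave it untouched when disabled."""
    if not enabled:
        return natural_position
    return natural_position + penalty_places


if __name__ == "__main__":
    for natural in (1, 20):
        penalised = apply_penalty(natural, penalty_places=30)
        print(f"natural position {natural} (page {page_of(natural)}) -> "
              f"penalised position {penalised} (page {page_of(penalised)})")
```

Under these assumptions, a page that would naturally rank first and one that would rank twentieth both land beyond page 3, which is why SEO work on the natural ranking makes little practical difference while the penalty is in force.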

A Google penalty is a bit like turning the brightness on your television set all the way down to zero. Having done so, no matter what else you tweak or what TV station you tune to, you will not see a brighter image until you disable this ‘penalty’ and set the brightness back to normal.

Much of the confusion (and stigma) surrounding Google penalties arises from a lack of knowledge about them. Google’s arsenal of penalties has expanded in recent years to include a new breed of ‘blameless’ penalty, such as that targeted at legitimate vertical search services. Whereas sites penalised by old-style, ‘spam’ or ‘cheating’ penalties can escape them by mending their ways, sites caught by these new penalties cannot—their only means of escape is via manual intervention by Google (e.g. whitelisting).

Because these new penalties are aimed at legitimate services, they are difficult for Google to defend publicly. So instead Google does everything possible to conceal their existence, keeping the debate focused on “if” a site is penalised, rather than on “why”.

When questioned about these new penalties, for example, Google’s habitual response tends to be vague and elusive, such as “rankings are determined via an interplay of an enormous number of algorithms”.

But this is misleading. Because penalties can be switched on or off (enabled or disabled), they are logically separate from the ranking process. It doesn’t matter how complex Google’s ranking algorithms are, or to what extent its penalty algorithms are physically interwoven or separate, because the penalties can be completely isolated from this complexity by simply disabling them.

Google’s elusive response is like an engineer visiting your house, turning the brightness on your television down to zero, and then trying to persuade you that there is nothing that can be done to fix the problem because a television is a highly complex machine composed of tens of thousands of interconnected parts. The engineer’s description of a television may be broadly accurate, but it conceals the fact that with the flick of a switch he could easily bypass all of this complexity to fix the problem that he deliberately caused.

3. Your site lacks unique / original content

The Myth: “If you do not have unique content, you should not expect traffic from Google, end of story”.

The Rebuttal: Google tends to emphasise the value of original content, while downplaying the value of useful service. This is a convenient line for Google, first, because Google requires 3rd party content to hang its ads on, and, second, because this line helps foster the view that rival search services have little inherent value.

An unfortunate consequence of Google’s relentless focus on content over service is the online proliferation of made-for-AdSense (MFA) material. Google’s AdSense advertising network allows any site that can attract traffic to instantly monetise it. A growing amount of the Internet’s content has been created and tailored more for its appeal to search engines and their ad networks than for its genuine value to users. Unfortunately, it is not possible to quantify how much of the Internet’s content has been created primarily for this purpose, but it is certainly already at alarming levels.

Accusing a search service of having little or no original content is like accusing a library of not writing its own books. While accurate, it is clearly missing the point of the valuable service that a library provides. Besides, the same accusation can be levelled at Google. When KinderStart did just that in 2006, Google’s attorneys vigorously objected:

“[the] argument that Google’s search engine is a mere conduit is nonsense. According to KinderStart, Google’s search results function solely to link a user to third party websites and contain no original content. But Google’s search results are original content, expressing Google’s opinion of the relative significance of websites”, Google Attorneys, 2006.

Advocates of the “no original content” myth argue that, if a hundred online shops are selling the Canon EOS 500D camera, then it makes sense for Google to list only one or two of them. They all contain the same photographs, the same manufacturer-supplied product descriptions, and the same list of features. Pretty much the only differences between them are the prices and stock availability. In other words, the shops’ pages are virtually identical, so why should Google list them all? Remarkably, this bizarre argument is commonplace.

By gathering all of the relevant information (including prices and stock availability) about a product from dozens of suppliers and presenting it all on one sortable and filterable page, a high-quality price comparison site clearly provides a valuable service, whether or not it adds any “original” content of its own.

4. Everybody hates price comparison sites

The Myth: It’s great that Google is blocking price comparison site X from its search results because I don’t like price comparison sites. These sites are always cluttering up my search results.

The Rebuttal: This argument generally arises from the false assumption that all searches are vague and open-ended. A search engine cannot read users’ minds to discern their intent when they search on something vague like “canon eos 500d”. So these days, search engines tend to take a pragmatic approach to such open-ended queries by returning a mix of results from a variety of different kinds of sites, such as price comparisons, review sites, forums, stores, and so on.

But many searches are not vague or open-ended. Proponents of the “everybody hates price comparison” argument tend to overlook the vast array of intent-specific queries that users routinely type into search engines, such as “compare prices canon eos 500d” or “best price canon eos 500d”. It clearly makes no sense to exclude price comparison sites from these results, yet, when a site is penalised, this is exactly what Google does. Once penalised, a site’s relevance to a query is effectively ignored for almost everything apart from its brand name.

Besides, even if you believe that all price comparison results are bad in all circumstances, it is clear that Google doesn’t agree with you. Google does not penalise anything like all price comparison sites. On the contrary, at the moment, well-established brands (that would be conspicuous if suddenly absent) are effectively granted immunity, while emerging, and possibly highly innovative, brands are discriminated against. Most significantly (and worryingly), Google now routinely features its own price comparison results at or near the top of all product-related search results, no matter how vague or open-ended.

There is perhaps a certain irony in the fact that much of the animosity sometimes directed at price comparison sites may originate from a period, not so long ago, when Google’s ranking algorithms allowed certain price comparison sites to run roughshod over many of Google’s search results. In the UK, for example, it was not uncommon to find a certain European price comparison site utterly dominating the first three or four pages of Google’s search results for most product-related searches.

5. Search engines don’t list other search engines

The Myth: “The point is that Google is a search engine to help you find content. If its results are full of other search engines, it’s failed. That’s like going to a library and finding only directions to other libraries.”

The Rebuttal: It is true that if horizontal search engines featured each other’s search results, we would have a recursive nightmare from which we might never escape. Fortunately, there is no risk of this happening anytime soon. Horizontal, keyword-based search engines like Google, Yahoo, and Bing are largely form-driven. Most commonly, these forms consist of a single text box into which users enter their keyword search terms. Search engines do not and cannot index each other’s keyword-based result pages (which is just as well, as there are as many such pages as there are possible combinations of keywords).

But vertical search services are different from and complementary to horizontal, keyword-based search services. In general, horizontal search engines like Google are broad and shallow, whereas vertical search engines are narrow and deep: a horizontal search engine, for example, can return a list of sites that sell flights to New York, whereas the right vertical search engine can delve deeper to return details of the actual flights.

The right vertical search engine can achieve in seconds what it would take users several minutes or even hours to achieve on their own. Not surprisingly, therefore, some savvy Internet users regularly search for these services through horizontal search engines, just as they do for any other useful content or service.

Most price comparison and vertical search services provide a clickable hierarchical interface alongside any form-based search boxes. These clickable interfaces are preferred by many users and are easily crawled and indexed by search engines. These deep clickable links are, for example, what allows a search engine to deliver a user straight to the right page on a price comparison site for the product they are searching for, rather than simply dropping them at the site’s home page.

It is therefore both straightforward and desirable for horizontal search engines to crawl and feature search results from vertical search and price comparison sites. And all of the major horizontal search engines, including Google, Yahoo!, and Bing, routinely do so.

6. Appearing in Google is a privilege, not a right / Google isn’t a monopoly / “Competition is just a Click Away”

The Myth: “Google doesn’t have a monopoly, because users can choose to use another search engine…Google may very well be [manipulating its search results], and as a private business, in it for profit, they have every right to do so.”

The Rebuttal: Much of this is a question of perspective. From the point of view of users, there are viable alternative search engines, so it can be argued that Google is not necessarily a monopoly.

From the point of view of businesses or websites needing to reach those users, however, there are currently no viable alternative search engines. The vast majority of users choose Google, so only Google can reach them. From this perspective Google’s dominance in search (90% in the UK, 72% in the US, and 85% globally) means that it almost certainly is a monopoly and is therefore subject to certain important legal constraints.

But, even laying legal issues aside and looking only at the user perspective, if Google is going to continue to exclude or penalise legitimate sites without regard to quality or relevance, then it needs to start being open and up-front about this pernicious practice.

7. Price comparison is easy; it’s just a bunch of regurgitated “affiliate” feeds

The Myth: “I could knock up a price comparison site in an afternoon”, or “You or I could do the same thing very quickly indeed.”

The Rebuttal: As anyone with any detailed knowledge of vertical search or price comparison will testify, providing quality services in these areas is extremely challenging.

Delivering any search service (including vertical search) can be broken down into three broad areas:

Acquisition – this involves getting hold of the data that is to be searched. It is true that for certain verticals, such as price comparison, the majority of data tends to come from seller product feeds. In the case of Google’s own Product Search, for example, this is always the case. But more sophisticated services tend to be more flexible and also integrate data obtained from crawling websites, querying APIs, and in some cases even querying 3rd party databases. Most commonly, data is acquired periodically, but for some search applications, such as a flight search, data will often be acquired in real-time in direct response to user queries.

Classification and Normalisation – this involves trying to work out what things are and exactly how and what should be compared when trying to compare things between different suppliers.

Presentation – for a vertical search service, this involves presenting and ranking a user’s detailed search results—the flights, the Apple iPod Classics, etc.—in a clear and uniform way that allows quick and easy, context-specific comparison, sorting, and filtering.
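To make the presentation step concrete, here is a minimal sketch of the sorting and filtering that a comparison page performs once the data has been acquired and normalised. The `Offer` fields, the seller names, and the prices are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One seller's offer for a single, already-normalised product (hypothetical fields)."""
    seller: str
    price: float
    delivery: float
    in_stock: bool

    @property
    def total(self) -> float:
        # Compare like with like: the price the buyer actually pays.
        return self.price + self.delivery


def present(offers, in_stock_only=True):
    """Filter out unavailable offers if requested, then sort by total cost, cheapest first."""
    visible = [o for o in offers if o.in_stock or not in_stock_only]
    return sorted(visible, key=lambda o: o.total)


# Made-up sample data.
OFFERS = [
    Offer("Shop A", price=149.00, delivery=4.99, in_stock=True),
    Offer("Shop B", price=152.00, delivery=0.00, in_stock=True),
    Offer("Shop C", price=140.00, delivery=5.00, in_stock=False),
]

if __name__ == "__main__":
    for offer in present(OFFERS):
        print(f"{offer.seller}: {offer.total:.2f}")
```

Even in this toy form, the ordering depends on details (delivery charges, stock status) that only exist because the earlier acquisition and normalisation steps extracted them consistently from every seller.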

Each of these areas poses its own challenges, and it seems that many of the difficulties involved are not readily appreciated by outside observers.

Taking the classification and normalisation task within the context of a price comparison service as an example:

Because vertical search is context-specific, its results tend to be more detailed than those of a horizontal, keyword-based search engine, such as Google. Depending on the vertical, results will have prices, availability, delivery charges, manufacturers, model names, colours, sizes, airlines, departure and arrival airports, star-ratings, authors, descriptions, and so on. Extracting and making sense of this detailed information from a wide variety of different sellers all using different underlying technologies (websites, product feeds, spreadsheets, APIs, databases), different fields and field names, different topologies, and even different manufacturer and model names makes the task extremely challenging.

Even what really ought to be the most straightforward of tasks—working out the make and model of a particular product, such as a television—is considerably more complex than you might expect. In fact, all price comparison services make a hash of this to one degree or another. The problem is that a product is almost always called different things by different sellers (and often even by the same seller in different contexts). A silver Apple iPod Classic 160GB, for example, will be called anything from “MB145ZO/A”, to “APPLE IPODCLASS 160SIL”, to “New iPod classic -160GB – Silver”, to “160GB silver”. Some sellers won’t mention that “Apple” is the manufacturer; while others might refer to Apple by the abbreviation “APL”. And of course, this pattern is repeated across a constantly changing set of tens of thousands of active products.
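A deliberately naive sketch shows why this is hard. It tries to map the seller titles quoted above onto one canonical product name using a part-number lookup and a few token checks. The lookup table, the token rules, and the function name are assumptions made for this example, not a description of any real service's matching logic.

```python
import re

CANONICAL = "Apple iPod Classic 160GB Silver"

# Hypothetical lookup of manufacturer part numbers to canonical products.
PART_NUMBERS = {"MB145ZO/A": CANONICAL}


def normalise(title):
    """Return the canonical product name for a seller title, or None if unsure."""
    key = title.strip().upper()
    if key in PART_NUMBERS:
        return PART_NUMBERS[key]
    # Strip spaces and punctuation so "IPODCLASS", "iPod classic" and
    # "iPod Classic" all collapse to comparable text.
    compact = re.sub(r"[^A-Z0-9]", "", key)
    has_model = "IPODCLASS" in compact
    has_capacity = "160" in compact
    has_colour = "SIL" in compact
    if has_model and has_capacity and has_colour:
        return CANONICAL
    return None  # not enough information in the title alone


if __name__ == "__main__":
    for title in ("MB145ZO/A",
                  "APPLE IPODCLASS 160SIL",
                  "New iPod classic -160GB – Silver",
                  "160GB silver"):
        print(repr(title), "->", normalise(title))
```

The first three titles resolve, but “160GB silver” does not: on its own it could describe any number of products, so a real system would need extra context, such as the seller's category or the surrounding page. Hand-written rules like these would also have to be repeated, and maintained, for every one of the tens of thousands of products.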

Given the level of disparity for something as straightforward as the make and model of a product, which, after all, tends to be written fairly consistently on the product’s box, you shouldn’t be surprised that the disparity elsewhere is even worse. Where one supplier might include the Apple iPod Classic within its “MP3 Players” category, another might include it within “Portable Audio Devices”, another in something less helpful such as “Gifts for Him”, and another in something downright misleading, such as “Lingerie and Chocolates”.

Then there is the task of trying to make sense of the wide variety of ways in which a product’s features can be described: ‘Silver’, ‘Sil’, ‘-S’, ‘32 inch’, ‘32”’, ‘Screen Size: 32’, and so on.
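As a rough sketch, the feature strings listed above could be reduced to uniform values with an alias table and a regular expression. The alias table, the reading of ‘-S’ as a colour suffix, and the field names are assumptions for illustration; real feeds are far messier.

```python
import re

# Hypothetical alias table; treating "-S" as "Silver" is an assumption.
COLOUR_ALIASES = {"silver": "Silver", "sil": "Silver", "-s": "Silver"}

# Optional "Screen Size:" prefix, a number, then an optional unit
# ("inch", "inches", or a straight or curly double quote).
SCREEN_SIZE = re.compile(
    r'^(screen size:\s*)?(\d+(?:\.\d+)?)\s*(inch(?:es)?|["”])?$',
    re.IGNORECASE,
)


def parse_feature(raw):
    """Return a (field, value) pair for a raw feature string, or None if unrecognised."""
    text = raw.strip()
    colour = COLOUR_ALIASES.get(text.lower())
    if colour:
        return ("colour", colour)
    match = SCREEN_SIZE.match(text)
    # Require a prefix or a unit so that a bare number is not guessed at.
    if match and (match.group(1) or match.group(3)):
        return ("screen_size_inches", float(match.group(2)))
    return None


if __name__ == "__main__":
    for raw in ("Silver", "Sil", "-S", "32 inch", "32”", "Screen Size: 32"):
        print(repr(raw), "->", parse_feature(raw))
```

Each new seller tends to bring new spellings, abbreviations, and units, so tables and patterns like these grow without ever being complete.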

Making sense of all of this disparate data and working out which products are essentially the same and ought to be compared is actually a “grand challenge” task that no one has yet completely mastered. To do all of it well requires either a great deal of manual effort or some very clever technology.
