
Web Indexing Penalty

My suspicion is that there's a web indexing Xlist [early 2012]; being on the list is like being in purgatory. The web search service exacts a minor penalty (not as serious as de-listing your site entirely, or marking that your site might harm visitors), but a penalty nevertheless. Your SEO suffers. I've found several other suspicions of this going back several years, but I've never been able to find any confirmation of it (explicit or even tacit), nor any seemingly authoritative information about it.

The symptom is that your whole site acts as if it's at the other end of an extremely long (one or two light-weeks!) network connection. All the information in the web index is at least one or two weeks old. Whenever you add a page, no matter what you do, the web search service continues to say the page doesn't exist for at least one or two more weeks. And whenever you modify a page, no matter what you do, the web search service continues to supply index information about the old version for at least one or two more weeks.

There seems to be more than one cause that can lead to such a delay. One possible reason is a change in crawler software a few years back that wasn't completely friendly to many blogs. (Perhaps all those blogs were inadvertently relying on a particular interaction of software bugs, and stopped working once some of the bugs were corrected. Or perhaps an early attempt at handling near-duplicate content [before rel="canonical"] didn't work so well.) A second possible reason is that the delay is simply the normal treatment of a site with low traffic volume and infrequent changes. A third possible reason is that the delay is the penalty for being on the Xlist. I don't know how to tell which cause is behind the delay in any particular case; I suspect, though, that the Xlist penalty is more widespread than is generally realized.

Here's what I've learned about the Xlist, largely from my own experience (and often not supported by much other information on the web:-).

Who Should Care?

If your site has never yet shown up well in web indexes, this is not the issue to pay attention to first. Rather, pay attention to the typical SEO things: valid robots.txt, sitemap, good page titles, careful use of Flash and JavaScript, etc. Only pay attention to this if you used to have good SERPs but suddenly fell from grace.

Likewise if this problem focused your attention on your SEO and you realized that in general it was mediocre, improve your approaches to the regular SEO issues first. Only focus on the Xlist after you've taken care of all the typical SEO things.
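
One of the typical SEO things mentioned above, a valid robots.txt, can be spot-checked offline. The following is only a minimal sketch using Python's standard urllib.robotparser; the crawler name "ExampleBot", the sample rules, and the paths are all invented for illustration, so substitute your own robots.txt text and the user-agent token of the crawler you care about.

```python
from urllib.robotparser import RobotFileParser

def crawl_permissions(robots_txt, agent, paths):
    """Map each path to True/False: may `agent` fetch it under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}

# Hypothetical robots.txt content (not taken from any real site).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    report = crawl_permissions(
        SAMPLE_ROBOTS, "ExampleBot", ["/index.html", "/private/notes.html"]
    )
    for path, allowed in report.items():
        print(path, "allowed" if allowed else "BLOCKED")
```

A page you expect to be indexed showing up as blocked here is an ordinary SEO problem, and worth fixing before suspecting anything like the Xlist.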

How can I tell?

I've never found an easy way to confirm that a site is on the Xlist. At one point several years ago, having a web search result flagged Supplemental, or searching just the supplemental index but not the main index and finding yourself, was a pretty good indicator. But this isn't possible any more. Web search services won't even confirm that an Xlist exists, let alone that your site is on it.

All you can do is infer. The Xlist may be your problem if your situation matches at least half of these indicators:

  1. You had really good SERPs for a while and were used to it, but then abruptly one day (or at least so it seemed) your site fell from grace.
  2. You added a page recently, and you know from your server logs the web indexing crawler has visited the new page (at least a day ago), but you can't find it in the web index, not even searching with site: and the exact page title.
  3. You changed a page recently, and you know from your server logs the web indexing crawler has visited the revised page (at least a day ago), but the cached copy is still an older version.
  4. All web index listings of all the pages on your site are at least one or two weeks old, even all the ones you know could be newer.
  5. In your server logs there is an abnormally high ratio of 304s to 200s returned to the web indexing crawler. (Note that just a few unexpected 304s in your server logs are not necessarily confirmation that your site is on the Xlist. There may be a few legitimate 304s sprinkled in with the suspect ones; it seems possible that the web indexing crawler sometimes gets its page data from a cache in the network, then just confirms the date with the server. I know of no way to tell which 304s are legitimate and which aren't.)
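
The log-based indicators above (when the crawler last fetched a page, and the ratio of 304s to 200s it received) can be pulled out of an access log with a short script. This is a rough sketch only: it assumes the server writes the common "combined" log format, the crawler token "ExampleBot" and every sample log line are invented, and what counts as an "abnormally high" ratio is left to your own judgment.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_hits(lines, agent_token):
    """Yield (time, path, status) for requests whose user agent contains agent_token."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or agent_token.lower() not in m.group("agent").lower():
            continue
        parts = m.group("request").split()
        if len(parts) < 2:
            continue  # malformed request line, skip it
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        yield when, parts[1], int(m.group("status"))

def ratio_304_to_200(hits):
    """Ratio of 304 to 200 responses, or None when there were no 200s."""
    counts = Counter(status for _, _, status in hits)
    if counts[200] == 0:
        return None
    return counts[304] / counts[200]

def last_crawl(hits, path):
    """Most recent crawler request for `path`, or None if it was never fetched."""
    times = [when for when, p, _ in hits if p == path]
    return max(times) if times else None

# Invented sample data: four crawler requests and one ordinary browser request.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Mar/2012:06:25:01 +0000] "GET /index.html HTTP/1.1" 304 - "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/Mar/2012:06:25:04 +0000] "GET /new-page.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [11/Mar/2012:06:31:40 +0000] "GET /index.html HTTP/1.1" 304 - "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [12/Mar/2012:06:29:12 +0000] "GET /about.html HTTP/1.1" 304 - "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [12/Mar/2012:09:02:55 +0000] "GET /index.html HTTP/1.1" 200 8043 "-" "Mozilla/5.0"',
]

# crawler_hits is a generator, so keep the results in a list for reuse.
hits = list(crawler_hits(SAMPLE_LOG, "ExampleBot"))
print("304:200 ratio for the crawler:", ratio_304_to_200(hits))
print("last crawl of /new-page.html:", last_crawl(hits, "/new-page.html"))
```

Comparing the ratio for the crawler against the ratio for ordinary visitors over the same period gives at least a rough baseline, though as noted above some of the crawler's 304s may be perfectly legitimate.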

How does a web search service implement this?

Your site is crawled normally. Looking at web indexing crawler traffic, you see about the same number of requests from about the same IP addresses at about the same times. In fact, just from looking at a normal server log, I know of no way to tell with any certainty whether or not your site is on the Xlist. Web search results for your site look normal too (except if you look really carefully, everything seems to have squeezed through a time warp:-).

On each GET request, the web indexing crawler does something as simple as adding one or two weeks to the date of its most recent copy of the page in its If-Modified-Since: request header. As a result, it mostly gets back 304 Not Modified.
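
To make the suspected mechanism concrete, here is a small simulation of a conditional GET. It is only a sketch of the guess described above, not anything a web search service has confirmed: the "server" is reduced to a plain date comparison (real servers may apply additional rules), the function names are made up, and all the dates are invented.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get_status(last_modified, if_modified_since):
    """Status a server doing a plain date comparison would return.

    last_modified: aware UTC datetime of the page's latest change.
    if_modified_since: the request header value, or None if absent.
    """
    if if_modified_since is None:
        return 200
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates carry whole seconds only.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the crawler keeps its old copy
    return 200      # the full page is sent again

def if_modified_since_header(copy_date, skew=timedelta(0)):
    """Build the header from the crawler's copy date plus an optional skew."""
    return format_datetime(copy_date + skew, usegmt=True)

# Hypothetical timeline (all dates invented for illustration).
copy_date = datetime(2012, 3, 1, 12, 0, tzinfo=timezone.utc)   # crawler's stored copy
page_edit = datetime(2012, 3, 10, 9, 30, tzinfo=timezone.utc)  # page changed afterwards

honest = if_modified_since_header(copy_date)
skewed = if_modified_since_header(copy_date, skew=timedelta(weeks=2))

print("honest header:", honest, "->", conditional_get_status(page_edit, honest))
print("skewed header:", skewed, "->", conditional_get_status(page_edit, skewed))
```

With the honest header the edit is noticed and the server answers 200; with the header pushed two weeks forward the same edit falls inside the skew window and the server answers 304, so the crawler keeps its old copy.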

Why is my site on the list?

I can find no definitive list of problems; in fact I'm pretty sure there isn't one. I'm also pretty sure web search services change this all the time, so a list that was 100% accurate a few months ago might be only 85% accurate today.

Most but not all problems show up in the web search service's records of having crawled your site. Log on to Webmaster Tools, select the problem site, then look at Diagnostics->Crawl Errors (and also at Malware). This will often provide a very good clue to what the problems are; in fact fix everything listed there, then recheck the web search service's indexes the next day. If after all these issues are addressed it still seems your site is on the Xlist, then look at these other possibilities:

How can I get off the list?

First, fix the problems. Don't even start thinking about getting off the Xlist until after you're sure you're done reworking your site. (!)

Normal web indexing crawls will detect that all the problems have been fixed, and will restore your site's status automatically. I know of no way anyone can manually remove your site from the Xlist. Trying to get some sort of human attention to your site in the meantime can be quite frustrating, and almost certainly wouldn't help anyway. Save the "request review" option only for very very serious problems; don't use it for this kind of thing (remember the story about the boy who cried wolf too often).

Often you'll want your site's indexing restored sooner than the automatic mechanisms will manage, say in three days rather than eight. Although I'm not too certain how to do this, I do have some guesses. First, some sort of manual review will not help, except possibly if the problem was that malware had infected your own site, and you've now cleaned it off and plugged the hole. And second, if the complete list of pages where the web indexing crawler noticed any problems is short, sign in to Webmaster Tools, and follow the web search service's procedures there to re-fetch and/or re-crawl each of those pages.

Email comments to Chuck Kollars

All content on this Personal Website (including text, photographs, audio files, and any other original works), unless otherwise noted on individual webpages, are available to anyone for re-use (reproduction, modification, derivation, distribution, etc.) for any non-commercial purpose under a Creative Commons License.