The Problem: I have disallowed one of my web pages in my robots.txt file, and it’s still appearing in search results. What’s the deal?
Have you blocked a page or sub-folder of your website using robots.txt, and still wonder why it is showing up in search results? If so, read on.
This is one of the most common complaints from webmasters, and fortunately we have an answer from Google’s webmaster guru Matt Cutts.
Let me take an example to show you how this can happen. Take the robots.txt of Google.com itself.
It’s located at google.com/robots.txt. You can find many entries in that file for parts of Google that Google itself has blocked. Let’s pick one of them at random; here I’ve picked
google.com/m/trends
For some reason Google has blocked this URL from being crawled by search engines. So if you search for google.com/m/trends, you’ll probably see something like this in the snippet of the search results:
Sometimes you’ll find similar results for your own site and may not be sure why that URL is showing up. In general, it looks something like this:
According to Matt Cutts, Google has not actually crawled and indexed this page. It is just showing a link that it thinks may be of some use to searchers.
So even if you have disallowed a URL in robots.txt, if someone links to it, Google may still consider it of some value to users and show it in the results even though it hasn’t crawled it. Sometimes it may even take the description snippet from places like the Open Directory Project, if the URL is listed there.
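To make the distinction concrete, here is a minimal sketch of what a Disallow rule actually controls. It uses Python’s standard urllib.robotparser module; the robots.txt content and the helper name is_crawl_allowed are made up for illustration and are not copied from Google’s real file. The check only answers "may a crawler fetch this URL?", and says nothing about whether the URL can be listed in results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, modelled on the /m/trends example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /m/trends
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The disallowed path may not be fetched; other paths may.
blocked = is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://www.google.com/m/trends")
allowed = is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://www.google.com/about")
print("crawl /m/trends:", blocked)
print("crawl /about:", allowed)
```

A result of False here only means the page content will not be fetched. If other sites link to the URL, it can still show up as a bare link in search results.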
“If you truly don’t want that page to be in the search results, use the ‘noindex’ HTML meta tag at the top of the page. Another option is to use the URL Removal Tool from Google Webmaster Tools.”
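As a rough illustration of the noindex option, the sketch below checks whether a page’s HTML carries a robots meta tag with the noindex directive. It uses Python’s standard html.parser module; the sample pages and the names RobotsMetaParser and has_noindex are hypothetical. Keep in mind that a crawler can only see this tag if it is allowed to fetch the page, so the page should not also be disallowed in robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # "googlebot" is the Google-specific variant of the generic "robots" name.
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical pages for illustration.
PRIVATE_PAGE = '<html><head><meta name="robots" content="noindex"></head><body>Private</body></html>'
PUBLIC_PAGE = "<html><head><title>Hello</title></head><body>Public</body></html>"

print("private page has noindex:", has_noindex(PRIVATE_PAGE))
print("public page has noindex:", has_noindex(PUBLIC_PAGE))
```

Running a check like this against your own pages is a quick way to confirm the tag is really present in the served HTML before waiting for search results to update.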
Watch this video for more details:
So the next time you see a URL that is blocked in robots.txt showing up in search results, you need not worry; you may want to try the alternatives mentioned above instead.
Special thanks to Matt Cutts and Google for letting us know about this.