Remove Old Links from Google

Posted: August 7, 2009 at 6:00 pm by
Filed under Computers SEO Web Design

After looking into Search Engine Optimization, I have learned the hard way how best to create a website. Search engines, especially Google, begin indexing your site as soon as it exists on the Internet. It does not matter whether you are done or not; Google’s spiders still come and index what is there. This means content you’re not ready for the world to see will appear in search engines, leading people to an unfinished website. Once your site does go live, its listing in Google will contain many broken or out-of-date links, which will make your site look bad to both Google and your intended users. How do we remove those old links from Google, so we only have relevant links, which improve our rankings?

The solution is Google’s URL Removal tool, found under Google’s Webmaster Tools. First, you need to sign up with Google (it’s free) and submit your site for verification. This is a must anyway, as Google has plenty of tools to help with your website. Once your site is verified, go to Webmaster Tools, then click on your website. Now, click on site configuration, then crawler access. There will be three links: Test robots.txt, Generate robots.txt, and Remove URL. Remove URL is the link you want, but you are not yet ready to remove a URL. Why not, you ask? First, it is time to play with your robots.txt file.

What’s a robots.txt file? The easiest explanation is: it is a file that defines what search engines can and cannot index on your site. Usually the file looks like the following:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes

Now, there is usually more to a robots.txt file, but we will get to that later. First, let’s break down the code above. robots.txt is a plain text file, which can be created with Notepad. The name must be robots.txt. The first line, User-agent: *, tells all search engines to follow the rules below. The *, called the wildcard, means all. Google, Yahoo, and Bing will all follow these rules the same way. The next line, Disallow: /cgi-bin, states: disallow the following directory: cgi-bin. Search engines that honor robots.txt will not index any content from cgi-bin, which is a good thing. The rest of the lines list other directories that will not be indexed. A key point to remember is that you do NOT want all areas of your website to appear in search engines. An example of this is your admin area. Users should NEVER have access to what is in your admin area. Keep in mind that robots.txt only tells crawlers what to skip; it does not password-protect anything, so the admin area still needs its own login.
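If you want to check what a set of rules blocks before uploading it, Python’s standard library includes a robots.txt parser. Below is a minimal sketch of that idea, not something from my own setup: the is_crawlable helper and the test paths are made up for illustration, and the parser does simple prefix matching, which may differ in small ways from how Google itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# The sample rules from above, held in a string so no network access is needed.
SAMPLE_RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
"""


def is_crawlable(rules: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths: one inside a disallowed directory, one outside.
    for path in ["/wp-admin/options.php", "/about.php"]:
        status = "crawlable" if is_crawlable(SAMPLE_RULES, path) else "blocked"
        print(path, status)
```

Anything that starts with one of the Disallow entries comes back as blocked, and everything else is still open to crawlers.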

So, what does this have to do with removing pages from Google? Simple: the page you want removed needs to be blocked from search engines, which is done in your robots.txt file. This means you need to add the specific page you want blocked from Google to your robots.txt file. Below is an example of this:

#Removed Pages From Google
User-agent: *
Disallow: /example_page.php

The first line of code, #Removed Pages From Google, is a comment telling you what this section of the robots.txt file is for. The next line, User-agent: *, is the same as before: it tells all search engines to follow this rule. The next line, Disallow: /example_page.php, is the page you want removed. Just change the name of the page, and put in the correct path to the file. Copy the last line as many times as you need, once for each page you want removed from Google. Once done, save your file, and upload your robots.txt file to the ROOT DIRECTORY OF YOUR SITE. Your robots.txt file must be in the root directory of your site to work.

Note: The robots.txt file assumes you are at the root. This is why every Disallow entry begins with /. Never put in the following:

Disallow: http://www.<example_site>/example_page.php
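If you have a long list of pages to remove, typing each Disallow line by hand gets tedious, and it is easy to paste in a full URL by mistake. Here is a rough Python sketch of one way to generate the section. The function names and the old/about.html page are hypothetical, and it assumes you only want to block exact pages, so it drops any query string and refuses an address that would block the whole site.

```python
from typing import Iterable
from urllib.parse import urlparse


def to_disallow_path(page: str) -> str:
    """Reduce a full URL or a bare file name to a root-relative path.

    Only the path is kept; scheme, host, and query string are dropped.
    """
    path = "/" + urlparse(page).path.lstrip("/")
    if path == "/":
        # "Disallow: /" would block the whole site, so refuse it here.
        raise ValueError(f"{page!r} does not name a specific page")
    return path


def build_removal_section(pages: Iterable[str]) -> str:
    """Return a robots.txt section with one Disallow line per page."""
    lines = ["#Removed Pages From Google", "User-agent: *"]
    lines += [f"Disallow: {to_disallow_path(page)}" for page in pages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_removal_section([
        "http://www.testsite.com/example_page.php",  # full URL, reduced to its path
        "old/about.html",                            # hypothetical page, missing its leading /
    ]))
```

Paste the output into your robots.txt file, and every entry will start with / the way it should.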

Now that your robots.txt file is in the root directory of your website, you are ready to use the Google Remove URL Tool. Click on the link for the tool, and then click on the New removal request button. You can choose to remove individual web pages, specific directories, your entire site, or cached copies of your pages from Google. For now, choose Individual URLs and press next. Then, in the box, put in the URL of the page you want to remove. NOTE: Google has already put in the main address of your website, so only put in the part of the URL that comes after it. Below is an example:

If the page you want removed is the following URL:


http://www.testsite.com/example_page.php

Put in the following only:

example_page.php

Next, choose whether the URL is a search result or an image search result, then press the Add button. Do this for all links you want removed. Just remember to add every page you are removing to your robots.txt file, or the request WILL fail.
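Since a missing robots.txt entry makes the request fail, it can be worth checking your list before you submit it. The sketch below is only an illustration: removal_entries is a made-up helper, and it checks the rules with Python’s built-in parser under the Googlebot name, which is an approximation of what Google does rather than the real thing. It returns the text to type into the box for each page, and stops with an error if any page is still crawlable.

```python
from typing import List
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def removal_entries(robots_txt: str, urls: List[str]) -> List[str]:
    """Return what to type into the removal box for each URL.

    Raises ValueError if any URL is not blocked by the given robots.txt text.
    """
    parser = RobotFileParser()
    # Python's parser only uses the first "User-agent: *" group it finds,
    # so keep the rules you want checked in a single group.
    parser.parse(robots_txt.splitlines())

    not_blocked = [url for url in urls if parser.can_fetch("Googlebot", url)]
    if not_blocked:
        raise ValueError("Not blocked in robots.txt: " + ", ".join(not_blocked))

    # The tool already fills in the site address, so keep only what follows it.
    return [urlparse(url).path.lstrip("/") for url in urls]


if __name__ == "__main__":
    robots_txt = "User-agent: *\nDisallow: /example_page.php\n"
    print(removal_entries(robots_txt, ["http://www.testsite.com/example_page.php"]))
```

To run it against your real file, read your robots.txt into the robots_txt string instead of the one-rule example used here.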

Once done, press the Submit Removal Request button. Now you’re done, and in a day or two, assuming you did everything correctly, the links will be removed from Google.

Having a proper robots.txt file will reduce the need to do this, which will be covered in a future article.


About the author: Psychcomp is owned and maintained by Nathan Driskell, a Licensed Professional Counselor - Intern specializing in Internet Addiction and Asperger's Disorder. Nathan is also a Web Designer and Network Administrator. Contact Nathan at the following locations: E-Mail (dami...@psychcomp.com), Twitter


2 Responses to “Remove Old Links from Google”

  1. Paul says:

    Hi Nathan

    Nicely written article. It’s amazing how often webmasters need to use that Google service, especially when it comes to getting rid of pages with duplicate tags fast. I also noticed you massacred your email address so it wouldn’t get spidered by nasty people. Have a look at this contact button tool: http://wikiworldbook.com/create-your-contact-button

  2. Thanks for the link, Paul. I may consider using a button like that in the future. And yes, it is sad that Google indexes almost anything, even though at times you do not want it to. I love how cached versions of pages can sometimes exist for months.
