You can find handy links to all the tools, the new blog, Google Group, help center, and more from our newly launched Google Webmaster Central. Join us at the blog's new location at googlewebmastercentral.blogspot.com, where you'll learn about all the exciting things we're planning. Oh, and please don't forget to update your bookmarks and feeds!
- If you want to keep Google from indexing some pages of your site, you can use a robots.txt file, a simple text file that you place in the root of your site. We provide detailed information about creating a robots.txt file. You can also use the robots.txt analysis tool in Google Sitemaps to make sure that your file blocks and allows the pages you intend.
- If Google has indexed content that you don't want to appear in search results, you have two options:
- If you want to keep the content on your site, you can block that page with a robots.txt file and then request that we remove the page from our index;
- If you remove the content from your site (for instance, you decide that you revealed information that is too personal and you edit your blog post), you can request a cache removal.
- Good content keeps visitors coming back and compels other sites to link to you. In addition to blog posts, you can provide other types of content, such as video. A number of video hosting sites are available, including YouTube, Yahoo! Video, and our very own Google Video. At the conference, some attendees asked about copyright; to answer that question: you retain all rights to any content you upload to Google Video.
- If you're interested in running ads on your site, take a look at AdSense. The ads are related to what you're talking about that day on your blog -- and you can control what ads display.
These bloggers tend to already put into practice a lot of the things we tell site owners who ask how to get more of their content indexed and make it easier to find through Google searches. As a group, they:
- Provide unique perspectives and content on topics
- Think about their visitors, making sure the sites meet their visitors' needs
- Give visitors reasons to come back, and other sites reasons to link to them (they update content regularly, and most offer in-depth information, such as tutorials, reviews, or in-depth explanations).
Given all of the dedication these bloggers put into their sites, they are of course interested in attracting visitors. Some want the joy of sharing; others are interested in making money from their writing. Here are a few tips to help make your site easier to find, whatever your motivation:
1) High-quality links from other sites help bring visitors to your site and can help your site get indexed and well-ranked in Google. The number one question asked at this conference was "What is your site?" People want to know so they can go read it, and if they like it, link to it. But the number one answer to this question was the name of the site, not the URL. Bloggers had T-shirts of their sites available, and many of those didn't have URLs. Tell people your URL so they can find you, read you, and link to you!
2) Make sure Google can crawl your site. We use an automated system (called "Googlebot") to visit pages on the web, determine their contents, and index them. Sometimes, Googlebot isn't able to view pages of a site, and therefore can't index those pages. There are two primary things you can do to check your site.
a. Read our webmaster guidelines to learn about how we crawl sites and what can make that easier.
b. Sign up for a Google Sitemaps account to see a list of errors we encountered when we tried to crawl your site. This way you can find out if we can't reach your site, and why. Once you sign up for a Google Sitemaps account and add your site URL, you need to verify site ownership before you can see diagnostic information. To do this, simply click the Verify link and follow the steps outlined on the Verify page.
You can verify site ownership in one of two ways:
- Upload an HTML file with a specific name to your site
- Add a <meta> tag to your site's home page
You can also see what words other sites use to link to you (which helps explain why your site might show up for searches that you think are unrelated to your site).
3) Submit an RSS feed of your site to quickly tell us about all of your pages, so we know to crawl and index them.
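For reference, a minimal RSS 2.0 feed looks something like this (the titles and URLs below are placeholders; your blog software typically generates this file for you automatically):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://www.example.com/</link>
    <description>Posts from an example blog</description>
    <item>
      <title>First post</title>
      <link>http://www.example.com/2006/08/first-post.html</link>
    </item>
  </channel>
</rss>
```

Each item's link tells us about a page on your site that we can crawl and index.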
The webmaster help center is also now available in Polish. This includes all the content in the help center, including our webmaster guidelines. Simply choose "Polish" from the Change Language menu. We also have a Polish Google Group for discussing Sitemaps and other webmaster issues.
Also, as noted in our webmaster help center, you can combine parameters in the meta tag. So, for instance, you could use the following meta tag:
<meta name="robots" content="noodp, noarchive">
One source we use to generate snippets is the Open Directory Project, or ODP. Some site owners want to be able to request that we not use the ODP for generating snippets, and we're happy to let you all know we've added support for this. All you have to do is add a meta tag to your pages.
To direct all search engines that support the meta tag not to use ODP information for the page's description, use the following:
<META NAME="ROBOTS" CONTENT="NOODP">
Note that not all search engines may support this meta tag, so check with each for more information.
To prevent Google specifically from using this information to describe a page, use the following:
<META NAME="GOOGLEBOT" CONTENT="NOODP">
For more information, visit the webmaster help center.
Once you add this meta tag to your pages, it may take some time for changes to your snippets to appear. Once we've recrawled your pages and refreshed our index, you should see updated snippets.
If you want your site to show up for country-restricted searches, make sure it uses a country-specific domain (such as www.example.com.br). If you use a domain that isn't country specific (such as .com), make sure that the IP address of the site is located in that country.
If you want to know what visitors from different countries are searching for, take a look at the query stats in Sitemaps. This lets you see the difference in searches for each location, as well as what languages visitors use to type in their queries.
As we mentioned last week, we've added a new option for verifying site ownership. This method requires that you place a specific <meta> tag in the source code of your home page. Many features are available only to site owners and we want as many webmasters as possible to have access. Most site owners who can't upload files or specify names for files should be able to use this new method to verify. For instance, if you use Blogger, you can verify using this option.
To verify using the <meta> tag, simply click the Verify link for your site, choose Add a meta tag as the verification option, and then copy the tag provided to the <head> section of your home page.
This tag looks like this:
<meta name="verify-v1" content="unique-string">

You must place this meta tag:
- On the home page of your site (sometimes called the index page or root page).
- In the source code for that page.
- In the first <head> section of the page, before the first <body> section.
For instance, this is correct:

<html>
<head>
<meta name="verify-v1" content="unique-string">
<!-- style info here -->
</head>
<body>
<!-- body of the page here -->
</body>
</html>

The meta tag inside the style section is incorrect:

<html>
<head>
<style>
<meta name="verify-v1" content="unique-string">
</style>
</head>
<body>
</body>
</html>

The meta tag inside the body section is incorrect:

<html>
<head>
</head>
<body>
<meta name="verify-v1" content="unique-string">
</body>
</html>

The meta tag on a page with no head section is incorrect:

<html>
<meta name="verify-v1" content="unique-string">
<body>
</body>
</html>

Below are some questions you might have about verification.
I have a blog, and anyone can post comments. If they post this meta tag in a comment, can they claim ownership of my site?
No. We look for this meta tag only in the first <head> section of your site's home page, before the first <body> section. If your home page is editable (for instance, your site is a wiki-like site, or a blog with comments or a guest book), someone can add this meta tag to an editable section of your page, but can't use it to claim ownership of your site.
So, how do you generate these cryptic tags anyway?
The unique string is generated by base-64 encoding the SHA256 hash of a string that is composed of the email address of the proposed owner of the site (for instance, firstname.lastname@example.org) and the domain name of the site (for instance, example.com).
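The scheme just described can be sketched in a few lines of Python. Note that the salt value and the exact way the email address and domain are combined are internal to Google; the ones below are illustrative assumptions, so this sketch won't reproduce your real verification string:

```python
import base64
import hashlib

def verification_token(email: str, domain: str, salt: bytes) -> str:
    # Hash a salted string built from the owner's email address and the
    # site's domain, then base-64 encode the digest. The real salt and
    # concatenation order are internal to Google; these are assumptions.
    message = salt + (email + domain).encode("utf-8")
    digest = hashlib.sha256(message).digest()
    return base64.b64encode(digest).decode("ascii")

token = verification_token("firstname.lastname@example.org", "example.com", b"example-salt")
print('<meta name="verify-v1" content="%s">' % token)
```

Because SHA256 always produces a 32-byte digest, the encoded string has a fixed length no matter what the inputs are, so the tag contents reveal nothing about the email address or domain that went into them.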
From this unique string, can someone determine my email address or identity?
Short answer, no. Long answer, we use a hashing scheme to compute the contents of the meta tag. Hashes cannot be "decrypted" back into the message.
Can the meta tag contents be cracked through a dictionary attack?
To reduce the risk of dictionary attacks, we use a random sequence of bytes (called salt) as a seed to the hash function. This makes dictionary attacks much more difficult.
Can someone determine if the same webmaster owns multiple sites?
We use the domain name of your site (for instance, example.com) to compute the unique string. Based on the contents of the tag, someone can determine if a webmaster owns different sites on the same domain, but not if the webmaster owns sites on different domains. For instance, someone can determine if the same webmaster owns http://www.example.com/ and http://subdomain.example.com/, but can't determine if the same webmaster owns http://www.example.com/ and http://www.google.com/.
What if my home page is not HTML content?
This method may not work for you. You must have a <head> section in order to be able to verify using this approach. Instead, try uploading a verification file to verify ownership of your site.
I've added my tag, but Google Sitemaps says it couldn't find it. Why?
Make sure your tag is in the first <head> section, and before the <body> section. Also ensure that it's not within <style> tags or other specialized tags. The easiest way to make sure the placement is one we can recognize is to place it right after your opening <head> tag, as follows:
<meta name="verify-v1" content="unique-string">
Increased crawl errors
Previously, we showed you up to 10 URLs for each error type. We now show all URLs we’ve had trouble crawling. We’ve also put 404 (not found) errors in a separate table from other HTTP errors.
Just choose an error type and either browse the table using the Next and Previous links or download the entire table as a CSV file.
Expanded query stats
Query stats show you the top 20 search queries that brought up your site in the Google search results (both when users clicked on your site in the results and when they didn’t), along with the average top position of your site for that query. Previously, you could view aggregate data across all properties and countries, as well as mobile-specific queries.
Now, you can view data for individual properties and countries as well. For instance, you can see the search queries from users searching Google Images in Germany that returned your site in the results. You’ll only see properties and countries for which your site has data.
Site owners can also view aggregate information for all properties and languages. Properties include Images, Froogle, Groups, Blog search, Base, and Local. More than 100 countries are available.
Previously, query stats were available for sites that were located at the top-level domain (for instance, http://www.example.com/). These stats are now also available for sites located in a subfolder (for instance, http://www.example.com/mysite/).
Increased number of common words
On the Page analysis page, we’ve expanded the list of words we show in the report of common words on your site and in external links to your site from 20 to 75 and we've removed http and www from the words we list.
Increased limit of sites and Sitemaps that can be added to an account
We've raised the number of sites and Sitemaps that site owners can add to a Google Sitemaps account from 200 to 500, a direct result of a request from a Google Group member.
robots.txt analysis tool addition
Our robots.txt analysis tool is a great way to ensure that the robots.txt file on your site blocks and allows only what’s intended. We’ve added the ability to test against the new AdsBot-Google user agent, which crawls AdWords landing pages for quality evaluation. We only use this bot if you use Google AdWords to advertise your site. You can find out more about this user agent in the AdWords help center.
We want to know what you think
We are constantly looking to improve Google Sitemaps and appreciate the feedback we get from our Google Group, other places online, and at conferences. But we know that we don’t get to hear from everyone that way. And so, to gather more feedback, we’ve added a rating tool to each feature in Sitemaps. Tell us if you love the feature, would like us to improve it, or if you don’t find it useful. Simply click your choice beside each feature.
We've added new sections of help topics to our webmaster help center:

- Using a robots.txt file
- Understanding HTTP status codes

The Using a robots.txt file section appears under How Google crawls my site and includes information on:

- How to create a robots.txt file
- Descriptions of each user-agent that Google uses
- How to use pattern matching
- How often we recrawl your robots.txt file (around once a day)

The Understanding HTTP status codes section explains the HTTP status codes that your server might return when we request a page of your site. We display HTTP status codes in several places in Google Sitemaps (such as on the robots.txt analysis page and on the crawl errors page), and some site owners have asked us to provide more information about what these mean.
Thanks for your participation and input!
Thanks for your feedback and patience.
We had a great time at Search Engine Watch Live Seattle last week, answering questions and getting feedback. We even got to meet one of our Google Group members! We wanted to share some of the questions we answered for those who couldn't be there.
When I do a link: search, the results don't include all the links to my site. How can I tell you about the other links?
A search using the link: operator returns only a sampling of pages that link to a site. It doesn't include the full list we know about. We find links to your site through our regular crawling mechanisms, so there's no need to tell us separately. Keep in mind that our algorithms can distinguish natural links from unnatural links.
Natural links are links to your site that develop as part of the dynamic nature of the web when other sites find your content valuable and think it would be helpful for their visitors. Unnatural links are links to your site placed there specifically to make your site look more popular to search engines. Some of these types of links are covered in our webmaster guidelines:
- Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to web spammers or "bad neighborhoods" on the web, as your own ranking may be affected adversely by those links.
- Avoid "doorway" pages created just for search engines.
In general, linking to web spammers and "bad neighborhoods" can harm your site's indexing and ranking. And while links from these sites won't harm your site, they won't help your indexing or ranking. Only natural links add value and are helpful for indexing and ranking your site.
My site participates in an affiliate program. What tips can you provide?
Google's goal is to provide relevant and useful results to searchers. Make sure that your site provides unique content that adds value beyond an affiliate link or the content provided as part of the program. We talk about this in our webmaster guidelines as well:
- Avoid "cookie cutter" approaches such as affiliate programs with little or no original content.
- If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first.
Look at your site and determine what you can offer that will make searchers want to visit and what can distinguish it from other sites in the same affiliate program. And while we expanded on the information in our guidelines specifically because so many people asked us about their affiliate sites, this information is true for all sites. If there's no added value to users, then it's unlikely that search engines will find added value either.
Why do I have to add my Sitemap file to my Google Sitemaps account? Can't I just link to it from my site?
There are several reasons we ask you to add the Sitemap file. Here are a couple of them:
- If you aren't yet indexed, submitting a Sitemap file lets us know about your site--you can proactively tell us about it rather than wait for us to find it.
- When you add your Sitemap file to your Google Sitemaps account, we can let you know if the file has any errors, and then you can resubmit the file once you've fixed the errors.
My site is indexed under both http://www.example.com/ and http://example.com/. How can I make sure only one version appears?
This can happen when both versions of the domain (for instance, http://www.example.com/ and http://example.com/) point to the same physical location, and links to your site use both versions of the URL. To tell us which version you want the content indexed under, we recommend you do a 301 redirect from one version to the other. If your site runs on an Apache server, you can do this using an .htaccess file. You can also use a script. Do a Google search for [301 redirect] for more information on how to set this up for your site. Note that once you implement the 301 redirect, it may take some time for Googlebot to recrawl the pages, follow the redirects, and adjust the index.
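As an illustration (assuming an Apache server with mod_rewrite enabled, and using example.com as a stand-in for your own domain), an .htaccess file that 301-redirects the non-www version to the www version might look like this:

```apache
RewriteEngine On
# If the request came in without the www prefix...
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# ...send a permanent (301) redirect to the www version of the same URL.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The R=301 flag makes the redirect permanent, which is what tells crawlers to index the destination URL rather than the source.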
If your pages are listed under both versions of the domain, don't use our URL removal tool to remove one version of the pages. Since the pages are at the same physical location for both versions of the domain, using the URL removal tool will remove both versions from the index.
We also suggest you link to other pages of your site using absolute, rather than relative, links with the version of the domain you want to be indexed under. For instance, from your home page, rather than link to products.html, link to http://www.example.com/products.html . And whenever possible, make sure that other sites are linking to you using the version of the domain name that you prefer.
Some of you have reported problems with certain site: queries, specifically:

- site: queries where you type in a trailing slash (such as site:www.example.com/)
- site: queries for a domain with punctuation (such as site:www.example-site.com)
We've got fixes for all of these rolling out in the next few days. They didn't come out sooner because we've been testing them thoroughly, making sure you don't get any unexpected surprises.
This bug doesn't involve any pages being dropped from the index. It's the site: operator that isn't working properly. We're freezing all refreshes of the supplemental results until these issues are fixed, and things should be back to normal in a few days. We'll keep you posted when all fixes have been made.
In the meantime, site: queries without the trailing slash may provide a better result (such as site:www.example.com). If you are checking your site using the Index Stats page of Google Sitemaps, note that it uses the trailing slash in the query, so you may see incorrect results until this bug is fixed.
Thanks for your patience as we resolve this issue.
A couple of weeks ago, we launched a robots.txt analysis tool. This tool gives you information about how Googlebot interprets your robots.txt file. You can read more about the Robots Exclusion Standard, but we thought we'd answer some common questions here.
What is a robots.txt file?
A robots.txt file provides restrictions to search engine robots (known as "bots") that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
Does my site need a robots.txt file?
Only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one).
Where should the robots.txt file be located?
The robots.txt file must reside in the root of the domain. A robots.txt file located in a subdirectory isn't valid, as bots only check for this file in the root of the domain. For instance, http://www.example.com/robots.txt is a valid location, but http://www.example.com/mysite/robots.txt is not. If you don't have access to the root of a domain, you can restrict access using the Robots META tag.
How do I create a robots.txt file?
You can create this file in any text editor. It should be an ASCII-encoded text file, not an HTML file. The filename should be lowercase.
What should the syntax of my robots.txt file be?
The simplest robots.txt file uses two rules:
- User-Agent: the robot the following rule applies to
- Disallow: the pages you want to block
These two lines are considered a single entry in the file. You can include as many entries as you want. You can include multiple Disallow lines in one entry.
A user-agent is a specific search engine robot. The Web Robots Database lists many common bots. You can set an entry to apply to a specific bot (by listing the name) or you can set it to apply to all bots (by listing an asterisk). An entry that applies to all bots looks like this:

User-agent: *
Disallow: /
The Disallow line lists the pages you want to block. You can list a specific URL or a pattern. The entry should begin with a forward slash (/).
- To block the entire site, use a forward slash: Disallow: /
- To block a directory, follow the directory name with a forward slash: Disallow: /private_directory/
- To block a page, list the page: Disallow: /private_file.html
URLs are case-sensitive. For instance, Disallow: /private_file.html would block http://www.example.com/private_file.html, but would allow http://www.example.com/Private_File.html.
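You can see this case sensitivity for yourself with the robots.txt parser in Python's standard library (the rules and URLs below are just examples, not a Google tool):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse an in-memory robots.txt instead of fetching one over the network.
rp.parse([
    "User-agent: *",
    "Disallow: /private_file.html",
])

# Matching is case-sensitive: only the exact lowercase path is blocked.
print(rp.can_fetch("Googlebot", "http://www.example.com/private_file.html"))  # False
print(rp.can_fetch("Googlebot", "http://www.example.com/Private_File.html"))  # True
```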
How do I block Googlebot?
Google uses several user-agents. You can block access to any of them by including the bot name on the User-Agent line of an entry.
- Googlebot: crawls pages for our web index
- Googlebot-Mobile: crawls pages for our mobile index
- Googlebot-Image: crawls pages for our image index
- Mediapartners-Google: crawls pages to determine AdSense content (used only if you show AdSense ads on your site).
Can I allow pages?
Yes, Googlebot recognizes an extension to the robots.txt standard called Allow. This extension may not be recognized by all other search engine bots, so check with other search engines you're interested in to find out. The Allow line works exactly like the Disallow line. Simply list a directory or page you want to allow.
You may want to use Disallow and Allow together. For instance, to block access to all pages in a subdirectory except one, you could use the following entries:

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html

Those entries would block all pages inside the folder1 directory except for myfile.html.
I don't want certain pages of my site to be indexed, but I want to show AdSense ads on those pages. Can I do that?
Yes, you can Disallow all bots other than Mediapartners-Google from those pages. This keeps the pages from being indexed, but lets the Mediapartners-Google bot analyze them to determine the ads to show. The Mediapartners-Google bot doesn't share pages with the other Google user-agents. For instance, you could use the following entries:

User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /
I don't want to list every file that I want to block. Can I use pattern matching?
Yes, Googlebot interprets some pattern matching. This is an extension of the standard, so not all bots may follow it.
Matching a sequence of characters
You can use an asterisk (*) to match a sequence of characters. For instance, to block access to all subdirectories that begin with private, you could use the following entry:

User-agent: Googlebot
Disallow: /private*/
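Because this wildcard support is an extension rather than part of the original standard, tools vary in how they handle it. As a rough sketch (an assumption about the matching behavior, not Googlebot's actual implementation), a pattern containing * can be translated into a regular expression like this:

```python
import re

def robots_pattern_to_regex(pattern: str):
    # '*' matches any sequence of characters; everything else is literal.
    parts = pattern.split("*")
    return re.compile("^" + ".*".join(re.escape(p) for p in parts))

rule = robots_pattern_to_regex("/private*/")
print(bool(rule.match("/private1/page.html")))  # True: subdirectory begins with "private"
print(bool(rule.match("/public/page.html")))    # False: path doesn't start with "/private"
```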
How can I make sure that my file blocks and allows what I want it to?
You can use our robots.txt analysis tool to:
- Check specific URLs to see if your robots.txt file allows or blocks them.
- See if Googlebot had trouble parsing any lines in your robots.txt file.
- Test changes to your robots.txt file.
Also, if you don't currently use a robots.txt file, you can create one and then test it with the tool before you upload it to your site.
If I change my robots.txt file or upload a new one, how soon will it take effect?
We generally download robots.txt files about once a day. You can see the last time we downloaded your file by accessing the robots.txt tab in your Sitemaps account and checking the Last downloaded date and time.