The new format of news sitemaps
On November 5, 2009, Inbal from the Google News team announced that a new news sitemap format was available and advised publishers to update their news sitemaps.
Why is a news sitemap important?
Simply put, a news sitemap ensures that Google News will index all of the articles you publish, provided those articles meet the technical requirements. Even if your article URLs already contain digits, a news sitemap is still a good idea because it gives you double insurance.
What's new in the updated news sitemap format?
Google's Sitemap team has come up with some clever ideas for the new format. With the new format you can specify the access restrictions on your articles, an individual article's genre, the article's title and related stock tickers. The following list describes all of the possible elements (those not labeled Optional are required):
- publication - the parent node for the "name" and "language" tags
- name - your site's label as it appears on Google News. To find your publication label, do a site search on Google News; the label is shown below each article's title. You don't need to include any part of the publication label that is in parentheses. For example:
JohnDoe network (blog) - your publication label is: JohnDoe network
- language - the language the article is written in, expressed as a 2- or 3-letter code from the ISO 639 list.
- access - Optional: if your site's content can be accessed only via subscription or registration, specify that in this tag. Possible values are: "Subscription" and "Registration". Use this tag whenever access to your articles is not open.
- genre - Optional: if an article is, for example, an op-ed or satire, you can specify that in this tag. Possible values are: PressRelease, Satire, Blog, OpEd, Opinion, UserGenerated. If the content is user generated, make sure it went through an editorial review before publishing.
- publication_date - the exact date (and, optionally, time) when the article was published, expressed in standard W3C format. You can use either a simple date stamp (YYYY-MM-DD, e.g. 1970-01-01) or a full date and time stamp (YYYY-MM-DDThh:mm:ss, e.g. 1970-01-01T02:59:59). Optionally, you can include a time zone designator as well, either Z for UTC or an offset: 1970-01-01T02:59:59-05:00
- title - it has to contain the title of your article, exactly as it appears on your site.
- keywords - Optional: a list of keywords that apply to the article
- stock_tickers - Optional: a list of stock tickers of the financial entities that are the main subject of the article
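To make the element list concrete, here is a minimal Python sketch that builds a single sitemap entry with the standard library. It is an illustration under stated assumptions, not an official generator: the two namespace URIs come from the public sitemap schemas rather than from this article, the element spelled "genres" follows the schema's plural spelling as best recalled (the list above calls it genre), and the helper names, example URL and ticker are hypothetical. Check the exact tag names against Google's current documentation before relying on it.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Assumption: namespace URIs as used by the public sitemap schemas.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)


def publication_label(label):
    """Drop a trailing parenthesised part such as '(blog)' from a label."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", label)


def news_url(loc, name, language, published, title,
             access=None, genres=None, keywords=None, stock_tickers=None):
    """Build one <url> entry; optional elements are emitted only when given."""
    def child(parent, ns, tag, text=None):
        node = ET.SubElement(parent, f"{{{ns}}}{tag}")
        node.text = text
        return node

    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    child(url, SITEMAP_NS, "loc", loc)
    news = child(url, NEWS_NS, "news")
    publication = child(news, NEWS_NS, "publication")
    child(publication, NEWS_NS, "name", publication_label(name))
    child(publication, NEWS_NS, "language", language)
    if access:
        child(news, NEWS_NS, "access", access)
    if genres:
        # Assumption: plural "genres" tag; verify against the schema.
        child(news, NEWS_NS, "genres", ", ".join(genres))
    # For a time-zone-aware datetime this yields e.g. 2009-11-05T14:30:00-05:00
    child(news, NEWS_NS, "publication_date",
          published.isoformat(timespec="seconds"))
    child(news, NEWS_NS, "title", title)
    if keywords:
        child(news, NEWS_NS, "keywords", ", ".join(keywords))
    if stock_tickers:
        child(news, NEWS_NS, "stock_tickers", ", ".join(stock_tickers))
    return url


# Hypothetical article data, for illustration only.
entry = news_url(
    loc="http://example.com/news/article-12345.html",
    name="JohnDoe network (blog)",
    language="en",
    published=datetime(2009, 11, 5, 14, 30,
                       tzinfo=timezone(timedelta(hours=-5))),
    title="An example headline",
    genres=["OpEd"],
    stock_tickers=["NASDAQ:GOOG"],
)
print(ET.tostring(entry, encoding="unicode"))
```

In a full sitemap, entries like this one would be appended to a urlset root element in the same sitemap namespace.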
Other changes:
A news sitemap in the new format:
- should contain only articles published in the past 48 hours (two days)
- may contain up to 50,000 URLs
- can have its location specified in your robots.txt file and, if you want, you can 'ping' the Webmaster Tools interface to notify Google of a sitemap change
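The constraints above can be sketched in a few lines of Python. This is only an illustration: the function names and the (url, published) pair layout are hypothetical, the 48-hour window and 50,000 cap are the figures quoted in the list, and the "Sitemap:" line is the robots.txt directive defined by the general sitemaps protocol.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=48)   # window quoted in the list above
MAX_URLS = 50_000               # cap quoted in the list above


def select_for_sitemap(articles, now):
    """Keep (url, published) pairs from the past 48 hours, newest first.

    Both `now` and every `published` value are expected to be
    time-zone-aware datetimes so that they can be compared safely.
    """
    recent = [
        (url, published)
        for url, published in articles
        if timedelta(0) <= now - published <= MAX_AGE
    ]
    recent.sort(key=lambda item: item[1], reverse=True)
    return recent[:MAX_URLS]


def robots_txt_line(sitemap_url):
    """Return the robots.txt line that advertises a sitemap location."""
    return f"Sitemap: {sitemap_url}"


# Hypothetical data, for illustration only.
now = datetime(2009, 11, 5, 12, 0, tzinfo=timezone.utc)
articles = [
    ("http://example.com/news/1001", now - timedelta(hours=1)),
    ("http://example.com/news/1002", now - timedelta(hours=49)),
]
print(select_for_sitemap(articles, now))
print(robots_txt_line("http://example.com/news-sitemap.xml"))
```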
What will the new format solve?
The new 'title' tag will solve the problem of incorrectly detected titles. It was fairly common for the HTML of a webpage to confuse the indexing algorithms, which then detected odd titles such as "Share this" or "by John Doe". If you supply the correct title in your news sitemap, this can be avoided or fixed.
With the introduction of the 'language' tag inside the 'publication' node, multilingual websites can now use a single news sitemap to submit all of their articles.
You can pick a genre on a per-article basis, which avoids having a whole site labeled as "satire" or "blog". Specifying an "OpEd" genre for an article opens the door for it to appear in the new Op-ed section of Google News.
If you're concerned that Google isn't downloading the sitemap when you want it to, you can send a ping (an HTTP query to the Webmaster Tools interface) to let Google know about the sitemap change.
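The ping described above is an ordinary HTTP GET request with the sitemap's address passed as a query parameter. The sketch below only builds the URL and does not send anything. The endpoint is an assumption based on the Webmaster Tools ping convention of that period and may no longer be served, and the function name and example sitemap address are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: ping endpoint as used by Webmaster Tools at the time.
PING_ENDPOINT = "http://www.google.com/webmasters/tools/ping"


def ping_url(sitemap_url, endpoint=PING_ENDPOINT):
    """Return the GET URL that announces a changed sitemap."""
    # urlencode percent-encodes the sitemap address so it is safe
    # to embed as a query-string value.
    return f"{endpoint}?{urlencode({'sitemap': sitemap_url})}"


print(ping_url("http://example.com/news-sitemap.xml"))
```

Sending the request is then a single call, for example with urllib.request.urlopen, and is best done only after the sitemap file has actually changed.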