Is Google News out on the hunt for stolen content?

News sites have complained openly about content theft on the web, and many signs suggest that this digital robbery is partly responsible for the decline in news organizations' advertising revenue. From the thieves' point of view it's an excellent strategy: simply copy and paste content produced by another organization onto your free WordPress or Blogger site and wait for AdSense to place Google ads. From then on it's all fun, right?

On November 16, 2010, Google unveiled a pair of meta-tags for news organizations to use when syndicating their content across a web of affiliated sites. Meta-tags are HTML elements that describe a page to search engines and set some of a web site's configuration.

Both meta-tags, syndication-source and original-source, are meant to help Google identify and rank original sources when it assembles the SERP (search engine results page) at Google News.

What do they do?

The syndication-source meta-tag tells the search engine's crawler which page is the main one for a given piece of content or keyword. So when The Los Angeles Times published “72 charged in online child pornography ring”, The Chicago Tribune, owned by the same Tribune Group that runs both news sites, set this meta-tag in its source code when republishing the story the LA Times had published first: <meta name="syndication-source" content="http://www.latimes.com/news/nationworld/nation/la-na-child-porn-20110804,0,3195254.story" />

Yet the LA Times had this same meta-tag in its own source, when what it ought to have done was set original-source, claiming ownership of that particular piece of content.
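To make the two cases concrete, here is a minimal sketch of how each tag could be rendered. The helper name source_meta_tag is my own invention for illustration, not part of any Google tool, and the original-source line only shows what the LA Times page could have carried according to the reasoning above.

```python
from html import escape

# Hypothetical helper (not a Google API): renders one of the two Google News meta-tags.
def source_meta_tag(kind: str, url: str) -> str:
    if kind not in ("syndication-source", "original-source"):
        raise ValueError(f"unsupported meta-tag name: {kind}")
    # Escape the URL so characters such as & or " cannot break the attribute.
    return f'<meta name="{kind}" content="{escape(url, quote=True)}" />'

LATIMES_STORY = (
    "http://www.latimes.com/news/nationworld/nation/"
    "la-na-child-porn-20110804,0,3195254.story"
)

# What the Chicago Tribune copy carries: a pointer to the first-published story.
syndicated_tag = source_meta_tag("syndication-source", LATIMES_STORY)

# What the LA Times page could carry on its own story to claim ownership.
original_tag = source_meta_tag("original-source", LATIMES_STORY)

print(syndicated_tag)
print(original_tag)
```

The only difference between the two tags is the name attribute; the URL they point to is the same first-published story.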

Many news sites have an NFL portal that produces dozens of stories every day. Each page related to this subject could carry a syndication-source pointing to the main NFL portal, much as it would otherwise carry a rel=canonical. It's like saying to Google: “Hey, this page is the most important one for this subject, please consider showing it higher than the other pages of my site.” It's a Google News-only meta-tag that takes the place of the traditional “rel=canonical” tag there, although I'm not sure whether it also transfers link juice from one page to another.

Original-source amounts to the page shouting at Google that it is the original creator of the content.

Both meta-tags should be placed within the <head> section of an HTML file and may point to the page they're on. A syndication-source meta-tag, however, is really useful only when it points to another URL. If both are used on the same URL, Google's robot will decide which one to consider.
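If you want to audit your own pages, the placement rules above can be checked with a short script. This is only a sketch using Python's standard html.parser module; the class and function names, the sample page and the example.com / example.org URLs are all made up for illustration, and the script says nothing about how Google's own robot reads the tags.

```python
from html.parser import HTMLParser

class SourceTagParser(HTMLParser):
    """Collects syndication-source / original-source meta-tags found inside <head>."""
    WANTED = ("syndication-source", "original-source")

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = {}  # tag name -> list of content URLs

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in self.WANTED and attr.get("content"):
                self.found.setdefault(name, []).append(attr["content"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def read_source_tags(html_text, page_url):
    """Report each tag's target and whether it points at the page itself or elsewhere."""
    parser = SourceTagParser()
    parser.feed(html_text)
    return {
        name: [(url, "self" if url == page_url else "other") for url in urls]
        for name, urls in parser.found.items()
    }

# Invented sample page: a syndicated copy that (questionably) also claims to be the original.
SAMPLE_PAGE = """<html><head>
<title>Example story</title>
<meta name="syndication-source" content="http://www.example.com/original-story.html" />
<meta name="original-source" content="http://www.example.org/copy-of-story.html" />
</head><body><p>Story text</p></body></html>"""

report = read_source_tags(SAMPLE_PAGE, "http://www.example.org/copy-of-story.html")
print(report)
```

A page that reports both tags, as this sample does, is exactly the case where the choice is left to Google's robot.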

Google's support page states that, for now, the original-source meta-tag is only being evaluated and has no effect on ranking at Google News. That makes sense: claiming ownership of content on the web is complicated, so an original-source pointing to the page itself may be treated as a suggestion, to be confirmed when the Google News robot spots a cross-site original-source meta-tag.

How about using syndication-source to point at the news agencies?

It doesn't really make sense, since those organizations should have no interest in ranking higher than the sites that pay them for wire copy. However, news agencies also produce free content, displayed publicly on their own web sites. Whenever those sites are the source for a story and the news organization decides to publish it as is, it would be reasonable to refer to them using syndication-source.

Hard news and the original-source meta-tag

Unless it is a long article carrying lots of information, publishers will hardly benefit from placing the original-source meta-tag in hard-news stories, mainly because 200-word breaking news items from different outlets carry much the same information. Thanks to LSI (Latent Semantic Indexing), a mathematical technique used to retrieve and compare content, changing a few words in the article is no way out. With LSI, Google relates associated keywords within a story when deciding how relevant that content really is. This lets Google catch webmasters and publishers who commit the crime of keyword stuffing; on the other hand, it blinds the search engine when several news sites publish the same breaking news, each claiming to be the real owner (even if they are).
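To see why swapping a few words does so little, consider the rough sketch below. It is not LSI (which additionally reduces a term-document matrix with a singular value decomposition) and it is certainly not Google's method; it is plain cosine similarity over word counts, a much simpler stand-in. The three sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    counts_a, counts_b = term_counts(text_a), term_counts(text_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented examples: a short wire item, a lightly reworded copy, an unrelated review.
original = "Police charged 72 people on Wednesday in an online ring, officials said."
reworded = "Police charged 72 people on Wednesday in an internet ring, authorities said."
unrelated = "The film review praises the director and the lead actor for a quiet drama."

print(cosine_similarity(original, reworded))   # stays high despite two swapped words
print(cosine_similarity(original, unrelated))  # no shared words at all
```

Two of twelve words were changed in the reworded copy, yet the score remains far closer to 1 than to 0, which is the whole point: short breaking news items look nearly identical however they are retouched.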

My guess, therefore, is that the original-source meta-tag will be most useful in book and film reviews. Signed articles and opinion pieces are also good candidates for such tags, since they are specific in nature and certainly hold exclusive and relevant content.

Can it be used retroactively?

Let's say you decide to use this set of meta-tags for older stories. If it's your original content, I would certainly consider that. The only problem is convincing the crawler to re-crawl those URLs. News sites produce an immense number of pages every day, and it's widely known that news organizations should produce bookmark-ready URLs (ones that won't ever change) for every story, because once the crawler has read the content of a URL it rarely returns to it.

One experiment to prompt a re-crawl is to display the URL on the homepage (which is indexed and crawled several times a day) and wait for the crawler to read that specific URL again. Since Google News does not display a cached version of the URL, it might be necessary to change the content, adding a single line or a very specific keyword somewhere in the text, in order to verify that the file has been crawled again.

On Google News only, and why using them may be a good idea

Unlike the universal organic search results page, at Google News you'll hardly find endless copies of the same article. But there is a locally driven ranking factor for the news clusters at GN. So for The Chicago Tribune, using syndication-source may very well be a way of telling the news crawler that it is a trustworthy channel for that content when searches are made in that state.

Undeniably, such meta-tags will help identify the real owner of news content. But many blogs copy and paste text from news sites onto personal blogs that never come up in queries made at Google News, so the tags can do no more than prevent content theft between news sites.

Ideally, publishers should use both meta-tags, inbound and outbound. It would be a public (logical) display of ethics and may help enhance the news site's reputation among the community of crawlers, spiders and robots. For further information on these meta-tags, please refer to the Google help page.

Klaus Junginger is a reporter for IDGNOW! Brazil and covers mostly search-related themes. Follow Klaus on Twitter at @computerklaus. Klaus's e-mail address is klaus.junginger@nowdigital.com.br
