If you suspect that your website's text content is being duplicated on another site, you should run a comparison search. Duplicate content is common on the internet; by one estimate, 29% of content on the web is duplicated.
Duplicate content can hurt a website's search engine ranking, so SEOs and webmasters should read Google's guidelines on duplicate content. If a large amount of a site's content is found to be duplicated, Google may blacklist the site. Be careful: the last thing any webmaster wants is for Google to penalize the site for publishing duplicate content.
Simply defined, duplicate content is text content that appears on more than one website. Duplicate content can also appear within the same site, but when it appears on multiple sites it creates a problem for search engines. The three most important issues search engines face when they encounter duplicate content are:
- They don’t know which version to include or exclude in indexing
- They don’t know whether to direct the links to one or separate pages
- They don’t know which website owns the original content
Duplicate text also creates problems for the site owners themselves:
- Search engines show only the version they judge most authoritative, so the other sites lose visibility
- Link equity is diluted, because inbound links are split across the copies instead of pointing to a single page, weakening each page's standing with search engines
In some cases, content writers, webmasters, and SEOs don't check content for plagiarism, which allows duplicate content to appear. It is therefore important for site owners and webmasters to ensure they do not publish plagiarized content. Several plagiarism-checking tools are available on the internet, and you should use one to ensure the integrity of the content on your website.
Another way duplicate content appears is when a new site goes live and copies its home page text into press releases that it puts out on wire services. Alarm bells will ring at Google, and the site can be blacklisted.
When major breaking news occurs, reporters from TV and other media covering the event will immediately publish it on their sites. Similarly, when a government or government agency issues a press release, the same content gets uploaded to multiple sites. Search engines are smart enough not to penalize such duplication.
Scrapers and some bloggers constantly surf the internet for content, and wherever they find suitable material, they copy and paste it. It is up to the search engines to identify and punish such sites, and Google and other search engines do so when they come across duplicate content by examining the timestamps of when each copy was uploaded.
URL parameters and variations of them can also cause duplicate content. One such problem is the order in which parameters appear. For example:
- www.widgets.com/red-widgets?color=red is a duplicate of www.widgets.com/red-widgets
- www.widgets.com/red-widgets?color=red&cat=3 is a duplicate of www.widgets.com/red-widgets?cat=3&color=red
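To illustrate, here is a minimal Python sketch (the widgets.com URLs are the article's own placeholders) showing how sorting query parameters makes the two orderings compare as the same page:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Return a canonical form of a URL by sorting its query parameters."""
    parts = urlsplit(url)
    # Sort query parameters alphabetically so their order no longer matters
    sorted_query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, sorted_query, parts.fragment))

# Both parameter orders normalize to the same URL
a = normalize_url("http://www.widgets.com/red-widgets?color=red&cat=3")
b = normalize_url("http://www.widgets.com/red-widgets?cat=3&color=red")
print(a == b)  # True
```

A crawler or audit script built on this idea can detect that such URL variants are one page, which is exactly the judgment search engines have to make.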
Duplicate content can also be created when URLs carry different session IDs, or when a printer-friendly version of a page exists: multiple versions of the same content get indexed.
If you refer to your website as www.mysite.com in all of your content pages and external links, avoid also referring to it as mysite.com; mixing the two can create duplicate content as well.
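One common way to enforce a single hostname is a permanent (301) redirect from the bare domain to the www version. Sketched here as an Apache mod_rewrite rule, using this article's placeholder domain mysite.com; nginx and other servers have equivalents:

```apache
# Redirect every request for mysite.com to www.mysite.com with a 301,
# so search engines only ever see one canonical host
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
```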
The first step to take when you suspect that content from your website has been copied is to run a comparison check on searchenginereports.net. You can do this if you know the URL where your content has been copied. After running the test, compare the search results. If a large amount of content has been duplicated, you know a problem exists. You can also compare search results directly in Google.
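A quick manual alternative, assuming you want to test one specific passage: paste an exact sentence from your page into Google in quotation marks and exclude your own domain with the -site: operator, so only copies hosted elsewhere are returned:

```
"an exact sentence taken from your page" -site:mysite.com
```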
- Maintain consistency when creating all internal links on the website. For example, all links should go to http://www.mysite.com and not to http://mysite.com.
- Make sure syndicating content sites link back to the same original website and not variations of it.
- As an extra safeguard, you can add a self-referential rel=canonical link to your original content.
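As a sketch, the self-referential canonical tag goes in the head of the original page (the URL below is a hypothetical page on this article's placeholder domain):

```html
<!-- Tells search engines that this URL is the authoritative version,
     even if copies or parameter variations of the page exist -->
<link rel="canonical" href="http://www.mysite.com/original-article" />
```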
Scrapers usually cannot hurt your website. A scraper site with no visitors and no original writing does not confuse Google. In the rare case that Google does get confused, you can use the Google Scraper Report tool to inform them, and they will fix the problem.
Digitally sign your content: if you fear that your content will be duplicated without your consent, you can claim authorship of it. Google Authorship used to serve this purpose, crediting every piece of content you authored to you, though Google has since discontinued the program.
You can take harsh action against plagiarists, but this requires involving lawyers. Before opting for severe measures, make sure the infringement is worth a costly lawsuit.
Remember that Google visits websites regularly. If it comes across content on a website that was published earlier on another website, it will ignore the later duplicate version and move on. Don't forget that Google employs a huge number of math PhDs to manage issues like these. Duplicate content began appearing around 1997 and reached its peak in 2005; it has subsided since search engines started penalizing such sites.
Duplicate content has a long history, and Google and other search engines are well aware of it and of how to deal with it. So if you discover duplicate content when you run a comparison search, do not get overly worried. Take the precautions discussed above, and avoid uploading duplicate content to your own website. If you publish any guest content, make sure you credit the original author.