Understanding Search Engine Penalties

A friend of mine contacted me asking my opinion on why Google isn’t loving Celebrity Cowboy. Celebrity Cowboy, a celebrity blog that should be ranking well for a variety of terms, is for some reason continually underperforming in its niche.

I told him that I would take a look at it, and while my speciality isn’t really search engines, I did notice a few things right off the bat.

Code

Positioning
One of the first things I noticed about the XHTML generated by the theme used at Celebrity Cowboy is that the blogroll sits near the top of the page, with more than twenty items linking out to other sites. While it now appears only on the front page, it wasn’t always like this, and it could have led to a black mark for the site.

After the blogroll comes the content, and then a list of internal links to each of the more than two dozen categories. Could Google be penalizing the site for having so many outbound links at the top of the page’s code, and so many links near the bottom? Could it see this as an attempt to affect search engine rankings by stuffing links into the site?
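If you want to check your own theme for this, a quick script can count how many external links appear in the page source before the main content begins. This is only an illustrative sketch: it assumes Python’s standard html.parser, treats any link whose host differs from your own as external, and uses a hypothetical marker string (here `<div id="content">`) to decide where the content starts. It says nothing about how Google actually weighs link position.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Records every <a href> in document order and whether it leaves the site."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.links = []  # (href, is_external) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        host = urlparse(href).netloc
        # Relative URLs have no host, so they stay on the same site.
        # Note: "www.example.com" and "example.com" count as different hosts here.
        self.links.append((href, bool(host) and host != self.site_host))


def external_links_before(page_html, site_host, content_marker):
    """Count external links that appear in the source before content_marker."""
    cutoff = page_html.find(content_marker)
    head = page_html if cutoff == -1 else page_html[:cutoff]
    collector = LinkCollector(site_host)
    collector.feed(head)
    collector.close()
    return sum(1 for _, external in collector.links if external)


# Hypothetical page: two blogroll links, then the content area.
page = (
    '<div id="blogroll"><a href="http://a.example/">A</a>'
    '<a href="http://b.example/">B</a></div>'
    '<div id="content"><a href="/2007/11/some-post/">Post</a></div>'
)
print(external_links_before(page, "www.example.com", '<div id="content">'))  # 2
```

Running it against a saved copy of your front page and a single-post page gives a rough picture of how link-heavy the top of your source is.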

Things like this have happened before, and Google has always been harsh about them. The flip side, though, is that all of these links are relevant. Google doesn’t penalize relevant links, does it?

With Google’s war against paid links, I wouldn’t be surprised if a few sites got caught in the crossfire, and because these links are site-wide, Google may have mistaken them for paid links.

No doubt Google would like sites to nofollow their blogrolls and any other external links that aren’t part of the normal daily content, even when those links are relevant.
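For anyone editing a blogroll or sidebar by hand, here is a rough sketch of adding rel="nofollow" to the external links in an HTML fragment. It is regex-based, so it assumes fairly tidy markup like a typical blogroll rather than arbitrary HTML, and the function name and host parameter are my own inventions; on WordPress a plugin or theme setting would be the more usual route.

```python
import re
from urllib.parse import urlparse

A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
# The lookbehind stops "data-href" / "data-rel" from matching.
HREF = re.compile(r"""(?<![\w-])href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)
REL = re.compile(r"""(?<![\w-])rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def nofollow_external(fragment, site_host):
    """Add nofollow to every <a> whose href points to a different host."""

    def fix(match):
        tag = match.group(0)
        href = HREF.search(tag)
        if not href:
            return tag
        host = urlparse(href.group(2)).netloc
        if not host or host == site_host:
            return tag  # internal link: leave it followable
        rel = REL.search(tag)
        if rel is None:
            return tag[:-1] + ' rel="nofollow">'
        quote, value = rel.group(1), rel.group(2)
        if "nofollow" in value.lower().split():
            return tag  # already nofollowed
        # Keep existing rel values (e.g. "me") and append nofollow.
        new_rel = f"rel={quote}{(value + ' nofollow').strip()}{quote}"
        return tag[:rel.start()] + new_rel + tag[rel.end():]

    return A_TAG.sub(fix, fragment)


print(nofollow_external('<a href="http://friend.example/">Friend</a>', "www.example.com"))
# <a href="http://friend.example/" rel="nofollow">Friend</a>
```

Internal and relative links are left untouched, so category and archive links keep passing weight around the site.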

Validation
The theme that Celebrity Cowboy is using doesn’t validate. Google has shown time and time again that if you don’t work at keeping your code valid, you can drop in the rankings, and sometimes even be marked as a “bad” site.

Sometimes sites get listed on stopbadware.org just because their JavaScript doesn’t work correctly, or advertising doesn’t load properly. I have seen this happen to more than a few sites.

Fixing as many validation issues as possible could help remove the penalty placed on the site, since Google’s indexing bots might then be able to index the content more efficiently and without error.

One of the first things I noticed was an ID used more than once. This probably doesn’t affect Google’s search bots, but it is not correct XHTML: an ID must be unique within a page, so classes should be used for repeating items, not IDs.
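Duplicate IDs are easy to miss by eye, so a small checker helps. This sketch uses Python’s standard html.parser to count id attributes across a page and report any that appear more than once. It is not a substitute for a real validator, which will also catch the other errors.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Counts how often each id attribute value appears."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<img ... />) also arrive here via handle_startendtag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(page_html):
    """Return the id values used more than once, sorted alphabetically."""
    counter = IdCounter()
    counter.feed(page_html)
    counter.close()
    return sorted(value for value, count in counter.ids.items() if count > 1)


page = '<div id="post">One</div><div id="post">Two</div><div id="footer"></div>'
print(duplicate_ids(page))  # ['post']
```

Any value it reports is a candidate for turning into a class in the theme’s templates and stylesheet.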

Correcting such things should also improve how various browsers render the site, which could have the side effect of increasing traffic, page views, and even links to the blog.

Just Plain Strange
There was one more thing about the coding of the site that really got me scratching my head. The header image is displayed via CSS, so rather than showing an image wrapped in a proper hyperlink, the coder chose to use JavaScript to turn the div that displays the header into a clickable element that uses location.href to bring the visitor back to the index page.

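To me this seems like a very bad way to achieve the effect, and probably not one that Google looks kindly on: unlike an ordinary anchor tag, a JavaScript click handler gives search engine crawlers no link to follow.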
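If you suspect your own theme does the same thing, you can scan a page for elements other than links that navigate through an inline onclick handler. This rough sketch only catches handlers written inline in the markup (a handler attached from a separate script file would not show up), and looking for the string location.href is a simple heuristic.

```python
from html.parser import HTMLParser


class ScriptedLinkFinder(HTMLParser):
    """Finds non-<a> elements whose inline onclick navigates via location.href."""

    def __init__(self):
        super().__init__()
        self.found = []  # (tag, id) pairs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        handler = attributes.get("onclick") or ""
        if tag != "a" and "location.href" in handler:
            self.found.append((tag, attributes.get("id")))


def scripted_links(page_html):
    finder = ScriptedLinkFinder()
    finder.feed(page_html)
    finder.close()
    return finder.found


page = """<div id="header" onclick="location.href='/';"></div><a href="/">Home</a>"""
print(scripted_links(page))  # [('div', 'header')]
```

The usual fix is to keep the CSS background image but put a real `<a href="/">` inside the header div, so crawlers and visitors without JavaScript both get a working link.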

Content

One issue that Google has with many sites, especially celebrity sites, is “thin content”, and it constantly adjusts rankings based on this issue. Many articles on Celebrity Cowboy have fewer than one hundred words, and this can make Google cranky.

An example of a post with really thin content is the “George Clooney Reacts to Nicole Kidman’s Pregnancy” post. There are fewer than two dozen words here, plus an image. Surely the writer could add a few more points about George Clooney, Nicole Kidman, and older actresses being pregnant.

I suggest increasing the number of articles with over one hundred words, and reducing blockquotes from other sites, lists of external links, and other content that doesn’t add to the usefulness of Celebrity Cowboy.

There should be at least one hundred words of fresh, original content in as many articles as possible.
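To find which posts fall under that line, you could run each post body through a word counter. The sketch below is one way to do it, assuming you can export post bodies as HTML (getting them out of WordPress is not shown). It skips text inside blockquote, script, and style elements so that quoted material from other sites doesn’t count as original, and the 100-word threshold is this article’s rule of thumb, not a number Google has published.

```python
from html.parser import HTMLParser

SKIPPED = {"blockquote", "script", "style"}


class OriginalText(HTMLParser):
    """Collects text that is not inside a blockquote, script, or style element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._depth = 0  # how many skipped elements we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if not self._depth:
            self.parts.append(data)


def original_word_count(post_html):
    parser = OriginalText()
    parser.feed(post_html)
    parser.close()
    return len(" ".join(parser.parts).split())


def thin_posts(posts, minimum=100):
    """Return titles of posts with fewer than `minimum` original words."""
    return [title for title, body in posts.items() if original_word_count(body) < minimum]


posts = {
    "Short post": "<p>" + "word " * 20 + "</p>",
    "Long post": "<p>" + "word " * 150 + "</p>",
    "Mostly quoted": "<blockquote>" + "word " * 150 + "</blockquote><p>Wow.</p>",
}
print(thin_posts(posts))  # ['Short post', 'Mostly quoted']
```

Note that the “Mostly quoted” post is flagged even though it is long, which matches the advice above about leaning less on blockquotes.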

The problem gets worse when you strip away everything around a single post. Remove the images, external links, advertisements, repeated content like the popular articles list, and the about text, and you are left with very little actual content for Google to sift through, surrounded by a very large amount of code.
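One rough way to put a number on this is the ratio of visible text to total HTML on a page. This is a heuristic that some SEO tools report, not a metric Google has documented, and the sketch below only counts characters (ignoring script and style contents); it says nothing about quality. Comparing the figure for a single-post page before and after trimming sidebars and widgets shows how much of the page is code.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Accumulates text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._hidden = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        if not self._hidden:
            self.parts.append(data)


def text_to_code_ratio(page_html):
    """Visible text characters (whitespace collapsed) divided by total HTML characters."""
    if not page_html:
        return 0.0
    parser = VisibleText()
    parser.feed(page_html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return len(text) / len(page_html)


print(text_to_code_ratio("<p>abc</p>"))  # 0.3
```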

Another way of reducing the thin content issue on the front page, each subsequent page, and the archive areas is to increase the number of stories shown per page. This might not be as important on the front page, but it is definitely an issue on other pages and in archive areas where only summaries are shown.

There is a plug-in for WordPress that lets you change how many posts are shown depending on where the user is on the site. I would suggest installing this plug-in and increasing each page to between fifteen and twenty-five posts. While this will make pages longer, it will mean more content per page for Google to see, and the increase in code and loading times should be negligible.

Comments
Another suggestion on how to help your blog would be to find a way to increase user participation in the form of comments. You could highlight the person who has commented the most, or feature the best comment of the week. The cost would be an outbound link, but the reward could be more comments, which can help keep a post fresh in the eyes of search engines, and contribute to the content indexed on the page.

I have actually ranked fairly high for a keyword that I never wrote; the page ranked because a reader used it in a comment and Google indexed it.

Links

Besides the “thin content” penalty that Celebrity Cowboy may be getting, another issue might be the sheer number of links on each and every page. Also, from talking to the owner of the site, it seems that at one point they were both selling text link advertisements and promoting unrelated websites, neither of which Google looks very kindly on.

Content Scrapers

We all know what it means to deal with spam blogs, but it looks like Celebrity Cowboy has been targeted hard. Searching the popular search engines for specific post titles shows some very interesting results. It seems that Google has basically given in to content scrapers in this case, and Celebrity Cowboy is nowhere to be seen.

One thing that might help identify content scrapers is a WordPress plugin that puts a copyright notice before the content in the RSS feed only. FeedBurner should also show you who is using your content for bad things. Then it is just a matter of “nicely” asking them to stop, or using .htaccess to block their server’s IP address from even seeing the feed.
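Here is a sketch of the feed-only copyright notice idea, written as a plain function rather than an actual WordPress plugin (a real plugin would be PHP hooked into the feed output). The function name, its parameters, and the notice wording are all hypothetical. The point is that the notice, with a link back to the original post, is added only when building feed items, so regular visitors never see it, while a scraper republishing the feed carries it along.

```python
from html import escape


def add_feed_notice(item_html, post_url, blog_name, year):
    """Prepend a copyright line linking to the original post.

    Hypothetical helper: call it only when generating RSS items, never for normal pages.
    """
    link = f'<a href="{escape(post_url, quote=True)}">{escape(blog_name)}</a>'
    notice = f"<p>&copy; {year} {link}. This post first appeared at the link above.</p>"
    return notice + item_html


item = add_feed_notice(
    "<p>Post body</p>", "http://www.example.com/2007/11/post/", "Celebrity Cowboy", 2007
)
print(item)
```

Searching for the notice’s wording later is also a cheap way to spot sites that republish the feed verbatim.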

You can see what is happening with the content scrapers by taking an article with a unique title and searching for it on Google.

Celebrity Cowboy Search Results

You will see the Celebrity Cowboy post about Constantine Maroulis showing up on a content scraper site ahead of Celebrity Cowboy, where it was originally written. Unfortunately, this seems to be happening with nearly every post.
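To check whether a page in those results is really a copy, rather than a post that just shares a title, you can compare the two texts directly. The sketch below measures containment: the fraction of the original post’s five-word sequences (shingles) that also appear in the suspect page. Fetching and extracting the page text is left out, and both the five-word shingle size and any cut-off you pick (say, treating anything above 0.5 as a likely copy) are assumptions to tune, not established values.

```python
import re


def shingles(text, size=5):
    """Set of overlapping lowercase word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)


original = "the quick brown fox jumps over the lazy dog near the river bank"
scraped = "Scraped from elsewhere: " + original + " plus some spam links"
print(containment(original, scraped))  # 1.0
print(containment(original, "completely unrelated celebrity gossip about movies today"))  # 0.0
```

Containment is used instead of a symmetric similarity because scrapers usually wrap the copied post in their own ads and links, which would drag a symmetric score down.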

Images

One thing I noticed on a quick look at the generated code is that every image and embedded item in the content has extra inline style added to remove padding, margins, and borders. This only bulks up the amount of code that Google sees, and contributes to the “thin content” problem I mentioned earlier.

This is most likely being added thanks to the WYSIWYG editor built into WordPress.
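Until the editor is configured not to add them, the extra styles could be cleaned up before a post is saved or displayed. This is a regex-based sketch, so it assumes tidy, editor-generated markup; the function name is hypothetical. It only touches img, embed, and object tags, and only removes a style attribute when every declaration in it is a margin, padding, or border reset to 0 or none. Anything else, such as a float, is left alone.

```python
import re

TAG = re.compile(r"<(img|embed|object)\b[^>]*>", re.IGNORECASE)
STYLE = re.compile(r"""\s+style\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)
# A single declaration that just zeroes out margin, padding, or border.
RESET = re.compile(r"^\s*(margin|padding|border)\s*:\s*(0|0px|none)\s*$", re.IGNORECASE)


def strip_reset_styles(html):
    """Drop style attributes on img/embed/object tags that only reset spacing."""

    def clean_tag(match):
        tag = match.group(0)
        style = STYLE.search(tag)
        if not style:
            return tag
        declarations = [d for d in style.group(2).split(";") if d.strip()]
        if declarations and all(RESET.match(d) for d in declarations):
            return tag[:style.start()] + tag[style.end():]
        return tag  # style does something else (e.g. float): keep it

    return TAG.sub(clean_tag, html)


print(strip_reset_styles('<img src="a.jpg" style="margin: 0; padding: 0; border: 0;" alt="x">'))
# <img src="a.jpg" alt="x">
```

If the theme’s stylesheet already sets these properties to zero for images, removing the inline copies should not change how the posts look.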

Advertising

Recently, there was a big issue with sites selling text link advertisements, and Google took steps to persuade people to stop selling such links. While Celebrity Cowboy no longer has paid text links on the site, Google may not have fully restored it in its search engine rankings.

I haven’t heard of anyone else really having such issues; in fact, most people who saw their PageRank drop due to a penalty for selling links didn’t notice any shift in their search engine results.

Context

When a site changes both its design and its server, as Celebrity Cowboy has, I am sure that Google “raises an eyebrow”. Google, or at least the program that indexes our blogs, probably asks itself, “Has the site been sold, or changed so much that we should re-index it? Is it really the same site we have come to know and love, or should we put it in the sandbox for a while?”

Surely, with all the changes Google has seen on the site, it isn’t going to carry on without giving it an extra dose of scrutiny. Unfortunately, if there was some extra weight behind the site in the past, it may have lost it by changing IP addresses. I haven’t heard of this happening before, but that doesn’t mean it isn’t possible.

Google has done some strange things in the past, and Celebrity Cowboy isn’t the only example of something like this happening. You could be doing everything right for a long period of time, yet something changes at Google, and you are pushed so far down the rankings that you almost have to start again with a new site.

Steps Already Taken

Over on Celebrity Cowboy, they have been trying very hard to take every step they can think of to rectify their search engine problem, and here are just a few that were mentioned to me:

  1. Changed the blogroll to only show up on the front page
  2. Show categories in the right column after the content
  3. Only show categories with at least 5 posts
  4. Removed all text link ads
  5. Removed all unrelated, self-owned cross promotion
  6. Cleaned up theme code to remove unnecessary tags
  7. Worked on building more links within the celebrity niche

It seems like many steps have been taken, but the search engines, especially Google, haven’t changed their view of Celebrity Cowboy. Could it be a waiting game now, since the last major change was the move to a new server in mid-November 2007? I doubt it. I am positive that more work needs to be done on the site before the search engine penalty is removed.

Suggestion Rundown

  • Get your code to validate
  • Increase content per post
  • Promote people that comment
  • Decrease outbound links
  • Fight content scrapers
  • Deal with garbage code being added to posts
  • Nofollow all links that are not in your content, or part of your own site

There is no reason why Google would prefer content scrapers over the original content providers, and I hope this is just an error on their part that they will eventually fix. The reality, though, is that their mistakes affect bloggers and business owners, and if they don’t get better at providing helpful, valuable, and correct search results, people will eventually move to a search engine that does.

For Celebrity Cowboy, and other sites that have been similarly affected, I know this can be very frustrating, but the best thing you can do is complain loudly and hope that the Internet backs you up in your fight to be properly considered by the major search engines.

Hopefully this will change soon, and the tips I have listed will help enough for the site to at least rank higher than the scrapers.

