Pages not indexed by Google


Question

My company has Google Search running on our sites, indexing all pages as far as I know. I've developed a document system that is also being indexed by Google. The pages in the system are dynamically generated, so I have www.mysite.com/doc.aspx?id=234, www.mysite.com/doc.aspx?id=236, etc., which are indexed. The thing is that some random pages (say, www.mysite.com/doc.aspx?id=235) are not indexed for some unknown reason. Where do I look to have this resolved? Any ideas?

Answer

Here is a short and very simplified outline of how Google processes your site(s):

discovery -> crawling -> indexing -> ranking (-> feedback)

Discovery: the process of Google discovering the pages of your site(s). This can be done either via links in HTML or via a sitemap.xml (and URLs in on-page JavaScript, RSS or Atom feeds, ... basically any URL Google can find somewhere).
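For a document system like the one in the question, where pages only exist as doc.aspx?id=N, discovery is the first thing worth ruling out: a page that no other page links to may simply never be found. Below is a minimal sketch of generating a sitemap.xml for those URLs, assuming the document IDs can be enumerated from your own data store (the ID list and output path are placeholders; the base URL is taken from the question):

```python
# Sketch: build a sitemap.xml listing every dynamically generated document page,
# so Google does not have to discover each doc.aspx?id=N via a link somewhere.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "http://www.mysite.com/doc.aspx?id={}"

def build_sitemap(doc_ids, path="sitemap.xml"):
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for doc_id in doc_ids:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = BASE_URL.format(doc_id)
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    # In the real system the IDs would come from the database behind doc.aspx.
    build_sitemap([234, 235, 236])
```

The resulting file can be submitted in Google's webmaster tools or referenced from robots.txt, which makes discovery independent of how well the documents are linked internally.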

Crawling: the process of Google fetching the content of a discovered URL (and pushing newly found URLs into the discovery queue).

Indexing: storing the discovered and crawled content in their database and making it searchable.

Ranking: matching the indexed content against a user query and - if it is important enough - returning it as a visible SERP listing to the user.

Feedback: based on the click/no-click behavior and data collected from other sources (presumably ISDN data, the Google Toolbar, Chrome browser reports, ...), Google gathers feedback about user behavior on its SERPs (and after the click).

  • Between each and every step there are a lot of quality metrics (the last step is basically just a quality-metric collection step).
  • Each and every step reports back to the previous steps.

So basically, even if you communicate all your URLs to Google (e.g. via sitemap.xml), Google will not necessarily crawl all of your URLs, or index them, or rank them visibly.
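A useful way to see where a specific missing page (say doc.aspx?id=235) drops out of that pipeline is to check whether Googlebot ever fetched it at all. A rough sketch, assuming the web server writes a common/combined-format access log (the log path and format are assumptions, not from the question):

```python
# Sketch: list the doc.aspx IDs Googlebot has actually requested, to separate
# "never crawled" from "crawled but not indexed". Log path and format are assumptions.
import re

LOG_PATH = "access.log"
DOC_REQUEST = re.compile(r'GET /doc\.aspx\?id=(\d+)')

crawled_ids = set()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        match = DOC_REQUEST.search(line)
        if match:
            crawled_ids.add(int(match.group(1)))

print("doc IDs fetched by Googlebot:", sorted(crawled_ids))
```

If an ID never shows up here, the problem is on the discovery/crawling side; if it was fetched but still does not appear in the index, the quality metrics between crawling and indexing are the more likely culprit.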

OK, so what is the low-hanging fruit for getting more pages into the index (where they at least have a chance to rank for something)?

  • Communicate only one URL per page (use HTTP 301 redirects, canonical tags, and clean up all on-page links; see the sketch after this list)
  • Make your site faster (huge impact)
  • Make it lighter KB-wise (nice impact, mostly because it makes the site faster, too)
  • Put more unique content on your pages
  • Prevent duplicate content
  • Get external links (from other sites) to your pages (the total number is not what matters, but steady growth over time)
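The first point is the one most directly related to the original question, because www.mysite.com/doc.aspx?id=235 and WWW.mysite.com/DOC.aspx?id=235&utm_source=mail look like different pages to a crawler and split your signals. Here is a small sketch of what "one URL per page" means in practice, assuming the document id is the only query parameter that identifies a page (everything beyond the URLs in the question is a placeholder):

```python
# Sketch: map every variant of a document URL onto a single canonical form.
# Whatever this returns is what 301 redirects, canonical tags and on-page
# links should all agree on. The "id" whitelist is an assumption.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

KEEP_PARAMS = {"id"}  # only the document id identifies a page

def canonical_url(url: str) -> str:
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k in KEEP_PARAMS]
    return urlunsplit((
        (parts.scheme or "http").lower(),
        parts.netloc.lower(),
        parts.path.lower(),        # doc.aspx and DOC.aspx serve the same page on IIS
        urlencode(sorted(query)),
        "",                        # drop fragments
    ))

print(canonical_url("http://WWW.mysite.com/DOC.aspx?utm_source=mail&id=235"))
# -> http://www.mysite.com/doc.aspx?id=235
```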

P.S.: Just as a side note - the crawling step is optional. Even uncrawled URLs (e.g. if they were blocked via robots.txt) can get indexed (and rank), but that is not very common.
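Conversely, a robots.txt rule is also a common reason why individual URLs never get crawled in the first place, so it is worth checking the missing IDs against it. A quick sketch using Python's standard urllib.robotparser (the host and URL pattern are taken from the question; the ID list is a placeholder):

```python
# Sketch: check whether specific document URLs are disallowed for Googlebot
# by the site's live robots.txt.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("http://www.mysite.com/robots.txt")
robots.read()  # fetch and parse the live robots.txt

for doc_id in (234, 235, 236):
    url = f"http://www.mysite.com/doc.aspx?id={doc_id}"
    allowed = robots.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "BLOCKED for Googlebot")
```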

