How some sites with fake links show up in search engine results


Problem description


These days I come across several Google search results that contain sites with links that exactly match my search words. How is it possible for the sites to dynamically change their content, or rather, how are they fooling Google into indexing their pages for my keywords? I've read about content farms, but that doesn't seem to be the right answer. Can someone let me know what this technique is called? I'll try to understand more about it.

Solution

My understanding is that the only way to get on Google or any other indexing engine is to have the robot actually crawl your site and generate results. Obviously, Google can crawl dynamic sites; however, I find this to be an evolutionary rather than revolutionary change with regard to your question.

What I think is happening behind the scenes is the combination of these things:

  • Content index
  • Prepared index
  • User submitted content
  • Referrer search updates

I'll try to explain each of these on a fictional site that sells music - you have plenty of examples to compare the experience. It will of course be on the example.com domain.

Content index

Obviously, as a site that wants to offer something, you actually have some content. Usually, you group this content somehow. Let's assume our music site can group content by different categories:

  • Author
  • Music genre
  • User submitted
  • Content ratings

Each of these can be represented abstractly as a tag. For example, our site could choose to have example.com/tags/eagles to represent Eagles or example.com/tags/rock to represent all rock bands. Google would be able to index these, so any potential search could yield a link to our site.
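The tag scheme above can be sketched as a simple slug-to-URL mapping; `slugify` and `make_tag_url` are hypothetical helper names, not anything from a real site:

```python
import re

def slugify(name):
    """Lowercase a name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def make_tag_url(name, base="https://example.com/tags/"):
    """Turn any content grouping (author, genre, rating...) into a crawlable tag URL."""
    return base + slugify(name)

print(make_tag_url("Eagles"))      # https://example.com/tags/eagles
print(make_tag_url("Rock music"))  # https://example.com/tags/rock-music
```

Every grouping the site already has thus becomes one more indexable page for Googlebot to find.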

Prepared index

Prepared index is similar, but is a generic index instead of real content. This can be prepared in several ways, such as:

  • Take a dictionary and add all words
  • Crawl a few million pages from the Web (possibly using links provided by search engines!) and get often repeated phrases from there
  • Grab content from free forums
  • Use Wikipedia
  • Get text from freely available books, such as those from Project Gutenberg

Our site would, for example, get any words from texts that are related to music in any way and make tags similar to the previous ones. E.g. just by crawling the Rock music page on Wikipedia, you can get a lot of tags.
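The "often repeated phrases" idea can be sketched as a word-frequency pass over a crawled text; `prepared_tags` and its thresholds are made up for illustration:

```python
import re
from collections import Counter

def prepared_tags(text, min_count=2, min_len=4):
    """Extract frequently repeated words from a crawled corpus (e.g. a
    Wikipedia page) as candidate tags - no real content required."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    return {w for w, c in counts.items() if c >= min_count}

# A toy "crawled" snippet standing in for the Rock music article:
corpus = ("rock music is a broad genre of popular music; "
          "rock bands and rock guitar dominate the genre")
print(sorted(prepared_tags(corpus)))  # ['genre', 'music', 'rock']
```

Each surviving word gets its own example.com/tags/... page, even though the site never wrote a line of prose about it.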

User submitted content

This is something that usually comes after your site is up and running. Let's say that we put a search box on our site and then users come in and type "rock music". Doh, we already knew that, so nothing good from that search. However, let's say we go through our Web server logs and see some searches for langeleik. Now, that would be something we might not have indexed before. Cool, we just generated another tag on our site.

Obviously, Google doesn't know that - so we create an entry in our sitemap and it's there after another Googlebot crawl. When a user searches on Google for "langeleik", one of the links might be a link to example.com/tags/langeleik.
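Mining the server logs for unknown search terms might look like this sketch; the `/search?q=` endpoint and `mine_search_logs` are assumptions for the fictional site:

```python
import re

known_tags = {"rock", "music"}

def mine_search_logs(log_lines, known=known_tags):
    """Scan access-log lines for on-site searches (assuming a hypothetical
    /search?q= endpoint) and return query terms we have no tag for yet."""
    new = set()
    for line in log_lines:
        m = re.search(r"GET /search\?q=([\w+]+)", line)
        if m:
            new.update(t.lower() for t in m.group(1).split("+")
                       if t.lower() not in known)
    return new

logs = ['1.2.3.4 - - "GET /search?q=rock+music HTTP/1.1" 200',
        '5.6.7.8 - - "GET /search?q=langeleik HTTP/1.1" 200']
print(mine_search_logs(logs))  # {'langeleik'}
```

Anything the function returns becomes a new tag page and a new sitemap entry.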

There are other and possibly far more valuable forms of user input - comments, forum posts, etc. Hence the reason there are many generic forums that have no other purpose except hosting forums. It's a great data source and you get new content for free.

In the end, all this should go into your site's sitemap. You can have huge sitemaps.
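Feeding the collected tags to Googlebot then amounts to emitting a sitemap; this is a minimal sketch of the sitemaps.org XML format, with `build_sitemap` as a made-up helper:

```python
def build_sitemap(tags, base="https://example.com/tags/"):
    """Emit a minimal sitemap.xml with one <url> entry per tag page,
    following the sitemaps.org protocol."""
    entries = "\n".join(f"  <url><loc>{base}{t}</loc></url>"
                        for t in sorted(tags))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + entries + "\n</urlset>")

print(build_sitemap({"eagles", "rock", "langeleik"}))
```

Real sites would split this across many sitemap files listed in a sitemap index, since the protocol caps each file at 50,000 URLs.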

Referrals

The last thing is referrals. Again, after your site is up and running, some of the Google searches will come directly to you. That's when you can take advantage of the HTTP Referer header (yes, it's a misspelling - check it out on Wikipedia).

Note that Google search is both:

  • Incomplete
  • Fuzzy

Thus, you can search for "langeleik" above, but some of the links have a title of e.g. "Langeleik and Harpe". Nothing unusual, but note also the reverse - if you search for "langeleik and harpe", it will not only find all pages with both terms, but also pages with one or the other. If we know about harpe, but not about langeleik, and somebody searches for "langeleik and harpe", we will get through the HTTP Referer header a q parameter such as q=langeleik+harpe. Cool - we just got another word to add to our sitemap, if we want.
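Pulling the terms out of such a Referer header is a one-liner with the standard library; `terms_from_referer` is a hypothetical name for the sketch:

```python
from urllib.parse import urlparse, parse_qs

def terms_from_referer(referer):
    """Extract the search terms from the q parameter of a
    Google-style Referer URL ('+' decodes to a space)."""
    q = parse_qs(urlparse(referer).query).get("q", [""])[0]
    return q.split()

ref = "https://www.google.com/search?q=langeleik+harpe"
print(terms_from_referer(ref))  # ['langeleik', 'harpe']
```

Any term not yet in the tag set can then go straight into the sitemap, exactly as with the on-site search logs.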

As for fuzziness, note that when you search for "eagles", you can get everything from birds through NFL teams to a rock band. Thus, even though we are a music site, we might expand our horizon (if desired) to latest NFL news - something totally unrelated and very useful for some sites.

Conclusion - it's an illusion

I consider the combination of all these a very rich sitemap building source. You can very easily generate millions of unique tags using the above techniques. Thus, "anything" you type will be found on example.com/tags.

However, you have to note that this is just an illusion. For example, if you search for "ertfghedctgb" (easily typed on regular QWERTY keyboard - ert + fgh + edc + tgb), you will most likely not get anything from Google (I do not currently). It just was not common enough for anybody to put this in their sitemaps (or not common enough for search engines to index it).

