How do websites like torrentz.eu collect their content?


Question

I would like to know how some search websites get their content. I used torrentz.eu as the example in the title because it aggregates content from several sources. I would like to know what is behind this system: do they 'simply' parse all the websites they support and then show the content? Do they use some web service? Or both?

Answer

You are looking for the crawling aspect of Information Retrieval (see the Wikipedia article Web_crawler).

Basically, crawling is: given an initial set S of websites, try to expand it by exploring the links, i.e., find the transitive closure(1), as in the sketch below.
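As a rough illustration, here is a minimal breadth-first crawler in Python using only the standard library. The max_pages cap and the timeout are illustrative assumptions; a real crawler would additionally respect robots.txt, canonicalize URLs, throttle requests, and persist an index.

```python
import urllib.parse
import urllib.request
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href target of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns the URLs visited.

    max_pages bounds the crawl, since the full transitive closure
    is rarely computable in practice (see footnote (1) below).
    """
    seen = set(seeds)
    frontier = deque(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                page = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # unreachable, non-HTML, or malformed page: skip it
        extractor = LinkExtractor()
        extractor.feed(page)
        for href in extractor.links:
            absolute = urllib.parse.urljoin(url, href)  # resolve relative links
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited
```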

Some websites also use focused crawlers, if they aim to index only a subset of the web in the first place; see the sketch below.
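A focused crawler can be sketched as the same loop with a relevance test applied to the frontier. Here the test is a hypothetical domain whitelist; ALLOWED_HOSTS is made up purely for illustration, and real focused crawlers often score page content rather than just hostnames.

```python
import urllib.parse

ALLOWED_HOSTS = {"tracker-one.example", "tracker-two.example"}  # hypothetical


def is_relevant(url):
    """Keep a link on the frontier only if it points at a site we index."""
    host = urllib.parse.urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

# Wiring it into the crawl() sketch above means guarding the frontier:
#     if is_relevant(absolute) and absolute not in seen: ...
```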

P.S. Some websites do neither, and instead use a service such as the Google Custom Search API, Yahoo BOSS, or the Bing Developer APIs (for a fee, of course), relying on that provider's index rather than building one of their own.
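For example, a hedged sketch of querying the Google Custom Search JSON API; the API key and search-engine ID are placeholders you would obtain from Google, and the parsing assumes the API's documented JSON response with an "items" list.

```python
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"    # placeholder: issued in the Google Cloud console
CX = "YOUR_ENGINE_ID"       # placeholder: the custom search engine ID


def google_search(query, num=10):
    """Return result URLs for a query via the Custom Search JSON API."""
    params = urllib.parse.urlencode(
        {"key": API_KEY, "cx": CX, "q": query, "num": num}
    )
    url = "https://www.googleapis.com/customsearch/v1?" + params
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return [item["link"] for item in data.get("items", [])]
```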

P.P.S. This is a theoretical approach to how one could do it; I have no idea how the website mentioned in the question actually works.

(1) Due to time constraints, the transitive closure is usually not actually found; something close enough to it is computed instead.

