Fastest service for crawling web pages or invoking APIs (iTunes in particular)?


Question

We need to download metadata for all iOS apps on a daily basis. We plan on extracting the information by crawling the iTunes website and by using the iTunes search API. Since there are 700K+ apps, we need an efficient way to do this.

One approach is to set up a bunch of scripts on EC2 and run them in parallel. Before we embark down this path, are there services like 80legs that people have used to accomplish a similar task? Essentially, we want something to help us crawl hundreds of thousands of pages (or make a bunch of API calls) very fast.

Recommended answer

You might want to look into Apple's Enterprise Partner Feed (EPF): http://www.apple.com/itunes/affiliates/resources/documentation/itunes-enterprise-partner-feed.html. It will probably be much cheaper than renting a bunch of EC2 machines or building a crawling infrastructure to scrape the data. From the EPF description itself:

The Enterprise Partner Feed is a data feed of the complete set of metadata from iTunes and the App Store. It is available for affiliate partners to fully incorporate aspects of the iTunes and App Store catalogs into a web site or app.

EPF has two feed modes

iTunes generates the EPF data in two modes:

full mode
incremental mode

The full export is generated weekly and contains a complete snapshot of iTunes metadata as of the day of generation. The incremental export is generated daily and contains records that have been added or modified since the last full export. The incremental exports are located relative to the full export on which they are based.

Obviously, you'd use the full export to populate your data initially, then use the incremental exports for the daily updates.
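As a rough illustration of that workflow, here is a minimal Python sketch of loading a full export and then applying an incremental one on top of it. The record layout (`app_id`, `name`, `version`) and the function names are made up for this example and are not the actual EPF file format; parsing the feed files is left out, and the sketch only handles added or modified records, not removals.

```python
from typing import Dict, Iterable

# Hypothetical record shape for illustration only; real EPF files
# have their own schema, which is not reproduced here.
Record = Dict[str, str]
Catalog = Dict[str, Record]


def load_full(records: Iterable[Record]) -> Catalog:
    """Build a fresh catalog from a (weekly) full export."""
    return {record["app_id"]: record for record in records}


def apply_incremental(catalog: Catalog, records: Iterable[Record]) -> Catalog:
    """Return a copy of the catalog with added/modified records upserted."""
    updated = dict(catalog)
    for record in records:
        # New IDs are inserted; existing IDs are overwritten.
        updated[record["app_id"]] = record
    return updated


# Assumed sample data standing in for parsed feed files.
full_export = [
    {"app_id": "1", "name": "Alpha", "version": "1.0"},
    {"app_id": "2", "name": "Beta", "version": "2.3"},
]
incremental_export = [
    {"app_id": "2", "name": "Beta", "version": "2.4"},   # modified
    {"app_id": "3", "name": "Gamma", "version": "1.0"},  # added
]

catalog = apply_incremental(load_full(full_export), incremental_export)
```

Since each incremental export is described as containing everything added or modified since the last full export, applying the most recent incremental to the most recent full snapshot should be enough to bring the catalog up to date.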

Good luck.
