Fastest service for crawling web pages or invoking APIs (iTunes in particular)?


Question

We need to download metadata for all iOS apps on a daily basis. We plan to extract the information by crawling the iTunes website and by using the iTunes Search API. Since there are 700K+ apps, we need an efficient way to do this.

One approach is to set up a bunch of scripts on EC2 and run them in parallel. Before we go down this path, are there services like 80legs that people have used to accomplish a similar task? Essentially, we want something to help us crawl hundreds of thousands of pages (or make a large number of API calls) very fast.
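For reference, the "scripts run in parallel" approach could look roughly like the sketch below. It is only an illustration: the lookup URL shape, the comma-separated `id` batching, the `results` key, and the batch size and worker count are assumptions rather than anything confirmed in this thread, and it ignores any rate limits the API may enforce.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

# Assumed endpoint shape; check the iTunes Search API docs before relying on it.
LOOKUP_URL = "https://itunes.apple.com/lookup?id={ids}"


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Split the app IDs into batches of at most `size` items."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_batch(batch: List[int]) -> List[dict]:
    """Fetch metadata for one batch of app IDs (performs a network call)."""
    url = LOOKUP_URL.format(ids=",".join(str(i) for i in batch))
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response).get("results", [])


def fetch_all(
    ids: Iterable[int],
    fetch: Callable[[List[int]], List[dict]] = fetch_batch,
    batch_size: int = 100,  # hypothetical value
    workers: int = 8,       # hypothetical value
) -> List[dict]:
    """Run the batches through a thread pool and flatten the results in order."""
    results: List[dict] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_results in pool.map(fetch, chunked(ids, batch_size)):
            results.extend(batch_results)
    return results
```

Passing a different `fetch` function makes it possible to swap in a page scraper or a stub for testing without touching the batching logic.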

Recommended answer

You might want to look into Apple's Enterprise Partner Feed (EPF). It will probably be much cheaper than getting a bunch of EC2 machines or building up the crawling infrastructure to scrape the data. From the EPF description itself:

The Enterprise Partner Feed is a data feed of the complete set of metadata from iTunes and the App Store. It is available for affiliate partners to fully incorporate aspects of the iTunes and App Store catalogs into a web site or app.

EPF has two feed modes

iTunes generates the EPF data in two modes:

full mode
incremental mode

The full export is generated weekly and contains a complete snapshot of iTunes metadata as of the day of generation. The incremental export is generated daily and contains records that have been added or modified since the last full export. The incremental exports are located relative to the full export on which they are based.

Obviously, you'd use the full mode to populate your data initially, then use the incremental mode for the daily updates.
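As a rough illustration of that workflow, the sketch below loads a full snapshot and then overlays a daily incremental export on top of it. It assumes the exports have already been parsed into dictionaries and that each record carries a unique key; the `application_id` field name and the record shape are hypothetical, not taken from the EPF specification, and deletions are not handled.

```python
from typing import Dict, Iterable, Mapping

Record = Mapping[str, object]


def load_full(records: Iterable[Record], key: str = "application_id") -> Dict[object, Record]:
    """Build the local store from a weekly full export."""
    return {record[key]: record for record in records}


def apply_incremental(
    snapshot: Dict[object, Record],
    incremental: Iterable[Record],
    key: str = "application_id",  # hypothetical key field
) -> Dict[object, Record]:
    """Overlay added or modified records onto a copy of the snapshot."""
    merged = dict(snapshot)
    for record in incremental:
        merged[record[key]] = record  # insert new records, overwrite changed ones
    return merged


# Usage with made-up records:
full = load_full([
    {"application_id": 1, "title": "Alpha", "version": "1.0"},
    {"application_id": 2, "title": "Beta", "version": "2.3"},
])
daily = [
    {"application_id": 2, "title": "Beta", "version": "2.4"},   # modified
    {"application_id": 3, "title": "Gamma", "version": "0.9"},  # added
]
current = apply_incremental(full, daily)
```

Because each incremental export is described as covering everything changed since the last full export, the overlay is applied to the full snapshot it is based on, and re-applying the same incremental gives the same result.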

Good luck.
