Heavy iTunes Connect scraping

Question

I'm looking at different options to get the sales reports and other data out of the iTunes Connect website. Since Apple doesn't provide an API, all the solutions I found are based on scraping the page.

As I need the information for a product that we offer, I'm not that happy to give all the iTunes accounts to a 3rd party service. This is why I want to scrape it myself or use a product that runs on our servers.

My questions are:

  • Does anyone have experience with how frequently Apple changes the web front-end?
  • Does anyone have experience with the maximum number of requests from one server to the site? I'm afraid of being banned by Apple.
  • Is there anything else I should keep in mind that could cause serious trouble?

In case anyone is interested in the tools I looked at, here is a list:

Services:

Products:

Open Source Tools:

UPDATE:

I started using Kirby's Python script (https://github.com/kirbyt/appdailysales) and it works very well.

Solution

Does anyone have experience with how frequently Apple changes the web front-end?

I can't speak for all of iTunes Connect, only downloading daily sales reports. My script was rock solid and didn't require a single change between November 2009 and September 2010. This changed in September 2010 when Apple rolled out the new web site. This broke the old script, and a new one had to be written. Since rolling out the new web site, I make changes every few days to handle the tweaks from Apple. I'm hoping the tweaks will end soon.

Take a look at the download page for appdailysales.py. The dates will give you a general idea of how often I make changes to the script.

https://github.com/kirbyt/appdailysales

Again, this is only for daily sales reports. I'm not sure how frequently other areas of iTC change.

Does anyone have experience with the maximum number of requests from one server to the site? I'm afraid of being banned by Apple.

I've not experienced this, but my server runs the script only once a day. I frequently hit the iTC when working on the script, but not enough to cause a load on Apple's servers.
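If a scraper has to make more than one request per run, one cautious approach is to space the requests out. The following is only a sketch: the `Throttle` class and the five-second interval are illustrative assumptions, not something taken from appdailysales.py, and nothing here reflects a known Apple limit.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests.

    Hypothetical helper; the clock and sleep functions are injectable
    so the behaviour can be checked without actually waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was allowed

    def wait(self):
        """Sleep if the previous request was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=5.0)  # assumed value, tune conservatively
    for page in ("login", "report"):       # placeholder names, not real URLs
        throttle.wait()
        print("would fetch:", page)
```

Calling `throttle.wait()` before every request keeps the traffic from one server at a steady, low rate even when a run needs several pages.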

Is there anything else I should keep in mind that could cause serious trouble?

I don't know what might get you in trouble with Apple, but one thing that does cause a serious headache is changes to the web site. While the new version of the web site makes screen scraping the site easier, it did involve writing a new script. Apple does not give you a heads up that they are changing something. You find out after the fact when something in your screen scraper breaks.

If you depend on the data daily, then you have to drop everything and make the necessary fixes. And there is nothing stopping Apple from rolling out another new site sometime in the future.
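Since breakage only shows up after the fact, it may help to sanity-check each download so a site change is noticed the same day. This is a rough sketch, not part of appdailysales.py: the function name `check_report`, the tab delimiter, and the column names in the usage example are all assumptions to adapt to the report you actually receive.

```python
def check_report(text, required_columns, delimiter="\t"):
    """Return a list of problems found in a downloaded report.

    An empty list means the download at least looks like a report.
    The expected columns and delimiter are supplied by the caller;
    they are assumptions, not a documented format.
    """
    stripped = text.lstrip()
    if not stripped:
        return ["download is empty"]

    problems = []
    # A response starting with '<' is probably an HTML login or error page.
    if stripped.startswith("<"):
        problems.append("got markup instead of a report (login or error page?)")

    # Compare the first line against the columns the caller expects.
    header = [col.strip() for col in stripped.splitlines()[0].split(delimiter)]
    missing = [col for col in required_columns if col not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    sample = "SKU\tUnits\nexample.sku\t3\n"  # made-up data for illustration
    issues = check_report(sample, ["SKU", "Units"])
    print("OK" if not issues else issues)
```

A daily job could send a notification whenever the returned list is non-empty, which turns a silent failure into a prompt to go and fix the scraper.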

Hope that helps.

-KIRBY
