Is using an API always preferable to scraping?

Question

I am looking for a large amount of data (100k at minimum) from Web 2.0 sites for a research project. I am thinking of using the exposed API to get the data, but would scraping work better in this case?

The API is good (less work compared to writing a scraper), but I really have no idea how much time I would need to collect that much data, considering there is usually a time/call limit. I'm not saying scraping has no limits; I'm just curious which is the better way of getting the job done.
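The time question can be roughed out before writing any collection code. The sketch below is only an illustration: the function name and the example figures (100 items per call, 150 calls per hour) are assumptions, not the limits of any particular site, and it ignores network latency, retries, and pagination overhead.

```python
import math


def estimate_collection(total_items, items_per_call, calls_per_window, window_seconds):
    """Return (calls_needed, min_wait_seconds) when the rate limit is the bottleneck.

    All four parameters are hypothetical inputs you would read from the
    API documentation of the site in question.
    """
    calls_needed = math.ceil(total_items / items_per_call)
    # The first window's quota is usable immediately, so only the
    # remaining windows have to be waited for.
    windows_to_wait = math.ceil(calls_needed / calls_per_window) - 1
    return calls_needed, windows_to_wait * window_seconds


# Assumed example: 100k items, 100 per call, 150 calls allowed per hour.
calls, wait_seconds = estimate_collection(100_000, 100, 150, 3600)
print(calls, "calls, at least", wait_seconds / 3600, "hours of waiting")
```

With these assumed limits the job needs 1000 calls and at least 6 hours of waiting, which is slow but perfectly workable for a one-off research crawl.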

Recommended answer

If the site provides an API, then use it.

It's much simpler, more generic, and legal. If the site is reasonably popular, you will often find API wrappers for the language you're using.

Of course, if you develop a scraper, you won't have the API's limitations, but maybe the site doesn't allow being scraped, and that's exactly why they have an API for users/developers.
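One common (though not the only) place a site states what crawlers may fetch is its robots.txt file; the terms of service may say more. As a minimal sketch, Python's standard `urllib.robotparser` can evaluate such rules. The robots.txt content, the `example.com` URLs, and the `research-bot` user agent below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one would be downloaded from
# the site's /robots.txt path.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def check_urls(robots_txt, user_agent, urls):
    """Map each URL to True/False depending on whether robots.txt permits it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


result = check_urls(
    SAMPLE_ROBOTS,
    "research-bot",  # assumed user agent name
    ["https://example.com/public/page", "https://example.com/private/data"],
)
for url, allowed in result.items():
    print(url, "allowed" if allowed else "disallowed")
```

A permissive robots.txt does not by itself mean scraping is welcome, so this check complements reading the site's terms rather than replacing it.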

About Jeffrey04's comment:

Let's see... this is a moral thing. If you want, you can obtain that amount of data several times over without being blocked. You can always change User-Agents, change IP after N requests (of course, all of this programmatically), and do some tricks with cookies, but that's not the idea. What I mean is that the advice against scraping a website is not about the risk of getting banned from it.
