Is web scraping allowed?

Question

I'm working on a project that requires certain statistics from another website, and I've created an HTML scraper that gets this data every 15 minutes, automatically. However, I stopped the bot now, as in their terms of use, they mention they do not allow it.

I really want to respect this, and especially if there's a law prohibiting me from taking this data, but I've been contacting them through email several times without a single answer, so now I've come to the conclusion that I'll simply grab the data, if it is legal.

On certain forums I've read that it IS legal, but I would much rather get a more "precise" answer here on StackOverflow.

And let's say that this is in fact not illegal, would they have any software to spot my bot making several connections every 15 minutes?

Also, when I talk about taking their data, I mean a single number for each "team", which I will then convert into our own number.

Recommended answer

I'll quote the answer by Pablo Hoffman (Scrapinghub co-founder) to "What is the legality of web scraping?", which I found on another site:

First things first: I am not a lawyer and these comments are solely based on my experience working at Scrapinghub, please seek legal assistance accordingly.

Here are a few things to consider when scraping public data from websites (note that the following addresses only US law):

  • As long as they don't crawl at a disruptive rate, scrapers do not breach any contract (in the form of terms of use) or commit a crime (as defined in the Computer Fraud and Abuse Act).
  • A website's user agreement is not enforceable as a browsewrap agreement, because companies do not provide sufficient notice of the terms to site visitors.
  • Scrapers access website data as a visitor, following paths similar to those of a search engine. This can be done without registering as a user (and explicitly accepting any terms).
  • In Nguyen v. Barnes & Noble, Inc. the court ruled that simply placing a link to the terms of use at the bottom of a webpage is not sufficient to "give rise to constructive notice." In other words, there is nothing on a public page that would imply that merely accessing the information is subject to any contractual terms. Scrapers give neither explicit nor implicit assent to any agreement, and therefore breach no contract.
  • Social networks, for example, define the value of becoming a user (based on the call-to-action on public pages) as the ability to: i) gain access to full profiles, ii) identify common friends/connections, iii) get introduced to others, and iv) contact members directly. As long as scrapers make no attempt to perform any of these actions, they do not gain "unauthorized access" to the service and thus do not violate the CFAA.
  • A thorough evaluation of the legal issues involved can be seen here: http://www.bna.com/legal-issues-raised-by-the-use-of-web-crawling-and-scraping-tools-for-analytics-purposes
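
The first point above hinges on not crawling "at a disruptive rate". The quoted answer does not say what rate counts as disruptive, so the following is only a minimal sketch of one way a scraper could space out its requests. The class name `PoliteThrottle` and the 2-second interval are assumptions made for illustration, not anything taken from the answer or from any library.

```python
import time


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests to one site.

    Hypothetical helper for illustration; the interval is chosen by the
    caller and is not a legally meaningful threshold.
    """

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until min_interval_s has passed since the previous call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval_s - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


if __name__ == "__main__":
    throttle = PoliteThrottle(min_interval_s=2.0)  # assumed interval
    for page in ["team-1", "team-2", "team-3"]:    # placeholder page names
        slept = throttle.wait()
        print(f"would fetch {page} after sleeping {slept:.1f}s")
```

In a job like the 15-minute one described in the question, `wait()` would be called before each page fetch, so that the several connections made per run are spread out rather than sent in a burst.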
