Ethics of robots.txt


Question

I have a serious question. Is it ever ethical to ignore the presence of a robots.txt file on a website? These are some of the considerations I have in mind:

  1. If someone puts a website up, they're expecting some visits. Granted, web crawlers use bandwidth without clicking on the ads that may support the site, but the site owner chose to put the site on the web, so how reasonable is it for them to expect that they'll never be visited by a bot?

  2. Some sites apparently use a robots.txt precisely to keep their site from being crawled by Google or some other utility that might grab prices and so let people do price comparisons easily. They have private search engines on the site, so they obviously want people to be able to search it; apparently they just don't want people to be able to easily compare their information with other vendors'.

As I said, I'm not trying to be argumentative; I would just like to know whether anyone has ever come up with a case where it's ethically permissible to ignore the presence of a robots.txt file. I cannot think of one, mainly because people (or businesses) are paying money to put up their websites, so they should be able to tell the Googles, Yahoos, and other search engines of the world that they don't want to be in their indices.

To put this discussion in context, I'd like to create a price comparison website, and one of the major vendors has a robots.txt that basically prevents anyone from grabbing their prices. I'd like to be able to get their information but, as I said, I can't justify simply ignoring the wishes of the site owner.

I have seen some very sharp discussion here, and that's why I would like to hear the opinions of developers who follow Stack Overflow.

By the way, there is some discussion of this topic on a Hacker News question, but it seems to focus mainly on the legal aspects.

Recommended answer

Arguments:

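  1. A robots.txt file is an implicit license, especially since you're aware of it. Continuing to scrape their site could therefore be seen as unauthorized access (i.e., hacking). That sucks, but similar arguments have been made in other legal cases recently (not directly related to robots.txt, but in relation to other "passive controls").
  2. Grabbing prices doesn't violate copyright law, including the DMCA, because copyright doesn't cover factual information, only creative work.
  3. Ethically, you shouldn't grab prices, because the vendor should be able to change prices without worrying about being accused of bait-and-switch by someone coming from your site.
  4. Have you taken the high road, explained the website to them, and indicated that you'd like to include them in your list of vendors? Perhaps they'll like the idea and actually expose the data in a manner that's easy for you to consume and less resource-intensive for them to produce.
  5. There is no law directly regarding robots.txt; it's netiquette, which is generally followed. Don't be one of the "bad guys".
  6. Some people filter robots because they use URL links to perform "actions", such as adding things to a cart, and the robots leave their database with tons of abandoned carts.
  7. Some people filter robots because they have exclusive prices that they can't advertise publicly due to agreements with their vendor. Exposing these prices publicly on your site may put them in a bad position.
  8. In this economy, if a company doesn't want to advertise itself in every way possible, it's their own fault that you aren't including them.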
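
If you do go ahead with a crawler for the vendors that allow it, honoring robots.txt takes very little code. The following is only a minimal sketch, assuming Python and its standard-library `urllib.robotparser`; the robots.txt content, the `PriceCompareBot` user agent, the `vendor.example` URLs, and the `is_allowed` helper are all made up for illustration and are not taken from the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download it from
# the vendor's site (for example with RobotFileParser.set_url() and read()).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /prices/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "PriceCompareBot"  # hypothetical user-agent name
    for url in (
        "https://vendor.example/prices/widget",
        "https://vendor.example/about",
    ):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{url}: {verdict}")
```

A crawler built this way simply skips any URL for which `is_allowed` returns False, which leaves the vendor's stated wishes intact while you ask them for access directly (argument 4).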
