Prevent site data from being crawled and ripped

Question

I'm looking into building a content site with possibly thousands of different entries, accessible by index and by search.

What measures can I take to prevent malicious crawlers from ripping off all the data from my site? I'm less worried about SEO, although I wouldn't want to block legitimate crawlers altogether.

For example, I thought about randomly changing small bits of the HTML structure used to display my data, but I guess that wouldn't really be effective.

Recommended answer

Any site that is visible to human eyes is, in theory, potentially rippable. If you're going to even try to be accessible then this, by definition, must be the case (how else will speaking browsers be able to deliver your content if it isn't machine readable?).

Your best bet is to look into watermarking your content, so that at least if it does get ripped you can point to the watermarks and claim ownership.
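
As a rough illustration of what watermarking text could look like, here is a minimal sketch that hides an identifying tag in an entry as zero-width Unicode characters. This is only one possible approach and not something the answer prescribes; the function names `embed_watermark` and `extract_watermark` and the tag format are hypothetical, and a scraper that strips non-printing characters would remove the mark.

```python
# Hypothetical sketch: hide a tag in text using zero-width characters.
ZERO = "\u200b"  # zero-width space, encodes bit 0
ONE = "\u200c"   # zero-width non-joiner, encodes bit 1


def embed_watermark(text: str, tag: str) -> str:
    """Append `tag` to `text` as an invisible run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in tag.encode("utf-8"))
    hidden = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return text + hidden


def extract_watermark(text: str) -> str:
    """Recover the tag hidden by embed_watermark, or '' if none is found."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore any incomplete trailing byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


marked = embed_watermark("Entry text", "site-123")
print(extract_watermark(marked))
```

If a copy of an entry turns up elsewhere, running `extract_watermark` on it would show whether the hidden tag survived. Serving a different tag per visitor or session could also hint at which account did the ripping, though that is an extension beyond what the answer describes.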
