How to prevent data-scraping of a valuable data web service?


Question

I have a great idea for a Windows Store app, and I'd like to build it. However, it requires a large and valuable database, and I will need to put a service in front of it so that people cannot easily steal the data. My thinking is to host a mobile service on Azure (which I've never tried) and create a .NET Web API project that takes requests and dishes out JSON like candy to a Windows 8 MVVM client. What I don't want is someone sniffing the traffic between the app and the service, figuring out how to GET/POST data the way my app does, and then setting up their own app or website that displays this data, using my bandwidth to make money for themselves.

How can I protect my app-to-database data access so that it can't be reverse engineered?

Also, is this the best setup for developing a high-volume Windows 8 app like this? Do you have a better suggestion?

I know I can use SSL and the like to encrypt traffic in both directions. What I am trying to protect against is someone using Firebug or Fiddler to figure out which parameters can be posted to get a particular record back, and then creating their own site that simply uses my service as its endpoint, siphoning off my data and hogging my bandwidth. For example, just by using Firebug I know I can use https://www.google.com/search?q=dallas to search for the word "dallas" on Google. Even if the page is encrypted, they can see that much in their own browser. So if someone issues the same GET/POST from their own application, they get the same records back and are effectively using my stuff.

Recommended answer

The most straightforward thing you can do is to set up authentication for your users using something like OAuth. This ensures that no communication with your service happens anonymously.
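To make "no anonymous communication" concrete, here is a minimal sketch of a gate that rejects any request without a valid bearer token. It is not OAuth itself and not .NET Web API code: the in-memory token store and the names issue_token, authenticate and handle_request are hypothetical stand-ins for whatever token validation your OAuth provider or library gives you.

```python
import secrets
from typing import Dict, Optional, Tuple

# Hypothetical in-memory token store: token -> user id.
# In a real service this lookup would be done by your OAuth provider or library.
_tokens: Dict[str, str] = {}


def issue_token(user_id: str) -> str:
    """Create an opaque bearer token for a user who has already logged in."""
    token = secrets.token_urlsafe(32)
    _tokens[token] = user_id
    return token


def authenticate(headers: Dict[str, str]) -> Optional[str]:
    """Return the user id for a valid 'Authorization: Bearer <token>' header, else None."""
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not presented:
        return None
    return _tokens.get(presented)


def handle_request(headers: Dict[str, str]) -> Tuple[int, str]:
    """Reject anonymous callers before touching any data."""
    user_id = authenticate(headers)
    if user_id is None:
        return 401, "authentication required"
    return 200, f"records for {user_id}"
```

Every request is now tied to a known user id, which is what makes per-user controls possible at all.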

Once your requests are authenticated, you can place controls on them that won't affect a normal user. You could rate limit or throttle requests, or use any number of other tactics, to make it very expensive in terms of time to siphon off large portions of your data set.
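One way to throttle authenticated users is a sliding-window counter per user id. The sketch below is only an illustration under assumptions: the class name SlidingWindowLimiter, its parameters, and the in-memory storage are all hypothetical, and a service running on several machines would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_calls per user within the last window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record the call and return True, or return False if the user is over the limit."""
        now = self.clock()
        calls = self._calls[user_id]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # over the limit: the caller would typically answer HTTP 429
        calls.append(now)
        return True
```

A request handler would call allow(user_id) after authentication and refuse the request when it returns False. A normal user never notices, while someone trying to pull the whole data set is slowed to whatever rate you choose.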

For instance, you can start blocking requests when you notice a large number of users clustering behind a single IP address. You could also place sensible limits on each user (such as 10 API calls per minute, with each result set limited to 50 records). You get the idea, I'm sure.
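The IP clustering check and the result-set cap can be sketched as follows. The threshold of 5 users per IP, the class IpClusterGuard and the helper clamp_results are assumptions made for illustration; only the cap of 50 comes from the example above. Note that this sketch never expires old entries and will also flag legitimate users who share an address (an office or a mobile carrier), so a real deployment would want expiry and a more forgiving policy.

```python
from collections import defaultdict
from typing import Dict, List, Set

MAX_RESULTS = 50       # result-set cap from the example above
MAX_USERS_PER_IP = 5   # hypothetical threshold; tune it for your own traffic


class IpClusterGuard:
    """Block an IP address once too many distinct users have been seen behind it."""

    def __init__(self, max_users_per_ip: int = MAX_USERS_PER_IP) -> None:
        self.max_users_per_ip = max_users_per_ip
        self._users_by_ip: Dict[str, Set[str]] = defaultdict(set)

    def allow(self, ip: str, user_id: str) -> bool:
        users = self._users_by_ip[ip]
        users.add(user_id)
        # Once the IP exceeds the threshold, every request from it is refused.
        return len(users) <= self.max_users_per_ip


def clamp_results(records: List, requested: int) -> List:
    """Return at most MAX_RESULTS records, whatever page size the client asked for."""
    return records[:max(0, min(requested, MAX_RESULTS))]
```

Capping the page size on the server matters because a scraper controls the request parameters: asking for 500 records still returns 50.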
