Save every web site that will be visited by the user


Problem Description

I have a question and I don't know where to start.

I want to listen on port 80 so that I can save every web site that will be visited by the user, whatever the browser is.

http://fiddler2.com/docimages/SessionsList.png

This is what I am looking for in part of my UI. Is this the right way to do it, or is there a better way? And what is the best language to work with?

I have already written a key logger, but it is not suitable for monitoring browsers.

Thanks.

Solutions

First of all, you need to sniff your user's HTTP activity and collect data about it; I will say more about that data below. For example, look at the solution presented in the CodeProject article "A Network Sniffer in C#".

I'm sure that would be quite enough material for you to develop this part.
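
For illustration, here is a minimal raw-socket sketch of that idea (it is not the code from that article): it assumes Windows, administrator rights and a placeholder local IP address, and it only reports captured packet sizes; parsing the TCP/HTTP payload out of the buffer is left to you.

// Minimal raw-socket sniffer sketch (assumes Windows and administrator rights).
// The local IP address below is a placeholder for the interface to listen on.
using System;
using System.Net;
using System.Net.Sockets;

class SnifferSketch
{
    static void Main()
    {
        IPAddress localIp = IPAddress.Parse("192.168.1.10"); // placeholder interface address

        using (Socket socket = new Socket(AddressFamily.InterNetwork,
                                          SocketType.Raw, ProtocolType.IP))
        {
            socket.Bind(new IPEndPoint(localIp, 0));
            socket.SetSocketOption(SocketOptionLevel.IP, SocketOptionName.HeaderIncluded, true);

            // SIO_RCVALL: ask the driver to deliver all incoming IP packets on this interface.
            socket.IOControl(IOControlCode.ReceiveAll,
                             new byte[] { 1, 0, 0, 0 }, new byte[] { 0, 0, 0, 0 });

            byte[] buffer = new byte[65535];
            while (true)
            {
                int received = socket.Receive(buffer);
                Console.WriteLine("Captured {0} bytes", received);
                // A real sniffer would now parse the IP/TCP headers, filter for port 80
                // and reassemble the HTTP request text from the payload.
            }
        }
    }
}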



Now, you want to download the resources visited by the user. You should understand that this is not just a set of URLs. Each time the user gets some page (or other resource), the browser sends an HTTP request on that person's behalf. The request is sent as a packet at the TCP level (the transport-level protocol, http://en.wikipedia.org/wiki/Transport_layer, http://en.wikipedia.org/wiki/Transmission_Control_Protocol) and contains a good deal of HTTP-specific information: the URL, which may be not just a page address but can also carry URL parameters (the query string); the HTTP request method (such as "GET", "HEAD", "POST", "PUT", etc.); some information about the client side, the referrer, and so on; and, importantly, HTTP parameters, which are typically included when the user submits a Web form or can be added via Ajax. For further detail, please see:

http://en.wikipedia.org/wiki/HTTP,

http://en.wikipedia.org/wiki/Query_string,

http://en.wikipedia.org/wiki/Ajax_%28programming%29.
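
For illustration, this is roughly what two such requests look like on the wire (the host name, paths and form fields below are made up):

GET /search?q=fiddler&page=2 HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1)
Referer: http://www.example.com/

POST /login HTTP/1.1
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 29

username=ahmad&password=12345

The first line carries the method and the URL with its query string, the headers carry client and referrer information, and the body of the POST carries the Web-form parameters.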



To download the same set of resources as your user did, you need to gather all this detail for each request, mimic it in your own HTTP request, obtain the HTTP response and save the block of data for each response. All you need is the class System.Net.HttpWebRequest:

http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx.
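
A minimal sketch of that replay step (the URL, header values and output file name are placeholders for whatever your sniffer recorded):

// Rebuild one captured GET request with System.Net.HttpWebRequest and save the
// response body to disk. All literal values below are placeholders.
using System;
using System.IO;
using System.Net;

class ReplaySketch
{
    static void Main()
    {
        string capturedUrl = "http://www.example.com/page?id=42";   // from the sniffed request line
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(capturedUrl);
        request.Method = "GET";                                     // copy the captured method
        request.UserAgent = "Mozilla/5.0";                          // copy the captured User-Agent
        request.Referer = "http://www.example.com/";                // copy the captured Referer

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (FileStream file = File.Create("saved_response.bin"))
        {
            body.CopyTo(file);   // save the raw block of data returned for this request
        }
    }
}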



Now, you should understand that you might not get exactly the same data as your user did. You can only get the same content if the site, or at least the visited pages and other fetched resources, are static. In the general case this is not so. If it is not immediately obvious to you why, it would need a fairly involved explanation, which would probably take a separate answer.



-SA


Hello Ahmad,



I think what you are trying to achieve is very similar to the functionality provided by a proxy server or a firewall. Consider using such software. Another way I can think of is that you could create a BHO (Browser Helper Object), but it would have to be installed on every machine. Setting up a proxy server (free or commercial) will be much easier than developing similar functionality yourself, but a sketch follows below if you do want to code it yourself.
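
A very rough sketch of such a logging proxy in C# could look like the following. It assumes the browser is configured to use 127.0.0.1:8080 as its HTTP proxy, handles only plain HTTP (no CONNECT/HTTPS, no keep-alive), and reads each request in a single chunk, which a real proxy must not rely on.

// Minimal logging forward proxy sketch for plain HTTP traffic.
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

class LoggingProxySketch
{
    static void Main()
    {
        TcpListener listener = new TcpListener(IPAddress.Loopback, 8080);
        listener.Start();
        Console.WriteLine("Logging proxy listening on 127.0.0.1:8080");

        while (true)
        {
            using (TcpClient browser = listener.AcceptTcpClient())
            using (NetworkStream browserStream = browser.GetStream())
            {
                // Read the request the browser sent to the proxy (single read: a sketch-level simplification).
                byte[] buffer = new byte[65536];
                int read = browserStream.Read(buffer, 0, buffer.Length);
                if (read <= 0) continue;

                string requestText = Encoding.ASCII.GetString(buffer, 0, read);
                int endOfLine = requestText.IndexOf("\r\n");
                if (endOfLine < 0) continue;

                // When a browser talks to a proxy, the request line carries the absolute URL.
                Console.WriteLine(requestText.Substring(0, endOfLine));   // e.g. "GET http://example.com/ HTTP/1.1"

                // Find the target server in the Host header.
                string host = null;
                foreach (string line in requestText.Split(new[] { "\r\n" }, StringSplitOptions.None))
                    if (line.StartsWith("Host:", StringComparison.OrdinalIgnoreCase))
                        host = line.Substring(5).Trim();
                if (host == null) continue;

                int port = 80;
                int colon = host.IndexOf(':');
                if (colon >= 0)
                {
                    port = int.Parse(host.Substring(colon + 1));
                    host = host.Substring(0, colon);
                }

                // Forward the request and relay the response back to the browser.
                using (TcpClient server = new TcpClient(host, port))
                using (NetworkStream serverStream = server.GetStream())
                {
                    serverStream.Write(buffer, 0, read);
                    int n;
                    while ((n = serverStream.Read(buffer, 0, buffer.Length)) > 0)
                        browserStream.Write(buffer, 0, n);
                }
            }
        }
    }
}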



Regards,


Now, I want to answer your request to "save every site". There can be a big misunderstanding here. I hope you really understand what a Web site is. It may look as if you cannot explain what you want, or that I cannot understand what you want to achieve, but the situation is much simpler than that.



The notion of "saving a page", or some other single resource, does not make sense in 100% of cases, but in most real-life cases it is possible. In contrast, the problem of "saving a site" is a typical example of an ill-posed problem. And unlike some other ill-posed problems, the solution of this one is generally infeasible.



More exactly, it is feasible in some special cases: the case of a static site where all pages (or other resources) are reachable from some (or the main) page of the site, for example when the site is static and provides a comprehensive site map. In such a case you can download that site map using, say, System.Net.HttpWebRequest, parse the obtained HTML, then use System.Net.HttpWebRequest again for every URL. Or, in more complex cases, you simply load all known pages, parse each of them, find all new pages (those you have not downloaded so far), and continue downloading all of the links recursively, as sketched below. If this is what satisfies you, you can do it.
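
A rough sketch of that recursive approach (the start URL and the same-site filter are placeholders; extracting links with a regular expression and ignoring relative URLs are deliberate simplifications):

// Recursive crawl sketch for the static-site case described above.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class CrawlSketch
{
    static string Download(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }

    static void Main()
    {
        string startUrl = "http://www.example.com/sitemap.html";   // placeholder start page
        var visited = new HashSet<string>();
        var queue = new Queue<string>();
        queue.Enqueue(startUrl);

        while (queue.Count > 0)
        {
            string url = queue.Dequeue();
            if (!visited.Add(url)) continue;                       // already downloaded

            string html = Download(url);
            File.WriteAllText("page_" + visited.Count + ".html", html);

            // Find absolute links on the same (placeholder) site that we have not seen yet.
            foreach (Match m in Regex.Matches(html,
                     "href\\s*=\\s*\"(http://www\\.example\\.com[^\"]*)\"",
                     RegexOptions.IgnoreCase))
            {
                string link = m.Groups[1].Value;
                if (!visited.Contains(link)) queue.Enqueue(link);
            }
        }
    }
}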



However, many complicated sites are not like that, and these days very many sites are complicated enough not to be like that. Such sites are dynamic: the HTTP responses may depend on the particular HTTP request and be unique. You could try to estimate the number of all possible requests to try out by considering some maximum HTTP block size and enumerating every possible request (even restricted to the same domain name). In practice this is totally infeasible; I don't even want to estimate it. If my estimate came to some billions of years of posting (assuming a realistic bandwidth), and you found that I was wrong and that "only" a thousand years would be needed to try all variants, would it make the task any easier? :-)



But even that might not be enough. Even a single page may return HTTP responses with pseudo-random data in them. This is not my fantasy; it is commonplace for games, for example. In such cases, "saving" such a page is not merely infeasible, it simply makes no sense.



-SA


