Why should I access a URL using a User Agent?

Question

My code was similar to the code in this question. Extending it as in the accepted answer worked for me too.

Before now, I had used this kind of code and never ran into any exception.

Now, my questions are:

  1. Why should I use the User-Agent?
  2. Why did it become necessary to use it in my program?
  3. Is it necessary to use it in every program?

  • If yes, how did my program work so well before?
  • If not, why do I have to deal with it now?

Please note:

I use the program I fixed every day, and it never had any issues before.

Recommended answer

Many web administrators want to keep bots off their sites: bots scrape data at regular intervals, but the owner earns no ad revenue from those hits. Bots therefore bring no obvious benefit while still consuming resources, so administrators block anything that does not look like a browser used by a human. As you have seen, it is completely trivial to make your program pretend to be something else, so this technique is ineffective against anyone who knows what they are doing. In general, though, it is considered polite not to pretend to be something you are not (internet etiquette).
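To make the blocking mechanism concrete, here is a minimal, hypothetical sketch of the kind of naive server-side check described above. The class name `NaiveBotFilter`, the method `shouldBlock`, and the token list are illustrative assumptions, not any real site's rules. It rejects a request whose User-Agent is missing, contains the default agent token of a common HTTP tool, or does not start with `Mozilla/` (as most mainstream browser strings do).

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical, deliberately naive User-Agent filter (illustration only). */
public final class NaiveBotFilter {
    // Illustrative tokens only; real sites use their own, usually unpublished, rules.
    private static final List<String> TOOL_TOKENS =
            List.of("java/", "wget/", "curl/", "python-urllib/");

    private NaiveBotFilter() {}

    /** Returns true if the request would be rejected (for example with HTTP 403). */
    public static boolean shouldBlock(String userAgent) {
        if (userAgent == null || userAgent.isBlank()) {
            return true; // no User-Agent header at all
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : TOOL_TOKENS) {
            if (ua.contains(token)) {
                return true; // looks like the default agent of a known HTTP client
            }
        }
        // Anything else that does not start like a typical browser string.
        return !ua.startsWith("mozilla/");
    }
}
```

Any client that simply sends a browser-like `Mozilla/...` string passes this check, which is why the technique stops only clients that keep their library's default agent.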

User agent strings can technically be anything you want, but most applications follow a common pattern such as $product/$version. You can see some examples here.
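As a sketch of the `$product/$version` pattern, the hypothetical helper below builds such a string and optionally appends a parenthesised comment, which is where many well-behaved bots put a contact URL. `UserAgents`, `build`, and the example product name are assumptions for illustration, and the token check is a simplification of what the HTTP specification allows.

```java
import java.util.Objects;

/** Hypothetical helper that builds a User-Agent in the common "product/version" form. */
public final class UserAgents {
    private UserAgents() {}

    /**
     * Returns "product/version", followed by " (comment)" when a non-empty
     * comment is given (e.g. a contact URL for a bot).
     */
    public static String build(String product, String version, String comment) {
        Objects.requireNonNull(product, "product");
        Objects.requireNonNull(version, "version");
        if (!isValidToken(product) || !isValidToken(version)) {
            throw new IllegalArgumentException(
                    "product and version must be non-empty and contain no whitespace or '/'");
        }
        String ua = product + "/" + version;
        if (comment != null && !comment.isEmpty()) {
            ua += " (" + comment + ")";
        }
        return ua;
    }

    // Simplified token check: non-empty, no whitespace, no '/'.
    private static boolean isValidToken(String s) {
        return !s.isEmpty() && s.chars().noneMatch(c -> Character.isWhitespace(c) || c == '/');
    }
}
```

An honest string like `MyFeedReader/1.0 (+https://example.com/bot)` tells the site operator who is calling and how to reach them, instead of impersonating a browser.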

For more information, see the Wikipedia article on user agents.

So, in short:

  1. You should use one because servers expect every client to send one.
  2. The library probably has a default user agent (e.g. JavaLib/1.1), but you had to set your own for the reasons stated above.
  3. It is not necessary for every program, but pretending to be a browser is useful for bots. Just remember that it is considered impolite. For example, wget works for me 99% of the time without modification, but some sites block its user agent.
  4. The string is not generated; it is simply copied from an existing browser (IE 6.0 in this case), and the server you are connecting to seems to accept it.
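The question's code is not shown, so the following is only a sketch assuming it uses Java's `HttpURLConnection` (suggested by the `JavaLib/1.1` example). It shows point 2: overriding the library's default agent, which for the JDK typically identifies the JDK itself, by setting the `User-Agent` request header before the request is sent. The class `UserAgentFetch`, its methods, and the agent string used below are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: fetch an http(s) URL with an explicit User-Agent. */
public final class UserAgentFetch {
    private UserAgentFetch() {}

    /** Opens (but does not yet send) a request carrying the given User-Agent. */
    public static HttpURLConnection open(String url, String userAgent) throws IOException {
        // openConnection() does no network I/O; the request goes out on connect()/getInputStream().
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        // Must be set before the request is sent: setRequestProperty throws
        // IllegalStateException once the connection is established.
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    /** Sends the request and returns the response body decoded as UTF-8. */
    public static String fetch(String url, String userAgent) throws IOException {
        HttpURLConnection conn = open(url, userAgent);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `UserAgentFetch.fetch("https://example.com/", "MyFeedReader/1.0 (+https://example.com/bot)")` identifies the program honestly. Passing a copied browser string instead is what the answer's point 4 describes, with the etiquette caveat from point 3.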
