WebClient.DownloadString() Not Producing Exact HTML

Question

So here's the deal. I'm creating a spider bot for a website that scans all of its product pages and records the product data. I'm using C# and the WebClient class to download each page's HTML as a string. The site I'm crawling must be specially built, because the HTML returned by WebClient.DownloadString() differs from the HTML I see when I open the page in a browser and view its source. This seems intentional, because the only information I can't get is the price.

Does anyone know a workaround for this problem, or can anyone explain what is happening? Thanks.

Answer

The site is probably using the User-Agent string to decide what content to send. The example here shows how to set the User-Agent header.
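A minimal sketch of that approach, assuming the site decides what to send based on the User-Agent header alone: it sets a browser-style User-Agent on a WebClient before calling DownloadString. The class name `BrowserLikeClient`, its methods, and the specific User-Agent value are illustrative assumptions, not part of the original answer; if the site looks for something particular, copy the string your own browser sends (visible in its developer tools).

```csharp
using System;
using System.Net;

// Hypothetical helper class for illustration; not from the original answer.
public static class BrowserLikeClient
{
    // Example desktop-browser User-Agent string. This exact value is an assumption;
    // replace it with the one your own browser sends if the site expects it.
    public const string DefaultUserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    // Creates a WebClient whose requests carry the given User-Agent header,
    // so a server that inspects it may return the same HTML a browser receives.
    public static WebClient Create(string userAgent = DefaultUserAgent)
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.UserAgent] = userAgent;
        return client;
    }

    // Downloads a page as a string using a freshly configured client,
    // so each request is guaranteed to carry the User-Agent header.
    public static string DownloadAsBrowser(string url)
    {
        using (var client = Create())
        {
            return client.DownloadString(url);
        }
    }
}
```

Under this assumption, the crawler only needs to call `BrowserLikeClient.DownloadAsBrowser(url)` where it previously called `DownloadString` on a bare `WebClient`. If the price is still missing, the server may also be checking other headers or cookies, which this sketch does not set. On .NET 6 and later, `WebClient` is marked obsolete in favor of `HttpClient`, so expect a compiler warning there.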
