WebClient.DownloadString() returns string with peculiar characters

Views: 893 · Posted: 2015/11/24 14:59:58 · Tags: c# asp.net .net character-encoding special-characters


Problem description


I have an issue with some content that we are downloading from the web for a screen scraping tool that I am building.

In the code below, the string returned by the WebClient DownloadString method contains some odd characters in the downloaded source for a few (not all) web sites.

I recently added the HTTP headers shown below; previously the same code ran without them, with the same result. I have not tried variations on the 'Accept-Charset' header, and I don't know much about text encoding beyond the basics.

The characters, or character sequences, that I am referring to are:

"ï»¿"

and

"Â"

These characters are not seen when you use "view source" in a web browser. What could be causing this and how can I rectify the problem?

string urlData = String.Empty;
WebClient wc = new WebClient();

// Add headers to impersonate a web browser. Some web sites 
// will not respond correctly without these headers
wc.Headers.Add("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12");
wc.Headers.Add("Accept", "*/*");
wc.Headers.Add("Accept-Language", "en-gb,en;q=0.5");
wc.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");

urlData = wc.DownloadString(uri);
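One way to see what the server actually sent is to inspect the raw bytes instead of the decoded string. The sketch below is a hypothetical diagnostic, not part of the original question; `ByteInspector` and its method names are made up for illustration. It assumes the bytes were fetched with `WebClient.DownloadData`, which returns the response body without any text decoding.

```csharp
using System;

public static class ByteInspector
{
    // Formats up to `count` leading bytes as hex, e.g. "EF-BB-BF".
    public static string FirstBytesHex(byte[] data, int count)
    {
        return BitConverter.ToString(data, 0, Math.Min(count, data.Length));
    }

    // True if the data begins with the UTF-8 byte-order mark EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        return data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }
}
```

For example, with the `wc` and `uri` from the code above, `byte[] raw = wc.DownloadData(uri);` followed by `Console.WriteLine(ByteInspector.FirstBytesHex(raw, 16));` prints the first bytes of the response. If the output starts with `EF-BB-BF`, the page begins with a UTF-8 byte-order mark.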

Solution

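"ï»¿" is the windows-1252 representation of the octets EF BB BF. That's the UTF-8 byte-order mark (BOM), which implies that your remote web page is encoded in UTF-8 but you're reading it as if it were windows-1252. According to the docs (http://msdn.microsoft.com/en-us/library/system.net.webclient.encoding.aspx), WebClient.DownloadString uses WebClient.Encoding as its encoding when it converts the remote resource into a string. Its default is Encoding.Default, which on .NET Framework is the system's ANSI code page (windows-1252 on Western European systems). Set it to System.Text.Encoding.UTF8 before calling DownloadString and things should theoretically work. The "Â" most likely has the same cause: a non-breaking space is encoded in UTF-8 as C2 A0, and windows-1252 decodes C2 as "Â" followed by an invisible non-breaking space.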
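A minimal C# sketch of the answer, assuming the page really is UTF-8. `Utf8Download` and its method names are hypothetical. `DownloadStringUtf8` applies the fix described above; the two offline helpers reproduce the "ï»¿" / "Â" symptom from raw bytes so it can be checked without a network connection. They use ISO-8859-1 (code page 28591) as a stand-in for windows-1252, since the two agree on the bytes involved here.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class Utf8Download
{
    // The answer's fix: tell WebClient to decode the response as UTF-8.
    // TrimStart is a defensive extra (an assumption, not part of the answer)
    // in case a U+FEFF BOM character survives decoding.
    public static string DownloadStringUtf8(Uri uri)
    {
        using (var wc = new WebClient())
        {
            wc.Encoding = Encoding.UTF8;
            return wc.DownloadString(uri).TrimStart('\uFEFF');
        }
    }

    // ISO-8859-1 matches windows-1252 on bytes 0xA0-0xFF (covering EF BB BF
    // and C2 A0) and is available on .NET Core without extra code pages.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Reproduces the symptom: UTF-8 bytes decoded with a single-byte code page.
    public static string DecodeAsLatin1(byte[] data) => Latin1.GetString(data);

    // Decodes UTF-8 bytes. StreamReader skips a leading BOM, whereas
    // Encoding.UTF8.GetString would keep it as the character U+FEFF.
    public static string DecodeUtf8(byte[] data)
    {
        using (var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For the bytes of "\uFEFFCafé" followed by a non-breaking space, `DecodeAsLatin1` yields "ï»¿CafÃ©Â" plus the non-breaking space, while `DecodeUtf8` yields the original "Café" and non-breaking space with the BOM removed. Hard-coding UTF-8 is only correct for UTF-8 pages; a scraper that visits many sites would need to read the charset from the Content-Type header or the page's `<meta>` tag.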
