DownloadString and Special Characters

Question

I am trying to find the index of "Mauricio" in a string that I download from a website using WebClient and DownloadString. However, on the website the name contains a non-ASCII character: Maurício. So I found some code elsewhere

string ToASCII(string s)
{
    return String.Join("",
        s.Normalize(NormalizationForm.FormD)
         .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
}

that converts accented characters to their unaccented forms. I have tested the code and it works. The problem is that when I download the string, the name arrives as MaurA-cio. I have tried both

wc.Encoding = System.Text.Encoding.UTF8;
wc.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");

Neither stops it from downloading as MaurA-cio.

(Also, I cannot change the search term, because I am getting it from a list.)

What else can I try? Thanks

Recommended answer

DownloadString doesn't look at HTTP response headers. It uses the previously set WebClient.Encoding property. If you have to use it, get the headers first:

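// Call DownloadString twice: the first request is only used to read the response headers
// (or to just do a HEAD, see http://stackoverflow.com/questions/3268926/head-with-webclient)
webClient.DownloadString("http://en.wikipedia.org/wiki/Maurício");
var contentType = webClient.ResponseHeaders["Content-Type"];
// Pull the charset parameter out of a value such as "text/html; charset=UTF-8"
var charset = Regex.Match(contentType, "charset=([^;]+)").Groups[1].Value;

// Decode the second download with the encoding the server reported
webClient.Encoding = Encoding.GetEncoding(charset);
var s = webClient.DownloadString("http://en.wikipedia.org/wiki/Maurício");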
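
The Regex in the snippet above yields an empty string when the Content-Type header has no charset parameter, and Encoding.GetEncoding does not accept that. Here is a minimal sketch of a more defensive lookup, assuming UTF-8 is an acceptable fallback. CharsetHelper and FromContentType are hypothetical names used for illustration and are not part of WebClient.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetHelper
{
    // Hypothetical helper: picks an Encoding from a Content-Type header value.
    // Falls back to UTF-8 (an assumption) when no usable charset is present.
    public static Encoding FromContentType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return Encoding.UTF8;

        // Accepts forms such as charset=UTF-8 and charset="utf-8".
        var match = Regex.Match(
            contentType,
            @"charset\s*=\s*""?([^;""\s]+)",
            RegexOptions.IgnoreCase);
        if (!match.Success)
            return Encoding.UTF8;

        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return Encoding.UTF8;
        }
    }
}
```

With this in place, the assignment in the answer could become webClient.Encoding = CharsetHelper.FromContentType(webClient.ResponseHeaders["Content-Type"]);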

BTW, Unicode doesn't define "foreign" characters. From Maurício's perspective, "Mauricio" would be the foreign spelling of his name.
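
Once the page is decoded with the right encoding, the ToASCII approach from the question can be applied before searching for the unaccented term. Here is a minimal sketch, assuming the downloaded text uses precomposed characters so that indexes in the stripped text line up with the original. DiacriticsDemo and IndexOfIgnoringAccents are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsDemo
{
    // Same idea as the ToASCII helper in the question: decompose the text,
    // then drop the combining marks (UnicodeCategory.NonSpacingMark).
    public static string ToASCII(string s)
    {
        return string.Concat(
            s.Normalize(NormalizationForm.FormD)
             .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
    }

    // Hypothetical helper: index of an unaccented term within accented text.
    public static int IndexOfIgnoringAccents(string text, string term)
    {
        return ToASCII(text).IndexOf(term, StringComparison.Ordinal);
    }
}
```

For example, DiacriticsDemo.IndexOfIgnoringAccents(s, "Mauricio") would search the string s downloaded above.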
