Problem reading web pages that contain 0x00 characters: response content is truncated
Problem description
Hello,
I wrote a program that downloads web pages. It works fine for most pages, but I have found some pages where it doesn't work.
These pages contain 0x00 characters.
I can read the page content up to this character, but not the content after it. The response stream seems to treat the 0x00 byte as the end of the stream.
I use this code to read the response:
IAsyncResult ar = null;
HttpWebResponse resp = null;
Stream responseStream = null;
String content = null;
...
resp = (HttpWebResponse)req.EndGetResponse(ar);
responseStream = resp.GetResponseStream();
StreamReader sr = new StreamReader(responseStream, Encoding.UTF8);
content = sr.ReadToEnd();
In this example I use an asynchronous request, but I also tried a synchronous one and had the same problem.
I also tried the following, with the same result:
HttpWebResponse resp = null;
Stream responseStream = null;
String content = String.Empty; // String has no parameterless constructor
...
responseStream = resp.GetResponseStream();
byte[] buffer = new byte[4096];
int bytesRead = 1;
while (bytesRead > 0)
{
    bytesRead = responseStream.Read(buffer, 0, 4096);
    content += Encoding.UTF8.GetString(buffer, 0, bytesRead);
}
For example, the problem occurs for this URL: http://www.daz3d.com/i/search/searchsub?sstring=ps%5Ftx1662b&%5Fm=dps%5Ftx1662b
Thanks for your replies,
Euyeusu
PS: The problem is the same with the WebClient class.
Recommended answer
A .NET String variable can contain a 0x00 character, but anything that treats the string as null-terminated (a debugger view, a UI control, a native API) may show it as truncated at that character. So read the response as bytes, using BinaryReader or the Stream.Read method, rather than assigning it to a string. If you are saving the content to a file, save the byte array instead of a string (File.WriteAllBytes()).
Hope this solves your problem.
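To illustrate the byte-oriented approach described in this answer, here is a minimal sketch. It is not the original poster's code: the class name RawResponseReader, the helper ReadAllBytes and the file name page.bin are made up for this example, and a MemoryStream stands in for resp.GetResponseStream() so the snippet runs without a network call. It assumes .NET 4.0 or later, where Stream.CopyTo is available.

```csharp
using System;
using System.IO;
using System.Text;

static class RawResponseReader
{
    // Hypothetical helper: copies every byte of the stream,
    // including any 0x00 bytes, into a byte array.
    public static byte[] ReadAllBytes(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    static void Main()
    {
        // Stand-in for resp.GetResponseStream(): "ab", a 0x00 byte, then "cd".
        byte[] payload = { 0x61, 0x62, 0x00, 0x63, 0x64 };

        using (var responseStream = new MemoryStream(payload))
        {
            byte[] raw = ReadAllBytes(responseStream);
            Console.WriteLine(raw.Length); // all 5 bytes are read

            // Save the raw bytes rather than a string (file name is hypothetical).
            File.WriteAllBytes("page.bin", raw);

            // Decoding keeps the 0x00 byte as U+0000 inside the string.
            string text = Encoding.UTF8.GetString(raw);
            Console.WriteLine(text.Length);

            // Strip NUL characters before handing the text to code that
            // treats strings as null-terminated.
            Console.WriteLine(text.Replace("\0", string.Empty));
        }
    }
}
```

If the text is needed for display or parsing, removing or replacing the U+0000 characters after decoding, as in the last line, is one possible way to avoid the apparent truncation. Whether that is appropriate depends on what the page is supposed to contain.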