Encoding of the response of Invoke-WebRequest

Question

When using the Invoke-WebRequest cmdlet against a website with non-English characters, I see no way to define the encoding of the response / page content.

I use a simple GET on http://colours.cz/ucinkujici/ and the names of the artists come back corrupted. You can try it with this one-liner:

Invoke-WebRequest http://colours.cz/ucinkujici

Is this caused by the design of the cmdlet? Can I specify the encoding somewhere? Is there any workaround to get a properly parsed response?

Recommended answer

It seems to me you are correct :/

Here is one way to get the content right: save the response to a file first, then read it into a variable with the correct encoding. However, you are then not dealing with an HtmlWebResponseObject:

# Save the response to disk, then read the file back explicitly as UTF-8.
Invoke-WebRequest http://colours.cz/ucinkujici -OutFile .\colours.cz.txt
$content = Get-Content .\colours.cz.txt -Encoding UTF8 -Raw
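
Why the -Encoding switch matters can be shown offline. This is a minimal sketch, assuming a hypothetical sample string and a temp file in place of the downloaded page. In Windows PowerShell, Get-Content reads a file without a byte-order mark using the system ANSI code page unless told otherwise, whereas PowerShell 6+ defaults to UTF-8.

```powershell
# Hypothetical sample text; [char]0x00ED is an i with acute accent, written
# as a code point so the script does not depend on its own file encoding.
$original = 'Zrn' + [char]0x00ED

# Write the text as UTF-8 bytes without a BOM (an assumption about what
# a saved UTF-8 page roughly looks like on disk).
$path = [IO.Path]::GetTempFileName()
[IO.File]::WriteAllBytes($path, [Text.Encoding]::UTF8.GetBytes($original))

# Decode explicitly as UTF-8 instead of relying on the default encoding.
$content = Get-Content -Path $path -Encoding UTF8 -Raw

# Clean up the temp file, then compare case-sensitively.
Remove-Item -Path $path
$content -ceq $original
```

Dropping -Encoding UTF8 in Windows PowerShell is the quickest way to reproduce the corrupted characters from the question with this sample.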

This will get you equally far:

# Fetch the page with the .NET classes directly; StreamReader decodes as UTF-8 by default.
[Net.HttpWebRequest]$httpWebRequest = [Net.WebRequest]::Create('http://colours.cz/ucinkujici/')
[Net.HttpWebResponse]$httpWebResponse = $httpWebRequest.GetResponse()
$reader = New-Object IO.StreamReader($httpWebResponse.GetResponseStream())
$content = $reader.ReadToEnd()
$reader.Close()
$httpWebResponse.Close()
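
The StreamReader above is created without an encoding argument, in which case .NET decodes the stream as UTF-8 (with byte-order-mark detection). Below is a minimal sketch of naming the encoding explicitly. It assumes a MemoryStream holding hypothetical sample bytes in place of the response stream, so it runs without a network connection.

```powershell
$utf8 = [Text.Encoding]::UTF8

# Hypothetical sample text; [char]0x00ED is an i with acute accent.
$original = 'Zrn' + [char]0x00ED
$bytes = $utf8.GetBytes($original)

# The unary comma keeps the byte array as a single constructor argument.
$stream = New-Object IO.MemoryStream -ArgumentList (,$bytes)

# Pass the encoding explicitly rather than relying on the default.
$reader = New-Object IO.StreamReader -ArgumentList $stream, $utf8
$content = $reader.ReadToEnd()
$reader.Close()   # closing the reader also closes the underlying stream
```

For a page served in another charset, the second argument could presumably be swapped for [Text.Encoding]::GetEncoding(...) with the name the server declares.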

Should you really want an HtmlWebResponseObject, here is a way to make e.g. values from ParsedHtml more or less "readable" with Invoke-WebRequest ($bad vs. $better):

Invoke-WebRequest http://colours.cz/ucinkujici -OutVariable htmlwebresponse
# Turn the mis-decoded string back into bytes with the system default
# code page, then decode those bytes as UTF-8.
$bad = $htmlwebresponse.ParsedHtml.title
$better = [Text.Encoding]::UTF8.GetString([Text.Encoding]::Default.GetBytes($bad))
$bad = $htmlwebresponse.Links[7].outerHTML
$better = [Text.Encoding]::UTF8.GetString([Text.Encoding]::Default.GetBytes($bad))
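
The round trip works because the page's UTF-8 bytes were decoded with a single-byte code page. Encoding the garbled string back with that same code page recovers the original bytes, which can then be decoded as UTF-8. Below is a minimal offline sketch, assuming ISO-8859-1 as the wrongly applied code page and a hypothetical sample string. The snippet above uses [Text.Encoding]::Default instead, which is the system ANSI code page in Windows PowerShell but UTF-8 in PowerShell 6+, so the trick may be a no-op there.

```powershell
$utf8   = [Text.Encoding]::UTF8
$latin1 = [Text.Encoding]::GetEncoding('iso-8859-1')   # assumed wrong code page

# Hypothetical sample text; [char]0x00ED is an i with acute accent.
$original = 'Zrn' + [char]0x00ED

# Simulate the corruption: UTF-8 bytes decoded as ISO-8859-1.
$bad = $latin1.GetString($utf8.GetBytes($original))

# Undo it: recover the original bytes, then decode them as UTF-8.
$better = $utf8.GetString($latin1.GetBytes($bad))

# Case-sensitive comparison of the repaired string with the original.
$better -ceq $original
```

The repair is only lossless when the wrongly applied code page maps every byte value to a character, as ISO-8859-1 does; with other code pages some characters may not survive the round trip.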

Update: Here is a new take on this, given that you want to work with ParsedHtml.
Once you have the content (see the first two-line snippet, which saves the response to a file and then reads the file back with the correct encoding), you can do this:

$ParsedHtml = New-Object -com "HTMLFILE"
$ParsedHtml.IHTMLDocument2_write($content)
$ParsedHtml.Close()

Et voilà :] For example, $ParsedHtml.title now displays correctly, and I am guessing the rest will be OK as well…
