Encoding of the response of Invoke-WebRequest
Question
When using the Invoke-WebRequest cmdlet against a website with non-English characters, I see no way of defining the encoding of the response / page content.
I use a simple GET on http://colours.cz/ucinkujici/ and the names of the artists come back corrupted. You can try it with this simple line:
Invoke-WebRequest http://colours.cz/ucinkujici
Is this caused by the design of the cmdlet? Can I specify the encoding somewhere somehow? Is there any workaround to get a properly parsed response?
Recommended answer
It seems to me you are correct :/
Here is one way to get the content right: save the response to a file first, then read it into a variable with the correct encoding. However, you are then not dealing with an HtmlWebResponseObject:
Invoke-WebRequest http://colours.cz/ucinkujici -outfile .\colours.cz.txt
$content = gc .\colours.cz.txt -Encoding utf8 -raw
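To see that the -Encoding utf8 switch is what does the work here, the file step can be tried offline. This is only a sketch: the sample name and the temp file are my own stand-ins for the downloaded page, not something from the site.

```powershell
# Sketch: write known UTF-8 bytes to a temp file, then read them back as UTF-8.
# The sample string is hypothetical; it is built from code points so the
# encoding of the script file itself does not matter.
$original = 'Dvo' + [char]0x0159 + [char]0x00E1 + 'k'   # "Dvořák"
$path = [IO.Path]::GetTempFileName()
try {
    # Stand-in for what -OutFile does: the raw bytes land on disk untouched.
    [IO.File]::WriteAllBytes($path, [Text.Encoding]::UTF8.GetBytes($original))
    # Decode the bytes explicitly as UTF-8; -Raw returns one single string.
    $content = Get-Content -LiteralPath $path -Encoding UTF8 -Raw
}
finally {
    Remove-Item -LiteralPath $path
}
```

If the page were served in some other encoding, the -Encoding value would have to match that encoding instead.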
This will get you equally far (a StreamReader created this way decodes as UTF-8 by default):
[net.httpwebrequest]$httpwebrequest = [net.webrequest]::create('http://colours.cz/ucinkujici/')
[net.httpWebResponse]$httpwebresponse = $httpwebrequest.getResponse()
$reader = new-object IO.StreamReader($httpwebresponse.getResponseStream())
$content = $reader.ReadToEnd()
$reader.Close()
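If you would rather not rely on the reader's default, StreamReader also has a constructor that takes the encoding explicitly. A minimal sketch, assuming the page really is UTF-8; the MemoryStream below is just my offline stand-in for the response stream, and the sample text is made up:

```powershell
# Sketch: decode a byte stream with an explicitly chosen encoding.
# $stream stands in for $httpwebresponse.GetResponseStream() (assumption).
$sample = 'Dvo' + [char]0x0159 + [char]0x00E1 + 'k'   # "Dvořák", hypothetical sample
$bytes  = [Text.Encoding]::UTF8.GetBytes($sample)
$stream = New-Object IO.MemoryStream -ArgumentList (,$bytes)

# StreamReader(Stream, Encoding): state the encoding instead of relying on the default.
$reader = New-Object IO.StreamReader -ArgumentList @($stream, [Text.Encoding]::UTF8)
try {
    $content = $reader.ReadToEnd()
}
finally {
    $reader.Dispose()   # also closes the underlying stream
}
```

For a page in another encoding, something like [Text.Encoding]::GetEncoding('windows-1250') could be passed instead, provided that encoding is available on your system.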
Should you really want an HtmlWebResponseObject, here is a way to get e.g. values from ParsedHtml more or less "readable" with Invoke-WebRequest ($bad vs. $better):
Invoke-WebRequest http://colours.cz/ucinkujici -outvariable htmlwebresponse
$bad = $htmlwebresponse.parsedhtml.title
$better = [text.encoding]::utf8.getstring([text.encoding]::default.GetBytes($bad))
$bad = $htmlwebresponse.links[7].outerhtml
$better = [text.encoding]::utf8.getstring([text.encoding]::default.GetBytes($bad))
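What the $bad / $better trick does can be reproduced without the network. This is a sketch under one assumption: that the UTF-8 bytes of the page were decoded as ISO-8859-1 (code page 28591). I use that code page explicitly rather than [Text.Encoding]::Default, because Default depends on the system in Windows PowerShell and is UTF-8 on .NET Core based PowerShell, where the original line would change nothing.

```powershell
# Sketch: simulate the mis-decoding and undo it. The sample string is hypothetical.
$original = 'Dvo' + [char]0x0159 + [char]0x00E1 + 'k'   # "Dvořák"

$utf8   = [Text.Encoding]::UTF8
$latin1 = [Text.Encoding]::GetEncoding(28591)           # ISO-8859-1

# What the server sends: UTF-8 bytes.
$bytes = $utf8.GetBytes($original)

# What you get when those bytes are read with the wrong encoding: one char per byte.
$bad = $latin1.GetString($bytes)

# The repair: turn the wrong chars back into the original bytes, then decode as UTF-8.
$better = $utf8.GetString($latin1.GetBytes($bad))
```

The round trip is only lossless when the wrong encoding maps every byte to a character, as ISO-8859-1 does; with other code pages some bytes may not survive, which is why the result is only "more or less" readable.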
Update: Here is a new take on this, knowing you want to work with ParsedHtml.
Once you have your content (see the first two-line snippet, which 1) saves the response to a file and then 2) reads the file content with the correct encoding), you can do this:
$ParsedHtml = New-Object -com "HTMLFILE"
$ParsedHtml.IHTMLDocument2_write($content)
$ParsedHtml.Close()
Et voilà :] E.g. $ParsedHtml.title now shows correctly, and I am guessing the rest will be OK as well…