XHR request response text has unexpected character set


Problem description

I was looking at the answer by @OmegaStripes to the question "How to get a particular InnerText from a specific class?". That answer uses the Split function, with a specified delimiter string, to extract an href from .responseBody.

I then tried to replicate this to extract the following href :

"https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv" 

from NHS England's Ambulance Quality Indicators

HTML snippet:

<main class="main group" role="main">
        <div class="page-content" id="main-content">
            <header>
                <h1>Ambulance Quality Indicators</h1>
            </header>
            <article class="rich-text">
               <p></p>
              <p></p>
              <p></p>
               <p></p>
              <p></p>
              <p><strong>CSV Data</strong><br>
These files have the same data as other published spreadsheets, but without any formatting:<br>
                <a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv" class="csv-link" onclick="ga('send', 'event', 'Downloads', 'CSV', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv');">New Systems Indicators August 2017 to January 2018 (CSV, 23KB)</a><br>
            </article>
    </div>
</main>

Problem:

The response text I get back is full of unexpected foreign characters instead of the expected HTML.

From a quick bit of research (see References), I am guessing this is an encoding problem.

I tried setting a request header with .setRequestHeader:

 .setRequestHeader "Content-Type", _
     "application/x-www-form-urlencoded; charset=UTF-8"

This made no difference to the output.

To be honest, I haven't a clue how to resolve this.

Any suggestions on how to get the expected response text, i.e. text that I can parse for the href of interest?

Context:

This is part of a bigger piece of work where:

1) I want to scrape that CSV link (the name of which changes each month) without a browser window popping up

2) Download the target file content

3) Use ADODB.Stream to write the binary file out.

This process was outlined by @OmegaStripes in response to my question "Return focus to ThisWorkbook.Activesheet after XMLHTTP60 file download". I am currently trying to understand and implement that suggestion.
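Steps 2) and 3) above can be sketched roughly as follows. This is only a minimal sketch, not the code from the linked answer: the procedure name DownloadToFile and its two parameters are hypothetical, and it assumes the server answers a plain synchronous GET with status 200.

```vba
' Hypothetical helper (not from the original posts): download fileUrl
' and write the raw response bytes to savePath without any text decoding.
Public Sub DownloadToFile(ByVal fileUrl As String, ByVal savePath As String)
    Const adTypeBinary As Long = 1
    Const adSaveCreateOverWrite As Long = 2

    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", fileUrl, False   ' False = synchronous request
    http.send

    ' Assumption: anything other than 200 is treated as a failure.
    If http.Status <> 200 Then
        Err.Raise vbObjectError + 513, "DownloadToFile", _
                  "Unexpected HTTP status " & http.Status
    End If

    With CreateObject("ADODB.Stream")
        .Type = adTypeBinary
        .Open
        .Write http.responseBody      ' byte array, written unchanged
        .SaveToFile savePath, adSaveCreateOverWrite
        .Close
    End With
End Sub
```

Because the bytes go straight from .responseBody into a binary stream, no character-set conversion is involved in this step.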

Code:

Option Explicit

Public Const url As String = "https://www.england.nhs.uk/statistics/statistical-work-areas/ambulance-quality-indicators/"
Public aBody As String

Sub Testing()

    ' Download via XHR
    With CreateObject("MSXML2.XMLHTTP")

        .Open "GET", url, False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=utf-8"
        .send
        ' Get binary response content
        aBody = .responseBody

    End With

    ActiveSheet.Range("A1") = aBody

End Sub

References:

1) XMLHTTP and Special Characters (eg, accents)

2) setRequestHeader Method (IXMLHTTPRequest)

3) VBA HTML Scraping - '.innertext' from complex table

4) Msxml2.ServerXMLHTTP and UTF-8 charset issues

Solution

So credit goes to @FlorentB for this solution and a shout out to @OmegaStripes for the suggestion.

As suggested, the problem was indeed that .responseBody returns an array of bytes, here encoded as UTF-8. Because I assigned it directly to a String variable, VBA treated those raw bytes as UTF-16 text and paired them up into single characters, hence all the foreign characters.
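The effect of that implicit cast can be reproduced without any HTTP request. The following is a small illustrative sketch (the procedure and variable names are hypothetical), using two ASCII bytes in place of a real response body:

```vba
' Illustration only: shows how VBA reinterprets raw bytes when a
' Byte array is assigned to a String.
Public Sub DemoByteArrayToString()
    Dim rawBytes() As Byte
    Dim castResult As String

    ' Two single-byte characters: &H41 ("A") and &H42 ("B").
    rawBytes = StrConv("AB", vbFromUnicode)

    ' The bytes are copied as-is into the string buffer, so each PAIR
    ' of bytes is read as one UTF-16 (little-endian) code unit.
    castResult = rawBytes

    Debug.Print Len(castResult)          ' 1, not 2
    Debug.Print Hex$(AscW(castResult))   ' 4241
End Sub
```

The same thing happens to the UTF-8 bytes of the page: pairs of bytes collapse into single, mostly unrelated characters.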

I used @Tomalak's function BytesToString, with minor changes, to handle the conversion to string.

Code:

Option Explicit

Public Const url As String = "https://www.england.nhs.uk/statistics/statistical-work-areas/ambulance-quality-indicators/"
Public aBody As String 'assigning raw bytes to this String is what caused the implicit conversion
Const adTypeBinary As Byte = 1
Const adTypeText As Byte = 2
Const adModeReadWrite As Byte = 3
Public Const strPath As String = "C:\Users\User\Desktop\testXMLHTTPOutput"

Public Sub Testing() 
    ' Download via XHR
    With CreateObject("MSXML2.XMLHTTP")

        .Open "GET", url, False
        .send
        ' Get binary response content
        aBody = BytesToString(.responseBody, "UTF-8")

    End With

    Dim fso As Object  'late binding
    Set fso = CreateObject("Scripting.FileSystemObject")
    Dim oFile As Object
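    'Arguments: overwrite = True, unicode = True (UTF-16), so reruns and non-ANSI characters do not raise errors
    Set oFile = fso.CreateTextFile(strPath, True, True)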
    oFile.WriteLine aBody
    oFile.Close
    Set fso = Nothing
    Set oFile = Nothing

End Sub
'ADODB.Stream with stream.CharSet = "UTF-8"
'http://msdn.microsoft.com/en-us/library/windows/desktop/ms675032%28v=vs.85%29.aspx


Public Function BytesToString(ByVal bytes As Variant, ByVal charset As String) As String

    With CreateObject("ADODB.Stream")
        .Mode = adModeReadWrite
        .Type = adTypeBinary
        .Open
        .Write bytes
        .Position = 0
        .Type = adTypeText
        .charset = charset
        BytesToString = .ReadText
    End With
End Function
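
Once the text is decoded correctly, the href can be pulled out with Split, as in the approach mentioned in the question. The function below is a hedged sketch rather than part of the original answer: the name ExtractFirstCsvHref is hypothetical, and it assumes the page writes href attributes with double quotes and that the link of interest ends in ".csv", as in the HTML snippet above.

```vba
' Hypothetical helper: return the first href value ending in ".csv",
' or an empty string when none is found.
Public Function ExtractFirstCsvHref(ByVal html As String) As String
    Dim parts() As String
    Dim candidate As String
    Dim endPos As Long
    Dim i As Long

    ' Every element after the first starts right after an href=" marker.
    parts = Split(html, "href=""")

    For i = 1 To UBound(parts)
        endPos = InStr(parts(i), """")          ' closing quote of the attribute
        If endPos > 1 Then
            candidate = Left$(parts(i), endPos - 1)
            If LCase$(Right$(candidate, 4)) = ".csv" Then
                ExtractFirstCsvHref = candidate
                Exit Function
            End If
        End If
    Next i
End Function
```

With the code above, a call such as Debug.Print ExtractFirstCsvHref(aBody) after the request would print the link if the page still follows that structure.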

Another useful link:

Save text file UTF-8 encoded with VBA
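
For completeness, writing the decoded text back out as UTF-8 can also be done with ADODB.Stream. This is a minimal sketch under my own assumptions (the procedure name WriteUtf8File and its parameters are hypothetical, and it is not taken from the linked post):

```vba
' Hypothetical helper: save a VBA string as a UTF-8 encoded text file.
Public Sub WriteUtf8File(ByVal filePath As String, ByVal content As String)
    Const adTypeText As Long = 2
    Const adSaveCreateOverWrite As Long = 2

    With CreateObject("ADODB.Stream")
        .Type = adTypeText
        .Charset = "UTF-8"        ' encoding used when the stream is saved
        .Open
        .WriteText content
        .SaveToFile filePath, adSaveCreateOverWrite
        .Close
    End With
End Sub
```

Note that ADODB.Stream writes a UTF-8 byte order mark at the start of the file, which some consumers may need to skip.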
