Character encoding with Microsoft.XmlHttp in VBScript


Question

I'm writing a VBScript to pull some data from a web page, strip out a few key pieces of information, and write those to a file.

At the moment, my script to access the pages and save the page contents to a string is this:

Set WshShell = WScript.CreateObject("WScript.Shell")
Set http = CreateObject("Microsoft.XmlHttp")

'Load Webpage where address is URL
http.open "GET", URL, FALSE
http.send ""
'Assign webpage contents as a string to variable called Webpage
WEBPAGE = http.responseText

I need to save the content to a string so I can run a regular expression on it to pull out the content that I need.

This script works perfectly, EXCEPT when the pages contain non-ASCII characters (such as é). When a page contains something like this, the script throws an error and stops.

I'm guessing this has something to do with the encoding, but I can't work out how to fix it. Can anyone point me in the right direction? Thanks guys

EDIT:

Thanks to the help here, I realised I asked the wrong question! It turns out I was downloading the content fine. The problem was that afterwards I was editing it and writing it out to a file, and the file was in the wrong format. I had this:

Set objTextFile = objFSO.OpenTextFile(OutputFile, 8, True,)

Changing it to:

Set objTextFile = objFSO.OpenTextFile(OutputFile, 8, True, -1)

Seems to have fixed it. What a crazy world, eh? Thanks for the help.
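For anyone wondering why the fourth argument matters: it is the format flag of OpenTextFile. 0 opens the file as ASCII (which is also what you get when the argument is omitted), -1 opens it as Unicode (UTF-16), and -2 uses the system default. Here is a minimal sketch of the fixed call with named constants. The output path and the sample text are made up for illustration and are not from the original script.

```vbscript
Option Explicit

Const ForAppending = 8      ' iomode: append to the end of the file
Const TristateTrue = -1     ' format: open the file as Unicode (UTF-16)

Dim objFSO, objTextFile, OutputFile
OutputFile = "C:\Temp\output.txt"   ' hypothetical path, adjust as needed

Set objFSO = CreateObject("Scripting.FileSystemObject")
' Third argument True creates the file if it does not exist yet
Set objTextFile = objFSO.OpenTextFile(OutputFile, ForAppending, True, TristateTrue)
objTextFile.WriteLine "caf" & ChrW(233)   ' ChrW(233) is the character é
objTextFile.Close
```

One thing to watch for, as far as I can tell: if the file already exists from an earlier ASCII run, appending Unicode text to it mixes two encodings in one file, so it is probably safest to start from a fresh file.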

Recommended answer

You may need to set the correct request headers before calling send.

The following is an example only. You will need to find out exactly which header values your website expects.

    http.open "GET", URL, FALSE
    http.SetRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    http.SetRequestHeader "Accept", "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
    http.SetRequestHeader "Accept-Language", "en-us,en;q=0.5"
    http.SetRequestHeader "Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
    http.send ""
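To find out what a particular site actually sends back, it can help to look at the response status and the Content-Type header, which usually carries the charset the server declares. This is only a rough sketch, and the URL is a placeholder rather than the asker's real address.

```vbscript
Option Explicit

Dim http, URL
URL = "http://example.com/"   ' placeholder URL, replace with the real page

Set http = CreateObject("Microsoft.XmlHttp")
http.open "GET", URL, False   ' False = synchronous request
http.send

' Status code of the response, e.g. 200 when the request succeeded
WScript.Echo "Status: " & http.status
' Typically something like "text/html; charset=UTF-8"
WScript.Echo "Content-Type: " & http.getResponseHeader("Content-Type")
```

If the declared charset is not what you expected, that is a hint the encoding problem is on the download side and not in how the file is written.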

EDIT:

How about this:

Dim XMLHttpReq, URL, WEBPAGE
' Percent-encoded UTF-8 form of the character É
Const Eacute = "%C3%89"

Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")

URL = "http://en.wikipedia.org/wiki/%C3%89"
'Load Webpage where address is URL
XMLHttpReq.Open "GET", URL, False
XMLHttpReq.send ""
'Assign webpage contents as a string to variable called Webpage
WEBPAGE = XMLHttpReq.responseText
'Replace every percent-encoded occurrence with the literal character
WEBPAGE = Replace(WEBPAGE, Eacute, "É")
'WScript.Echo WEBPAGE


In this case the E acute comes back as the string %C3%89 (the percent-encoded UTF-8 form of É), and you can replace it with whatever character you choose if required.

EDIT2:

Just to add: if you're doing this with VBScript, you may find this method useful.

Dim XMLHttpReq, URL, WEBPAGE, fso, f
Const Eacute = "%C3%89"
Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")
URL = "http://en.wikipedia.org/wiki/%C3%89"
XMLHttpReq.Open "GET", URL, False
XMLHttpReq.send ""
WEBPAGE = XMLHttpReq.responseText

Save2File WEBPAGE, "C:\Users\osknows\Desktop\test.txt"

' Writes sText to sFile as UTF-8, overwriting any existing file
Sub Save2File(sText, sFile)
    Dim oStream
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Open
        .CharSet = "utf-8"
        .WriteText sText
        .SaveToFile sFile, 2   ' 2 = adSaveCreateOverWrite
        .Close
    End With
    Set oStream = Nothing
End Sub
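
If you want to check that the file round-trips correctly, the same ADODB.Stream object can read it back as UTF-8. This is a sketch only. ReadUtf8File is a helper name I made up for illustration, and the path is the one from the example above.

```vbscript
' Hypothetical helper: reads the whole file at sFile as UTF-8 text
Function ReadUtf8File(sFile)
    Dim oStream
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Type = 2              ' 2 = adTypeText
        .CharSet = "utf-8"
        .Open
        .LoadFromFile sFile
        ReadUtf8File = .ReadText   ' with no argument, reads to the end of the stream
        .Close
    End With
    Set oStream = Nothing
End Function

Dim sContents
sContents = ReadUtf8File("C:\Users\osknows\Desktop\test.txt")
' Character count of what was read back; compare it with Len(WEBPAGE)
WScript.Echo Len(sContents)
```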
