VBA从网站提取数据并将其解析到Word [英] VBA extract and parse data from website to Word

查看:109
本文介绍了VBA从网站提取数据并将其解析到Word的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从此处提取一些数据: http://www.hnb.hr /tecajn/f140215.dat

I'm trying to extract some data from here: http://www.hnb.hr/tecajn/f140215.dat

这是克罗地亚国家银行的汇率列表.文件名"f140215.dat"基本上是日期,其格式如下:

This is the exchange rate list from the Croatian National Bank. The file name "f140215.dat" is basically a date, formatted in the following order:

"f""DDMMYY"".dat"

"f" "DDMMYY" ".dat"

我打算将数据组织在Word表中,该表包含以下单元格:

I intend to have the data organized in a Word table, which contains the following cells:

  • 单元格1,用户可以在其中手动输入以下日期 格式:"MMM DD,YYYY"
  • 单元格2,用户将在其中手动输入所需的货币代码 名称(美元,英镑等)
  • 第3个单元格,其中应显示提取的汇率 指定的日期和货币.
  • Cell#1 where a user would manually input a date in the following format: "MMM DD, YYYY"
  • Cell#2 where a user would manually input the requested currency code name (USD, GBP, etc)
  • Cell#3 where the extracted exchange rate should appear for the specified date and currency.

表格下方有一个"UPDATE"按钮,用于更新Cell#3信息.我要的脚本应该连接到该按钮.

Underneath the table there is an "UPDATE" button that updates the Cell#3 information. The script I'm asking for should be connected to that button.

单击按钮后,我希望脚本执行以下操作:

After clicking the button, I'd like the script to do the following:

  • 根据单元格1中输入的日期找出要转到的页面. 例如,如果单元格1包含"2015年2月14日",则脚本 应该指向" http://www.hnb.hr/tecajn/f140215.dat "
  • 在该页面上,获取在中指定的货币的中间值 单元格2.例如,如果单元格2中包含"USD",则脚本应
    提取"6,766508",这是"840USD001"的中间值.仅有的 中间值是相关的.
  • 将此值写入Cell#3.
  • Figure out which page to go to based on the date inputted in Cell #1. For example, if the Cell#1 contains "February 14, 2015", the script should point to "http://www.hnb.hr/tecajn/f140215.dat"
  • On that page, grab the middle value for the currency specified in Cell#2. For example, if Cell#2 contains "USD", the script should
    extract "6,766508" which is the middle value for "840USD001". Only the middle value is relevant.
  • Write this value to Cell#3.

总而言之,脚本需要根据两个表单元格中指定的条件,来确定要转到的页面以及从中提取哪些数据,并用该数据填充第三个单元格.

So to sum it up, based in the criteria specified in the two table cells, the script needs to identify which page to go to and what data to extract from it, and with that data populate the third cell.

希望我解释得足够好.这只是我正在构建的整个发票生成器的一部分.到目前为止,我已经完成了所有工作,但是我什至不知道如何开始.如果需要的话,我可以将整个邮件发送出去,但认为这并不完全相关.

Hope I explained it well enough. This is only a part of the whole invoice generator I'm building. So far I've gotten everything to work, but this I really don't even know how to start. I can send the whole thing if needed, but figured it's not exactly relevant.

我看了一些教程并玩了一下,这就是我到目前为止所得到的.

I watched some tutorials and played around, and this is what I got so far.

Enum READYSTATE
    READYSTATE_UNINITIALIZED = 0
    READYSTATE_LOADING = 1
    READYSTATE_LOADED = 2
    READYSTATE_INTERACTIVE = 3
    READYSTATE_COMPLETE = 4
End Enum

Sub Test()

Dim ie As New InternetExplorer
ie.Visible = False
ie.navigate "http://www.hnb.hr/tecajn/f140215.dat"

Do While ie.READYSTATE <> READYSTATE_COMPLETE
DoEvents
Loop

Dim html As HTMLDocument
Set html = ie.document

MsgBox html.DocumentElement.innerText

End Sub

我知道不多,但是就像我说的那样,我是新来的.我能够将数据放入消息框中,但是我不知道如何解析它,没有它,我真的不能做上面提到的任何事情.现在怎么办?

I know it's not much, but like I said, I'm new at this. I was able to get the data into the message box, but I have no idea how to parse it, and without that I can't really do anything mentioned above. What now?

好的!!取得了一些进展!我已经设法通过使用split函数来解析它:

Alright!! Made some progress! I've managed to parse it by using the split function:

Sub Test()

Dim ie As New InternetExplorer
ie.Visible = False
ie.navigate "http://www.hnb.hr/tecajn/f140215.dat"

Do While ie.READYSTATE <> READYSTATE_COMPLETE
DoEvents
Loop

Dim html As HTMLDocument
Set html = ie.document

Dim getData As String
getData = html.DocumentElement.innerText

'replaced all the space fields with line breaks
Dim repData As String
repData = Replace(getData, "       ", vbCrLf)

'used line breaks as separators
Dim splData As Variant
splData = Split(repData, vbCrLf)

MsgBox splData(1)
MsgBox splData(2)
MsgBox splData(3)

End Sub

现在,它在消息框中显示已解析的数据.其余的应该很容易!

Right now it displays the parsed data in message boxes. The rest should be easy!

OP评论的附录:

这是后续代码的一部分:

This is a part of the continued code:

Dim cur As String
cur = ActiveDocument.SelectContentControlsByTitle("valCombo").Item(1).Range.Text
If cur = "USD" Then
  ActiveDocument.Tables(1).Cell(7, 3).Range.Text = splData(40) & " HRK"
End If
If cur = "EUR" Then
  ActiveDocument.Tables(1).Cell(7, 3).Range.Text = splData(20) & " HRK"
End If

以这种方式工作,但是我想将ActiveDocument.Tables(1).Cell(7, 3).Range.Text设置为字符串.但是,一旦我这样做,它什么也不会做.为什么会这样?

This way it works, but I'd like to set ActiveDocument.Tables(1).Cell(7, 3).Range.Text as a string. However, once I do that, it doesn't do anything. Why is that?

推荐答案

这应该对项目的前半部分有所帮助;那就是数据的检索.正如我在较早的评论中提到的那样,像这样的数据检索更适合MSXML2.ServerXMLHTT类型的对象.

This should help you with the first half of your project; that being the retrieval of the data. As I mentioned in my earlier comment, data retrieval such as this is better suited to an MSXML2.ServerXMLHTT type of object.

您将必须进入VBE的工具"►引用"并添加 Microsoft XML v6.0 .

You will have to go into the VBE's Tools ► References and add Microsoft XML v6.0.

Sub scrape_CNB()
    Dim u As String, dtDATE As Date, xmlHTTP As MSXML2.ServerXMLHTTP60
    Dim sTMP As String, sCURR As String
    Dim i As Long, j As Long, vLINE As Variant, vRATE As Variant

    On Error GoTo CleanUp

    Set xmlHTTP = New MSXML2.ServerXMLHTTP60

    sCURR = "USD"
    dtDATE = CDate("February 14, 2015")
    With xmlHTTP
        u = "http://www.hnb.hr/tecajn/f" & Format(dtDATE, "ddmmyy") & ".dat"
        .Open "GET", u, False
        .setRequestHeader "Content-Type", "text/xml"
        .send
        If .Status <> 200 Then GoTo CleanUp

        sTMP = .responseText
        vLINE = Split(sTMP, Chr(13) & Chr(10))
        For i = LBound(vLINE) To UBound(vLINE)
            If CBool(InStr(1, vLINE(i), sCURR, vbTextCompare)) Then
                Do While CBool(InStr(1, vLINE(i), Chr(32) & Chr(32))): vLINE(i) = Replace(vLINE(i), Chr(32) & Chr(32), Chr(32)): Loop
                vRATE = Split(vLINE(i), Chr(32))
                For j = LBound(vRATE) To UBound(vRATE)
                    MsgBox j & ": " & vRATE(j)
                Next j
                Exit For
            End If
        Next i

    End With

CleanUp:
    Set xmlHTTP = Nothing
End Sub

由于您没有启动完整的Internet.Explorer对象,因此应该更快,并且返回的.responseText是原始文本,而不是HTML.

Since you are not initiating a full Internet.Explorer object, this should be much quicker and the .responseText that is returned is raw text, not HTML.

TBH,我发现很难处理Word中基于光标位置的VBA编程;希望使用Excel工作表一对一的显式定义的关系.您可能要考虑使用Excel作为数据存储库,并与Word合并以提供发票输出.

TBH, I find the cursor position based VBA programming within Word to be hard to deal with; preferring the one-to-one explicitly defined relationship(s) with an Excel worksheet. You may want to consider using Excel as a data repository and merging with Word to provide your invoice output.

附录:

Dim cur As String, t as long, r as long, c as long
cur = ActiveDocument.SelectContentControlsByTitle("valCombo").Item(1).Range.Text
t = 1: r = 7: c = 3
Select Case cur
  Case "USD"
    ActiveDocument.Tables(t).Cell(r, c).Range.Text = splData(40) & " HRK"
  Case "EUR"
    ActiveDocument.Tables(t).Cell(r, c).Range.Text = splData(20) & " HRK"
End Select

这篇关于VBA从网站提取数据并将其解析到Word的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆