使用VBA HTML从网页下载文件 [英] Download files from a web page using VBA HTML
问题描述
我一直在拼命地尝试几个月,自动化一个csv文件被下载,管理并保存在给定位置的过程。
到目前为止,我只管理excel vba打开网页,点击底部下载csv文件,但代码停止并需要手动干预以完成,我希望它完全自动化,如果可能。
看到使用的代码(我不是作者):
I have been trying desperately for months to automate a process whereby a csv file is downloaded, maned and saved in a given location. so far I only managed with excel vba to open the web page and click the bottom to download the csv file, but the code stop and required a manual intervention to to be completed, i would like it to be fully automated if possible. see the code used (I am not the author):
Sub WebDataExtraction()
Dim URL As String
Dim IeApp As Object
Dim IeDoc As Object
Dim ieForm As Object
Dim ieObj As Object
Dim objColl As Collection
URL = "http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT"
Set IeApp = CreateObject("InternetExplorer.Application")
IeApp.Visible = True
IeApp.Navigate URL
Do Until IeApp.ReadyState = READYSTATE_COMPLETE
Loop
Set IeDoc = IeApp.Document
For Each ele In IeApp.Document.getElementsByTagName("span")
If ele.innerHTML = "CSV" Then
Application.Wait (Now + TimeValue("0:00:15"))
DoEvents
ele.Click
'At this point you need to Save the document manually
' or figure out for yourself how to automate this interaction.
Test_Save_As_Set_Filename
File_Download_Click_Save
End If
Next
IeApp.Quit
End Sub"
提前感谢
Nunzio
推荐答案
我发布了第二个答案,因为我相信我的第一个答案对于许多类似的应用是足够的,在这种情况下它不起作用。
I am posting a second answer, since, as I believe my first answer is adequate for many similar applications, it does not work in this instance.
为什么其他方法失败:
-
.Click
方法:这引发了一个新的窗口,期望用户在运行时输入,似乎不可能使用WinAPI
来控制这个窗口,或者至少不是我可以确定的任何方式,代码执行在.Click
行停止,直到用户手动介入,没有办法使用GoTo
或等待
或任何其他方法来规避此行为。 - 使用
WinAPI
函数下载urce文件直接不起作用,因为按钮的URL不包含文件,而是一个动态地为文件服务的js函数。
- The
.Click
method: This raises a new window which expects user input at run-time, it doesn't seem to be possible to use theWinAPI
to control this window. Or, at least not any way that I can determine. The code execution stops on the.Click
line until the user manually intervenes, there is no way to use aGoTo
or aWait
or any other method to circumvent this behavior. - Using a
WinAPI
function to download the source file directly does not work, since the button's URL does not contain a file, but rather a js function that serves the file dynamically.
以下是我提出的解决方案解决方案:
您可以阅读网页的 .body.InnerText
,使用 FileSystemObject
将其写入纯文本/ csv文件然后使用正则表达式
和字符串操作的组合,将数据解析为正确分隔的CSV文件。
You can read the webpage's .body.InnerText
, write that out to a plain text/csv file using FileSystemObject
and then with a combination of Regular Expressions
and string manipulation, parse the data into a properly delimited CSV file.
Sub WebDataExtraction()
Dim url As String
Dim fName As String
Dim lnText As String
Dim varLine() As Variant
Dim vLn As Variant
Dim newText As String
Dim leftText As String
Dim breakTime As Date
'## Requires reference to Microsoft VBScript Regular Expressions 5.5
Dim REMatches As MatchCollection
Dim m As Match
'## Requires reference to Microsoft Internet Controls
Dim IeApp As InternetExplorer
'## Requires reference to Microsoft HTML object library
Dim IeDoc As HTMLDocument
Dim ele As HTMLFormElement
'## Requires reference to Microsoft Scripting Runtime
Dim fso As FileSystemObject
Dim f As TextStream
Dim ln As Long: ln = 1
breakTime = DateAdd("s", 60, Now)
url = "http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT"
Set IeApp = CreateObject("InternetExplorer.Application")
With IeApp
.Visible = True
.Navigate url
Do Until .ReadyState = 4
Loop
Set IeDoc = .Document
End With
'Wait for the data to display on the page
Do
If Now >= breakTime Then
If MsgBox("The website is taking longer than usual, would you like to continue waiting?", vbYesNo) = vbNo Then
GoTo EarlyExit
Else:
breakTime = DateAdd("s", 60, Now)
End If
End If
Loop While Trim(IeDoc.body.innerText) = "XML CSV Please Wait Data Loading Sorting"
'## Create the text file
fName = ActiveWorkbook.Path & "\exported-csv.csv"
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile(fName, 2, True, -1)
f.Write IeDoc.body.innerText
f.Close
Set f = Nothing
'## Read the text file
Set f = fso.OpenTextFile(fName, 1, False, -1)
Do
lnText = f.ReadLine
'## The data starts on the 4th line in the InnerText.
If ln >= 4 Then
'## Return a collection of matching date/timestamps to which we can parse
Set REMatches = SplitLine(lnText)
newText = lnText
For Each m In REMatches
newText = Replace(newText, m.Value, ("," & m.Value & ","), , -1, vbTextCompare)
Next
'## Get rid of consecutive delimiters:
Do
newText = Replace(newText, ",,", ",")
Loop While InStr(1, newText, ",,", vbBinaryCompare) <> 0
'## Then use some string manipulation to parse out the first 2 columns which are
' not a match to the RegExp we used above.
leftText = Left(newText, InStr(1, newText, ",", vbTextCompare) - 1)
leftText = Left(leftText, 10) & "," & Right(leftText, Len(leftText) - 10)
newText = Right(newText, Len(newText) - InStr(1, newText, ",", vbTextCompare))
newText = leftText & "," & newText
'## Store these lines in an array
ReDim Preserve varLine(ln - 4)
varLine(ln - 4) = newText
End If
ln = ln + 1
Loop While Not f.AtEndOfStream
f.Close
'## Re-open the file for writing the delimited lines:
Set f = fso.OpenTextFile(fName, 2, True, -1)
'## Iterate over the array and write the data in CSV:
For Each vLn In varLine
'Omit blank lines, if any.
If Len(vLn) <> 0 Then f.WriteLine vLn
Next
f.Close
EarlyExit:
Set fso = Nothing
Set f = Nothing
IeApp.Quit
Set IeApp = Nothing
End Sub
Function SplitLine(strLine As String) As MatchCollection
'returns a RegExp MatchCollection of Date/Timestamps found in each line
'## Requires reference to Microsoft VBScript Regular Expressions 5.5
Dim RE As RegExp
Dim matches As MatchCollection
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False
.Global = True
.IgnoreCase = True
'## Use this RegEx pattern to parse the date & timestamps:
.Pattern = "(19|20)\d\d[-](0[1-9]|1[012])[-](0[1-9]|[12][0-9]|3[01])[ ]\d\d?:\d\d:\d\d"
End With
Set matches = RE.Execute(strLine)
Set SplitLine = matches
End Function
这篇关于使用VBA HTML从网页下载文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!