Loop through links and download PDF's
Problem Description
I have a code that has been here for a while, across different types of questions, and it is getting closer to its final version. However, now I have a problem: there is a mistake in the code and part of it is not functioning correctly.
The idea is to go through the links and grab the PDF files. The links are stored in sLinks (see the line with the comment "Check that links are stored in sLinks"). The code runs and the files are saved to C:\temp\, but after 12 PDF's are in the folder I get an error, and the debugger points to xHttp.Open "GET", sLink.
I went through the PDF's and it looks like all of them were downloaded: some of them are the same on several pages, and there is one Policy PDF on at least two pages. That's why there are 17 links but only 12 files. So why is it throwing an error? What could be the problem?

Here is my code:
Sub DownloadFiles()
    Dim xHttp As Object: Set xHttp = CreateObject("Microsoft.XMLHTTP")
    Dim hDoc As MSHTML.HTMLDocument
    Dim Anchors As Object
    Dim Anchor As Variant
    Dim sPath As String
    Dim wholeURL As String
    Dim internet As InternetExplorer
    Dim internetdata As HTMLDocument
    Dim internetlink As Object
    Dim internetinnerlink As Object
    Dim arrLinks As Variant
    Dim sLink As String
    Dim iLinkCount As Integer
    Dim iCounter As Integer
    Dim sLinks As String
    Dim i As Integer

    Set internet = CreateObject("InternetExplorer.Application")
    internet.Visible = False
    internet.navigate ("https://www.nordicwater.com/products/waste-water/")

    Do While internet.Busy
        DoEvents
    Loop
    Do Until internet.readyState = READYSTATE_COMPLETE
        DoEvents
    Loop

    Set internetdata = internet.document
    Set internetlink = internetdata.getElementsByTagName("a")

    i = 1
    For Each internetinnerlink In internetlink
        If Left$(internetinnerlink, 36) = "https://www.nordicwater.com/product/" Then
            sLinks = sLinks & internetinnerlink.href & vbCrLf
            i = i + 1
        End If
        ThisWorkbook.Worksheets("Sheet1").Range("B1").Value = sLinks ' Check that links are stored in sLinks
    Next internetinnerlink

    wholeURL = "https://www.nordicwater.com/"
    sPath = "C:\temp\"

    arrLinks = Split(sLinks, vbCrLf)
    iLinkCount = UBound(arrLinks) + 1

    For iCounter = 1 To iLinkCount
        sLink = arrLinks(iCounter - 1)

        'Get the directory listing
        xHttp.Open "GET", sLink ' DEBUGGER IS POINTING HERE
        xHttp.send

        'Wait for the page to load
        Do Until xHttp.readyState = 4
            DoEvents
        Loop

        'Put the page in an HTML document
        Set hDoc = New MSHTML.HTMLDocument
        hDoc.body.innerHTML = xHttp.responseText

        'Loop through the hyperlinks on the directory listing
        Set Anchors = hDoc.getElementsByTagName("a")
        For Each Anchor In Anchors
            'Test the pathname to see if it matches your pattern
            If Anchor.pathname Like "*.pdf" Then
                xHttp.Open "GET", wholeURL & Anchor.pathname, False
                xHttp.send
                With CreateObject("Adodb.Stream")
                    .Type = 1
                    .Open
                    .write xHttp.responseBody
                    .SaveToFile sPath & getName(wholeURL & Anchor.pathname), 2 '//overwrite
                End With
            End If
        Next
    Next
End Sub
Function to build the file name from the link:
Function getName(pf As String) As String
    getName = Split(pf, "/")(UBound(Split(pf, "/")))
End Function
I have fixed the first issue: arrLinks = Split(p_sLinks, vbCrLf) changed to arrLinks = Split(sLinks, vbCrLf), as it should be. Now I am facing another problem.
Recommended Answer
I would add an If Len(sLink) > 0 check before calling the HTTP GET.
The problem is in this line:
sLinks = sLinks & internetinnerlink.href & vbCrLf
It adds an extra vbCrLf at the end of the sLinks list. It should be:
If sLinks <> "" Then sLinks = sLinks & vbCrLf
sLinks = sLinks & internetinnerlink.href
This way there won't be a vbCrLf after the last link. With the trailing vbCrLf, Split(sLinks, vbCrLf) returns an empty string as its last element, and passing that empty sLink to xHttp.Open is what raises the error.
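A quick way to confirm the root cause: a string split on a trailing separator keeps an empty final element. Python's str.split behaves like VBA's Split here, so this sketch (with made-up placeholder URLs) reproduces the off-by-one:

```python
# Built the original way: a separator after EVERY link, including the last.
links = "https://a.example/1" + "\r\n" + "https://a.example/2" + "\r\n"
parts = links.split("\r\n")
print(parts)       # ['https://a.example/1', 'https://a.example/2', '']
print(len(parts))  # 3 elements for 2 links -> the loop runs once too often

# Built the corrected way: separator only BETWEEN links.
fixed = "https://a.example/1" + "\r\n" + "https://a.example/2"
print(len(fixed.split("\r\n")))  # 2 elements, no empty sLink
```

The loop in the question iterates over every element of the split result, so the extra empty element becomes an empty URL passed to xHttp.Open, which fails.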