Wait for window to reload when scrolling web page in VBA
I have written a VBA macro to count the (approximate) number of images returned for a Google search of a specific term. By approximate I mean that the program should count the number of images returned, scroll down to load some more (where applicable) up to a max of 400 images counted. Here's the (simplified) code:
Sub GoogleCount()
    '''
    '[Code to construct the URL ('fullUrl')]
    '''
    Set objIE = New InternetExplorer
    objIE.navigate fullUrl
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document
    'Count images returned
    newNum = currPage.getElementById("rg_s").getElementsByTagName("IMG").Length
    'Scroll down until count = 400 (max) or no change in value
    Do While newNum >= 100 And newNum < 400 And newNum <> oldNum
        oldNum = newNum
        currPage.parentWindow.scrollBy 0, 100000
        Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
        newNum = currPage.getElementById("rg_s").getElementsByTagName("IMG").Length
    Loop
    '''
    '[Code to paste the value of newNum into my workbook, and do some other progress reporting]
    '''
End Sub
I'm unhappy with the scrolling: it feels very 'manual', especially since I am scrolling by a fixed value. Is there any point in making it dynamic, i.e. finding the end of the page and scrolling to there?
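As a minimal sketch of what "dynamic" could look like here: read the document's current scroll height and scroll to it, rather than passing a fixed offset. ScrollToBottom is a hypothetical helper name, and currPage is assumed to be the loaded document from the code above. scrollHeight and scrollTo are standard DOM members, but whether this triggers the lazy loading any more reliably than a large fixed offset has not been tested here.

```vba
' Hypothetical helper (not part of the original code): scrolls to the
' current bottom of the document instead of using a fixed offset.
' Assumes currPage is the loaded document, i.e. objIE.document.
Sub ScrollToBottom(ByVal currPage As Object)
    Dim pageHeight As Long

    ' Depending on the document mode, the full height is reported on
    ' documentElement or on body, so take the larger of the two.
    pageHeight = currPage.documentElement.scrollHeight
    If currPage.body.scrollHeight > pageHeight Then
        pageHeight = currPage.body.scrollHeight
    End If

    currPage.parentWindow.scrollTo 0, pageHeight
End Sub
```

Because the page grows as more images load, the height has to be re-read on every pass of the scrolling loop.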
But the main problem is that it doesn't work: when I execute the code, it counts the first 100 (or fewer) images fine, but when it is supposed to scroll and count some more, it still returns 100. Stepping slowly through the code with F8, I get the proper numbers (max 400), which leads me to conclude that the code is running too quickly (I may be wrong).
To slow the code down I tried adding the objIE.readyState check loop, but because I'm only scrolling, I don't think it counts as the page 're-loading', so the loop is ineffective at waiting for the new images to load.
I've thought about adding in a time delay instead. I am already employing
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
elsewhere in the worksheet, so I could add a delay as short as a few milliseconds.
But I really want to avoid that: this code runs for c. 50 different searches and already takes long enough to execute, so adding fixed delays long enough to accommodate slow connection speeds would not be ideal. Also, internet speeds vary so much that a fixed delay is very unreliable. I could carry out some kind of connection test to get a better ball-park figure, but the best option is obviously to wait only as long as you have to.
Or better still find a different way of counting the images, preferably one which doesn't involve re-loading the page 4 times! Any ideas?
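The "wait only as long as you have to" idea can be sketched as a polling loop with a timeout: re-check the image count every few tens of milliseconds and stop as soon as it changes, or give up after an upper bound. This is only a sketch under assumptions. WaitForNewImages, timeoutSeconds and the 50 ms poll interval are all hypothetical choices, and the "rg_s" container is taken from the code above and assumed to exist.

```vba
' Declarations must sit at the top of the module. Skip this block if
' Sleep is already declared in the same module.
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

' Hypothetical helper: polls the image count until it differs from oldNum
' or until timeoutSeconds have passed, and returns the latest count.
Function WaitForNewImages(ByVal currPage As Object, ByVal oldNum As Long, _
                          Optional ByVal timeoutSeconds As Single = 5) As Long
    Dim startTime As Single
    Dim elapsed As Single
    Dim newNum As Long

    startTime = Timer
    Do
        Sleep 50      ' short pause between polls (assumed interval)
        DoEvents      ' let the browser process pending work
        newNum = currPage.getElementById("rg_s").getElementsByTagName("IMG").Length

        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer resets at midnight
    Loop While newNum = oldNum And elapsed < timeoutSeconds

    WaitForNewImages = newNum
End Function
```

In the scrolling loop, the readyState wait and the recount would then collapse into newNum = WaitForNewImages(currPage, oldNum). The full timeout is only paid on the final pass, when no more images arrive and the count stays the same.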
NB. If you want to debug this yourself, a good image search to set fullUrl to might be https://www.google.com/search?q=stack overflow|exchange&tbm=isch&source=lnt&tbs=isz:ex,iszw:312,iszh:390 as it returns more than 100 images but fewer than 400, so you can test all aspects of the code.
Through further research I've come up with this approach:
Dim myDiv As HTMLDivElement: Set myDiv = currPage.getElementById("fbar")
Dim elemRect As IHTMLRect: Set elemRect = myDiv.getBoundingClientRect
' Keep scrolling until the bottom-of-page element reports a position
Do Until elemRect.bottom > 0
    currPage.parentWindow.scrollBy 0, 10000
    Set elemRect = myDiv.getBoundingClientRect
Loop
' Scroll the last bit so that the final images load
myDiv.ScrollIntoView
Here currPage is the HTML document (Dim currPage As HTMLDocument) and myDiv is a particular element. Its type is not important, but note that myDiv must be an element that is always located at the bottom of the document and is only loaded once everything else has been. For Google Images that is the help bar, which you only reach after scrolling through all the image results.
How it works
The code works as follows: myDiv.getBoundingClientRect is a way of checking whether an element is visible in the browser. That is why we need to look at an element at the bottom of the page: if we scroll until that element becomes visible, then everything else must have loaded too.
That is, of course, where the Do Until...Loop comes from: we loop until the elemRect.bottom value is non-zero (while the element is not in view it is zero; once it is in view it becomes a non-zero number). For more information on that, see here.
Finally, call myDiv.ScrollIntoView to get the browser right to the bottom. This is necessary because the BoundingClientRect is visible slightly before the element is on screen, so we need to scroll the last bit in order to load the final images.
Why not just use ScrollIntoView from the start? It doesn't work, since the element hasn't loaded yet.
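One caveat with the loop above is that it never exits if the bottom element never reports a position, for example if the id changes. A possible guard, sketched under assumptions, is to cap the number of scroll attempts. ScrollUntilVisible, elementId and maxScrolls are hypothetical names, and the DoEvents call is an addition to give the browser a chance to load content between scrolls.

```vba
' Hypothetical wrapper around the scroll loop above, with an upper bound
' on the number of scroll attempts. Returns True if the element reported
' a position before the cap was reached.
Function ScrollUntilVisible(ByVal currPage As Object, ByVal elementId As String, _
                            Optional ByVal maxScrolls As Long = 200) As Boolean
    Dim target As Object
    Dim attempts As Long
    Dim found As Boolean

    Set target = currPage.getElementById(elementId)
    If target Is Nothing Then Exit Function    ' element missing: returns False

    Do Until target.getBoundingClientRect.bottom > 0 Or attempts >= maxScrolls
        currPage.parentWindow.scrollBy 0, 10000
        DoEvents                               ' let the browser load more content
        attempts = attempts + 1
    Loop

    found = (target.getBoundingClientRect.bottom > 0)
    If found Then target.scrollIntoView        ' scroll the last bit, as above
    ScrollUntilVisible = found
End Function
```

A call such as ScrollUntilVisible(currPage, "fbar") would then replace the loop, and a False result signals that the count may be incomplete.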