Wait for window to reload when scrolling web page in VBA


Question

I have written a VBA macro to count the (approximate) number of images returned for a Google search of a specific term. By approximate I mean that the program should count the number of images returned, scroll down to load some more (where applicable) up to a max of 400 images counted. Here's the (simplified) code:

Sub GoogleCount()

'''
'[Code to construct the URL ('fullUrl')]
'''
    Set objIE = New InternetExplorer
    objIE.navigate fullUrl
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document
    'Count images returned
    newNum = currPage.getElementById("rg_s").getElementsByTagName("IMG").Length
    'Scroll down until count = 400 (max) or no change in value
    Do While newNum >= 100 And newNum < 400 And newNum <> oldNum
        oldNum = newNum
        currPage.parentWindow.scrollBy 0, 100000
        Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
        newNum = currPage.getElementById("rg_s").getElementsByTagName("IMG").Length
    Loop

'''
'[Code to paste the value of newNum into my workbook, and do some other progress reporting]
'''
End Sub

I'm unhappy about the scrolling; it feels very 'manual', especially since I am scrolling by a fixed value (is there any point in making it dynamic, i.e. finding the end of the page and scrolling to there?).

But the main problem is that it doesn't work: when I execute the code, it counts the first 100 (or fewer) images fine, but when it's supposed to scroll and count some more, I still get the value of 100 returned. Stepping slowly through the code with F8, I get the proper numbers (max 400), which leads me to conclude that the code is running through too quickly (I may be wrong).

To slow the code down I tried adding the objIE.readyState check loop, but because I'm only scrolling I don't think it counts as the page 're-loading' so the loop is ineffective in waiting for the new images to load.

I've thought about adding in a time delay instead. I am already employing

Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

elsewhere in the worksheet, so I could add a delay on the order of milliseconds.

But I really want to avoid that. This code runs c. 50 different searches and already takes long enough to execute, so adding fixed delays long enough to accommodate slow connection speeds would not be ideal. Also, internet speeds vary so much that a fixed delay is very unreliable. I could carry out some kind of connection test to get a better ball-park figure, but the best option is obviously to wait only as long as you have to.

Or, better still, find a different way of counting the images, preferably one which doesn't involve re-loading the page 4 times! Any ideas?

NB. If you want to debug this yourself, a good image search to set fullUrl to might be https://www.google.com/search?q=stack overflow|exchange&tbm=isch&source=lnt&tbs=isz:ex,iszw:312,iszh:390 as it returns more than 100 images but fewer than 400, so you can test all aspects of the code.

Solution

Through further research I've come up with this approach:

Dim myDiv As HTMLDivElement: Set myDiv = currPage.getElementById("fbar")
Dim elemRect As IHTMLRect: Set elemRect = myDiv.getBoundingClientRect
Do Until elemRect.bottom > 0
    currPage.parentWindow.scrollBy 0, 10000
    Set elemRect = myDiv.getBoundingClientRect
Loop
myDiv.ScrollIntoView

Where currPage is the HTML webpage (Dim currPage As HTMLDocument) and myDiv is a particular element. The type is not important, but it should be noted that myDiv is always located at the bottom of the document and is only loaded once everything else has been. So for Google images that's the help bar, which you only get to after scrolling through all the image results.

How it works

The code works as follows: myDiv.getBoundingClientRect is a way of checking whether an element is visible in the browser - that's why we need to look at an element at the bottom of the page, as if we scroll until that becomes visible, then everything else must have loaded too.
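To see what the page actually reports for the sentinel element before and after scrolling, a small debugging helper along these lines may be useful. This is only a sketch: DumpRect is a hypothetical name that does not appear in the original answer, and the element is declared As Object (late-bound) so that the call does not depend on the element's concrete type.

```vba
' Hypothetical debugging helper (not part of the original answer).
' elem is any element returned by getElementById.
' Prints the element's bounding rectangle to the Immediate window (Ctrl+G).
Sub DumpRect(ByVal elem As Object)
    Dim r As Object
    Set r = elem.getBoundingClientRect
    Debug.Print "left=" & r.left & ", top=" & r.top & _
                ", right=" & r.right & ", bottom=" & r.bottom
End Sub
```

Calling DumpRect currPage.getElementById("fbar") once before the scroll loop and once after it should show whether the bottom value behaves as described on the page being scraped.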

That is where the Do Until...Loop comes from: we loop until the elemRect.bottom value is greater than zero (while the element is not in view the value is zero; once it is in view it becomes a non-zero number). For more information on that, see here.

Finally, use myDiv.ScrollIntoView to get the browser right to the bottom; this is necessary because the bounding rectangle becomes non-zero slightly before the element is on screen, so we need to scroll the last bit in order to load the final images.

Why not just use ScrollIntoView from the start? It doesn't work, since the element hasn't loaded yet.
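One caveat with the loop as written is that it never exits if the sentinel element never reports a non-zero bottom, for example if the page layout changes or the connection drops. A minimal sketch of the same idea with a timeout guard follows. The function name ScrollUntilRendered and the sentinelId and maxSeconds parameters are assumptions introduced for illustration and are not part of the original answer; the objects are declared As Object so the sketch does not depend on a particular element type.

```vba
' Hypothetical helper (not part of the original answer).
' Scrolls the page until the sentinel element reports a bottom edge > 0,
' then scrolls it fully into view. Returns False if the element is missing
' or if maxSeconds elapse first.
Function ScrollUntilRendered(ByVal currPage As Object, _
                             ByVal sentinelId As String, _
                             Optional ByVal maxSeconds As Double = 30) As Boolean
    Dim sentinel As Object
    Dim deadline As Date

    Set sentinel = currPage.getElementById(sentinelId)
    If sentinel Is Nothing Then Exit Function      ' returns False

    deadline = Now + maxSeconds / 86400#           ' 86400 seconds in a day
    Do Until sentinel.getBoundingClientRect.bottom > 0
        If Now > deadline Then Exit Function       ' give up, returns False
        currPage.parentWindow.scrollBy 0, 10000
        DoEvents                                   ' let the browser process the scroll
    Loop

    sentinel.scrollIntoView                        ' scroll the last bit
    ScrollUntilRendered = True
End Function

' Example use with the names from the question:
'   If ScrollUntilRendered(currPage, "fbar") Then
'       newNum = currPage.getElementById("rg_s").getElementsByTagName("IMG").Length
'   End If
```

Because Now has a resolution of about one second, the timeout is only approximate, which should be adequate for a guard of this kind.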
