In Excel VBA, what is the way to check if the web page is fully loaded?

Question

To pause code until a web page is fully loaded, I've been using the method below, and it works almost all of the time.

Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

But occasionally, text content loads after this check has already reported the page as fully loaded, so that content is not extracted.

However, if I step through the code with F8, the content is extracted every time, even when I press F8 about as fast as I can.

So how can I make sure the page, and all its content, is fully loaded before the code continues to extract data?

In both cases, IE is running invisibly. However, I've also tried this with IE visible, and there is actually content in this specific location on the pages I'm working with.

This is being done in Excel 2016, using VBA. The specific content request is written like this:

 'get item name from page and write it to the first cell on the first empty row available
 Set itemName = objIE.document.querySelector(".the-item-name")
 Worksheets("Results").Range("A1048576").End(xlUp).Offset(1, 0).Value = itemName.innerText

I've read through "Excel VBA: Wait for JavaScript execution in Internet Explorer" because I suspect the values are added after the document is loaded, perhaps in an effort to prevent anyone from scraping the data. However, I can't identify any script that may be doing that. That doesn't mean it isn't there; I just can't see it yet.

A specific example of a page with this issue is:

https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html

Initially, the product-total-price div element contains a dash (-) until the price is loaded, so that is what the request returns: "- / each" instead of "$11.29 / each".

I have a workaround, but it's not as efficient or as concise as I'd like it to be. I test the returned string for the presence of the dash. If it's there, I loop and check again; otherwise I capture the string and insert it into the worksheet.

setPriceUM:
    ' Grab the element itself; Set is only valid for object references
    Set hdPriceUM = objIE.document.querySelector(".product-total-price")
    hdPriceUMString = hdPriceUM.innerText
    ' InStr returns the 1-based position of the match, or 0 if there is none
    stringTest = InStr(hdPriceUMString, "-")
    If stringTest > 0 Then
        GoTo setPriceUM
    Else
        Debug.Print hdPriceUMString
    End If

Thank you for taking the time to read this and consider it.

Recommended answer

Web pages work very differently from one another, so no single solution fits all of them.

For your example, your workaround is a working solution. The code might look like this:

Sub TestIE()

    Dim q

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .Navigate "https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html"
        ' Wait IE
        Do While .readyState < 4 Or .Busy
            DoEvents
        Loop
        ' Wait document
        Do While .document.readyState <> "complete"
            DoEvents
        Loop
        ' Wait element
        Do
            q = .document.querySelector(".product-total-price").innerText
            If Left(q, 1) <> "-" Then Exit Do
            DoEvents
        Loop
        .Quit
    End With
    Debug.Print q

End Sub
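
The element-wait loop above runs forever if the price never arrives or the selector matches nothing. As a minimal sketch (not part of the original answer), the same polling idea can be wrapped in a helper that gives up after a timeout. The names WaitForElementText and ElapsedSeconds and the 15-second default are hypothetical, and the leading "-" placeholder is assumed from the page behavior described in the question.

```vba
' Hypothetical helper: polls doc for the element matching selector until its
' text no longer starts with the "-" placeholder, or until the timeout elapses.
' Returns the text, or an empty string on timeout.
Function WaitForElementText(ByVal doc As Object, ByVal selector As String, _
                            Optional ByVal timeoutSeconds As Double = 15) As String
    Dim el As Object
    Dim txt As String
    Dim startTime As Double

    startTime = Timer
    Do
        Set el = doc.querySelector(selector)   ' Nothing while the element is missing
        If Not el Is Nothing Then
            txt = Trim$(el.innerText)
            If Len(txt) > 0 And Left$(txt, 1) <> "-" Then
                WaitForElementText = txt
                Exit Function
            End If
        End If
        DoEvents                               ' keep Excel responsive while polling
    Loop While ElapsedSeconds(startTime) < timeoutSeconds
End Function

' Seconds since startTime; Timer resets at midnight, so add a day if it wrapped.
Private Function ElapsedSeconds(ByVal startTime As Double) As Double
    ElapsedSeconds = Timer - startTime
    If ElapsedSeconds < 0 Then ElapsedSeconds = ElapsedSeconds + 86400
End Function
```

Inside the With block it could be called as q = WaitForElementText(.document, ".product-total-price"), followed by a check that Len(q) > 0 before the value is written to the worksheet.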

In any case, you need to look into the page loading process, its XHRs, and its DOM modifications using the browser developer tools (F12). Doing so, you may find that one of the numerous XHRs returns the price in JSON format. It is logged on the Network tab of the developer tools right before the price appears while the page is loading. That XHR is made by one of the loaded scripts, notably after the page load event. Try this URL (I just copied it from the Network tab):

https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en

So you can simply reproduce that XHR and extract the price by splitting the response:

Sub TestXHR()

    Dim q

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en", False
        .Send
        q = .ResponseText
    End With
    q = Replace(q, " : ", ":")
    q = Split(q, """displayPrice""", 2)(1)
    q = Split(q, """formattedValue"":""", 2)(1)
    q = Split(q, """", 2)(0)
    Debug.Print q

End Sub
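
The Split chain raises a "Subscript out of range" error if either marker is missing from the response, and it relies on a Replace call to normalize spacing around colons. A hedged alternative is a small regular-expression helper that tolerates whitespace and returns an empty string when nothing matches. ExtractFormattedPrice is a hypothetical name, and the pattern assumes, as the Split code does, that the first "formattedValue" after "displayPrice" holds the price.

```vba
' Hypothetical helper: returns the first "formattedValue" string that follows
' "displayPrice" in the JSON text, or an empty string if there is no match.
Function ExtractFormattedPrice(ByVal json As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.IgnoreCase = False
    ' [\s\S]*? lazily skips ahead to the nearest "formattedValue";
    ' \s*:\s* allows any spacing around the colon.
    re.Pattern = """displayPrice""[\s\S]*?""formattedValue""\s*:\s*""([^""]*)"""

    Set matches = re.Execute(json)
    If matches.Count > 0 Then ExtractFormattedPrice = matches(0).SubMatches(0)
End Function
```

In TestXHR, the Replace and three Split lines could then become q = ExtractFormattedPrice(q). Checking that .Status = 200 before reading .ResponseText is also worth considering, since an error page would otherwise be parsed as if it were the price data.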

But again, there is no general case.

You may also use a JSON parser; take a look at some examples.
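
As one possible illustration of the parser route, the sketch below assumes the third-party VBA-JSON module (JsonConverter.bas) has been imported into the project, since VBA has no built-in JSON parser. The exact nesting of the response is not shown in this thread, so the hypothetical FindJsonKey helper searches the parsed tree for the "displayPrice" key instead of hard-coding a path. It may pick a different occurrence than the Split approach if the key appears more than once.

```vba
' Assumption: JsonConverter.ParseJson comes from the third-party VBA-JSON module.
' FindJsonKey is a hypothetical helper: depth-first search of the parsed tree
' (Dictionary for objects, Collection for arrays) for the first keyName entry.
Function FindJsonKey(ByVal node As Variant, ByVal keyName As String, _
                     ByRef found As Variant) As Boolean
    Dim k As Variant, child As Variant

    Select Case TypeName(node)
        Case "Dictionary"
            If node.Exists(keyName) Then
                If IsObject(node(keyName)) Then Set found = node(keyName) Else found = node(keyName)
                FindJsonKey = True
                Exit Function
            End If
            For Each k In node.Keys
                If FindJsonKey(node(k), keyName, found) Then
                    FindJsonKey = True
                    Exit Function
                End If
            Next k
        Case "Collection"
            For Each child In node
                If FindJsonKey(child, keyName, found) Then
                    FindJsonKey = True
                    Exit Function
                End If
            Next child
    End Select
End Function

Sub TestJsonParser()
    Dim body As String, parsed As Object, priceNode As Variant

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en", False
        .Send
        body = .ResponseText
    End With

    Set parsed = JsonConverter.ParseJson(body)
    If FindJsonKey(parsed, "displayPrice", priceNode) Then
        If TypeName(priceNode) = "Dictionary" Then
            If priceNode.Exists("formattedValue") Then Debug.Print priceNode("formattedValue")
        End If
    End If
End Sub
```

Compared with splitting strings, a parser does not depend on key order or whitespace in the response, at the cost of adding a module to the workbook.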
