In Excel VBA, how do I check whether a web page is fully loaded?

Question

To pause code until a web page is fully loaded, I've been using the method below, and it works almost all of the time.

Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

Occasionally, though, I see text content load after the method has decided that the page is fully loaded, so that content is not extracted.

However, if I step through the code with F8, the content is extracted every time, even when I press F8 about as fast as I can.

So how can I make sure that the page and all of its content are fully loaded before the code goes on to extract data?

In both cases, IE is running invisibly. However, I have also tried this with IE visible, and the content does appear in this specific location on the pages I'm working with.

This is being done in Excel 2016 with a VBA script. The specific content request is written like this:

 'get item name from page and write it to the first cell on the first empty row available
 Set itemName = objIE.document.querySelector(".the-item-name")
 Worksheets("Results").Range("A1048576").End(xlUp).Offset(1, 0).Value = itemName.innerText

I've read through "Excel VBA: Wait for JavaScript execution in Internet Explorer" because I suspect the values are being added after the document is loaded, perhaps to prevent anyone from scraping the data. However, I can't identify any script that might be doing that. That doesn't mean it isn't there; I just can't see it yet.

A specific example of a page with this issue is the following URL:

https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html

Initially, before the price is loaded, the product-total-price div element contains a dash (-), so that is what the request returns: "- / each" instead of "$11.29 / each".

I have a workaround, but it's not as efficient or as concise as I'd like. I test the returned string for the presence of the dash. If the dash is there, I loop and check again; otherwise I capture the string and insert it into the worksheet.

setPriceUM:
    ' Get the price element itself; its text is read on the next line
    Set hdPriceUM = objIE.document.querySelector(".product-total-price")
    hdPriceUMString = hdPriceUM.innerText
    ' InStr returns the position of the dash, or 0 if there is none
    stringTest = InStr(hdPriceUMString, "-")
    If stringTest > 0 Then
        GoTo setPriceUM
    Else
        Debug.Print hdPriceUMString
    End If

Thank you for taking the time to read and consider this.

Recommended answer

Web pages behave very differently from one another, so no single solution fits all of them.

For your example, your workaround is a workable solution. The code might look like this:

Sub TestIE()

    Dim q

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .Navigate "https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html"
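        ' Wait until IE reports that navigation has finished
        Do While .readyState < 4 Or .Busy
            DoEvents
        Loop
        ' Wait until the document itself reports "complete"
        Do While .document.readyState <> "complete"
            DoEvents
        Loop
        ' Wait until the price element no longer starts with the "-" placeholder
        Do
            q = .document.querySelector(".product-total-price").innerText
            If Left(q, 1) <> "-" Then Exit Do
            DoEvents
        Loop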
        .Quit
    End With
    Debug.Print q

End Sub
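
The element-waiting loop above runs forever if the price never arrives. As a minimal sketch only, the same polling idea can be wrapped in a helper that gives up after a timeout. The function name WaitForText and its parameters are hypothetical and are not part of the original answer; the sketch also assumes that querySelector returns Nothing when no element matches.

```vba
' Hypothetical helper: polls an element's text until it is non-empty and no
' longer starts with the placeholder, or until timeoutSeconds have elapsed.
' Returns an empty string on timeout.
Function WaitForText(doc As Object, selector As String, _
                     placeholder As String, timeoutSeconds As Double) As String
    Dim deadline As Date
    Dim el As Object
    Dim txt As String

    ' A Date is measured in days, so convert seconds to a fraction of a day
    deadline = Now + timeoutSeconds / 86400#

    Do While Now < deadline
        ' Assumption: querySelector yields Nothing when the element is absent
        Set el = doc.querySelector(selector)
        If Not el Is Nothing Then
            txt = Trim$(el.innerText)
            If Len(txt) > 0 And Left$(txt, Len(placeholder)) <> placeholder Then
                WaitForText = txt
                Exit Function
            End If
        End If
        DoEvents
    Loop

    WaitForText = vbNullString
End Function
```

Inside the With block of TestIE, the final loop could then be replaced by something like q = WaitForText(.document, ".product-total-price", "-", 15), followed by a check for an empty result. Because Now has a resolution of about one second, the timeout is only approximate.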

In any case, you need to examine the page's loading process, its XHRs, and its DOM modifications using the browser developer tools (F12). If you do, you may find that one of the numerous XHRs returns the price in JSON format. It appears in the Network tab of the developer tools just before the price shows up while the page is loading. That XHR is issued by one of the loaded JS files, notably after the page load event. Try this URL (I just copied it from the Network tab):

https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en

So you can simply reproduce that XHR and extract the price by splitting the response:

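Sub TestXHR()

    Dim q

    ' Request the same endpoint that the page itself calls for the price
    With CreateObject("MSXML2.XMLHTTP")
        ' False makes the request synchronous, so .Send blocks until it completes
        .Open "GET", "https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en", False
        .Send
        q = .ResponseText
    End With
    ' Normalize " : " separators so the markers below match
    q = Replace(q, " : ", ":")
    ' Keep the text after "displayPrice", then after "formattedValue":",
    ' and finally cut at the closing quote of the value
    q = Split(q, """displayPrice""", 2)(1)
    q = Split(q, """formattedValue"":""", 2)(1)
    q = Split(q, """", 2)(0)
    Debug.Print q

End Sub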

But again, there is no general case that covers every page.

You can also use a JSON parser; take a look at some examples.
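
As one possible illustration of the JSON parser route, the sketch below uses the third-party VBA-JSON library, whose JsonConverter.ParseJson function returns nested dictionaries and collections. It assumes that the JsonConverter module has been imported into the project and that a reference to Microsoft Scripting Runtime is set. The sample string is a hypothetical, heavily simplified stand-in for the real response, and treating displayPrice as a top-level key is an assumption; only the displayPrice and formattedValue names come from the splitting code above.

```vba
Sub TestParseJson()

    ' Requires the third-party VBA-JSON module (JsonConverter) in the project
    ' and a reference to "Microsoft Scripting Runtime".
    Dim sample As String
    Dim parsed As Object

    ' Hypothetical, simplified sample; the real response contains many more
    ' fields and may nest displayPrice differently.
    sample = "{""displayPrice"":{""value"":11.29,""formattedValue"":""$11.29""}}"

    ' ParseJson returns a dictionary for a JSON object
    Set parsed = JsonConverter.ParseJson(sample)

    ' Walk the nested dictionaries by key
    Debug.Print parsed("displayPrice")("formattedValue")

End Sub
```

With a real response, the ResponseText of the XHR would be passed to ParseJson in place of sample, and the key path would need to be adjusted to the actual structure seen in the Network tab.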
