CSS selector QuerySelector alternative
Question
I have searched extensively for material on how to get meta data using XMLHTTP, and I think it is impossible to do that with the early binding method. The only approach that works is late binding with CreateObject("HTMLFile") and working with that late-bound HTML. The disadvantage of this approach is that it doesn't support querySelector or querySelectorAll.

Now I am trying to find an alternative to this CSS selector without using querySelector:
Set post = .querySelector("table div span[itemprop='lowPrice']")
This raises an error, and I can't find an easier way to find the element. Here's the HTML content:
<table class="p">
<tbody><tr>
<td class="foto">
<div class="foto">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/#gallery-open" target="_blank" class="gallery-link product-detail__gallery-link" onclick="dataLayer.push({'event':'sendEvent','event_category':'Product Detail - Desktop','event_action':'Gallery','event_label':'Otev\u0159en\u00ed galerie','event_value':0});">
<img src="https://im9.cz/iR/importprodukt-orig/4c2/4c2b1733c8b233edd5052d3063ac46d9--mmf250x250.jpg" alt="Brit Premium by Nature Adult L 15 kg" width="250" height="250" id="picture-main">
<span class="image-hover">
<span class="image-overlay"></span>
<span class="js-test-image-count-info image-count-info">Galerie <span class="picture-count">(2)</span></span>
</span>
<span class="product-detail__gallery-link__image__count-info">Galerie
<span class="product-detail__gallery-link__image__count-info__count">(2)</span>
</span>
</a>
<a href="https://krmivo-psy.heureka.cz/top-produkty/" class="top-ico gtm-header-link" data-gtm-link-description="Pořadí v TOP produktech"><span>Top</span><strong>1.</strong></a>
<div class="poty-ico">
<a href="http://www.produktroku.cz/" target="_blank"><img src="https://im9.cz/iR/recenze-externi/107.png" alt="Produkt Roku 2019" class="product-of-year-badge"></a></div>
</div>
</td>
<td>
<div class="main-info">
<div class="text-cover">
<div id="n649054946" data-id="649054946" class="item js-public-product-id">
<h2 itemprop="name">Brit Premium by Nature Adult L 15 kg</h2>
</div>
<div class="rating-box" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
<p class="eval">
<strong itemprop="ratingValue">95%</strong>
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/pridat-uzivatelskou-recenzi/#section">
<span class="rating"><span class="hidden">Hodnocení produktu: 95%</span><span class="over" title="Hodnocení produktu: 95%"><span style="width: 75px;"></span></span></span>
</a>
</p>
<span class="hidden-microdata" itemprop="ratingCount">
456
</span>
<p class="review-count delimiter-blank">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/recenze/#section" class="gtm-header-link" data-gtm-link-description="Počet recenzí">
<span itemprop="reviewCount">344</span>
recenzí
</a>
</p>
<div class="cleaner"></div>
<p class="rating-box__item rating-box__favourite">
<a href="https://ucet.heureka.cz/prihlaseni?callbackUrl=https%3A%2F%2Fkrmivo-psy.heureka.cz%2Fbrit-premium-by-nature-adult-l-15-kg%2F" title="Chci to" class="gtm-header-link" data-gtm-link-description="Akce - oblíbené">Přidat do oblíbených</a>
</p>
<p id="cli649054946" class="rating-box__item rating-box__compare delimiter-blank cl-add">
<a class="checkbox gtm-header-link" data-gtm-link-description="Akce - porovnání" href="#" title="Porovnat">Přidat do porovnání</a>
</p>
<p class="delimiter-blank rating-box__item rating-box__price-watch js-price-watch-button">
<a href="#" title="Hlídat cenu" class="gtm-header-link" data-gtm-link-description="Akce - hlídat cenu">
Hlídat cenu
</a>
</p>
<p class="add-review rating-box__item rating-box__add-review delimiter-blank">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/pridat-uzivatelskou-recenzi/#section" class="gtm-header-link" data-gtm-link-description="Akce - přidat recenzi">
Přidat recenzi
</a>
</p>
</div>
<div id="top-shop-info" class="top-shop-info">
<div class="inner">
<div class="guar">
<div>
<img class="guar-badge" src="https://im9.cz/css-v2/images/guaranty-seal.png?1" alt="Garance nákupu - SpokojenyPes.cz" width="27" height="34">
</div>
</div>
<div class="shop-claim bold">
<strong>Produkt vám dodá:</strong>
</div>
<div class="shop-logo">
<a href="https://www.heureka.cz/exit/spokojenypes-cz/3180319922/?z=41" target="_blank" rel="nofollow noopener" class="gtm-header-link" data-gtm-link-description="Exit - produkt vám dodá">
<img src="https://im9.cz/iR/importobchod-orig/1983_logo--mmf130x40.png" alt="SpokojenyPes.cz" width="130" height="40">
</a>
</div>
<div class="recommendation">
<a href="https://obchody.heureka.cz/spokojenypes-cz/recenze/" class="gtm-header-link" data-gtm-link-description="Hodnocení - Produkt vám dodá">
99% zákazníků doporučuje obchod
</a>
</div>
<div class="delivery-info bold price-delivery-free">
Doprava zdarma
</div>
<div class="availability-info bold in-stock">
skladem
</div>
</div>
<a data-gtm-link-description="Další nabídky" id="top-shop-count-info" href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/porovnat-ceny/#section" class="top-shop-count-info box-active gtm-header-link">Dalších 134 nabídek od 728 Kč</a>
</div>
<p class="desc">
<span id="product-short-description">
Kompletní krmivo Brit Premium pro dospělé psy. Kuřecí receptura pro dospělé psy velkých plemen (25 - 45 kg).
<a id="product-short-description-button" href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section" title="celá specifikace Brit Premium by Nature Adult L 15 kg">celá specifikace</a>
</span>
</p>
</div>
<div itemprop="offers" itemscope="" itemtype="http://schema.org/AggregateOffer" style="display:none">
<span itemprop="lowPrice">728.00</span>
<span itemprop="highPrice">1579.00</span>
<span itemprop="offerCount">135</span>
<link itemprop="availability" href="http://schema.org/InStock">
</div>
<div itemprop="offers" itemscope="" itemtype="http://schema.org/Offer" class="price-from shopping-cart">
<link itemprop="itemCondition" href="http://schema.org/OfferItemCondition" content="http://schema.org/NewCondition">
<link itemprop="availability" href="http://schema.org/InStock">
<link itemprop="category" href="http://schema.org/category" content="Hobby / Chovatelství / Pro psy / Krmivo pro psy">
<link itemprop="image" href="http://schema.org/image" content="https://im9.cz/iR/importprodukt-orig/4c2/4c2b1733c8b233edd5052d3063ac46d9.jpg">
<div class="top-left">
<div id="top-button" class="buy-click-observed">
<p class="buy">
<a href="#" class="flat-button flat-button--top-position flat-button--orange buy-btn hb hb-3180319922 js-top-pos-btn" data-cart-position="0">
<i class="ico basket"></i>
<i class="ico check"></i>
<span class="in">Koupit na Heurece</span>
<span class="in replace">Přidáno do košíku</span>
</a>
</p>
</div>
<div class="n" id="top-offer-price">
<p class="buy-price">
<span itemprop="price" class="js-top-price" content="839.00">839 Kč</span>
<span class="price-vat-title small">s DPH</span>
<span itemprop="priceCurrency" content="CZK"></span>
</p>
</div>
<div class="clear"></div>
<div class="js-top-gifts-info top-shop-gifts-info-box">
</div>
</div>
<div class="clear"></div>
<div class="clear"></div>
</div>
<span id="new-pd"></span>
<script>
(function() {
loadScript("https:\/\/im9.cz\/js\/cache\/7e39f733-1-42bd9e7837b830d87e1af94da6d0e4a82055c56f.hash.js", function () {
var productHeadObserver = new ProductHeadObserver({ 'topShortDescElm': $('product-short-description'), 'topShopBox': $('top-shop-info'), 'maxOfferNameLength': 90 });
productHeadObserver.oneOfferInit();
});
H.Awards._reviewClick($$('#awards-list span.pa'));
var notSelectedCallback = function() {
if ('undefined' != typeof H.ShoppingCartHelper.BuyMoreOptions &&
typeof H.ShoppingCartHelper.BuyMoreOptions.buyClickNotSelectedCallback == 'function') {
H.ShoppingCartHelper.BuyMoreOptions.buyClickNotSelectedCallback();
}
};
H.ShoppingCartHelper.observeBuyClick($('top-button'), new H.ShoppingCart(), notSelectedCallback, 'js-top-pos-btn');
})();
</script>
<div class="clear"></div>
</div>
</td>
</tr>
</tbody></table>
Here's the whole HTML: https://pastebin.com/Dgu1wk2b
Here's the code so far:
Sub MyTest()
    Dim source As Object
    Dim obj As Object
    Dim resp As String
    Dim post As Object
    Dim a, i As Long
    With CreateObject("MSXML2.xmlHttp")
        .Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
        .send
        resp = .responseText
    End With
    With CreateObject("HTMLFile")
        .write resp
        Set post = .getElementsByTagName("meta")
        For i = 0 To post.Length - 1
            On Error Resume Next
            Debug.Print post.item(i).getAttribute("name")
            If post.item(i).getAttribute("name") = "gtm:product_id" Then
                Cells(2, 1).Value = post.item(i).Value
            End If
            If post.item(i).getAttribute("name") = "gtm:product_name" Then
                Cells(2, 3).Value = post.item(i).Value
            End If
            If post.item(i).getAttribute("name") = "gtm:product_brand" Then
                Cells(2, 4).Value = post.item(i).Value
            End If
            On Error GoTo 0
        Next i
        Set post = Nothing
        Set post = .getElementsByTagName("link")
        For i = 0 To post.Length - 1
            On Error Resume Next
            If post.item(i).getAttribute("rel") = "canonical" Then
                Cells(2, 2).Value = post.item(i).href
            End If
            On Error GoTo 0
        Next i
        'I am stuck here
        'Set post = .querySelector("table div span[itemprop='lowPrice']")
        'Debug.Print .getElementsByTagName("table")(0).innerHTML
    End With
End Sub
Recommended answer
As you have discovered, HEAD tag info (where the meta elements live) is stripped out when you use document.body.innerHTML = .responseText with an early-bound MSHTML.HTMLDocument. That is what you would expect, given that you are populating document.body, and it is why you are unable to select the meta info. With your late-bound HTMLFile (where you can't use querySelector) you are using the .write method, which writes to your document (HTMLFile) and thereby retains the HEAD info.
If you wish to use early binding, you need to ensure that the HEAD info ends up within BODY tags: either as part of the response body, or by concatenating the extracted HEAD with new BODY tags and writing the result to an HTMLDocument.
For example, for clarity I am writing only the HEAD info between BODY tags (without the rest of the existing response):
Option Explicit
' Requires references to: Microsoft HTML Object Library, Microsoft XML, v6.0,
' and Microsoft VBScript Regular Expressions 5.5
Public Sub MetaInfoEarlyBound()
    Dim html As MSHTML.HTMLDocument, htmlHead As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60
    Dim re As VBScript_RegExp_55.RegExp
    Set htmlHead = New MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument
    Set xhr = New MSXML2.XMLHTTP60
    Set re = New VBScript_RegExp_55.RegExp
    re.Pattern = "<head>([\s\S]+)<\/head>"
    With xhr
        .Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
        .send
        ' Swap the HEAD tags for BODY tags so that innerHTML keeps the meta elements
        htmlHead.body.innerHTML = Replace$(Replace$(re.Execute(.responseText)(0), "<head>", "<body>"), "</head>", "</body>")
        html.body.innerHTML = .responseText
    End With
    Debug.Print htmlHead.querySelector("[name='gtm:product_price']").Value
    Debug.Print html.querySelector("[itemprop=lowPrice]").innerText
End Sub
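If you would rather not set a reference to Microsoft VBScript Regular Expressions 5.5, the HEAD extraction can be sketched with plain string functions instead. HeadAsBody is a hypothetical helper, not part of any library; like the regex pattern in MetaInfoEarlyBound, it assumes an opening <head> tag without attributes, and it returns an empty string when either tag is missing.

```vba
' Hypothetical helper: returns the content between <head> and </head>,
' wrapped in <body> tags, or "" if either tag is missing.
' Assumes the opening <head> tag has no attributes.
Public Function HeadAsBody(ByVal resp As String) As String
    Dim startPos As Long, endPos As Long
    ' vbTextCompare makes the search case-insensitive, so <HEAD> also matches
    startPos = InStr(1, resp, "<head>", vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len("<head>")
    endPos = InStr(startPos, resp, "</head>", vbTextCompare)
    If endPos = 0 Then Exit Function
    HeadAsBody = "<body>" & Mid$(resp, startPos, endPos - startPos) & "</body>"
End Function
```

Inside the With xhr block it would take the place of the regex line as htmlHead.body.innerHTML = HeadAsBody(.responseText). Unlike the greedy [\s\S]+ pattern, InStr stops at the first </head>, which should make no difference on a well-formed page.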
As an aside, here are two shorter methods (shorter than the current other answer) to achieve your goal with late binding. Note that I have commented one of them out.
Public Sub MetaInfoLateBound()
    Dim resp As String
    With CreateObject("MSXML2.xmlHttp")
        .Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
        .send
        resp = .responseText
    End With
    With CreateObject("HTMLFile")
        .write resp
        ' Dim post As Object
        '
        ' Set post = .getElementById("new-pd")
        ' Debug.Print post.PreviousSibling.PreviousSibling.getElementsByTagName("span")(0).innertext
        '
        Dim metas As Object, i As Long
        Set metas = .getElementsByTagName("meta")
        For i = 0 To metas.Length - 1
            If metas.Item(i).Name = "gtm:product_price" Then
                Debug.Print metas.Item(i).Value
                Exit For
            End If
        Next
    End With
End Sub
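The loop over getElementsByTagName("meta") can be generalised to approximate the selector from the question on a late-bound HTMLFile. The sketch below uses hypothetical helpers: FindByAttribute returns the first element with a given tag whose attribute equals a given value (roughly tag[attr='value']), optionally requiring an ancestor with a given tag, and HasAncestor walks parentElement to approximate a descendant combinator. It checks only one ancestor tag, so it approximates table div span[itemprop='lowPrice'] rather than implementing a CSS selector engine; it also assumes getAttribute returns the markup string for itemprop, which may depend on the document mode.

```vba
' Hypothetical helper: True if some ancestor of el has the given tag name
' (case-insensitive), approximating a descendant combinator like "table span".
Public Function HasAncestor(ByVal el As Object, ByVal tagName As String) As Boolean
    Dim p As Object
    Set p = el.parentElement
    Do While Not p Is Nothing
        If StrComp(p.tagName, tagName, vbTextCompare) = 0 Then
            HasAncestor = True
            Exit Function
        End If
        Set p = p.parentElement
    Loop
End Function

' Hypothetical helper: first element <tagName> whose attribute attrName equals
' attrValue (and, if ancestorTag is given, which sits inside such an ancestor).
' Returns Nothing when there is no match.
Public Function FindByAttribute(ByVal doc As Object, ByVal tagName As String, _
        ByVal attrName As String, ByVal attrValue As String, _
        Optional ByVal ancestorTag As String = "") As Object
    Dim els As Object, el As Object, i As Long, v As Variant
    Set els = doc.getElementsByTagName(tagName)
    For i = 0 To els.Length - 1
        Set el = els.Item(i)
        ' getAttribute returns Null when the attribute is absent
        v = el.getAttribute(attrName)
        If Not IsNull(v) Then
            If CStr(v) = attrValue Then
                If Len(ancestorTag) = 0 Or HasAncestor(el, ancestorTag) Then
                    Set FindByAttribute = el
                    Exit Function
                End If
            End If
        End If
    Next i
End Function

' Example usage on a response fetched as in MetaInfoLateBound
Public Sub PrintLowPrice(ByVal resp As String)
    Dim doc As Object, el As Object
    Set doc = CreateObject("HTMLFile")
    doc.write resp
    Set el = FindByAttribute(doc, "span", "itemprop", "lowPrice", "table")
    If Not el Is Nothing Then Debug.Print el.innerText
End Sub
```

Returning Nothing instead of raising an error lets the caller test the result with Is Nothing, much as querySelector returns null when nothing matches.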