Selenium, Get Elements By XPath - Only grab last 60 elements on page
Question
I'm having a little trouble working out how to select only the last 60 elements on a page.
posts = driver.find_elements_by_xpath("//div[@class='hotProductDetails']")
for post in posts:
    print(post.text)
This code prints the text of every matching element on the page. But I'm trying to scrape a site that has a 'Load More' button on it.
The 'Load More' button loads 60 more products, and I'd like my code to grab only those products. That way I can put it all in a loop that clicks the button, grabs the products it loads, appends them to a Pandas DataFrame, and repeats for a set number of iterations.
I've been unable to write code that does this, and once the 'Load More' button has been pressed many times, grabbing all the elements can kill Chrome and, in turn, my script.
"(//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50)
Someone shared this code with me, but it crashes with this error:
invalid selector: Unable to locate an element with the xpath expression (//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50 because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '(//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50' is not a valid XPath expression.
(Session info: chrome=60.0.3112.90)
(Driver info: chromedriver=2.31.488763 (092de99f48a300323ecf8c2a4e2e7cab51de5ba8),platform=Windows NT 10.0.14393 x86_64)
This is the first time I've had a web-scraping project and used Selenium (which is an amazing package, I'm so impressed with it), and I'm not sure how to fix this. I suspect it has something to do with the 'page' code, as everything sits on the same webpage, which just gets larger as you load more products.
I can share the website I'm scraping if that helps. Like I said, this is my first scraping project, and it's for a company I just joined, so I don't know whether they would be upset about me sharing it.
Recommended answer
If you are getting an invalid XPath selector error, then the expression itself is malformed. Here there was an extra ")" at the end. The version below works for me:
page = 2
xpath_selector = "(//div[@class='hotProductDetails'])[position() > {} and position() <= {}]".format ((page -1 ) * 50, page * 50)
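The snippet above uses 50 items per page, while the question describes batches of 60, so it may help to make the batch size a parameter. A minimal sketch, assuming a hypothetical helper named `page_xpath` (not part of Selenium) that only builds the selector string:

```python
def page_xpath(base_xpath, page, page_size=60):
    """Build an XPath selecting the nodes that belong to a 1-based page.

    page_xpath is a hypothetical helper for illustration; page_size=60
    is an assumption taken from the batch size mentioned in the question.
    """
    if page < 1:
        raise ValueError("page must be 1 or greater")
    start = (page - 1) * page_size  # positions up to here are skipped
    end = page * page_size          # last position included
    return "({})[position() > {} and position() <= {}]".format(
        base_xpath, start, end
    )


# Selector for the second batch of 60 products.
xpath_selector = page_xpath("//div[@class='hotProductDetails']", 2)
print(xpath_selector)
```

Wrapping the base expression in parentheses matters: it makes position() count across the whole matched node set rather than within each parent element.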
Also, if you want only the last 60 elements, you can use the selector below:
xpath_selector = "(//div[@class='hotProductDetails'])[position() > last() - 60]"
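The click-and-grab loop described in the question could be built around this selector roughly as follows. This is only a sketch under several assumptions: `collect_new_batches` and `last_n_xpath` are hypothetical names, the XPath for the 'Load More' button is unknown and must be supplied by the caller, and the locator is passed as the plain string "xpath" (the value of `By.XPATH`) with the Selenium 4 style `find_elements(by, value)` call, since the `find_elements_by_xpath` form used in the question was removed in later Selenium 4 releases. Instead of hard-coding 60, it counts how many elements appeared after each click and fetches exactly that many from the end.

```python
import time


def last_n_xpath(base_xpath, n):
    """XPath selecting only the final n nodes matched by base_xpath."""
    return "({})[position() > last() - {}]".format(base_xpath, n)


def collect_new_batches(driver, base_xpath, button_xpath, clicks,
                        timeout=10.0, poll=0.5):
    """Click the button `clicks` times; return text of newly loaded items.

    Hypothetical helper: button_xpath is an assumption, because the
    real page's 'Load More' locator is not given in the question.
    """
    texts = []
    count = len(driver.find_elements("xpath", base_xpath))
    for _ in range(clicks):
        before = count
        driver.find_element("xpath", button_xpath).click()
        # Poll until the number of matching elements grows, or give up.
        deadline = time.monotonic() + timeout
        count = len(driver.find_elements("xpath", base_xpath))
        while count <= before:
            if time.monotonic() >= deadline:
                raise TimeoutError("no new products appeared after clicking")
            time.sleep(poll)
            count = len(driver.find_elements("xpath", base_xpath))
        # Fetch only the elements added by this click.
        new_items = driver.find_elements(
            "xpath", last_n_xpath(base_xpath, count - before)
        )
        texts.extend(item.text for item in new_items)
    return texts
```

The returned list can then be turned into a DataFrame, for example with `pandas.DataFrame(texts, columns=["text"])`. Note that counting still touches every matching element on each poll, so on very long pages it may be worth keeping the number of clicks modest.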