Selenium, Get Elements By Xpath - Only grab last 60 elements on page

Problem Description

I'm having a little trouble working out how I can specify the last 60 elements on a page

posts = driver.find_elements_by_xpath("""(//div[@class='hotProductDetails'])""")
for post in posts:
    print(post.text)

This code prints every bit of text within those elements on the webpage. But I'm trying to scrape a site that has a 'Load More' button on it.

加载更多"按钮可加载另外 60 种产品,我希望我的代码只获取这些产品.这样我就可以把它全部放在一个循环中,点击按钮,抓取它加载的产品,附加到 Pandas Dataframe 并重复一定次数的迭代.

The 'Load More' button loads 60 more products, and I'd like my code to only grab those products. That way I can stick it all in a loop which clicks the button, grabs the products it loads, append to a Pandas Dataframe and repeats for a set number of iterations.

I've been unable to get code that will do this for me, and once that 'Load More' button has been pressed many times, grabbing all the elements can kill Chrome and, in turn, my script.

"(//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50)

Someone shared this code with me, but it crashes with this error:

invalid selector: Unable to locate an element with the xpath expression (//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50 because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '(//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50' is not a valid XPath expression.
  (Session info: chrome=60.0.3112.90)
  (Driver info: chromedriver=2.31.488763 (092de99f48a300323ecf8c2a4e2e7cab51de5ba8),platform=Windows NT 10.0.14393 x86_64)

This is the first time I've ever had a web-scraping project and used Selenium (which is an amazing package, so impressed with it), and I'm not sure what to do to fix it. I suspect it's something to do with the 'page' code, as everything sits on the same webpage, which just gets larger as you load more products.

I can share the website I'm scraping if that helps - like I said, this is my first scraping project, and it's for a company I just joined. I don't know if this is something they would be upset about me sharing.

Answer

If you are getting an "invalid selector" error then something is wrong with the expression: there was an extra ")" at the end. The following works for me:

page = 2

xpath_selector = "(//div[@class='hotProductDetails'])[position() > {} and position() <= {}]".format ((page -1 ) * 50, page * 50)

Also, if you want something like the last 60 elements, you can use the following:

xpath_selector = "(//div[@class='hotProductDetails'])[position() > last() - 60]"
