Python - which is considered better for scraping: Selenium alone, or BeautifulSoup with Selenium?


Question

This question is about Python 3.6.3, bs4 and Selenium 3.8 on Windows 10.

I am trying to scrape pages with dynamic content. What I want to extract is numbers and text (from http://www.oddsportal.com, for example). As I understand it, requests + BeautifulSoup will not do the job, because the dynamic content is generated in the browser and is not present in the HTML that requests downloads. So I have to use another tool, such as Selenium WebDriver.

Given that I will be using Selenium WebDriver anyway, do you recommend ignoring BeautifulSoup and sticking with Selenium WebDriver functions, e.g.

elem = driver.find_element_by_name("q")

Or is it considered better practice to use Selenium + BeautifulSoup?

Do you have an opinion on which of the two routes gives me more convenient functions to work with?

Recommended answer

BeautifulSoup

BeautifulSoup is a powerful tool for web scraping. It parses HTML that you download separately, for example with the urllib.request Python library. That combination works well for extracting data from static pages.
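As a rough illustration of the static case, the sketch below downloads a page with urllib.request and pulls the links out with BeautifulSoup. The function names and the sample markup are made up for this example, and it assumes the bs4 package is installed.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_static_html(url):
    """Download a page as-is; no JavaScript is executed."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Return (text, href) for every anchor that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


# Hypothetical markup standing in for a downloaded static page.
STATIC_HTML = (
    '<p><a href="/a">First</a> and <a href="/b">Second</a> '
    "<a>no link</a></p>"
)

if __name__ == "__main__":
    print(extract_links(STATIC_HTML))
```

On a real static page you would call extract_links(fetch_static_html(url)). On a page that builds its content with JavaScript, the same call would only see the HTML as the server sent it, without the script-generated content.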

Selenium

Selenium is a widely used tool for web automation. It drives a real browser, so it can interact with dynamic pages, content and elements.

To build a robust and efficient framework for scraping pages with dynamic content, integrate both Selenium and BeautifulSoup. Browse and interact with the dynamic elements through Selenium, then scrape the contents efficiently through BeautifulSoup.

Here is an example using Selenium and BeautifulSoup for scraping.
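A minimal sketch of that combination (not the example the answer links to): Selenium loads the page and waits for the dynamic content, then hands driver.page_source to BeautifulSoup for parsing. The table.odds selector, the function names and the sample markup are assumptions made for illustration and are not taken from oddsportal.com. It also assumes Chrome and a matching driver are available.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load `url` in a real browser and return the HTML after scripts ran."""
    # Imported here so the parsing function below can be used and tested
    # without Selenium or a browser driver installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome plus a matching driver
    try:
        driver.get(url)
        # Block until the element matching `wait_css` is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_rows(html):
    """Extract (name, odds) pairs from rendered HTML with BeautifulSoup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table.odds tr"):  # hypothetical table class
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 2:  # skips the header row, which only has <th> cells
            rows.append((cells[0], float(cells[1])))
    return rows


# Hypothetical rendered markup, standing in for driver.page_source.
SAMPLE_HTML = """
<table class="odds">
  <tr><th>Team</th><th>Odds</th></tr>
  <tr><td>Home</td><td>1.85</td></tr>
  <tr><td>Away</td><td>2.10</td></tr>
</table>
"""

if __name__ == "__main__":
    print(parse_rows(SAMPLE_HTML))
```

Against a live page the two halves chain together as parse_rows(fetch_rendered_html(url, "table.odds")). Keeping the parsing in its own function means it can be developed and tested against saved HTML, without starting a browser each time.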
