Can Beautiful Soup also trigger webpage events?

Question

Beautiful Soup is a Python library for pulling data out of HTML and XML files. I plan to use it to extract webpage data, but I couldn't find any way to click buttons or anchor links, which in my case are used for page navigation. Do I have to use some other tool for this, or does Beautiful Soup have a capability I'm not aware of?

Please advise!

Recommended answer

To answer your tags/comment: yes, you can use Selenium and BeautifulSoup together, and no, you can't use BeautifulSoup on its own to trigger events such as clicks, because it only parses HTML. Although I haven't used the two together myself, a typical workflow would be to use Selenium to navigate to the target page along some path (e.g. click() these options, then click() the button for the next page), and then have BeautifulSoup read driver.page_source, where driver is the Selenium driver you created to "drive" the browser. Since driver.page_source is the HTML of the page, you can use BeautifulSoup as usual to parse out whatever information you need.
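A minimal sketch of that workflow, assuming the pages are chained by a "next page" anchor whose visible text you know. The function collect_page_titles and the link text are hypothetical. BeautifulSoup parses each page's driver.page_source, while Selenium's find_elements() and click() do the navigation; "link text" is the string value of Selenium's By.LINK_TEXT locator, so no Selenium import is needed inside the function. The sketch does not wait for the next page to load after clicking, so on a real site you would add an explicit wait (for example with Selenium's WebDriverWait).

```python
from bs4 import BeautifulSoup


def collect_page_titles(driver, next_link_text, max_pages=3):
    """Hypothetical helper: record each page's <title>, clicking the
    'next page' link (found by its visible text) between pages.
    Always reads at least the current page."""
    titles = []
    while True:
        # BeautifulSoup only reads the HTML; it never runs JavaScript or clicks.
        soup = BeautifulSoup(driver.page_source, "html.parser")
        titles.append(soup.title.get_text() if soup.title else None)

        if len(titles) >= max_pages:
            break

        # Selenium does the clicking. find_elements returns an empty list
        # (instead of raising) when no link with that text exists.
        links = driver.find_elements("link text", next_link_text)
        if not links:
            break
        links[0].click()
        # On a real site, wait here for the new page to finish loading
        # before reading driver.page_source again.
    return titles
```

With a real browser this would be called as collect_page_titles(driver, "More") after driver.get(...), assuming the site's pagination link reads "More".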

Simple example:

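from bs4 import BeautifulSoup
from selenium import webdriver

# Create your driver (this launches a Firefox browser)
driver = webdriver.Firefox()

try:
    # Get a page
    driver.get('http://news.ycombinator.com')

    # Feed the rendered source to BeautifulSoup, naming the parser explicitly
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    print(soup.title)  # <title>Hacker News</title>
finally:
    # Close the browser even if something above fails
    driver.quit()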

The main idea is that whenever you need to read the source of a page (for example, after each click()), you can pass driver.page_source to BeautifulSoup and read whatever you want from it.
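As one illustration of reading "whatever you want", here is a sketch that lists every link on the current page so you can decide which one to click() next. The function name extract_links is hypothetical; with Selenium you would pass driver.page_source and driver.current_url, and urljoin resolves relative href values against that URL. Links that only work through JavaScript (for example href="#" with an onclick handler) will not yield a useful URL here, which is exactly the case where Selenium's click() is needed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(page_source, base_url):
    """Hypothetical helper: return (text, absolute_url) pairs for every
    <a> tag that has an href attribute."""
    soup = BeautifulSoup(page_source, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        text = anchor.get_text(strip=True)
        # Resolve relative links such as "news?p=2" against the page URL.
        links.append((text, urljoin(base_url, anchor["href"])))
    return links
```

Typical use: for text, url in extract_links(driver.page_source, driver.current_url): print(text, url).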
