BeautifulSoup findAll is returning an empty array (Python)


Question

I am trying to get data from this web page https://playruneterra.com/es-es/news and the part I am trying to get is this:

I am using BeautifulSoup to get the HTML and search in it, but when I use the findAll method to get that line, it returns an empty array. I tried the same on other pages and it works fine. What is happening?

This is my code:

This is an example that is working:

Thanks all.

Solution

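The empty result suggests that the news items are inserted by JavaScript after the page loads, so they are not in the HTML that a plain request returns. You can use PyQt to build a headless browser, let it render the page, and then scrape the data from the rendered HTML. Here's some demo code: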

import sys

import bs4 as bs
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication


class Page(QWebEnginePage):
    """Load a URL in Qt's web engine and keep the rendered HTML."""

    def __init__(self, url):
        # A QApplication must exist before any web engine object is created.
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ''
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        # Block here until Callable() calls quit().
        self.app.exec_()

    def _on_load_finished(self):
        # toHtml() is asynchronous and returns None; the HTML is passed to the callback.
        self.toHtml(self.Callable)
        print('Load finished')

    def Callable(self, html_str):
        # Store the rendered HTML and leave the event loop.
        self.html = html_str
        self.app.quit()


def main():
    page = Page('https://playruneterra.com/es-es/news')
    soup = bs.BeautifulSoup(page.html, 'html.parser')
    js_test = soup.find('h2', class_='heading-03 src-component-content-NewsItem-___NewsItem-module__title___3OcDj')
    # find() returns None when nothing matches, so check before reading .text.
    if js_test is not None:
        print(js_test.text)
    else:
        print('No matching <h2> found')


if __name__ == '__main__':
    main()
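
To see why the original findAll call may have come back empty, it can help to compare what BeautifulSoup finds in the HTML before and after a browser has run the page's scripts. The sketch below is only an illustration: both HTML strings and the helper name headline_texts are made up for the example and are not taken from the real page. It assumes the page ships an empty container that JavaScript fills in later.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: what a plain HTTP fetch might return for a
# JavaScript-driven page (an empty mount point plus a script tag).
RAW_HTML = """
<html><body>
  <div id="root"></div>
  <script src="/bundle.js"></script>
</body></html>
"""

# Hypothetical markup: the same page after a browser has run the script.
RENDERED_HTML = """
<html><body>
  <div id="root">
    <h2 class="heading-03 title">First headline</h2>
    <h2 class="heading-03 title">Second headline</h2>
  </div>
</body></html>
"""


def headline_texts(html):
    """Return the text of every <h2> carrying the class heading-03."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ with a single name matches any element that has that class,
    # even when the element carries other classes as well.
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="heading-03")]


raw_result = headline_texts(RAW_HTML)
rendered_result = headline_texts(RENDERED_HTML)

print(raw_result)       # the unrendered shell contains no <h2> at all
print(rendered_result)  # the rendered markup contains both headlines
```

If printing the HTML you downloaded shows no trace of the element you are looking for, the parser is working correctly and the content most likely needs a rendering step such as the PyQt approach above.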
