BeautifulSoup findAll is returning an empty array (python)
Problem description
I am trying to get data from this web page https://playruneterra.com/es-es/news and the part I am trying to get is this:
I am using BeautifulSoup to get the HTML and search it, but when I use the findAll method to get that line, it returns an empty list. I tried the same thing on other pages and it works fine. What is happening?
This is my code:
This is an example that is working:
Thanks all.
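One likely reason findAll comes back empty (an assumption, since the original code isn't shown): BeautifulSoup only parses the HTML the server sends. On pages that build their content with JavaScript, which the headless-browser answer suggests is the case here, the headlines are inserted by scripts after the page loads and never appear in that HTML. The minimal offline sketch below uses a made-up page skeleton (STATIC_HTML) and a hand-written approximation of the browser's rendered DOM (RENDERED_HTML); the same find_all call finds nothing in the first and the headline in the second.

```python
from bs4 import BeautifulSoup

# Hypothetical server response: an empty container plus a script that only
# adds the headline when a browser actually executes it.
STATIC_HTML = """
<html><body>
  <div id="root"></div>
  <script>
    var h = document.createElement('h2');
    h.className = 'title';
    h.textContent = 'News';
    document.getElementById('root').appendChild(h);
  </script>
</body></html>
"""

# Hand-written approximation of the DOM after the script has run.
RENDERED_HTML = '<html><body><div id="root"><h2 class="title">News</h2></div></body></html>'


def headlines(html):
    """Return the text of every <h2 class="title"> that BeautifulSoup can see."""
    soup = BeautifulSoup(html, 'html.parser')
    # findAll is the older alias of find_all; both return a list-like ResultSet.
    # Script contents are plain text to the parser, so no <h2> exists in STATIC_HTML.
    return [h2.get_text() for h2 in soup.find_all('h2', class_='title')]


print(headlines(STATIC_HTML))    # []
print(headlines(RENDERED_HTML))  # ['News']
```

In other words, the parser is working correctly; it is simply being given HTML that doesn't contain the element yet.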
You can use PyQt to build a headless browser that loads the page, runs its JavaScript, and then lets you scrape the data from the rendered HTML. Here's some demo code for you:
import sys

import bs4 as bs
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication


class Page(QWebEnginePage):
    """Load a URL in Qt WebEngine and keep the rendered HTML in self.html."""

    def __init__(self, url):
        # A QApplication must exist before any WebEngine object is created.
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ''
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        # Run the event loop until Callable() calls quit().
        self.app.exec_()

    def _on_load_finished(self):
        # toHtml() is asynchronous: it returns nothing and passes the
        # page's current HTML (after JavaScript has run) to the callback.
        self.toHtml(self.Callable)
        print('Load finished')

    def Callable(self, html_str):
        self.html = html_str
        self.app.quit()


def main():
    page = Page('https://playruneterra.com/es-es/news')
    soup = bs.BeautifulSoup(page.html, 'html.parser')
    js_test = soup.find('h2', class_='heading-03 src-component-content-NewsItem-___NewsItem-module__title___3OcDj')
    # find() returns None when nothing matches, e.g. if the class name changes.
    if js_test is not None:
        print(js_test.text)


if __name__ == '__main__':
    main()
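Because the question was about findAll, here is a hedged sketch of pulling every headline out of the rendered HTML rather than only the first. It assumes the titles are h2 elements carrying the heading-03 class, as in the selector above. Matching that single class instead of the full class string also means the generated suffix ___3OcDj, which may change whenever the site is rebuilt (an assumption), doesn't have to be hard-coded. The function news_titles and the sample markup are hypothetical stand-ins; in practice you would pass page.html.

```python
from bs4 import BeautifulSoup


def news_titles(html):
    """Return the text of every <h2> whose class list includes heading-03."""
    soup = BeautifulSoup(html, 'html.parser')
    # A single class name matches any element whose class list contains it,
    # so the other (generated) class names do not need to be spelled out.
    return [h2.get_text(strip=True) for h2 in soup.find_all('h2', class_='heading-03')]


# Hypothetical rendered markup standing in for page.html.
sample = '''
<h2 class="heading-03 other">Patch notes</h2>
<h2 class="heading-02">Not a news title</h2>
<h2 class="heading-03">New expansion</h2>
'''
print(news_titles(sample))  # ['Patch notes', 'New expansion']
```

With the demo above, calling news_titles(page.html) inside main() would take the place of the single find() call.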