Web scraping program cannot find element which I can see in the browser
Problem description
I am trying to get the titles of the streams on https://www.twitch.tv/directory/game/Dota%202, using Requests and BeautifulSoup. I know that my search criteria are correct, yet my program does not find the elements I need.
Here is the relevant part of the HTML source, as shown in the browser:
<div class="tw-media-card-meta__title">
  <div class="tw-c-text-alt">
    <a class="tw-full-width tw-interactive tw-link tw-link--button tw-link--hover-underline-none tw-link--inherit" data-a-target="preview-card-title-link" href="/weplayesport_en">
      <div class="tw-align-items-start tw-flex">
        <h3 class="tw-ellipsis tw-font-size-5" title="NAVI vs HellRaisers | BO5 | ODPixel & S4 | WeSave! Charity Play">NAVI vs HellRaisers | BO5 | ODPixel & S4 | WeSave! Charity Play</h3>
      </div>
    </a>
  </div>
</div>
Here is my code:
import requests
from bs4 import BeautifulSoup
req = requests.get("https://www.twitch.tv/directory/game/Dota%202")
soup = BeautifulSoup(req.content, "lxml")
title_elems = soup.find_all("h3", attrs={"title": True})
print(title_elems)
When I run it, title_elems is just the empty list ([]).
Why is my program not finding the elements?
The elements you're interested in are generated dynamically, after the initial page load: your browser executes JavaScript, makes further network requests, and so on, in order to build the page. Requests is just an HTTP library, so it does none of those things. It only returns the HTML the server sends in the first response, and the h3 elements are not in that HTML yet.
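To make the difference concrete, here is a minimal sketch that runs the same kind of search over two HTML strings. Both strings are made-up stand-ins (not real Twitch markup): INITIAL_HTML imitates what an HTTP client might receive, RENDERED_HTML imitates the DOM after JavaScript has run. To keep it runnable without extra installs it uses the standard library's html.parser instead of BeautifulSoup; the helper names TitleCollector and find_titles are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative stand-ins only; these are NOT copies of Twitch's real markup.
# What an HTTP client might get back: an empty container plus a script tag.
INITIAL_HTML = (
    '<html><body><div id="root"></div>'
    '<script src="/app.js"></script></body></html>'
)
# What the browser's DOM might look like after the script has run.
RENDERED_HTML = (
    '<html><body><div id="root">'
    '<h3 class="tw-ellipsis" title="NAVI vs HellRaisers">NAVI vs HellRaisers</h3>'
    '</div></body></html>'
)


class TitleCollector(HTMLParser):
    """Collect the title value of every <h3> that carries one."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            title = dict(attrs).get("title")
            if title is not None:
                self.titles.append(title)


def find_titles(html):
    """Roughly what soup.find_all("h3", attrs={"title": True}) looks for."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


print(find_titles(INITIAL_HTML))   # []
print(find_titles(RENDERED_HTML))  # ['NAVI vs HellRaisers']
```

The search itself is the same in both calls; only the input differs, which is why correct search criteria still produce an empty list on the initial response.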
You could use a tool like Selenium, or perhaps even analyze the network traffic for the data you need and make the requests directly.
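A rough sketch of the Selenium route might look like the following. It assumes Selenium 4 with Chrome and a matching driver available on the machine, and it assumes the page still marks stream titles as h3 elements with a title attribute, as in the HTML quoted in the question; the site's markup may have changed since. The function names collect_titles and fetch_stream_titles are hypothetical.

```python
def collect_titles(elements):
    """Return the non-empty title attributes of Selenium-like elements."""
    titles = []
    for element in elements:
        title = element.get_attribute("title")
        if title:
            titles.append(title)
    return titles


def fetch_stream_titles(url, timeout=15):
    """Load url in headless Chrome, wait for the titles, and return them."""
    # Imported here so that the module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: titles are still rendered as <h3 title="...">.
    selector = "h3[title]"

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until JavaScript has inserted at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return collect_titles(driver.find_elements(By.CSS_SELECTOR, selector))
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling fetch_stream_titles("https://www.twitch.tv/directory/game/Dota%202") would then return a list of title strings if the selector still matches. The explicit wait matters: reading the page immediately after driver.get can hit the same empty result as Requests, because the scripts may not have finished building the page.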