I can't locate a recurring element from a bs4 object
The issue I am having is driving me crazy. I am trying to pull text from the Pro Football Reference website.
The information I need is in a td element displaying QB hurries, in the second section of the web page. The td element has the data-stat attribute qb_hurry. Here is what I have so far:
import bs4
import requests
res = requests.get('https://www.pro-football-reference.com/players/D/DonaAa00.htm')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
I tried
totalQbHurrys = soup.find('div', {'id':'all_detailed_defense'})
and I can see the information I need when I print the resulting BeautifulSoup object. But when I try to retrieve the td element I need with
totalQbHurrys = soup.find('div', {'id':'all_detailed_defense'}).find('td', {'data-stat':'qb_hurry'})
it returns None. I think the text I am looking for exists as a comment first, but I am having trouble getting to the actual HTML element I need. Does anyone know of a way to target the qb_hurry element successfully?
The issue is that this field is inside an HTML comment, so BeautifulSoup treats it as a Comment string rather than as searchable elements.
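To see why the lookup comes back empty, here is a minimal sketch that uses only Python's standard-library html.parser, which is the tokenizer behind BeautifulSoup's 'html.parser' backend. The sample markup, the value 12, and the TagCollector class are made up for illustration and are not taken from the real page. The point is that markup inside a comment arrives as one comment string, not as start tags.

```python
from html.parser import HTMLParser

# Hypothetical sample: the table is wrapped in an HTML comment,
# which is assumed to mirror how the page in question ships it.
SAMPLE = (
    '<div id="all_detailed_defense">'
    '<!-- <table><tr><td data-stat="qb_hurry">12</td></tr></table> -->'
    '</div>'
)


class TagCollector(HTMLParser):
    """Records every start tag and every comment the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_comment(self, data):
        self.comments.append(data)


collector = TagCollector()
collector.feed(SAMPLE)
collector.close()

print(collector.tags)      # only the outer div is reported as a tag
print(collector.comments)  # the table markup arrives as a single comment string
```

Since the td never shows up as a tag, a find() on the div has nothing to match until the comment text is parsed a second time.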
Here is a workaround:
import bs4
import requests

res = requests.get('https://www.pro-football-reference.com/players/D/DonaAa00.htm')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
extract = soup.find('div', {'id': 'all_detailed_defense'})
totalQbHurrys = None
# The table markup sits inside an HTML comment, so re-parse each comment as HTML.
for comment in extract.find_all(string=lambda text: isinstance(text, bs4.Comment)):
    soup2 = bs4.BeautifulSoup(comment, 'html.parser')
    totalQbHurrys = soup2.find('td', {'data-stat': 'qb_hurry'})
    if totalQbHurrys is not None:
        break  # stop at the first comment that contains the cell
print(totalQbHurrys)
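The same idea can be wrapped in a small helper and tried offline, without hitting the site. This is only a sketch: SAMPLE_HTML and find_in_comments are hypothetical names invented here, the value 12 is made up, and the real page's markup may differ.

```python
import bs4

# Hypothetical sample markup; the real page is larger and may differ.
SAMPLE_HTML = """
<div id="all_detailed_defense">
  <!--
  <table><tbody><tr><td data-stat="qb_hurry">12</td></tr></tbody></table>
  -->
</div>
"""


def find_in_comments(container, name, attrs):
    """Return the first matching tag inside any HTML comment under container, else None."""
    comments = container.find_all(string=lambda s: isinstance(s, bs4.Comment))
    for comment in comments:
        # Re-parse the comment text as HTML so its tags become searchable.
        inner = bs4.BeautifulSoup(comment, 'html.parser')
        match = inner.find(name, attrs)
        if match is not None:
            return match
    return None


soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')
div = soup.find('div', {'id': 'all_detailed_defense'})

print(div.find('td', {'data-stat': 'qb_hurry'}))  # direct lookup finds nothing
cell = find_in_comments(div, 'td', {'data-stat': 'qb_hurry'})
print(cell.get_text(strip=True))
```

Returning None when nothing matches keeps the helper's behavior in line with find(), so callers can check the result the same way.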
PS: I used the trick from this answer: https://stackoverflow.com/a/52874885/2186074