Scrape wood industry database with BeautifulSoup
Question
I would like to scrape the sawmill owner (the value after "Owned by:") from https://www.sawmilldatabase.com/sawmill.php?id=1282 with BeautifulSoup.
I've tried to adapt a very similar answer, but it doesn't work and I don't understand why.
<td>
<a href="../company.php?id=729">AKD Softwoods </a>
</td>
Python:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.sawmilldatabase.com/sawmill.php?id=1282')
soup = BeautifulSoup(page.text, 'html.parser')
lst = soup.find_all('TD')
for td in lst:
    if td.text == "Owned by":
        print("yes")
        print(lst[lst.index(td)+1].text)
Recommended answer
I used a regular expression to reach the element you're looking for: the owner is the text of the link whose href points to company.php.
Code:
import requests, re
from bs4 import BeautifulSoup
page = requests.get('https://www.sawmilldatabase.com/sawmill.php?id=1282')
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.find('a', href=re.compile(r'company\.php')).text)
Output:
AKD Softwoods
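As for why the loop in the question prints nothing, a likely cause is that html.parser lowercases tag names, so find_all('TD') matches no elements. The exact comparison with "Owned by" would probably also fail, because the label includes a colon and may carry surrounding whitespace. Below is a minimal sketch of the label-based approach with both points addressed. SAMPLE_HTML and find_owner are hypothetical names, and the table layout is an assumption modelled on the snippet in the question, so the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the snippet in the question;
# the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td>Owned by:</td>
    <td>
      <a href="../company.php?id=729">AKD Softwoods </a>
    </td>
  </tr>
</table>
"""


def find_owner(html):
    """Return the text of the cell following the 'Owned by' label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Tag names must be lowercase: html.parser lowercases them while parsing.
    for td in soup.find_all("td"):
        # Strip whitespace and tolerate the trailing colon in the label.
        if td.get_text(strip=True).startswith("Owned by"):
            value_cell = td.find_next_sibling("td")
            if value_cell is not None:
                return value_cell.get_text(strip=True)
    return None


print(find_owner(SAMPLE_HTML))
```

To try it against the live page, pass requests.get(url).text instead of SAMPLE_HTML. If the label and value turn out not to be sibling cells there, the regex lookup from the answer is the safer choice.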