Scrape wood industry database with BeautifulSoup


Problem description

I would like to scrape the sawmill owner (the value after "Owned by:") from https://www.sawmilldatabase.com/sawmill.php?id=1282 with BeautifulSoup.

I tried to adapt a very similar answer, but it doesn't work, and I don't understand why.

HTML:

<td>
   <a href="../company.php?id=729">AKD Softwoods </a>
</td>

Python:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.sawmilldatabase.com/sawmill.php?id=1282')

soup = BeautifulSoup(page.text, 'html.parser')

lst = soup.find_all('TD')
for td in lst:
    if td.text == "Owned by":
        print("yes")
        print(lst[lst.index(td)+1].text)

Recommended answer

I used a regular expression to locate the element you're looking for.

Code:

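import re

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.sawmilldatabase.com/sawmill.php?id=1282')
soup = BeautifulSoup(page.text, 'html.parser')

# The owner is the first link whose href points at company.php.
# .text keeps the trailing space that is present in the page markup.
print(soup.find('a', href=re.compile(r'company\.php')).text)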

Output:

AKD Softwoods 
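
As for why the attempt in the question prints nothing: html.parser lowercases tag names, so find_all('TD') most likely returns an empty list, and the comparison against "Owned by" also omits the colon that the page label is described as having. If you would rather anchor on the label itself than on the link's href, something along these lines may work. It is only a sketch: SAMPLE_HTML is a hypothetical table row built from the snippet in the question (the real page layout has not been verified), and find_owner is a helper name made up for this example.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: assumes the label and the owner sit in adjacent
# <td> cells of the same row. The real page may be laid out differently.
SAMPLE_HTML = """
<table>
  <tr>
    <td>Owned by:</td>
    <td>
       <a href="../company.php?id=729">AKD Softwoods </a>
    </td>
  </tr>
</table>
"""


def find_owner(html):
    """Return the text of the cell that follows the "Owned by:" label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the label text node, tolerating surrounding whitespace.
    label = soup.find(string=re.compile(r"^\s*Owned by:\s*$"))
    if label is None:
        return None
    # Tag names are lowercase after parsing, so search for "td", not "TD".
    label_cell = label.find_parent("td")
    if label_cell is None:
        return None
    owner_cell = label_cell.find_next_sibling("td")
    if owner_cell is None:
        return None
    # strip=True drops the trailing space seen in the markup.
    return owner_cell.get_text(strip=True)


print(find_owner(SAMPLE_HTML))
```

To run it against the live page, pass page.text from the requests call above instead of SAMPLE_HTML. A None result would suggest the label or the cell layout differs from what is assumed here.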
