BeautifulSoup: Specify table column by number?
Using Python 2.7 and BeautifulSoup 4, I'm scraping song names from a table.
Right now the script finds links in the row of a table; how can I specify I want the first column?
Ideally I'd be able to switch numbers around to change which ones got selected.
Right now the code looks like this:
    from bs4 import BeautifulSoup
    import requests

    r = requests.get("http://evamsharma.finosus.com/beatles/index.html")
    data = r.text
    soup = BeautifulSoup(data)

    for table in soup.find_all('table'):
        for row in soup.find_all('tr'):
            for link in soup.find_all('a'):
                print(link.contents)
How do I, in effect, index the <td> tags within each <tr> tag?
The URL in there right now is a page on my site where I basically copied the table source from Wikipedia to make the scraping a little simpler.
Thanks!
evamvid
Find all td tags inside each tr and get the one you need by index:
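    index = 2  # 0-based column index; use 0 for the first column

    for table in soup.find_all('table'):
        # search within the current table, not the whole document
        for row in table.find_all('tr'):
            try:
                td = row.find_all('td')[index]
            except IndexError:
                # row has too few <td> cells (e.g. a header row); skip it
                continue
            for link in td.find_all('a'):
                print(link.contents)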
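To make the column number easy to switch, the same idea can be wrapped in a small function. This is only a sketch: the `SAMPLE_HTML` markup and the `column_link_texts` name are made up for illustration (they are not from the page in the question), and it assumes a non-negative, 0-based index and the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the page in the question.
SAMPLE_HTML = """
<table>
  <tr><th>Song</th><th>Album</th></tr>
  <tr><td><a href="/yesterday">Yesterday</a></td><td><a href="/help">Help!</a></td></tr>
  <tr><td><a href="/let-it-be">Let It Be</a></td><td><a href="/lib">Let It Be</a></td></tr>
  <tr><td>No link here</td></tr>
</table>
"""


def column_link_texts(html, index):
    """Collect the text of each link in the index-th <td> of every row.

    index is 0-based and assumed to be non-negative.
    """
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for table in soup.find_all("table"):
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            if index >= len(cells):
                # Header rows (only <th>) and short rows have no such cell.
                continue
            for link in cells[index].find_all("a"):
                texts.append(link.get_text(strip=True))
    return texts


if __name__ == "__main__":
    print(column_link_texts(SAMPLE_HTML, 0))  # links from the first column
    print(column_link_texts(SAMPLE_HTML, 1))  # links from the second column
```

Changing the second argument is all it takes to pick a different column. Checking the length up front does the same job as the try/except above; either style should work.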