Beautiful Soup missing some HTML table tags

Views: 62 | Published: 2020/9/20 7:37:34 | Tags: python, beautifulsoup


Question

I'm trying to extract data from a website, using Beautiful Soup to parse the HTML. I'm currently trying to get the table data from the following webpage:

I want to get the data from the table. First I save the page as an HTML file on my computer (this part works fine; I checked that I got all the information), but when I try to parse it with the following code:

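from bs4 import BeautifulSoup

# 'page.html' is a placeholder name for the saved copy of the page
with open('page.html') as fh:
    soup = BeautifulSoup(fh, 'html.parser')
table = soup.find_all('table')
cols = table[0].find_all('tr')   # rows of the first table
cells = cols[1].find_all('td')   # cells of the second row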

I don't get any results. Specifically, it crashes with an error saying there is no element at index 1. Any idea where this could come from?

Thanks

Recommended answer

OK, it was actually an issue in the HTML file itself: in the first row of the table, the cells were opened with <th> but closed with </td>. I don't know much about HTML, but replacing the th with td solved the issue.

<tr class="listeEtablenTete">
<th title="Rubrique IC">Rubri. IC</td>
<th title="Alin&eacute;a">Ali.&nbsp;</td>
<th title="Date d'autorisation">Date auto.</td>
<th >Etat d'activit&eacute;</td>
<th title="R&eacute;gime">R&eacute;g.</td>
<th >Activit&eacute;</td>
<th >Volume</td>
<th >Unit&eacute;</td>

Thanks!
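If the file cannot be edited by hand, the same repair can be scripted before parsing. The sketch below is a variation on the answer's fix, not the answer's own method: instead of turning th into td, it rewrites a wrong </td> closer into </th>, and only when the header cell contains plain text with no nested tags. The helper name `fix_header_cells`, the pattern `MISMATCHED_TH`, and the data row in the sample are assumptions made up for illustration; only the header lines come from the answer.

```python
import re

from bs4 import BeautifulSoup

# Matches an opening <th ...>, tag-free cell text, and a wrong </td> closer.
MISMATCHED_TH = re.compile(r'(<th\b[^>]*>)([^<]*)</td\s*>', re.IGNORECASE)


def fix_header_cells(html):
    """Rewrite the </td> that wrongly closes a <th> cell into </th>."""
    return MISMATCHED_TH.sub(r'\1\2</th>', html)


# Header lines taken from the answer (shortened); the data row is invented.
broken_html = """
<table>
<tr class="listeEtablenTete">
<th title="Rubrique IC">Rubri. IC</td>
<th >Volume</td>
</tr>
<tr><td>2910</td><td>10</td></tr>
</table>
"""

fixed_html = fix_header_cells(broken_html)
soup = BeautifulSoup(fixed_html, 'html.parser')
rows = soup.find_all('table')[0].find_all('tr')
cells = rows[1].find_all('td')  # the index that failed in the question
print([cell.get_text() for cell in cells])
```

Correctly closed td cells are left untouched because the pattern only starts at an opening <th. Header cells that contain nested markup would need a different approach, such as trying a more lenient parser.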
