How to read an HTML table with multiple tbodies with python pandas' read_html?

Views: 814 Published: 2020/5/4 8:32:29 python html pandas lxml

Problem description

This is my HTML:

import pandas as pd    
html_table = '''<table>
                      <thead>
                        <tr><th>Col1</th><th>Col2</th>
                      </thead>
                      <tbody>
                        <tr><td>1a</td><td>2a</td></tr>
                      </tbody>
                      <tbody>
                        <tr><td>1b</td><td>2b</td></tr>
                      </tbody>
                    </table>'''

If I run df = pd.read_html(html_table) and then print(df[0]), I get:

  Col1 Col2
0   1a   2a

The second row (1b, 2b) disappears. Why? How can I prevent it?
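
One way to check that both rows really are present in the markup, so that the loss must happen during parsing, is to count the rows under each tbody with the standard library. This is only a minimal sketch: the TbodyRowCounter class and the HTML_TABLE name are hypothetical helpers for illustration, not part of pandas.

```python
from html.parser import HTMLParser

HTML_TABLE = """<table>
  <thead>
    <tr><th>Col1</th><th>Col2</th>
  </thead>
  <tbody>
    <tr><td>1a</td><td>2a</td></tr>
  </tbody>
  <tbody>
    <tr><td>1b</td><td>2b</td></tr>
  </tbody>
</table>"""


class TbodyRowCounter(HTMLParser):
    """Hypothetical helper: counts <tr> start tags inside each <tbody>."""

    def __init__(self):
        super().__init__()
        self.rows_per_tbody = []  # one entry per <tbody> encountered
        self._in_tbody = False

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._in_tbody = True
            self.rows_per_tbody.append(0)
        elif tag == "tr" and self._in_tbody:
            self.rows_per_tbody[-1] += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self._in_tbody = False


counter = TbodyRowCounter()
counter.feed(HTML_TABLE)
counter.close()
rows_per_tbody = counter.rows_per_tbody
print(rows_per_tbody)
```

If the list has more than one entry, the table has several tbody sections, and the number of rows in the resulting DataFrame can be compared against the sum of the list.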

Recommended answer

The HTML you posted is not valid. The multiple tbody elements are what confuse the pandas parser logic. If you cannot fix the input HTML itself, you have to pre-parse it and "unwrap" all the tbody elements:

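from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

html_table = '''
<table>
  <thead>
    <tr><th>Col1</th><th>Col2</th>
  </thead>
  <tbody>
    <tr><td>1a</td><td>2a</td></tr>
  </tbody>
  <tbody>
    <tr><td>1b</td><td>2b</td></tr>
  </tbody>
</table>'''

# Fix the HTML: remove every <tbody> wrapper but keep the rows inside it
soup = BeautifulSoup(html_table, "html.parser")
for body in soup("tbody"):
    body.unwrap()

# Wrap the markup in StringIO: recent pandas versions deprecate passing
# a literal HTML string to read_html. flavor="bs4" also needs html5lib.
df = pd.read_html(StringIO(str(soup)), flavor="bs4")
print(df[0])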

Prints:

  Col1 Col2
0   1a   2a
1   1b   2b
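
If BeautifulSoup is not available, the same "unwrap" idea can be approximated with the standard library alone. The sketch below swaps in a regular expression for the parser-based approach above; the unwrap_tbodies function and the TBODY_TAG pattern are hypothetical names, and the pattern assumes tbody tags carry no attribute values containing ">".

```python
import re

# Matches an opening <tbody ...> or a closing </tbody> tag, case-insensitively.
TBODY_TAG = re.compile(r"</?tbody\b[^>]*>", re.IGNORECASE)


def unwrap_tbodies(html: str) -> str:
    """Hypothetical helper: drop tbody tags but keep the rows they enclose."""
    return TBODY_TAG.sub("", html)


sample = (
    "<table>"
    "<thead><tr><th>Col1</th><th>Col2</th></tr></thead>"
    "<tbody><tr><td>1a</td><td>2a</td></tr></tbody>"
    "<tbody><tr><td>1b</td><td>2b</td></tr></tbody>"
    "</table>"
)
flattened = unwrap_tbodies(sample)
print(flattened)
```

The flattened string can then be wrapped in io.StringIO and handed to pd.read_html in the same way as str(soup). A regular expression is fragile on arbitrary HTML, so the parser-based version is the safer choice when the input is not under your control.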
