Using pandas to read a downloaded HTML file

Views: 69 | Published: 2020/5/24 2:36:00 | Tags: python, html, import, pandas


Question

As the title says, I tried using read_html, but it gives me the following error:

In [17]:temp = pd.read_html('C:/age0.html',flavor='lxml')
  File "<string>", line unknown
XMLSyntaxError: htmlParseStartTag: misplaced <html> tag, line 65, column 6

What am I doing wrong?

The HTML file has some JavaScript at the top, followed by an HTML table. In R, I processed it by parsing the HTML with the XML package, which gave me a data frame. I want to do the same in Python. Should I run the file through something like BeautifulSoup before handing it to pandas?

Recommended answer

I think you are on the right track with an HTML parser like Beautiful Soup. pandas.read_html() is meant to extract HTML tables, and with flavor='lxml' it relies on a strict parser that can reject a page whose markup is malformed, which is what the misplaced <html> tag error points to.

You would want to do something like this:

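from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

# Parse the page leniently and keep only the first <table> element.
with open('C:/age0.html', 'r') as f:
    table = BeautifulSoup(f.read(), 'html.parser').find('table')

# Pass the table's markup as text; read_html returns a list of DataFrames.
df = pd.read_html(StringIO(str(table)))[0]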
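
To make the idea of isolating the table from the rest of the page concrete, here is a minimal sketch that uses only Python's standard html.parser module in place of Beautiful Soup. This is a swapped-in technique for illustration, not the answer's method. The class name FirstTableParser and the sample page are hypothetical (the real contents of age0.html are not shown in the question), and the sketch assumes the table is not nested inside another table.

```python
from html.parser import HTMLParser


class FirstTableParser(HTMLParser):
    """Collect the cell text of the first <table> in a page, row by row.

    Hypothetical helper for illustration; it does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []           # finished rows, each a list of cell strings
        self._in_table = False   # True while inside the first table
        self._done = False       # True once the first table has closed
        self._row = None         # cells of the row currently being read
        self._cell = None        # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._in_table:
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        # Text outside a table cell (including script contents) is ignored.
        if self._cell is not None:
            self._cell.append(data)


# Made-up sample page: a script block on top, then a table.
page = """<html><head><script>var ready = true;</script></head>
<body>
<table>
  <tr><th>age</th><th>count</th></tr>
  <tr><td>0</td><td>12</td></tr>
  <tr><td>1</td><td>15</td></tr>
</table>
</body></html>"""

parser = FirstTableParser()
parser.feed(page)
parser.close()

header, *records = parser.rows
print(header)
print(records)
```

The resulting rows could then be handed to pandas, for example with pd.DataFrame(records, columns=header). Every value comes out as a string, so numeric columns would still need converting.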
