pandas.read_html does not support a decimal comma


Question

I am reading an xlm file with pandas.read_html and it works almost perfectly. The problem is that the file uses commas as decimal separators instead of dots (the default in read_html).

I could easily replace the commas with dots in one file, but I have almost 200 files with that configuration. With pandas.read_csv you can define the decimal separator, but I don't know why pandas.read_html only lets you define the thousands separator.
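For comparison, here is a minimal sketch of the read_csv behaviour mentioned above. The sample data and the semicolon delimiter are made up for illustration; `decimal` and `thousands` are real read_csv parameters.

```python
from io import StringIO

import pandas as pd

# Made-up sample: ";" as field delimiter, "." for thousands, "," for decimals.
csv_text = "item;price\na;1.234,56\nb;0,75\n"

# read_csv lets you declare both separators explicitly.
df_csv = pd.read_csv(StringIO(csv_text), sep=";", decimal=",", thousands=".")
print(df_csv.dtypes)
```

With both separators declared, the price column should be parsed as floats rather than left as strings.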

Any guidance on this? Is there another way to automate the comma/dot replacement before the file is opened by pandas? Thanks in advance!

Recommended answer

Thanks @zhqiat. I think upgrading pandas to version 0.19 would solve the problem. Unfortunately, I couldn't find an easy way to do that. I found a tutorial for upgrading pandas, but it is for Ubuntu (I'm a WinXP user).
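For readers who can upgrade: read_html accepts a `decimal` argument from pandas 0.19.0 onward. The following is a minimal sketch, assuming pandas 0.19 or later and an HTML parser such as lxml installed; the sample table is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up sample table using "." for thousands and "," for decimals.
html = """
<table>
  <tr><th>item</th><th>price</th></tr>
  <tr><td>a</td><td>1.234,56</td></tr>
  <tr><td>b</td><td>0,75</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found.
tables = pd.read_html(StringIO(html), decimal=",", thousands=".")
df_html = tables[0]
print(df_html)
```

Setting `thousands="."` alongside `decimal=","` matters here, because the default thousands separator in read_html is the comma.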

In the end I went with a workaround, using the method posted here: basically converting each column, one by one, to a numeric pandas.Series.

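# Per column (a pandas.Series of strings): drop the "." thousands separators, then turn "," into "."; regex=False needs pandas 0.23+.
result[col] = result[col].str.replace(".", "", regex=False).str.replace(",", ".", regex=False)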

I know this solution isn't the best, but it works. Thanks.
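The column-by-column workaround described in this answer could be sketched as follows. This is an illustration, not the answerer's exact code: the helper name `to_decimal_point`, the sample DataFrame, and the column list are all hypothetical, and `regex=False` assumes pandas 0.23 or later.

```python
import pandas as pd


def to_decimal_point(column: pd.Series) -> pd.Series:
    """Hypothetical helper: convert strings like '1.234,56' to numbers."""
    cleaned = (
        column.astype(str)
        .str.replace(".", "", regex=False)   # drop thousands separators
        .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
    )
    # Values that still cannot be parsed become NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


# Made-up data standing in for a table returned by read_html.
result = pd.DataFrame(
    {"name": ["a", "b"], "price": ["1.234,56", "0,75"], "qty": ["1.000", "12"]}
)

# Only convert the columns known to hold numbers (an assumption here).
for col in ["price", "qty"]:
    result[col] = to_decimal_point(result[col])

print(result.dtypes)
```

Restricting the loop to known numeric columns avoids turning text columns such as `name` into NaN. To cover many files, the same loop could be applied to each DataFrame after reading it.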
