pandas.read_html does not support decimal commas
Question
I am reading an xlm file using pandas.read_html and it works almost perfectly. The problem is that the file uses commas as decimal separators instead of dots (the default in read_html).
I could easily replace the commas with dots in one file, but I have almost 200 files with that configuration. With pandas.read_csv you can define the decimal separator, but I don't know why pandas.read_html only lets you define the thousands separator.
Is there any guidance on this? Is there another way to automate the comma/dot replacement before the file is opened by pandas? Thanks in advance!
Recommended answer
Thanks, @zhqiat. I think upgrading pandas to version 0.19 will solve the problem. Unfortunately, I couldn't find an easy way to do that: the only tutorial I found for upgrading pandas was for Ubuntu, and I am a Windows XP user.
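For readers who can upgrade, here is a minimal sketch of what the upgraded call might look like. It assumes pandas 0.19 or later, where read_html accepts a decimal argument alongside thousands, and it assumes an HTML parser backend such as lxml is installed. The helper name read_tables is hypothetical and only used for illustration.

```python
import pandas as pd


def read_tables(source):
    """Parse every <table> found in `source` into a list of DataFrames.

    Assumes the numbers are written like 1.234,56, i.e. "." groups
    thousands and "," marks the decimal part.
    """
    return pd.read_html(source, decimal=",", thousands=".")
```

With around 200 files, the same helper could be called in a loop over the file paths, so no file has to be edited by hand.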
I finally chose the workaround, using the method posted here: basically converting all columns, one by one, to a numeric type of pandas.Series
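# Remove the "." thousands separators, then turn the decimal "," into "." (regex=False matches "." literally).
result[col] = result[col].str.replace(".", "", regex=False).str.replace(",", ".", regex=False)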
I know that this solution isn't the best, but it works. Thanks.
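The column-by-column workaround can be sketched a little more fully as follows. This is only an illustration under stated assumptions: the function name to_numeric_columns and the sample data are made up, the values are assumed to use "." for thousands and "," for decimals, and the regex=False argument requires a reasonably recent pandas (0.23 or later).

```python
import pandas as pd


def to_numeric_columns(frame, columns):
    """Return a copy of `frame` with the given text columns parsed as numbers."""
    out = frame.copy()
    for col in columns:
        cleaned = (
            out[col].astype(str)
            .str.replace(".", "", regex=False)   # drop thousands separators
            .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
        )
        # Values that still cannot be parsed become NaN instead of raising.
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Hypothetical sample data standing in for one table returned by read_html.
result = pd.DataFrame({"name": ["a", "b"], "price": ["1.234,56", "7,5"]})
result = to_numeric_columns(result, ["price"])
```

Passing the list of columns explicitly keeps genuine text columns (such as name above) untouched, since stripping dots and commas from them would corrupt their contents.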