Pandas - How to ignore percentages in read_excel and read_csv

Question

I have an app that allows a user to update an Excel (.xlsx) or CSV (.csv) file. I use pandas.read_excel and pandas.read_csv to read the files. This works great for numeric values. However, when a column has 80%, it is parsed as 0.8. Is there a way of ignoring percentages when reading the CSV or Excel files, so that a cell with 80% is parsed as 80 in the dataframe?

I have thought of checking whether all the values in the dataframe are less than 1, but that would introduce a bug: if the user enters zeros in the Excel file (which is possible), they would be interpreted as percentages and multiplied by 100.

Recommended answer

Excel stores percentages as decimals. The % representation is just a "view" of the data, not a property of the underlying float value. If you have no knowledge of your columns beforehand, you can define some investigative logic:

First read your file as normal (Excel or CSV):

import pandas as pd

df = pd.read_excel('file.xlsx')  # or pd.read_csv('file.csv')

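Then identify the columns that were read as float:

# np.float was removed in NumPy 1.24; the string 'float' selects the float columns
float_cols = df.select_dtypes(include=['float']).columns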

Now filter for those columns where all values are between 0 and 1.0. This isn't watertight, since a float column that holds only 0.0 and 1.0 (effectively a Boolean series) will also match. So we can add an extra condition that the column must have more than two distinct values.

pct_cols = [x for x in float_cols if df[x].between(0, 1).all() and len(df[x].unique()) > 2]

Finally, convert decimals in range [0, 1] to percentages in range [0, 100]:

df[pct_cols] = df[pct_cols] * 100
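
The steps above can be collected into a small reusable helper. This is only a sketch of the same heuristic, not a pandas feature: the function name `scale_probable_percentages` and the `min_distinct` parameter are made up for illustration, and ignoring missing values before the range check is an added assumption (with `between`, a NaN would otherwise disqualify the whole column).

```python
import pandas as pd


def scale_probable_percentages(df, min_distinct=3):
    """Return (copy of df, list of scaled columns).

    Hypothetical helper: a float column is treated as a probable percentage
    when every non-missing value lies in [0, 1] and the column has at least
    `min_distinct` distinct non-missing values.
    """
    out = df.copy()
    pct_cols = []
    for col in out.select_dtypes(include="float").columns:
        values = out[col].dropna()  # assumption: missing cells are ignored
        if values.empty:
            continue
        if values.between(0, 1).all() and values.nunique() >= min_distinct:
            out[col] = out[col] * 100
            pct_cols.append(col)
    return out, pct_cols


example = pd.DataFrame({
    "A": [0.1, 0.5, None, 1.0],     # fractions with a missing cell
    "B": [True, False, True, True],  # real booleans, never selected
    "C": [1.0, 0.0, 1.0, 1.0],       # 0/1 flags stored as floats
    "D": [1.5, 0.2, 0.3, 0.4],       # has a value above 1
})
scaled, detected = scale_probable_percentages(example)
print(detected)
print(scaled)
```

Returning the list of detected columns makes it possible to show the user which columns were rescaled, which matters because the check is a guess rather than a reading of the Excel cell format.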


Here's a complete working example:

import pandas as pd

df = pd.DataFrame({'A': [0.1341234, 0.563465, 1.00, 0.00, 0.456546],
                   'B': [True, False, True, True, True],
                   'C': [1.0, 0.0, 1.0, 1.0, 0.0]})

# Column B is bool, so only A and C are selected here
float_cols = df.select_dtypes(include=['float']).columns
# C holds only two distinct values, so it is not treated as a percentage
pct_cols = [x for x in float_cols if df[x].between(0, 1).all() and len(df[x].unique()) > 2]
df[pct_cols] = df[pct_cols] * 100

print(df)

           A      B    C
0   13.41234   True  1.0
1   56.34650  False  0.0
2  100.00000   True  1.0
3    0.00000   True  1.0
4   45.65460   True  0.0
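
The logic above targets Excel, where the percent sign is only cell formatting. A CSV file has no cell formatting, so a percentage there typically appears as the literal text 80%, which read_csv leaves as a string column rather than converting to 0.8. In that case the sign can be stripped while reading. The sketch below assumes a column named `score` and uses a hypothetical converter function `percent_to_number`; both names are illustrative.

```python
import io

import pandas as pd


def percent_to_number(text):
    """Turn '80%' into 80.0; plain numbers pass through as floats."""
    text = text.strip()
    if text == "":
        return float("nan")  # keep empty cells as missing values
    return float(text.rstrip("%"))


# In-memory stand-in for a user's CSV file (made-up data)
csv_text = "name,score\nalice,80%\nbob,12.5%\ncarol,0\n"

# `converters` applies the function to each raw string in the named column
df_csv = pd.read_csv(io.StringIO(csv_text), converters={"score": percent_to_number})
print(df_csv)
```

Because the converter only removes a trailing percent sign, a zero entered by the user stays zero and nothing is multiplied by 100.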
