Pandas - How to ignore percentages in read_excel and read_csv
Question
I have an app that allows a user to update an Excel (.xlsx) or CSV (.csv) file. I use pandas.read_excel and pandas.read_csv to read the files. This works great for numeric values. However, when a column contains 80%, it is parsed as 0.8. Is there a way of ignoring percentages when reading the CSV or Excel files, so that a cell with 80% is parsed as 80 in the dataframe?
I have thought of checking whether all the values in the dataframe are less than 1, but that would introduce a bug: if the user enters zeros in the Excel file (which is possible), they would be interpreted as percentages and multiplied by 100.
Recommended answer
Excel stores percentages as decimals. The % representation is just a "view" of the data, not a property of the underlying float value. If you have no knowledge of your columns beforehand, you can define some investigative logic:
First read your file as normal (Excel or CSV):
df = pd.read_excel('file.xlsx') # or pd.read_csv('file.csv')
Then identify the columns that were read as float:
# np.float was removed in NumPy 1.24; the 'float' alias selects float columns
float_cols = df.select_dtypes(include=['float']).columns
Now filter for those columns where all values are between 0 and 1.0. This isn't watertight, since Boolean-like series holding only 0.0 and 1.0 will also be included. So we can add an extra condition requiring at least n distinct values (more than two in the code below).
pct_cols = [x for x in float_cols if df[x].between(0, 1).all() and len(df[x].unique()) > 2]
Finally, convert decimals in the range [0, 1] to percentages in the range [0, 100]:
df[pct_cols] = df[pct_cols] * 100
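The steps above can be sketched as one reusable function. This is only a sketch under stated assumptions: the function name scale_percentage_columns, the parameter min_distinct (standing in for the "n distinct values" threshold) and the sample column names are hypothetical and not part of pandas. The rule remains a heuristic, so a genuine float column whose values all happen to lie in [0, 1] would still be scaled.

```python
import pandas as pd


def scale_percentage_columns(df: pd.DataFrame, min_distinct: int = 3) -> pd.DataFrame:
    """Return a copy of df with likely-percentage float columns multiplied by 100.

    A column is treated as a likely percentage when it is float-typed, every
    value lies in [0, 1], and it has at least `min_distinct` distinct values.
    """
    out = df.copy()
    float_cols = out.select_dtypes(include=["float"]).columns
    pct_cols = [
        col
        for col in float_cols
        if out[col].between(0, 1).all() and out[col].nunique() >= min_distinct
    ]
    if pct_cols:  # nothing to scale when no column matches the heuristic
        out[pct_cols] = out[pct_cols] * 100
    return out


# Hypothetical sample: "rate" looks like percentages, "flag" is a 0/1 float
# column, and "count" is an integer column.
sample = pd.DataFrame(
    {"rate": [0.8, 0.25, 0.5], "flag": [1.0, 0.0, 1.0], "count": [3, 7, 9]}
)
result = scale_percentage_columns(sample)
print(result)
```

Only rate is scaled here; flag is skipped because it has two distinct values, and count is skipped because it is not a float column. Working on a copy leaves the caller's dataframe untouched.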
Here's a complete working example:
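import pandas as pd

df = pd.DataFrame({'A': [0.1341234, 0.563465, 1.00, 0.00, 0.456546],
                   'B': [True, False, True, True, True],
                   'C': [1.0, 0.0, 1.0, 1.0, 0.0]})
# np.float was removed in NumPy 1.24; the 'float' alias selects float columns
float_cols = df.select_dtypes(include=['float']).columns
# keep float columns whose values all lie in [0, 1] and that have more than two distinct values
pct_cols = [x for x in float_cols if df[x].between(0, 1).all() and len(df[x].unique()) > 2]
df[pct_cols] = df[pct_cols] * 100
print(df)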
           A      B    C
0   13.41234   True  1.0
1   56.34650  False  0.0
2  100.00000   True  1.0
3    0.00000   True  1.0
4   45.65460   True  0.0
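CSV input can behave differently from Excel. A CSV file has no cell formatting, so if it contains the literal text 80%, pandas reads that value as a string rather than as 0.8. A minimal sketch of one way to handle that case, assuming a hypothetical column named rate and an in-memory file, uses the converters parameter of pandas.read_csv. The helper strip_percent is illustrative only.

```python
import io

import pandas as pd

# Hypothetical CSV content; the column names are illustrative only.
csv_text = "name,rate\na,80%\nb,12.5%\nc,0%\n"


def strip_percent(value: str) -> float:
    """Turn text such as '80%' into the number 80.0."""
    return float(value.strip().rstrip("%"))


# converters applies strip_percent to each raw string in the "rate" column
df_csv = pd.read_csv(io.StringIO(csv_text), converters={"rate": strip_percent})
print(df_csv["rate"].tolist())
```

This requires knowing in advance which columns hold percentage text, so it complements the heuristic rather than replacing it.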