Pandas - How to ignore percentages in read_excel and read_csv

Views: 246  Published: 2020/5/24 3:06:57  Tags: python python-3.x pandas


Problem description

I have an app that allows a user to upload an Excel (.xlsx) or CSV (.csv) file. I use pandas.read_excel and pandas.read_csv to read the files. This works well for numeric values. However, when a column contains 80%, it is parsed as 0.8. Is there a way to ignore percentages when reading the CSV or Excel files, so that a cell with 80% is parsed as 80 in the dataframe?

I have thought of checking whether all the values in the dataframe are less than 1, but that would introduce a bug: if the user enters zeros in the Excel file (which is possible), they would be interpreted as percentages and multiplied by 100.

Recommended answer

Excel stores percentages as decimals. The % representation is just a "view" of the data, not a property of the underlying float value. If you don't know in advance which columns hold percentages, you can apply some investigative logic:

First read your file as normal (Excel or CSV):

import pandas as pd
df = pd.read_excel('file.xlsx')  # or pd.read_csv('file.csv')

Then identify the columns that were read as float:

# np.float was removed in NumPy 1.24; the string 'float' selects float columns.
float_cols = df.select_dtypes(include=['float']).columns

Now filter for the columns whose values all lie between 0 and 1.0. This isn't watertight, since a Boolean-like series (0.0/1.0 flags stored as floats) will also be included. So we add an extra condition: the column must have more than two distinct values.

pct_cols = [x for x in float_cols if df[x].between(0, 1).all() and len(df[x].unique()) > 2]

Finally, convert decimals in the range [0, 1] to percentages in the range [0, 100]:

df[pct_cols] = df[pct_cols] * 100
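The steps above can be collected into a small helper. This is only a sketch of the same heuristic, not part of the original answer: the function name scale_fraction_columns and the min_distinct parameter are hypothetical, and the example column names are made up. It also drops missing values before the range check, because Series.between treats NaN as False and a single empty cell would otherwise disqualify a column.

```python
import pandas as pd


def scale_fraction_columns(df, min_distinct=3):
    """Return (copy of df, detected columns) with likely-percentage columns times 100.

    Heuristic only: a float column qualifies when all of its non-missing
    values lie in [0, 1] and it has at least `min_distinct` distinct values.
    The name and the parameter are illustrative assumptions.
    """
    out = df.copy()
    detected = []
    for col in out.select_dtypes(include=["float"]).columns:
        values = out[col].dropna()  # between() treats NaN as False, so drop NaN first
        if values.empty:
            continue
        if values.between(0, 1).all() and values.nunique() >= min_distinct:
            out[col] = out[col] * 100
            detected.append(col)
    return out, detected


# Hypothetical data: "rate" holds fractions (with one missing value),
# "flag" holds 0/1 flags stored as floats, "price" holds ordinary numbers.
df = pd.DataFrame({
    "rate": [0.8, 0.25, None, 0.5],
    "flag": [1.0, 0.0, 1.0, 1.0],
    "price": [10.5, 3.0, 0.7, 2.0],
})
scaled, detected = scale_fraction_columns(df)
print(detected)
print(scaled)
```

Returning the list of detected columns lets the app show the user which columns were rescaled, which may matter because the check is a guess rather than a guarantee.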

Here's a complete working example:

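import pandas as pd

# Column A holds fractions, B is boolean, C holds 0/1 flags stored as floats.
df = pd.DataFrame({'A': [0.1341234, 0.563465, 1.00, 0.00, 0.456546],
                   'B': [True, False, True, True, True],
                   'C': [1.0, 0.0, 1.0, 1.0, 0.0]})

# np.float was removed in NumPy 1.24; the string 'float' selects float columns.
float_cols = df.select_dtypes(include=['float']).columns
# Keep columns whose values all lie in [0, 1] and that have more than two distinct values.
pct_cols = [x for x in float_cols if df[x].between(0, 1).all() and len(df[x].unique()) > 2]
df[pct_cols] = df[pct_cols] * 100

print(df)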

           A      B    C
0   13.41234   True  1.0
1   56.34650  False  0.0
2  100.00000   True  1.0
3    0.00000   True  1.0
4   45.65460   True  0.0
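For CSV input the situation is usually different. A CSV file has no cell formatting, so a percentage exported from a spreadsheet typically appears as the literal text 80%, and read_csv would then read that column as strings. In that case the percent sign can be stripped while reading. The sketch below assumes a hypothetical column named score and a made-up helper percent_to_number; it uses the converters parameter of read_csv, which passes each cell of the named column to the function as a string.

```python
import io

import pandas as pd

# Hypothetical CSV content; the column name "score" is an assumption.
csv_text = "name,score\nalice,80%\nbob,12.5%\ncarol,\n"


def percent_to_number(text):
    """Parse '80%' or '80' as 80.0; an empty cell becomes NaN."""
    cleaned = text.strip().rstrip("%")
    if cleaned == "":
        return float("nan")
    return float(cleaned)


df = pd.read_csv(io.StringIO(csv_text), converters={"score": percent_to_number})
print(df)
```

Because the converter only removes the sign and does not divide by 100, a cell written as 80% ends up as 80.0, which is the behavior the question asks for.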
