Series of if statements applied to a data frame


Question

I have a question about how to do this task. I want to group (or map to a single value) the numbers in the column 'PD' of my data frame, which range from .001 to 1. Specifically, values with .91>'PD'>.9 should go to .91 (i.e. return a value of .91), .92>'PD'>=.91 to .92, ..., and 1>='PD'>=.99 to 1, stored in a column named 'Grouping'. So far I have been writing each if statement by hand and then merging the result back into the base data frame. Can anyone help me with a more efficient way of doing this? I'm still in the early stages of using Python, so sorry if the question seems easy. Thank you for your answers and your time.

Answer

Suppose your data looks like this:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'PD': np.arange(0.001, 1, 0.001), 'data': np.random.randint(10, size=999)})
>>> df.head()
      PD  data
0  0.001     6
1  0.002     3
2  0.003     5
3  0.004     9
4  0.005     7

Then cut off the last decimal of the PD column. This is a bit tricky, because truncating with plain float arithmetic runs into rounding issues; going through a string conversion avoids them. E.g.

>>> # format to 3 decimals as text, then drop the last digit (truncates, does not round)
>>> df['PD'] = df['PD'].apply(lambda x: float('{:.3f}'.format(x)[:-1]))
>>> df.tail()
       PD  data
994  0.99     1
995  0.99     3
996  0.99     2
997  0.99     1
998  0.99     0
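To show why plain float arithmetic is risky here, and one string-free alternative, here is a small sketch. It assumes the PD values carry at most three decimal places, as in the sample above. Scaling to integer thousandths and dropping the last digit with integer division is a swapped-in technique, not the one used in this answer; the series `pd_values` is hypothetical sample data.

```python
import math

import numpy as np
import pandas as pd

# Naive float truncation: 0.29 * 100 evaluates to 28.999999999999996,
# so flooring it lands one bin too low.
naive = math.floor(0.29 * 100) / 100
print(naive)  # 0.28

# Assumption: PD values have at most 3 decimal places.
# Round to integer thousandths first, then drop the last digit with
# integer division; this sidesteps the float artefact shown above.
pd_values = pd.Series([0.001, 0.29, 0.905, 0.999, 1.0])  # hypothetical sample
truncated = (np.round(pd_values * 1000).astype(int) // 10) / 100
print(truncated.tolist())  # [0.0, 0.29, 0.9, 0.99, 1.0]
```

Because the division happens on integers, the result matches the string-based truncation for inputs with up to three decimals, and it stays vectorized instead of calling a lambda per row.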

Now you can use pandas groupby and do whatever you want with the data, e.g.

>>> df.groupby('PD').agg(lambda x: ','.join(map(str, x)))
                     data
PD                       
0.00    6,3,5,9,7,3,6,8,4
0.01  3,5,7,0,4,9,7,1,7,1
0.02  0,0,9,1,5,4,1,6,7,3
0.03  4,4,6,4,6,5,4,4,2,1
0.04  8,3,1,4,6,5,0,6,0,5
[...]

Note that the first row has one item fewer because my sample does not include 0.000.
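The approach above labels each bin by its lower edge (0.99 for .99 to .999) and overwrites PD, whereas the question asks for the upper edge (.91 for .90 to .91) in a separate 'Grouping' column. A minimal sketch of that variant follows. It assumes PD has at most three decimal places and that every bin is closed on the left, [a, a + .01), with PD == 1 folded into the last bin; the sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; assumes PD has at most 3 decimal places.
df = pd.DataFrame({'PD': [0.001, 0.9, 0.905, 0.91, 0.995, 1.0],
                   'data': [6, 3, 5, 9, 7, 1]})

# Work in integer thousandths to avoid float artefacts.
thousandths = np.round(df['PD'] * 1000).astype(int)

# .90 <= PD < .91 -> .91, ..., .99 <= PD <= 1 -> 1.0:
# take the upper edge of each 0.01-wide bin and clip PD == 1.0
# (which would otherwise map to 1.01) back into the last bin.
df['Grouping'] = (thousandths // 10 + 1).clip(upper=100) / 100

print(df['Grouping'].tolist())  # [0.01, 0.91, 0.91, 0.92, 1.0, 1.0]

# Aggregate per bin while keeping the original PD column intact.
sums = df.groupby('Grouping')['data'].sum()
print(sums)
```

This replaces the chain of if statements and the merge with a single column assignment, after which any groupby aggregation works on 'Grouping'.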