Python: Iterate over a data frame column, check for a condition-value stored in array, and get the values to a list


Problem description


After some help in the forum I managed to do what I was looking for, and now I need to take it to the next level. (The long explanation is here: "Python Data Frame: cumulative sum of column until condition is reached and return the index".)

I have a data frame:

In [3]: df
Out[3]: 
   index  Num_Albums  Num_authors
0      0          10            4
1      1           1            5
2      2           4            4
3      3           7         1000
4      4           1           44
5      5           3            8

I add a column holding the cumulative sum of another column.

In [4]: df['cumsum'] = df['Num_Albums'].cumsum()

In [5]: df
Out[5]: 
   index  Num_Albums  Num_authors  cumsum
0      0          10            4      10
1      1           1            5      11
2      2           4            4      15
3      3           7         1000      22
4      4           1           44      23
5      5           3            8      26

Then I apply a condition to the cumsum column and extract the rows where the condition is met within a given tolerance:

In [18]: tol = 2

In [19]: cond = df.where((df['cumsum']>=15-tol)&(df['cumsum']<=15+tol)).dropna()

In [20]: cond
Out[20]: 
   index  Num_Albums  Num_authors  cumsum
2    2.0         4.0          4.0    15.0
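
A side note on the floats in Out[20]: .where() keeps the shape of the frame and fills non-matching rows with NaN, which upcasts the integer columns to float before dropna() removes those rows. The sketch below rebuilds the data frame from the listing above and compares that with plain boolean indexing; the names mask, cond_where and cond_mask are only illustrative.

```python
import pandas as pd

# Data frame rebuilt from the listing above.
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "Num_Albums": [10, 1, 4, 7, 1, 3],
    "Num_authors": [4, 5, 4, 1000, 44, 8],
})
df["cumsum"] = df["Num_Albums"].cumsum()

tol = 2
mask = (df["cumsum"] >= 15 - tol) & (df["cumsum"] <= 15 + tol)

# .where() fills non-matching rows with NaN, so the integer columns
# become float; dropna() then removes the NaN rows.
cond_where = df.where(mask).dropna()

# Boolean indexing selects the matching rows directly and keeps int dtypes.
cond_mask = df[mask]

print(cond_where)
print(cond_mask)
```

Either form selects the same row; the difference is only in the resulting dtypes.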

Now, what I want to do is replace the single condition 15 in the example with conditions stored in an array. For each condition, I want to check where it is met and retrieve not the entire row, but only the value of the column Num_Albums. Finally, all these retrieved values (one per condition) should be stored in an array or list. Coming from matlab, I would do something like this (I apologize for the mixed matlab/python syntax):

conditions = np.array([10, 15, 23])
for i=0:len(conditions)
   retrieved_values(i) = df.where((df['cumsum']>=conditions(i)-tol)&(df['cumsum']<=conditions(i)+tol)).dropna()

So for the data frame above I would get (for tol=0):

retrieved_values = [10, 4, 1]

I would like a solution that lets me keep the .where function if possible.

Solution

The output won't always be a single number, right? If each condition matches exactly one value, you can write this code:

tol = 0
#condition
c = [5,15,25]
value = []

for i in c:
    if len(df.where((df['a'] >= i - tol) & (df['a'] <= i + tol)).dropna()['a']) > 0:
        value = value + [df.where((df['a'] >= i - tol) & (df['a'] <= i + tol)).dropna()['a'].values[0]]
    else:
        value = value + [[]]
print(value)

The output should look something like:

[1,2,3]
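
The snippet above uses a column named a and its own condition values. As a minimal sketch, assuming the data frame from the question, the same loop can test cumsum and collect Num_Albums instead. The names retrieved_values, mask and matches are illustrative, and using None for a condition with no match is an assumption (the answer stores an empty list in that case).

```python
import pandas as pd

# Only the column needed from the question's data frame.
df = pd.DataFrame({"Num_Albums": [10, 1, 4, 7, 1, 3]})
df["cumsum"] = df["Num_Albums"].cumsum()

tol = 0
conditions = [10, 15, 23]
retrieved_values = []

for c in conditions:
    mask = (df["cumsum"] >= c - tol) & (df["cumsum"] <= c + tol)
    # .where() fills non-matching entries with NaN; dropna() removes them.
    matches = df["Num_Albums"].where(mask).dropna()
    if len(matches) > 0:
        # Take the first match and cast back to int (where() upcasts to float).
        retrieved_values.append(int(matches.iloc[0]))
    else:
        # Assumption: None marks a condition that no row satisfies.
        retrieved_values.append(None)

print(retrieved_values)  # [10, 4, 1]
```

Computing the mask once per condition also avoids evaluating the same .where() expression twice.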

If a condition can match several values and you want output like this:

[[1.0, 5.0], [12.0, 15.0], [25.0]]

you can use this code:

tol = 5
c = [5,15,25]
value = []

for i in c:
    getdatas = df.where((df['a'] >= i - tol) & (df['a'] <= i + tol)).dropna()['a'].values
    value.append([x for x in getdatas])
print(value)
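
The answer never shows its data frame, so the following is only a sketch under assumed data: the values in column a are hypothetical, chosen so that the loop reproduces the nested list shown above. The helper name values_near is also an assumption, not part of pandas.

```python
import pandas as pd

# Hypothetical data (not from the original answer).
df = pd.DataFrame({"a": [1, 5, 12, 15, 25]})


def values_near(frame, column, targets, tol):
    """Return, for each target, the values of `column` within +/- tol of it."""
    result = []
    for t in targets:
        mask = (frame[column] >= t - tol) & (frame[column] <= t + tol)
        # Same .where().dropna() pattern as in the answer.
        matched = frame.where(mask).dropna()[column]
        result.append(matched.tolist())
    return result


value = values_near(df, "a", [5, 15, 25], tol=5)
print(value)  # [[1.0, 5.0], [12.0, 15.0], [25.0]]
```

The values come back as floats because .where() introduces NaN before dropna() removes it; a condition with no match simply yields an empty inner list.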
