Pandas: create a conditional column and return a value based on the values of 2 columns in another df.groupby

Question

This question is an extension of another question but with a different approach. I have the following 2 dfs:

(If someone can show me a more efficient way of creating the df below, instead of writing it out by hand, that would be great.)

import pandas as pd

yrs = pd.DataFrame({'years': [1950, 1951, 1952, 1953, 1954, 1955,
    1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967,
    1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979,
    1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991,
    1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
    2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]},
    index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
    19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
    37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
    55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65])

yrs

    years
1   1950
2   1951
3   1952
4   1953
5   1954
........
58  2007
59  2008
60  2009
61  2010
62  2011
63  2012
64  2013
65  2014

dfyears.head(30).to_dict()
{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007, 7: 2013,
  8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989, 14: 1991, 15: 1993,
  16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003, 21: 2005, 22: 2007, 23: 2013,
  24: 2014, 25: 1985, 26: 1987, 27: 1989, 28: 1991, 29: 1993},
 'idthomas': {0: 136, 1: 136, 2: 136, 3: 136, 4: 136, 5: 136, 6: 136, 7: 136,
  8: 136, 9: 172, 10: 172, 11: 172, 12: 172, 13: 174, 14: 174, 15: 174, 16: 174,
  17: 174, 18: 174, 19: 174, 20: 174, 21: 174, 22: 174, 23: 174, 24: 174,
  25: 179, 26: 179, 27: 179, 28: 179, 29: 179},
 'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999, 4: 2001, 5: 2003, 6: 2005, 7: 2007,
  8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991,
  16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007,
  24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991}}

dfyears.head(30)
    end     start   idthomas
0   1995    1993    136
1   1997    1995    136
2   1999    1997    136
3   2001    1999    136
4   2003    2001    136
5   2005    2003    136
6   2007    2005    136
7   2013    2007    136
8   2014    2013    136
9   1995    1993    172
10  2007    2001    172
11  2013    2007    172
12  2014    2013    172

I want to create a column 'served' in yrs that returns 1 or 0 depending on whether the corresponding value in the 'years' column is >= start and <= end (i.e. whether the year falls within one of that idthomas's start/end ranges). At the same time, it should create a column 'idthomas' holding the idthomas value from the row the condition was checked against. Below is an example of what I want:

  years  served idthomas
1   1950    0   136
2   1951    0   136
3   1952    0   136
4   1953    0   136
5   1954    0   136
...................
43  1993    1   136
44  1994    1   136
45  1995    1   136
46  1996    1   136
47  1997    1   136
48  1998    1   136
49  1999    1   136
51  2000    1   136
52  2001    1   136
53  2002    1   136
54  2003    1   136
55  2004    1   136
56  2005    1   136
57  2006    1   136
58  2007    1   136
59  2008    1   136
60  2009    1   136
61  2010    1   136
62  2011    1   136
63  2012    1   136
64  2013    1   136
65  2014    1   136
66  1950    0   172
67  1951    0   172
68  1952    0   172
69  1953    0   172
70  1954    0   172
...................
72  1993    1   172
73  1994    1   172
74  1995    1   172
75  1996    0   172
76  1997    0   172
77  1998    0   172
78  1999    0   172
79  2000    0   172
80  2001    1   172
81  2002    1   172
82  2003    1   172
83  2004    1   172
84  2005    1   172
85  2006    1   172
86  2007    1   172
87  2008    1   172
88  2009    1   172
89  2010    1   172
90  2011    1   172
91  2012    1   172
92  2013    1   172
93  2014    1   172
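For a single start/end pair, the condition can be checked with Series.between, which is inclusive on both ends by default. Here is a minimal sketch using a handful of illustrative years and the 1993 to 1995 range of idthomas 172. It also shows why combining the two comparisons with "or" marks every year as served.

```python
import pandas as pd

# A handful of illustrative years (not the full yrs frame).
yrs = pd.DataFrame({'years': [1990, 1992, 1993, 1994, 1995, 1996]})

# One range from dfyears: idthomas 172 served from 1993 to 1995.
start, end = 1993, 1995

# Intended test: start <= years <= end. Series.between is inclusive on
# both ends by default, so it equals the AND of the two comparisons.
yrs['served'] = yrs['years'].between(start, end).astype(int)

# For contrast: OR-ing the comparisons is true for every year, because any
# year is either >= start or <= end whenever start <= end.
or_version = ((yrs['years'] >= start) | (yrs['years'] <= end)).astype(int)

print(yrs['served'].tolist())   # [0, 0, 1, 1, 1, 0]
print(or_version.tolist())      # [1, 1, 1, 1, 1, 1]
```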

I typed out 'something' to code this. It's embarrassingly crude:

uu=dfyears.groupby('idthomas')

yrs['did_service'] == 1 if:
# somewhere in the next line I think that I need to do some sort of
# tuple so that I can grab the value in the 'idthomas' column that 
# is associated with the comparison that I am doing.
    x in years >= uu.start | x in years <= uu.end 
    else == 0

If this does not work, I will do the work by hand. I only ask that if someone tries and isn't able to, they just let me know, so I can get an idea of how viable the idea is.
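The pseudocode's idea can be made runnable roughly as follows: group dfyears by idthomas, then flag each year that falls inside one of that group's start/end ranges. This is a sketch with a few assumptions. It uses a subset of dfyears (idthomas 136 and 172 only) copied from the question. It treats a year as served when any of that idthomas's ranges covers it, inclusive on both ends, as the expected output suggests. The names frames and result are only illustrative.

```python
import pandas as pd

# Years 1950-2014 with a 1-based index, as in the question's yrs.
yrs = pd.DataFrame({'years': range(1950, 2015)}, index=range(1, 66))

# Subset of dfyears from the question (idthomas 136 and 172 only).
dfyears = pd.DataFrame({
    'start': [1993, 1995, 1997, 1999, 2001, 2003, 2005, 2007, 2013,
              1993, 2001, 2007, 2013],
    'end':   [1995, 1997, 1999, 2001, 2003, 2005, 2007, 2013, 2014,
              1995, 2007, 2013, 2014],
    'idthomas': [136] * 9 + [172] * 4,
})

uu = dfyears.groupby('idthomas')

frames = []
for idthomas, group in uu:
    # served = 1 if start <= x <= end for ANY of this idthomas's rows;
    # the two comparisons are combined with &, not |.
    served = yrs['years'].apply(
        lambda x: int(((group['start'] <= x) & (x <= group['end'])).any()))
    # The 'idthomas' column records which group the condition came from.
    frames.append(pd.DataFrame({'years': yrs['years'],
                                'served': served,
                                'idthomas': idthomas}))

# One block of 65 years per idthomas, stacked like the expected output.
result = pd.concat(frames, ignore_index=True)
print(result.head())
```

The loop with apply is easy to read but slow for large tables. The vectorised alternatives are NumPy broadcasting or a cross join.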

Answer

I can help with the time series: you don't need to type the data by hand. Here's how you can do it:

import numpy as np
import pandas as pd

pd.DataFrame(np.array(pd.date_range(start='1900', end='1920', freq='A').strftime('%Y')), columns=['years'])

Or drop the .strftime() if you want the full dates, i.e. including months and days.
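If you only need integer years matching the question's yrs frame (1950 to 2014 with a 1-based index), a plain range may be enough. This is a minimal sketch. Note that .strftime('%Y') produces strings rather than integers, so those values would need converting before being compared with the integer start/end columns.

```python
import pandas as pd

# Integer years 1950-2014; index 1..65 so that index 1 -> 1950, 65 -> 2014.
yrs = pd.DataFrame({'years': range(1950, 2015)}, index=range(1, 66))
print(yrs.tail())
```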

For the logic you describe, I think np.where might work, something like this (not tested):

yrs['served'] = np.where((yrs['years'] >= dfyears['start']) & (yrs['years'] <= dfyears['end']), 1, 0)

However, that wouldn't really address the fact that you want to add new rows to yrs, according to your example at least.
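One caveat on the np.where one-liner: it compares yrs['years'] (one row per year) against dfyears['start'] and dfyears['end'] (one row per service range). Pandas compares Series element-wise by index label and raises a ValueError when the labels differ, so the rows cannot be matched up that way. Each comparison also needs its own parentheses, because & and | bind more tightly than >= and <=. Below is a minimal sketch that keeps np.where but uses NumPy broadcasting instead. It looks at a single idthomas (172) and hard-codes its ranges from dfyears for illustration.

```python
import numpy as np
import pandas as pd

yrs = pd.DataFrame({'years': range(1950, 2015)}, index=range(1, 66))

# Service ranges for idthomas 172, copied from dfyears in the question.
start = np.array([1993, 2001, 2007, 2013])
end = np.array([1995, 2007, 2013, 2014])

# Column vector of years, shape (65, 1); comparing it with the (4,) arrays
# broadcasts to a (65, 4) matrix: one row per year, one column per range.
years = yrs['years'].to_numpy()[:, None]
inside = (years >= start) & (years <= end)

# A year counts as served if it falls inside any of the ranges.
yrs['served'] = np.where(inside.any(axis=1), 1, 0)
print(yrs.loc[yrs['years'].between(1992, 2002)])
```

Covering every idthomas this way means repeating the computation per group. It still yields one served column per idthomas rather than the extra rows.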

I know this is not a complete answer, but I hope it helps to some extent.
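To also get the extra rows (one per idthomas per year, as in the expected output), one option is a cross join of yrs with dfyears. After the join, a groupby keeps the maximum 'served' value for each (idthomas, years) pair. This is a minimal sketch. It assumes pandas 1.2 or later for merge(how='cross') and uses a subset of dfyears (idthomas 136 and 172) copied from the question.

```python
import pandas as pd

yrs = pd.DataFrame({'years': range(1950, 2015)}, index=range(1, 66))

# Subset of dfyears from the question (idthomas 136 and 172 only).
dfyears = pd.DataFrame({
    'start': [1993, 1995, 1997, 1999, 2001, 2003, 2005, 2007, 2013,
              1993, 2001, 2007, 2013],
    'end':   [1995, 1997, 1999, 2001, 2003, 2005, 2007, 2013, 2014,
              1995, 2007, 2013, 2014],
    'idthomas': [136] * 9 + [172] * 4,
})

# Pair every year with every service range (pandas >= 1.2).
pairs = yrs.merge(dfyears, how='cross')

# 1 if the year lies inside this particular range, inclusive on both ends.
pairs['served'] = pairs['years'].between(pairs['start'], pairs['end']).astype(int)

# Collapse the ranges: a year is served if any range for that idthomas covers it.
result = (pairs.groupby(['idthomas', 'years'], as_index=False)['served'].max()
          [['years', 'served', 'idthomas']])
print(result.head())
```

Before the groupby, the cross join has len(yrs) × len(dfyears) rows. That is fine at this size but grows quickly for larger tables.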
