Pandas: create a conditional column and return a value based on the values of 2 columns in another df.groupby
Question
This question is an extension of another question but with a different approach. I have the following 2 dfs:
(If someone can show me a more efficient way of creating the df below, instead of writing it out by hand, that would be great.)
yrs = pd.DataFrame({'years': [1950, 1951, 1952, 1953, 1954, 1955, \
1956, 1957,1958,1959,1960,1961,1962,1963,1964,1965,1967,1968,1969,\
1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,\
1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,\
1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,\
2009,2010,2011,2012,2013,2014]}, index=[1,2,3,4,5,6,7,8,9,10,11,12,\
13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,\
35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,51,52,53,54,55,56,57,\
58,59,60,61,62,63,64,65])
yrs
years
1 1950
2 1951
3 1952
4 1953
5 1954
........
58 2007
59 2008
60 2009
61 2010
62 2011
63 2012
64 2013
65 2014
dfyears.head(30).to_dict()
{'end': {0: 1995,1: 1997,2: 1999,3: 2001,4: 2003,5: 2005,6: 2007,7: 2013,
8: 2014,9: 1995,10: 2007,11: 2013,12: 2014,13: 1989,14: 1991, 15: 1993,
16: 1995,17: 1997,18: 1999,19: 2001,20: 2003,21: 2005,22: 2007,23: 2013,
24: 2014,25: 1985,26: 1987,27: 1989,28: 1991,29: 1993},'idthomas': {0: 136,1: 136,2: 136,3: 136,4: 136,5: 136,6: 136,7: 136,8: 136,9: 172,10: 172,
11: 172,12: 172,13: 174,14: 174,15: 174,16: 174,17: 174,18: 174,19: 174,
20: 174, 21: 174,22: 174,23: 174,24: 174,25: 179,26: 179,27: 179,28: 179,
29: 179}, 'start': {0: 1993,1: 1995,2: 1997,3: 1999,4: 2001,5: 2003,6: 2005,7: 2007,8: 2013,9: 1993,10: 2001,11: 2007,12: 2013,13: 1987,14: 1989,
15: 1991,16: 1993,17: 1995,18: 1997, 19: 1999,20: 2001,21: 2003, 22: 2005,
23: 2007,24: 2013, 25: 1983,26: 1985,27: 1987,28: 1989,29: 1991}}
dfyears.head(30)
end start idthomas
0 1995 1993 136
1 1997 1995 136
2 1999 1997 136
3 2001 1999 136
4 2003 2001 136
5 2005 2003 136
6 2007 2005 136
7 2013 2007 136
8 2014 2013 136
9 1995 1993 172
10 2007 2001 172
11 2013 2007 172
12 2014 2013 172
I want to create a column, served, in yrs that returns 1 or 0 depending on whether the corresponding value in the years column is >= start or <= end, and that simultaneously creates a column, idthomas, holding the idthomas value from the row the condition was applied to. Below is an example of what I want:
years served idthomas
1 1950 0 136
2 1951 0 136
3 1952 0 136
4 1953 0 136
5 1954 0 136
...................
43 1993 1 136
44 1994 1 136
45 1995 1 136
46 1996 1 136
47 1997 1 136
48 1998 1 136
49 1999 1 136
51 2000 1 136
52 2001 1 136
53 2002 1 136
54 2003 1 136
55 2004 1 136
56 2005 1 136
57 2006 1 136
58 2007 1 136
59 2008 1 136
60 2009 1 136
61 2010 1 136
62 2011 1 136
63 2012 1 136
64 2013 1 136
65 2014 1 136
66 1950 0 172
67 1951 0 172
68 1952 0 172
69 1953 0 172
70 1954 0 172
...................
72 1993 1 172
73 1994 1 172
74 1995 1 172
75 1996 0 172
76 1997 0 172
77 1998 0 172
78 1999 0 172
79 2000 0 172
80 2001 1 172
81 2002 1 172
82 2003 1 172
83 2004 1 172
84 2005 1 172
85 2006 1 172
86 2007 1 172
87 2008 1 172
88 2009 1 172
89 2010 1 172
90 2011 1 172
91 2012 1 172
92 2013 1 172
93 2014 1 172
I typed out 'something' to code this. It's embarrassingly crude:
uu=dfyears.groupby('idthomas')
yrs['did_service'] == 1 if:
# somewhere in the next line I think that I need to do some sort of
# tuple so that I can grab the value in the 'idthomas' column that
# is associated with the comparison that I am doing.
x in years >= uu.start | x in years <= uu.end
else == 0
If this does not work, I will be doing the work by hand. I only ask that if someone tries and is not able to do it, they let me know, so I can get a sense of how viable the idea is.
Recommended answer
I can help with the time series. You don't need to type the data by hand; here's how you can do it:
pd.DataFrame(np.array(pd.date_range(start='1900', end='1920', freq='A').strftime('%Y')), columns=['years'])
Or drop the .strftime() if you want months and days too, in other words the full date.
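If only integer years are needed, a plain range may be enough. The sketch below is one possible alternative, not the approach from the answer above; it assumes the years are meant to run consecutively from 1950 to 2014 and that a 1-based index is wanted. Note that the hand-typed frame in the question has no 1966 and no index label 50, which this version does not reproduce.

```python
import pandas as pd

# Consecutive years 1950..2014 inclusive; range() excludes its stop value.
years = list(range(1950, 2015))

# A 1-based index, assumed here to mirror the hand-typed frame in the question.
yrs = pd.DataFrame({"years": years}, index=range(1, len(years) + 1))

print(yrs.head())
print(yrs.tail())
```

This keeps the years column as integers, so it can be compared directly with the integer start and end columns.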
For running the logic you are describing, I was thinking that np.where might work fine, something like this (not tested):
# Parenthesize each comparison: | binds more tightly than >= and <=.
yrs['served'] = np.where((yrs['years'] >= dfyears['start']) | (yrs['years'] <= dfyears['end']), 1, 0)
However, that wouldn't really address the fact that you want to add new rows to yrs, according to your example at least.
I know this is not a complete answer, but I hope it helps to some extent.
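For the extra rows, one possible direction is to pair every idthomas with every year and then test each pair against that id's intervals. The sketch below is only an illustration under assumptions: it uses a small subset of the dfyears rows and a shortened year range, the intermediate names (ids, grid, pairs, result) are made up here, and it treats a year as served when it lies between start and end inclusive, which is what the example output in the question appears to show (172 is 0 for 1996 to 2000). The cross merge needs pandas 1.2 or later.

```python
import pandas as pd

# A small subset of the dfyears rows shown in the question.
dfyears = pd.DataFrame({
    "start":    [1993, 1995, 1993, 2001],
    "end":      [1995, 1997, 1995, 2007],
    "idthomas": [136, 136, 172, 172],
})
# Shortened year range, just to keep the printed result small.
yrs = pd.DataFrame({"years": list(range(1990, 2010))})

# Pair every idthomas with every year (cross join).
ids = dfyears[["idthomas"]].drop_duplicates()
grid = ids.merge(yrs, how="cross")

# Attach each id's intervals to each of its years, then test membership.
# Each comparison is parenthesized because & binds tighter than >= and <=.
pairs = grid.merge(dfyears, on="idthomas")
pairs["served"] = (
    (pairs["years"] >= pairs["start"]) & (pairs["years"] <= pairs["end"])
).astype(int)

# A year counts as served if it falls inside any of that id's intervals.
result = pairs.groupby(["idthomas", "years"], as_index=False)["served"].max()
result = result[["years", "served", "idthomas"]]

print(result)
```

The result has one row per (idthomas, years) pair, sorted by id and then year, which matches the layout of the desired output. On the full data the intermediate pairs frame grows with the number of ids times years times intervals per id, so this may need rethinking for very large inputs.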