将列从一个数据帧添加到python中的数据帧 [英] Add column from one data frame to group-by data frame in python
问题描述
group.by
的每日降水量的总和。 一个数据框看起来像这样(在device_ids之间有更多行):
code>>>> df1
device_id雨日月份
0 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
1 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
6 e7z581f0-2693-42ad-9896-0048550ccda7 0.0 31 12 2016
11 e7z581f0-2693-42ad-9896-0048550ccda7 0.0 31 12 2016
12 ceez972b-135f-45b3-be4w-7c23102676bq 0.2 31 12 2016
13 ceez972b-135f-45b3-be4w-7c23102676bq 0.0 31 12 2016
18 ceez972b-135f-45b3-be4w-7c23102676bq 0.0 31 12 2016
19 1d28dz3a-c923-4967-a7bb-5881d232c9a7 0.0 31 12 2016
24 1d28dz3a-c923-4967-a7bb-5881d232c9a7 0.0 31 12 2016
25 a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 0.0 31 12 2016
29 a044ag4f-fd7c-4ae4 -bff3-9158cebad3b1 0.0 31 12 2016
29 a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 0.0 31 12 2016
... ... ... ... ... ...
3903 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
3904 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
3905 9z849362-b05d -4317-96f5-f267c1adf8d6 0.0 31 12 2016
另外看起来像这样:
>>> df2
rain
device_id
1d28dz3a-c923-4967-a7bb-5881d232c9a7 0.0
9z849362-b05d-4317-96f5-f267c1adf8d6 0.0
a044ag4f-fd7c-4ae4-bff3 -9158cebad3b1 1.2
ceez972b-135f-45b3-be4w-7c23102676bq 2.2
e7z581f0-2693-42ad-9896-0048550ccda7 0.2
...我通过使用:
df2 = df1.groupby([ 'device_id'])[[rain]] sum()
我想要我的最终数据框架看起来像这样:
>>> df3
rain day month year
device_id
1d28dz3a-c923-4967-a7bb-5881d232c9a7 0.0 31 12 2016
9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 1.2 31 12 2016
ceez972b-135f-45b3-be4w-7c23102676bq 2.2 31 12 2016
e7z581f0-2693-42ad-9896-0048550ccda7 0.2 31 12 2016
这就是说我想从 df1中的日月列
添加到 df2
。我不知道我是否应该使用 merge
, append
或做其他事情。
也许这会奏效吗?
df.groupby(['device_id','day','month',' year'])sum()
rain
device_id日月月
1d28dz3a-c923-4967-a7bb-5881d232c9a7 31 12 2016 0.0
9z849362-b05d-4317-96f5- f267c1adf8d6 31 12 2016 0.0
a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 31 12 2016 0.0
ceez972b-135f-45b3-be4w-7c23102676bq 31 12 2016 0.2
e7z581f0-2693-42ad-9896 -0048550ccda7 31 12 2016 0.0
或者你可以添加 reset_index
将这些列返回到DataFrame,如
df.groupby(['device_id','day','month ','year'])sum()。reset_index()
0 1d28dz3a-c923-4967-a7bb-5881d232c9a7 31 12 2016 0.0
1 9z849362-b05d-4317-96f5- f267c1adf8d6 31 12 2016 0.0
2 a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 31 12 2016 0.0
3 ceez972b-135f-45b3-be4w-7c23102676bq 31 12 2016 0.2
4 e7z581f0-2693-42ad-9896-0048550ccda7 31 12 2016 0.0
或者以下内容应该完全符合您的索引/列结构。
code> df.groupby(['device_id','day','month','year'])sum()。reset_index([1,2,3])
I have two data frames in python. The first is raw rainfall data for a single day of year and the second is the sum of daily rainfall using group.by
.
One data frame looks like this (with many more rows in between device_ids):
>>> df1
device_id rain day month year
0 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
1 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
6 e7z581f0-2693-42ad-9896-0048550ccda7 0.0 31 12 2016
11 e7z581f0-2693-42ad-9896-0048550ccda7 0.0 31 12 2016
12 ceez972b-135f-45b3-be4w-7c23102676bq 0.2 31 12 2016
13 ceez972b-135f-45b3-be4w-7c23102676bq 0.0 31 12 2016
18 ceez972b-135f-45b3-be4w-7c23102676bq 0.0 31 12 2016
19 1d28dz3a-c923-4967-a7bb-5881d232c9a7 0.0 31 12 2016
24 1d28dz3a-c923-4967-a7bb-5881d232c9a7 0.0 31 12 2016
25 a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 0.0 31 12 2016
29 a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 0.0 31 12 2016
29 a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 0.0 31 12 2016
... ... ... ... ... ...
3903 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
3904 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
3905 9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
And the other looks something like this:
>>> df2
rain
device_id
1d28dz3a-c923-4967-a7bb-5881d232c9a7 0.0
9z849362-b05d-4317-96f5-f267c1adf8d6 0.0
a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 1.2
ceez972b-135f-45b3-be4w-7c23102676bq 2.2
e7z581f0-2693-42ad-9896-0048550ccda7 0.2
... which I got by using:
df2 = df1.groupby(['device_id'])[["rain"]].sum()
I want my final data frame to look like this:
>>> df3
rain day month year
device_id
1d28dz3a-c923-4967-a7bb-5881d232c9a7 0.0 31 12 2016
9z849362-b05d-4317-96f5-f267c1adf8d6 0.0 31 12 2016
a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 1.2 31 12 2016
ceez972b-135f-45b3-be4w-7c23102676bq 2.2 31 12 2016
e7z581f0-2693-42ad-9896-0048550ccda7 0.2 31 12 2016
Which is to say that I want the "day month year" columns from df1
to be added to df2
. I'm not sure if I should use merge
, append
, or do something else.
Maybe this will work? groupby day month and year as well.
df.groupby(['device_id', 'day', 'month', 'year']).sum()
rain
device_id day month year
1d28dz3a-c923-4967-a7bb-5881d232c9a7 31 12 2016 0.0
9z849362-b05d-4317-96f5-f267c1adf8d6 31 12 2016 0.0
a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 31 12 2016 0.0
ceez972b-135f-45b3-be4w-7c23102676bq 31 12 2016 0.2
e7z581f0-2693-42ad-9896-0048550ccda7 31 12 2016 0.0
Or you could add reset_index
to return these columns to the DataFrame like
df.groupby(['device_id', 'day', 'month', 'year']).sum().reset_index()
0 1d28dz3a-c923-4967-a7bb-5881d232c9a7 31 12 2016 0.0
1 9z849362-b05d-4317-96f5-f267c1adf8d6 31 12 2016 0.0
2 a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 31 12 2016 0.0
3 ceez972b-135f-45b3-be4w-7c23102676bq 31 12 2016 0.2
4 e7z581f0-2693-42ad-9896-0048550ccda7 31 12 2016 0.0
Or the following should match your index / column structure exactly.
df.groupby(['device_id', 'day', 'month', 'year']).sum().reset_index([1, 2, 3])
这篇关于将列从一个数据帧添加到python中的数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!