Pandas: Group by calendar-week, then plot grouped barplots for the real datetime
EDIT
I found a quite nice solution and posted it below as an answer. The result will look like this:
Some example data you can generate for this problem:
import numpy as np
import pandas as pd

codes = list('ABCDEFGH')
dates = pd.Series(pd.date_range('2013-11-01', '2014-01-31'))
# Two rows per day; Series.append() and Series.sort() are gone from current pandas
dates = pd.concat([dates, dates]).sort_values()
df = pd.DataFrame({'amount': np.random.randint(1, 10, dates.size), 'col1': np.random.choice(codes, dates.size), 'col2': np.random.choice(codes, dates.size), 'date': dates})
resulting in:
In [55]: df
Out[55]:
amount col1 col2 date
0 1 D E 2013-11-01
0 5 E B 2013-11-01
1 5 G A 2013-11-02
1 7 D H 2013-11-02
2 5 E G 2013-11-03
2 4 H G 2013-11-03
3 7 A F 2013-11-04
3 3 A A 2013-11-04
4 1 E G 2013-11-05
4 7 D C 2013-11-05
5 5 C A 2013-11-06
5 7 H F 2013-11-06
6 1 G B 2013-11-07
6 8 D A 2013-11-07
7 1 B H 2013-11-08
7 8 F H 2013-11-08
8 3 A E 2013-11-09
8 1 H D 2013-11-09
9 3 B D 2013-11-10
9 1 H G 2013-11-10
10 6 E E 2013-11-11
10 6 F E 2013-11-11
11 2 G B 2013-11-12
11 5 H H 2013-11-12
12 5 F G 2013-11-13
12 5 G B 2013-11-13
13 8 H B 2013-11-14
13 6 G F 2013-11-14
14 9 F C 2013-11-15
14 4 H A 2013-11-15
.. ... ... ... ...
77 9 A B 2014-01-17
77 7 E B 2014-01-17
78 4 F E 2014-01-18
78 6 B E 2014-01-18
79 6 A H 2014-01-19
79 3 G D 2014-01-19
80 7 E E 2014-01-20
80 6 G C 2014-01-20
81 9 H G 2014-01-21
81 9 C B 2014-01-21
82 2 D D 2014-01-22
82 7 D A 2014-01-22
83 6 G B 2014-01-23
83 1 A G 2014-01-23
84 9 B D 2014-01-24
84 7 G D 2014-01-24
85 7 A F 2014-01-25
85 9 B H 2014-01-25
86 9 C D 2014-01-26
86 5 E B 2014-01-26
87 3 C H 2014-01-27
87 7 F D 2014-01-27
88 3 D G 2014-01-28
88 4 A D 2014-01-28
89 2 F A 2014-01-29
89 8 D A 2014-01-29
90 1 A G 2014-01-30
90 6 C A 2014-01-30
91 6 H C 2014-01-31
91 2 G F 2014-01-31
[184 rows x 4 columns]
I'd like to group by calendar week and by the value of col1, like this:
kw = lambda x: x.isocalendar()[1]
grouped = df.groupby([df['date'].map(kw), 'col1'], sort=False).agg({'amount': 'sum'})
resulting in:
In [58]: grouped
Out[58]:
amount
date col1
44 D 8
E 10
G 5
H 4
45 D 15
E 1
G 1
H 9
A 13
C 5
B 4
F 8
46 E 7
G 13
H 17
B 9
F 23
47 G 14
H 4
A 40
C 7
B 16
F 13
48 D 7
E 16
G 9
H 2
A 7
C 7
B 2
... ...
1 H 14
A 14
B 15
F 19
2 D 13
H 13
A 13
B 10
F 32
3 D 8
E 18
G 3
H 6
A 30
C 9
B 6
F 5
4 D 9
E 12
G 19
H 9
A 8
C 18
B 18
5 D 11
G 2
H 6
A 5
C 9
F 9
[87 rows x 1 columns]
Then I want a plot to be generated like this:
That means: calendar week and year (datetime) on the x-axis, and one bar for each grouped col1 value.
The problem I'm facing is: I only have integers describing the calendar week (KW in the plot), but I somehow have to merge the date back onto them so that the ticks are labeled by year as well. Furthermore, I can't just plot the grouped calendar weeks, because the items need to be in the correct order: KW 47 and KW 48 (year 2013) have to be to the left of KW 1 (which is in 2014).
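To make the ordering problem concrete, here is a minimal sketch using only the standard library. The four dates are picked for illustration; they are not rows from the data above. It compares sorting by week number alone with sorting by (ISO year, ISO week) pairs.

```python
from datetime import date

# Hypothetical sample days around the 2013/2014 year boundary
days = [date(2013, 11, 25), date(2013, 12, 23), date(2013, 12, 30), date(2014, 1, 6)]

# Week number alone: the year is lost, so the 2014 weeks sort first
weeks_only = sorted(d.isocalendar()[1] for d in days)

# (ISO year, ISO week) pairs keep the chronological order
year_weeks = sorted(tuple(d.isocalendar()[:2]) for d in days)

print(weeks_only)   # [1, 2, 48, 52]
print(year_weeks)   # [(2013, 48), (2013, 52), (2014, 1), (2014, 2)]
```

Note that 2013-12-30 already belongs to ISO week 1 of ISO year 2014, so the ISO year (first element of isocalendar()) and the calendar year can differ near the boundary.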
EDIT
I figured out from here:
http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-barplot that grouped bars need to be columns instead of rows. So I thought about how to transform the data and found the method pivot, which turns out to be a great function. reset_index is needed to transform the MultiIndex into columns. At the end I fill the NaNs with zero:
A = grouped.reset_index().pivot(index='date', columns='col1', values='amount').fillna(0)
transforms the data into:
col1 A B C D E F G H
date
1 4 31 0 0 0 18 13 8
2 0 12 13 22 1 17 0 8
3 3 10 4 13 12 8 7 6
4 17 0 10 7 0 25 7 4
5 7 0 7 9 8 6 0 7
44 0 0 2 11 7 0 0 2
45 9 3 2 14 0 16 21 2
46 0 14 7 2 17 13 11 8
47 5 13 0 15 19 7 5 10
48 15 8 12 2 20 4 7 6
49 20 0 0 18 22 17 11 0
50 7 11 8 6 5 6 13 10
51 8 26 0 0 5 5 16 9
52 8 13 7 5 4 10 0 11
which looks like the example data in the docs to be plotted in grouped bars:
A.plot(kind='bar')
gets this:
However, the axis is now sorted (from 1 to 52), which is actually wrong, because calendar week 52 belongs to year 2013 in this case. Any ideas on how to merge back the real datetime for the calendar weeks and use them as x-axis ticks?
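The reshaping step described in the edit above (long rows turned into one column per col1 value) can be tried on a tiny hand-made frame. This is only a sketch: the numbers and the name long_df are made up for illustration and stand in for grouped.reset_index().

```python
import pandas as pd

# Hypothetical long-format data: one row per (calendar week, col1) pair
long_df = pd.DataFrame({
    'date': [44, 44, 45, 45, 45],
    'col1': ['D', 'E', 'D', 'A', 'E'],
    'amount': [8, 10, 15, 13, 1],
})

# One row per week, one column per col1 value; missing pairs become 0
wide = long_df.pivot(index='date', columns='col1', values='amount').fillna(0)
print(wide)
```

pivot sorts the new index, which is exactly why plain week numbers end up in the order 1 to 52 regardless of the year.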
Okay, I'll answer the question myself, as I finally figured it out. The key is not to group by calendar week alone (you would lose the information about the year) but to group by a string containing both the year and the calendar week.
Then change the layout (reshape) with pivot, as already mentioned in the question. The date string becomes the index. Use reset_index() to turn the current date index into a column and get an integer range as the index instead; the rows are then in the correct order for plotting (the lowest year/calendar week is index 0 and the highest year/calendar week is the highest integer).
Select the date column as a list in a new variable ticks and delete that column from the DataFrame. Now plot the bars and simply set the labels of the xticks to ticks. The complete solution is quite simple:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

codes = list('ABCDEFGH')
dates = pd.Series(pd.date_range('2013-11-01', '2014-01-31'))
dates = pd.concat([dates, dates]).sort_values()
df = pd.DataFrame({'amount': np.random.randint(1, 10, dates.size), 'col1': np.random.choice(codes, dates.size), 'col2': np.random.choice(codes, dates.size), 'date': dates})
# Group key "ISO year - zero-padded ISO week". The ISO year keeps 2013-12-30/31 in week 1 of 2014,
# and the padding makes the strings sort chronologically when pivot() sorts the index.
kw_year = lambda x: '{} - {:02d}'.format(*x.isocalendar()[:2])
# reset_index() turns the (date, col1) group keys back into ordinary columns
grouped = df.groupby([df['date'].map(kw_year), 'col1'], sort=False).agg({'amount': 'sum'}).reset_index()
A = grouped.pivot(index='date', columns='col1', values='amount').fillna(0).reset_index()
ticks = A['date'].tolist()
del A['date']
ax = A.plot(kind='bar')
ax.set_xticklabels(ticks)
plt.show()
RESULT:
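As a side note, the string key can be avoided by grouping on a real datetime instead. The following is a sketch of an alternative, not the approach of the answer above: each date is mapped to the Monday that starts its week, so the index stays a datetime and sorts chronologically on its own. The seed, the names week_start, table and labels, and the label format are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable
dates = pd.date_range('2013-11-01', '2014-01-31')
df = pd.DataFrame({
    'date': dates.repeat(2),  # two rows per day, as in the example data
    'col1': rng.choice(list('ABCDEFGH'), dates.size * 2),
    'amount': rng.integers(1, 10, dates.size * 2),
})

# Monday of the Monday-to-Sunday week each date falls in (a real datetime)
week_start = df['date'].dt.to_period('W').dt.start_time

# Sum per (week start, col1), then move col1 into columns; missing pairs become 0
table = df.groupby([week_start, 'col1'])['amount'].sum().unstack('col1', fill_value=0)

# Tick labels derived from the datetime index
labels = table.index.strftime('%Y-%m-%d')
print(table.head())
```

With matplotlib installed, ax = table.plot(kind='bar') followed by ax.set_xticklabels(labels) should draw the same kind of grouped bar chart, with the week start dates as tick labels.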