Pandas: Group by calendar-week, then plot grouped barplots for the real datetime


Problem description

EDIT

I found quite a nice solution and posted it below as an answer. The result will look like this:


You can generate some example data for this problem:

import numpy as np
import pandas as pd

codes = list('ABCDEFGH')
dates = pd.Series(pd.date_range('2013-11-01', '2014-01-31'))
# Two rows per day, sorted by date (Series.append() and Series.sort() no longer exist in current pandas)
dates = pd.concat([dates, dates]).sort_values()
df = pd.DataFrame({'amount': np.random.randint(1, 10, dates.size), 'col1': np.random.choice(codes, dates.size), 'col2': np.random.choice(codes, dates.size), 'date': dates})

resulting in:

In [55]: df
Out[55]:
    amount col1 col2       date
0        1    D    E 2013-11-01
0        5    E    B 2013-11-01
1        5    G    A 2013-11-02
1        7    D    H 2013-11-02
2        5    E    G 2013-11-03
2        4    H    G 2013-11-03
3        7    A    F 2013-11-04
3        3    A    A 2013-11-04
4        1    E    G 2013-11-05
4        7    D    C 2013-11-05
5        5    C    A 2013-11-06
5        7    H    F 2013-11-06
6        1    G    B 2013-11-07
6        8    D    A 2013-11-07
7        1    B    H 2013-11-08
7        8    F    H 2013-11-08
8        3    A    E 2013-11-09
8        1    H    D 2013-11-09
9        3    B    D 2013-11-10
9        1    H    G 2013-11-10
10       6    E    E 2013-11-11
10       6    F    E 2013-11-11
11       2    G    B 2013-11-12
11       5    H    H 2013-11-12
12       5    F    G 2013-11-13
12       5    G    B 2013-11-13
13       8    H    B 2013-11-14
13       6    G    F 2013-11-14
14       9    F    C 2013-11-15
14       4    H    A 2013-11-15
..     ...  ...  ...        ...
77       9    A    B 2014-01-17
77       7    E    B 2014-01-17
78       4    F    E 2014-01-18
78       6    B    E 2014-01-18
79       6    A    H 2014-01-19
79       3    G    D 2014-01-19
80       7    E    E 2014-01-20
80       6    G    C 2014-01-20
81       9    H    G 2014-01-21
81       9    C    B 2014-01-21
82       2    D    D 2014-01-22
82       7    D    A 2014-01-22
83       6    G    B 2014-01-23
83       1    A    G 2014-01-23
84       9    B    D 2014-01-24
84       7    G    D 2014-01-24
85       7    A    F 2014-01-25
85       9    B    H 2014-01-25
86       9    C    D 2014-01-26
86       5    E    B 2014-01-26
87       3    C    H 2014-01-27
87       7    F    D 2014-01-27
88       3    D    G 2014-01-28
88       4    A    D 2014-01-28
89       2    F    A 2014-01-29
89       8    D    A 2014-01-29
90       1    A    G 2014-01-30
90       6    C    A 2014-01-30
91       6    H    C 2014-01-31
91       2    G    F 2014-01-31

[184 rows x 4 columns]

I'd like to group by calendar-week and by value of col1. Like this:

kw = lambda x: x.isocalendar()[1]
grouped = df.groupby([df['date'].map(kw), 'col1'], sort=False).agg({'amount': 'sum'})

resulting in:

In [58]: grouped
Out[58]:
           amount
date col1
44   D          8
     E         10
     G          5
     H          4
45   D         15
     E          1
     G          1
     H          9
     A         13
     C          5
     B          4
     F          8
46   E          7
     G         13
     H         17
     B          9
     F         23
47   G         14
     H          4
     A         40
     C          7
     B         16
     F         13
48   D          7
     E         16
     G          9
     H          2
     A          7
     C          7
     B          2
...           ...
1    H         14
     A         14
     B         15
     F         19
2    D         13
     H         13
     A         13
     B         10
     F         32
3    D          8
     E         18
     G          3
     H          6
     A         30
     C          9
     B          6
     F          5
4    D          9
     E         12
     G         19
     H          9
     A          8
     C         18
     B         18
5    D         11
     G          2
     H          6
     A          5
     C          9
     F          9

[87 rows x 1 columns]

Then I want to generate a plot like this: calendar week and year (datetime) on the x-axis, and within each week one bar per value of col1.

The problem I'm facing is this: I only have integers describing the calendar week (KW in the plot), but I somehow have to merge the date back onto them so that the ticks are labeled with the year as well. Furthermore, I can't simply plot the grouped calendar weeks, because the items need to be in the correct order: kw 47 and kw 48 (year 2013) have to be to the left of kw 1 (which is in 2014).
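The ordering problem follows from how ISO weeks work: the week number alone does not identify the year, and near New Year the ISO year can differ from the calendar year. A minimal illustration using only the standard library (the three sample dates are picked for illustration and are not taken from the data above):

```python
from datetime import date

# isocalendar() returns (ISO year, ISO week number, ISO weekday).
late_nov = date(2013, 11, 20).isocalendar()
late_dec = date(2013, 12, 30).isocalendar()
mid_jan = date(2014, 1, 15).isocalendar()

# 2013-12-30 is a Monday that already belongs to ISO week 1 of 2014.
late_dec_year_week = (late_dec[0], late_dec[1])

# Sorting by week number alone puts January 2014 before November 2013.
by_week = sorted([late_nov, mid_jan], key=lambda c: c[1])

# Sorting by (ISO year, week) keeps the chronological order.
by_year_week = sorted([late_nov, mid_jan], key=lambda c: (c[0], c[1]))
```

Any grouping key that is meant to sort chronologically therefore has to carry the year as well as the week.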


EDIT

I figured out from here: http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-barplot that grouped bars need to be columns instead of rows. So I thought about how to transform the data and found the method pivot, which turns out to be a great function. reset_index is needed to turn the MultiIndex into columns. Finally, I fill the NaNs with zero:

A = grouped.reset_index().pivot(index='date', columns='col1', values='amount').fillna(0)

transforms the data into:

col1   A   B   C   D   E   F   G   H
date
1      4  31   0   0   0  18  13   8
2      0  12  13  22   1  17   0   8
3      3  10   4  13  12   8   7   6
4     17   0  10   7   0  25   7   4
5      7   0   7   9   8   6   0   7
44     0   0   2  11   7   0   0   2
45     9   3   2  14   0  16  21   2
46     0  14   7   2  17  13  11   8
47     5  13   0  15  19   7   5  10
48    15   8  12   2  20   4   7   6
49    20   0   0  18  22  17  11   0
50     7  11   8   6   5   6  13  10
51     8  26   0   0   5   5  16   9
52     8  13   7   5   4  10   0  11

which looks like the example data in the docs to be plotted in grouped bars:

A.plot(kind='bar')

and produces this:

However, the axis is still a problem: it is now sorted from 1 to 52, which is wrong, because calendar week 52 belongs to 2013 in this case. Any ideas on how to merge the real datetime back onto the calendar weeks and use it for the x-axis ticks?
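The reshaping step and the resulting sort order can be reproduced on a tiny table. This is only a sketch: the numbers are made up, and `long_df`, `wide` and `week_order` are illustrative names that do not appear in the question.

```python
import pandas as pd

# Tiny long-form table: one row per (week, code) pair; the values are made up.
long_df = pd.DataFrame({
    'date': [52, 52, 1, 1],
    'col1': ['A', 'B', 'A', 'C'],
    'amount': [8, 13, 4, 7],
})

# One row per week and one column per code; missing pairs become 0.
wide = long_df.pivot(index='date', columns='col1', values='amount').fillna(0)

# pivot() returns a sorted index, so week 1 is listed before week 52
# even though week 52 came first in the input.
week_order = wide.index.tolist()
```

With plain week integers as the index, the pivoted table has no way of knowing that week 52 lies before week 1 in time.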

Solution

Okay, I'll answer the question myself, as I finally figured it out. The key is not to group by calendar week alone (you would lose the information about the year) but to group by a string containing both the year and the calendar week.

Then reshape the data with pivot, as already mentioned in the question. The date string becomes the index. Use reset_index() to turn that date index into a column and get an integer range as the index instead. The rows are then in the correct order for plotting: the lowest year/calendar week is index 0 and the highest year/calendar week is the highest integer.

Store the date column as a list in a new variable ticks and delete that column from the DataFrame. Now plot the bars and set the xtick labels to ticks. The complete solution is quite simple:

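import numpy as np
import pandas as pd

codes = list('ABCDEFGH')
dates = pd.Series(pd.date_range('2013-11-01', '2014-01-31'))
# Series.append() and Series.sort() no longer exist in current pandas
dates = pd.concat([dates, dates]).sort_values()
df = pd.DataFrame({'amount': np.random.randint(1, 10, dates.size), 'col1': np.random.choice(codes, dates.size), 'col2': np.random.choice(codes, dates.size), 'date': dates})

# Key such as '2013 - 44'. The ISO year and a zero-padded week are used so that
# the strings sort chronologically (2013-12-30 belongs to ISO week 1 of 2014).
def kw_year(x):
    iso = x.isocalendar()
    return '%d - %02d' % (iso[0], iso[1])

# reset_index() turns the (date, col1) group keys back into ordinary columns
grouped = df.groupby([df['date'].map(kw_year), 'col1'], sort=False).agg({'amount': 'sum'}).reset_index()
A = grouped.pivot(index='date', columns='col1', values='amount').fillna(0).reset_index()

ticks = A['date'].tolist()
del A['date']
ax = A.plot(kind='bar')  # plotting requires matplotlib
ax.set_xticklabels(ticks)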

RESULT:
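As an alternative sketch (not part of the original answer), the grouping key can be a real datetime, namely the Monday on which each week starts, so the rows sort chronologically without any string handling. The helper names `weekly_table`, `week_labels` and `plot_weekly` are hypothetical, and the small `example` frame is made up for illustration.

```python
import pandas as pd

def weekly_table(df):
    """Sum 'amount' per (week start, col1) and return one column per code."""
    # Monday of the week each date falls in; a real Timestamp, so it sorts
    # chronologically across the year boundary.
    week_start = df['date'].dt.to_period('W').dt.start_time
    totals = df.groupby([week_start, 'col1'])['amount'].sum()
    return totals.unstack(fill_value=0)

def week_labels(table):
    """Tick labels such as '2013 - 44' built from ISO year and ISO week."""
    labels = []
    for ts in table.index:
        iso = ts.isocalendar()
        labels.append('{} - {:02d}'.format(iso[0], iso[1]))
    return labels

def plot_weekly(table):
    """Grouped bars, one group per week, in chronological order."""
    ax = table.plot(kind='bar')  # requires matplotlib
    ax.set_xticklabels(week_labels(table), rotation=45)
    return ax

# Made-up rows that straddle the 2013/2014 boundary.
example = pd.DataFrame({
    'date': pd.to_datetime(['2013-11-01', '2013-12-30', '2014-01-02', '2014-01-15']),
    'col1': ['A', 'B', 'A', 'A'],
    'amount': [1, 2, 3, 4],
})
table = weekly_table(example)
labels = week_labels(table)
```

Calling `plot_weekly(weekly_table(df))` on the DataFrame from the question should then draw the grouped bars with year/week tick labels. The design choice here is to keep the index as datetimes for ordering and to derive the labels only at plotting time.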
