Pandas: Group by calendar-week, then plot grouped barplots for the real datetime

Views: 208 | Published: 2018/5/30 13:54:27 | Tags: python, datetime, pandas, calendar, group-by


Problem description

EDIT

I found a quite nice solution and posted it below as an answer.
The result will look like this:

[image]

Some example data you can generate for this problem:

    import numpy as np
    import pandas as pd

    codes = list('ABCDEFGH')
    dates = pd.Series(pd.date_range('2013-11-01', '2014-01-31'))
    # Duplicate every date so each day has two rows, then sort chronologically
    dates = pd.concat([dates, dates]).sort_values()
    df = pd.DataFrame({'amount': np.random.randint(1, 10, dates.size),
                       'col1': np.random.choice(codes, dates.size),
                       'col2': np.random.choice(codes, dates.size),
                       'date': dates})

resulting in:

    In [55]: df
    Out[55]:
        amount col1 col2       date
    0        1    D    E 2013-11-01
    0        5    E    B 2013-11-01
    1        5    G    A 2013-11-02
    1        7    D    H 2013-11-02
    2        5    E    G 2013-11-03
    2        4    H    G 2013-11-03
    3        7    A    F 2013-11-04
    3        3    A    A 2013-11-04
    4        1    E    G 2013-11-05
    4        7    D    C 2013-11-05
    5        5    C    A 2013-11-06
    5        7    H    F 2013-11-06
    6        1    G    B 2013-11-07
    6        8    D    A 2013-11-07
    7        1    B    H 2013-11-08
    7        8    F    H 2013-11-08
    8        3    A    E 2013-11-09
    8        1    H    D 2013-11-09
    9        3    B    D 2013-11-10
    9        1    H    G 2013-11-10
    10       6    E    E 2013-11-11
    10       6    F    E 2013-11-11
    11       2    G    B 2013-11-12
    11       5    H    H 2013-11-12
    12       5    F    G 2013-11-13
    12       5    G    B 2013-11-13
    13       8    H    B 2013-11-14
    13       6    G    F 2013-11-14
    14       9    F    C 2013-11-15
    14       4    H    A 2013-11-15
    ..     ...  ...  ...        ...
    77       9    A    B 2014-01-17
    77       7    E    B 2014-01-17
    78       4    F    E 2014-01-18
    78       6    B    E 2014-01-18
    79       6    A    H 2014-01-19
    79       3    G    D 2014-01-19
    80       7    E    E 2014-01-20
    80       6    G    C 2014-01-20
    81       9    H    G 2014-01-21
    81       9    C    B 2014-01-21
    82       2    D    D 2014-01-22
    82       7    D    A 2014-01-22
    83       6    G    B 2014-01-23
    83       1    A    G 2014-01-23
    84       9    B    D 2014-01-24
    84       7    G    D 2014-01-24
    85       7    A    F 2014-01-25
    85       9    B    H 2014-01-25
    86       9    C    D 2014-01-26
    86       5    E    B 2014-01-26
    87       3    C    H 2014-01-27
    87       7    F    D 2014-01-27
    88       3    D    G 2014-01-28
    88       4    A    D 2014-01-28
    89       2    F    A 2014-01-29
    89       8    D    A 2014-01-29
    90       1    A    G 2014-01-30
    90       6    C    A 2014-01-30
    91       6    H    C 2014-01-31
    91       2    G    F 2014-01-31

    [184 rows x 4 columns]

I'd like to group by calendar week and by the value of col1.
Like this:

    # ISO calendar week number of a timestamp
    kw = lambda x: x.isocalendar()[1]
    grouped = df.groupby([df['date'].map(kw), 'col1'], sort=False).agg({'amount': 'sum'})

resulting in:

    In [58]: grouped
    Out[58]:
               amount
    date col1
    44   D          8
         E         10
         G          5
         H          4
    45   D         15
         E          1
         G          1
         H          9
         A         13
         C          5
         B          4
         F          8
    46   E          7
         G         13
         H         17
         B          9
         F         23
    47   G         14
         H          4
         A         40
         C          7
         B         16
         F         13
    48   D          7
         E         16
         G          9
         H          2
         A          7
         C          7
         B          2
    ...           ...
    1    H         14
         A         14
         B         15
         F         19
    2    D         13
         H         13
         A         13
         B         10
         F         32
    3    D          8
         E         18
         G          3
         H          6
         A         30
         C          9
         B          6
         F          5
    4    D          9
         E         12
         G         19
         H          9
         A          8
         C         18
         B         18
    5    D         11
         G          2
         H          6
         A          5
         C          9
         F          9

    [87 rows x 1 columns]

Then I want a plot to be generated like this:

[image]

That means: calendar week and year (datetime) on the x-axis, and one bar for each of the grouped col1 values.

The problem I'm facing is: I only have integers describing the calendar week (KW in the plot), but I somehow have to merge the date back onto them so that the ticks are labeled by year as well. Furthermore, I can't plot just the grouped calendar week, because I need the items in the correct order: kw 47 and kw 48 (year 2013) have to be to the left of kw 1 (because that is 2014).

EDIT

I figured out from here: http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-barplot that grouped bars need to be columns instead of rows. So I thought about how to transform the data and found the method pivot, which turns out to be a great function. reset_index is needed to transform the multiindex into columns. At the end I fill the NaNs with zero:

    A = grouped.reset_index().pivot(index='date', columns='col1', values='amount').fillna(0)

transforms the data into:

    col1   A   B   C   D   E   F   G   H
    date
    1      4  31   0   0   0  18  13   8
    2      0  12  13  22   1  17   0   8
    3      3  10   4  13  12   8   7   6
    4     17   0  10   7   0  25   7   4
    5      7   0   7   9   8   6   0   7
    44     0   0   2  11   7   0   0   2
    45     9   3   2  14   0  16  21   2
    46     0  14   7   2  17  13  11   8
    47     5  13   0  15  19   7   5  10
    48    15   8  12   2  20   4   7   6
    49    20   0   0  18  22  17  11   0
    50     7  11   8   6   5   6  13  10
    51     8  26   0   0   5   5  16   9
    52     8  13   7   5   4  10   0  11

which looks like the example data in the docs and can be plotted as grouped bars:

    A.plot(kind='bar')

gets this:

[image]

But I have a problem with the axis: it is now sorted from 1 to 52, which is actually wrong, because calendar week 52 belongs to year 2013 in this case. Any ideas on how to merge the real datetime back onto the calendar weeks and use it for the x-axis ticks?

Solution

Okay, I'll answer the question myself, as I finally figured it out.

The key is to not group by calendar week (as you would lose the information about the year) but rather to group by a string containing both calendar week and year.

Then change the layout (reshaping) with pivot, as already mentioned in the question. The date will be the index. Use reset_index() to make the current date index a column and get an integer range as the index instead. That range is in the correct order for plotting: the lowest year/calendar week is index 0 and the highest year/calendar week is the highest integer.

Select the date column into a new variable ticks as a list and delete that column from the DataFrame. Now plot the bars and simply set the labels of the xticks to ticks.

The complete solution is quite easy:

    import numpy as np
    import pandas as pd

    codes = list('ABCDEFGH')
    dates = pd.Series(pd.date_range('2013-11-01', '2014-01-31'))
    # Duplicate every date so each day has two rows, then sort chronologically
    dates = pd.concat([dates, dates]).sort_values()
    df = pd.DataFrame({'amount': np.random.randint(1, 10, dates.size),
                       'col1': np.random.choice(codes, dates.size),
                       'col2': np.random.choice(codes, dates.size),
                       'date': dates})

    kw = lambda x: x.isocalendar()[1]
    # Group key that keeps the year next to the calendar week
    kw_year = lambda x: str(x.year) + ' - ' + str(x.isocalendar()[1])
    # reset_index() turns the (date, col1) group keys back into columns for pivot
    grouped = df.groupby([df['date'].map(kw_year), 'col1'], sort=False).agg({'amount': 'sum'}).reset_index()

    # One row per year/week, one column per col1 value
    A = grouped.pivot(index='date', columns='col1', values='amount').fillna(0).reset_index()

    # Keep the year/week strings as tick labels and plot only the numeric columns
    ticks = A.date.values.tolist()
    del A['date']
    ax = A.plot(kind='bar')
    ax.set_xticklabels(ticks)

RESULT:

[image]
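One caveat about the string key used in the answer: x.year is the calendar year, while isocalendar()[1] is the ISO week, and the two can disagree around New Year. For example, 2013-12-30 falls in ISO week 1 of 2014, so it is labeled '2013 - 1' and sorts before week 44. Unpadded week numbers also sort as text, so '2014 - 10' would come before '2014 - 2' on a longer date range. The following is a minimal sketch of one way around both issues, assuming pandas 1.1 or later for Series.dt.isocalendar(). The function name weekly_pivot, the column name year_week, and the 'YYYY-Www' label format are illustrative choices, not part of the original answer.

```python
import pandas as pd


def weekly_pivot(df):
    """Sum `amount` per ISO year/week and col1; one column per col1 value."""
    iso = df['date'].dt.isocalendar()  # DataFrame with columns: year, week, day
    # ISO year plus zero-padded ISO week, so text order equals time order
    labels = (iso['year'].astype(str) + '-W'
              + iso['week'].astype(str).str.zfill(2))
    grouped = (df.assign(year_week=labels)
                 .groupby(['year_week', 'col1'])['amount'].sum())
    # Move col1 into the columns; missing combinations become 0
    return grouped.unstack('col1', fill_value=0)


# Small deterministic example spanning the 2013/2014 boundary
dates = pd.to_datetime(['2013-12-23', '2013-12-30', '2013-12-31', '2014-01-06'])
demo = pd.DataFrame({'date': dates,
                     'col1': list('ABAB'),
                     'amount': [1, 2, 3, 4]})
pivoted = weekly_pivot(demo)
print(pivoted)
```

Because the labels stay in the index, calling pivoted.plot(kind='bar') should use them as x tick labels directly, so the separate ticks list and set_xticklabels call are not needed in this variant.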


                    