How to annotate a stacked bar chart with word count and column name
Problem description
My question is about plotting word frequencies in a stacked bar plot, with the words themselves (rather than only numbers) as labels on the bars. Suppose I have these words:
Date        Text       Count
01/01/2020  cura          25
            destra        24
            fino          18
            guerra        13
            americani     13
02/01/2020  italia       137
            turismo      112
            nuovi        109
            pizza         84
            moda          79
This table was created by grouping by Date, aggregating by Text, and then selecting the top 5 per date with head(5).
Attempt:
(This generates a stacked plot, but the colours and labels are not what I want.)
data.groupby('Date').agg({'Text': 'value_counts'}).rename(columns={'Text': 'Count'}).groupby('Date').head(5).unstack().plot(kind='bar', stacked=True)
Request: My expected output is a bar chart with the dates on the x-axis and the word frequencies on the y-axis. Each word on the same date should have its own colour, as in a stacked plot, and each bar segment should show the word and its frequency.
Example: Please see below an example of a stacked plot that helps explain what I would like to do (if it is possible). In the bars, instead of the numbers (340, 226, ...), I would like to have the names of the top words selected by the code above, together with their frequencies. The x-axis should show the dates I listed earlier, not the year (I could not find a better plot on the web). The first bar shows the top 4 words (there should be 5, but I only found a bar chart with 4 groups) and how I would like to visualise the results. For the size of the chart, please keep in mind that I have 200 dates.
If you can show me how to do it, even with another dataset, that would be great. Thank you in advance for your time.
Recommended answer
Create the dataframe
import pandas as pd
import matplotlib.pyplot as plt
# data and dataframe
data = {'Date': ['01/01/2020', '01/01/2020', '01/01/2020', '02/01/2020', '02/01/2020', '02/01/2020'],
'Text': [['cura']*25, ['destra']*24, ['fino']*18, ['italia']*137, ['turismo']*112, ['nuovi']*109]}
df = pd.DataFrame(data)
df = df.explode('Text')
df.Date = pd.to_datetime(df.Date)
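As a small aside on the setup above, the following is a minimal sketch of what explode() does to a list-valued column. The frame `small` and its values are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical two-row frame; each 'Text' cell holds a list of repeated words
small = pd.DataFrame({
    "Date": ["01/01/2020", "02/01/2020"],
    "Text": [["cura"] * 3, ["italia"] * 2],
})

# explode() emits one row per list element and repeats the other columns
# (including the original index label) for each emitted row
long = small.explode("Text")
print(long)
```

Each word occurrence becomes its own row, which is the shape that value_counts needs in the next step.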
groupby and plot
- In order to plot the words, note that each date row has all the words as columns.
- Even when a word has no count for a given date, the plotting API still creates a bar for it.
- The API plots the first column for all dates, then the next column for all dates, and so on.
- As such, the cols list, used for the text annotations, must repeat each word once for every date in df_gb.
- If you need to use head(), replace the df_gb line in the code below with the following:
df_gb = df.groupby('Date').agg({'Text': 'value_counts'}).rename(columns={'Text': 'Count'}).groupby('Date').head(2).unstack()
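The column-by-column ordering described in the list above can be checked on a toy frame. This is only a sketch: the frame `toy` and its values are hypothetical, and the Agg backend is selected purely as an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import pandas as pd

# Hypothetical toy frame: 2 dates (rows) x 3 words (columns)
toy = pd.DataFrame(
    {"cura": [25, 0], "destra": [24, 0], "italia": [0, 137]},
    index=["2020-01-01", "2020-02-01"],
)

ax = toy.plot.bar(stacked=True)

# ax.patches lists the rectangles column by column, not date by date
heights = [patch.get_height() for patch in ax.patches]

# Repeat each column name once per row so names line up with ax.patches
cols = [name for name in toy.columns for _ in range(len(toy))]

for name, height in zip(cols, heights):
    print(name, height)
```

Both dates for cura come first, then both dates for destra, and so on, which is why each word has to be repeated len(df_gb) times.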
df_gb = df.groupby(['Date']).agg({'Text': 'value_counts'}).rename(columns={'Text': 'Count'}).unstack('Text')
print(df_gb)

# Output:
#            Count
# Text        cura destra  fino italia  nuovi turismo
# Date
# 2020-01-01  25.0   24.0  18.0    NaN    NaN     NaN
# 2020-02-01   NaN    NaN   NaN  137.0  109.0   112.0

# create list of words of appropriate length; all words repeat for each date
cols = [x[1] for x in df_gb.columns for _ in range(len(df_gb))]

# plot df_gb
ax = df_gb.plot.bar(stacked=True)

# annotate the bars
for i, rect in enumerate(ax.patches):
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()

    # The height of the bar is the count value and can be used as the label
    label_text = f'{height:.0f}: {cols[i]}'

    label_x = x + width / 2
    label_y = y + height / 2

    # don't include the label if the height is effectively 0 (NaN also fails this test)
    if height > 0.001:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)

# rename xtick labels; remove time
ticks, labels = plt.xticks(rotation=90)
labels = [label.get_text()[:10] for label in labels]
plt.xticks(ticks=ticks, labels=labels)

ax.get_legend().remove()
plt.show()
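As a possible alternative to looping over ax.patches, matplotlib 3.4 and later provide Axes.bar_label, which labels one bar container (one column) at a time. The sketch below is not the answer's method. It assumes matplotlib >= 3.4, uses a made-up long-format frame `long_df`, and selects the Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format counts, loosely mirroring the question's sample
long_df = pd.DataFrame({
    "Date": ["2020-01-01"] * 3 + ["2020-01-02"] * 3,
    "Text": ["cura", "destra", "fino", "italia", "turismo", "nuovi"],
    "Count": [25, 24, 18, 137, 112, 109],
})

# One row per date, one column per word; words absent on a date become 0
wide = long_df.pivot(index="Date", columns="Text", values="Count").fillna(0)

fig, ax = plt.subplots(figsize=(6, 4))
wide.plot.bar(stacked=True, ax=ax, legend=False)

# ax.containers holds one BarContainer per column, in column order
labels_by_word = {}
for word, container in zip(wide.columns, ax.containers):
    # Blank label for zero-height segments so empty bars stay unlabelled
    labels = [f"{v:.0f}: {word}" if v > 0 else "" for v in container.datavalues]
    ax.bar_label(container, labels=labels, label_type="center", fontsize=8)
    labels_by_word[word] = labels

fig.tight_layout()
```

Call plt.show() or fig.savefig(...) to view the result. With around 200 dates, as mentioned in the question, a much wider figsize and a smaller font would likely be needed.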
- See SO: How to annotate each segment of a stacked bar chart? for another example.