Modifying Code to work for Month and Week instead of Year
Problem description
I am making a stacked bar plot over a one-year time span, where the x-axis is company names, the y-axis is the number of calls, and the stacks are the months.
I want to be able to make this plot for a time span of a month, where the stacks are days, and for a time span of a week, where the stacks are also days. I am having trouble doing this because my code is already built around the one-year time span.
My original input is a CSV file. I am pulling two columns like this:
CompanyName recvd_dttm
Company1 6/5/2015 18:28:50 PM
Company2 6/5/2015 14:25:43 PM
Company3 9/10/2015 21:45:12 PM
Company4 6/5/2015 14:30:43 PM
Company5 6/5/2015 14:32:33 PM
Then I make a data table that looks like this:
pivot_table.head(3)
Out[12]:
Month 1 2 3 4 5 6 7 8 9 10 11 12
CompanyName
Customer1 17 30 29 39 15 26 24 12 36 21 18 15
Customer2 4 11 13 22 35 29 15 18 29 31 17 14
Customer3 11 8 25 24 7 15 20 0 21 12 12 17
My code so far is below.
First I grab a year's worth of data (I would change this to a month or a week for this question):
import datetime
import pandas as pd

# parse the timestamp column as datetimes
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])
# Only retrieve data before now (ignore typos that are future dates)
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]
# get first and last datetime for final year of data
range_max = df['recvd_dttm'].max()
range_min = range_max - pd.DateOffset(years=1)
# take slice with final year of data
df = df[(df['recvd_dttm'] >= range_min) &
        (df['recvd_dttm'] <= range_max)]
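The filtering step above hardcodes a one-year window. A minimal sketch of how the window could be made selectable is shown here; the helper name `slice_last_span`, the `SPANS` table, and the toy `calls` frame are assumptions introduced for illustration and are not part of the original code.

```python
import pandas as pd

# Hypothetical lookup: span name -> offset subtracted from the newest timestamp.
SPANS = {
    "year": pd.DateOffset(years=1),
    "month": pd.DateOffset(months=1),
    "week": pd.DateOffset(weeks=1),
}


def slice_last_span(df, span, column="recvd_dttm"):
    """Keep rows whose timestamp lies within the last `span` of the data."""
    range_max = df[column].max()
    range_min = range_max - SPANS[span]
    return df[(df[column] >= range_min) & (df[column] <= range_max)]


# Toy data standing in for the CSV (values are made up for the example).
calls = pd.DataFrame({
    "CompanyName": ["Company1", "Company2", "Company3", "Company4"],
    "recvd_dttm": pd.to_datetime([
        "2015-09-10 21:45:12", "2015-09-05 14:25:43",
        "2015-08-20 09:00:00", "2015-06-05 14:30:43",
    ]),
})

last_week = slice_last_span(calls, "week")
last_month = slice_last_span(calls, "month")
```

Only the offset changes between the three cases, so the rest of the filtering logic can stay as it is.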
Then I create the pivot_table shown above.
###########################################################
#Create Dataframe
###########################################################
df = df.set_index('recvd_dttm')
df.index = pd.to_datetime(df.index, format='%m/%d/%Y %H:%M')
result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg(len).reset_index()
result.columns = ['Month', 'CompanyName', 'NumberCalls']
pivot_table = result.pivot(index='Month', columns='CompanyName', values='NumberCalls').fillna(0)
s = pivot_table.sum().sort_values(ascending=False)
pivot_table = pivot_table.loc[:, s.index[:30]]
pivot_table = pivot_table.transpose()
pivot_table = pivot_table.reset_index()
pivot_table['CompanyName'] = [str(x) for x in pivot_table['CompanyName']]
Companies = list(pivot_table['CompanyName'])
pivot_table = pivot_table.set_index('CompanyName')
pivot_table.to_csv('pivot_table.csv')
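As a rough sketch of how the table-building step could be written once for any stacking unit, the function below takes the name of a DatetimeIndex attribute (`'month'`, `'day'`, `'dayofweek'`) instead of hardcoding the month. The names `count_calls`, `stack_by`, and `top_n` are hypothetical, and `groupby(...).size().unstack()` is swapped in for the `agg(len)` / `pivot` / `transpose` sequence; it is meant to produce the same companies-by-unit layout, not to be a drop-in copy of the original.

```python
import pandas as pd


def count_calls(df, stack_by, top_n=30):
    """Return a companies x stack-unit table of call counts.

    `df` needs a DatetimeIndex and a 'CompanyName' column.
    `stack_by` is a DatetimeIndex attribute name: 'month', 'day', 'dayofweek'.
    """
    unit = getattr(df.index, stack_by)
    counts = (df.groupby([unit, "CompanyName"])
                .size()
                .unstack(0, fill_value=0)       # stack units become columns
                .rename_axis(columns=stack_by))
    # Keep the busiest companies first, as the original code does with s.index[:30].
    order = counts.sum(axis=1).sort_values(ascending=False).index[:top_n]
    return counts.loc[order]


# Toy data (made up): timestamps as the index, one row per call.
calls = pd.DataFrame(
    {"CompanyName": ["Company1", "Company1", "Company1", "Company2", "Company2"]},
    index=pd.to_datetime([
        "2015-07-01 09:00", "2015-07-01 10:00", "2015-07-02 11:00",
        "2015-07-02 12:00", "2015-08-03 13:00",
    ]),
)

by_day = count_calls(calls, "day")
by_month = count_calls(calls, "month")
```

Because the columns are whatever units actually occur in the data, later steps should iterate over `by_day.columns` rather than a fixed range.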
Then I use the pivot table to create an OrderedDict for plotting:
###########################################################
#Create OrderedDict for plotting
###########################################################
months = [pivot_table[(m)].astype(float).values for m in range(1, 13)]
names = ["Jan", "Feb", "Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov", "Dec"]
months_dict = OrderedDict(list(zip(names, months)))
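The `range(1, 13)` lookup above assumes every month has a column, and it raises a `KeyError` as soon as one is missing, which is likely for a single month or week of data. One possible way around that, sketched here with a made-up `pivot_table` and a hypothetical `stacks` name, is to build the dict from the columns that are actually present:

```python
from collections import OrderedDict

import pandas as pd

# Toy stand-in for the question's pivot_table: companies as rows, days as columns.
pivot_table = pd.DataFrame(
    {1: [2, 0], 2: [1, 1], 5: [0, 3]},
    index=pd.Index(["Company1", "Company2"], name="CompanyName"),
)

# One entry per column that exists, keyed by its label as a string.
stacks = OrderedDict(
    (str(col), pivot_table[col].astype(float).values)
    for col in pivot_table.columns
)
```

The same expression works whether the columns are months, days of the month, or days of the week.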
###########################################################
#Plot!
###########################################################
palette = brewer["RdYlGn"][8]
hover = HoverTool(
tooltips = [
("Month", "@months"),
("Number of Calls", "@NumberCalls"),
]
)
output_file("stacked_bar.html")
bar = Bar(months_dict, Companies, title="Number of Calls Each Month", palette = palette, legend = "top_right", width = 1200, height=900, stacked=True)
bar.add_tools(hover)
show(bar)
Does anyone have ideas on how to approach modifying this code so it works for shorter time spans? I think the modification will be in the OrderedDict section, possibly by iterating over something like len(recvd_dttm)?
For days in a month (say '2015-07'), you could change
result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg(len).reset_index()
to something like
month = '2015-07'
result = df.loc[month].groupby([lambda idx: idx.day, 'CompanyName']).agg(len).reset_index()
and replace 'Month' with 'Day' in the lines below it. You wouldn't have to bother with the OrderedDict etc. in this case, as the days are just ints. For a week you could do
start, end = '2015-07-06', '2015-07-12'
result = df.loc[start: end].groupby(
[lambda idx: idx.dayofweek, 'CompanyName']).agg(len).reset_index()
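Both variants can be tried on a small frame. The sketch below is an illustration under a few assumptions: the toy `calls` data and the helper `count_by` are made up, `.size()` is used in place of `.agg(len)` so that it also works when `CompanyName` is the only column, and the index is sorted first because date-string slicing with `.loc[start:end]` expects a sorted DatetimeIndex.

```python
import pandas as pd

# Toy data (made up): timestamps as the index, one row per call.
calls = pd.DataFrame(
    {"CompanyName": ["Company1", "Company1", "Company2", "Company2", "Company1"]},
    index=pd.to_datetime([
        "2015-07-06 09:00", "2015-07-06 15:30", "2015-07-08 11:00",
        "2015-07-20 10:00", "2015-08-01 08:00",
    ]),
).sort_index()


def count_by(frame, unit, name):
    """Count calls per (unit, company); `unit` is 'day' or 'dayofweek'."""
    result = (frame.groupby([getattr(frame.index, unit), "CompanyName"])
                   .size()
                   .reset_index())
    result.columns = [name, "CompanyName", "NumberCalls"]
    return result


# Days within one month, selected by partial date string.
by_day = count_by(calls.loc["2015-07"], "day", "Day")

# Days within one week (Monday=0 ... Sunday=6); the end date is included.
start, end = "2015-07-06", "2015-07-12"
by_weekday = count_by(calls.loc[start:end], "dayofweek", "Weekday")
```

Days with no calls do not appear in the result, so a later pivot only gets columns for the days that occur in the selected span.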