Modifying Code to Work for Month and Week Instead of Year

Views: 211 | Published: 2017/4/15 16:37:25 | Tags: python, datetime, for-loop, pandas, date-range


Question

I am making a stacked bar plot over a one-year time span where the x-axis is company names, the y-axis is the number of calls, and the stacks are the months.

I want to make the same plot for a time span of a month and for a time span of a week, with days as the stacks in both cases. I am having trouble doing this because my code is already built around the year time span.

My original input is a CSV file. I am pulling two columns like this:

CompanyName     recvd_dttm
Company1        6/5/2015 18:28:50 PM
Company2        6/5/2015 14:25:43 PM
Company3        9/10/2015 21:45:12 PM
Company4        6/5/2015 14:30:43 PM
Company5        6/5/2015 14:32:33 PM

Then I build a table that looks like this:

pivot_table.head(3)
Out[12]: 
Month       1   2   3   4   5   6   7   8   9   10  11   12 
CompanyName                                                                     
Customer1   17  30  29  39  15  26  24  12  36  21  18   15  
Customer2   4   11  13  22  35  29  15  18  29  31  17   14
Customer3   11   8  25  24   7  15  20   0  21  12  12   17

My code so far is below.

First I grab a year's worth of data (for this question, I would change this to a month or a week):

import datetime
import pandas as pd

# Parse the timestamp column (df is the DataFrame read from the CSV)
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])

# Only retrieve data before now (ignore typos that are future dates)
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]

# Get the first and last datetime for the final year of data
range_max = df['recvd_dttm'].max()
range_min = range_max - pd.DateOffset(years=1)

# Take the slice with the final year of data
df = df[(df['recvd_dttm'] >= range_min) &
        (df['recvd_dttm'] <= range_max)]
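
The same filter can be written once and reused for a year, a month, or a week by passing the window length in as a `pd.DateOffset`. The following is only a minimal sketch: the helper name `last_window` and the sample rows are made up for illustration and are not part of the original code.

```python
import pandas as pd


def last_window(df, column, offset):
    """Keep rows whose timestamp falls within the final `offset` of the data."""
    range_max = df[column].max()
    range_min = range_max - offset
    return df[(df[column] >= range_min) & (df[column] <= range_max)]


# Made-up sample data, for illustration only
sample = pd.DataFrame({
    "CompanyName": ["Company1", "Company2", "Company3", "Company4"],
    "recvd_dttm": pd.to_datetime([
        "2015-01-15 10:00:00",
        "2015-08-20 09:30:00",
        "2015-09-05 14:25:43",
        "2015-09-10 21:45:12",
    ]),
})

last_year = last_window(sample, "recvd_dttm", pd.DateOffset(years=1))
last_month = last_window(sample, "recvd_dttm", pd.DateOffset(months=1))
last_week = last_window(sample, "recvd_dttm", pd.DateOffset(weeks=1))
```

Only the offset changes between the three calls, so the rest of the pipeline can stay the same regardless of the time span.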

Then I create the pivot_table shown above.

###########################################################
#Create Dataframe
###########################################################

df = df.set_index('recvd_dttm')
df.index = pd.to_datetime(df.index, format='%m/%d/%Y %H:%M')

result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg(len).reset_index()
result.columns = ['Month', 'CompanyName', 'NumberCalls']
pivot_table = result.pivot(index='Month', columns='CompanyName', values='NumberCalls').fillna(0)
# Rank companies by total calls and keep the 30 busiest
s = pivot_table.sum().sort_values(ascending=False)
pivot_table = pivot_table.loc[:, s.index[:30]]
pivot_table = pivot_table.transpose()
pivot_table = pivot_table.reset_index()
pivot_table['CompanyName'] = [str(x) for x in pivot_table['CompanyName']]
Companies = list(pivot_table['CompanyName'])
pivot_table = pivot_table.set_index('CompanyName')
pivot_table.to_csv('pivot_table.csv')

Then I use the pivot table to create an OrderedDict for plotting:

###########################################################
#Create OrderedDict for plotting
###########################################################


months = [pivot_table[(m)].astype(float).values for m in range(1, 13)]
names = ["Jan", "Feb", "Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov", "Dec"]
months_dict = OrderedDict(list(zip(names, months)))
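
One way to avoid hard-coding `range(1, 13)` and the twelve month names is to build the dictionary from whatever columns the pivot table actually has. This is a hedged sketch, not the original code: the small `pivot_table` below is made up, and the name `stacks` is hypothetical.

```python
from collections import OrderedDict

import pandas as pd

# Made-up pivot: one row per company, one column per day that has data
pivot_table = pd.DataFrame(
    {1: [3, 0], 2: [1, 4], 15: [2, 2]},
    index=pd.Index(["Customer1", "Customer2"], name="CompanyName"),
)

# One stack per column that is actually present, in column order
stacks = OrderedDict(
    (str(col), pivot_table[col].astype(float).values)
    for col in pivot_table.columns
)
```

Because the keys come from the columns, the same lines work whether the columns are months, days of a month, or days of a week.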

###########################################################
#Plot!
###########################################################


palette = brewer["RdYlGn"][8]

hover = HoverTool(
    tooltips = [
        ("Month", "@months"),
        ("Number of Calls", "@NumberCalls"),
        ]
)
output_file("stacked_bar.html")
bar = Bar(months_dict, Companies, title="Number of Calls Each Month", palette = palette, legend = "top_right", width = 1200, height=900, stacked=True)
bar.add_tools(hover)


show(bar)

Does anyone have ideas on how to modify this code so it works for shorter time spans? I think the change belongs in the OrderedDict section, possibly by iterating over len(recvd_dttm)?

Solution

For days in a month (say '2015-07'), you could change

result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg(len).reset_index()

to something like

month = '2015-07'
result = df.loc[month].groupby([lambda idx: idx.day, 'CompanyName']).agg(len).reset_index()

Then replace 'Month' with 'Day' in the lines that follow. You wouldn't have to bother with the OrderedDict of month names in this case, as the days are just ints. For a week you could do

start, end = '2015-07-06', '2015-07-12'
result = df.loc[start: end].groupby(
            [lambda idx: idx.dayofweek, 'CompanyName']).agg(len).reset_index()
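
Both variants can be tried end to end on a small frame. The sketch below is an illustration under stated assumptions: the call log is made up, the helper name `count_calls` is hypothetical, and `.size()` is used in place of `.agg(len)` so that the count does not depend on which other columns the frame has.

```python
import pandas as pd

# Made-up call log, for illustration only
calls = pd.DataFrame({
    "CompanyName": ["Company1", "Company1", "Company2", "Company2", "Company1"],
    "recvd_dttm": pd.to_datetime([
        "2015-07-06 09:00:00",  # Monday
        "2015-07-06 15:30:00",  # Monday
        "2015-07-08 11:15:00",  # Wednesday
        "2015-07-20 08:45:00",  # in July, outside the chosen week
        "2015-08-01 10:00:00",  # outside the chosen month
    ]),
})

# Sort the DatetimeIndex so that slicing by date labels is reliable
df = calls.set_index("recvd_dttm").sort_index()


def count_calls(frame, key, label):
    """Count calls per (key(timestamp), company) and pivot to company rows."""
    counts = frame.groupby([key, "CompanyName"]).size().reset_index()
    counts.columns = [label, "CompanyName", "NumberCalls"]
    pivot = counts.pivot(index="CompanyName", columns=label, values="NumberCalls")
    return pivot.fillna(0)


# Days within one month
by_day = count_calls(df.loc["2015-07"], lambda idx: idx.day, "Day")

# Days within one week (dayofweek: Monday is 0, Sunday is 6)
by_weekday = count_calls(
    df.loc["2015-07-06":"2015-07-12"], lambda idx: idx.dayofweek, "DayOfWeek"
)
```

Days on which no company received a call do not appear as columns, so any plotting code should read the stack labels from the pivot's columns rather than assume a fixed count.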
