如何跟踪 pandas 数据框中的前一个日期记录列? [英] How to keep track of previous date record column in pandas dataframe?

查看:68
本文介绍了如何跟踪 pandas 数据框中的前一个日期记录列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此问题是从我想对pandas Dataframe进行一些数据分析.我有一个如下数据框:

I want to perform some data analysis on pandas Dataframe. I have one dataframe like below:

                   derived_symbol  sport_name person_name      city  \
0       football.RAM.mumbai.ram_count    football         RAM    mumbai   
1       football.RAM.mumbai.mum_count    football         RAM    mumbai   
2        football.RAM.delhi.mum_count    football         RAM     delhi   
3        football.RAM.delhi.ram_count    football         RAM     delhi   
4       football.RAM.mumbai.ram_count    football         RAM    mumbai   
5       football.RAM.mumbai.mum_count    football         RAM    mumbai   
6        football.RAM.delhi.mum_count    football         RAM     delhi   
7        football.RAM.delhi.ram_count    football         RAM     delhi   
8       basketball.MAH.pune.mah_count  basketball         MAH      pune   
9     basketball.MAH.nagpur.mah_count  basketball         MAH    nagpur   
10     basketball.MAH.TOTAL.mah_count  basketball         MAH  No Entry   
11  basketball.MAH.TOTAL.nagpur_count  basketball         MAH    nagpur   
12    basketball.MAH.TOTAL.pune_count  basketball         MAH      pune   
13     football.RAM.TOTAL.delhi_count    football         RAM     delhi   
14     football.RAM.TOTAL.delhi_count    football         RAM     delhi   
15       football.RAM.TOTAL.mum_count    football         RAM  No Entry   
16       football.RAM.TOTAL.mum_count    football         RAM  No Entry   
17    football.RAM.TOTAL.mumbai_count    football         RAM    mumbai   
18    football.RAM.TOTAL.mumbai_count    football         RAM    mumbai   
19       football.RAM.TOTAL.ram_count    football         RAM  No Entry   
20       football.RAM.TOTAL.ram_count    football         RAM  No Entry   

   person_symbol       month sir  person_count  
0            ram  2017-01-23   a            10  
1            mum  2017-01-23   a            14  
2            mum  2017-01-23   a            25  
3            ram  2017-01-23   a            20  
4            ram  2017-02-22   b            34  
5            mum  2017-02-22   b            23  
6            mum  2017-02-22   b            43  
7            ram  2017-02-22   b            34  
8            mah  2017-03-03   c            10  
9            mah  2017-03-03   c            20  
10           mah  2017-03-03   c            30  
11      No Entry  2017-03-03   c            20  
12      No Entry  2017-03-03   c            10  
13      No Entry  2017-01-23   a            45  
14      No Entry  2017-02-22   b            77  
15           mum  2017-01-23   a            39  
16           mum  2017-02-22   b            66  
17      No Entry  2017-01-23   a            24  
18      No Entry  2017-02-22   b            57  
19           ram  2017-01-23   a            30  
20           ram  2017-02-22   b            68

我想向此数据框添加previous_person_count列.此数据框的月"列包含"yyyy-mm-dd"格式的日期.因此,我们需要查看月份,即"mm"字段,以确定它是哪个月份.

I want to add previous_person_count column to this Dataframe. "month" column of this dataframe contains date in the format of "yyyy-mm-dd". So we need to look at the month i.e. "mm" field to identify which month it is.

看到这个月,我们需要将"person_count"值放入下个月的"previous_person_count"值.

By seeing this month we need to put "person_count" value into the next month's "previous_person_count" value.

超量输出:

                derived_symbol  sport_name person_name      city  \
0       football.RAM.mumbai.ram_count    football         RAM    mumbai   
1       football.RAM.mumbai.mum_count    football         RAM    mumbai   
2        football.RAM.delhi.mum_count    football         RAM     delhi   
3        football.RAM.delhi.ram_count    football         RAM     delhi   
4       football.RAM.mumbai.ram_count    football         RAM    mumbai   
5       football.RAM.mumbai.mum_count    football         RAM    mumbai   
6        football.RAM.delhi.mum_count    football         RAM     delhi   
7        football.RAM.delhi.ram_count    football         RAM     delhi   
8       basketball.MAH.pune.mah_count  basketball         MAH      pune   
9     basketball.MAH.nagpur.mah_count  basketball         MAH    nagpur   
10     basketball.MAH.TOTAL.mah_count  basketball         MAH  No Entry   
11  basketball.MAH.TOTAL.nagpur_count  basketball         MAH    nagpur   
12    basketball.MAH.TOTAL.pune_count  basketball         MAH      pune   
13     football.RAM.TOTAL.delhi_count    football         RAM     delhi   
14     football.RAM.TOTAL.delhi_count    football         RAM     delhi   
15       football.RAM.TOTAL.mum_count    football         RAM  No Entry   
16       football.RAM.TOTAL.mum_count    football         RAM  No Entry   
17    football.RAM.TOTAL.mumbai_count    football         RAM    mumbai   
18    football.RAM.TOTAL.mumbai_count    football         RAM    mumbai   
19       football.RAM.TOTAL.ram_count    football         RAM  No Entry   
20       football.RAM.TOTAL.ram_count    football         RAM  No Entry   

   person_symbol       month sir  person_count      previous_person_count
0            ram  2017-01-23   a            10      0
1            mum  2017-01-23   a            14      0
2            mum  2017-01-23   a            25      0
3            ram  2017-01-23   a            20      0
4            ram  2017-02-22   b            34      10
5            mum  2017-02-22   b            23      14
6            mum  2017-02-22   b            43      25
7            ram  2017-02-22   b            34      20
8            mah  2017-03-03   c            10      0
9            mah  2017-03-03   c            20      0
10           mah  2017-03-03   c            30      0
11      No Entry  2017-03-03   c            20      0
12      No Entry  2017-03-03   c            10      0
13      No Entry  2017-01-23   a            45      0
14      No Entry  2017-02-22   b            77      45
15           mum  2017-01-23   a            39      0
16           mum  2017-02-22   b            66      39
17      No Entry  2017-01-23   a            24      0
18      No Entry  2017-02-22   b            57      24
19           ram  2017-01-23   a            30      0
20           ram  2017-02-22   b            68      30

编辑参考代码:

df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football','basketball','basketball'],
           'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','mahesh','mahesh'],
               'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi','pune','nagpur'],
        'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram','mah','mah'],
        'person_count': ['10','14','25','20','34','23','43','34','10','20'],
        'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-22','2017-02-22','2017-02-22','2017-02-22','2017-03-03','2017-03-03'],
        'sir': ['a','a','a','a','b','b','b','b','c','c']})

df = df[['sport_name','person_name','city','person_symbol','person_count','month','sir']]

df['person_name'] = df['person_name'].apply(symbology)

df['person_count'] = df['person_count'].astype(int)

print df
df1=df.set_index(['sport_name','person_name','person_count','month','sir']).stack().reset_index(name='val')

df1['derived_symbol'] = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count'

df2 = df1.groupby(['derived_symbol','month','sir','sport_name','person_name','level_5','val'])['person_count'].sum().reset_index(name='person_count')

df3 = df2.set_index(['derived_symbol','month','sir','sport_name','person_name','person_count','level_5'])['val'].unstack().fillna('No Entry').rename_axis(None, 1).reset_index()

df['derived_symbol'] = df['sport_name'] + '.' + df['person_name'] + '.' + df['city'] + "."+ df['person_symbol'] + '_count'
df4 = pd.concat([df, df3]).reset_index(None)
print df3
del df4['index']
df4 = df4[['derived_symbol','sport_name','person_name','city','person_symbol','month','sir','person_count']]
print df4


便利:


Convenience:

d = {'city': {0: 'mumbai',
  1: 'mumbai',
  2: 'delhi',
  3: 'delhi',
  4: 'mumbai',
  5: 'mumbai',
  6: 'delhi',
  7: 'delhi',
  8: 'pune',
  9: 'nagpur',
  10: 'No Entry',
  11: 'nagpur',
  12: 'pune',
  13: 'delhi',
  14: 'delhi',
  15: 'No Entry',
  16: 'No Entry',
  17: 'mumbai',
  18: 'mumbai',
  19: 'No Entry',
  20: 'No Entry'},
 'derived_symbol': {0: 'football.RAM.mumbai.ram_count',
  1: 'football.RAM.mumbai.mum_count',
  2: 'football.RAM.delhi.mum_count',
  3: 'football.RAM.delhi.ram_count',
  4: 'football.RAM.mumbai.ram_count',
  5: 'football.RAM.mumbai.mum_count',
  6: 'football.RAM.delhi.mum_count',
  7: 'football.RAM.delhi.ram_count',
  8: 'basketball.MAH.pune.mah_count',
  9: 'basketball.MAH.nagpur.mah_count',
  10: 'basketball.MAH.TOTAL.mah_count',
  11: 'basketball.MAH.TOTAL.nagpur_count',
  12: 'basketball.MAH.TOTAL.pune_count',
  13: 'football.RAM.TOTAL.delhi_count',
  14: 'football.RAM.TOTAL.delhi_count',
  15: 'football.RAM.TOTAL.mum_count',
  16: 'football.RAM.TOTAL.mum_count',
  17: 'football.RAM.TOTAL.mumbai_count',
  18: 'football.RAM.TOTAL.mumbai_count',
  19: 'football.RAM.TOTAL.ram_count',
  20: 'football.RAM.TOTAL.ram_count'},
 'month': {0: '2017-01-23',
  1: '2017-01-23',
  2: '2017-01-23',
  3: '2017-01-23',
  4: '2017-02-22',
  5: '2017-02-22',
  6: '2017-02-22',
  7: '2017-02-22',
  8: '2017-03-03',
  9: '2017-03-03',
  10: '2017-03-03',
  11: '2017-03-03',
  12: '2017-03-03',
  13: '2017-01-23',
  14: '2017-02-22',
  15: '2017-01-23',
  16: '2017-02-22',
  17: '2017-01-23',
  18: '2017-02-22',
  19: '2017-01-23',
  20: '2017-02-22'},
 'person_count': {0: 10,
  1: 14,
  2: 25,
  3: 20,
  4: 34,
  5: 23,
  6: 43,
  7: 34,
  8: 10,
  9: 20,
  10: 30,
  11: 20,
  12: 10,
  13: 45,
  14: 77,
  15: 39,
  16: 66,
  17: 24,
  18: 57,
  19: 30,
  20: 68},
 'person_name': {0: 'RAM',
  1: 'RAM',
  2: 'RAM',
  3: 'RAM',
  4: 'RAM',
  5: 'RAM',
  6: 'RAM',
  7: 'RAM',
  8: 'MAH',
  9: 'MAH',
  10: 'MAH',
  11: 'MAH',
  12: 'MAH',
  13: 'RAM',
  14: 'RAM',
  15: 'RAM',
  16: 'RAM',
  17: 'RAM',
  18: 'RAM',
  19: 'RAM',
  20: 'RAM'},
 'person_symbol': {0: 'ram',
  1: 'mum',
  2: 'mum',
  3: 'ram',
  4: 'ram',
  5: 'mum',
  6: 'mum',
  7: 'ram',
  8: 'mah',
  9: 'mah',
  10: 'mah',
  11: 'No Entry',
  12: 'No Entry',
  13: 'No Entry',
  14: 'No Entry',
  15: 'mum',
  16: 'mum',
  17: 'No Entry',
  18: 'No Entry',
  19: 'ram',
  20: 'ram'},
 'sir': {0: 'a',
  1: 'a',
  2: 'a',
  3: 'a',
  4: 'b',
  5: 'b',
  6: 'b',
  7: 'b',
  8: 'c',
  9: 'c',
  10: 'c',
  11: 'c',
  12: 'c',
  13: 'a',
  14: 'b',
  15: 'a',
  16: 'b',
  17: 'a',
  18: 'b',
  19: 'a',
  20: 'b'},
 'sport_name': {0: 'football',
  1: 'football',
  2: 'football',
  3: 'football',
  4: 'football',
  5: 'football',
  6: 'football',
  7: 'football',
  8: 'basketball',
  9: 'basketball',
  10: 'basketball',
  11: 'basketball',
  12: 'basketball',
  13: 'football',
  14: 'football',
  15: 'football',
  16: 'football',
  17: 'football',
  18: 'football',
  19: 'football',
  20: 'football'}}

推荐答案

您可以做的是

What you can do is merge the dataframe into itself, after computing the month number (from the date), and the previous one as well.

让我们开始计算这两个值.为了方便起见,我首先将原始的month字符串值转换为datetime,这使我可以使用

Let's start with computing those 2 values. For convenience purposes, I firstly converted the raw month string value to datetime, which allowed me to use relativedelta to compute the previous month. This ensures behaviour is correct, even after a change of year.

In [7]: df['month'] = pd.to_datetime(df['month'])

In [8]: df['month_num'] = df['month'].apply(lambda x: x.strftime('%Y-%m'))

In [9]: from dateutil.relativedelta import relativedelta

In [10]: df['previous_month_num'] = df['month'].apply(lambda x: (x + relativedelta(months=-1)).strftime('%Y-%m'))

In [11]: df
Out[11]:
     city      month person_count person_name person_symbol sir  sport_name  \
0  mumbai 2017-01-23           10      ramesh           ram   a    football
1  mumbai 2017-01-23           14      ramesh           mum   a    football
2   delhi 2017-01-23           25      ramesh           mum   a    football
3   delhi 2017-01-23           20      ramesh           ram   a    football
4  mumbai 2017-02-22           34      ramesh           ram   b    football
5  mumbai 2017-02-22           23      ramesh           mum   b    football
6   delhi 2017-02-22           43      ramesh           mum   b    football
7   delhi 2017-02-22           34      ramesh           ram   b    football
8    pune 2017-03-03           10      mahesh           mah   c  basketball
9  nagpur 2017-03-03           20      mahesh           mah   c  basketball

  month_num previous_month_num
0   2017-01            2016-12
1   2017-01            2016-12
2   2017-01            2016-12
3   2017-01            2016-12
4   2017-02            2017-01
5   2017-02            2017-01
6   2017-02            2017-01
7   2017-02            2017-01
8   2017-03            2017-02
9   2017-03            2017-02

然后,我们可以使用计算的月份值作为合并键,将数据框合并到自身中:

We can then merge the dataframe into itself, using the computed month values as merging keys:

In [12]: relevant_columns = ['city', 'person_symbol', 'sport_name']

In [13]: pd.merge(df, df, left_on=relevant_columns + ['previous_month_num'], right_on=rele
    ...: vant_columns + ['month_num'], how='left', suffixes=('', '_previous'))[list(df.col
    ...: umns) + ['person_count_previous']].fillna(0).drop(['month_num', 'previous_month_n
    ...: um'], axis=1)
Out[13]:
     city      month person_count person_name person_symbol sir  sport_name  \
0  mumbai 2017-01-23           10      ramesh           ram   a    football
1  mumbai 2017-01-23           14      ramesh           mum   a    football
2   delhi 2017-01-23           25      ramesh           mum   a    football
3   delhi 2017-01-23           20      ramesh           ram   a    football
4  mumbai 2017-02-22           34      ramesh           ram   b    football
5  mumbai 2017-02-22           23      ramesh           mum   b    football
6   delhi 2017-02-22           43      ramesh           mum   b    football
7   delhi 2017-02-22           34      ramesh           ram   b    football
8    pune 2017-03-03           10      mahesh           mah   c  basketball
9  nagpur 2017-03-03           20      mahesh           mah   c  basketball

  person_count_previous
0                     0
1                     0
2                     0
3                     0
4                    10
5                    14
6                    25
7                    20
8                     0
9                     0

一些评论:

  • I used ['city', 'person_symbol', 'sport_name'] as the reference columns, but feel free to add some more, depending on what exactly you want to achieve.
  • The new column is named person_count_previous, but you can rename it, should it be best for you.
  • By default, when there is no match for the previous count, the column will be NaN. I replaced the values with 0, thanks to fillna.
  • I removed the "temporary" columns using drop, but feel free to keep them.

这篇关于如何跟踪 pandas 数据框中的前一个日期记录列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆