如何判断值是否在Pandas中的尺寸上发生了变化 [英] How to tell if a value changed over dimension(s) in Pandas

查看:194
本文介绍了如何判断值是否在Pandas中的尺寸上发生了变化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

比方说,我在某些日期有一些客户数据,例如,我想看看他们的地址是否已更改.在这些日期.理想情况下,我想将发生更改的两列复制到新表中,或者只是获取总更改量的指标.

所以,如果我有一个像这样的表

Date , Customer , Address
12/31/14, Cust1, 12 Rocky Hill Rd
12/31/15, Cust1, 12 Rocky Hill Rd
12/31/16, Cust1, 14 Rocky Hill Rd
12/31/14, Cust2, 12 Testing Rd
12/31/15, Cust2, 12 Testing Ln
12/31/16, Cust2, 12 Testing Rd

最后我要进行两次更改,客户1在12/31/15和12/31/16之间在Rocky Hill Rd 12之间进行更改,而Cust2在12/31/14和12/31/15之间进行了更改. /p>

理想情况下,我可以得到一张这样的桌子

Dates , Customer , Change
12/31/15 to 12/31/16, Cust1, 12 Rocky Hill Rd to 14 Rocky Hill Rd
12/31/14 to 12/31/15, Cust2, 12 Testing Rd to 12 Testing Ln

或者仅是变更总数就可以了.有任何想法吗?理想情况下,我要有更多的日期,这些日期之间可能有多个更改,并且可能还要检查其他更改中的其他列.的确,只要在某个日期范围内每列的客户记录更改汇总即可.

我是Panda的新手,不确定要从哪里开始.

正如我在下面的解决方案中指出的那样,我希望能够传递一个更大的数据框,而不仅仅是一个地址来检测更改.例如,我在R中完成了以下操作: `在此处输入代码

`#How many changes have occured (unique values - 1)
UniLen <-  function(x){
  x <- length(unique(x))-1
  return(x)
}
#Create a vector of Address Features to check for changes in
Address_Features <- c("AddrLine1", "AddrLine2", "AddrLine3", "CityName", "State", "ZipCodeNum", "County")
#Check for changes in each address 'use this address for description' for each customer
AddressChanges_Detail <- mktData[,c("CustomerNumEID","AddressUniqueRelationDesc",Address_Features)] %>%
  group_by(CustomerNumEID, AddressUniqueRelationDesc) %>%
  summarise_each(funs(UniLen))

#Summarise results (how many changes for each feature)
AddressChanges_Summary <- AddressChanges_Detail[,Address_Features] %>%
  summarise_each(funs(sum))

这使我们能够计算发生了多少更改,但是我错过了更改发生的日期以及功能的更改日期和更改日期.似乎您建议的Python解决方案可以解决此问题使用.shift而不是某个组中唯一值的摘要.理想情况下,我希望两全其美:).

解决方案

df

输入数据框

    Date    Customer    Address
0   12/31/14    Cust1   12 Rocky Hill Rd
1   12/31/15    Cust1   12 Rocky Hill Rd
2   12/31/16    Cust1   14 Rocky Hill Rd
3   12/31/14    Cust2   12 Testing Rd
4   12/31/15    Cust2   12 Testing Ln
5   12/31/16    Cust2   12 Testing Rd

地址更改功能:

def changeAdd(x):
    x=x[x.Address != x.shift(-1).Address]
    df1 = pd.DataFrame({'Date':x.shift(1).Date + ' to '+ x.Date,
              'Customer':x.Customer.max(),
              'Address':x.shift(1).Address +' to ' + x.Address})
    return df1[df1.Address.notnull()]


dm = df.groupby('Customer')\
   .apply(changeAdd)\
   .reset_index(drop=True)[['Date','Customer','Address']]

dm

输出数据框:

Date    Customer    Address
0   12/31/15 to 12/31/16    Cust1   12 Rocky Hill Rd to 14 Rocky Hill Rd
1   12/31/14 to 12/31/15    Cust2   12 Testing Rd to 12 Testing Ln
2   12/31/15 to 12/31/16    Cust2   12 Testing Ln to 12 Testing Rd

Let's say that I have some customer data over some dates and I want to see if for example their address has changed. Over those dates. Ideally, i'd like to copy the two columns where the changes occurred into a new table or just get a metric for the amount of total changes.

So, if I had a table like

Date , Customer , Address
12/31/14, Cust1, 12 Rocky Hill Rd
12/31/15, Cust1, 12 Rocky Hill Rd
12/31/16, Cust1, 14 Rocky Hill Rd
12/31/14, Cust2, 12 Testing Rd
12/31/15, Cust2, 12 Testing Ln
12/31/16, Cust2, 12 Testing Rd

I would end up with a count of two changes, Customer 1's change between12 Rocky Hill Rd between 12/31/15 and 12/31/16 and Cust2's change between 12/31/14 and 12/31/15.

Ideally I could get a table like this

Dates , Customer , Change
12/31/15 to 12/31/16, Cust1, 12 Rocky Hill Rd to 14 Rocky Hill Rd
12/31/14 to 12/31/15, Cust2, 12 Testing Rd to 12 Testing Ln

Or even just a total count of changes would be great. Any ideas? Ideally, i'd have any more dates, possibly multiple changes across those dates, and potentially additional columns i'd like to be checking for changes in as well. Really, just a summation of changes to a customer record over some date period for each column would suffice.

I'm new to Panda's and not really sure exactly where to start on this.

Edit: As I note on the solution below, i'd like to be able to pass a larger dataframe with more than just an address to detect changes. In example, I've accomplished this in R with something like the following: `enter code here

`#How many changes have occured (unique values - 1)
UniLen <-  function(x){
  x <- length(unique(x))-1
  return(x)
}
#Create a vector of Address Features to check for changes in
Address_Features <- c("AddrLine1", "AddrLine2", "AddrLine3", "CityName", "State", "ZipCodeNum", "County")
#Check for changes in each address 'use this address for description' for each customer
AddressChanges_Detail <- mktData[,c("CustomerNumEID","AddressUniqueRelationDesc",Address_Features)] %>%
  group_by(CustomerNumEID, AddressUniqueRelationDesc) %>%
  summarise_each(funs(UniLen))

#Summarise results (how many changes for each feature)
AddressChanges_Summary <- AddressChanges_Detail[,Address_Features] %>%
  summarise_each(funs(sum))

This allows us to count how many changes occur, but i'm missing out on the date the change occurred and what the feature was changed from and to... It seems the Python solution you've proposed solves for that with the use of .shift instead of just a summary of unique values on some group. Ideally i'd like the best of both worlds :).

解决方案

df

Input dataframe

    Date    Customer    Address
0   12/31/14    Cust1   12 Rocky Hill Rd
1   12/31/15    Cust1   12 Rocky Hill Rd
2   12/31/16    Cust1   14 Rocky Hill Rd
3   12/31/14    Cust2   12 Testing Rd
4   12/31/15    Cust2   12 Testing Ln
5   12/31/16    Cust2   12 Testing Rd

Address change function:

def changeAdd(x):
    x=x[x.Address != x.shift(-1).Address]
    df1 = pd.DataFrame({'Date':x.shift(1).Date + ' to '+ x.Date,
              'Customer':x.Customer.max(),
              'Address':x.shift(1).Address +' to ' + x.Address})
    return df1[df1.Address.notnull()]


dm = df.groupby('Customer')\
   .apply(changeAdd)\
   .reset_index(drop=True)[['Date','Customer','Address']]

dm

Output dataframe:

Date    Customer    Address
0   12/31/15 to 12/31/16    Cust1   12 Rocky Hill Rd to 14 Rocky Hill Rd
1   12/31/14 to 12/31/15    Cust2   12 Testing Rd to 12 Testing Ln
2   12/31/15 to 12/31/16    Cust2   12 Testing Ln to 12 Testing Rd

这篇关于如何判断值是否在Pandas中的尺寸上发生了变化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆