Function to track the changes in a field
Problem description
I need a function (in base SAS or in R/RStudio) that lets me determine the ID numbers as of a given date, together with the original (root) ID numbers as of the start date. The dataset contains the old ID, the new ID, and the date the ID changed. Example data:
OldID | NewID | ChangeDate |
---|---|---|
1 | 2 | 1/1/10 |
10 | 11 | 1/1/10 |
2 | 3 | 7/1/10 |
3 | 4 | 7/10/10 |
11 | 12 | 8/1/10 |
I need the ID numbers as of 7/15/10 and the original (root) IDs as of 1/1/10. The output should look like this:
OrigID | LastID |
---|---|
1 | 4 |
10 | 11 |
I will then need a flag that helps me count the number of OrigIDs that changed over the given time interval (here, 1/1/10 to 7/15/10). I also need similar counts for several dates after 7/15/10.
Is there a function in base SAS or R that can do this?
The SAS/R tools I researched don't appear to do this. These were hierarchical loggers, synchronous tracking, and sequence-tracking functions (e.g., logger, lumberjack, log4r, validate, futile.logger).
Recommended answer
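This should work. I was just too lazy to type proper dates, so ChangeDate below uses integers as ordered stand-ins for the real dates: 1 = 1/1/10, 2 = July 2010 (7/1/10 and 7/10/10), and 3 = 8/1/10.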
Note: this assumes the data is sorted in the order the changes occurred.
Data
df <- data.frame(
OldID = c(1, 10, 2, 3, 11), NewID = c(2, 11, 3, 4, 12), ChangeDate = c(1, 1, 2, 2, 3))
df
#> OldID NewID ChangeDate
#> 1 1 2 1
#> 2 10 11 1
#> 3 2 3 2
#> 4 3 4 2
#> 5 11 12 3
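The same data can also be built with real dates. The sketch below assumes the question's dates are written as month/day/two-digit year. It parses them with as.Date() and then sorts by date, so the sorting assumption noted above holds before process() is called. The name `df_dates` is only illustrative.

```r
# The question's example data with real dates instead of integer stand-ins
# (assumes the dates are written as month/day/two-digit year)
df_dates <- data.frame(
  OldID = c(1, 10, 2, 3, 11),
  NewID = c(2, 11, 3, 4, 12),
  ChangeDate = as.Date(c("1/1/10", "1/1/10", "7/1/10", "7/10/10", "8/1/10"),
                       format = "%m/%d/%y")
)

# process() relies on the rows being in the order the changes happened,
# so sort by date first (order() leaves ties in their original row order)
df_dates <- df_dates[order(df_dates$ChangeDate), ]
df_dates
```

You would then pass the window as Date values, e.g. `process(df_dates, as.Date("2010-01-01"), as.Date("2010-07-15"))`. This should return the same two rows as the integer version, because both July changes fall on or before 7/15/10.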
Function
process <- function(df, from, to) {
  process0 <- function(df, i = 1) {
    # fetch the new ID on row i
    new <- df$NewID[i]
    # look for it in the OldID column
    j <- match(new, df$OldID)
    if (is.na(j)) {
      # not found: the chain on row i is complete, move to the next row
      i <- i + 1
    } else {
      # found: extend row i with the later ID
      df$NewID[i] <- df$NewID[j]
      # and count one more change
      df$Changes[i] <- df$Changes[i] + 1
      # then drop the row that has just been merged in
      df <- df[-j, ]
    }
    # repeat until there is no next row to check
    if (i <= nrow(df)) process0(df, i) else df
  }
  # keep only the changes inside [from, to]
  df <- subset(df, ChangeDate >= from & ChangeDate <= to, select = c("OldID", "NewID"))
  # start with 1 change per row (rep() avoids an error when the window is empty)
  df$Changes <- rep(1, nrow(df))
  # run the recursive function
  process0(df)
}
Result
process(df, 1, 2)
#> OldID NewID Changes
#> 1 1 4 3
#> 2 10 11 1
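The recursive process0() calls itself once per row checked, so it may run into R's nesting limits on large tables. The sketch below is a non-recursive variant of the same idea. It walks the changes once in date order and moves a chain forward whenever a row's OldID is that chain's current ID.

Several parts of it are assumptions, not part of base R, any package, or the answer above:

- `trace_ids()` is a hypothetical name.
- The `Changed` flag is defined here as "LastID differs from OrigID".
- The loop over cutoff dates is one possible way to get the counts the question asks for.

```r
# Example data from the answer: integer stand-ins for the change dates
# (1 = 1/1/10, 2 = July 2010, 3 = 8/1/10)
df <- data.frame(
  OldID = c(1, 10, 2, 3, 11),
  NewID = c(2, 11, 3, 4, 12),
  ChangeDate = c(1, 1, 2, 2, 3)
)

# Hypothetical helper: follow each ID chain through the changes in [from, to]
trace_ids <- function(df, from, to) {
  # keep only changes inside the window, in chronological order
  # (order() leaves ties in their original row order)
  w <- df[df$ChangeDate >= from & df$ChangeDate <= to, ]
  w <- w[order(w$ChangeDate), ]

  orig <- numeric(0)     # root ID of each chain
  last <- numeric(0)     # latest ID of each chain
  changes <- integer(0)  # number of changes applied to each chain

  for (k in seq_len(nrow(w))) {
    # is this row's OldID the current ID of a chain already being tracked?
    hit <- match(w$OldID[k], last)
    if (is.na(hit)) {
      # no: start a new chain rooted at OldID
      orig <- c(orig, w$OldID[k])
      last <- c(last, w$NewID[k])
      changes <- c(changes, 1L)
    } else {
      # yes: move that chain forward to NewID
      last[hit] <- w$NewID[k]
      changes[hit] <- changes[hit] + 1L
    }
  }

  # assumed flag definition: the ID at the end of the window differs from the root
  data.frame(OrigID = orig, LastID = last, Changes = changes,
             Changed = last != orig)
}

res <- trace_ids(df, from = 1, to = 2)
res

# Number of flagged OrigIDs for several cutoffs after the same start date
cutoffs <- c(2, 3)
counts <- sapply(cutoffs, function(d) sum(trace_ids(df, from = 1, to = d)$Changed))
counts
```

Here `res` should match the question's expected output (OrigID 1 with LastID 4, and OrigID 10 with LastID 11), plus a change count of 3 and 1. Both rows are flagged as changed. With Date columns, pass `as.Date(...)` values for `from`, `to`, and the cutoffs.

Because the function sorts the window itself, it does not need pre-sorted input. Chained changes that share the same date still rely on row order.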
Created on 2021-06-09 by the reprex package (v0.3.0)