R: Using values from data frame A from a date prior to populate a row in data frame B

Posted: 2017/3/26 2:07:10. Tags: r, join, dataframe


Problem description

This may be very complicated, and I suspect it requires advanced knowledge. I now have two different types of data.frames that I need to combine:

The data:

Dataframe A:

lists all transfusion dates by patient ID. Every transfusion is represented by a separate row, patients can have multiple transfusions. Different patients can have transfusions on the same date.

Patient ID  Transfusion.Date
1           01/01/2000
1           01/30/2000
2           04/01/2003
3           04/01/2003

Dataframes of Type B contain test results at other dates, also by patient ID:

Patient ID  Test.Date   Test.Value
1           11/30/1999  negative
1           01/15/2000  700 copies/uL
1           01/27/2000  900 copies/uL
2           03/30/2003  negative

What I would like to have is Dataframe A with the same number of rows (1 for each transfusion), and with the most recent Test.Value as a separate column. Each transfusion date should have the test result from the test performed most closely (prior) to the transfusion.

Desired output:

Patient ID  Transfusion.Date  Pre.Transfusion.Test
1           01/01/2000        negative
1           01/30/2000        900 copies/ul
2           04/01/2003        negative
3           04/01/2003        NA

I think the general strategy would be to subset the data.frames by patient ID. Then, for patient 1, take all transfusion dates, compare each one against all available test dates, and return the result of the test closest to it.

How can I tell R to do that?

Edit 1: Here is the R code for these examples:

df_A <- data.frame(MRN = c(1,1,2,3), 
                   Transfusion.Date = as.Date(c('01/01/2000', '01/30/2000', 
                   '04/01/2003','04/01/2003'),'%m/%d/%Y')) 

df_B <- data.frame(MRN = c(1,1,1,2), 
                   Test.Date = as.Date(c('11/30/1999', '01/15/2000', '01/27/2000', 
                   '03/30/2003'),'%m/%d/%Y'), Test.Result = c('negative', 
                   '700 copies/ul','900 copies/ul','negative'))

Edit 2:

To clarify, the resulting data should read: Patient A received transfusions on Day X and Day Y (from df_A). Prior to the transfusion on Day X, his most recent test result was X (the test date closest to the first transfusion, in df_B). Prior to the transfusion on Day Y, his most recent test result was Y (the test date closest to the second transfusion, also in df_B). df_B also contains a number of other test dates, which are not needed for the final output.

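Solution

Here is an approach using data.table's rolling joins:

library(data.table)

# Convert both data.frames to data.tables by reference and key them
# by patient first, then by date.
setkey(setDT(df_A), MRN, Transfusion.Date)
setkey(setDT(df_B), MRN, Test.Date)

# For each row of df_A, look up the matching or most recent earlier row of df_B.
df_B[df_A, roll = TRUE]
#    MRN  Test.Date   Test.Result
# 1:   1 2000-01-01      negative
# 2:   1 2000-01-30 900 copies/ul
# 3:   2 2003-04-01      negative
# 4:   3 2003-04-01            NA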



setDT converts a data.frame to a data.table by reference (without any additional copying). After these calls, df_A and df_B are data.tables.

setkey sorts the data.table by the columns we provide and marks those columns as key columns, which allows us to use binary-search-based joins.

We perform a join of the form x[i] on the key columns: for each row of i, the matching rows of x (if any, else NA) are returned along with i's row. This is what we call an equi-join. By adding roll = TRUE, when a row of i has no exact match on the last key column (here the date), the last preceding observation in x is carried forward (LOCF). This is what we call a rolling join. Sorting in increasing order (done by setkey()) ensures that the observation carried forward is the one with the most recent earlier date. Note that in the result the Test.Date column holds the transfusion dates from df_A, because the join columns keep x's names but take i's values.
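To make the difference between the equi-join and the rolling join concrete, here is a minimal sketch on the same example data. It is not part of the original answer: it uses the on= argument instead of keys, and Join.Date is a helper column name invented for this illustration so that the real test date survives in the result.

```r
library(data.table)

# Same example data as in the question, built directly as data.tables.
A <- data.table(MRN = c(1, 1, 2, 3),
                Transfusion.Date = as.Date(c("2000-01-01", "2000-01-30",
                                             "2003-04-01", "2003-04-01")))
B <- data.table(MRN = c(1, 1, 1, 2),
                Test.Date = as.Date(c("1999-11-30", "2000-01-15",
                                      "2000-01-27", "2003-03-30")),
                Test.Result = c("negative", "700 copies/ul",
                                "900 copies/ul", "negative"))

# Join.Date is a hypothetical helper column: a copy of Test.Date used only
# for matching, so Test.Date itself is not overwritten by the join.
B[, Join.Date := Test.Date]

# Equi-join: patient and date must match exactly.
equi <- B[A, on = c(MRN = "MRN", Join.Date = "Transfusion.Date")]

# Rolling join: exact match on MRN, then roll the last earlier date forward.
rolled <- B[A, on = c(MRN = "MRN", Join.Date = "Transfusion.Date"), roll = TRUE]

# After the join, Join.Date holds the transfusion dates, so rename it.
setnames(rolled, "Join.Date", "Transfusion.Date")
print(rolled)
```

In this data no test falls exactly on a transfusion date, so equi should contain only NA test results, while rolled should pair each transfusion with the latest earlier test and keep that test's date in Test.Date. A rolling join also accepts an exact match, so a test taken on the transfusion day itself would be picked up; if "prior" must mean strictly before, that case needs separate handling.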


HTH
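For comparison, the per-patient strategy outlined in the question can also be sketched in base R. This is only an illustration, not the answer's method: latest_prior_test is a hypothetical helper written for this example, it treats a test on the transfusion day itself as valid (matching the rolling join above), and stringsAsFactors = FALSE is added so the results stay plain character strings.

```r
# Example data, as in the question's Edit 1.
df_A <- data.frame(MRN = c(1, 1, 2, 3),
                   Transfusion.Date = as.Date(c("01/01/2000", "01/30/2000",
                                                "04/01/2003", "04/01/2003"),
                                              format = "%m/%d/%Y"))
df_B <- data.frame(MRN = c(1, 1, 1, 2),
                   Test.Date = as.Date(c("11/30/1999", "01/15/2000",
                                         "01/27/2000", "03/30/2003"),
                                       format = "%m/%d/%Y"),
                   Test.Result = c("negative", "700 copies/ul",
                                   "900 copies/ul", "negative"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: result of the latest test for patient `mrn`
# taken on or before `date`, or NA if there is none.
latest_prior_test <- function(mrn, date, tests) {
  candidates <- tests[tests$MRN == mrn & tests$Test.Date <= date, ]
  if (nrow(candidates) == 0) {
    return(NA_character_)
  }
  candidates$Test.Result[which.max(as.numeric(candidates$Test.Date))]
}

# Apply the helper to every transfusion row of df_A.
df_A$Pre.Transfusion.Test <- vapply(
  seq_len(nrow(df_A)),
  function(i) latest_prior_test(df_A$MRN[i], df_A$Transfusion.Date[i], df_B),
  character(1)
)
print(df_A)
```

This scans df_B once per transfusion row, which is fine for small tables but much slower than the keyed rolling join on large ones.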
