Combining an ifelse statement with the data.table shift function in R


Question

I am trying to work out how to combine an ifelse statement with the shift function in data.table. My data looks like this:

DF <- structure(list(CHR = c(1, 1, 1, 1, 1,1), 
SNP = c("rs2494631", "rs4648637", "rs2494627", "rs11122119", "rs1844583","rs2292242"), 
BP = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059), 
KBdist= c(NA, 2215, 1135, 4366357, 1614613, 1590), 
locus = c(1, NA, NA, NA, NA, NA)), 
.Names = c("CHR","SNP","BP","KBdist","locus"), 
row.names = c(NA, 6L), 
class = "data.frame")

> DF

CHR SNP        BP       KBdist   locus
1   rs2494631  2399149  NA       1
1   rs4648637  2401364  2215     NA
1   rs2494627  2402499  1135     NA
1   rs11122119 6768856  4366357  NA
1   rs1844583  8383469  1614613  NA
1   rs2292242  8385059  1590     NA

What I am trying to achieve is: "If CHR is equal to the row above and KBdist is less than 500,000, make locus equal to the value in the row above; otherwise add one to the value in the row above." This would yield the following output:

CHR SNP        BP       KBdist   locus
1   rs2494631  2399149  NA       1
1   rs4648637  2401364  2215     1
1   rs2494627  2402499  1135     1
1   rs11122119 6768856  4366357  2
1   rs1844583  8383469  1614613  3
1   rs2292242  8385059  1590     3

I know that I can use shift to access the values in the row above, for example:

# DF must be a data.table for := to work, e.g. after data.table::setDT(DF)
DF[, KBdist := BP - shift(BP, 1L, type = "lag")]

That is how I created one of the columns. But I don't see how to extend it to include the ifelse conditions above.

Any help would be greatly appreciated.

Thanks.

Recommended answer

Here is a solution that solves the task in base R; data.table is not used here.

# logical vector with our condition tested
ind <- (diff(DF$CHR) == 0 & DF$KBdist[-1] < 5e+5)
# populating the 'locus' column   ---   notice the '<<-'
vapply(2:nrow(DF), function (k) DF$locus[k] <<- DF$locus[k-1] + 1 - ind[k-1], numeric(1)) 
# [1] 1 1 2 3 3
DF
#   CHR        SNP      BP  KBdist locus
# 1   1  rs2494631 2399149      NA     1
# 2   1  rs4648637 2401364    2215     1
# 3   1  rs2494627 2402499    1135     1
# 4   1 rs11122119 6768856 4366357     2
# 5   1  rs1844583 8383469 1614613     3
# 6   1  rs2292242 8385059    1590     3

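vapply(...) returns the new locus values for rows 2 onward (the vector printed above) and, through <<-, writes them into the locus column of DF.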
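Because locus only ever stays the same or goes up by one, the same column can also be built without a loop or <<-. The following is a minimal sketch, not part of the original answer; the name same_locus is introduced here for illustration, and the first row is assumed to start locus 1 as in the question's data.

```r
# Sample data from the question (SNP column omitted, it is not needed here)
DF <- data.frame(
  CHR    = c(1, 1, 1, 1, 1, 1),
  BP     = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059),
  KBdist = c(NA, 2215, 1135, 4366357, 1614613, 1590)
)

# TRUE where a row belongs to the same locus as the row above
same_locus <- diff(DF$CHR) == 0 & DF$KBdist[-1] < 5e5

# Every FALSE starts a new locus, so a running count of the
# FALSE values gives the locus number (assumption: row 1 is locus 1)
DF$locus <- 1 + cumsum(c(0, !same_locus))

DF$locus
# [1] 1 1 1 2 3 3
```

The cumulative sum replaces the row-by-row dependence on the previous value, so nothing has to be written back into DF while iterating.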

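Note

Note that I used <<- inside the function in order to overwrite DF$locus[k] in the global DF, so that each step can read the value written by the previous step. Simply swapping <<- for <- and writing DF$locus[-1] <- vapply(...) does not give the same result: with <-, each call modifies only a local copy of DF, so from the third row on DF$locus[k-1] is still NA. If you don't like <<-, a plain for loop over 2:nrow(DF) with <- at the top level does the same job.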
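Since the question asked for a data.table approach, here is a hedged sketch of how the same rule might be written with shift. It is not part of the original answer. It assumes the data.table package is installed, and the names DT and new_locus are introduced here for illustration. Rather than ifelse, it flags the rows that start a new locus and takes a cumulative sum of the flags.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
DT <- data.table(
  CHR = c(1, 1, 1, 1, 1, 1),
  SNP = c("rs2494631", "rs4648637", "rs2494627",
          "rs11122119", "rs1844583", "rs2292242"),
  BP  = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059)
)

# Distance to the row above, as in the question
DT[, KBdist := BP - shift(BP, 1L, type = "lag")]

# A row starts a new locus if it is the first row (KBdist is NA),
# if CHR differs from the row above, or if the gap is 500,000 or more
DT[, new_locus := is.na(KBdist) | CHR != shift(CHR) | KBdist >= 5e5]

# Running count of locus starts gives the locus number
DT[, locus := cumsum(new_locus)]

DT$locus
# [1] 1 1 1 2 3 3
```

The is.na(KBdist) term comes first so that the first row, where shift returns NA, evaluates to TRUE rather than NA.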
