Get value of last non-empty column for each row

Views: 133 | Published: 2020/10/26 3:08:29 | Tags: r, string, dplyr


Problem description

Take this sample data:

df <- data.frame(a_1 = c("Apple", "Grapes", "Melon", "Peach"),
                 a_2 = c("Nuts", "Kiwi", "Lime", "Honey"),
                 a_3 = c("Plum", "Apple", NA, NA),
                 a_4 = c("Cucumber", NA, NA, NA))

     a_1   a_2   a_3      a_4
1  Apple  Nuts  Plum Cucumber
2 Grapes  Kiwi Apple     <NA>
3  Melon  Lime  <NA>     <NA>
4  Peach Honey  <NA>     <NA>

Basically I want to run a grep on the last column of each row which is not NA. Thus my x in grep("pattern",x) should be:

Cucumber
Apple
Lime
Honey

I have an integer which tells me which a_N is the last one:

numcol <- rowSums(!is.na(df[,grep("(^a_)\\d", colnames(df))]))

So far I have tried something like this in combination with ave(), apply() and dplyr:

grepl("pattern",df[,sprintf("a_%i",numcol)])

However, I can't quite make it work. Keep in mind that my dataset is very large, so I was hoping for a vectorized solution or maybe dplyr. Help would be greatly appreciated.

/e: Thanks, that is a really good solution. My thinking was too complicated. (The regex is due to my more specific data.)

Solution

There's no need for regex here. Just use apply + tail + na.omit:

> apply(df, 1, function(x) tail(na.omit(x), 1))
[1] "Cucumber" "Apple"    "Lime"     "Honey"
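
To tie this back to the question, here is a minimal self-contained sketch that rebuilds the sample data, extracts the last non-NA value per row with apply + tail + na.omit, and then runs grepl on the result. The pattern "Apple" and the names last_vals and hits are placeholders chosen for illustration, not something from the original post.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns
# as character on R versions before 4.0 as well.
df <- data.frame(
  a_1 = c("Apple", "Grapes", "Melon", "Peach"),
  a_2 = c("Nuts", "Kiwi", "Lime", "Honey"),
  a_3 = c("Plum", "Apple", NA, NA),
  a_4 = c("Cucumber", NA, NA, NA),
  stringsAsFactors = FALSE
)

# For each row: drop the NAs, then keep the last remaining value.
last_vals <- apply(df, 1, function(x) tail(na.omit(x), 1))

# "Apple" is a placeholder pattern (assumption), not taken from the question.
hits <- grepl("Apple", last_vals)
```

One caveat: if a row is entirely NA, the function returns a zero-length result for that row, and apply then hands back a list instead of a character vector.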

You can also use a combination of "data.table" and "reshape2", like this:

library(data.table)
library(reshape2)
na.omit(melt(as.data.table(df, keep.rownames = TRUE), 
             id.vars = "rn"))[, value[.N], by = rn]
#    rn       V1
# 1:  1 Cucumber
# 2:  2    Apple
# 3:  3     Lime
# 4:  4    Honey

Or, even better:

melt(as.data.table(df, keep.rownames = TRUE), 
     id.vars = "rn", na.rm = TRUE)[, value[.N], by = rn]
#    rn       V1
# 1:  1 Cucumber
# 2:  2    Apple
# 3:  3     Lime
# 4:  4    Honey

This would be much faster. On an 800k-row dataset, apply took ~ 50 seconds while the data.table approach took about 2.5 seconds.
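
Since the question asked for a vectorized approach, a base R alternative that avoids both the per-row function call and the reshape may also be worth trying. This is only a sketch and not part of the original answer, and it has not been timed against the figures above. It uses matrix indexing with a two-column (row, column) index. Option 1 reuses the numcol count from the question and assumes the NAs in every row are trailing; option 2 uses max.col to find the position of the last non-NA cell and does not need that assumption. The variable names are illustrative.

```r
# Sample data from the question.
df <- data.frame(
  a_1 = c("Apple", "Grapes", "Melon", "Peach"),
  a_2 = c("Nuts", "Kiwi", "Lime", "Honey"),
  a_3 = c("Plum", "Apple", NA, NA),
  a_4 = c("Cucumber", NA, NA, NA),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)          # character matrix, one row per record
rows <- seq_len(nrow(m))

# Option 1: the count of non-NA cells doubles as the column index.
# Assumption: NAs only appear at the end of a row, as in the sample data.
numcol <- rowSums(!is.na(m))
last_by_count <- m[cbind(rows, numcol)]

# Option 2: column position of the last non-NA cell in each row.
# ties.method = "last" picks the right-most TRUE.
last_idx <- max.col(!is.na(m), ties.method = "last")
last_by_pos <- m[cbind(rows, last_idx)]
```

Either vector can then be passed straight to grepl, as in grepl("pattern", last_by_pos).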
