Replace rows by index


Problem description

In the following example:

library(data.table)
df1 <- data.table("1A"=c(0,0,0,0),"1B"=c(4:3),"2A"=c(0,0,0,0), "2B"=c(4:3))
df2 <- data.table("1A"=c(0,0),"1B"=c(1:2),"2A"=c(0,0), "2B"=c(1:2))

df1
#    1A 1B 2A 2B
# 1:  0  4  0  4
# 2:  0  3  0  3
# 3:  0  4  0  4
# 4:  0  3  0  3

df2
#    1A 1B 2A 2B
# 1:  0  1  0  1
# 2:  0  2  0  2

indx = c(1,3)
indx
# [1] 1 3

df1[indx,] <- df2
df1
#    1A 1B 2A 2B
# 1:  0  1  0  1
# 2:  0  3  0  3
# 3:  0  2  0  2
# 4:  0  3  0  3

This successfully replaces rows 1 and 3 of df1 with the rows of df2. When I repeat the same exercise on my real data, I get the error:

Can't assign to the same column twice in the same query (duplicates detected).

in this expression:

Z4[positionpdis,] <- ZpdisRow2

The objects have the following attributes:

is.data.table(ZpdisRow2)
# [1] TRUE
is.data.table(Z4)
# [1] TRUE
dim(Z4)
# [1] 7968 7968
dim(Z4[positionpdis,])
# [1]   48 7968
dim(ZpdisRow2)
# [1]   48 7968
str(positionpdis)
# int [1:48] 91 257 423 589 755 921 1087 1253 1419 1585 ...
length(unique(positionpdis))
# [1] 48

What could be the source of the error?

Solution

My guess is that some column names are duplicated in the original dataset. For example, if we change the name of the 3rd column to match the name of the first one, we get the same error.

colnames(df1)[3] <- '1A'
df1[indx,] <- df2

Error in `[<-.data.table`(`*tmp*`, indx, , value = list(`1A` = c(0, 0), :
  Can't assign to the same column twice in the same query (duplicates detected).
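
Before renaming anything, it may be worth confirming that duplicated names really are the cause. The following is a minimal sketch in base R; the vector `nms` is a stand-in for `names(Z4)` and is an assumption made for illustration, since the real column names are not shown in the question.

```r
# Stand-in for names(Z4); the real names are not shown in the question.
nms <- c("1A", "1B", "1A", "2B")

# anyDuplicated() returns 0 when all names are unique, otherwise the
# position of the first name that repeats an earlier one.
first_dup <- anyDuplicated(nms)
first_dup
# [1] 3

# duplicated() flags every repeat, so this lists the offending names.
dup_names <- unique(nms[duplicated(nms)])
dup_names
# [1] "1A"
```

On the real table the same two calls would be run on `names(Z4)` and `names(ZpdisRow2)`; a non-zero result from `anyDuplicated()` would support the guess above.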

We can make the column names unique with make.unique, a convenient function for this kind of case because it saves us from checking every column name for duplicates by hand.

 colnames(df1) <- make.unique(colnames(df1)) 
 df1[indx,] <- df2
 df1
 #  1A 1B 1A.1 2B
 #1:  0  1    0  1
 #2:  0  3    0  3
 #3:  0  2    0  2
 #4:  0  3    0  3
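
To see what make.unique does on its own, here is a small sketch on a plain character vector (the vector is illustrative, not taken from the real data). Repeated names get a numeric suffix, and the first occurrence is left unchanged; the `sep` argument controls the separator.

```r
# Illustrative names with "1A" appearing three times.
nms <- c("1A", "1B", "1A", "2B", "1A")

# Default separator is "."; the first "1A" keeps its name.
unique_dot <- make.unique(nms)
unique_dot
# [1] "1A"   "1B"   "1A.1" "2B"   "1A.2"

# A different separator can be chosen with sep.
unique_us <- make.unique(nms, sep = "_")
unique_us
# [1] "1A"   "1B"   "1A_1" "2B"   "1A_2"
```

Note that renaming changes the names any later code refers to, so this fits best when the duplicated names carry no meaning of their own.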


Another option that should also work with duplicate column names is set. It is very efficient because it avoids the overhead of `[.data.table`. Here, we loop over the column indices (seq_along(df1)) and, for each column index (j) and the chosen row indices (i), set the values in 'df1' to the corresponding values of 'df2'. Because columns are addressed by position rather than by name, duplicated names do not matter.

 for (j in seq_along(df1)) {
   set(df1, i = as.integer(indx), j = j, value = df2[[j]])
 }
 df1
#   1A 1B 1A 2B
#1:  0  1  0  1
#2:  0  3  0  3
#3:  0  2  0  2
#4:  0  3  0  3
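
The loop above can be wrapped in a small helper that also checks the shapes match before writing. This is a sketch only: `replace_rows` is a hypothetical name, not a data.table function, and the tables are rebuilt here so the example runs on its own.

```r
library(data.table)

# Hypothetical helper (not part of data.table): overwrite the given rows of
# `target` with the rows of `source`, column by column and by position.
# set() modifies `target` by reference, so no copy of the table is made.
replace_rows <- function(target, rows, source) {
  stopifnot(ncol(target) == ncol(source), length(rows) == nrow(source))
  rows <- as.integer(rows)
  for (j in seq_along(target)) {
    set(target, i = rows, j = j, value = source[[j]])
  }
  invisible(target)
}

df1 <- data.table("1A" = c(0, 0, 0, 0), "1B" = c(4L, 3L, 4L, 3L),
                  "2A" = c(0, 0, 0, 0), "2B" = c(4L, 3L, 4L, 3L))
df2 <- data.table("1A" = c(0, 0), "1B" = 1:2, "2A" = c(0, 0), "2B" = 1:2)

# Introduce the duplicated name that triggers the error with `[<-`.
colnames(df1)[3] <- "1A"

replace_rows(df1, c(1, 3), df2)
df1[[2]]
# [1] 1 3 2 3
```

Matching columns by position assumes both tables have their columns in the same order; if that is not guaranteed for the real data, the order would need to be checked first.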
