Replace rows by index
In the following example:
library(data.table)
df1 <- data.table("1A"=c(0,0,0,0),"1B"=c(4:3),"2A"=c(0,0,0,0), "2B"=c(4:3))
df2 <- data.table("1A"=c(0,0),"1B"=c(1:2),"2A"=c(0,0), "2B"=c(1:2))
df1
# 1A 1B 2A 2B
# 1: 0 4 0 4
# 2: 0 3 0 3
# 3: 0 4 0 4
# 4: 0 3 0 3
df2
# 1A 1B 2A 2B
# 1: 0 1 0 1
# 2: 0 2 0 2
indx = c(1,3)
indx
# [1] 1 3
df1[indx,] <- df2
df1
# 1A 1B 2A 2B
# 1: 0 1 0 1
# 2: 0 3 0 3
# 3: 0 2 0 2
# 4: 0 3 0 3
This successfully replaces rows 1 and 3 of df1 with the rows of df2. When I repeat the same exercise on my real data, I get the error:
Can't assign to the same column twice in the same query (duplicates detected).
It is raised by this expression:
Z4[positionpdis,] <- ZpdisRow2
The objects have the following attributes:
is.data.table(ZpdisRow2)
# [1] TRUE
is.data.table(Z4)
# [1] TRUE
dim(Z4)
# [1] 7968 7968
dim(Z4[positionpdis,])
# [1] 48 7968
dim(ZpdisRow2)
# [1] 48 7968
str(positionpdis)
# int [1:48] 91 257 423 589 755 921 1087 1253 1419 1585 ...
length(unique(positionpdis))
# [1] 48
What could be the source of the error?
My guess is that some column names are duplicated in the original dataset. For example, if we rename the 3rd column of df1 so that it matches the 1st, the same assignment produces the error:
colnames(df1)[3] <- '1A'
df1[indx,] <- df2
Error in `[<-.data.table`(`*tmp*`, indx, , value = list(`1A` = c(0, 0), :
  Can't assign to the same column twice in the same query (duplicates detected).
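To check whether duplicated names really are the cause, the column names can be inspected directly. The following is a minimal sketch on a toy table; `dt`, `nms`, and `dup_names` are names made up for this illustration and do not come from the original post.

```r
library(data.table)

# Toy table, then the same renaming as above so that "1A" appears twice.
dt <- data.table("1A" = c(0, 0), "1B" = 1:2, "2A" = c(0, 0), "2B" = 1:2)
colnames(dt)[3] <- "1A"

nms <- names(dt)

# anyDuplicated() returns 0 when all names are unique,
# otherwise the position of the first duplicated name.
first_dup <- anyDuplicated(nms)
first_dup
# [1] 3

# Which names are duplicated, and in which column positions they occur.
dup_names <- unique(nms[duplicated(nms)])
dup_names
# [1] "1A"
which(nms %in% dup_names)
# [1] 1 3
```

On the real data the same check would be run on names(Z4). If anyDuplicated() returns 0 there, duplicated column names are not the explanation and the cause lies elsewhere.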
We can make the column names unique with make.unique, a convenient function for this kind of situation because it avoids having to check each column name for duplicates by hand.
colnames(df1) <- make.unique(colnames(df1))
df1[indx,] <- df2
df1
# 1A 1B 1A.1 2B
#1: 0 1 0 1
#2: 0 3 0 3
#3: 0 2 0 2
#4: 0 3 0 3
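To see what make.unique does to the names themselves, here is a small sketch on a plain character vector. The vector `nms` is invented for illustration; `sep` is the documented argument of base R's make.unique that controls the separator placed before the numeric suffix.

```r
# A name that occurs three times gets the suffixes .1 and .2;
# the first occurrence is left unchanged.
nms <- c("1A", "1B", "1A", "2B", "1A")

renamed <- make.unique(nms)
renamed
# [1] "1A"   "1B"   "1A.1" "2B"   "1A.2"

# A different separator can be chosen with `sep`.
renamed_us <- make.unique(nms, sep = "_")
renamed_us
# [1] "1A"   "1B"   "1A_1" "2B"   "1A_2"
```

Because only the repeated occurrences are renamed, any code that refers to the first "1A" column by name keeps working after the fix.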
Another option that should also work with duplicate column names is set. It is very efficient because the overhead of [.data.table is avoided. Here, we loop through the column indices (seq_along(df1)) and, based on the row (i) and column (j) index, we set the values in 'df1' to the values of 'df2'.
for (j in seq_along(df1)) {
  # overwrite rows `indx` of column j by reference; columns are matched by position
  set(df1, i = as.integer(indx), j = j, value = df2[[j]])
}
df1
# 1A 1B 1A 2B
#1: 0 1 0 1
#2: 0 3 0 3
#3: 0 2 0 2
#4: 0 3 0 3
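If this replacement is needed in several places, the loop can be wrapped in a small function. The sketch below is one possible way to do it, not something provided by data.table: `replace_rows` and its arguments are hypothetical names, and it assumes the two tables have the same number of columns in the same order, since columns are matched by position rather than by name.

```r
library(data.table)

# Hypothetical helper (not part of data.table): overwrite rows `rows` of `dt`
# with the rows of `new_rows`, column by column, by reference.
replace_rows <- function(dt, rows, new_rows) {
  stopifnot(is.data.table(dt),
            ncol(dt) == ncol(new_rows),        # columns are matched by position
            length(rows) == nrow(new_rows))    # one replacement row per index
  rows <- as.integer(rows)
  for (j in seq_along(dt)) {
    set(dt, i = rows, j = j, value = new_rows[[j]])
  }
  invisible(dt)
}

# Same data as in the example, including the duplicated "1A" column name.
df1 <- data.table("1A" = c(0, 0, 0, 0), "1B" = rep(4:3, 2),
                  "2A" = c(0, 0, 0, 0), "2B" = rep(4:3, 2))
df2 <- data.table("1A" = c(0, 0), "1B" = 1:2, "2A" = c(0, 0), "2B" = 1:2)
colnames(df1)[3] <- "1A"

replace_rows(df1, c(1, 3), df2)
df1[[2]]
# [1] 1 3 2 3
```

Since set modifies df1 in place, the result does not have to be assigned back. The column names, duplicates included, are left as they were.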