通过链接2列设置R data.table行顺序 [英] Set R data.table row order by chaining 2 columns
问题描述
我试图找出如何基于2列的链接对R数据表进行排序。
I'm trying to figure out how to order an R data table based on the chaining of 2 columns.
这是我的示例数据表。
dt <- data.table(id = c('A', 'A', 'A', 'A', 'A')
, col1 = c(7521, 0, 7915, 5222, 5703)
, col2 = c(7907, 5703, 8004, 7521, 5222))
id col1 col2
1: A 7521 7907
2: A 0 5703
3: A 7915 8004
4: A 5222 7521
5: A 5703 5222
我需要以col1 = 0开头的行顺序。第2行中的col1值应等于上一行中col2的值,依此类推。
I need the row order to start with col1 = 0. The col1 value in row 2 should be equal to the value of col2 in the preceding row, and so on.
此外,通常应该始终有一个匹配值链接行顺序。但是,如果没有,它应该选择最接近的值(请参见下面的第4和第5行)。
Additionally, there generally should always be a matching value that chains the row order. But if not, it should select the closest value (see rows 4 & 5 below).
我要查找的结果如下所示:
The outcome I'm looking for is shown below:
id col1 col2
1: A 0 5703
2: A 5703 5222
3: A 5222 7521
4: A 7521 7907
5: A 7915 8004
我想我可以编写一个疯狂的函数来执行此操作。.但是我想知道是否存在一种优雅的data.table解决方案。
I think I can write a crazy function to do this.. but I'm wondering if there's an elegant data.table solution.
编辑
我更新了表,使其包括具有重复行的附加ID和唯一的源列:
EDIT
I updated the table to include an additional ID with duplicate rows, and a unique source column:
dt <- data.table(id = c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B')
, col1 = c(7521, 0, 7915, 5222, 5703, 1644, 1625, 0, 1625, 1625)
, col2 = c(7907, 5703, 8004, 7521, 5222, 1625, 1625, 1644, 1625, 1505)
, source = c('c', 'b', 'a', 'e', 'd', 'y', 'z', 'x', 'w', 'v'))
id col1 col2 source
1: A 7521 7907 c
2: A 0 5703 b
3: A 7915 8004 a
4: A 5222 7521 e
5: A 5703 5222 d
6: B 1644 1625 y
7: B 1625 1625 z
8: B 0 1644 x
9: B 1625 1625 w
10: B 1625 1505 v
有可以是ID中的匹配值。参见B,第7行和& 9以上。但是,每行数据的来源都是唯一的。
There can be matching values within an ID. See B, rows 7 & 9 above. However, there's a unique source for each row where this data comes from.
所需的输出为:
id col1 col2 source
1: A 0 5703 b
2: A 5703 5222 d
3: A 5222 7521 e
4: A 7521 7907 c
5: A 7915 8004 a
6: B 0 1644 x
7: B 1644 1625 y
8: B 1625 1625 w
9: B 1625 1625 z
10: B 1625 1625 v
在输出中,匹配的行8& 9可以按任何顺序排列。
In the output, the matching rows, 8 & 9 could be in any order.
谢谢!
推荐答案
这是另一种方法:
- 对数据重新排序,该数据将首先放置0值。
- 遍历其余值以返回
col2
与col1 $ c匹配的索引$ c>。
- Reorders the data which will place the 0 value first.
- Loops through the rest of the values to return the index of where
col2
matchescol1
.
setorder(dt, col1)
neworder = seq_len(nrow(dt))
init = 1L
col1 = dt[['col1']]; col2 = dt[['col2']]
for (i in seq_along(neworder)[-1L]) {
ind = match(col2[init], col1)
if (is.na(ind)) break
neworder[i] = init = ind
}
dt[neworder]
## id col1 col2
## <char> <num> <num>
##1: A 0 5703
##2: A 5703 5222
##3: A 5222 7521
##4: A 7521 7907
##5: A 7915 8004
如果要进行分组,则可以将循环包装在 dt [,.I [{...},by = id] $ V1
返回索引。或者使它看起来更好,我们可以创建一个函数。
If you are doing it with grouping, you can wrap the loop within a dt[, .I[{...}, by = id]$V1
to return the indices. Or to make it look better, we can make a function.
recursive_order = function (x, y) {
neworder = seq_len(length(x))
init = 1L
for (i in neworder[-1L]) {
ind = match(y[init], x)
if (is.na(ind)) break
# Multiple matches which means all the maining matches are the same number
if (ind == init) {
inds = which(x %in% y[init])
l = length(inds)
neworder[i:(i + l - 2L)] = inds[-1L]
break
}
neworder[i] = init = ind
}
return(neworder)
}
dt <- data.table(id = c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B')
, col1 = c(7521, 0, 7915, 5222, 5703, 1644, 1625, 0, 1625, 1625)
, col2 = c(7907, 5703, 8004, 7521, 5222, 1625, 1625, 1644, 1625, 1505)
, source = c('c', 'b', 'a', 'e', 'd', 'y', 'z', 'x', 'w', 'v'))
setorder(dt, col1)
dt[dt[, .I[recursive_order(col1, col2)], by = id]$V1]
id col1 col2 source
<char> <num> <num> <char>
1: A 0 5703 b
2: A 5703 5222 d
3: A 5222 7521 e
4: A 7521 7907 c
5: A 7915 8004 a
6: B 0 1644 x
7: B 1644 1625 y
8: B 1625 1625 z
9: B 1625 1625 w
10: B 1625 1505 v
这篇关于通过链接2列设置R data.table行顺序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!