Permutations of 3 elements within 6 positions
Question
I'm looking to permute (or combine) c("a","b","c") within six positions, under the condition that the sequences always have alternating elements (no two neighbors alike), e.g. abcbab.
The full set of permutations is easy to get with gtools:
library(gtools)
abc <- c("a","b","c")
permutations(n=3, r=6, v=abc, repeats.allowed=TRUE)
I don't think it is possible to do that with gtools alone, and I've been trying to write a function for it, even though I suspect one may already exist.
Recommended answer
Since you're looking for permutations, expand.grid can work as well as permutations. But since you don't want like-neighbors, we can cut down the number of combinations to expand considerably. I think this is legitimate random-wise!
Up front:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
dim(m)
# [1] 96 6
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))))
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a b c a bcabca
# 2 c a b c a b cabcab
# 3 a b c a b c abcabc
# 4 b a b c a b babcab
# 5 c b c a b c cbcabc
# 6 a c a b c a acabca
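As a sanity check on the 96 rows reported by dim(m): the first position has length(abc) choices and every later position has length(abc) - 1, so the count can be computed directly. This is only a minimal sketch; n_pos and expected_rows are names introduced here for illustration and are not part of the answer.

```r
abc <- c("a", "b", "c")
n_pos <- 6  # number of positions (illustrative name, not from the answer)

# 3 choices for the first slot, then 2 for each of the remaining 5 slots
expected_rows <- length(abc) * (length(abc) - 1)^(n_pos - 1)
expected_rows
# [1] 96
```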
Walk-through:
- since you want all permutations with repetition, we can use gtools::permutations, or we can use expand.grid ... I'll use the latter; I don't know if it's much faster, but it allows a shortcut I need (more later)
- when dealing with constraints like this, I like to expand on the indices of the vector of values
- however, since we don't want neighbors to be the same, instead of treating each row of values as the straight index, we cumsum them; this lets us control whether the cumulative sum can land on the same value again ... by removing 0 and length(abc) from the list of possible increments, we rule out (a) staying the same and (b) advancing by exactly one vector-length (which repeats the same value); as a walk-through:
head(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# 1 1 1 1 1 1 1
# 2 2 1 1 1 1 1
# 3 3 1 1 1 1 1
# 4 1 2 1 1 1 1
# 5 2 2 1 1 1 1
# 6 3 2 1 1 1 1
Since the first value can be any of the three values, it's 1:3, but each additional value is meant to be a step of 1 or 2 away from the one before it.
head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum)), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 1 2 3 4 5 6
# [2,] 2 3 4 5 6 7
# [3,] 3 4 5 6 7 8
# [4,] 1 3 4 5 6 7
# [5,] 2 4 5 6 7 8
# [6,] 3 5 6 7 8 9
Okay, that doesn't seem that useful (since it goes beyond the length of the vector), so we can invoke the modulus operator and a shift (modulus returns 0-based values, and we want 1-based):
head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 2 3 1 2 3 1
# [2,] 3 1 2 3 1 2
# [3,] 1 2 3 1 2 3
# [4,] 2 1 2 3 1 2
# [5,] 3 2 3 1 2 3
# [6,] 1 3 1 2 3 1
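To make the mechanics concrete, here is the same cumsum-then-modulus step applied to a single row. The steps vector is a hypothetical row chosen for illustration: its first entry is the starting index and the remaining entries are increments of 1 or 2.

```r
abc <- c("a", "b", "c")
steps <- c(1, 2, 1, 1, 2, 2)  # hypothetical row: start index, then five increments

# running total, wrapped into 1..3; an increment of 1 or 2 can never wrap to the same index
idx <- cumsum(steps) %% length(abc) + 1
idx
# [1] 2 1 2 3 2 1
paste(abc[idx], collapse = "")
# [1] "babcba"
```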
To verify that this works, we can take the diff within each row and look for 0:
m <- t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1)
any(apply(m, 1, diff) == 0)
# [1] FALSE
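The diff check shows that no row has equal neighbors, but not that every valid sequence is present. One way to check that as well is to compare against a brute-force filter of all 3^6 strings. This is a sketch under my own naming (built, all_seqs, brute are not from the answer), using only base R:

```r
abc <- c("a", "b", "c")

# sequences built with the cumsum/modulus construction
r <- replicate(6, seq_len(length(abc) - 1), simplify = FALSE)
r[[1]] <- c(r[[1]], length(abc))
idx <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
built <- apply(idx, 1, function(i) paste(abc[i], collapse = ""))

# brute force: every length-6 string over abc, minus those with a doubled letter
all_seqs <- do.call(paste0, expand.grid(rep(list(abc), 6), stringsAsFactors = FALSE))
brute <- all_seqs[!grepl("(.)\\1", all_seqs, perl = TRUE)]

length(brute)
# [1] 96
setequal(built, brute)
# [1] TRUE
```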
To automate this for an arbitrary vector, we enlist the help of replicate to generate the list of possible vectors:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
str(r)
# List of 6
# $ : int [1:3] 1 2 3
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
and then do.call to expand it. You now have the matrix of indices:
head(m)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 2 3 1 2 3 1
# [2,] 3 1 2 3 1 2
# [3,] 1 2 3 1 2 3
# [4,] 2 1 2 3 1 2
# [5,] 3 2 3 1 2 3
# [6,] 1 3 1 2 3 1
and then replace each index with the vector's value:
m[] <- abc[m]
head(m)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] "b" "c" "a" "b" "c" "a"
# [2,] "c" "a" "b" "c" "a" "b"
# [3,] "a" "b" "c" "a" "b" "c"
# [4,] "b" "a" "b" "c" "a" "b"
# [5,] "c" "b" "c" "a" "b" "c"
# [6,] "a" "c" "a" "b" "c" "a"
and then we cbind the united string (via apply and paste).
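The steps of the walk-through can be collected into one helper. This is only a sketch: alt_sequences and its arguments are hypothetical names introduced here, and it assumes at least two distinct values and at least two positions.

```r
# Hypothetical helper (not from the answer): all length-n sequences over
# `values` in which no two neighbors are equal, as a character matrix.
alt_sequences <- function(values, n) {
  k <- length(values)
  stopifnot(k >= 2, n >= 2)
  # increments 1..(k-1) for every position; the first position may be any of 1..k
  steps <- replicate(n, seq_len(k - 1), simplify = FALSE)
  steps[[1]] <- seq_len(k)
  # cumulative sums wrapped into 1..k give the index matrix
  idx <- t(apply(do.call(expand.grid, steps), 1, cumsum) %% k + 1)
  # swap indices for values, keeping the matrix shape
  matrix(values[idx], nrow = nrow(idx))
}

m <- alt_sequences(c("a", "b", "c"), 6)
dim(m)
# [1] 96  6
united <- apply(m, 1, paste, collapse = "")
```

Returning the matrix and pasting afterwards keeps the helper usable for both the per-position columns and the united strings.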
Performance:
library(microbenchmark)
library(dplyr)
library(tidyr)
library(stringr)
microbenchmark(
tidy1 = {
gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>%
data.frame() %>%
unite(united, sep = "", remove = FALSE) %>%
filter(!str_detect(united, "([a-c])\\1"))
},
tidy2 = {
filter(unite(data.frame(gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)),
united, sep = "", remove = FALSE),
!str_detect(united, "([a-c])\\1"))
},
base = {
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
},
times=10000
)
# Unit: microseconds
# expr min lq mean median uq max neval
# tidy1 1875.400 2028.8510 2446.751 2165.651 2456.051 12790.901 10000
# tidy2 1745.402 1875.5015 2284.700 2000.051 2278.101 50163.901 10000
# base 796.701 871.4015 1020.993 919.801 1021.801 7373.901 10000
I tried the nested-call (non-%>%) tidy2 version just for kicks, and though I was confident it would theoretically be faster, I didn't realize it would shave over 7% off the median run-time. (The 50163 maximum is likely R garbage-collecting, not "real".) That is the price we pay for readability/maintainability.