Divide long list into short lists of specified length in R
Question
This is closely related to a previous question here. However, I need something slightly different...
I have a long list of objects that I need to divide into smaller lists, each with a certain number of entries. I need to be able to change the length of the lists for different tasks. The catch is that each object can only appear once in a single list.
# Create some example data...
# Make a list of objects.
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')
# Generate a longer list, with a random sequence and number of repetitions for each entry.
set.seed(123)
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))
print(LONG.LIST)
Name
1 Cup
2 Distance
3 Roof
4 Pencil
5 Lunch
6 Toast
7 Watch
8 Bottle
9 Car
10 Roof
11 Lunch
12 Forever
13 Cheese
14 Oranges
15 Ocean
16 Chocolate
17 Socks
18 Leaves
19 Oranges
20 Distance
21 Green
22 Paper
23 Red
24 Paper
25 Trees
26 Chocolate
27 Bottle
28 Dog
29 Wind
30 Parrot
etc....
For argument's sake, suppose I wanted to create a series of 20-item lists. Using the example generated above, 'Distance' appears at both position 2 and position 20, 'Lunch' at both 5 and 11, and 'Oranges' at 14 and 19, so the first list without duplicates would need to extend to include 'Green', 'Paper' and 'Red'. The second list would then begin with 'Paper' at position 24. However, I don't want to be restricted to a length of 20; sometimes I might want to make it 10 or 25.
Incorporating comments from @LAP below, which help to describe my problem: "Go through your vector until you have found 20 unique items, put them together, discard the duplicates, then move on over your vector until you have found the next 20 unique items, and so on until the end of your vector, filling the last part with NA. The separate lists only need to be unique in and of themselves. There may be duplicates between two or more lists."
The last list is likely to be incomplete, so it would be good to pad it with NAs. Ideally the entries would be alphabetical within each list.
The most useful output would be one list per column in a data frame.
Recommended answer
Alright, this is a partial answer, as I think I've got most of what you need.
Note that this may be slow with huge data.
The idea is to initialize a list with as many empty vectors as you want groups afterwards. In this example we want to create 10 groups of 20 from a vector of 200 items.
First, we create reproducible data and the empty list:
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red',
'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever',
'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese',
'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green',
'Chocolate', 'Car', 'Distance')
set.seed(123)
LONG.LIST <- data.frame(Name = sample(LIST, size = 200, replace = TRUE), stringsAsFactors = FALSE)
test <- vector("list", 10)
Then initialize two counters, i for the position in the long vector and j for the current subvector:
i <- 1
j <- 1
Now we use a while loop that runs until i is greater than the number of items in the vector to be split (so it stops when i > 200). Within this loop we check whether the current subvector j in our list is shorter than 20. If so, we add an item and deduplicate; if not, we add 1 to j to move on to the next subvector.
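while (i <= nrow(LONG.LIST)) {
  if (length(test[[j]]) < 20) {
    # Append the next item, then drop it again if it was already present
    test[[j]] <- c(test[[j]], LONG.LIST$Name[i])
    test[[j]] <- unique(test[[j]])
    i <- i + 1
  } else {
    # Current subvector is full: move on to the next one
    j <- j + 1
  }
}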
Here is our result:
> test
[[1]]
[1] "Lunch" "Cheese" "Truck" "Roof" "Hope" "Mint" "Lemons" "Pencil" "Hippo" "Moon"
[11] "Car" "Chocolate" "Trees" "Distance" "Dog" "Bag" "Paper" "Peanuts" "Ocean" "Wind"
[[2]]
[1] "Hippo" "Wind" "Mint" "Plane" "Trees" "Truck" "Lemons" "Watch" "Chocolate" "Train"
[11] "Dog" "Lunch" "Green" "Horse" "Toast" "Distance" "Cloud" "Hammock" "Fork" "Paper"
[[3]]
[1] "Watch" "Hope" "Paper" "Socks" "Bag" "Plane" "Bottle" "Green" "Lunch" "Fork"
[11] "Mint" "Hippo" "Chocolate" "Car" "Trees" "Toast" "Forever" "Red" "Wind" "Ocean"
[[4]]
[1] "Car" "Lunch" "Toast" "Lemons" "Moon" "Socks" "Hippo" "Pencil" "Blue" "Fork" "Paper"
[12] "Distance" "Cloud" "Train" "Wind" "Watch" "Bottle" "Forever" "Green" "Bag"
[[5]]
[1] "Train" "Cheese" "Bottle" "Fork" "Paper" "Green" "Leaves" "Blue" "Toast" "Parrot" "Lemons" "Dog"
[13] "Hammock" "Ocean" "Red" "Peanuts" "Pencil" "Bag" "Horse" "Hope"
[[6]]
[1] "Oranges" "Truck" "Hippo" "Trees" "Parrot" "Red" "Hope" "Cloud" "Tin" "Bag"
[11] "Pencil" "Cup" "Dog" "Leaves" "Chocolate" "Mint" "Plane" "Moon" "Fork" "Green"
[[7]]
[1] "Tin" "Mint" "Book" "Bag" "Roof" "Hope" "Socks" "Watch" "Paper" "Peanuts"
[11] "Cup" "Distance" "Leaves" "Bottle" "Cloud" "Horse" "Trees" "Oranges" "Chocolate" "Toast"
[[8]]
[1] "Horse" "Watch" "Chocolate" "Tin" "Red" "Train"
[[9]]
NULL
[[10]]
NULL
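Since the question asks for a group size that can change between tasks (10, 20, 25, ...), the same idea can be wrapped in a function. The following is only a sketch: the name chunk_unique and its arguments x and size are made up for illustration and are not part of the original answer. It grows the result list as needed instead of preallocating a fixed number of subvectors.

```r
# Hypothetical helper (name and arguments are assumptions, not from the answer):
# walk through x and collect consecutive groups of `size` unique items each.
chunk_unique <- function(x, size) {
  x <- as.character(x)          # also covers factor input
  groups <- list()
  current <- character(0)
  for (item in x) {
    # Keep the item only if the current group does not contain it yet
    if (!(item %in% current)) {
      current <- c(current, item)
    }
    # Group is full: store it and start a new one
    if (length(current) == size) {
      groups[[length(groups) + 1]] <- current
      current <- character(0)
    }
  }
  # Keep the last, possibly incomplete, group
  if (length(current) > 0) {
    groups[[length(groups) + 1]] <- current
  }
  groups
}

# Small usage example with a group size of 3
small <- c("a", "b", "a", "c", "d", "b", "e")
small_groups <- chunk_unique(small, 3)
print(small_groups)
```

For the small vector above this gives two groups, "a" "b" "c" and "d" "b" "e". With the data from the answer, the call would presumably be chunk_unique(LONG.LIST$Name, 20).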
Now we only need to fill the last vectors with NA. This can probably be done differently, but it gets the job done:
for (k in seq_along(test)) {
  if (length(test[[k]]) < 20) {
    # Pad short (or empty) subvectors with NA up to length 20
    test[[k]] <- c(test[[k]], rep(NA, 20 - length(test[[k]])))
  }
}
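The question also asks for alphabetical order within each list and for one list per column in a data frame, which the steps above do not cover. One possible way to finish, shown here as a sketch on a small stand-in list rather than on the real data: the names groups, size, padded and result, and the column names List1, List2, ... are assumptions chosen for illustration.

```r
# Stand-in for a list of groups like `test` (assumed example data)
groups <- list(c("Pear", "Apple", "Fig"), c("Kiwi", "Date"))
size <- 3  # target length of every group

padded <- lapply(groups, function(g) {
  # sort() drops NA by default, so groups that are already padded work too
  g <- sort(as.character(g))
  # Re-pad with NA so that every column has the same length
  c(g, rep(NA_character_, size - length(g)))
})

# One column per group
names(padded) <- paste0("List", seq_along(padded))
result <- as.data.frame(padded, stringsAsFactors = FALSE)
print(result)
```

Applied to the answer's data, groups would be test and size would be 20. Note that sort() orders strings according to the current locale, so the ordering of mixed-case entries may differ between systems.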