Divide long list into short lists of specified length in R

Question

This is closely related to a previous question here. However, I need something slightly different...

I have a long list of objects that I need to divide into smaller lists, each with a certain number of entries. I need to be able to change the length of the lists for different tasks. The catch is that each object can only appear once in any single list.

# Create some example data... 
# Make a list of objects.
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')

# Generate a longer list, with a random sequence and number of repetitions for each entry.
set.seed(123)

LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))

print(LONG.LIST)

Name
1         Cup
2    Distance
3        Roof
4      Pencil
5       Lunch
6       Toast
7       Watch
8      Bottle
9         Car
10       Roof
11      Lunch
12    Forever
13     Cheese
14    Oranges
15      Ocean
16  Chocolate
17      Socks
18     Leaves
19    Oranges
20   Distance
21      Green
22      Paper
23        Red
24      Paper
25      Trees
26  Chocolate
27     Bottle
28        Dog
29       Wind
30     Parrot
etc....

For the sake of argument, suppose I wanted to create a series of 20-item lists. Using the example generated above, 'Distance' appears at both position 2 and position 20, 'Lunch' at both 5 and 11, and 'Oranges' at 14 and 19, so the first list without duplicates would need to extend to include 'Green', 'Paper' and 'Red'. The second list would then begin with 'Paper' at position 24. However, I don't want to be restricted to a length of 20; sometimes I might want to make it 10 or 25.

Incorporating comments from @LAP below, which help to describe my problem: "Go through your vector until you have found 20 unique items, put them together, discard the duplicates, then move on over your vector until you have found the next 20 unique items, and so on until the end of your vector, filling the last part with NA.

"The separate lists only need to be unique in and of themselves. There may be duplicates between two or more lists."

The last list is likely to be incomplete, so it would be good to pad it with NAs. Ideally the entries would be in alphabetical order within each list.

The most useful output would be one list per column in a data frame.

Recommended answer

Alright, this is a partial answer, as I think I've got most of what you need.

Note that this may be slow with huge data.

The idea is to initialize a list with as many empty slots as you want groups afterwards. In this example we want to create 10 groups of 20 from a vector of 200 items.

First, we create reproducible data and the empty list:

LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 
          'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 
          'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 
          'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 
          'Chocolate', 'Car', 'Distance')

set.seed(123)

LONG.LIST <- data.frame(Name = sample(LIST, size = 200, replace = TRUE), stringsAsFactors = FALSE)

test <- vector("list", 10)
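
The 10 used above follows from the size of the data. Each full group consumes at least 20 items from the vector, so ceiling(200 / 20) slots should always be enough, however many duplicates there are. A small sketch of that calculation, using the hypothetical names `n_items`, `group_size`, `max_groups` and `groups` (none of them appear in the original answer):

```r
# Assumed sizes, matching the example in this answer.
n_items    <- 200  # length of the vector to split
group_size <- 20   # unique items wanted per group

# Upper bound on the number of groups: every full group uses up
# at least `group_size` items, duplicates only make it use more.
max_groups <- ceiling(n_items / group_size)

# Preallocate that many empty slots (each slot starts out as NULL).
groups <- vector("list", max_groups)
length(groups)
```

Changing `group_size` to 10 or 25 then resizes the list automatically; slots that are never reached simply stay NULL.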

Then you initialize two counters, i for the position in the vector and j for the current group:

i <- 1
j <- 1

Now we use a while loop that runs until i is greater than the number of items in the vector to be split (so it stops when i > 200). Within this loop we check whether the current subvector j in our list is shorter than 20. If so, we add an item and deduplicate; if not, we add 1 to j to jump to the next subvector.

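while (i <= nrow(LONG.LIST)) {
  if (length(test[[j]]) < 20) {
    # Append the next item; unique() drops it again if group j already has it.
    test[[j]] <- unique(c(test[[j]], LONG.LIST$Name[i]))
    i <- i + 1
  } else {
    # Group j is full: move to the next group without consuming an item.
    j <- j + 1
  }
}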

This is our result:

> test
[[1]]
 [1] "Lunch"     "Cheese"    "Truck"     "Roof"      "Hope"      "Mint"      "Lemons"    "Pencil"    "Hippo"     "Moon"     
[11] "Car"       "Chocolate" "Trees"     "Distance"  "Dog"       "Bag"       "Paper"     "Peanuts"   "Ocean"     "Wind"     

[[2]]
 [1] "Hippo"     "Wind"      "Mint"      "Plane"     "Trees"     "Truck"     "Lemons"    "Watch"     "Chocolate" "Train"    
[11] "Dog"       "Lunch"     "Green"     "Horse"     "Toast"     "Distance"  "Cloud"     "Hammock"   "Fork"      "Paper"    

[[3]]
 [1] "Watch"     "Hope"      "Paper"     "Socks"     "Bag"       "Plane"     "Bottle"    "Green"     "Lunch"     "Fork"     
[11] "Mint"      "Hippo"     "Chocolate" "Car"       "Trees"     "Toast"     "Forever"   "Red"       "Wind"      "Ocean"    

[[4]]
 [1] "Car"      "Lunch"    "Toast"    "Lemons"   "Moon"     "Socks"    "Hippo"    "Pencil"   "Blue"     "Fork"     "Paper"   
[12] "Distance" "Cloud"    "Train"    "Wind"     "Watch"    "Bottle"   "Forever"  "Green"    "Bag"     

[[5]]
 [1] "Train"   "Cheese"  "Bottle"  "Fork"    "Paper"   "Green"   "Leaves"  "Blue"    "Toast"   "Parrot"  "Lemons"  "Dog"    
[13] "Hammock" "Ocean"   "Red"     "Peanuts" "Pencil"  "Bag"     "Horse"   "Hope"   

[[6]]
 [1] "Oranges"   "Truck"     "Hippo"     "Trees"     "Parrot"    "Red"       "Hope"      "Cloud"     "Tin"       "Bag"      
[11] "Pencil"    "Cup"       "Dog"       "Leaves"    "Chocolate" "Mint"      "Plane"     "Moon"      "Fork"      "Green"    

[[7]]
 [1] "Tin"       "Mint"      "Book"      "Bag"       "Roof"      "Hope"      "Socks"     "Watch"     "Paper"     "Peanuts"  
[11] "Cup"       "Distance"  "Leaves"    "Bottle"    "Cloud"     "Horse"     "Trees"     "Oranges"   "Chocolate" "Toast"    

[[8]]
[1] "Horse"     "Watch"     "Chocolate" "Tin"       "Red"       "Train"    

[[9]]
NULL

[[10]]
NULL
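
To change the group length for different tasks, as the question asks, the same loop could be wrapped in a function. The following is only a sketch under stated assumptions: `split_unique`, `x` and `n` are hypothetical names, `x` is assumed to be a character vector, and the list grows as needed instead of being preallocated, so it is likely to be just as slow on huge data.

```r
# Hypothetical helper (not part of the original answer): walk through `x`
# and collect consecutive groups of `n` unique items each.
split_unique <- function(x, n) {
  groups  <- list()
  current <- character(0)
  for (item in x) {
    # Only keep the item if the current group does not have it yet.
    if (!(item %in% current)) {
      current <- c(current, item)
    }
    # Once the group is full, store it and start a new one.
    if (length(current) == n) {
      groups[[length(groups) + 1]] <- current
      current <- character(0)
    }
  }
  # Keep the incomplete final group, if there is one (not padded here).
  if (length(current) > 0) {
    groups[[length(groups) + 1]] <- current
  }
  groups
}

# Toy input: groups of 3 unique items.
x <- c("a", "b", "a", "c", "b", "d", "d", "e")
split_unique(x, 3)
# first group:  "a" "b" "c"
# second group: "b" "d" "e"
```

With the data above, `split_unique(LONG.LIST$Name, 25)` should produce groups of 25; the last, incomplete group is returned unpadded.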

Now we only need to fill the last vectors with NA. This can probably be done differently, but it gets the job done:

for (i in seq_along(test)) {
  if (length(test[[i]]) < 20) {
    test[[i]] <- c(test[[i]], rep(NA, 20 - length(test[[i]])))
  }
}
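
The question also asks for alphabetical order within each list and for one list per column of a data frame, which this partial answer does not cover. A possible sketch, using a small stand-in list rather than the real `test` (`groups`, `n`, `padded` and `result` are hypothetical names); sorting happens before padding because `sort()` drops NA values by default:

```r
# Toy stand-in for `test`: character vectors of different lengths,
# plus a NULL slot that was never filled.
groups <- list(c("pear", "apple", "fig"), c("kiwi", "date"), NULL)
n <- 3  # target length of every group

padded <- lapply(groups, function(g) {
  g <- sort(as.character(g))              # alphabetical; NULL becomes character(0)
  c(g, rep(NA_character_, n - length(g))) # pad with NA up to length n
})

# One column per group.
names(padded) <- paste0("List", seq_along(padded))
result <- as.data.frame(padded, stringsAsFactors = FALSE)
result
```

The same `lapply` call should also work on `test` with `n <- 20`, whether or not the padding loop above has already run, since any existing NA values are dropped by `sort()` and then added back.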
