For loops? Including rows in a dataframe by the missing values of factor levels

Question

Good morning,

I have a fisheries dataset with several variables that looks like this:

ID              Day      Month   Year  Depth  Haul number  Count LengthClass     
H111200840       11        1     2008   -80       40        4      10-20
H111200840       11        1     2008   -80       40        15     20-30
H29320105        29        3     2010   -40       5         3      50-60
H29320105        29        3     2010   -40       5         8      60-70

The ID column is a unique identifier built by pasting together the Day, Month, Year and Haul number columns. As you can see, the same ID has rows for different length classes, because fish of different lengths are caught in each haul.

However, LengthClass is a factor variable with the following levels: 10-20, 20-30, 30-40, 40-50, and a length class that was not caught in a haul is not recorded in the dataset for that haul.

I need to add new rows to the data.frame above for each ID, one for each missing level of LengthClass.

The missing length classes should have a Count of 0, but the rest of the variables must stay the same.

This is an example of what I want:

 ID              Day      Month   Year  Depth  Haul number  Count LengthClass     
  H111200840       11        1     2008   -80       40        4      10-20
  H111200840       11        1     2008   -80       40        15     20-30
  H111200840       11        1     2008   -80       40        0      30-40
  H111200840       11        1     2008   -80       40        0      40-50
  H111200840       11        1     2008   -80       40        0      50-60
  H29320105        29        3     2010   -40       5         3      40-60
  H29320105        29        3     2010   -40       5         8      50-60
  H29320105        29        3     2010   -40       5         0      10-20
  H29320105        29        3     2010   -40       5         0      20-30
  H29320105        29        3     2010   -40       5         0      30-40

Is there any way to do this in R? I have tried for loops with if statements, but with no luck, and also the example of this post:

Thanks in advance for any advice.

Recommended answer

You can use tidyr.

First use tidyr::complete to fill in all the combinations of LengthClass within each ID, specifying that Count should be filled in as 0.

Then sort the data and use tidyr::fill to fill in the same values for the other columns (everything other than ID, LengthClass, and Count).

library(tidyr)
library(dplyr)


df <- readr::read_csv(
'ID,Day,Month,Year,Depth,Haul_number,Count,LengthClass
H111200840,11,1,2008,-80,40,4,10-20
H111200840,11,1,2008,-80,40,15,20-30
H29320105,29,3,2010,-40,5,3,50-60
H29320105,29,3,2010,-40,5,8,60-70') %>% 
  mutate(LengthClass = as.factor(LengthClass))

df
#> # A tibble: 4 x 8
#>           ID   Day Month  Year Depth Haul_number Count LengthClass
#>        <chr> <int> <int> <int> <int>       <int> <int>      <fctr>
#> 1 H111200840    11     1  2008   -80          40     4       10-20
#> 2 H111200840    11     1  2008   -80          40    15       20-30
#> 3  H29320105    29     3  2010   -40           5     3       50-60
#> 4  H29320105    29     3  2010   -40           5     8       60-70



Fill in the extra rows:

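df %>% 
  group_by(ID) %>% 
  # add a row for every LengthClass level missing from each ID, with Count = 0
  complete(LengthClass, fill = list(Count = 0)) %>% 
  # sort so the new rows (NA in Day) come after the observed rows of each ID
  arrange(ID, Day) %>% 
  # copy the haul-level values down into the new rows
  fill(-ID, -LengthClass, -Count, .direction = "down") %>% 
  ungroup()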

#> # A tibble: 8 x 8
#>           ID LengthClass   Day Month  Year Depth Haul_number Count
#>        <chr>      <fctr> <int> <int> <int> <int>       <int> <dbl>
#> 1 H111200840       10-20    11     1  2008   -80          40     4
#> 2 H111200840       20-30    11     1  2008   -80          40    15
#> 3 H111200840       50-60    11     1  2008   -80          40     0
#> 4 H111200840       60-70    11     1  2008   -80          40     0
#> 5  H29320105       50-60    29     3  2010   -40           5     3
#> 6  H29320105       60-70    29     3  2010   -40           5     8
#> 7  H29320105       10-20    29     3  2010   -40           5     0
#> 8  H29320105       20-30    29     3  2010   -40           5     0
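Note that the result only contains the length classes that occur somewhere in the data, because as.factor derives the levels from the observed values. If classes such as 30-40 and 40-50 should also appear with a Count of 0, the levels can be declared explicitly. The following is a minimal sketch of that variation, not part of the original answer: the vector all_classes is an assumed list of classes and should be adjusted to your survey. It also uses tidyr::nesting to keep the haul-level columns together, which is one way to avoid the separate arrange and fill steps.

```r
library(dplyr)
library(tidyr)

# Assumption: the complete set of length classes used in the survey.
all_classes <- c("10-20", "20-30", "30-40", "40-50", "50-60", "60-70")

df <- tibble::tribble(
  ~ID,          ~Day, ~Month, ~Year, ~Depth, ~Haul_number, ~Count, ~LengthClass,
  "H111200840", 11,   1,      2008,  -80,    40,           4,      "10-20",
  "H111200840", 11,   1,      2008,  -80,    40,           15,     "20-30",
  "H29320105",  29,   3,      2010,  -40,    5,            3,      "50-60",
  "H29320105",  29,   3,      2010,  -40,    5,            8,      "60-70"
) %>%
  # Explicit levels, so classes never observed in the data are still known.
  mutate(LengthClass = factor(LengthClass, levels = all_classes))

filled <- df %>%
  complete(
    # Only haul combinations that actually occur in the data are kept.
    nesting(ID, Day, Month, Year, Depth, Haul_number),
    # For a factor, complete() expands over all levels, observed or not.
    LengthClass,
    fill = list(Count = 0)
  )

filled
```

With the two hauls and six assumed classes above, filled has one row per haul and class, and the observed counts are unchanged.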
