For loops? Including rows in a dataframe by the missing values of factor levels
Problem description
Good morning,
I have a dataset of fisheries data with several variables that looks like this:
ID Day Month Year Depth Haul number Count LengthClass
H111200840 11 1 2008 -80 40 4 10-20
H111200840 11 1 2008 -80 40 15 20-30
H29320105 29 3 2010 -40 5 3 50-60
H29320105 29 3 2010 -40 5 8 60-70
The column ID is a unique ID made by pasting together the columns Day, Month, Year and Haul.number. As you can see, for the same ID I have data for different length classes. In each haul, fish of different lengths are captured.
However, LengthClass is a factor variable with the following levels: 10-20, 20-30, 30-40, 40-50, and fish of a length class that was not caught in a haul are not recorded in the dataset for that haul.
For each ID in the data.frame example above, I need to add new rows for the missing levels of LengthClass.
The missing length classes should have a Count of 0, but the rest of the variables have to stay the same.
This is an example of what I want:
ID Day Month Year Depth Haul number Count LengthClass
H111200840 11 1 2008 -80 40 4 10-20
H111200840 11 1 2008 -80 40 15 20-30
H111200840 11 1 2008 -80 40 0 30-40
H111200840 11 1 2008 -80 40 0 40-50
H111200840 11 1 2008 -80 40 0 50-60
H29320105 29 3 2010 -40 5 3 40-50
H29320105 29 3 2010 -40 5 8 50-60
H29320105 29 3 2010 -40 5 0 10-20
H29320105 29 3 2010 -40 5 0 20-30
H29320105 29 3 2010 -40 5 0 30-40
Is there any way to do this in R? I have tried for loops with if statements, but with no luck, and also the example from this post:
Thanks in advance for any advice.
Recommended answer
You can use tidyr.
First use tidyr::complete to fill in all the combinations of LengthClass, specifying that Count should be filled in as 0.
Then sort the data and use tidyr::fill to fill in the same values for the other columns (other than ID, LengthClass, and Count).
library(tidyr)
library(dplyr)
df <- readr::read_csv(
'ID,Day,Month,Year,Depth,Haul_number,Count,LengthClass
H111200840,11,1,2008,-80,40,4,10-20
H111200840,11,1,2008,-80,40,15,20-30
H29320105,29,3,2010,-40,5,3,50-60
H29320105,29,3,2010,-40,5,8,60-70') %>%
  mutate(LengthClass = as.factor(LengthClass))
df
#> # A tibble: 4 x 8
#> ID Day Month Year Depth Haul_number Count LengthClass
#> <chr> <int> <int> <int> <int> <int> <int> <fctr>
#> 1 H111200840 11 1 2008 -80 40 4 10-20
#> 2 H111200840 11 1 2008 -80 40 15 20-30
#> 3 H29320105 29 3 2010 -40 5 3 50-60
#> 4 H29320105 29 3 2010 -40 5 8 60-70
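One thing worth noting about the data above: as.factor() only creates levels for the classes that actually occur in the column, so classes such as 30-40 and 40-50 are not levels here and would never be added as new rows. A minimal sketch of declaring the levels explicitly is shown below. The vector all_classes is an assumption (the classes listed in the question plus the two that appear in the sample data), and the names hauls, all_classes and hauls_full are made up for this illustration.

```r
library(dplyr)

# Hand-typed copy of the sample rows, so the sketch runs on its own.
hauls <- data.frame(
  ID          = c("H111200840", "H111200840", "H29320105", "H29320105"),
  Day         = c(11, 11, 29, 29),
  Month       = c(1, 1, 3, 3),
  Year        = c(2008, 2008, 2010, 2010),
  Depth       = c(-80, -80, -40, -40),
  Haul_number = c(40, 40, 5, 5),
  Count       = c(4, 15, 3, 8),
  LengthClass = c("10-20", "20-30", "50-60", "60-70"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the full set of length classes; replace with the real list.
all_classes <- c("10-20", "20-30", "30-40", "40-50", "50-60", "60-70")

# factor() with explicit levels keeps classes that never occur in the data.
hauls_full <- hauls %>%
  mutate(LengthClass = factor(LengthClass, levels = all_classes))

levels(hauls_full$LengthClass)
```

With the levels declared this way, a later call to complete() can add rows for classes that were never caught in any haul, not only for those missing from one particular haul.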
Fill in the extra rows:
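df %>%
  group_by(ID) %>%
  # add a row for every LengthClass level missing within each ID, with Count = 0
  complete(LengthClass, fill = list(Count = 0)) %>%
  # arrange() sorts NA last, so the new rows (Day is NA) follow the original rows of each ID
  arrange(ID, Day) %>%
  # copy Day, Month, Year, Depth and Haul_number down into the new rows
  fill(-ID, -LengthClass, -Count, .direction = "down") %>%
  ungroup()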
#> # A tibble: 8 x 8
#> ID LengthClass Day Month Year Depth Haul_number Count
#> <chr> <fctr> <int> <int> <int> <int> <int> <dbl>
#> 1 H111200840 10-20 11 1 2008 -80 40 4
#> 2 H111200840 20-30 11 1 2008 -80 40 15
#> 3 H111200840 50-60 11 1 2008 -80 40 0
#> 4 H111200840 60-70 11 1 2008 -80 40 0
#> 5 H29320105 50-60 29 3 2010 -40 5 3
#> 6 H29320105 60-70 29 3 2010 -40 5 8
#> 7 H29320105 10-20 29 3 2010 -40 5 0
#> 8 H29320105 20-30 29 3 2010 -40 5 0
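As a possible variation on the approach above, tidyr::nesting() can be used inside complete() so that the haul-level columns are carried along with ID, which removes the need for the arrange() and fill() steps. The sketch below is only an illustration under assumptions: the data frame hauls is a hand-typed copy of the sample rows, and all_classes is an assumed full list of length classes that should be replaced with the real one.

```r
library(dplyr)
library(tidyr)

# Hand-typed copy of the sample rows, so the sketch runs on its own.
hauls <- data.frame(
  ID          = c("H111200840", "H111200840", "H29320105", "H29320105"),
  Day         = c(11, 11, 29, 29),
  Month       = c(1, 1, 3, 3),
  Year        = c(2008, 2008, 2010, 2010),
  Depth       = c(-80, -80, -40, -40),
  Haul_number = c(40, 40, 5, 5),
  Count       = c(4, 15, 3, 8),
  LengthClass = c("10-20", "20-30", "50-60", "60-70"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the full set of length classes; replace with the real list.
all_classes <- c("10-20", "20-30", "30-40", "40-50", "50-60", "60-70")

completed <- hauls %>%
  mutate(LengthClass = factor(LengthClass, levels = all_classes)) %>%
  complete(
    # only combinations of haul columns that exist in the data are kept
    nesting(ID, Day, Month, Year, Depth, Haul_number),
    # crossed with every level of the factor, observed or not
    LengthClass,
    # new rows get a Count of 0 instead of NA
    fill = list(Count = 0)
  )

completed
```

Because each haul is crossed with all six assumed classes, the result has one row per haul and class, and the rows that were not in the original data have a Count of 0 while Day, Month, Year, Depth and Haul_number are already filled in.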