Making new variables for every group of observations in R
Problem description
I have 11 variables in my data frame. The first is a unique identifier of the observation (a plane). The second is a number from 1 to 21 representing the flight of a given plane. The remaining variables are time, velocity, distance, etc.
What I want to do is make new variables for every flight number, e.g. time_1, time_2, ..., velocity_1, velocity_2, etc., and consequently reduce the number of observations (the repeated rows for each plane).
I don't really have an idea of how to start. I was thinking about using mutate, like this:
mutate(df, time_1 = ifelse(n_flight == 1, time, NA))
But that would be a lot of typing, and it might introduce new problems.
Recommended answer
Basically, you want to convert the data from long to wide format for each variable. In that case you can lapply over the variable names and call tidyr::spread for each one. Suppose the data looks like the following:
library(dplyr)
library(tidyr)

df <- data.frame(
  ID = c(rep("A", 3), rep("B", 3)),
  n_flight = rep(seq(3), 2),
  time = seq(19, 24),
  velocity = rev(seq(65, 60))
)
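Then the following will generate your outcome of interest. The only cleanup left is to drop the extra ID columns: because every column gets the variable name as a prefix, the result contains one ID column per variable (time_ID, velocity_ID, ...).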
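# Widen each measurement column separately, then bind the pieces side by side.
lapply(
  setdiff(names(df), c("ID", "n_flight")), function(x) {
    df %>%
      select(ID, n_flight, !!x) %>%
      # one column per flight number, filled with the values of x
      tidyr::spread(., key = "n_flight", value = x) %>%
      # prefix every column with the variable name: time_ID, time_1, time_2, ...
      setNames(paste(x, names(.), sep = "_"))
  }
) %>%
  bind_cols()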
Let me know if this wasn't what you were going for.