Making new variables for every group of observations in R

Question

I have 11 variables in my data frame. The first is a unique identifier of the observation (a plane). The second is a number from 1 to 21 representing a flight of a given plane. The remaining variables are time, velocity, distance, etc.

What I want to do is create new variables for each flight number, e.g. time_1, time_2, ..., velocity_1, velocity_2, etc., and consequently reduce the number of observations (the repeated rows per plane).

I don't really know how to start. I was thinking of using mutate, something like:

mutate(df, time_1 = ifelse(n_flight == 1, time, NA))

But that would be a lot of typing, and new problems might appear.

Recommended answer

Basically, you want to convert each variable from long to wide format. In that case, you can lapply over the variable names with tidyr::spread. Suppose the data looks like the following:

library(dplyr)
library(tidyr)
df <- data.frame(
  ID = c(rep("A", 3), rep("B", 3)), 
  n_flight = rep(seq(3), 2),
  time = seq(19, 24), 
  velocity = rev(seq(65, 60))
)

Then the following will generate the result you are after, as long as you get rid of the extra ID variables (each per-variable result carries its own copy of ID, here time_ID and velocity_ID).

lapply(
  setdiff(names(df), c("ID", "n_flight")), function(x) {
    df %>% 
      select(ID, n_flight, !!x) %>%
      tidyr::spread(., key = "n_flight", value = x) %>%
      setNames(paste(x, names(.), sep = "_"))
  }
) %>%
  bind_cols()
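
As an alternative sketch, not part of the original answer: tidyr 1.0.0 and later provide pivot_wider, which supersedes spread and can widen several value columns in one call. The example below assumes the same toy df as above and a tidyr version that includes pivot_wider; the name wide is only illustrative. With more than one column in values_from, the new names are built as variable_flight (e.g. time_1), and ID appears only once, so there are no extra ID columns to drop.

```r
library(tidyr)

# Same toy data as in the answer above
df <- data.frame(
  ID = c(rep("A", 3), rep("B", 3)),
  n_flight = rep(seq(3), 2),
  time = seq(19, 24),
  velocity = rev(seq(65, 60))
)

# One row per plane; each measured variable gets one column per flight number.
# "wide" is an illustrative name, not something from the original question.
wide <- pivot_wider(
  df,
  id_cols = ID,                    # keep a single identifier column
  names_from = n_flight,           # flight number becomes the column suffix
  values_from = c(time, velocity)  # variables to spread across columns
)

names(wide)
```

For the real data with 11 variables, the remaining measurement columns would be listed in values_from in the same way.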

Let me know if this isn't what you were going for.
