R: extract maximum value in vector under certain conditions

Question

I'm looking into a large data set that records each person's career history in a firm. I want to find the maximum number of years a person worked as a Manager, under the condition that this person was in a Sales category prior to becoming a Boss (regardless of how many years prior this was). The data looks like the following: job2 is a dummy variable indicating whether the person was a Manager, and cumu_job2 denotes the cumulative years a person was in a Manager position (only consecutive years are accumulated).

  id    name    year    job    job2 cumu_job2
    1   Jane    1980    Worker  0   0
    1   Jane    1981    Manager 1   1
    1   Jane    1982    Sales   0   0
    1   Jane    1983    Sales   0   0
    1   Jane    1984    Manager 1   1
    1   Jane    1985    Manager 1   2
    1   Jane    1986    Boss    0   0
    2   Bob     1985    Worker  0   0
    2   Bob     1986    Sales   0   0
    2   Bob     1987    Manager 1   1
    2   Bob     1988    Manager 1   2
    2   Bob     1989    Boss    0   0
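
The "only sequential cumulation" rule for cumu_job2 can be made concrete with a small sketch. This is one possible reconstruction and is not taken from the post: it assumes the rows are sorted by year within each id, rebuilds only the id, year, and job2 columns of the sample data, and uses a hypothetical helper variable named run.

```r
# Hypothetical reconstruction of cumu_job2 from job2 (assumption: rows are
# already ordered by year within each id).
dat <- data.frame(
  id   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  year = c(1980:1986, 1985:1989),
  job2 = c(0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0)
)

# Every non-Manager year (job2 == 0) starts a new run, so a cumulative count
# of zeros within each id labels the runs.
run <- ave(as.integer(dat$job2 == 0), dat$id, FUN = cumsum)

# Summing job2 within each (id, run) pair counts consecutive Manager years
# and resets to 0 whenever the person leaves the Manager position.
dat$cumu_job2 <- ave(dat$job2, dat$id, run, FUN = cumsum)

dat
```

With this construction, Jane's two separate Manager spells are counted independently (1, then 1 and 2) rather than being added together.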

After extracting the maximum number of years a person worked as a Manager, under the condition that the person has a history of working in Sales, I would want the data to have another column that holds this information:

 id name    year    job    cumu_job2 cumu_max
  1 Jane    1983    Sales       0       0
  1 Jane    1986    Boss        0       2
  2 Bob     1986    Sales       0       0
  2 Bob     1989    Boss        0       2

So I believe this requires two steps: first, extract only the cases where a person moved from Sales to Boss, and then store the maximum value of cumu_job2 for each person in a new vector cumu_max.
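
The first of these two steps could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: "moved from Sales to Boss" is read as "at least one Sales year precedes the first Boss year", the third person (id 3) is a hypothetical addition with no Sales history, and the names sales_before_boss and qualifying_ids are made up for this example.

```r
# Sample career histories; id 3 is a hypothetical person who never worked in Sales.
dat <- data.frame(
  id   = c(rep(1, 7), rep(2, 5), rep(3, 2)),
  year = c(1980:1986, 1985:1989, 1989:1990),
  job  = c("Worker", "Manager", "Sales", "Sales", "Manager", "Manager", "Boss",
           "Worker", "Sales", "Manager", "Manager", "Boss",
           "Worker", "Boss"),
  stringsAsFactors = FALSE
)

# Step 1: for each id, check whether any Sales year comes before the first Boss year.
sales_before_boss <- sapply(split(dat, dat$id), function(d) {
  boss_years <- d$year[d$job == "Boss"]
  if (length(boss_years) == 0) return(FALSE)  # never became Boss
  any(d$job == "Sales" & d$year < min(boss_years))
})

# ids that satisfy the condition (returned as character, taken from the split names)
qualifying_ids <- names(sales_before_boss)[sales_before_boss]
qualifying_ids
```

Comparing years rather than adjacent rows means intermediate jobs between Sales and Boss do not matter, which matches the "regardless of how many years prior" wording.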

This is a complex process, so any suggestions would be very much appreciated!

I have considered why the answer below using dplyr does not work, and here is what I think: the example showed people who became Boss only after being a Manager, but I also have data points that look like Kevin's:

id  name    year    job    job2 cumu_job2
        1   Jane    1980    Worker  0   0
        1   Jane    1981    Manager 1   1
        1   Jane    1982    Sales   0   0
        1   Jane    1983    Sales   0   0
        1   Jane    1984    Manager 1   1
        1   Jane    1985    Manager 1   2
        1   Jane    1986    Boss    0   0
        2   Bob     1985    Worker  0   0
        2   Bob     1986    Sales   0   0
        2   Bob     1987    Manager 1   1
        2   Bob     1988    Manager 1   2
        2   Bob     1989    Boss    0   0
        3   Kevin   1991    Manager 1   1
        3   Kevin   1992    Manager 1   2
        3   Kevin   1993    Sales   0   0
        3   Kevin   1994    Boss    0   0

So in the end, I would like:

 id name    year    job    cumu_job2 cumu_max 
  1 Jane    1983    Sales       0       0
  1 Jane    1986    Boss        0       2
  2 Bob     1986    Sales       0       0 
  2 Bob     1989    Boss        0       2
  3 Kevin   1993    Sales       0       2
  3 Kevin   1994    Boss        0       2

The dplyr solution only returns those who went Sales - Manager - Boss, without taking into account the possibility of Manager - Sales - Boss (which is observed more often in my data set).

Answer

This may not cover all cases in your actual data, but it does (mostly) what you are looking for. Note that I added Jill, who should be excluded according to your conditions.

library(dplyr)
dat <- read.table(header = TRUE, text = "id    name    year    job    job2 cumu_job2
1   Jane    1980    Worker  0   0
1   Jane    1981    Manager 1   1
1   Jane    1982    Sales   0   0
1   Jane    1983    Sales   0   0
1   Jane    1984    Manager 1   1
1   Jane    1985    Manager 1   2
1   Jane    1986    Boss    0   0
2   Bob     1985    Worker  0   0
2   Bob     1986    Sales   0   0
2   Bob     1987    Manager 1   1
2   Bob     1988    Manager 1   2
2   Bob     1989    Boss    0   0
3   Jill    1989    Worker  0   0
3   Jill    1990    Boss    0   0")

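# %>% is the current pipe; the older %.% operator was deprecated in dplyr 0.2
dat %>%
  group_by(id) %>%
  mutate(
    # how many of the three required jobs this person has ever held
    all_jobs = sum(unique(job) %in% c("Sales","Manager","Boss")),
    # longest consecutive run of Manager years for this person
    cumu_max = max(cumu_job2)
  ) %>%
  # keep people who held all three jobs, and show only their Sales and Boss rows
  filter(all_jobs == 3, job %in% c("Sales","Boss"))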

Source: local data frame [5 x 8]
Groups: id

  id name year   job job2 cumu_job2 all_jobs cumu_max
1  1 Jane 1982 Sales    0         0        3        2
2  1 Jane 1983 Sales    0         0        3        2
3  1 Jane 1986  Boss    0         0        3        2
4  2  Bob 1986 Sales    0         0        3        2
5  2  Bob 1989  Boss    0         0        3        2
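
The filter above checks only that Sales, Manager, and Boss all occur for a person, not the order in which they occur. A variant that instead requires a Sales year before the first Boss year might look like the sketch below. It is not part of the original answer: it assumes a dplyr version that provides %>%, assumes all of Kevin's rows share id 3 (as in the asker's desired output), renumbers Jill to id 4 to avoid a clash, and introduces the hypothetical helper columns first_boss and sales_first.

```r
library(dplyr)

# Question data plus Kevin (assumed id 3 throughout) and Jill (renumbered to id 4).
dat <- read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = "id name year job job2 cumu_job2
1 Jane 1980 Worker 0 0
1 Jane 1981 Manager 1 1
1 Jane 1982 Sales 0 0
1 Jane 1983 Sales 0 0
1 Jane 1984 Manager 1 1
1 Jane 1985 Manager 1 2
1 Jane 1986 Boss 0 0
2 Bob 1985 Worker 0 0
2 Bob 1986 Sales 0 0
2 Bob 1987 Manager 1 1
2 Bob 1988 Manager 1 2
2 Bob 1989 Boss 0 0
3 Kevin 1991 Manager 1 1
3 Kevin 1992 Manager 1 2
3 Kevin 1993 Sales 0 0
3 Kevin 1994 Boss 0 0
4 Jill 1989 Worker 0 0
4 Jill 1990 Boss 0 0")

res <- dat %>%
  group_by(id) %>%
  mutate(
    # first year as Boss, or NA if the person never became Boss
    first_boss = if (any(job == "Boss")) min(year[job == "Boss"]) else NA_integer_,
    # TRUE when at least one Sales year precedes the first Boss year
    sales_first = any(job == "Sales" & year < first_boss, na.rm = TRUE),
    # longest consecutive run of Manager years, as in the answer above
    cumu_max = max(cumu_job2)
  ) %>%
  filter(sales_first, job %in% c("Sales", "Boss")) %>%
  ungroup()

res
```

Under these assumptions Kevin (Manager - Sales - Boss) is kept alongside Jane and Bob, while Jill is still dropped because she has no Sales year before becoming Boss.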
