R: extract maximum value in vector under certain conditions
Problem description
I'm trying to look into a large data set that records each person's career history in a firm. I want to find the maximum number of years a person worked as a Manager, under the condition that this person was in a Sales category prior to becoming a Boss (regardless of how many years prior this was). The data looks like the following: job2 is a dummy variable indicating whether the person was a Manager, and cumu_job2 denotes the cumulative years a person was in a Manager position (only consecutive years are accumulated).
id name year job job2 cumu_job2
1 Jane 1980 Worker 0 0
1 Jane 1981 Manager 1 1
1 Jane 1982 Sales 0 0
1 Jane 1983 Sales 0 0
1 Jane 1984 Manager 1 1
1 Jane 1985 Manager 1 2
1 Jane 1986 Boss 0 0
2 Bob 1985 Worker 0 0
2 Bob 1986 Sales 0 0
2 Bob 1987 Manager 1 1
2 Bob 1988 Manager 1 2
2 Bob 1989 Boss 0 0
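For reference, the job2 and cumu_job2 columns described above could be derived from the job column with a run-length counter. The following is only a minimal sketch, assuming the rows are ordered by year within each id; the helper column run and the object name dat2 are hypothetical names introduced for illustration.

```r
library(dplyr)

# Sample data from the question, without the derived columns.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "id name year job
1 Jane 1980 Worker
1 Jane 1981 Manager
1 Jane 1982 Sales
1 Jane 1983 Sales
1 Jane 1984 Manager
1 Jane 1985 Manager
1 Jane 1986 Boss
2 Bob 1985 Worker
2 Bob 1986 Sales
2 Bob 1987 Manager
2 Bob 1988 Manager
2 Bob 1989 Boss")

dat2 <- dat %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(
    # Dummy: 1 if the person was a Manager that year, 0 otherwise.
    job2 = as.integer(job == "Manager"),
    # Hypothetical helper: a run counter that increases whenever job2 changes.
    run = cumsum(job2 != lag(job2, default = first(job2)))
  ) %>%
  group_by(id, run) %>%
  # Within each run, count consecutive Manager years (stays 0 otherwise).
  mutate(cumu_job2 = cumsum(job2)) %>%
  ungroup() %>%
  select(-run)

print(dat2)
```

Because the counter restarts with every new run, Jane's single Manager year in 1981 is not added to her 1984-1985 spell, which is what "only consecutive years" means here.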
By extracting the maximum number of years a person worked as a Manager, under the condition that the person has a history of working in Sales, I would want the data to have another column that denotes this information:
id name year job cumu_job2 cumu_max
1 Jane 1983 Sales 0 0
1 Jane 1986 Boss 0 2
2 Bob 1986 Sales 0 0
2 Bob 1989 Boss 0 2
So I believe this requires two steps: I first need to extract only the cases where a person moved from Sales to Boss, and then store the maximum value for each person in a new vector cumu_max based on cumu_job2.
This is a complex process, so any suggestions would be very much appreciated!
I have considered why the answer below using dplyr does not work, and here is what I think: the example assumed that everyone became Boss only after being a Manager, but I also have data points that look like Kevin's:
id name year job job2 cumu_job2
1 Jane 1980 Worker 0 0
1 Jane 1981 Manager 1 1
1 Jane 1982 Sales 0 0
1 Jane 1983 Sales 0 0
1 Jane 1984 Manager 1 1
1 Jane 1985 Manager 1 2
1 Jane 1986 Boss 0 0
2 Bob 1985 Worker 0 0
2 Bob 1986 Sales 0 0
2 Bob 1987 Manager 1 1
2 Bob 1988 Manager 1 2
2 Bob 1989 Boss 0 0
3 Kevin 1991 Manager 1 1
3 Kevin 1992 Manager 1 2
3 Kevin 1993 Sales 0 0
3 Kevin 1994 Boss 0 0
So in the end, I want:
id name year job cumu_job2 cumu_max
1 Jane 1983 Sales 0 0
1 Jane 1986 Boss 0 2
2 Bob 1986 Sales 0 0
2 Bob 1989 Boss 0 2
3 Kevin 1993 Sales 0 2
3 Kevin 1994 Boss 0 2
The dplyr solution only returns the people who went from Sales to Manager to Boss, without taking into account the possibility of Manager to Sales to Boss (which is the more common pattern in my data set).
Recommended answer
This may not cover all cases in your actual data, but it does (mostly) what you are looking for. Note that I added Jill, who should be excluded according to your conditions.
library(dplyr)
dat <- read.table(header = TRUE, text = "id name year job job2 cumu_job2
1 Jane 1980 Worker 0 0
1 Jane 1981 Manager 1 1
1 Jane 1982 Sales 0 0
1 Jane 1983 Sales 0 0
1 Jane 1984 Manager 1 1
1 Jane 1985 Manager 1 2
1 Jane 1986 Boss 0 0
2 Bob 1985 Worker 0 0
2 Bob 1986 Sales 0 0
2 Bob 1987 Manager 1 1
2 Bob 1988 Manager 1 2
2 Bob 1989 Boss 0 0
3 Jill 1989 Worker 0 0
3 Jill 1990 Boss 0 0")
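dat %>%
  group_by(id) %>%
  mutate(
    # how many of the three relevant job types this person has held
    all_jobs = sum(unique(job) %in% c("Sales", "Manager", "Boss")),
    # longest consecutive Manager spell for this person
    cumu_max = max(cumu_job2)
  ) %>%
  # keep people who held all three jobs, and only their Sales and Boss rows
  filter(all_jobs == 3, job %in% c("Sales", "Boss"))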
Source: local data frame [5 x 8]
Groups: id
id name year job job2 cumu_job2 all_jobs cumu_max
1 1 Jane 1982 Sales 0 0 3 2
2 1 Jane 1983 Sales 0 0 3 2
3 1 Jane 1986 Boss 0 0 3 2
4 2 Bob 1986 Sales 0 0 3 2
5 2 Bob 1989 Boss 0 0 3 2
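The filter above only checks that a person has held all three jobs, not the order in which they were held. One possible variation, offered as a sketch rather than a definitive solution, tests directly whether a Sales year precedes the first Boss year, so that both Sales -> Manager -> Boss and Manager -> Sales -> Boss careers are kept. It assumes that all of Kevin's rows share id 3 (as in the desired output in the question), that Jill is given id 4, and that cumu_max should appear on every kept row; first_boss, last_sales and res are hypothetical names introduced for illustration.

```r
library(dplyr)

# Data from the question plus Kevin (Manager -> Sales -> Boss) and
# Jill (no Sales history, so she should be dropped).
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "id name year job job2 cumu_job2
1 Jane 1980 Worker 0 0
1 Jane 1981 Manager 1 1
1 Jane 1982 Sales 0 0
1 Jane 1983 Sales 0 0
1 Jane 1984 Manager 1 1
1 Jane 1985 Manager 1 2
1 Jane 1986 Boss 0 0
2 Bob 1985 Worker 0 0
2 Bob 1986 Sales 0 0
2 Bob 1987 Manager 1 1
2 Bob 1988 Manager 1 2
2 Bob 1989 Boss 0 0
3 Kevin 1991 Manager 1 1
3 Kevin 1992 Manager 1 2
3 Kevin 1993 Sales 0 0
3 Kevin 1994 Boss 0 0
4 Jill 1989 Worker 0 0
4 Jill 1990 Boss 0 0")

res <- dat %>%
  group_by(id) %>%
  mutate(
    # First year as Boss; Inf if the person never became Boss.
    first_boss = min(year[job == "Boss"], Inf),
    # Last Sales year before becoming Boss; -Inf if there is none.
    last_sales = max(year[job == "Sales" & year < first_boss], -Inf),
    # Longest consecutive Manager spell over the whole career.
    cumu_max = max(cumu_job2)
  ) %>%
  filter(
    is.finite(first_boss), is.finite(last_sales),  # Sales came before Boss
    year == first_boss | year == last_sales        # keep only those two rows
  ) %>%
  ungroup() %>%
  select(id, name, year, job, cumu_job2, cumu_max)

print(res)
```

This keeps one Sales row and one Boss row each for Jane, Bob and Kevin, and drops Jill. If Manager years after becoming Boss should not count, cumu_max could be restricted to the rows with year < first_boss.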