查找值是否在其他列的范围内 [英] Finding if a value is within the range of other columns
问题描述
我有一个数据框 df
看起来像这样:
I have a dataframe df
which looks like this:
df <- read.table(text =
"ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_LS Q2_overall
1 1 2 3 1 2 2
2 0 NA NA 2 1 1
3 2 1 1 3 4 0
4 1 0 2 4 0 2
5 NA 1 NA 0 NA 0
6 2 0 1 1 NA NA"
, header = TRUE)
期望输出:
进一步解释一下,我想要的输出如下:
Desired Output:
To explain a little further, my desired output is as below:
ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_LS Q2_overall Q1_check Q2_check
1 1 2 3 1 2 2 "above" "within"
2 0 NA NA 2 1 1 NA "within"
3 2 1 1 3 4 0 "within" "below"
4 1 0 2 4 0 2 "above" "within"
5 NA 1 NA 0 NA 0 NA "within"
6 2 0 1 1 NA NA "within" NA
说明:
示例 1:
根据Q1_PM
和Q1_TP
列中的值,我想看看Q1_overall
列中的值是否in 他们的范围与否?如果不在范围内,值是高于还是低于范围?为了跟踪这一点,我想添加一个额外的列 Q1_check
.
Explanation:
Example 1:
Based on the value in columns Q1_PM
and Q1_TP
, I want to see whether the value in column Q1_overall
is within their range or not? If, not in range, is the value above or below the range? To track this, I want to add an additional column Q1_check
.
同样,根据Q2_PM
和Q2_LS
的值,我想检查Q2_overall
的值是否在 他们的范围与否?如果不在范围内,是高于还是低于范围?同样,为了跟踪这一点,我想添加一个额外的列 Q2_check
Similarly, based on the values of Q2_PM
and Q2_LS
, I want to check if the value of Q2_overall
is within their range or not? If not in range, is it above or below the range? Again, to track this, I want to add an additional column Q2_check
1- 为此,我想添加额外的列 Q1_check
和 Q2_check
,其中第一列用于涉及 Q1
项目的比较第二列是涉及Q2
项的比较.
1- For this, I want to add additional columns Q1_check
and Q2_check
where the first column is for the comparisons that involve Q1
items and the second column is for the comparisons that involve Q2
items.
2- 列可以包含以下值:above
、below
和 within
.
2- The columns could contain the following values: above
, below
and within
.
3- 当名为 overall
的列有 NAs
时,额外的列也可能有 NAs
.
3- The case when the columns named overall
have NAs
, then the extra columns could also have NAs
.
我找过相关的帖子,例如:将具有取决于另一列的值的列添加到数据框和 通过将数字列与固定值但我遇到了如下所述的错误.
I have looked for related posts such as: Add column with values depending on another column to a dataframe and Create categories by comparing a numeric column with a fixed value but I am running into errors as discussed below.
我能想到的唯一解决方案是:
The only solution, I can think of is, along these lines:
df$Q1_check <- ifelse(data$Q1_overall < data$Q1_PM, 'below',
ifelse(data$Q1_overall > data$Q1_TP, 'above',
ifelse(is.na(data$Q1_overall), NA, 'within')))
但它会导致以下错误:Error in data$Q1_overall : object of type 'closure' is not subsettable
.我不明白可能的问题是什么.
But it results in following error: Error in data$Q1_overall : object of type 'closure' is not subsettable
. I do not understand what the possible issue could be.
或
df %>%
mutate(Regulation = case_when(Q1_overall < Q1_PM ~ 'below',
Q1_overall > Q1_TP ~ 'above',
Q1_PM < Q1_overall < Q1_TP, 'within'))
这也会导致错误 Error: unexpected '<'在:"Q1_overall>Q1_TP ~ '上面', Q1_PM <Q1_整体<"
如果(假设)列是这些,如何扩展解决方案:
How can the solution be extended if (let's say) the columns are these:
"Q1 Comm - 01 Scope Thesis"
"Q1 Comm - 02 Scope Project"
"Q1 Comm - 03 Learn Intern"
"Q1 Comm - 04 Biography"
"Q1 Comm - 05 Exhibit"
"Q1 Comm - 06 Social Act"
"Q1 Comm - 07 Post Project"
"Q1 Comm - 08 Learn Plant"
"Q1 Comm - 09 Study Narrate"
"Q1 Comm - 10 Learn Participate"
"Q1 Comm - 11 Write 1"
"Q1 Comm - 12 Read 2"
"Q1 Comm - Overall Study Plan"
我们如何识别Q1 Comm-Overall Study Plan
列是什么时候:
How can we identify when the column Q1 Comm - Overall Study Plan
is:
1 - Below
所有其他列的 min()
,或
1 - Below
the min()
of all the other columns, or
2 - Above
所有其他列的 max()
,或
2 - Above
the max()
of all the other columns, or
3 - 在
所有其他列的范围内
3 - Within
the range of all the other columns
对于更新的字段,我还包括 dput(df)
For the updated fields, I am also including the dput(df)
dput(df)
structure(list(ï..ID = c(10L, 31L, 225L, 243L), Q1.Comm...01.Scope.Thesis = c(NA,
2L, 0L, NA), Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L),
Q1.Comm...03.Learn.Intern = c(4L, NA, NA, NA), Q1.Comm...04.Biography = c(NA,
NA, NA, 1L), Q1.Comm...05.Exhibit = c(4L, 2L, NA, NA), Q1.Comm...06.Social.Act = c(NA,
NA, NA, 3L), Q1.Comm...07.Post.Project = c(NA, NA, 3L, NA
), Q1.Comm...08.Learn.Plant = c(NA, NA, NA, 4L), Q1.Comm...09.Study.Narrate = c(NA,
NA, 0L, NA), Q1.Comm...10.Learn.Participate = c(4L, NA, NA,
NA), Q1.Comm...11.Write.1 = c(NA, 2L, NA, NA), Q1.Comm...12.Read.2 = c(NA,
NA, 1L, NA), Q1.Comm...Overall.Study.Plan = c(4L, 1L, 2L,
NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA), X.2 = c(NA,
NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L
))
关于如何实现这一目标的任何建议将不胜感激.谢谢!
Any advice on how to achieve this would be greatly appreciated. Thank you!
推荐答案
comparison <- function(dt, group_cols, new_col, compare_col){
dt[,
c("min", "max") := transpose(pmap(.SD, range, na.rm = TRUE)), .SDcols = group_cols
][,(new_col) := fcase(
is.na(get(compare_col)), NA_character_,
get(compare_col) < min, "below",
get(compare_col) > max, "above",
default = "within"
)
][]
}
group_cols <- names(df) %>%
str_subset("^Q[0-9]+") %>%
str_subset("overall", negate = TRUE) %>%
split(str_extract(., "^Q[0-9]+"))
new_cols <- names(group_cols) %>% str_c("_check")
compare_cols <- names(group_cols) %>% str_c("_overall")
setDT(df)
pwalk(list(group_cols, new_cols, compare_cols), ~comparison(df, ...))
df[, c("min", "max") := NULL]
这篇关于查找值是否在其他列的范围内的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!