Extracting columns with constant numbers in R data.frames
Question
In data.frame `DATA`, I have some columns that are constant numbers across the rows of each unique value in the first column, called `study.name`. For example, columns `ESL` and `prof` are constant for all rows of `Shin.Ellis`, constant for all rows of `Trus.Hsu`, and so on. Including `Shin.Ellis` and `Trus.Hsu`, there are 8 unique `study.name` values.
But after my `split.default()` call below, how can I obtain only one data point per unique `study.name` (e.g., one for `Shin.Ellis`, one for `Trus.Hsu`, etc.) for such constant variables (i.e., 8 rows overall)?
For example, after my `split.default()` call, every variable named `ESL` should have only 8 rows, one for each unique `study.name`.
My desired output for ONLY `ESL` and `prof` is shown further below.
NOTE: This is toy data. We should first find the constant variables. A functional answer is highly appreciated.
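# Read the toy data, dropping columns 2 and 3
DATA <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr.csv", header = TRUE)[-(2:3)]
# Strip trailing numeric suffixes such as ".1" from the column names
DATA <- setNames(DATA, sub("\\.\\d+$", "", names(DATA)))
# Keep the column names that occur most often, then split those columns by name
tbl <- table(names(DATA))
nm2 <- names(which(tbl == max(tbl)))
L <- split.default(DATA[names(DATA) %in% nm2], names(DATA)[names(DATA) %in% nm2])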
## FIRST 8 ROWS of `DATA`:
#   study.name ESL prof scope type ESL prof scope type
# 1 Shin.Ellis   1    2     1    1   1    2     1    1
# 2 Shin.Ellis   1    2     1    1   1    2     1    1
# 3 Shin.Ellis   1    2     1    2   1    2     1    1
# 4 Shin.Ellis   1    2     1    2   1    2     1    1
# 5 Shin.Ellis   1    2    NA   NA   1    2    NA   NA
# 6 Shin.Ellis   1    2    NA   NA   1    2    NA   NA
# 7   Trus.Hsu   2    2     2    1   2    2     1    1
# 8   Trus.Hsu   2    2    NA   NA   2    2    NA   NA
# .        ...   .    .     .    .   .    .     .    .   # `DATA` has 54 rows overall
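For readers unfamiliar with the `split.default()` call above, here is a minimal sketch of what it does to a data frame with repeated column names. The data frame `toy` and its columns `a` and `b` are hypothetical and are not part of `DATA`.

```r
# Hypothetical toy frame with repeated column names (e.g. two coders per variable).
toy <- data.frame(a = 1:3, b = 4:6, a = 7:9, b = 10:12, check.names = FALSE)

# split.default() treats the data frame as a list of columns and
# groups the columns by the supplied factor, here the column names.
parts <- split.default(toy, names(toy))

names(parts)    # one list element per distinct column name
ncol(parts$a)   # both columns originally named "a" end up in the same element
```

Each list element is itself a data frame holding all columns that shared a name, which is why `L$ESL` in the question contains more than one `ESL` column.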
Desired output for `ESL` and `prof` after the `split.default()` call:
# $ESL ## 8 unique rows for 8 unique `study.name`
# ESL ESL.1
# 1 1 1
# 7 2 2
# 9 1 1
# 17 1 1
# 23 1 1
# 35 1 1
# 37 2 2
# 49 2 2
# $prof ## 8 unique rows for 8 unique `study.name`
# prof prof.1
# 1 2 2
# 7 2 2
# 9 3 3
# 17 2 2
# 23 2 2
# 35 2 2
# 37 NA NA
# 49 2 2
Recommended answer
We can first find the constant columns, then use `lapply` to loop over them and keep only their first row within each `study.name`.
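# TRUE when a vector holds a single distinct value
is_constant <- function(x) length(unique(x)) == 1L
# Names of the columns that are constant within every study.name
cols <- names(Filter(all, aggregate(. ~ study.name, DATA, is_constant)[-1]))
# For those columns, keep only the first row of each study.name
L[cols] <- lapply(L[cols], function(x)
  x[ave(x[[1]], DATA$study.name, FUN = seq_along) == 1, ])
L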
#$ESL
# ESL ESL.1
#1 1 1
#7 2 2
#9 1 1
#17 1 1
#23 1 1
#35 1 1
#37 2 2
#49 2 2
#$prof
# prof prof.1
#1 2 2
#7 2 2
#9 3 3
#17 2 2
#23 2 2
#35 2 2
#37 NA NA
#49 2 2
#.....
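The same two steps (detect the columns that are constant within each group, then keep one row per group) can be tried on a small self-contained example. This is only a sketch under stated assumptions: `toy`, `value_cols`, `const_cols` and `result` are hypothetical names, and it swaps in `tapply()` and `duplicated()` for the `aggregate()` and `ave()` calls used in the answer.

```r
# Hypothetical toy data (not the asker's CSV): `ESL` and `prof` are constant
# within each study, `scope` is not.
toy <- data.frame(
  study.name = c("A", "A", "A", "B", "B"),
  ESL        = c(1, 1, 1, 2, 2),
  prof       = c(2, 2, 2, 3, 3),
  scope      = c(1, 2, 1, 1, 1)
)

# TRUE when a vector holds a single distinct value
is_constant <- function(x) length(unique(x)) == 1L

# For every non-grouping column, check that it is constant in each study.
value_cols <- setdiff(names(toy), "study.name")
constant_in_all <- vapply(
  toy[value_cols],
  function(col) all(tapply(col, toy$study.name, is_constant)),
  logical(1)
)
const_cols <- names(constant_in_all)[constant_in_all]

# Keep the first row of each study, restricted to the constant columns.
first_rows <- !duplicated(toy$study.name)
result <- toy[first_rows, c("study.name", const_cols)]
result
```

Note that `unique()` treats `NA` as a value of its own, so under this definition a column that is entirely `NA` within a study counts as constant, while a mix of `NA` and a number does not.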