Use lapply on a subset of list elements and return a list of the same length as the original in R
Question
I want to apply a regex operation to a subset of list elements (which are character strings) using lapply and return a list of the same length as the original. The list elements are long strings (derived from reading in long text files and collapsing paragraphs into a single string). The regex operation is valid only for the subset of list elements/strings. I want the non-subsetted list elements (character strings) to be returned in their original state.
The regex operation is str_extract from the stringr package, i.e. I want to extract a substring from a longer string. I subset the list elements based on a regex pattern in the filename.
Example with simplified data:
library(stringr)
texts <- as.list(c("abcdefghijkl", "mnopqrstuvwxyz", "ghijklmnopqrs", "uvwxyzabcdef"))
filenames <- c("AB1997R.txt", "BG2000S.txt", "MN1999R.txt", "DC1997S.txt")
names(texts) <- filenames
regexp <- "abcdef"
I know in advance to which strings I want to apply the regex operation, and hence I want to subset these strings. That is, I don't want to run the regex over all elements in the list, as doing so will return some invalid results (which is not apparent in this simplified example).
I've made a few naive efforts, e.g.:
x <- lapply(texts[str_detect(names(texts), "1997")], str_extract, regexp)
> x
$AB1997R.txt
[1] "abcdef"
$DC1997S.txt
[1] "abcdef"
which returns a reduced-length list containing just the substrings found. But the results I want to get are:
> x
$AB1997R.txt
[1] "abcdef"
$BG2000S.txt
[1] "mnopqrstuvwxyz"
$MN1999R.txt
[1] "ghijklmnopqrs"
$DC1997S.txt
[1] "abcdef"
where the strings not containing the regex pattern are returned in their original state.
I have informed myself about stringr, lapply and llply (in the plyr package), but many operations are illustrated using dataframes as examples, not lists, and don't involve regex operations on character strings. I can achieve my goal using a for loop, but I'm trying to get away from that, as is generally advised, and get better at using the apply-class of functions.
Recommended answer
You can use the subset assignment operator [<- to overwrite only the selected elements of a copy of the list:
x <- texts
is1997 <- str_detect(names(texts), "1997")
x[is1997] <- lapply(texts[is1997], str_extract, regexp)
x
# $AB1997R.txt
# [1] "abcdef"
#
# $BG2000S.txt
# [1] "mnopqrstuvwxyz"
#
# $MN1999R.txt
# [1] "ghijklmnopqrs"
#
# $DC1997S.txt
# [1] "abcdef"
#
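The same idea can be wrapped in a small helper. The following is only a sketch, not part of the original answer: lapply_where and extract_first are hypothetical names introduced for illustration, and base R's grepl, regexpr and regmatches stand in for str_detect and str_extract so that the example runs without extra packages.

```r
# Hypothetical helper: apply FUN only to the elements selected by `idx`
# (a logical vector), leaving all other elements of the list unchanged.
lapply_where <- function(x, idx, FUN, ...) {
  stopifnot(is.logical(idx), length(idx) == length(x))
  x[idx] <- lapply(x[idx], FUN, ...)
  x
}

# Hypothetical base R stand-in for stringr::str_extract on a single string:
# returns the first match, or NA if the pattern is not found.
extract_first <- function(string, pattern) {
  m <- regmatches(string, regexpr(pattern, string))
  if (length(m) == 0) NA_character_ else m
}

# Same simplified data as in the question.
texts <- as.list(c("abcdefghijkl", "mnopqrstuvwxyz", "ghijklmnopqrs", "uvwxyzabcdef"))
names(texts) <- c("AB1997R.txt", "BG2000S.txt", "MN1999R.txt", "DC1997S.txt")

# Select elements by filename, then transform only those.
is1997 <- grepl("1997", names(texts))
x <- lapply_where(texts, is1997, extract_first, pattern = "abcdef")
str(x)
```

Because the result is assigned back into a copy of the full list, its length and names always match the input, whichever elements the logical index selects.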