Determine frequency of string using grep
Problem description
If I have a vector
x <- c("ajjss", "acdjfkj", "auyjyjjksjj")
and run:
y <- x[grep("jj", x)]
table(y)
I get:
y
      ajjss auyjyjjksjj
          1           1
However, the second string, "auyjyjjksjj", contains the substring "jj" twice and should be counted twice. How can I change this from a true/false test into an actual count of how often "jj" occurs?
Also, it would be great if, for each string, the frequency of the substring divided by the string's length could be calculated.
Thanks in advance.
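I solved this using gregexpr(), which returns the start positions of every match in each string (or -1 when there is no match):
x <- c("ajjss", "acdjfkj", "auyjyjjksjj")
# count the match positions per string; a first position of -1 means "no match"
freq <- sapply(gregexpr("jj", x), function(m) if (m[[1]] != -1) length(m) else 0)
df <- data.frame(x, freq)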
df
#            x freq
#1       ajjss    1
#2     acdjfkj    0
#3 auyjyjjksjj    2
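As a possible alternative to the -1 check, the same counts can be sketched by extracting the matches with regmatches() and measuring each result with lengths(). The helper name count_matches is hypothetical and used only for illustration; fixed = TRUE is an assumption that the pattern is a literal substring rather than a regular expression.

```r
# Hypothetical helper: count non-overlapping occurrences of a literal
# substring in each element of a character vector.
count_matches <- function(strings, pattern) {
  # gregexpr() finds every match; regmatches() extracts them as a list of
  # character vectors, with character(0) for strings that have no match.
  matches <- regmatches(strings, gregexpr(pattern, strings, fixed = TRUE))
  # lengths() returns the number of extracted matches per string (0 if none).
  lengths(matches)
}

x <- c("ajjss", "acdjfkj", "auyjyjjksjj")
freq <- count_matches(x, "jj")
print(freq)
# [1] 1 0 2
```

Matches are counted without overlap, so a run such as "jjjj" counts as two occurrences of "jj".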
For the last part of the question, calculating frequency / string length:
df$rate <- df$freq / nchar(as.character(df$x))
It is necessary to convert df$x back to a character string because, in R versions before 4.0.0, data.frame(x, freq) automatically converts strings to factors unless you specify stringsAsFactors = FALSE, and nchar() does not accept a factor.
df
#            x freq      rate
#1       ajjss    1 0.2000000
#2     acdjfkj    0 0.0000000
#3 auyjyjjksjj    2 0.1818182
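A minimal sketch of the other route mentioned above: passing stringsAsFactors = FALSE when building the data frame keeps x as character, so nchar() can be applied without as.character(). This is a variation on the answer rather than a replacement for it; the variable names follow the original code.

```r
x <- c("ajjss", "acdjfkj", "auyjyjjksjj")

# Same counting step as in the answer: number of "jj" matches per string.
freq <- sapply(gregexpr("jj", x), function(m) if (m[[1]] != -1) length(m) else 0)

# Keep x as a character column regardless of the R version's default.
df <- data.frame(x, freq, stringsAsFactors = FALSE)

# Occurrences of "jj" divided by the length of each string.
df$rate <- df$freq / nchar(df$x)
print(df)
```

The rates are the same as in the output above (1/5, 0 and 2/11).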