Summarize from string matches
Problem description
I have this df column:
df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld", "asdthreeasp", "asdfetwoasd", "fouroqwke","okasdtwo", "acmofour", "porefour", "okstwo"))
> df
Strings
1 ñlas onepojasd
2 onenañdsl
3 ñelrtwofkld
4 asdthreeasp
5 asdfetwoasd
6 fouroqwke
7 okasdtwo
8 acmofour
9 porefour
10 okstwo
I know that each value in df$Strings will match one of the words one, two, three or four, and that it will match exactly one of those words. So, to match them:
library(stringr)
str_detect(df$Strings, "one")
str_detect(df$Strings, "two")
str_detect(df$Strings, "three")
str_detect(df$Strings, "four")
However, I'm stuck here, as I'm trying to produce this table:
Homes Quantity Percent
One 2 0.2
Two 4 0.4
Three 1 0.1
Four 3 0.3
Total 10 1
Recommended answer
With tidyverse and janitor you can do:
library(tidyverse)
library(janitor)

df %>%
  mutate(Homes = str_extract(Strings, "one|two|three|four"),
         n = n()) %>%
  group_by(Homes) %>%
  summarise(Quantity = length(Homes),
            Percent = first(length(Homes)/n)) %>%
  adorn_totals("row")
Homes Quantity Percent
four 3 0.3
one 2 0.2
three 1 0.1
two 4 0.4
Total 10 1.0
Or using only tidyverse:
df %>%
  mutate(Homes = str_extract(Strings, "one|two|three|four"),
         n = n()) %>%
  group_by(Homes) %>%
  summarise(Quantity = length(Homes),
            Percent = first(length(Homes)/n)) %>%
  rbind(., data.frame(Homes = "Total", Quantity = sum(.$Quantity),
                      Percent = sum(.$Percent)))
In both cases, the code first extracts the matching word from each string and records the total number of cases. Second, it groups by the matched word. Third, it computes the number of cases per word and each word's proportion of all cases. Finally, it adds a "Total" row.
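The same four steps can also be sketched more compactly. The version below is not part of the original answer: it assumes dplyr's count() for the group-and-tally step and bind_rows() for the totals row, and it stores the table in a variable named result, which is chosen here purely for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld",
                             "asdthreeasp", "asdfetwoasd", "fouroqwke",
                             "okasdtwo", "acmofour", "porefour", "okstwo"))

result <- df %>%
  # Step 1: extract whichever of the four words occurs in each string
  mutate(Homes = str_extract(Strings, "one|two|three|four")) %>%
  # Steps 2 and 3: group by the matched word and count rows per group
  count(Homes, name = "Quantity") %>%
  # Step 3 (continued): each word's share of all rows
  mutate(Percent = Quantity / sum(Quantity)) %>%
  # Step 4: append a "Total" row built from the per-word table
  bind_rows(summarise(., Homes = "Total",
                      Quantity = sum(Quantity),
                      Percent = sum(Percent)))

print(result)
```

As in the answer above, the rows come out in alphabetical order of the matched word (four, one, three, two), followed by the Total row. Dividing by sum(Quantity) avoids carrying the helper column n through the pipeline.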