Summarize from string matches


Question

I have this df column:

df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld", "asdthreeasp", "asdfetwoasd", "fouroqwke","okasdtwo", "acmofour", "porefour", "okstwo"))
> df
          Strings
1  ñlas onepojasd
2       onenañdsl
3     ñelrtwofkld
4     asdthreeasp
5     asdfetwoasd
6       fouroqwke
7        okasdtwo
8        acmofour
9        porefour
10         okstwo

I know that each value in df$Strings will match one of the words one, two, three or four, and that it will match only ONE of those words. So to match them:

library(stringr)  # provides str_detect()

str_detect(df$Strings, "one")
str_detect(df$Strings, "two")
str_detect(df$Strings, "three")
str_detect(df$Strings, "four")

However, I'm stuck here, as I'm trying to produce this table:

Homes  Quantity Percent
  One         2     0.2
  Two         4     0.4
Three         1     0.1
 Four         3     0.3
Total        10       1

Recommended answer

With tidyverse and janitor you can do:

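library(tidyverse)  # dplyr verbs, stringr::str_extract(), %>%
library(janitor)    # adorn_totals()

df %>%
 mutate(Homes = str_extract(Strings, "one|two|three|four"),  # matched word
        n = n()) %>%                                          # total number of rows
 group_by(Homes) %>%
 summarise(Quantity = length(Homes),                          # rows per word
           Percent = first(length(Homes)/n)) %>%              # share of all rows
 adorn_totals("row")                                          # append the Total row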

 Homes Quantity Percent
  four        3     0.3
   one        2     0.2
 three        1     0.1
   two        4     0.4
 Total       10     1.0

Or using only tidyverse:

df %>%
 mutate(Homes = str_extract(Strings, "one|two|three|four"),
        n = n()) %>%
 group_by(Homes) %>%
 summarise(Quantity = length(Homes),
           Percent = first(length(Homes)/n)) %>%
 rbind(., data.frame(Homes = "Total", Quantity = sum(.$Quantity),
                     Percent = sum(.$Percent)))

In both cases the code first extracts the matching word from each string and records the total number of rows. Second, it groups by the matched word. Third, it computes the number of rows per word and that word's share of all rows. Finally, it adds a "Total" row.
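
For comparison, the same four steps can be sketched in base R without any packages. This is only an illustrative alternative, not part of the original answer. The names words, pattern, counts and result are made up for the example, and it relies on the question's assumption that every string contains exactly one of the four words.

```r
# Sample data from the question
df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld",
                             "asdthreeasp", "asdfetwoasd", "fouroqwke",
                             "okasdtwo", "acmofour", "porefour", "okstwo"))

# Hypothetical helper names, chosen for this sketch only
words   <- c("one", "two", "three", "four")
pattern <- paste(words, collapse = "|")   # "one|two|three|four"

# Step 1: extract the matched word from each string.
# regmatches() silently drops non-matching strings, so this assumes
# every string contains one of the words, as stated in the question.
Homes <- regmatches(df$Strings, regexpr(pattern, df$Strings))

# Steps 2 and 3: count rows per word and compute each word's share.
# Fixing the factor levels keeps the rows in the order one, two, three, four.
counts <- table(factor(Homes, levels = words))
result <- data.frame(Homes    = names(counts),
                     Quantity = as.vector(counts),
                     Percent  = as.vector(counts) / sum(counts),
                     stringsAsFactors = FALSE)

# Step 4: append the Total row
result <- rbind(result,
                data.frame(Homes    = "Total",
                           Quantity = sum(result$Quantity),
                           Percent  = sum(result$Percent),
                           stringsAsFactors = FALSE))

print(result)
```

Setting the factor levels explicitly is a design choice: it returns the rows in the order requested in the question rather than the alphabetical order that group_by() produces.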
