Stringr pattern to detect capitalized words

Views: 24 Published: 2021/8/31 18:47:03 Tags: r stringr


Question

I am trying to write a function that detects words written entirely in capital letters.

My current code:

library(dplyr)   # %>% and mutate()
library(tibble)  # add_row()
library(stringr) # str_extract_all()

df <- data.frame(title = character(), id = numeric())%>%
        add_row(title= "THIS is an EXAMPLE where I DONT get the output i WAS hoping for", id = 6)

df <- df %>%
        mutate(sec_code_1 = unlist(str_extract_all(title," [A-Z]{3,5} ")[[1]][1]) 
               , sec_code_2 = unlist(str_extract_all(title," [A-Z]{3,5} ")[[1]][2]) 
               , sec_code_3 = unlist(str_extract_all(title," [A-Z]{3,5} ")[[1]][3]))
df

The output is:

title	id	sec_code_1	sec_code_2	sec_code_3
THIS is an EXAMPLE where I DONT get the output i WAS hoping for	6	DONT	WAS	

The first 3-5 letter capitalized word is "THIS"; the second should skip "EXAMPLE" (more than 5 letters) and be "DONT"; the third should be "WAS". That is:

title	id	sec_code_1	sec_code_2	sec_code_3
THIS is an EXAMPLE where I DONT get the output i WAS hoping for	6	THIS	DONT	WAS

Does anyone know where I'm going wrong? Specifically, how can I express "space or beginning of string" and "space or end of string" using stringr?

Recommended answer

If you run the code with your regex, you'll see that 'THIS' is not included in the output at all.

str_extract_all(df$title," [A-Z]{3,5} ")[[1]]
#[1] " DONT " " WAS "

This is because the pattern only matches words that have both a leading and a trailing space. 'THIS' has no leading space because it is at the start of the sentence, so it does not satisfy the pattern. You can use word boundaries (\\b) instead.

str_extract_all(df$title,"\\b[A-Z]{3,5}\\b")[[1]]
#[1] "THIS" "DONT" "WAS"
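
If you want the literal "whitespace or start of string" and "whitespace or end of string" logic rather than word boundaries, negative lookarounds are one way to write it. The following is a minimal sketch, not part of the original answer; the names `ws_pattern` and `ws_codes` are only illustrative. Unlike \\b, this version rejects words that touch punctuation.

```r
library(stringr)

title <- "THIS is an EXAMPLE where I DONT get the output i WAS hoping for"

# (?<!\\S): not preceded by a non-whitespace character,
#           i.e. preceded by whitespace or the start of the string
# (?!\\S):  not followed by a non-whitespace character,
#           i.e. followed by whitespace or the end of the string
ws_pattern <- "(?<!\\S)[A-Z]{3,5}(?!\\S)"

ws_codes <- str_extract_all(title, ws_pattern)[[1]]
ws_codes
#[1] "THIS" "DONT" "WAS"

# The two patterns differ when punctuation touches a word
str_extract_all("USD-EUR rate", "\\b[A-Z]{3,5}\\b")[[1]]
#[1] "USD" "EUR"
str_extract_all("USD-EUR rate", ws_pattern)[[1]]
#character(0)
```

Because the lookarounds are zero-width, the surrounding spaces are not part of the match, so no trimming is needed afterwards.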

Your code works if you use the \\b[A-Z]{3,5}\\b pattern in it.
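
Note that `[[1]]` in the question's code only looks at the matches of the first row, so it would repeat those values if the data frame had more rows. A possible row-wise sketch is shown below; the second row is hypothetical and only there to show the `NA` padding.

```r
library(stringr)

df <- data.frame(
  title = c(
    "THIS is an EXAMPLE where I DONT get the output i WAS hoping for",
    "a second, hypothetical row with ONE code"  # not from the question
  ),
  id = c(6, 7)
)

# Extract once; the result is a list with one character vector per row
codes <- str_extract_all(df$title, "\\b[A-Z]{3,5}\\b")

# Take the n-th match of every row; rows with fewer matches get NA
df$sec_code_1 <- sapply(codes, `[`, 1)
df$sec_code_2 <- sapply(codes, `[`, 2)
df$sec_code_3 <- sapply(codes, `[`, 3)
df
```

Extracting once and indexing afterwards also avoids running the same regex three times.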

Alternatively, you can use:

library(tidyverse)

df %>%
  mutate(code = str_extract_all(title,"\\b[A-Z]{3,5}\\b")) %>%
  unnest_wider(code) %>%
  rename_with(~paste0('sec_code_', seq_along(.)), starts_with('..'))

# title                                     id sec_code_1 sec_code_2 sec_code_3
#  <chr>                                  <dbl> <chr>      <chr>      <chr>     
#1 THIS is an EXAMPLE where I DONT get t…     6 THIS       DONT       WAS
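
The `rename_with()` step relies on `unnest_wider()` generating automatic column names that start with `..`. Depending on the installed tidyr version, this may not happen for unnamed vectors unless `names_sep` is supplied. A hedged variant that names the columns directly, assuming a tidyr version whose `unnest_wider()` numbers unnamed elements when `names_sep` is given (the name `out` is only illustrative):

```r
library(dplyr)
library(stringr)
library(tidyr)

df <- data.frame(
  title = "THIS is an EXAMPLE where I DONT get the output i WAS hoping for",
  id = 6
)

out <- df %>%
  # Store all matches of each row in a list-column called sec_code
  mutate(sec_code = str_extract_all(title, "\\b[A-Z]{3,5}\\b")) %>%
  # names_sep = "_" builds the new column names from the list-column
  # name plus the position of each match: sec_code_1, sec_code_2, ...
  unnest_wider(sec_code, names_sep = "_")

out
```

With this approach the list-column name doubles as the prefix, so no renaming step is needed.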
