Extract text between certain symbols using Regular Expression in R

Views: 43 | Published: 2021/7/6 19:18:57 | Tags: regex, r

Problem description

I have a series of expressions such as:

"<i>the text I need to extract</i></b></a></div>"

I need to extract the text between the <i> and </i> "symbols". That is, the result should be:

"the text I need to extract"

At the moment I am using gsub in R to manually remove all the symbols that are not text. However, I would like to use a single regular expression to do the job. Does anyone know a regular expression that extracts the text between <i> and </i>?

Thanks.

Recommended answer

If there is only one <i>...</i> pair, as in the example, then match everything up to and including <i>, and everything from </i> onward, and replace both matches with the empty string:

x <- "<i>the text I need to extract</i></b></a></div>"
gsub(".*<i>|</i>.*", "", x)

giving:

[1] "the text I need to extract"
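The same single-pair extraction can also be written with a capture group, which may make it easier to see why the pattern above only suits the one-pair case. This is a minimal sketch rather than part of the original answer; the variable names one, y and last_only and the two-pair test string are made up for illustration.

```r
x <- "<i>the text I need to extract</i></b></a></div>"

# Capture the text between the tags and keep only the captured group.
# perl = TRUE is used so that the lazy quantifier .*? follows PCRE semantics.
one <- sub(".*?<i>(.*?)</i>.*", "\\1", x, perl = TRUE)
print(one)

# Hypothetical string with two <i>...</i> pairs: the strip-both-sides
# pattern keeps only the last one, because the greedy .*<i> consumes
# everything up to the final <i>.
y <- "<i>first</i> and <i>second</i>"
last_only <- gsub(".*<i>|</i>.*", "", y)
print(last_only)
```

Note that sub returns its input unchanged when the pattern does not match, so a string without an <i>...</i> pair would come back as-is rather than as an empty string.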

If there could be multiple occurrences in the same string, then try a non-greedy match with a capture group:

library(gsubfn)
strapplyc(x, "<i>(.*?)</i>", simplify = c)

which gives the same result in this example.
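If installing gsubfn is not an option, roughly the same multi-occurrence extraction can be sketched in base R with gregexpr and regmatches. This is an alternative technique, not the one from the answer; the test string y and the names m, spans and texts are assumptions for illustration.

```r
# Hypothetical input with two <i>...</i> pairs.
y <- "<i>first</i> and <i>second</i>"

# Find every <i>...</i> span; .*? is lazy, so each match stops at the
# nearest closing </i> instead of running on to the last one.
m <- gregexpr("<i>.*?</i>", y, perl = TRUE)
spans <- regmatches(y, m)[[1]]

# Strip the surrounding tags from each matched span.
texts <- gsub("^<i>|</i>$", "", spans)
print(texts)
```

Here the matching and the tag stripping are two separate steps, whereas the capture group in the strapplyc call does both at once.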
