R - error in separating text from a string using regex and ifelse condition


Problem description

What I want to do is split a string wherever it contains a ":".

Suppose my text contains:

 text$Text[[3]] = "There is a horror movie running in the iNox theater. : Can we go?"

I want to create a data frame like this:

  Col1                                                    Col2
  There is a horror movie running in the iNox theater.    Can we go?

I am trying the following:

 df = data.frame(Text = strsplit(text$Text[[3]], 
                 ifelse(":", ":", text$Text[[3]]))[[1]], stringsAsFactors = F)

I use [[3]] because the text is in row 3 of text$Text.

But the ifelse() logic above did not work. I was trying to write a condition such that if there is a ":" in the text, the split uses ":", and otherwise the complete text is kept as it is. So when there is no ":", the result should look something like this:

 text$Text[[3]] = "Hi Mom, You there. Can I go to Jimmy's house?"

 Col1                                                 Col2
 Hi Mom, You there. Can I go to Jimmy's house?         NA

How can this be done correctly?

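Two follow-up questions:

  • What if the text contains two ":"?

  • What if I want to consider only the ":" in the first two lines and not those in the rest of the text?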

Recommended answer

You don't really need an if/else statement for this. Regex is built to handle conditions like this.
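As a side note on why the original attempt fails: ifelse() expects a logical test as its first argument, and the bare string ":" is not one. A minimal sketch, assuming the goal was to test whether the string contains a colon, would use grepl() for the test instead (the variable names here are illustrative and not from the question):

```r
x <- "There is a horror movie running in the iNox theater. : Can we go?"

# ":" cannot be interpreted as TRUE or FALSE, so ifelse() returns NA
bad_test <- ifelse(":", ":", x)

# grepl() returns a real logical: TRUE when x contains a colon
has_colon <- grepl(":", x, fixed = TRUE)

# With a logical test, ifelse() picks the separator as intended
separator <- ifelse(has_colon, ":", x)
```

Even with the test fixed, the regex approach is shorter, because the pattern itself decides whether there is anything to split.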

For the first case, where the data contains just one symbol (a colon, ":", in this example), we can use this:

x <- "There is a horror movie running in the iNox theater. : Can we go?"

# Capture the text before and after " : " and keep one group per column
pattern <- "^(.*)\\s:\\s+(.*)$"
data.frame(Col1 = sub(pattern, "\\1", x),
           Col2 = sub(pattern, "\\2", x))

Output:

                                                  Col1            Col2
1 There is a horror movie running in the iNox theater.      Can we go?
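One caveat with the pattern above: when a string contains no colon, sub() and gsub() return it unchanged, so both columns would hold the full text rather than NA in Col2 as the question asks. A possible sketch of a fix, assuming the separator is always a colon with optional whitespace around it (split_once is a hypothetical helper name, not part of base R):

```r
# Hypothetical helper: split each string at its first colon.
# Strings without a colon keep their full text in Col1 and get NA in Col2.
split_once <- function(x) {
  has_colon <- grepl(":", x, fixed = TRUE)
  data.frame(
    Col1 = ifelse(has_colon, sub("\\s*:.*$", "", x), x),
    Col2 = ifelse(has_colon, sub("^[^:]*:\\s*", "", x), NA_character_),
    stringsAsFactors = FALSE
  )
}

res <- split_once(c(
  "There is a horror movie running in the iNox theater. : Can we go?",
  "Hi Mom, You there. Can I go to Jimmy's house?"
))
```

Here ifelse() is used the way it is designed to be used: with a logical vector from grepl() as the test, so the helper also works on a whole column such as text$Text.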

Now suppose the string contains more than one symbol, and you want to keep everything before the first symbol in the first column and everything after the first symbol in the second column. To do this, anchor each pattern on the first colon: delete from the first colon onward to get Col1, and delete everything up to and including the first colon to get Col2, like this:

x <- "There is a horror movie running in the iNox theater. : Can we go? : Please?"

# Col1: drop the first colon (with any space before it) and everything after it
# Col2: drop everything up to the first colon and the whitespace that follows it
data.frame(Col1 = sub("\\s*:.*$", "", x),
           Col2 = sub("^[^:]*:\\s*", "", x))

Output:

                                                  Col1                      Col2
1 There is a horror movie running in the iNox theater.      Can we go? : Please?
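If every colon-separated piece is wanted rather than a single split at the first colon, strsplit() with a regex separator is one option. This is an alternative sketch, not the answer's method, and it assumes the whitespace around each colon should be discarded:

```r
x <- "There is a horror movie running in the iNox theater. : Can we go? : Please?"

# Split on each colon, swallowing the whitespace around it
parts <- strsplit(x, "\\s*:\\s*")[[1]]
```

The result has one element per piece, so its length varies with the number of colons in the string, which is why a fixed two-column layout is easier to build by splitting at the first colon only.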

For more information on using regex symbols in R, this is a helpful reference.
