Extract date from given string in R


Question

string<-c("Posted 69 months ago (7/4/2011)")
library(gsubfn)
strapplyc(string, "(.*)", simplify = TRUE)

I applied the function above, but nothing happens.

From this string I want to extract only the date part, i.e. 7/4/2011.

Recommended answer

The first solution shows how to fix the code in the question so that it gives the desired answer. The next two are the same except that they use different regular expressions. The fourth shows how to do it with gsub, the fifth breaks the gsub into two sub calls, and the sixth uses read.table.

1) Escape parens. The problem is that ( and ) have special meaning in regular expressions, so you must escape them if you want to match them literally. Writing them as "[(]" and "[)]", as we do below (or as "\\(" and "\\)"), matches them literally. The inner, unescaped parentheses define the capture group, because we don't want the captured text to include the literal parentheses themselves:

strapplyc(string, "[(](.*)[)]", simplify = TRUE)
## [1] "7/4/2011"
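To see why the pattern in the question matched too much and what the escaping changes, here is a minimal sketch. It uses base R (regmatches, regexpr and regexec) rather than gsubfn's strapplyc, so it runs without extra packages; the variable names whole and groups are only illustrative.

```r
string <- "Posted 69 months ago (7/4/2011)"

# Unescaped parentheses only form a capture group around ".*",
# so the pattern matches the entire string.
whole <- regmatches(string, regexpr("(.*)", string))
whole
## [1] "Posted 69 months ago (7/4/2011)"

# With the literal parentheses written as [(] and [)], the inner
# group captures only the text between them.
groups <- regmatches(string, regexec("[(](.*)[)]", string))[[1]]
groups[2]   # element 1 is the full match, element 2 is the first group
## [1] "7/4/2011"
```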

2) Match content. Another way to do it is to match the data itself rather than the surrounding parentheses. Here "\\d+" matches one or more digits:

strapplyc(string, "\\d+/\\d+/\\d+", simplify = TRUE)
## [1] "7/4/2011"

You could specify the number of digits if you want to be even more specific, but that seems unnecessary here if the data looks similar to that in the question.
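If you did want to pin down the digit counts, a sketch in base R might look like the following. The pattern and the variable names are illustrative, and the conversion to a Date assumes the text is month/day/year, which the question does not state.

```r
string <- "Posted 69 months ago (7/4/2011)"

# One or two digits for the first two fields, exactly four for the year.
pattern <- "\\d{1,2}/\\d{1,2}/\\d{4}"
date_text <- regmatches(string, regexpr(pattern, string))
date_text
## [1] "7/4/2011"

# ASSUMPTION: month/day/year order. Use "%d/%m/%Y" for day/month/year.
parsed <- as.Date(date_text, format = "%m/%d/%Y")
parsed
## [1] "2011-07-04"
```

The "69" earlier in the string is not picked up because the pattern requires the slashes.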

3) Match 8 or more digits and slashes. Given that there are no other sequences of 8 or more characters consisting only of slashes and digits in the rest of the string, we could just pick that out:

strapplyc(string, "[0-9/]{8,}", simplify = TRUE)
## [1] "7/4/2011"
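The lower bound of 8 is worth a second look: "7/4/2011" is exactly 8 characters long, so a shorter date would be missed. The sketch below illustrates this with base R's grepl; the second input is a made-up example, not data from the question.

```r
strings <- c("Posted 69 months ago (7/4/2011)",   # date is 8 characters long
             "Posted 100 months ago (1/1/11)")    # hypothetical short date, 6 characters

# TRUE where the string contains a run of at least 8 digits or slashes.
has_long_run <- grepl("[0-9/]{8,}", strings)
has_long_run
## [1]  TRUE FALSE
```

If short dates like the second one can occur, solution (2) is probably the safer choice.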

4) Remove text before and after. Another way of doing it is to remove everything up to and including the ( and everything from the ) onwards, like this:

gsub(".*[(]|[)].*", "", string)
## [1] "7/4/2011"
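Because gsub is vectorized, the same call works on several strings at once. One thing to watch for, shown in this sketch with made-up extra inputs, is that a string with no parentheses matches neither alternative and comes back unchanged rather than as a missing value.

```r
strings <- c("Posted 69 months ago (7/4/2011)",
             "Posted 4 months ago (12/25/2016)",  # hypothetical second input
             "no date here")                      # hypothetical input without parentheses

cleaned <- gsub(".*[(]|[)].*", "", strings)

cleaned[1:2]   # the two dates
cleaned[3]     # unchanged, because nothing matched
```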

5) sub. This is the same as (4) except that it breaks the gsub into two sub invocations, one removing everything up to and including ( and the other removing ) onwards. The regular expressions are therefore slightly simpler.

sub(".*\\(", "", sub("\\).*", "", string))
## [1] "7/4/2011"
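Written out as two separate steps, the nested call looks like this. The names step1 and step2 are only for illustration.

```r
string <- "Posted 69 months ago (7/4/2011)"

# Inner call: drop ")" and anything after it.
step1 <- sub("\\).*", "", string)
step1
## [1] "Posted 69 months ago (7/4/2011"

# Outer call: drop everything up to and including "(".
step2 <- sub(".*\\(", "", step1)
step2
## [1] "7/4/2011"
```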

6) read.table. This solution uses no regular expressions at all. It sets sep and comment.char in read.table so that the second column of the result is the required date or dates.

read.table(text = string, sep = "(", comment.char = ")", as.is = TRUE)$V2
## [1] "7/4/2011"
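The "date or dates" wording refers to the fact that read.table treats each element of text as one input line. A small sketch with a made-up second string shows how the two arguments divide up each line:

```r
strings <- c("Posted 69 months ago (7/4/2011)",
             "Posted 4 months ago (12/25/2016)")  # hypothetical second input

# sep = "(" splits each line into the text before "(" (V1) and the rest (V2);
# comment.char = ")" discards ")" and anything after it.
parts <- read.table(text = strings, sep = "(", comment.char = ")", as.is = TRUE)
parts$V2
## [1] "7/4/2011"   "12/25/2016"
```

This relies on every line having exactly one "(", so that all rows split into the same two columns.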

Note: You don't need the c when defining string:

string <- c("Posted 69 months ago (7/4/2011)")
string2 <- "Posted 69 months ago (7/4/2011)"
identical(string, string2)
## [1] TRUE
