R remove duplicate elements in character vector, not duplicate rows
I am hitting a brick wall with this problem.
I have a data frame (dates) with some document ids and dates stored in a character vector:
Doc Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003", "07/01/2000")
3 34567 c("09/06/2004", "09/06/2004", "12/30/2006")
4 45678 c("06/01/2000","08/09/2002")
I am trying to remove the duplicate elements in the Dates to get this result:
Doc Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003")
3 34567 c("09/06/2004", "12/30/2006")
4 45678 c("06/01/2000","08/09/2002")
I have tried:
R> unique(dates$Dates)
but it removes whole rows whose Dates entry duplicates another row's:
Doc Dates
1 12345 c("06/01/2000","08/09/2002")
2 23456 c("07/01/2000", "09/08/2003")
3 34567 c("09/06/2004", "12/30/2006")
Any help on how to remove only the duplicate elements within each Dates entry, without removing rows that have duplicate Dates?
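One possible approach, assuming `Dates` is a list column in which each element is a character vector, is to apply `unique()` to each element with `lapply()` instead of to the column as a whole. The data frame below is a hypothetical reconstruction of the example above, not the actual data.

```r
# Hypothetical reconstruction of the example data; Dates is a list column.
dates <- data.frame(Doc = c(12345, 23456, 34567, 45678))
dates$Dates <- list(
  c("06/01/2000", "08/09/2002"),
  c("07/01/2000", "09/08/2003", "07/01/2000"),
  c("09/06/2004", "09/06/2004", "12/30/2006"),
  c("06/01/2000", "08/09/2002")
)

# unique() on the whole column compares entire elements, so rows 1 and 4
# would collapse into one. lapply() applies unique() inside each element,
# which keeps all four rows and drops only the repeated dates.
dates$Dates <- lapply(dates$Dates, unique)
```

This only works as written if the column really holds vectors; if each cell is a single string, the strings have to be split into dates first.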
Updated with data:
# Match some text string (dates) from some text:
df1$dates <- as.character(strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})| ([^/]\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))"))
# Drop first 2 columns from dataframe
df2 <- df1[-c(1, 2)]
# List data
> df2
872 7/23/2007
873 c(" 11/4/2007", " 11/4/2007")
874 c(" 4/2/2008", " 8/2/2007")
880 11/14/2006
> class(df2)
[1] "data.frame"
> class(df2$dates)
[1] "character"
> dput(df2)
structure(list(dates = c("NULL", "NULL", " 7/23/2007", "c(\" 11/4/2007\", \" 11/4/2007\")",
"c(\" 4/2/2008\", \" 8/2/2007\")", "NULL", "NULL", "NULL", "NULL",
"NULL", " 11/14/2006")), .Names = "dates", class = "data.frame", row.names = 870:880)
So my question is: how do I get rid of the duplicate date in row 873?
I solved the problem of removing duplicate values from each character vector by wrapping the strapply() call (from the gsubfn package) in lapply(..., unique):
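Because `class(df2$dates)` is "character", each cell here is a single string such as `c(" 11/4/2007", " 11/4/2007")`, so `unique()` has nothing to deduplicate inside a row. A minimal base R sketch of one way around this is to pull the dates back out of each string, deduplicate them, and paste them together again. The pattern `date_pattern` is a simplified assumption covering only m/d/yyyy-style dates, not the full regex used above, and the sample rows are copied from the `dput()` output.

```r
# A few rows in the same shape as the dput() output above.
df2 <- data.frame(
  dates = c("NULL", " 7/23/2007",
            "c(\" 11/4/2007\", \" 11/4/2007\")",
            "c(\" 4/2/2008\", \" 8/2/2007\")"),
  stringsAsFactors = FALSE
)

# Simplified, assumed pattern for m/d/yyyy-style dates only.
date_pattern <- "[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}"

# Extract every date from each string, then drop repeats within each row.
found <- regmatches(df2$dates, gregexpr(date_pattern, df2$dates))
deduped <- lapply(found, unique)

# Collapse back to one string per row; rows without a date become NA.
df2$dates <- vapply(
  deduped,
  function(x) if (length(x) > 0) paste(x, collapse = ", ") else NA_character_,
  character(1)
)
```

Keeping `deduped` as a list column instead of pasting may be preferable if the dates are needed individually later.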
df1$date <- as.character(lapply((strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|- )\\d{2,4})|(\\s\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))")),unique))
Thanks for all your help.
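For readers without gsubfn, the same pattern can be sketched in base R, which may also help show what each layer does. Here `regmatches()` with `gregexpr()` stands in for `strapply()`, and both the input `text` and `date_pattern` are simplified assumptions, since the real `df1[[2]]` is not shown.

```r
# Hypothetical input text; the real df1[[2]] is not shown in the question.
text <- c("seen 11/4/2007 and again 11/4/2007",
          "filed 4/2/2008, amended 8/2/2007")

# Simplified, assumed pattern for m/d/yyyy-style dates only.
date_pattern <- "[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}"

# One character vector of matches per input string.
matches <- regmatches(text, gregexpr(date_pattern, text))

# Deduplicate inside each element, not across elements.
matches <- lapply(matches, unique)

# as.character() on a list deparses any element holding more than one
# value, which is where strings that start with c( come from.
as_strings <- as.character(matches)
```

Skipping the final `as.character()` and storing `matches` directly as a list column would keep the dates as separate values.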