用于同一文件输入R的多个分隔符 [英] Multiple Separators for the same file input R
问题描述
我已经找到答案,但只找到了指向C或C#的东西。
我意识到,很多R是用C编写的,但我对它的了解是不存在的。
我对R也比较新。
我使用当前的Rstudio。
I've had a look for answers, but have only found things referring to C or C#. I realise that much of R is written in C but my knowledge of it is non-existent. I am also relatively new to R. I am using the current Rstudio.
这和我想要的类似。
使用多个分隔线有效读取数据in R
This is similar to what I want, I think. Read the data efficiently with multiple separating lines in R
我有一个csv文件,但是一个变量是一个字符串,其值由 _
和 -
我想知道是否有一个包或额外的代码,在读取以下。命令。
I have a csv file but one variable is a string with values separated by _
and -
And I would like to know if there is a package or extra code which does the following on the read. command.
"1","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,218,4,93,1377907200000
"2","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,390,5,157,1377993600000
"3","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,376,5,193,1.37808e+12
"4","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",1,35,1,15,1377907200000
"5","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",12,11258,117,2843,1377993600000
"6","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",5,4659,56,1826,1.37808e+12
"7","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",7,7296,136,2684,1377907200000
"8","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,4533,35,1632,1377907200000
"9","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,421,6,161,1377993600000
"10","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,57,2,23,1.37808e+12
示例行:
Name Name1 *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
这样很容易阅读
results <- read.delim("~/results", header=F)
字符串* XYZ_Name3_KB_MobApp_M-18-25_AU_PI
but then I still have the string *XYZ_Name3_KB_MobApp_M-18-25_AU_PI
所需输出(由 _
和
):
Name Name1 *XYZ Name3 KB MobApp M 18 25 AU PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
但不分割时间字符串。
----感谢@Henrik和@AnandaMahto的代码和包。
---- Thanks @Henrik and @AnandaMahto for the code and package. ----
library(splitstackshape)
# split concatenated column by `_`
df4 <- concat.split(data = df3, split.col = "V3", sep = "_", drop = TRUE)
# split the remaining concatenated part by `-`
df5 <- concat.split(data = df4, split.col = "V3_5", sep = "-", drop = TRUE)
推荐答案
尝试:
#dummy data
df <- read.table(text="
Name Name1 *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
Name Name2 *CCC_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
",as.is=T)
#replace "_" to "-"
df_V3 <- gsub(pattern="_",replacement="-",df$V3,fixed=TRUE)
#strsplit, make dataframe
df_V3 <- do.call(rbind.data.frame,strsplit(df_V3,split="-"))
#output, merge columns
output <- cbind(df[,c(1:2)],
df_V3,
df[,c(4:ncol(df))])
是另一个相关选项,但使用 read.table
而不是 strsplit
。
splitCol <- "V3"
temp <- read.table(text = gsub("-", "_", df[, splitCol]), sep = "_")
names(temp) <- paste(splitCol, seq_along(temp), sep="_")
cbind(df[setdiff(names(df), splitCol)], temp)
这篇关于用于同一文件输入R的多个分隔符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!