R: convert an unstructured CSV file to a data frame

Views: 162 · Published: 2017/3/26 0:36:37 · Tags: r dataframe reshape

Problem description

I am learning R for text mining. I have a TV program schedule in CSV form. The programs usually start at 06:00 AM and run until 05:00 AM the next day; this span is called a broadcast day. For example, the programs for 15/11/2015 start at 06:00 AM and end at 05:00 AM the next day.

Here is sample code showing what the schedule looks like:

read.table(textConnection("Sunday|\n 01-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|\nMonday|\n 02-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|"), header = FALSE, sep = "|", stringsAsFactors = FALSE)

whose output is as follows:

  V1|V2
Sunday |  
01-Nov-15 |       
6 | Tom  
some information about the program |       
23.3 |  Jerry  
some information about the program |       
5 | Avatar  
some information about the program |       
5.3 | Panda  
some information about the program |       
Monday  |       
02-Nov-15|       
6 | Jerry  
some information about the program |      
6.25 | Panda  
some information about the program |      
23.3 | Avatar  
some information about the program |       
7.25 |   Tom  
some information about the program |

I want to convert the above data into a data.frame of the following form:

Date            |Program|Synopsis
2015-11-1 06:00 |Tom    | some information about the program
2015-11-1 23:30 |Jerry  | some information about the program
2015-11-2 05:00 |Avatar | some information about the program
2015-11-2 05:30 |Panda  | some information about the program
2015-11-2 06:00 |Jerry  | some information about the program
2015-11-2 06:25 |Panda  | some information about the program
2015-11-2 23:30 |Avatar | some information about the program
2015-11-3 07:25 |Tom    | some information about the program
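
The broadcast-day rule behind this table can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `broadcast_timestamps` is hypothetical, the times are assumed to be listed in broadcast order, the digits after the decimal point are read as minutes (so 23.3 means 23:30), and UTC is used just to keep the example independent of the local time zone.

```r
# Hypothetical helper: map the decimal "hour.minute" times of one broadcast day
# onto calendar timestamps. Assumes the times are listed in broadcast order.
broadcast_timestamps <- function(day, times) {
  day <- as.Date(day)
  # From the first drop in time onward (e.g. 23.3 -> 5), entries fall on the next calendar day
  rolled <- cumsum(c(FALSE, diff(times) < 0)) > 0
  # Two decimals give the minutes: 23.3 -> "23.30", 5 -> "5.00"
  clock <- sprintf("%.2f", times)
  as.POSIXct(paste(day + as.integer(rolled), clock),
             format = "%Y-%m-%d %H.%M", tz = "UTC")
}

# Sunday's four entries from the schedule above
stamps <- broadcast_timestamps("2015-11-01", c(6, 23.3, 5, 5.3))
format(stamps, "%Y-%m-%d %H:%M")
```

With Sunday's times, the first two entries stay on 2015-11-01 and the last two move to 2015-11-02, which is what the first four rows of the desired table show.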

I am thankful for any suggestions/tips regarding functions or packages I should have a look at.

Solution

It's a bit of a mess, but it seems to work:

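# Sample input: a weekday line, a date line, then pairs of (time|program, synopsis) lines
df <- read.table(textConnection(txt <- "Sunday|\n 01-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|\nMonday|\n 02-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|"), header = FALSE, sep = "|", stringsAsFactors = FALSE)
cat(txt)
Sys.setlocale("LC_TIME", "English") # if needed; "English" is a Windows locale name
# Weekday names in the current locale (named so as not to mask base::weekdays)
weekday_names <- format(seq.Date(Sys.Date(), Sys.Date() + 6, 1), "%A")
# One data frame per broadcast day: a new group starts at every weekday row
days <- split(df, cumsum(df$V1 %in% weekday_names))
days <- lapply(days, function(dayDF) {
  # Row 2 holds the date; the remaining rows pair up as (time|program, synopsis)
  tmp <- cbind.data.frame(V1 = dayDF[2, 1], do.call(rbind, split(unlist(dayDF[-c(1:2), ]), cumsum(!dayDF[-(1:2), 2] == ""))), stringsAsFactors = FALSE)
  tmp[, 1] <- as.Date(tmp[, 1], "%d-%B-%y")
  tmp[, 2] <- as.numeric(tmp[, 2])
  tmp[, 5] <- NULL  # drop the empty field that follows each synopsis
  # Once the time drops (e.g. 23.3 -> 5), that row and all later rows belong to the next calendar day
  idx <- cumsum(c(FALSE, diff(tmp[, 2]) < 0)) > 0
  tmp[idx, 1] <- tmp[idx, 1] + 1
  tmp
})
# Combine the days and build a timestamp from the date plus the "hour.minute" time
days <- transform(do.call(rbind.data.frame, days), V1 = as.POSIXct(paste(V1, sprintf("%.2f", V11)), format = "%Y-%m-%d %H.%M"), V11 = NULL)
names(days) <- c("Date", "Synopsis", "Program")
rownames(days) <- NULL
days[, c(1, 3, 2)]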
#                  Date Program                            Synopsis
# 1 2015-11-01 06:00:00     Tom  some information about the program
# 2 2015-11-01 23:30:00   Jerry  some information about the program
# 3 2015-11-02 05:00:00  Avatar  some information about the program
# 4 2015-11-02 06:00:00     Tom  some information about the program
# 5 2015-11-02 23:30:00   Jerry  some information about the program
# 6 2015-11-03 05:00:00  Avatar  some information about the program
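
For comparison, here is a sketch of a more step-by-step variant that works on the raw lines instead of on the `read.table` result. It is not the answer's method, only an illustration under assumptions: the input is a hypothetical character vector `raw` in the question's layout (as `readLines()` would return it), every "time|program" line is followed directly by its synopsis line, month abbreviations are in English, and UTC is used to keep the output independent of the local time zone.

```r
# Hypothetical input mirroring the question's layout
raw <- c("Sunday|", " 01-Nov-15|",
         " 6|Tom", " some information about the program|",
         " 23.3|Jerry", " some information about the program|",
         " 5|Avatar", " some information about the program|",
         "Monday|", " 02-Nov-15|",
         " 6|Tom", " some information about the program|")

txt_lines <- trimws(sub("\\|$", "", raw))  # drop the trailing "|" and padding
is_date <- grepl("^[0-9]{2}-[A-Za-z]{3}-[0-9]{2}$", txt_lines)
slot <- which(grepl("^[0-9.]+\\|", txt_lines))  # "time|program" lines

# Each slot belongs to the most recent date line above it
day_of_slot <- txt_lines[which(is_date)][cumsum(is_date)[slot]]

schedule <- data.frame(
  Day      = as.Date(day_of_slot, format = "%d-%b-%y"),
  Time     = as.numeric(sub("\\|.*$", "", txt_lines[slot])),
  Program  = trimws(sub("^[^|]*\\|", "", txt_lines[slot])),
  Synopsis = txt_lines[slot + 1],  # the synopsis is the line after each slot
  stringsAsFactors = FALSE
)

# Within each broadcast day, rows from the first drop in time onward
# move to the next calendar date (ave() returns 0/1 here)
rolled <- ave(schedule$Time, day_of_slot,
              FUN = function(t) cumsum(c(FALSE, diff(t) < 0)) > 0)

schedule$Date <- as.POSIXct(
  paste(schedule$Day + rolled, sprintf("%.2f", schedule$Time)),
  format = "%Y-%m-%d %H.%M", tz = "UTC"
)
result <- schedule[, c("Date", "Program", "Synopsis")]
result
```

Splitting the work into named intermediate columns makes each assumption (line pairing, date format, rollover rule) easy to check on its own, at the cost of a few more lines than the compact version.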
