Convert an unstructured CSV file to a data frame


Problem description

I am learning R for text mining. I have a TV program schedule in CSV form. Programs usually start at 06:00 AM and run until 05:00 AM the next day; this span is called a broadcast day. For example, the programs for 15/11/2015 start at 06:00 AM and end at 05:00 AM the next day.

Here is sample code showing what the schedule looks like:

 read.table(textConnection("Sunday|\n 01-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|\nMonday|\n 02-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|"), header = F, sep = "|", stringsAsFactors = F)

Its output is as follows:

  V1|V2
Sunday |  
01-Nov-15 |       
6 | Tom  
some information about the program |       
23.3 |  Jerry  
some information about the program |       
5 | Avatar  
some information about the program |       
5.3 | Panda  
some information about the program |       
Monday  |       
02-Nov-15|       
6 | Jerry  
some information about the program |      
6.25 | Panda  
some information about the program |      
23.3 | Avatar  
some information about the program |       
7.25 |   Tom  
some information about the program |      

I want to convert the above data into a data.frame of the following form:

Date            |Program|Synopsis
2015-11-1 06:00 |Tom    | some information about the program
2015-11-1 23:30 |Jerry  | some information about the program
2015-11-2 05:00 |Avatar | some information about the program
2015-11-2 05:30 |Panda  | some information about the program
2015-11-2 06:00 |Jerry  | some information about the program
2015-11-2 06:25 |Panda  | some information about the program
2015-11-2 23:30 |Avatar | some information about the program
2015-11-3 07:25 |Tom    | some information about the program

I am thankful for any suggestions or tips regarding functions or packages I should have a look at.

Recommended answer

It's a bit of a mess, but it seems to work:

df <- read.table(textConnection(txt <- "Sunday|\n 01-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|\nMonday|\n 02-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|"), header = F, sep = "|", stringsAsFactors = F)
cat(txt)
Sys.setlocale("LC_TIME", "English") # if needed; "English" is a Windows locale name, on Linux/macOS try "C"
weekdays <- format(seq.Date(Sys.Date(), Sys.Date()+6, 1), "%A")
days <- split(df, cumsum(df$V1 %in% weekdays))
lapply(days, function(dayDF) {
  tmp <- cbind.data.frame(V1=dayDF[2, 1], do.call(rbind, split(unlist(dayDF[-c(1:2), ]), cumsum(!dayDF[-(1:2), 2]==""))), stringsAsFactors = F)
  tmp[, 1] <- as.Date(tmp[, 1], "%d-%B-%y")
  tmp[, 2] <- as.numeric(tmp[, 2])
  tmp[, 5] <- NULL
  idx <- c(FALSE, diff(tmp[, 2])<0)
  tmp[idx, 1] <- tmp[idx, 1] + 1
  return(tmp)
}) -> days
days <- transform(do.call(rbind.data.frame, days), V1=as.POSIXct(paste(V1, sprintf("%.2f", V11)), format="%Y-%m-%d %H.%M"), V11=NULL)  
names(days) <- c("Date", "Synopsis", "Program")
rownames(days) <- NULL
days[, c(1, 3, 2)]
#                  Date Program                            Synopsis
# 1 2015-11-01 06:00:00     Tom  some information about the program
# 2 2015-11-01 23:30:00   Jerry  some information about the program
# 3 2015-11-02 05:00:00  Avatar  some information about the program
# 4 2015-11-02 06:00:00     Tom  some information about the program
# 5 2015-11-02 23:30:00   Jerry  some information about the program
# 6 2015-11-03 05:00:00  Avatar  some information about the program
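
One caveat with the code above: `idx` only shifts the single row where the time drops, so a later program in the same night (such as the 5.3 Panda row in the question) would stay on the original date. Below is a minimal sketch of a variant that carries the rollover forward with `cumsum`. `parse_schedule` is a hypothetical helper name, not part of the original answer, and the sketch assumes English weekday and month names, two-digit years in the 2000s, and that each program row is immediately followed by its synopsis row.

```r
# Hypothetical helper (an illustration, not the original answer's code).
parse_schedule <- function(txt) {
  df <- read.table(text = txt, header = FALSE, sep = "|", quote = "",
                   comment.char = "", strip.white = TRUE,
                   colClasses = "character", stringsAsFactors = FALSE)

  # Weekday names are hard-coded in English to avoid locale dependence.
  day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")
  is_day <- df$V1 %in% day_names
  block  <- cumsum(is_day)          # broadcast-day index for every row
  prog   <- which(df$V2 != "")      # rows that carry a program name

  # Assumption: the row right after each weekday row holds the date.
  start_txt <- df$V1[which(is_day) + 1][block[prog]]
  p <- do.call(rbind, strsplit(start_txt, "-", fixed = TRUE))
  # Assumption: two-digit years mean 20xx; month.abb holds English names.
  start <- as.Date(sprintf("20%s-%02d-%s", p[, 3],
                           match(p[, 2], month.abb), p[, 1]))

  # Times are written as H.MM, e.g. 23.3 means 23:30 and 6.25 means 06:25.
  time_num <- as.numeric(df$V1[prog])
  hh <- floor(time_num)
  mm <- round((time_num - hh) * 100)

  # Within a broadcast day, every drop in the clock time moves this row
  # and all later rows one calendar day forward.
  rolled <- ave(time_num, block[prog],
                FUN = function(x) cumsum(c(0, diff(x) < 0)))

  stamp <- as.POSIXct(sprintf("%s %02d:%02d", format(start + rolled), hh, mm),
                      format = "%Y-%m-%d %H:%M", tz = "UTC")

  # Assumption: the synopsis is the row directly below each program row.
  data.frame(Date = stamp, Program = df$V2[prog],
             Synopsis = df$V1[prog + 1], stringsAsFactors = FALSE)
}

txt <- paste(c("Sunday|", " 01-Nov-15|",
               " 6|Tom", " some information about the program|",
               " 23.3|Jerry", " some information about the program|",
               " 5|Avatar", " some information about the program|",
               " 5.3|Panda", " some information about the program|",
               "Monday|", " 02-Nov-15|",
               " 6|Jerry", " some information about the program|"),
             collapse = "\n")
res <- parse_schedule(txt)
print(res)
```

With this input, both Avatar (5) and Panda (5.3) land on 2015-11-02, which matches the layout requested in the question. The timestamps are built in UTC purely to keep the example reproducible; pass a different `tz` if the schedule is in local time.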
