Importing with fread vs. read.table, and errors


Question

When I import a .csv file with read.table, using the call df <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", head = TRUE), and check the summary of the data, I get the following (only the first 3 of 45 columns are shown):

 X.run.number. scenario        configuration   
 Min.   :   1 "pessimistic":999994   "central":999994  
 1st Qu.: 650                                            
 Median :1299                                            
 Mean   :1299                                            
 3rd Qu.:1949                                            
 Max.   :2600  

With this dataframe I can make nice graphics. However, I have 80 .csv files with a total size of 40 GB, so I want to import only specific columns.

I figured this would be easier with fread (from the data.table package). So I imported 5 columns and combined them into one dataframe with rbind, using the call

my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",") 
df <- do.call("rbind", my.data)

The summary of that dataframe looks like this (4 of 5 columns shown):

[run number]         scenario         configuration         [step]         
 Length:999994      Length:999994      Length:999994      Length:999994     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character 

With this dataframe I cannot make the graphics that I could with read.table. I guess that this has to do with the class of the columns' values.
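The suspicion about column classes can be checked directly. Below is a minimal sketch using two small made-up data frames; df_table and df_fread are hypothetical stand-ins for the read.table and fread results, not the real data.

```r
# Hypothetical stand-ins for the two imports; the values are made up.
df_table <- data.frame(run = c(1L, 2L),
                       scenario = factor(c("pessimistic", "pessimistic")))
df_fread <- data.frame(run = c("1", "2"),
                       scenario = c("pessimistic", "pessimistic"),
                       stringsAsFactors = FALSE)

# Class of every column, as a named character vector.
classes_table <- sapply(df_table, class)
classes_fread <- sapply(df_fread, class)
print(classes_table)
print(classes_fread)

# Columns whose class differs between the two imports.
mismatched <- names(classes_table)[classes_table != classes_fread]
print(mismatched)
```

Running the same sapply(df, class) call on the two real data frames should show which columns need converting before plotting.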

How can I make sure that the dataframe created with fread has the same characteristics as the one with read.table, so that I can make the graphics I want?

Edit

I found that when I first split the .csv into columns in Excel and then call fread with sep = ";" instead of sep = ",", it does work. Strange... And I don't want to convert the .csv files into columns in Excel manually.

Recommended answer

What you can do is read one file with read.table, keep 10 rows of it as a template, and then do the following:

## Build a small template with read.table (first 10 rows only)
library(data.table)  ## provides fread and setnames
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", nrows = 10, head = TRUE)
cols_required <- c(1, 2, 3, 25, 29)  ## select whatever cols you need
template <- dfshort[, cols_required]

## Read your large files using fread
my.files <- list.files(pattern = "\\.csv$")
my.data <- lapply(my.files, fread, header = FALSE, select = cols_required, sep = ",")
df <- do.call("rbind", my.data)
## fread's column names differ from read.table's, so reuse the template's names
setnames(df, names(template))

## Change column types to match the template
result = data.frame(
  lapply(setNames(, names(template)), function(x)
    if (x %in% names(df)) as(df[[x]], class(template[[x]]))
    else template[[x]][NA_integer_]
  ), stringsAsFactors = FALSE)

Then you can use it to plot, because it will have the same class types that you get with read.table.
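As an alternative to a template, the character columns can be converted with base R's type.convert, which is the function read.table applies to each column after parsing. This is only a sketch on made-up data (df_chr is a hypothetical stand-in for the fread result), and it assumes the values are plain text such as "1" or "0.5" with no stray characters.

```r
# Made-up character-only data, standing in for what fread returned.
df_chr <- data.frame(run = c("1", "2", "3"),
                     scenario = c("pessimistic", "pessimistic", "central"),
                     value = c("0.5", "1.25", "2"),
                     stringsAsFactors = FALSE)

# Convert each column the way read.table does after parsing.
# as.is = FALSE turns non-numeric text into factors (read.table's default
# before R 4.0); use as.is = TRUE to keep such columns as character.
df_conv <- df_chr
df_conv[] <- lapply(df_chr, type.convert, as.is = FALSE)

print(sapply(df_conv, class))
```

If the input is a data.table rather than a data.frame, calling as.data.frame on it first is probably the safest way to use this pattern.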

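## Variant: use every column of the 10-row sample as the template
library(data.table)  ## provides fread, copy and setnames
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", nrows = 10, head = TRUE)
template <- copy(dfshort)
cols_required <- c(1, 2, 3, 25, 29)
my.files <- list.files(pattern = "\\.csv$")
## select (not colClasses) is the argument that picks columns by position
my.data <- lapply(my.files, fread, header = FALSE, select = cols_required, sep = ",")
df <- do.call("rbind", my.data)
## reuse the template's names for the imported columns
setnames(df, names(template)[cols_required])

## columns that were not imported are filled with NA of the template's type
result = data.frame(
  lapply(setNames(, names(template)), function(x)
    if (x %in% names(df)) as(df[[x]], class(template[[x]]))
    else template[[x]][NA_integer_]
  ), stringsAsFactors = FALSE)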
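The fread calls above pass header = FALSE and no skip, whereas the read.table call uses skip = 6 and head = TRUE. Whether that difference contributes to the character columns in the real files is not confirmed here, but a header row read as data does force every column to character. The sketch below shows fread with skip and header = TRUE on two tiny made-up files; the file contents, the write_demo helper and the column names are all hypothetical, and rbindlist is data.table's function for stacking a list of tables.

```r
library(data.table)

# Write a tiny made-up file: 6 preamble lines, a header, then data rows.
write_demo <- function(path, runs) {
  writeLines(c(paste("preamble line", 1:6),
               "run,scenario,step,value",
               paste(runs, "pessimistic", 0, runs * 1.5, sep = ",")),
             path)
}
demo_dir <- tempfile("fread_demo")
dir.create(demo_dir)
write_demo(file.path(demo_dir, "a.csv"), 1:2)
write_demo(file.path(demo_dir, "b.csv"), 3:4)

# "\\.csv$" matches only names ending in .csv (a bare ".csv" is a looser regex).
demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# skip = 6 jumps over the preamble; header = TRUE keeps the names out of the data.
demo_data <- lapply(demo_files, fread, skip = 6, header = TRUE,
                    select = c(1, 2, 4), sep = ",")
combined <- rbindlist(demo_data)

print(sapply(combined, class))
```

With the header handled this way, fread detects the column types itself, so a template may only be needed for columns that should become factors.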
