Importing with fread vs read.table and errors
Question
When I import a .csv file with read.table, using the call
df <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", head = TRUE)
and check the summary of the data, I get (only the first 3 of 45 columns are shown):
 X.run.number.   scenario               configuration
 Min.   :   1    "pessimistic":999994   "central":999994
 1st Qu.: 650
 Median :1299
 Mean   :1299
 3rd Qu.:1949
 Max.   :2600
With this dataframe I can make nice graphics. However, I have 80 .csv files with a total size of 40 GB, so I want to import only specific columns.
I figured this would be easier with fread (from the data.table package). So I imported 5 columns and combined them with rbind into one dataframe, using the call
my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",")
df <- do.call("rbind", my.data)
The summary of that dataframe looks like this (4 of 5 columns shown):
[run number] scenario configuration [step]
Length:999994 Length:999994 Length:999994 Length:999994
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
With this dataframe I cannot make the graphics that I could with read.table. I guess that this has to do with the class of the columns' values.
How can I make sure that the dataframe created with fread has the same characteristics as the one with read.table, so that I can make the graphics I want?
Edit
I found out that when I first split the .csv into columns in Excel and then use the fread call with sep = ";" instead of sep = ",", it does work. Strange... And I don't want to convert the .csv files into columns in Excel manually.
Recommended answer
What you can do is read a few rows of one file with read.table, keep those 10 rows as a template, and then do the following:
library(data.table)

## Read a few rows of one file with read.table to use as a type template
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv",
                      skip = 6, sep = ",", nrows = 10, header = TRUE)
cols <- c(1, 2, 3, 25, 29)   ## select whatever columns you need
template <- dfshort[, cols]

## Read your large files using fread; skip and header mirror the read.table call
my.files <- list.files(pattern = "\\.csv$")
my.data <- lapply(my.files, fread, skip = 6, header = TRUE, select = cols, sep = ",")
df <- do.call("rbind", my.data)
## fread keeps the raw header text, so reuse the template's column names
setnames(df, names(template))

## Change the column types to match the template
result <- data.frame(
  lapply(setNames(, names(template)), function(x)
    if (!x %in% names(df)) template[[x]][NA_integer_]
    else if (is.factor(template[[x]])) factor(df[[x]])
    else as(df[[x]], class(template[[x]]))
  ), stringsAsFactors = FALSE)
You can then use result for plotting, because its columns have the same classes as the ones you get from read.table.
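To check that the conversion did what you expect, you can compare the column classes of the template and the result. Below is a minimal sketch on made-up data: the two small data frames and their column names (run, scenario) are hypothetical stand-ins for the real template and the fread output, and the explicit factor branch is an assumption added here because as() has no character-to-factor coercion.

```r
# Hypothetical stand-in for the template read with read.table
template <- data.frame(run = 1:3, scenario = factor(c("a", "b", "a")))

# Hypothetical stand-in for the fread result, where everything is character
df <- data.frame(run = c("1", "2", "3"), scenario = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

# Coerce each column to the class of the matching template column
result <- data.frame(
  lapply(setNames(, names(template)), function(x)
    if (is.factor(template[[x]])) factor(df[[x]])
    else as(df[[x]], class(template[[x]]))
  ), stringsAsFactors = FALSE)

# TRUE when every column of result has the same class as in the template
identical(sapply(result, class), sapply(template, class))
```

If the comparison returns FALSE, printing sapply(result, class) next to sapply(template, class) shows which column still differs.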
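## Variant: use the whole 10-row sample as the template
library(data.table)
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv",
                      skip = 6, sep = ",", nrows = 10, header = TRUE)
template <- copy(dfshort)
cols <- c(1, 2, 3, 25, 29)
my.files <- list.files(pattern = "\\.csv$")
## select picks the columns to import by position
my.data <- lapply(my.files, fread, skip = 6, header = TRUE, select = cols, sep = ",")
df <- do.call("rbind", my.data)
setnames(df, names(template)[cols])
## Template columns that were not imported are filled with NA
result <- data.frame(
  lapply(setNames(, names(template)), function(x)
    if (!x %in% names(df)) template[[x]][NA_integer_]
    else if (is.factor(template[[x]])) factor(df[[x]])
    else as(df[[x]], class(template[[x]]))
  ), stringsAsFactors = FALSE)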
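One plausible reason for the all-character columns in the question is the header = FALSE argument: the header row is then read as data, and a column that mixes a name with numbers can only be stored as character. Whether that is the whole story for the original files is an assumption. The sketch below reproduces the effect on a small temporary file whose layout (6 preamble lines, then a header row) and contents are invented for illustration.

```r
library(data.table)

# Build a small stand-in file: 6 preamble lines, a header row, then data
path <- tempfile(fileext = ".csv")
writeLines(c(
  paste("preamble line", 1:6),
  '"[run number]","scenario","[step]"',
  '1,"pessimistic",0',
  '2,"pessimistic",1',
  '3,"central",2'
), path)

# header = FALSE treats the header row as data, so every column is character
bad <- fread(path, skip = 6, header = FALSE, sep = ",")
sapply(bad, class)

# header = TRUE keeps the names out of the data, so numeric columns are detected
good <- fread(path, skip = 6, header = TRUE, sep = ",")
sapply(good, class)
```

If header = TRUE already gives the classes you need, the template step becomes optional and only the conversion of text columns to factor remains.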