All columns as character after importing data with fread


Question

I imported a CSV file (containing both text columns and numeric columns) with:

library(data.table)
x <- fread('myfile.csv', header = TRUE, verbose = TRUE, na.strings = c("null", "'null'", ""))

Yet after the import, every column is reported as character when I run summary(x):

mycolumn
Length:100000      
Class :character   
Mode  :character   

Is there any way to make fread recognize the numeric columns as numbers? The verbose output is below (taken from a run with nrows set, to make it faster).

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 10.162 GB
File is opened and mapped ok
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Looking for supplied sep '\t' on line 30 (the last non blank line in the first 'autostart') ... found ok
Found 166 columns
First row with 166 fields occurs on line 1 (either column names or first row of data)
'header' changed by user from 'auto' to TRUE
Count of eol after first data row: 6513865
Subtracted 1 for last eol and any trailing empty lines, leaving 6513864 data rows
nrow limited to nrows passed in (100000)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (first 5 rows)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (+middle 5 rows)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (+last 5 rows)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (after applying colClasses and integer64)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (after applying drop or select (if supplied)
Allocating 166 column slots (166 - 0 NULL)
Read 100000 rows and 166 (of 166) columns from 10.162 GB file in 00:00:04
   0.564s ( 15%) Memory map (rerun may be quicker)
   0.001s (  0%) sep and header detection
   1.613s ( 43%) Count rows (wc -l)
   0.030s (  1%) Column type detection (first, middle and last 5 rows)
   0.015s (  0%) Allocation of 100000x166 result (xMB) in RAM
   1.437s ( 38%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
   0.000s (  0%) Coercing data already read in type bumps (if any)
   0.080s (  2%) Changing na.strings to NA
   3.739s        Total


Recommended answer

The way to manually specify classes for columns is via the colClasses argument. But fread should be able to guess the numeric columns automatically, which makes me think that there are entries in your numeric columns that are not numeric.
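
As a rough illustration of the colClasses route, here is a minimal sketch. The temporary file and the column names (id, amount, label) are made up for the example and stand in for the real data. It only shows colClasses moving columns to a "higher" type (integer to numeric or character); it is not meant to show a character column being forced down to numeric.

```r
library(data.table)

# Hypothetical tab-separated sample standing in for 'myfile.csv'.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tamount\tlabel",
             "1\t10\tfoo",
             "2\t20\tbar",
             "3\t30\tbaz"), tmp)

# Automatic detection: whole numbers are read as integer.
auto <- fread(tmp, sep = "\t", header = TRUE)

# Manual classes: a named list mapping a class to column names.
typed <- fread(tmp, sep = "\t", header = TRUE,
               colClasses = list(numeric = "amount", character = "id"))

# Compare the detected and the requested column classes.
print(sapply(auto, class))
print(sapply(typed, class))
```

If a column still comes back as character after this, the data in that column is the more likely culprit than the class specification.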

Perhaps you haven't managed to catch all types of NA values? If so, the uncaught NA values will be read as character strings, which will cause the whole column to be set as type character.
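
One way to check for uncaught NA markers is to list, for each character column, the distinct values that do not parse as numbers. The sketch below assumes a recent data.table release in which na.strings is taken into account while column types are detected. The sample file, its column names, the tokens "N/A" and "unknown", and the helper bad_values are all hypothetical.

```r
library(data.table)

# Hypothetical sample: "N/A" and "unknown" play the role of NA markers
# that the original na.strings did not cover.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tamount\tlabel",
             "1\t10.5\tfoo",
             "2\tN/A\tbar",
             "3\tunknown\tbaz",
             "4\tnull\tqux"), tmp)

na_tokens <- c("null", "'null'", "")
x <- fread(tmp, sep = "\t", header = TRUE, na.strings = na_tokens)

# Distinct non-NA values in a column that cannot be converted to a number.
bad_values <- function(col) {
  parsed <- suppressWarnings(as.numeric(col))
  unique(col[!is.na(col) & is.na(parsed)])
}

# Apply the check to every column that was read as character.
char_cols <- names(x)[vapply(x, is.character, logical(1))]
offenders <- lapply(x[, char_cols, with = FALSE], bad_values)
print(offenders)

# Re-read with the stray tokens from 'amount' added to na.strings.
y <- fread(tmp, sep = "\t", header = TRUE,
           na.strings = c(na_tokens, offenders$amount))
print(class(y$amount))
```

A genuine text column will list most of its values as offenders, while a column that is meant to be numeric typically shows only a handful of stray tokens, which are the candidates to add to na.strings. On a file as large as the one in the question, running the check on a sample read with nrows keeps it quick, at the cost of possibly missing rare tokens.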
