Reading a text file with multiple spaces as the delimiter in R


Problem description

I have a large data set of about 94 columns and 3 million rows. Columns in this file are separated by single as well as multiple spaces. I need to read some of the columns from this file into R. For this I tried read.table() with the options shown in the code below:

### Define the columns to read: keep the first 5 columns, skip the next 24, keep the next 5, and skip the last 60

    col_classes = c(rep("character", 2), rep("numeric", 3), rep("NULL", 24), rep("numeric", 5), rep("NULL", 60))

### Read the first 100 rows of the data

    data <- read.table(file, sep = " ", header = FALSE, nrows = 100, na.strings = "", stringsAsFactors = FALSE)

Since some columns in the file are separated by more than one space, the above method does not work. Is there a way to read this file efficiently?

Recommended answer

You need to change your delimiter. " " refers to exactly one space character. "" treats white space of any length as the delimiter.

    data <- read.table(file, sep = "", header = FALSE, nrows = 100,
                       na.strings = "", stringsAsFactors = FALSE)

From the manual:

If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.
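
To make the difference concrete, here is a minimal sketch on a few made-up lines (the data and the variable names are illustrative, not taken from the original file). It reads the same text once with sep = " " and once with sep = "", and also shows that the "NULL" entries of colClasses from the question can be combined with sep = "" to skip columns.

```r
# Three columns separated by a mix of single and multiple spaces (made-up data).
lines <- c("a1  x   1.5",
           "a2 y 2.5",
           "a3   z  3.5")

# sep = " ": every single space is a separator, so a run of spaces
# produces empty fields and the columns no longer line up.
# fill = TRUE is only here so that the ragged rows do not raise an error.
one_space <- read.table(text = lines, sep = " ", header = FALSE,
                        fill = TRUE, stringsAsFactors = FALSE)

# sep = "": any run of white space counts as one separator.
any_space <- read.table(text = lines, sep = "", header = FALSE,
                        stringsAsFactors = FALSE)

# A column whose class is "NULL" is skipped while reading.
subset_cols <- read.table(text = lines, sep = "", header = FALSE,
                          colClasses = c("character", "NULL", "numeric"))

print(ncol(one_space))  # more columns than the data really has
print(any_space)
print(subset_cols)
```

For the real file, the same idea would mean passing the question's col_classes vector as colClasses together with sep = "".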

Also, with a large data file you may want to consider data.table::fread to quickly read data straight into a data.table. I was using this function myself this morning. It is still marked experimental at the time of writing, but I find it works very well indeed.
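
As a rough sketch of the fread route, the snippet below writes a small made-up file and reads only some of its columns. It assumes the data.table package is installed and that the installed version treats a run of spaces as a single separator when sep = " " (my understanding of current releases, worth checking on a sample of the real file). The file contents, the path variable and the chosen column positions are illustrative only.

```r
# Write a small space-delimited sample file (made-up data, 5 columns).
path <- tempfile(fileext = ".txt")
writeLines(c("id1  x   1.5 2   3.25",
             "id2 y 2.5   4 5.75"), path)

# Only run the example if data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  # select keeps columns 1, 2 and 5 by position; the others are not loaded.
  dt <- data.table::fread(path, sep = " ", header = FALSE,
                          select = c(1, 2, 5))
  print(dt)
}
```

For the file in the question, select would list the positions of the ten columns to keep (1 to 5 and 30 to 34), and nrows = 100 can be added to test on the first rows before reading all 3 million.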
