Read only the n-th column of a text file which has no header with R and sqldf

Problem description

I have a problem similar to this question: selecting every Nth column using sqldf or read.csv.sql

I want to read some columns of large files (a table of 150 rows and more than 500,000 columns, space-separated, filled with numeric data; only a 32-bit system is available). The file has no header, so the code in the thread above didn't work, and I decided to write a new post.

Do you have an idea how to solve this problem?

I thought about something like the following, but any solution using fread or read.table would also be fine:

MyConnection <- file("path/file.txt")
df<-sqldf("select column 1 100 1000 235612 from MyConnection",file.format = list(header=F,sep=" "))

Recommended answer

If the columns are fixed width, you can use substr to specify the start position and length of each column you want to read in:

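library(sqldf)

# Write a small fixed-width sample file
x <- tempfile()
cat("12345", "67890", "09876", "54321", sep = "\n", file = x)

# sqldf reads a file connection that is named in the FROM clause
myfile <- file(x)

# substr(X, start, length): 1 character from position 1, up to 5 from position 3
sqldf("select substr(V1, 1, 1) var1, substr(V1, 3, 5) var2 from myfile")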
#   var1 var2
# 1    1  345
# 2    6  890
# 3    9   76
# 4    5  321
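
The file in the question is space-separated rather than strictly fixed width, and the question also allows read.table. As a rough sketch of that route (not part of the original answer), base R can skip unwanted columns by setting their entry in colClasses to "NULL". The sample file, the `keep` positions, and the variable names below are made up for illustration. Whether this fits in memory for 500,000 columns on a 32-bit system has not been verified here.

```r
# Hypothetical sample: 3 rows, 6 space-separated numeric columns, no header
path <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4 5 6",
             "7 8 9 10 11 12",
             "13 14 15 16 17 18"), path)

# Count the fields in the first line to learn the total number of columns
n_cols <- length(scan(path, what = "", nlines = 1, quiet = TRUE))

# Column positions to keep (illustrative values)
keep <- c(1, 4, 6)

# "NULL" tells read.table to skip a column; "numeric" parses it
col_classes <- rep("NULL", n_cols)
col_classes[keep] <- "numeric"

df <- read.table(path, header = FALSE, sep = " ", colClasses = col_classes)
print(df)
```

Only the columns listed in `keep` are stored in the resulting data frame, although every line of the file is still scanned in full.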

See this blog post for more examples of the substr approach. The "select" statement can easily be constructed with paste if you know the starting positions and widths of the columns.
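
A minimal sketch of building the statement with paste, assuming the start positions and widths are held in two vectors. The names `starts`, `widths`, `var_names`, and `query` are illustrative and not from the original answer; the values reproduce the two fields used in the example above.

```r
# Hypothetical layout: start position and width of each field to extract
starts <- c(1, 3)
widths <- c(1, 5)
var_names <- paste0("var", seq_along(starts))

# One "substr(V1, start, width) name" term per field
terms <- paste0("substr(V1, ", starts, ", ", widths, ") ", var_names)

# Join the terms into a single SELECT statement
query <- paste("select", paste(terms, collapse = ", "), "from myfile")
print(query)
```

The resulting string can then be passed to sqldf(query), provided a file connection named myfile exists in the workspace as in the example above.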
