Import fixed width data file with no line separator


Problem description

I have fixed width data files (.dbf) that don't have line separators. Here is what two lines of such a data file look like:

20141101 77h  3.210                                  0    3 20141102 76h  3.090                                  0    3 

The column widths of one line are c(8,4,7,41): the date (8), some time measure (4), the data point (7), and some other columns that I can summarize in one "rest" column (41), so each line is 60 characters long. There is no separator after a line; the next line is simply appended to the previous one, so all time steps are written consecutively in one massive line. The file contains only numbers, letters and white space.

With read.fwf('filepath', widths = c(8,4,7,41)), R stops reading after the first line because there is no line separator.

Is there an argument that tells read.fwf() where a new line starts when there is no line separator? Or should I use a different read command?

Thanks.

Recommended answer

A different, and probably less elegant, solution uses readLines, substr and trimws (base R) together with separate (tidyr) and mutate_all (dplyr):

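library(dplyr)
library(tidyr)

# The file is one long line, so readLines() returns a single string
txt <- readLines('filepath')

# Cut that string into 60-character records (8 + 4 + 7 + 41)
dfx <- data.frame(V1 = sapply(seq(from=1, to=nchar(txt), by=60),
                              function(x) substr(txt, x, x+59)),
                  stringsAsFactors = FALSE)

# Split each record at the given character positions, then strip the padding
dfx %>% 
  separate(V1, c(paste0("V",LETTERS[1:5])), c(8,12,19,55)) %>% 
  mutate_all(trimws)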

which gives:

        VA  VB    VC VD VE
1 20141101 77h 3.210  0  3
2 20141102 76h 3.090  0  3
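
For comparison, here is a base-R sketch of the same idea, assuming every record is exactly 60 characters (8 + 4 + 7 + 41): cut the long string into records first, then hand them to read.fwf() through a text connection. The two records are rebuilt inline from the sample in the question so the snippet runs on its own, and the column names are made up for illustration. With a real file, txt would come from readLines() as in the answer. Unlike the tidyr version, this keeps the 41-character remainder as a single column.

```r
# Rebuild the two sample records from the question (60 characters each).
# The exact padding is an assumption derived from the widths c(8, 4, 7, 41).
rest <- paste0(strrep(" ", 34), "0", "    3", " ")   # 41 characters
txt  <- paste0("20141101", " 77h", "  3.210", rest,
               "20141102", " 76h", "  3.090", rest)

widths  <- c(8, 4, 7, 41)
rec_len <- sum(widths)                                # 60

# Start position of every record inside the long string
starts  <- seq(1, nchar(txt), by = rec_len)
records <- substring(txt, starts, starts + rec_len - 1)

# read.fwf() accepts a connection, so no temporary file has to be written
con <- textConnection(records)
df  <- read.fwf(con, widths = widths, strip.white = TRUE,
                col.names = c("date", "time", "value", "rest"))
close(con)

df
```

Because read.fwf() passes the fields on to read.table(), the date and value columns come back as numbers without a separate conversion step.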

To get different column names, just replace c(paste0("V",LETTERS[1:5])) in the separate() call with a vector of the column names you want.

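If you want to convert the columns to their proper classes instead of leaving them all as character, you can use funs(ul = type.convert(trimws(.))) inside mutate_all. Note that funs() is deprecated in recent dplyr versions (0.8.0 and later); there, a formula such as ~ type.convert(trimws(.), as.is = TRUE) can be passed to mutate_all instead.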
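
A minimal base-R sketch of that conversion step, assuming a data frame of padded character columns like the one produced by the splitting code. The small data frame below is typed in by hand from the sample data, and as.is = TRUE is my choice so that text columns stay character rather than becoming factors.

```r
# Hand-typed stand-in for the split result: every column is a padded string
dfx <- data.frame(VA = c("20141101", "20141102"),
                  VB = c(" 77h", " 76h"),
                  VC = c("  3.210", "  3.090"),
                  stringsAsFactors = FALSE)

# Trim each column, then let type.convert() choose integer, double or character
dfx[] <- lapply(dfx, function(col) type.convert(trimws(col), as.is = TRUE))

str(dfx)
```

Assigning to dfx[] keeps the data frame structure and column names while replacing the column contents.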
