Converting text file into data frame in R
Question
My raw data is in a text file with no particular delimiters between the values, like so:
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
Applying read.table in R creates a data frame with only one variable per row, whereas I would like a data frame with 10 variables per row (one for each of the 10 values). How can I achieve this when the text file has no delimiter?
Recommended answer
We assume that each field consists of non-space characters, except for field 6, which may contain embedded spaces.
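To make the assumption concrete, here is a minimal base R sketch (not part of the original answer) that splits one record on whitespace and then glues the middle tokens back together as field 6. The variable names tok, n, field6 and fields are illustrative only, and the sketch assumes exactly five fields before field 6 and four after it.

```r
# One sample record, copied from the question
line <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930"

# Split on runs of whitespace; trimws() guards against leading/trailing blanks
tok <- strsplit(trimws(line), "\\s+")[[1]]
n <- length(tok)
stopifnot(n >= 10)  # fewer than 10 tokens would violate the assumption

# Fields 1-5 are the first five tokens, fields 7-10 are the last four,
# and whatever lies in between is rejoined into field 6
field6 <- paste(tok[6:(n - 4)], collapse = " ")
fields <- c(tok[1:5], field6, tok[(n - 3):n])
print(fields)
```

This only works because field 6 is the single field allowed to contain spaces; if a second field could also contain them, the record would be ambiguous.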
Create a test file.
Lines <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
"
cat(Lines, file = "myfile.txt")
Run. Read in the file using readLines, producing L. Then, using gsubfn in the gsubfn package, insert the character defined by sep between the fields, producing g. Finally, read the text in g using read.table to create a data frame:
library(gsubfn)
L <- readLines("myfile.txt")
sep <- ";" # choose any character not in the file
pat <- "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S.*\\S) (\\S+) (\\S+) (\\S+) (\\S+)"
pat <- gsub(" ", "\\\\s+", pat) # "\\\\" yields a literal backslash in the replacement; can omit if there is only 1 space between fields
g <- gsubfn(pat, ... ~ paste(..., sep = sep), L)
read.table(text = g, sep = sep)
Output. The result of the last line is:
   V1    V2 V3 V4      V5                 V6   V7   V8   V9   V10
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
2 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
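If installing gsubfn is not an option, the same ten-group pattern can probably be applied with base R alone. The following is a sketch of that alternative, not the answer's own method: it uses regexec and regmatches to pull out the capture groups directly, so no separator character is needed. The names m, parts and df are illustrative, and the two input lines are inlined instead of being read from myfile.txt.

```r
# Two sample records, inlined so the sketch runs without myfile.txt
L <- c("101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930",
       "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930")

# Ten capture groups; field 6 (\\S.*\\S) may contain embedded spaces
pat <- paste0("^\\s*(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+",
              "(\\S.*\\S)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s*$")

# Each list element holds the full match followed by the 10 groups
m <- regmatches(L, regexec(pat, L))
stopifnot(all(lengths(m) == 11))  # stop if any line fails to match

# Drop the full match, stack the groups row-wise, then convert column types
parts <- do.call(rbind, lapply(m, function(x) x[-1]))
df <- as.data.frame(parts, stringsAsFactors = FALSE)
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Because the fields never pass through a delimited text stage, this variant does not depend on finding a separator character that is absent from the file.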