How to read text files and create a data frame in R
Question
I need to read the text file https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt
and convert it into a data frame in R with the following columns: LastName, FirstName, streetno, streetname, city, state, and zip...
I tried to use the sep argument to separate the fields, but it failed...
Recommended answer
Expanding on my comments, here's another approach. You may need to tweak some of the code if your full data set has a wider range of patterns to account for.
library(stringr) # For str_trim
# Read string data and split into data frame
dat = readLines("addr.txt")
dat = as.data.frame(do.call(rbind, strsplit(dat, split=" {2,10}")), stringsAsFactors=FALSE)
names(dat) = c("LastName", "FirstName", "address", "city", "state", "zip")
# Separate address into number and street (if streetno isn't always numeric,
# or if you don't want it to be numeric, then just remove the as.numeric wrapper).
dat$streetno = as.numeric(gsub("([0-9]{1,4}).*","\\1", dat$address))
dat$streetname = gsub("[0-9]{1,4} (.*)","\\1", dat$address)
# Clean up zip
dat$zip = gsub("O","0", dat$zip)
dat$zip = str_trim(dat$zip)
dat = dat[,c(1:2,7:8,4:6)]
dat
   LastName FirstName streetno          streetname     city state        zip
1     Bania Thomas M.      725   Commonwealth Ave.   Boston    MA      02215
2   Barnaby     David      373       W. Geneva St. Wms. Bay    WI      53191
3    Bausch      Judy      373       W. Geneva St. Wms. Bay    WI      53191
...
41   Wright      Greg      791 Holmdel-Keyport Rd.  Holmdel    NY 07733-1988
42  Zingale   Michael     5640       S. Ellis Ave.  Chicago    IL      60637
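To try the same idea without downloading the file, here is a minimal, self-contained sketch. It makes several assumptions that are not part of the original answer: the two sample lines are made up for illustration (they are not taken from addr.txt), fields are assumed to be separated by two or more spaces, base R's trimws stands in for stringr's str_trim, and the field-count check is an extra safeguard, since rbind only warns when the rows have different lengths.

```r
# Hypothetical sample lines (NOT from addr.txt): fields are separated by
# two or more spaces, while single spaces may occur inside a field.
lines <- c(
  "Doe       Jane A.    12 Main St.        Springfield  IL  62701",
  "Roe       Richard    3400 N. Oak Ave.   Oak Park     IL  6O3O2 "
)

# Split each line on runs of two or more spaces.
parts <- strsplit(lines, split = " {2,}")

# Safeguard (an addition, not in the original answer): rbind() recycles
# shorter vectors with only a warning, so check the field count first.
stopifnot(all(lengths(parts) == 6))

dat <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(dat) <- c("LastName", "FirstName", "address", "city", "state", "zip")

# Split the address into its leading number and the remaining street name.
dat$streetno   <- as.numeric(sub("^([0-9]+) .*$", "\\1", dat$address))
dat$streetname <- sub("^[0-9]+ +", "", dat$address)

# Clean up zip: replace a letter O typed for zero, then trim whitespace.
dat$zip <- trimws(gsub("O", "0", dat$zip))

# Reorder by name rather than by position, dropping the raw address column.
dat <- dat[, c("LastName", "FirstName", "streetno", "streetname",
               "city", "state", "zip")]
dat
```

If the field-count check fails on the real file, inspecting lines[lengths(parts) != 6] should show which lines do not fit the two-or-more-spaces pattern and need a tweak.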