How to read text files and create a data frame in R


Problem description

I need to read the text file https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt

and convert it into a data frame in R with the columns LastName, FirstName, streetno, streetname, city, state, and zip.

I tried to use the sep argument to separate the fields, but it failed.
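
One plausible reason a single-character separator fails, assuming the fields are separated by runs of two or more spaces while some values (such as "Thomas M." or "Wms. Bay") contain single spaces: splitting on one space cuts those values apart. The sketch below uses a hypothetical line in that assumed layout, not a line copied from addr.txt.

```r
# Hypothetical line in the assumed layout (2+ spaces between fields).
line <- "Bania  Thomas M.  725 Commonwealth Ave.  Boston  MA  02215"

# Splitting on every single space breaks multi-word values apart and
# leaves empty strings between consecutive spaces.
single <- strsplit(line, " ", fixed = TRUE)[[1]]

# Splitting on runs of two or more spaces keeps each field whole.
multi <- strsplit(line, " {2,}")[[1]]

length(single)  # more than six pieces
length(multi)   # six pieces, one per field
```

With a single-space split the number of pieces also varies from line to line, so the rows cannot be lined up into consistent columns.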

Recommended answer

Expanding on my comments, here's another approach. You may need to tweak some of the code if your full data set has a wider range of patterns to account for.

library(stringr) # For str_trim 

# Read string data and split into data frame
dat = readLines("addr.txt")
dat = as.data.frame(do.call(rbind, strsplit(dat, split=" {2,10}")), stringsAsFactors=FALSE)
names(dat) = c("LastName", "FirstName", "address", "city", "state", "zip")

# Separate address into number and street (if streetno isn't always numeric,
# or if you don't want it to be numeric, then just remove the as.numeric wrapper).
dat$streetno = as.numeric(gsub("([0-9]{1,4}).*","\\1",  dat$address))
dat$streetname = gsub("[0-9]{1,4} (.*)","\\1",  dat$address)

# Clean up zip
dat$zip = gsub("O","0", dat$zip)
dat$zip = str_trim(dat$zip)

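# Drop the combined address column and put the columns in the requested order
dat = dat[, c("LastName", "FirstName", "streetno", "streetname", "city", "state", "zip")]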

dat
      LastName  FirstName streetno           streetname       city state        zip
1        Bania  Thomas M.      725    Commonwealth Ave.     Boston    MA      02215
2      Barnaby      David      373        W. Geneva St.   Wms. Bay    WI      53191
3       Bausch       Judy      373        W. Geneva St.   Wms. Bay    WI      53191
...
41      Wright       Greg      791  Holmdel-Keyport Rd.    Holmdel    NY 07733-1988
42     Zingale    Michael     5640        S. Ellis Ave.    Chicago    IL      60637
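
For trying the steps without downloading the file, here is a minimal base R sketch of the same idea. The three sample lines are hypothetical stand-ins written in the layout the answer assumes (fields separated by two or more spaces); they are not copied from addr.txt. It swaps `str_trim` for base R's `trimws` and adds a field-count check, neither of which is part of the original answer.

```r
# Hypothetical sample lines standing in for readLines("addr.txt").
# Assumption: fields are separated by runs of two or more spaces.
lines <- c(
  "Bania    Thomas M.  725 Commonwealth Ave.  Boston    MA  O2215",
  "Barnaby  David      373 W. Geneva St.      Wms. Bay  WI  53191",
  "Wright   Greg       791 Holmdel-Keyport Rd.  Holmdel  NY  07733-1988  "
)

# Trim outer whitespace, then split each line on 2+ spaces.
parts <- strsplit(trimws(lines), " {2,}")

# Guard: every line must yield exactly six fields before binding rows,
# otherwise rbind would silently recycle values.
stopifnot(all(lengths(parts) == 6))

dat <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(dat) <- c("LastName", "FirstName", "address", "city", "state", "zip")

# Leading digits become the street number; the rest is the street name.
dat$streetno   <- as.numeric(sub("^([0-9]+) .*$", "\\1", dat$address))
dat$streetname <- sub("^[0-9]+ +", "", dat$address)

# Replace a letter O typed in place of a zero. The zip stays character
# so leading zeros and ZIP+4 suffixes are preserved.
dat$zip <- gsub("O", "0", dat$zip, fixed = TRUE)

dat <- dat[, c("LastName", "FirstName", "streetno", "streetname",
               "city", "state", "zip")]
dat
```

If a line in the real file produces a different number of fields, the `stopifnot` check fails early, which points to rows whose spacing pattern needs separate handling.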
