Reading a non-standard CSV File into R

Problem description

I'm trying to read the following CSV file into R:

http://asic.gov.au/Reports/YTD/2015/RR20150511-001-SSDailyYTD.csv

The code I'm currently using is:

url <- "http://asic.gov.au/Reports/YTD/2015/RR20150511-001-SSDailyYTD.csv"
shorthistory <- read.csv(url, skip = 4)

However, I keep getting the following error:

1: In readLines(file, skip) : line 1 appears to contain an embedded nul
2: In readLines(file, skip) : line 2 appears to contain an embedded nul
3: In readLines(file, skip) : line 3 appears to contain an embedded nul
4: In readLines(file, skip) : line 4 appears to contain an embedded nul

This leads me to believe I am using the function incorrectly, as it fails on every line.

Any help would be greatly appreciated!

Recommended answer

Because of the blank at the top left corner, read.csv() doesn't seem to work. The file has to be read line by line with readLines() (passing skipNul = TRUE to drop the embedded nuls), and then the first 4 lines are skipped.

An example is shown below. The file is opened as a file connection (file()) and then read line by line (readLines()). The first 4 lines are skipped by subsetting. The file is tab-delimited, so strsplit() is applied to each line. The result is still a list of character vectors, which should be reformatted into a data frame or another suitable type.

# open file connection and read lines
path <- "http://asic.gov.au/Reports/YTD/2015/RR20150511-001-SSDailyYTD.csv"
con <- file(path, open = "rt", raw = TRUE)
text <- readLines(con, skipNul = TRUE)
close(con)

# skip the first 4 lines (preamble before the data)
text <- text[5:length(text)]
# split each line on tabs; strsplit() is vectorised, so this yields
# a list with one character vector per line
text <- strsplit(text, split = "\t")

text[[1]][1:4]
# [1] "1-PAGE LTD ORDINARY" "1PG "                "1330487"             "1.72"
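
To turn the list of character vectors into a data frame, something along these lines should work. This is only a sketch: it uses a small temporary file with made-up rows and embedded nul bytes as a stand-in for the real download, the column names (product, code, volume, pct) are hypothetical, and it assumes every data row has the same number of fields.

# stand-in data (hypothetical): 4 preamble lines, then tab-delimited rows
lines_in <- c("Report title", "Generated 2015-05-11", "Notes", "Header",
              "1-PAGE LTD ORDINARY\t1PG \t1330487\t1.72",
              "EXAMPLE CO ORDINARY\tEXC \t2500\t0.35")

# put a nul byte after every character so readLines() sees embedded nuls
add_nuls <- function(s) as.vector(rbind(charToRaw(s), as.raw(0)))
bytes <- do.call(c, lapply(lines_in, function(s) c(add_nuls(s), charToRaw("\n"))))
tmp <- tempfile(fileext = ".csv")
writeBin(bytes, tmp)

# same approach as above: read lines, dropping the nuls
con <- file(tmp, open = "rt")
text <- readLines(con, skipNul = TRUE)
close(con)
unlink(tmp)

# skip the 4 preamble lines and split each remaining line on tabs
rows <- strsplit(text[-(1:4)], split = "\t", fixed = TRUE)

# bind the rows into a character matrix (assumes equal field counts),
# then convert each column to a suitable type
fields <- do.call(rbind, rows)
shorts <- data.frame(
  product = trimws(fields[, 1]),
  code    = trimws(fields[, 2]),
  volume  = as.numeric(fields[, 3]),
  pct     = as.numeric(fields[, 4]),
  stringsAsFactors = FALSE
)
str(shorts)

If the rows of the real file have differing lengths, do.call(rbind, rows) will recycle values with a warning instead of failing, so it is worth checking lengths(rows) first.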
