How can I read a CSV file containing some additional text data?

Views: 56 · Published: 2021/4/27 19:51:13 · Tags: r, csv


Question

I need to read a CSV file in R, but some rows of the file contain free-text information instead of comma-separated values, so I cannot read it with read.csv(fileName). The content of the file is as follows:

name:russel date:21-2-1991
abc,2,saa
anan,3,ds
ama,ds,az
,,

name:rus date:23-3-1998
snans,32,asa
asa,2,saz

I only need to store the values that belong to each name/date pair as a data frame. How can I read the file to do that?

The output I actually need is:

>dataFrame1
    abc,2,saa
    anan,3,ds
    ama,ds,az
>dataFrame2
    snans,32,asa
    asa,2,saz

Recommended answer

You can read the data with scan and use the grep and sub functions to extract the important values.

The text:

text <- "name:russel date:21-2-1991
abc,2,saa
anan,3,ds
ama,ds,az
,,

name:rus date:23-3-1998
snans,32,asa
asa,2,saz"

These commands generate a data frame with the name and date values.

# read the text
lines <- scan(text = text, what = character())
# find strings starting with 'name' or 'date'
nameDate <- grep("^name|^date", lines, value = TRUE)
# extract the values
values <- sub("^name:|^date:", "", nameDate)
# create a data frame
dat <- as.data.frame(matrix(values, ncol = 2, byrow = TRUE,
                            dimnames = list(NULL, c("name", "date"))))

Result:

> dat
    name      date
1 russel 21-2-1991
2    rus 23-3-1998
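
The date column produced above holds plain strings. If real dates are needed, a minimal sketch of the conversion could look like the following. It assumes the strings are always in day-month-year order, which the two sample values suggest but the question does not state; the data frame is rebuilt here only so that the snippet runs on its own.

```r
# Example data mirroring the `dat` result shown above (rebuilt so the snippet is self-contained)
dat <- data.frame(name = c("russel", "rus"),
                  date = c("21-2-1991", "23-3-1998"),
                  stringsAsFactors = FALSE)

# Assumption: dates are written as day-month-year
dat$date <- as.Date(dat$date, format = "%d-%m-%Y")

# inspect the column classes
str(dat)
```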

Update

To extract the values from the lines that do not contain name and date information, you can use the following commands:

# read data
lines <- readLines(textConnection(text))
# split lines
splitted <- strsplit(lines, ",")
# find positions of 'name' lines
idx <- grep("^name", lines)[-1]
# create grouping variable
grp <- cut(seq_along(lines), c(0, idx, length(lines)))
# extract values
values <- tapply(splitted, grp, FUN = function(x)
                                        lapply(x, function(y)
                                                    if (length(y) == 3) y))
# create a list of data frames
dat <- lapply(values, function(x) as.data.frame(matrix(unlist(x),
                                                       ncol = 3, byrow = TRUE)))

Result:

> dat
$`(0,7]`
    V1 V2  V3
1  abc  2 saa
2 anan  3  ds
3  ama ds  az

$`(7,9]`
     V1 V2  V3
1 snans 32 asa
2   asa  2 saz
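
The question asks about reading from a file rather than from a string, so here is a hedged sketch of a variation on the approach above. It is not the answer's own method: it groups lines with cumsum and split instead of cut and tapply, and parses each group with read.csv(text = ...). The helper name read_blocks and the temporary file are hypothetical, and the sketch assumes every block starts with a line beginning with "name:".

```r
# Hypothetical helper: one data frame per "name:" block of a mixed file
read_blocks <- function(path) {
  lines <- readLines(path)
  # drop blank lines and separator rows that contain only commas
  lines <- lines[!grepl("^[,[:space:]]*$", lines)]
  # a header line opens a new block (assumption: headers start with "name:")
  is_header <- grepl("^name:", lines)
  block_id <- cumsum(is_header)
  # keep data rows that follow at least one header
  keep <- !is_header & block_id > 0
  blocks <- split(lines[keep], block_id[keep])
  # parse the rows of each block as CSV without a header row
  result <- lapply(blocks, function(rows) {
    read.csv(text = rows, header = FALSE, stringsAsFactors = FALSE)
  })
  # label each data frame with the header line that introduced it
  headers <- lines[is_header]
  names(result) <- headers[as.integer(names(blocks))]
  result
}

# Usage with a temporary file that holds the sample data from the question
path <- tempfile(fileext = ".csv")
writeLines(c("name:russel date:21-2-1991", "abc,2,saa", "anan,3,ds",
             "ama,ds,az", ",,", "", "name:rus date:23-3-1998",
             "snans,32,asa", "asa,2,saz"), path)
dfs <- read_blocks(path)
print(dfs)
```

Naming each list element after its header line keeps the name and date next to the rows they describe, which the interval labels from cut do not.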
