R: how to get information from a txt file with R

Question

R experts,

I have a large text file with a specific pattern and format.

My text.txt contains:

x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt

x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt 

x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknk`xx`cna`yy`ktnhakt 

x4  nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt 

Then I want R to find a list of keywords, in this case x1, x2, x3 and x4, and for each of them get a list of the text that appears between "xx" and "yy".

The result should therefore be four lists:

x1 = c("nkkna", "nmm  cataitha")
x2 = c("ngkna")
x3 = c("nkg,kna", "cna")
x4 = c("NA")

However, I am facing two problems and would like to ask for your help.

  • How do I read a large text file into R? I learned from Stack Overflow that the command

x <- read.csv(textConnection("xxx")) may help, but my file is too large to copy and paste, and the file would have to be read in as a CSV. Is there a better way to load my text file into R as an object that I can search and grep afterwards?

  • How do I write the code to extract all this information?

I learned that strsplit might be used; it seems to work on material scraped with RCurl. Does it work here too? If so, would you mind showing me how?

Thank you so much.

Recommended answer

To answer your first question: to read a text file, use the function scan(). The references to textConnection that you see on SO are only there to read in example data that has been pasted into the console. This is what I do next to read in your sample data:

txt <- "
x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt
x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt 
x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknk`xx`cna`yy`ktnhakt 
x4  nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt"

dtxt <- textConnection(txt)

Then I use scan to read the textConnection data, exactly as you would read a file. In your own code, modify the following line so that dtxt is replaced by your file location. I keep it in this format so that other people can replicate my results without having to create a file on their own file system:

dat <- scan(dtxt, what="character", sep="\n")
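When the data lives in a real file rather than a textConnection, only the first argument to scan() changes. A minimal sketch, assuming plain text with one record per line; the temporary file below is a hypothetical stand-in for your text.txt so the example runs anywhere:

```r
# Write two sample records to a temporary file (stand-in for text.txt)
path <- tempfile(fileext = ".txt")
writeLines(c(
  "x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt",
  "x4  nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt"
), path)

# Pass the file path instead of a connection; sep = "\n" keeps each line whole
# (with sep = "\n", scan() does no quote processing, so commas are harmless)
dat_file <- scan(path, what = "character", sep = "\n", quiet = TRUE)

# readLines() gives the same one-string-per-line result for this data
stopifnot(identical(dat_file, readLines(path)))
```

Either way you end up with a character vector that grep(), strsplit() and gsub() can work on directly.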

Now that you have read the data, manipulating it takes a (somewhat complicated) call to sapply, strsplit and gsub:

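# For each line: split on `xx`, keep the text before the next backtick,
# and drop the first piece (the keyword prefix before the first `xx`)
sapply(seq_along(dat), 
    function(i) unlist(c(sapply(strsplit(dat[i], "`xx`"), 
              function(x) gsub("^(.*?)`.*", "\\1", x)[-1]))))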
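To see what the one-liner does to each line, here is the same pipeline unrolled for the x3 record. This is only an illustrative decomposition; the intermediate variable names are hypothetical:

```r
line <- "x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknk`xx`cna`yy`ktnhakt"

# 1. Split on the opening marker; the first piece is the keyword prefix ("x3 ")
pieces <- strsplit(line, "`xx`")[[1]]

# 2. In each piece keep everything before the first backtick, i.e. the text
#    before the closing `yy` marker; "(.*?)" is a non-greedy match, so it stops
#    at the first backtick rather than the last one
captured <- gsub("^(.*?)`.*", "\\1", pieces)

# 3. Drop the prefix, which comes before the first `xx` and is not a match
result <- captured[-1]
# result is c("nkg,kna", "cna")
```

A line with no `xx` marker (like x4) splits into a single piece, so step 3 leaves an empty character vector.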

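Run over all four lines, the sapply() call returns what you specified, except that x4 comes back as character(0) (an empty character vector) rather than "NA":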

[[1]]
[1] "nkkna"         "nmm  cataitha"

[[2]]
[1] "ngkna"

[[3]]
[1] "nkg,kna" "cna"    

[[4]]
character(0)
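If you want the results labelled x1 to x4, as in your question, one option is to take the keyword from the start of each line and use it as the list name. A sketch reusing the same strsplit()/gsub() logic, assuming the keyword is always the first whitespace-delimited token on a line; the helper name extract_between is hypothetical:

```r
txt_lines <- c(
  "x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt",
  "x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt",
  "x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknk`xx`cna`yy`ktnhakt",
  "x4  nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt"
)

# Same extraction as the answer: split on `xx`, keep the text before the
# next backtick, and drop the piece that precedes the first `xx`
extract_between <- function(line) {
  pieces <- strsplit(line, "`xx`")[[1]]
  gsub("^(.*?)`.*", "\\1", pieces)[-1]
}

# Assumption: the keyword (x1, x2, ...) is the first non-space token
keys <- sub("^(\\S+).*", "\\1", txt_lines)

res <- lapply(txt_lines, extract_between)
names(res) <- keys

# Optional: turn empty results into NA, closer to the question's x4
res[lengths(res) == 0] <- NA_character_
```

With real data, pass the vector returned by scan() (or readLines()) in place of txt_lines; res$x1, res$x3 and so on then hold the extracted strings.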
