How can I manipulate a very large list?

Views: 70 · Posted: 2020/5/2 6:28:31 · Tags: r, list


Problem description

I have more than 10,000 files. I first set my working directory to the folder that contains them.

Then I list all the files with a .txt extension like this:

# pattern is a regular expression, so match a literal ".txt" at the end of the name
filenames <- list.files("path to the file", pattern = "\\.txt$", full.names = TRUE)
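A note on the `pattern` argument, since it is easy to trip over: `list.files` treats it as a regular expression, not a shell glob, so an unanchored pattern such as `"*.txt"` may also pick up names like `notes.txt.bak`. The sketch below uses a throwaway folder with made-up file names (`demo_dir` and the file names are assumptions for illustration only) to show an anchored pattern, plus `glob2rx()` for converting a glob.

```r
# Build a throwaway folder with a few files (names are invented for the demo).
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("a.txt", "b.txt", "notes.txt.bak", "atxt.csv")
)))

# "\\.txt$" means: the name ends with a literal ".txt".
filenames <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
print(basename(filenames))

# glob2rx() turns a shell-style glob into the equivalent regular expression,
# which can then be passed as `pattern`.
rx <- glob2rx("*.txt")
filenames_glob <- list.files(demo_dir, pattern = rx, full.names = TRUE)
print(basename(filenames_glob))
```

Both calls should return only the two real `.txt` files in this demo folder.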

Then I read them with fread:

library(data.table)  # provides fread
ldf <- lapply(filenames, FUN = fread, header = TRUE)

Why fread? Actually, when I use data.table, for example, the result gets messed up, and then I must add sep="," and row.names=FALSE, etc. If you know a better way, please go ahead and advise.
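If only a couple of columns are needed, one option is to ask `fread` for just those columns through its `select` argument, so the full tables never sit in memory. This is a minimal sketch, not a recommendation from the original post: the demo folder, file names and file contents are invented, and it assumes every file really contains columns named `myfile` and `myname`.

```r
library(data.table)

# Two small demo files stand in for the real ones (contents are invented).
demo_dir <- file.path(tempdir(), "fread_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(check = FALSE, myfile = "1xOxidation [M7]", myname = "Q9Y383"),
       file.path(demo_dir, "file1.txt"), sep = "\t")
fwrite(data.table(check = FALSE, myfile = "1xCarbamidomethyl [C1]", myname = "P39019"),
       file.path(demo_dir, "file2.txt"), sep = "\t")

filenames <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read only the two columns of interest from every file.
ldf <- lapply(filenames, fread, header = TRUE, select = c("myfile", "myname"))

# Name each list element after its file so results can be traced back.
names(ldf) <- basename(filenames)
str(ldf)
```

Naming the list elements by file makes later steps (checking columns, binding results) easier to debug when there are thousands of files.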

After doing this, I end up with a huge list from which I now need to extract data. As an example, I tried to put together a representative sample of the data, shown below.

Of course, the real files have many more columns; here there are only three, named check, myfile and myname.

Next I tried to keep only the columns myfile and myname with the following command, which did not work:

t<- lapply(ldf, `[`, c(2,3))
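A likely reason the command above does not keep the two columns: the list elements are data.table objects, and for a data.table a single index inside `[` selects rows, not columns (for a plain data.frame the same call would select columns). The sketch below uses a small invented table to show the difference and one possible way to select columns by name; the object names `by_rows`, `by_cols` and `keep` are hypothetical.

```r
library(data.table)

# A tiny stand-in for one loaded file (values are illustrative).
dt <- data.table(check  = c(FALSE, FALSE, FALSE),
                 myfile = c("", "1xOxidation [M7]", ""),
                 myname = c("Q9Y383", "Q9Y383", "Q15366-2"))
ldf <- list(dt, dt)

# With a data.table, a single index picks ROWS: this keeps rows 2 and 3
# and all three columns.
by_rows <- lapply(ldf, `[`, c(2, 3))
print(dim(by_rows[[1]]))

# Selecting by name is explicit and does not depend on column position.
keep <- c("myfile", "myname")
by_cols <- lapply(ldf, function(d) d[, keep, with = FALSE])
print(names(by_cols[[1]]))
```

Selecting by name also guards against files whose columns appear in a different order. As a side note, `t` is the name of the base R transpose function, so a different variable name avoids confusion.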



 my.list <- list(structure(list(check = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE), myfile = c("", "1xLabel:13C(6)15N(4) [R11]", "1xOxidation [M7]", 
"", "1xLabel:13C(6)15N(4) [R11]", ""), myname = c("Q9Y383", "Q9Y383", 
"Q9Y383", "Q15366-2", "Q15366-2", "Q15366-2")), .Names = c("check", 
"myfile", "myname"), row.names = c(NA, -6L), class = c("data.table", 
"data.frame")), structure(list(
    check = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
    ), myfile = c(NA, NA, NA, NA, NA, NA, NA), Myname = c("F8W727", 
    "O76021", "P46783", "P35527", "Q96C45", "Q9Y383", "Q9Y383"
    )), .Names = c("check", "myfile", "myname"), row.names = c(NA, 
-7L), class = c("data.table", "data.frame")), 
    structure(list(check = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), myfile = c("", 
    "2xLabel:13C(6)15N(4) [R6; R8]; 1xCarbamidomethyl [C4]", 
    "", "", "", "1xCarbamidomethyl [C1]", "", "", "", "", "1xLabel:13C(6)15N(4) [R6]; 1xCarbamidomethyl [C5]"
    ), myname = c("P39019", "A2A3R5; P62753", "Q8IYB3; E9PCT1; M0R088; A9Z1X7; Q8IYB3-2", 
    "S4R3J4; O43390-3; B4DT28; O43390; O43390-2; O60506; O60506-2; E7ETM7", 
    "P07910-4; B4DY08; G3V4C1; P07910-2; G3V4W0; P07910; G3V5V7; P07910-3; G3V2D6; G3V2Q1", 
    "D6R9X9; D6RG19; P61927", "Q00839", "G3XAD8; H0YGI8; P31948; F5H0T1", 
    "Q8IYB3; E9PCT1; M0R088; A9Z1X7; Q8IYB3-2", "P42766", "Q9NX58; D6RDJ1"
    )), .Names = c("check", "myfile", "myname"), row.names = c(NA, 
    -11L), class = c("data.table", "data.frame")), 
    structure(list(check = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), myfile = c("", 
    "", "", "", "1xLabel:13C(6)15N(4) [R7]", "", "", "", "3xLabel:13C(6)15N(4) [R1; R7; R10]", 
    "", ""), myname = c("P61247", "P39019", "Q9NWH9", "P62917", 
    "P62917", "E9PCT1", "Q15149", "Q14152", "Q14152", "Q15020", 
    "Q02543")), .Names = c("check", "myfile", "myname"), row.names = c(NA, 
    -11L), class = c("data.table", "data.frame")))

What do I want?

I want to check whether myfile and myname are present in every file I loaded, and then get an output like this:

  file1                file2                  file3                 file4
myfile   myname       myfile   myname      myfile   myname     myfile   myname 
 info     info         info      info        info    info       info     info
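The presence check described above can be sketched in base R as follows. The two stand-in tables and the names `required`, `has_all` and `missing_cols` are assumptions for illustration; the second table deliberately uses `Myname` with a capital M to show that the comparison is case sensitive.

```r
required <- c("myfile", "myname")

# Stand-ins for loaded files; file2 has a differently cased column name.
ldf <- list(
  file1 = data.frame(check = FALSE, myfile = "", myname = "Q9Y383"),
  file2 = data.frame(check = FALSE, myfile = NA, Myname = "F8W727")
)

# TRUE for every element that contains all required columns.
has_all <- vapply(ldf, function(d) all(required %in% names(d)), logical(1))
print(has_all)

# Which required columns are missing, per file.
missing_cols <- lapply(ldf, function(d) setdiff(required, names(d)))
print(missing_cols[!has_all])
```

With thousands of files, `names(has_all)[!has_all]` gives a compact list of the files that need attention before any column selection is attempted.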

To make this more reproducible, I want the output for the example data to look like this:

    myout<- structure(list(myfile1 = structure(c(NA, 1L, 2L, NA, 1L, NA, 
NA, NA, NA, NA, NA), .Label = c("1xLabel:13C(6)15N(4) [R11]", 
"1xOxidation [M7]"), class = "factor"), Myname1 = structure(c(2L, 
2L, 2L, 1L, 1L, 1L, NA, NA, NA, NA, NA), .Label = c("Q15366-2", 
"Q9Y383"), class = "factor"), myfile2 = c(NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA), Myname2 = structure(c(1L, 2L, 4L, 3L, 
5L, 6L, 6L, NA, NA, NA, NA), .Label = c("F8W727", "O76021", "P35527", 
"P46783", "Q96C45", "Q9Y383"), class = "factor"), myfile3 = structure(c(NA, 
3L, NA, NA, NA, 1L, NA, NA, NA, NA, 2L), .Label = c("1xCarbamidomethyl [C1]", 
"1xLabel:13C(6)15N(4) [R6]; 1xCarbamidomethyl [C5]", "2xLabel:13C(6)15N(4) [R6; R8]; 1xCarbamidomethyl [C4]"
), class = "factor"), Myname3 = structure(c(5L, 1L, 8L, 10L, 
4L, 2L, 7L, 3L, 8L, 6L, 9L), .Label = c("A2A3R5; P62753", "D6R9X9; D6RG19; P61927", 
"G3XAD8; H0YGI8; P31948; F5H0T1", "P07910-4; B4DY08; G3V4C1; P07910-2; G3V4W0; P07910; G3V5V7; P07910-3; G3V2D6; G3V2Q1", 
"P39019", "P42766", "Q00839", "Q8IYB3; E9PCT1; M0R088; A9Z1X7; Q8IYB3-2", 
"Q9NX58; D6RDJ1", "S4R3J4; O43390-3; B4DT28; O43390; O43390-2; O60506; O60506-2; E7ETM7"
), class = "factor"), myfile4 = structure(c(NA, NA, NA, NA, 1L, 
NA, NA, NA, 2L, NA, NA), .Label = c("1xLabel:13C(6)15N(4) [R7]", 
"3xLabel:13C(6)15N(4) [R1; R7; R10]"), class = "factor"), Myname4 = structure(c(3L, 
2L, 9L, 4L, 4L, 1L, 8L, 6L, 6L, 7L, 5L), .Label = c("E9PCT1", 
"P39019", "P61247", "P62917", "Q02543", "Q14152", "Q15020", "Q15149", 
"Q9NWH9"), class = "factor")), .Names = c("myfile1", "Myname1", 
"myfile2", "Myname2", "myfile3", "Myname3", "myfile4", "Myname4"
), class = "data.frame", row.names = c(NA, -11L))
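One possible way to build a side-by-side table of this shape is to pad every element to the length of the longest one and then bind the columns. This is only a sketch under assumptions: `to_wide` is a hypothetical helper, the input tables are invented, empty strings are turned into `NA` to mirror the desired output, and the new column names are the original names plus the file index.

```r
# Two small stand-in tables of different lengths (values are illustrative).
ldf <- list(
  data.frame(check = FALSE, myfile = c("", "1xOxidation [M7]"),
             myname = c("Q9Y383", "Q9Y383"), stringsAsFactors = FALSE),
  data.frame(check = FALSE, myfile = c(NA, NA, NA),
             myname = c("F8W727", "O76021", "P46783"), stringsAsFactors = FALSE)
)

to_wide <- function(lst, cols = c("myfile", "myname")) {
  n <- max(vapply(lst, nrow, integer(1)))          # length of the longest table
  parts <- lapply(seq_along(lst), function(i) {
    d <- as.data.frame(lst[[i]])[, cols, drop = FALSE]
    d <- d[seq_len(n), , drop = FALSE]             # rows past nrow(d) become NA
    rownames(d) <- NULL
    d[] <- lapply(d, function(x) { x[x %in% ""] <- NA; x })  # "" -> NA
    names(d) <- paste0(cols, i)                    # myfile1, myname1, ...
    d
  })
  do.call(cbind, parts)
}

wide <- to_wide(ldf)
print(wide)
```

For 10,000 files this produces 20,000 columns, so a long format (one row per file and entry, with a file identifier column) may be easier to work with; that choice depends on what the output is used for.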

New request

Then I want to split the data into two data frames. One, called df1, keeps only those mynames whose myfile contains special strings; the other, called df2, keeps those mynames whose myfile is empty or does not contain those special strings.

df1<- structure(list(myname1 = structure(c(3L, 2L, 1L, 1L), .Label = c("", 
"Q15366-2", "Q9Y383"), class = "factor"), myname2 = c(NA, NA, 
NA, NA), myname3 = structure(c(1L, 3L, 4L, 2L), .Label = c("A2A3R5", 
"D6RDJ1", "P62753", "Q9NX58"), class = "factor"), myname4 = structure(c(2L, 
3L, 1L, 1L), .Label = c("", "P62917", "Q14152"), class = "factor")), .Names = c("myname1", 
"myname2", "myname3", "myname4"), class = "data.frame", row.names = c(NA, 
-4L))


df2 <- structure(list(myname1 = structure(c(3L, 3L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("", "Q15366-2", "Q9Y383"), class = "factor"), 
    myname2 = structure(c(2L, 3L, 5L, 4L, 6L, 7L, 7L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = c("", "F8W727", "O76021", "P35527", "P46783", 
    "Q96C45", "Q9Y383"), class = "factor"), myname3 = structure(c(29L, 
    33L, 11L, 18L, 1L, 34L, 35L, 22L, 6L, 20L, 21L, 23L, 4L, 
    10L, 27L, 7L, 2L, 25L, 15L, 24L, 16L, 26L, 13L, 14L, 8L, 
    9L, 31L, 8L, 9L, 31L, 32L, 17L, 3L, 28L, 12L, 33L, 11L, 19L, 
    5L, 34L, 30L), .Label = c(" A9Z1X7", " G3V4C1", " H0YGI8", 
    " O60506-2 ", "A9Z1X7", "B4DT28", "B4DY08", "D6R9X9", "D6RG19", 
    "E7ETM7", "E9PCT1", "F5H0T1", "G3V2D6", "G3V2Q1", "G3V4W0", 
    "G3V5V7", "G3XAD8", "M0R088", "M0R088 ", "O43390", "O43390-2", 
    "O43390-3", "O60506", "P07910", "P07910-2 ", "P07910-3 ", 
    "P07910-4", "P31948", "P39019", "P42766", "P61927", "Q00839", 
    "Q8IYB3", "Q8IYB3-2", "S4R3J4"), class = "factor"), myname4 = structure(c(4L, 
    3L, 10L, 5L, 2L, 9L, 7L, 8L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", 
    "E9PCT1", "P39019", "P61247", "P62917", "Q02543", "Q14152", 
    "Q15020", "Q15149", "Q9NWH9"), class = "factor")), .Names = c("myname1", 
"myname2", "myname3", "myname4"), class = "data.frame", row.names = c(NA, 
-41L))
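The split shown in df1 and df2 can be sketched per file as below. The post does not say what the special strings are; this sketch assumes, purely for illustration, that the marker is the substring `Label` (the value of `special` is a guess and should be replaced with the real one). It also assumes that multiple names in one cell are separated by semicolons, as in the example data. `split_by_tag` and `split_names` are hypothetical helpers.

```r
special <- "Label"  # ASSUMPTION: placeholder for the real "special string"

# Break "A; B; C" cells into individual, trimmed names.
split_names <- function(x) {
  trimws(as.character(unlist(strsplit(x, ";", fixed = TRUE))))
}

split_by_tag <- function(d, tag = special) {
  # Rows whose myfile is NA or empty never count as tagged.
  has_tag <- !is.na(d$myfile) & grepl(tag, d$myfile, fixed = TRUE)
  list(with_tag    = split_names(d$myname[has_tag]),
       without_tag = split_names(d$myname[!has_tag]))
}

# First table of the example data, reduced to the two relevant columns.
d1 <- data.frame(
  myfile = c("", "1xLabel:13C(6)15N(4) [R11]", "1xOxidation [M7]",
             "", "1xLabel:13C(6)15N(4) [R11]", ""),
  myname = c("Q9Y383", "Q9Y383", "Q9Y383", "Q15366-2", "Q15366-2", "Q15366-2"),
  stringsAsFactors = FALSE
)

res <- split_by_tag(d1)
print(res$with_tag)
print(res$without_tag)
```

Applying `lapply(ldf, split_by_tag)` would give one such pair per file; the vectors have different lengths, so they need padding before they can be combined into data frames like df1 and df2.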
