R: For loop nested in for loop

Question

I have some data that looks like the following:

"Name","Length","Startpos","Endpos","ID","Start","End","Rev","Match"    
"Name_1",140,0,138,"1729",11,112,0,1
"Name_2",132,0,103,"16383",23,232,0,1
"Name_3",102,0,100,"1729",22,226,1,1
"Name_4",112,0,130,"16383",99,992,1,1
"Name_5",132,0,79,"1729",81,820,1,1
"Name_6",112,0,163,"16383",81,820,0,1
"Name_7",123,0,164,"1729",54,542,1,1
"Name_8",123,0,65,"16383",28,289,0,1

I have used the order function to sort the data first by "ID" and then by "Start":

"Name","Length","Startpos","Endpos","ID","Start","End","Rev","Match"   
"Name_1",140,0,138,"1729",11,112,0,1
"Name_3",102,0,100,"1729",22,226,1,1
"Name_7",123,0,164,"1729",54,542,1,1
"Name_5",132,0,79,"1729",81,820,1,1
"Name_2",132,0,103,"16383",23,232,0,1
"Name_8",123,0,65,"16383",28,289,0,1
…

Now I need to do two things. First, I need to create a table of pairwise couples within each ID group. For a group in one ID containing the names (1,2,3,4,5), I need to create the pairs (12,23,34,45), that is, each name paired with the next one. So for the above example, the pairs for ID 1729 would be (Name_1+Name_3, Name_3+Name_7, Name_7+Name_5).

The output for the above example would look like this:

"Start_Name_X","Start_Name_Y","Length_Name_X","Length_Name_Y","Name_Name_X","Name_Name_Y","ID","New column"
11, 22, 140, 102, "Name_1", "Name_3", 1729,,
22, 54, 102, 123, "Name_3", "Name_7", 1729,,
54, 81, 123, 132, "Name_7", "Name_5", 1729,,
23, 28, 132, 123, "Name_2", "Name_8", 16383,,
…

So I need to create pairs in order of ascending "Start", but only within each "ID". I think this should be done with a for loop, but I am a newbie: pulling the data into a new table with a for loop confuses me already, and I have no idea how to restrict it to each unique "ID". I have experimented with splitting the data into groups by ID using split, but that does not get me any further with creating the new data table.

I have created the new data table with the following code:

# One-row data frame holding the column names, used as the header line
column_names <- data.frame(
  Start_Name_X = "Start_Name_X", Start_Name_Y = "Start_Name_Y",
  Length_Name_X = "Length_Name_X", Length_Name_Y = "Length_Name_Y",
  Name_X = "Name_X", Name_Y = "Name_Y",
  ID = "ID", New_Column = "New_Column"
)

# Write only the header line to the output file
write.table(column_names, file = "datatabel.csv", row.names = FALSE,
            append = FALSE, col.names = FALSE, sep = ",", quote = TRUE)

This is the table I would like to write to. Is a for loop the right way to handle this, and if so, can you give me a few clues on how to start?

Recommended answer

It can be done with only one loop:

# Read the sample data from the question
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
text = '"Name","Length","Startpos","Endpos","ID","Start","End","Rev","Match"
"Name_1",140,0,138,"1729",11,112,0,1
"Name_2",132,0,103,"16383",23,232,0,1
"Name_3",102,0,100,"1729",22,226,1,1
"Name_4",112,0,130,"16383",99,992,1,1
"Name_5",132,0,79,"1729",81,820,1,1
"Name_6",112,0,163,"16383",81,820,0,1
"Name_7",123,0,164,"1729",54,542,1,1
"Name_8",123,0,65,"16383",28,289,0,1')

df <- df[order(df$ID, df$Start), ]

# Columns to pair up, and the column order of the final result
inds <- c("Name", "Start", "Length")
indsSorted <- c("Start_Name_X","Start_Name_Y","Length_Name_X","Length_Name_Y","Name_Name_X","Name_Name_Y","ID","New_Column")

# Empty data frame that collects the pairs of every ID group
out <- data.frame(matrix(nrow = 0, ncol = 8))
colnames(out) <- indsSorted
for (i in unique(df$ID)){
    dfID <- subset(df, ID == i)
    dfHead <- head(dfID, n = nrow(dfID) - 1)[, inds]
    colnames(dfHead) <- paste0(colnames(dfHead), "_Name_X")

    dfTail <- tail(dfID, n = nrow(dfID) - 1)[, inds]
    colnames(dfTail) <- paste0(colnames(dfTail), "_Name_Y")

    out <- rbind(out, cbind(dfHead, dfTail, ID = i, New_Column = '', stringsAsFactors = FALSE)[, indsSorted])
}
out
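
The core of the loop is pairing each row with the one that follows it. As a minimal sketch on a plain vector (x and pairs are made-up names for illustration, not part of the code above), dropping the last element with head and the first with tail lines up consecutive elements:

```r
# Hypothetical names within one ID group, already sorted by Start
x <- c("Name_1", "Name_3", "Name_7", "Name_5")

# head(x, -1) drops the last element and tail(x, -1) drops the first,
# so position k of one vector sits next to position k + 1 of the other
pairs <- data.frame(X = head(x, -1), Y = tail(x, -1), stringsAsFactors = FALSE)
pairs
```

A group of n elements gives n - 1 pairs this way.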

This will probably be horribly slow if the input is large, since out is rebuilt by rbind on every iteration. It can be optimized, but I didn't bother since using data.table is probably much quicker.
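
One common way to avoid growing a data frame inside the loop is to build one piece per ID and bind the pieces once at the end. The following is only a sketch, not a benchmarked replacement: pair_group is a hypothetical helper, the small df is made-up sample data in the question's layout, and only the Name and Start columns are paired to keep it short.

```r
# Made-up sample data in the same layout as the question (fewer columns)
df <- data.frame(
  Name  = c("Name_1", "Name_2", "Name_3", "Name_8", "Name_7"),
  ID    = c(1729, 16383, 1729, 16383, 1729),
  Start = c(11, 23, 22, 28, 54),
  stringsAsFactors = FALSE
)
df <- df[order(df$ID, df$Start), ]

# Hypothetical helper: pair each row of one ID group with the next row
pair_group <- function(g) {
  n <- nrow(g)
  if (n < 2) return(NULL)  # a single-row group has no pairs
  data.frame(
    Start_Name_X = g$Start[-n],
    Start_Name_Y = g$Start[-1],
    Name_Name_X  = g$Name[-n],
    Name_Name_Y  = g$Name[-1],
    ID           = g$ID[-n],
    stringsAsFactors = FALSE
  )
}

# split() gives one data frame per ID; rbind is then called only once
pieces <- lapply(split(df, df$ID), pair_group)
out_list <- do.call(rbind, pieces)
rownames(out_list) <- NULL
out_list
```

This also uses split, which the question mentions having tried, and handles the per-ID constraint without an explicit for loop.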

library(data.table)  # assumes the data.table package is installed

dt <- data.table(df, key = "ID,Start")
fn <- function(dtIn, id){
    dtHead <- head(dtIn, n = nrow(dtIn) - 1)
    setnames(dtHead, paste0(colnames(dtHead), "_Name_X"))

    dtTail <- tail(dtIn, n = nrow(dtIn) - 1)
    setnames(dtTail,  paste0(colnames(dtTail), "_Name_Y"))

    cbind(dtHead, dtTail, ID = id, New_Column = '')
}

out2 <- dt[, fn(.SD, ID), by = ID, .SDcols = c("Name", "Start", "Length")]
out2 <- as.data.frame(out2[, indsSorted, with = FALSE])

The row names differ, but otherwise the results are identical. The function used can probably be optimized as well.

rownames(out) <- NULL
rownames(out2) <- NULL

identical(out, out2)
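
For comparison, the same pairing can also be sketched without any loop or grouping function, by checking after sorting whether each row has the same ID as the row below it. This is an alternative that is not part of the answer above; the sample data and the names k and out_vec are made up for illustration.

```r
# Made-up sample data in the same layout as the question (fewer columns)
df <- data.frame(
  Name   = c("Name_1", "Name_2", "Name_3", "Name_8", "Name_7"),
  Length = c(140, 132, 102, 123, 123),
  ID     = c(1729, 16383, 1729, 16383, 1729),
  Start  = c(11, 23, 22, 28, 54),
  stringsAsFactors = FALSE
)
df <- df[order(df$ID, df$Start), ]

n <- nrow(df)
# Row k is paired with row k + 1 only when both rows share the same ID
k <- which(df$ID[-n] == df$ID[-1])

out_vec <- data.frame(
  Start_Name_X  = df$Start[k],
  Start_Name_Y  = df$Start[k + 1],
  Length_Name_X = df$Length[k],
  Length_Name_Y = df$Length[k + 1],
  Name_Name_X   = df$Name[k],
  Name_Name_Y   = df$Name[k + 1],
  ID            = df$ID[k],
  New_Column    = rep("", length(k)),
  stringsAsFactors = FALSE
)
out_vec
```

Because everything is done with vector indexing, the work does not depend on the number of ID groups, which may matter when there are many small groups.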
