How to loop twice over files' names in R


Question

Let's say I am running a study on two subjects (in reality, 20 of them). Each subject generates 27 files, which I need to combine into 9 additional files, so I would like to automate the process.

I have one factor with nine levels: AA, AB, AM, BA, BB, BM, MA, MB, MM.

For each treatment I get three output files. For example, for the AA treatment of subject 1 I get AA1.csv, AA1.txt and AA1log.txt.

I need to run a script (let's call it R1) on these three files; it combines them into one summary file. Then I need to run another script (let's call it R2) on all the summary files I have generated.

All the output files for all the subjects are in one folder, "data".

(Thanks to @ManuelBickel for the R example.)

# make sure you are in a safe directory!

### Generate the toy data ###

# I define the main directories I need
dir_project = "test"
dirs = list(
  dir_project = dir_project
  ,dir_data = paste0(dir_project, "/data")
  ,dir_summary = paste0(dir_project, "/summary")
  ,dir_plots= paste0(dir_project, "/plots")
)
# create dirs
lapply(dirs, dir.create)

# create some exemplary data and write it in dir
m = matrix(1:4, nrow = 2)
data = list(AA = m, AB = m, AM = m
            ,BA = m, BB = m, BM = m,
            MA = m, MB = m, MM =m)

# generate the csv files for subject 1 and 2
for (i in 1:length(data)) {
  write.csv(data[[i]], file = paste0(dirs[["dir_data"]], "/", names(data[i]), "1.csv"))  
}

for (i in 1:length(data)) {
  write.csv(data[[i]], file = paste0(dirs[["dir_data"]], "/", names(data[i]), "2.csv"))
}

# Generate the .txt files for subjects 1 and 2
for (i in 1:length(data)) {
  write.table(data[[i]], file = paste0(dirs[["dir_data"]], "/", names(data[i]), "1.txt"))
}

for (i in 1:length(data)) {
  write.table(data[[i]], file = paste0(dirs[["dir_data"]], "/", names(data[i]), "2.txt"))  
}

# Generate the log.txt files for subjects 1 and 2
for (i in 1:length(data)) {
  write.table(data[[i]], file = paste0(dirs[["dir_data"]], "/", names(data[i]), "1log.txt"))  
}

for (i in 1:length(data)) {
  write.table(data[[i]], file = paste0(dirs[["dir_data"]], "/", names(data[i]), "2log.txt"))  
}

These are the files I have in my data folder:

list.files(dirs[["dir_data"]])

# [1] "AA1.csv"    "AA1.txt"    "AA1log.txt" "AA2.csv"    "AA2.txt"    "AA2log.txt" "AB1.csv"    "AB1.txt"    "AB1log.txt"
# [10] "AB2.csv"    "AB2.txt"    "AB2log.txt" "AM1.csv"    "AM1.txt"    "AM1log.txt" "AM2.csv"    "AM2.txt"    "AM2log.txt"
# [19] "BA1.csv"    "BA1.txt"    "BA1log.txt" "BA2.csv"    "BA2.txt"    "BA2log.txt" "BB1.csv"    "BB1.txt"    "BB1log.txt"
# [28] "BB2.csv"    "BB2.txt"    "BB2log.txt" "BM1.csv"    "BM1.txt"    "BM1log.txt" "BM2.csv"    "BM2.txt"    "BM2log.txt"
# [37] "MA1.csv"    "MA1.txt"    "MA1log.txt" "MA2.csv"    "MA2.txt"    "MA2log.txt" "MB1.csv"    "MB1.txt"    "MB1log.txt"
# [46] "MB2.csv"    "MB2.txt"    "MB2log.txt" "MM1.csv"    "MM1.txt"    "MM1log.txt" "MM2.csv"    "MM2.txt"    "MM2log.txt"

Now I need my code to pick the files AA1.csv, AA1.txt and AA1log.txt and run the script R1 on them.

The script R1 generates one csv file as output, which goes in the folder "data" as "summaryAA1_csv". It also generates 32 .png files (AA1_1.png, AA1_2.png and so on), which go into a subfolder "AA1" in the folder "plots".

Then I will pick all the summary files for subject 1 out of the folder "data" and run the script R2 on them.

Basically, I first need to pick all the datasets produced by subject 1; within those, I need to pick the ones generated by the same treatment (all the AAs first, then the ABs, etc.). Once I have gone through the nine treatments, I move on to subject 2.

Let's say this is what R1 is doing:

temp = read.csv("test/data/AA1.csv", sep=",", row.names=1)
temp1 <- as.matrix(temp) 
temp2 <- read.table("test/data/AA1.txt")
temp3 <- read.table("test/data/AA1log.txt")
summaryAA1 <- temp1 + temp2 + temp3
summaryAA1

As I wrote, my R1 code also generates plots (32 for each treatment!) that go in a different folder:

dir.create("test/plots/AA1plots")
png(filename="test/plots/AA1plots/AA1_1_plot.png")
plot(summaryAA1)
dev.off()

My question is how to make my code select the files in two passes: first select the files that belong to the same treatment (e.g. AA) and the same subject number; then, once all the treatments have been run for that subject, move on to the files for the same treatments for the second subject.

I am also open to suggestions for a naming system that would make the looping more convenient.

Recommended answer

Consider organizing your inputs (the list of subject and treatment combinations) and your processes (R1 and R2) separately, then call the processes on the appropriate inputs:

subjects <- c(1, 2)
treatments <- c("AA", "AB", "AM", "BA", "BB", "BM", "MA", "MB", "MM")

r1_list <- as.vector(sapply(subjects, function(x,y) paste0(y,x), treatments))
# [1] "AA1" "AB1" "AM1" "BA1" "BB1" "BM1" "MA1" "MB1" "MM1" "AA2" "AB2" "AM2" "BA2" "BB2" "BM2" "MA2" "MB2" "MM2"

r2_list <- sapply(subjects, function(x,y) paste0(y,x), treatments, simplify = FALSE)
r2_list
# [[1]]
# [1] "AA1" "AB1" "AM1" "BA1" "BB1" "BM1" "MA1" "MB1" "MM1"

# [[2]]
# [1] "AA2" "AB2" "AM2" "BA2" "BB2" "BM2" "MA2" "MB2" "MM2"
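As a possible alternative to hard-coding the subjects and treatments, the same groupings could be derived from the file names themselves. The following is only a sketch: `parse_stems` is a hypothetical helper (not part of the original answer), and it assumes every data set has a .csv file named with a two-letter uppercase treatment code followed by the subject number.

```r
# Hypothetical helper: turn names like "AA1.csv" into stems grouped by subject.
# Assumes the pattern <two uppercase letters><subject number>.csv.
parse_stems <- function(file_names) {
  csv_files <- grep("^[[:upper:]]{2}[0-9]+\\.csv$", file_names, value = TRUE)
  stems <- sub("\\.csv$", "", csv_files)                      # "AA1.csv" -> "AA1"
  subject <- as.integer(sub("^[[:upper:]]{2}", "", stems))    # "AA1" -> 1
  split(stems, subject)                                       # one element per subject
}

# Demonstration on a literal vector; in practice one might pass
# list.files("test/data") instead.
files <- c("AA1.csv", "AA1.txt", "AA1log.txt", "AB1.csv",
           "AA2.csv", "AB2.csv", "summaryAA1.csv")
by_subject <- parse_stems(files)                       # plays the role of r2_list
all_stems  <- unlist(by_subject, use.names = FALSE)    # plays the role of r1_list
```

Summary files such as summaryAA1.csv are skipped because they do not start with two uppercase letters, so they are not fed back into R1 by mistake.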

R1 script

setwd("test")

my_func1 <- function(f){
    temp = read.csv(paste0("data/", f, ".csv"), row.names=1)
    temp1 <- as.matrix(temp) 
    temp2 <- read.table(paste0("data/", f, ".txt"))
    temp3 <- read.table(paste0("data/", f, "log.txt"))

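    # SUMMARIES
    summary_all <- temp1 + temp2 + temp3
    # write the summary where my_func2 expects to find it
    write.csv(summary_all, paste0("summary", f, ".csv"), row.names = FALSE)

    ...

    # IMAGES
    # create the plot folder once, before the loop
    dir.create(paste0("plots/", f, "plots"), showWarnings = FALSE)
    for (i in seq_len(32)) {
        png(filename=paste0("plots/", f, "plots/", f, "_", i, "_plot.png"))
        plot(...)
        dev.off()
    }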
}

# CREATE ALL SUMMARY AND IMAGE FILES
for (j in r1_list) my_func1(j)
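With 20 subjects and 27 files each, a missing input file would stop the loop partway through. One way to guard against that is to check for the three inputs before running R1. This is a minimal sketch, assuming the naming scheme from the question; `input_paths` and `complete_stems` are hypothetical helpers introduced only for illustration.

```r
# Hypothetical helper: paths of the three input files for one stem such as "AA1".
input_paths <- function(stem, data_dir = "data") {
  file.path(data_dir, paste0(stem, c(".csv", ".txt", "log.txt")))
}

# Hypothetical helper: keep only the stems whose three input files all exist.
complete_stems <- function(stems, data_dir = "data") {
  ok <- vapply(stems,
               function(s) all(file.exists(input_paths(s, data_dir))),
               logical(1))
  stems[ok]
}

# Small demonstration in a temporary directory.
demo_dir <- file.path(tempdir(), "demo_data")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(input_paths("AA1", demo_dir)))          # complete set
invisible(file.create(file.path(demo_dir, "AB1.csv")))        # AB1 lacks its .txt files
ready <- complete_stems(c("AA1", "AB1"), demo_dir)
```

In the real project the loop could then run over `complete_stems(r1_list)` rather than `r1_list`, so that incomplete data sets are skipped instead of raising an error.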

R2 script

my_func2 <- function(items){
    files <- paste0("summary", items, ".csv")

    # READ ALL SUMMARY FILES INTO A LIST OF DATA FRAMES
    df_list <- lapply(files, read.csv)

    # PROCESS LIST
    ...    
}

# PROCESS SUMMARY FILES
for (j in r2_list) my_func2(j)
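The two loops above run R1 for every subject first and R2 afterwards. If the order described in the question matters (finish subject 1 completely, then move to subject 2), the two steps can be nested in one loop over subjects. The sketch below uses stand-in functions `run_r1` and `run_r2`, which are assumptions for illustration and only record the call order; in practice they would be replaced by `my_func1` and `my_func2`.

```r
# Stand-ins for the real R1/R2 steps; they only record the order of calls.
call_log <- character(0)
run_r1 <- function(stem) {
  call_log <<- c(call_log, paste0("R1:", stem))
}
run_r2 <- function(stems) {
  call_log <<- c(call_log, paste0("R2:", paste(stems, collapse = "+")))
}

subjects <- c(1, 2)
treatments <- c("AA", "AB", "AM")   # shortened list, for the sketch only

for (s in subjects) {
  stems <- paste0(treatments, s)        # e.g. "AA1" "AB1" "AM1"
  for (stem in stems) run_r1(stem)      # outer pass: one summary per treatment
  run_r2(stems)                         # then combine this subject's summaries
}
call_log
```

Both orderings produce the same files; the nested form simply makes the "subject first, treatment second" structure explicit.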
