I have a bunch of uploaded .root files on my laptop, but I need just specific ones


Question

I have a directory with 10000 .root files (with names like hists11524_blinded.root or hists9899_blinded.root) and need to run some macros on them for my data analysis. However, I only need 4000 of these files. The needed runs (these 4000 numbers) are listed in the file thebest.txt, which is in the same directory as the histograms.

Before running the macros, I want to delete the files that are not needed for the processing, using the information in the .txt file.

The file thebest.txt looks like this:

   09769 
   09772 
   09773 
   09776 
   09777 
   09781 
   09782  
   09785  
   09786  
   09789  
   09790
   09793
    ...

My guess is to use this command:

-comm -2 -3 <(ls) <(sort thebest) | tail +2 | xargs -p rm

I get 2 errors:

tail: invalid option -- 'p'

sort: cannot read: No such file or directory 

The file thebest.txt contains only 5-digit numbers like 09999 or 11256, while the directory contains files with names like hists9999_blinded.root or hists11256_blinded.root.

The number of digits differs between the two lists; that is the main issue.

Recommended answer

One option is to remove the leading 0s from the numbers so that they match the file names. To avoid matching substrings, you can prepend and append the corresponding file name parts. (In your case the number is in the middle of the file name.)

As it is not clear whether the leading spaces in the sample file thebest.txt are intentional or only a formatting issue, leading spaces will be removed as well.
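To make the conversion concrete, here is a minimal sketch of how a single line from thebest.txt could be turned into the expected file name. The helper name run_to_filename is hypothetical and only for illustration; it assumes every line holds one run number with at least one non-zero digit.

```shell
#!/bin/sh
# Hypothetical helper: turn one line of thebest.txt into the expected file name.
run_to_filename() {
    # strip leading whitespace and leading zeros, then trailing whitespace
    run=$(printf '%s\n' "$1" | sed 's/^[[:space:]]*0*//; s/[[:space:]]*$//')
    printf 'hists%s_blinded.root\n' "$run"
}

run_to_filename '   09769 '   # prints hists9769_blinded.root
run_to_filename '11256'       # prints hists11256_blinded.root
```

A run number consisting only of zeros would come out as hists_blinded.root with this sketch, so it is worth checking the list for such entries first.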

As deleting the wrong files may lead to data loss, you may also consider processing only the matching files instead of deleting the non-matching ones.

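# strip surrounding whitespace and leading zeros, then prepend/append the file name parts
# (-n ... p prints only lines that hold a run number, so blank lines never become patterns)
sed -n 's/^[[:space:]]*0*\([1-9][0-9]*\)[[:space:]]*$/hists\1_blinded.root/p' thebest.txt > thebestfiles.txt

# get matching files and process them (process_matching stands for your own command)
find . -name 'hists*_blinded.root' | grep -F -f thebestfiles.txt | xargs process_matching

# or get non-matching files and remove them
# check first that thebestfiles.txt is not empty: with an empty pattern list, grep -v selects every file
find . -name 'hists*_blinded.root' | grep -F -v -f thebestfiles.txt | xargs rm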
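If you prefer not to delete anything, one possible variant is to walk through the generated list and pick only the files that actually exist, reporting listed runs that have no file. This is only a sketch: the function name select_wanted is made up for illustration, and the demo runs in a temporary directory with a few dummy files so that no real data is touched.

```shell
#!/bin/sh
# Demo in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch hists9769_blinded.root hists9770_blinded.root hists11256_blinded.root
printf '%s\n' hists9769_blinded.root hists11256_blinded.root hists12000_blinded.root > thebestfiles.txt

# select_wanted LISTFILE (hypothetical helper):
# print the listed files that exist in the current directory,
# and report listed files that are missing on stderr
select_wanted() {
    while IFS= read -r name; do
        if [ -f "$name" ]; then
            printf '%s\n' "$name"
        else
            printf 'missing: %s\n' "$name" >&2
        fi
    done < "$1"
}

# prints hists9769_blinded.root and hists11256_blinded.root on stdout,
# and "missing: hists12000_blinded.root" on stderr
select_wanted thebestfiles.txt
```

The output could then be passed on in the same way as before, for example select_wanted thebestfiles.txt | xargs process_matching, where process_matching again stands for your own command.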

The find command searches recursively in the current directory. If you want to exclude subdirectories, you can use -maxdepth 1. To avoid processing directory names, you might also add -type f.
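The effect of the two options can be seen in a small sketch. It assumes a find implementation that supports -maxdepth (GNU and BSD find do; it is not part of POSIX), and the directory layout and the function name list_top_level_hists are invented for the demonstration.

```shell
#!/bin/sh
# Demo in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
# "backup" is a subdirectory; "hists0_blinded.root" is a directory whose name matches the pattern
mkdir -p backup hists0_blinded.root
touch hists9769_blinded.root backup/hists9770_blinded.root

# list only regular files directly in the current directory
list_top_level_hists() {
    find . -maxdepth 1 -type f -name 'hists*_blinded.root'
}

list_top_level_hists   # prints ./hists9769_blinded.root
```

Without -maxdepth 1 the file in backup would be listed too, and without -type f the directory hists0_blinded.root would appear in the output as well.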
