I have a bunch of uploaded .root files on my laptop, but I need just specific ones
Question
I have a directory with 10000 .root files (each named like hists11524_blinded.root or hists9899_blinded.root) and need to run some macros on them for my data analysis. However, I only need 4000 of these files. The run numbers I need (those 4000 numbers) are listed in the file thebest.txt, which is in the same directory as the histograms.
Before running the macros, I want to delete the files that are not needed, using the information in the .txt file.
The file thebest.txt looks like this:
09769
09772
09773
09776
09777
09781
09782
09785
09786
09789
09790
09793
...
My guess is to work with the command:
-comm -2 -3 <(ls) <(sort thebest) | tail +2 | xargs -p rm
I get 2 errors:
tail: invalid option -- 'p'
sort: cannot read: No such file or directory
The file thebest.txt contains only 5-digit numbers like 09999 or 11256, while the directory contains files with names like hists9999_blinded.root or hists11256_blinded.root.
The number of digits differs between the two lists; that is the main issue.
Recommended answer
One option is to remove the leading 0s from the numbers so that they match the file names. To avoid matching substrings, you can prepend and append the corresponding file name parts. (In your case the number sits in the middle of the file name.)
As it is not clear whether the leading spaces in the sample file thebest.txt are intentional or only a formatting issue, leading spaces are removed as well.
As deleting the wrong files may lead to data loss, you may also consider processing only the matching files instead of deleting the non-matching ones.
# Strip leading spaces and leading zeros, then wrap each number in the file name parts
sed 's/^ *0*\([1-9][0-9]*\)/hists\1_blinded.root/' thebest.txt > thebestfiles.txt
# Get the matching files and process them (process_matching stands for your own command)
find . -name 'hists*_blinded.root' | grep -F -f thebestfiles.txt | xargs process_matching
# Or get the non-matching files and remove them
# (grep -F replaces fgrep, which is obsolescent in current GNU grep)
find . -name 'hists*_blinded.root' | grep -F -v -f thebestfiles.txt | xargs rm
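Since the removal cannot be undone, it may help to preview what would be deleted first. The following is only a sketch under a few assumptions: the helper name list_unneeded is made up for illustration, grep -F is used as the current spelling of fgrep, and the demo runs in a scratch directory created with mktemp so that no real histograms are touched. It also refuses to run with an empty pattern list, because grep -v with no patterns selects every file.

```shell
# Hypothetical helper (not part of the original answer): print the files
# that the rm pipeline would delete, without deleting anything.
# $1 = directory with the histograms, $2 = pattern list (thebestfiles.txt)
list_unneeded() {
    dir=$1
    list=$2
    # An empty pattern list matches nothing, so `grep -v` would select every file.
    if [ ! -s "$list" ]; then
        echo "pattern list '$list' is missing or empty" >&2
        return 1
    fi
    find "$dir" -name 'hists*_blinded.root' | grep -F -v -f "$list"
}

# Demo in a scratch directory with three fake histogram files.
demo=$(mktemp -d)
touch "$demo/hists9769_blinded.root" \
      "$demo/hists9770_blinded.root" \
      "$demo/hists11256_blinded.root"
printf '09769\n11256\n' > "$demo/thebest.txt"

# Same sed expression as in the answer: strip spaces and zeros, add name parts.
sed 's/^ *0*\([1-9][0-9]*\)/hists\1_blinded.root/' "$demo/thebest.txt" \
    > "$demo/thebestfiles.txt"

# Only hists9770_blinded.root is absent from the list, so only it is printed.
list_unneeded "$demo" "$demo/thebestfiles.txt"
```

Once the printed list looks right, the same output can be piped into xargs rm, as in the original pipeline.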
The find command searches recursively in the current directory. If you want to exclude subdirectories, you can use -maxdepth 1. To avoid processing directory names, you might also add -type f.
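To illustrate the effect of these two options, here is a small sketch that selects only the needed files sitting directly in the directory. The helper name list_needed and the scratch directory layout are assumptions made for the demo; -maxdepth is supported by GNU and BSD find but is not part of POSIX.

```shell
# Hypothetical helper: print the needed histogram files located directly
# in directory $1 (no subdirectories), using the pattern list $2.
list_needed() {
    find "$1" -maxdepth 1 -type f -name 'hists*_blinded.root' | grep -F -f "$2"
}

# Demo layout: one needed file, one unneeded file, a copy inside a
# subdirectory, and a directory whose name looks like a histogram file.
demo=$(mktemp -d)
mkdir "$demo/old" "$demo/hists1_blinded.root"
touch "$demo/hists9769_blinded.root" \
      "$demo/hists9770_blinded.root" \
      "$demo/old/hists9769_blinded.root"
printf 'hists9769_blinded.root\nhists1_blinded.root\n' > "$demo/thebestfiles.txt"

# -maxdepth 1 skips old/, -type f skips the directory hists1_blinded.root,
# so only the top-level hists9769_blinded.root is printed.
list_needed "$demo" "$demo/thebestfiles.txt"
```

In the real directory this could be used as list_needed . thebestfiles.txt | xargs process_matching, where process_matching is the placeholder command from the answer.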