Bash: recursively write a line to a file from column maximum


Question

Linking back to my previous question, I found the problem not to be entirely solved. Here's the problem:

I have directories named RUN1, RUN2, and RUN3. Each directory has some files. Directory RUN1 has files mod1_1.csv, mod1_2.csv, mod1_3.csv. Directory RUN2 has files mod2_1.csv, mod2_2.csv, mod3_3.csv, etc.

The contents of mod1_1.csv look like this:

5.71 6.66 5.52 6.90
5.78 6.69 5.55 6.98
5.77 6.63 5.73 6.91

And mod1_2.csv looks like this:

5.73 6.43 5.76 6.57
5.79 6.20 5.10 7.01
5.71 6.21 5.34 6.81

In RUN2, mod2_1.csv looks like this:

5.72 6.29 5.39 5.59
5.71 6.10 5.10 7.34
5.70 6.23 5.23 6.45

And mod2_2.csv looks like this:

5.72 6.29 5.39 5.69
5.71 6.10 5.10 7.32
5.70 6.23 5.23 6.21

My goal is to obtain the line with the smallest value of column 4 for each RUN* directory, and write that and the model which gave it to a new .csv file. Right now, I have this code:

#!/bin/bash
resultfile="best_results_mlp_2.txt"
for d in $(find . -type d -name 'RUN*' | sort);
do
  find $d -type f -name 'mod*' -exec sort -k4 {} -g \; | head -1 >> "$resultfile"
done

But it doesn't always return the smallest value of column 4 (I went through the files and checked), and it doesn't include the file name that contains the smallest number. To clarify, I would like a .csv file with these contents:

5.73 6.43 5.76 6.57 mod1_2.csv
5.72 6.29 5.39 5.59 mod2_1.csv

Recommended answer

If you would like to get the smallest value from all files, you will have to sort all their content at once. The command currently sorts file by file, so you get the smallest value in the first sorted file.

Compare these two commands to see the difference:

find "$d" -type f -name 'mod*' -exec sort -k4 -g {} + 

find "$d" -type f -name 'mod*' -exec sort -k4 -g {} \;
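To make the difference concrete, here is a minimal sketch that rebuilds the RUN1 sample files from the question in a temporary directory and runs both variants. The temporary directory, the variable names, and the LC_ALL=C setting (so that "." is read as the decimal point) are assumptions added for the demonstration, not part of the original commands.

```bash
#!/bin/bash
# Assumption: force the C locale so "." is parsed as the decimal separator.
export LC_ALL=C

# Recreate the RUN1 sample data from the question in a throwaway directory.
tmp=$(mktemp -d)
mkdir "$tmp/RUN1"
printf '%s\n' '5.71 6.66 5.52 6.90' '5.78 6.69 5.55 6.98' '5.77 6.63 5.73 6.91' \
  > "$tmp/RUN1/mod1_1.csv"
printf '%s\n' '5.73 6.43 5.76 6.57' '5.79 6.20 5.10 7.01' '5.71 6.21 5.34 6.81' \
  > "$tmp/RUN1/mod1_2.csv"

# "{} +" hands every matching file to ONE sort process, so the first
# output line is the minimum of column 4 across all files.
all_at_once=$(find "$tmp/RUN1" -type f -name 'mod*' -exec sort -k4 -g {} + | head -1)

# "{} \;" starts a separate sort per file. Each file is sorted on its own
# and the results are concatenated, so the first line is only the minimum
# of whichever file find happened to visit first.
per_file=$(find "$tmp/RUN1" -type f -name 'mod*' -exec sort -k4 -g {} \;)

echo "single sort, first line: $all_at_once"
echo "per-file sort, full output:"
echo "$per_file"

rm -rf "$tmp"
```

With the single sort, the first line is 5.73 6.43 5.76 6.57. With the per-file variant, the first line depends on the order in which find returns the files, which is why the original script was only sometimes right.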

Also, it is recommended to use -n instead of -g unless you really need -g. See the --general-numeric-sort section of info coreutils 'sort invocation' for the reasons.

I just checked the link to your previous question, and now I see that you do need --general-numeric-sort.
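As a rough illustration of when -g matters, the sketch below sorts three made-up values, one of them in scientific notation. The exact reason -g was needed is in the linked previous question, which is not reproduced here, so the sample values are an assumption chosen only to show the behavioural difference.

```bash
#!/bin/bash
# Assumption: C locale, so number parsing does not depend on user settings.
export LC_ALL=C

# Hypothetical sample values; 1e3 means 1000 in scientific notation.
data=$(printf '%s\n' '1e3' '5' '20')

# -n stops reading at the "e", so 1e3 is treated as 1 and sorts first.
n_sorted=$(printf '%s\n' "$data" | sort -n | paste -sd' ' -)

# -g understands floating-point syntax, so 1e3 is 1000 and sorts last.
g_sorted=$(printf '%s\n' "$data" | sort -g | paste -sd' ' -)

echo "sort -n: $n_sorted"   # 1e3 5 20
echo "sort -g: $g_sorted"   # 5 20 1e3
```

For plain decimals such as the sample data in the question, both options give the same order, and -n is the cheaper choice.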

That said, here's a way to get the corresponding filename into the lines, so that you have it in the output:

find "$d" -type f -name 'mod*' -exec awk '{print $0, FILENAME}' {} \; | sort -k4 -g | head -1 >> "$resultfile"

Essentially, awk is invoked for each file separately. Awk prints each line of the file and appends the corresponding file name to it. All of those lines are then passed to a single sort.

Note: The above prints the file name together with the path under which find found it. If you only want the file's basename, you can use the following awk command instead (the rest stays the same as above):

awk 'FNR==1{ cnt=split(FILENAME, arr, "/"); basename=arr[cnt] } { print $0, basename}'
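Putting the pieces together, the whole task could be sketched as follows. This is one possible arrangement rather than the definitive script: the temporary directory, the sample files copied from the question, the `while read` loop in place of the original `for ... $(find ...)`, and LC_ALL=C are assumptions made so that the example runs on its own.

```bash
#!/bin/bash
# Assumption: C locale so "." is the decimal separator for sort -g.
export LC_ALL=C

# Recreate the sample tree from the question in a throwaway directory.
workdir=$(mktemp -d)
mkdir "$workdir/RUN1" "$workdir/RUN2"
printf '%s\n' '5.71 6.66 5.52 6.90' '5.78 6.69 5.55 6.98' '5.77 6.63 5.73 6.91' \
  > "$workdir/RUN1/mod1_1.csv"
printf '%s\n' '5.73 6.43 5.76 6.57' '5.79 6.20 5.10 7.01' '5.71 6.21 5.34 6.81' \
  > "$workdir/RUN1/mod1_2.csv"
printf '%s\n' '5.72 6.29 5.39 5.59' '5.71 6.10 5.10 7.34' '5.70 6.23 5.23 6.45' \
  > "$workdir/RUN2/mod2_1.csv"
printf '%s\n' '5.72 6.29 5.39 5.69' '5.71 6.10 5.10 7.32' '5.70 6.23 5.23 6.21' \
  > "$workdir/RUN2/mod2_2.csv"

resultfile="$workdir/best_results_mlp_2.txt"

# Visit the RUN* directories in sorted order, one path per line.
find "$workdir" -type d -name 'RUN*' | sort | while IFS= read -r d; do
  # awk tags every line with the basename of its file; one sort then
  # orders all tagged lines of the directory by column 4, and head
  # keeps the line with the smallest value.
  find "$d" -type f -name 'mod*' -exec awk '
    FNR == 1 { cnt = split(FILENAME, arr, "/"); basename = arr[cnt] }
    { print $0, basename }' {} \; | sort -k4 -g | head -1 >> "$resultfile"
done

results=$(cat "$resultfile")
echo "$results"

rm -rf "$workdir"
```

On the sample data this prints the two lines asked for in the question: 5.73 6.43 5.76 6.57 mod1_2.csv and 5.72 6.29 5.39 5.59 mod2_1.csv. Reading the directory list line by line also keeps paths containing spaces intact, which the unquoted `$(find ...)` loop would split.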
