Sorting by unique values of multiple fields in UNIX shell script
Problem description
I am new to Unix and would like to do the following, but am unsure how.
Take a text file with lines like:
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
and output:
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
I would like the script to be able to find all the lines for each TR value that have a unique Line value.
Thanks
Recommended answer
Since you are apparently O.K. with randomly choosing among the values for dir, day, TI, and stn, you can write:
sort -u -t ';' -k 1,1 -k 6,6 -s < input_file > output_file
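To try the command on the data from the question, one option is to write the six sample lines into a scratch file and run the same sort invocation on it. This is a sketch assuming GNU sort (the tool the explanation below refers to) and a shell with mktemp; the scratch-file handling via mktemp and trap is an illustrative choice, not part of the original answer.

```shell
#!/bin/sh
# Sketch: run the answer's sort command on the sample lines from the question.
# Assumes GNU sort; mktemp is used only to create scratch files.
input_file=$(mktemp)
output_file=$(mktemp)
# Remove the scratch files when the script exits.
trap 'rm -f "$input_file" "$output_file"' EXIT

cat > "$input_file" <<'EOF'
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
EOF

# Split fields on ';', compare field 1 (TR=...) and then field 6 (Line=...),
# and keep one line per distinct (TR, Line) pair.
sort -u -t ';' -k 1,1 -k 6,6 -s < "$input_file" > "$output_file"

cat "$output_file"
```

On this input the result is four lines, ordered by TR and then by Line: P234/lowell, P234/worcester, P567/lowell, P567/worcester. The set of lines matches the desired output in the question, but the order differs, because sort groups the lines by their keys.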
Explanation:
- The sort utility, "sort lines of text files", lets you sort/compare/merge lines from files. (See the GNU Coreutils documentation.)
- The -u or --unique option, "output only the first of an equal run", tells sort that if two input lines are equal, then you only want one of them.
- The -k POS1[,POS2] or --key=POS1[,POS2] option, "start a key at POS1 (origin 1), end it at POS2 (default end of line)", tells sort where the "keys" are that we want to sort by. In our case, -k 1,1 means that one key consists of the first field (from field 1 through field 1), and -k 6,6 means that one key consists of the sixth field (from field 6 through field 6).
- The -t SEP or --field-separator=SEP option tells sort that we want to use SEP (in our case, ';') to separate and count fields. (Otherwise, it would think that fields are separated by whitespace, and in our case, it would treat the entire line as a single field.)
- The -s or --stable option, "stabilize sort by disabling last-resort comparison", tells sort that we only want to compare lines in the way that we've specified; if two lines have the same above-defined "keys", then they're considered equivalent, even if they differ in other respects. Since we're using -u, that means that one of them will be discarded. (If we weren't using -u, it would just mean that sort wouldn't reorder them with respect to each other.)
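The sort command above orders its output by TR and then by Line, so it does not reproduce the input order of the sample output in the question. If first-appearance order matters, a commonly used alternative is an awk filter that prints a line only the first time its TR/Line pair is seen. This is a different technique from the answer's, offered as a sketch assuming a POSIX awk and the same ';'-separated layout (TR in field 1, Line in field 6); the mktemp scratch file and the result variable are illustrative.

```shell
#!/bin/sh
# Sketch of an alternative (not the answer's method): keep the first line seen
# for each (TR, Line) pair while preserving the original input order.
# Assumes TR is field 1 and Line is field 6 when splitting on ';'.
input_file=$(mktemp)
trap 'rm -f "$input_file"' EXIT

cat > "$input_file" <<'EOF'
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
EOF

# seen[key]++ evaluates to 0 the first time a key appears, so !seen[key]++
# is true only on that first occurrence, and awk's default action prints the line.
# The key joins field 1 and field 6 with the field separator to avoid collisions.
result=$(awk -F';' '!seen[$1 FS $6]++' "$input_file")
printf '%s\n' "$result"
```

On the sample data this prints the same four lines, in the same order, as the desired output in the question. The trade-off is that awk keeps every distinct TR/Line key in memory, whereas sort can fall back to temporary files for large inputs.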