Sort & uniq in Linux shell

Question

What is the difference between the following two commands?

sort -u FILE

sort FILE | uniq

Recommended answer

Using sort -u does less I/O than sort | uniq, but the end result is the same. In particular, if the file is big enough that sort has to create intermediate files, there's a decent chance that sort -u will use slightly fewer or slightly smaller intermediate files, since it can eliminate duplicates as it sorts each set. If the data is highly duplicative, this could be beneficial; if there are in fact few duplicates, it won't make much difference (definitely a second-order performance effect, compared to the first-order effect of the pipe).
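
As a minimal sketch of the "same end result" claim, the snippet below runs both forms on a small sample file and compares their output. The file name fruits.txt and its contents are made up for illustration, and the comparison assumes no key or other comparison options are passed to sort.

```shell
# Create a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' pear apple pear banana apple pear > "$tmpdir/fruits.txt"

# Both forms should print each distinct line once, in sorted order.
out_u=$(sort -u "$tmpdir/fruits.txt")
out_pipe=$(sort "$tmpdir/fruits.txt" | uniq)

printf '%s\n' "$out_u"
if [ "$out_u" = "$out_pipe" ]; then
    echo "identical output"
fi

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

This only checks the output; it says nothing about the I/O difference, which would need a much larger input to observe.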

Note that there are times when the piping is appropriate. For example:

sort FILE | uniq -c | sort -n

This orders the distinct lines of the file by the number of times each occurs (uniq -c prefixes each line with its count), with the most repeated lines appearing last. (It wouldn't surprise me to find that this combination, which is idiomatic for Unix or POSIX, can be squished into one complex 'sort' command with GNU sort.)
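
To make the counting pipeline concrete, here is a small sketch on made-up data (the file name fruits.txt is hypothetical). It captures the pipeline output and then pulls the most repeated line from the last row; the tail and awk step is an addition for illustration and assumes the lines contain no whitespace.

```shell
# Sample data: pear occurs 3 times, apple 2 times, banana once.
tmpdir=$(mktemp -d)
printf '%s\n' pear apple pear banana apple pear > "$tmpdir/fruits.txt"

# Count each distinct line, then order by count (ascending).
counts=$(sort "$tmpdir/fruits.txt" | uniq -c | sort -n)
printf '%s\n' "$counts"

# The last row holds the most repeated line; field 2 is the line itself.
most_common=$(printf '%s\n' "$counts" | tail -n 1 | awk '{print $2}')
echo "most repeated: $most_common"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

The first sort is needed because uniq only collapses adjacent duplicate lines.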

There are times when not using the pipe is important. For example:

sort -u -o FILE FILE

This sorts the file 'in situ'; that is, the output file is specified by -o FILE, and this operation is guaranteed safe (the file is read before being overwritten for output).
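
The sketch below contrasts the -o form with a shell redirection to the same file, which is the usual way this goes wrong. The file names are hypothetical, and the redirection case is an added illustration rather than something from the answer: the shell truncates the target before sort ever runs, so sort sees an empty input.

```shell
# Two copies of the same sample data (hypothetical file names).
tmpdir=$(mktemp -d)
printf '%s\n' pear apple pear banana apple pear > "$tmpdir/fruits.txt"
cp "$tmpdir/fruits.txt" "$tmpdir/clobbered.txt"

# Safe: sort reads all of its input before it writes the -o file.
sort -u -o "$tmpdir/fruits.txt" "$tmpdir/fruits.txt"
in_place=$(cat "$tmpdir/fruits.txt")
printf '%s\n' "$in_place"

# Unsafe: the shell truncates the file before sort starts reading it.
sort -u "$tmpdir/clobbered.txt" > "$tmpdir/clobbered.txt"
if [ -s "$tmpdir/clobbered.txt" ]; then
    clobbered=no
else
    clobbered=yes
fi
echo "redirected copy emptied: $clobbered"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

A pipeline ending in a redirection back to the input file has the same hazard, which is why the single sort -u -o command matters here.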
