How do you filter out all unique lines in a file?
Question
Is there a way to reduce a file to its unique lines (that is, drop the duplicates) with command-line tools, without sorting the lines? Essentially, I'd like to do this:
sort -u myFile
but without the performance penalty of sorting.
Recommended answer
To remove duplicate lines:
awk '!a[$0]++' file
This is a famous awk one-liner. There are many explanations of it on the internet. Here is one:
This one-liner is very idiomatic. It registers the lines seen in the associative array "a" (arrays are always associative in Awk) and at the same time tests whether it has seen the line before. If it has seen the line before, then a[line] > 0 and !a[line] == 0. Any expression that evaluates to false is a no-op, and any expression that evaluates to true is equivalent to "{ print }".
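To make that explanation concrete, here is a minimal sketch that spells the one-liner out as an explicit test and print. The file name sample.txt, the function name dedupe, and the array name seen are made up for illustration and are not part of the original answer. Because `++` is a post-increment, the test sees the count from before the current line was recorded, which is why the expanded form checks first and increments afterwards.

```shell
# Hypothetical sample input containing repeated lines.
printf 'b\na\nb\nc\na\n' > sample.txt

# Expanded equivalent of: awk '!a[$0]++' FILE
dedupe() {
    awk '{
        if (seen[$0] == 0) {   # count is 0 the first time a line appears
            print              # so the line is printed
        }
        seen[$0]++             # record that the line has now been seen
    }' "$@"
}

dedupe sample.txt
```

On this input the command prints b, a, c, one per line: each line appears once, in the order it was first seen, and nothing is sorted. Note that the array keeps every distinct line in memory, so memory use grows with the number of unique lines.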