How do you filter out all unique lines in a file?
Question
Is there a way to filter out all unique lines in a file with command-line tools, without sorting the lines? Essentially, I'd like to do this:
sort -u myFile
without the performance hit of sorting.
Recommended answer
Remove duplicate lines:
awk '!a[$0]++' file
This is a famous awk one-liner, and there are many explanations of it on the internet. Here is one:
This one-liner is very idiomatic. It registers the lines seen in the associative array "a" (arrays are always associative in Awk) and at the same time tests whether it has seen the line before. The post-increment returns the old count, so on the first occurrence a[line] is 0 and !a[line] is true; if the line has been seen before, then a[line] > 0 and !a[line] == 0. Any expression that evaluates to false is a no-op, and any expression that evaluates to true is equivalent to "{ print }".
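To make the explanation above concrete, here is a minimal sketch of the same logic written out longhand. The function name dedupe and the array name seen are hypothetical, chosen only for illustration; the behavior is intended to match awk '!a[$0]++'.

```shell
# dedupe: hypothetical helper name, not part of the original answer.
# Prints each distinct line once, in first-seen order, without sorting.
dedupe() {
  awk '
    {
      if (seen[$0] == 0) {   # a key never stored before reads as 0
        print                # first occurrence: print the line
      }
      seen[$0]++             # record one more occurrence of this line
    }
  ' "$@"
}

# Sample input with two repeated lines; prints b, a, c (one per line).
printf 'b\na\nb\nc\na\n' | dedupe
```

Called with a file name, as in dedupe myFile, it reads that file instead of standard input. Because awk keeps every distinct line in the array, memory use grows with the number of unique lines rather than with the size of the file.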