Seeking reference to understand one pattern "!_[$0]++"

Views: 120 | Published: 2016/7/28 16:40:38 | Tags: regex, sorting, awk


Problem description

I am an AWK newbie, using GNU utilities ported to Windows (UNXUtils) and gawk instead of awk. A solution on this forum worked like absolute magic, and I'm trying to find a source I can read to better understand the pattern expression offered in that solution.

In "Select unique or distinct values from a list in UNIX shell script" (http://stackoverflow.com/questions/618378/select-unique-or-distinct-values-from-a-list-in-unix-shell-script), an answer by Dimitre Radoulov offers the following code

zsh-4.3.9[t]%   awk '!_[$0]++' file

as a solution for selecting the elements of a list that contains repeated and jumbled entries, listing each element only once.

I had previously used sort | uniq to do this, which worked fine for small test files. For my actual problem (extracting the list of company symbols from archival order book research data from India's National Stock Exchange for 16 days in April 2006, with 129+ million records in multiple files), the sorting burden became too much, and uniq on its own only eliminates adjacent duplicates.
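To make the point about adjacent duplicates concrete, here is a small sketch on made-up sample data (the file name sample.txt and its contents are assumptions for illustration, not taken from the exchange data). It shows that uniq alone leaves non-adjacent repeats in place, while sort | uniq removes them at the cost of sorting everything first.

```shell
# Made-up sample: a jumbled list in which no two equal lines are adjacent
printf 'B\nA\nB\nC\nA\n' > sample.txt

# uniq only collapses runs of identical adjacent lines,
# so all five lines come back unchanged here
uniq sample.txt

# sorting first makes equal lines adjacent, so uniq can drop the repeats;
# the result is the three distinct values in sorted order
sort sample.txt | uniq
```

On a small file the sort is cheap; with many millions of records it is the sort step that dominates the running time.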

Copying the above line for my Win-GNU gawk, I used

C:\Users\PAPERS\>  cat ..\Full*_Symbols.txt | gawk "!_[$0]++"  | wc -l

946

suggesting that the 129+ million records pertained to 946 different firms, which is a VERY reasonable answer. And it took under 5 minutes on my modest Windows machine, after hours of trying to SORT wore me out.

I looked at all the awk texts I have and searched a bit online. For part of the pattern, the explanation of why it works is clear (! serves as NOT, $0 is the whole current record), but for the underscore _ I am not able to find any explanation, and I have seen ++ in examples only as "update the counter by 1."
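One way to read the pattern, offered as a sketch rather than an authoritative reference: in awk, _ is a legal identifier, so _[$0] is an ordinary associative array element keyed by the whole record. The post-increment ++ yields the element's old value (zero the first time a line is seen) and then adds 1, and ! negates that old value, so the pattern is true only on a line's first occurrence. With no action given, awk's default action prints the record. The array name seen in the longer form below is a hypothetical name chosen for readability; the sample input is made up.

```shell
# The compact form from the answer, on made-up input
printf 'B\nA\nB\nC\nA\n' | awk '!_[$0]++'

# A longer spelling of the same logic, with the array renamed to "seen":
# print the line only if its count so far is zero, then bump the count
printf 'B\nA\nB\nC\nA\n' | awk '{ if (seen[$0] == 0) print $0; seen[$0] = seen[$0] + 1 }'

# Both commands print, in first-seen order:
# B
# A
# C
```

Because only one array entry per distinct line is kept, memory grows with the number of distinct values, not with the number of records, and no sorting is needed. The single quotes here are for a Unix shell; the Windows command line used above takes double quotes instead.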

I will be grateful for any appropriate text or web reference to understand this example fully, as I think it will help me in other related cases as well. Thanks. Best,
