Use grep to remove words from dictionary whose roots are already present


Problem description

I am trying to write a random passphrase generator. I have a dictionary with a bunch of words and I would like to remove words whose root is already in the dictionary, so that a dictionary that looks like:

ablaze
able
abler
ablest
abloom
ably

would end up with only

ablaze
able
abloom
ably

because abler and ablest contain able, which was previously used.

I would prefer to do this with grep so that I can learn more about how it works. I am capable of writing a program in C or Python that will do this.

Solution

If the list is sorted so that shorter strings always precede longer strings, you might be able to get fairly good performance out of a simple Awk script.

awk '$1~r && p in k { next } { k[$1]++; print; r= "^" $1; p=$1 }' words

If the current word matches the prefix regex r (defined in a moment) and the prefix p (ditto) is in the list of seen keys, skip. Otherwise, add the current word to the prefix keys, print the current line, create a regex which matches the current word at beginning of line (this is now the prefix regex r) and also remember the prefix string in p.
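
As a quick check, the script above can be run against the sample list from the question. This is only a sketch: the dedupe_prefixes wrapper name and the LC_ALL=C sort step are assumptions added here, not part of the original answer. The sort is there because r and p only remember the most recently printed word, so a word has to be directly followed by the longer words that start with it.

```shell
# Hypothetical helper (name is an assumption): reads words on stdin,
# sorts them bytewise, then applies the first Awk script unchanged.
dedupe_prefixes() {
    # LC_ALL=C makes sort compare bytes, so "able" lands right before
    # "abler" and "ablest".
    LC_ALL=C sort |
        awk '$1~r && p in k { next } { k[$1]++; print; r = "^" $1; p = $1 }'
}

# Sample dictionary from the question.
printf '%s\n' ablaze able abler ablest abloom ably | dedupe_prefixes
```

With the six sample words this prints ablaze, able, abloom and ably, one per line.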

If all the similar strings are always adjacent (as they would be if you sort the file lexically), you could do away with k and p entirely too, I guess.

awk 'NR>1 && $1~r { next } { print; r="^" $1 }' words
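
One thing to watch for with both scripts is that each word is used as a regular expression, so a dictionary entry containing a character such as . would be treated as a pattern. A possible variation, not from the original answer, compares literal prefixes with Awk's index() function instead. The dedupe_prefixes_literal name is made up for this sketch, and the input is assumed to be lexically sorted already.

```shell
# Hypothetical helper (name is an assumption): same idea as the
# adjacent-strings script, but the remembered word p is compared as a
# literal prefix rather than as a regex.
dedupe_prefixes_literal() {
    # index($1, p) == 1 is true when the current word starts with p.
    # The p != "" guard lets the first word through before p is set.
    awk 'p != "" && index($1, p) == 1 { next } { print; p = $1 }'
}

# "a.b" is a literal prefix of "a.bc" but not of "axb".
printf '%s\n' a.b a.bc axb | dedupe_prefixes_literal
```

This prints a.b and axb. The regex form would also drop axb, since ^a.b matches it.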
