Find all words containing characters in UNIX

Problem description

Given a word W, I want to find all words containing the letters in W from /usr/dict/words. For example, "bat" should return "bat" and "tab" (but not "table").

Here is one solution which involves sorting the input word and matching:

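word=$1
# Sort the letters of the input word, e.g. "tab" becomes "abt".
sortedWord=$(printf '%s\n' "$word" | grep -o . | sort | tr -d '\n')

while IFS= read -r line
do
    # Sort each dictionary word the same way and compare the results.
    sortedLine=$(printf '%s\n' "$line" | grep -o . | sort | tr -d '\n')
    if [ "$sortedWord" = "$sortedLine" ]
    then
        printf '%s\n' "$line"
    fi
done < /usr/dict/words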

Is there a better way? I'd prefer using basic commands (instead of perl/awk etc), but all solutions are welcome!

To clarify, I want to find all permutations of the original word. Addition or deletion of characters is not allowed.
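
The clarification boils down to this: two words are permutations of each other exactly when their letters, once sorted, are identical. A minimal sketch of that check, assuming plain ASCII words; the helper name `signature` is made up for illustration and is not part of the original script.

```shell
# Hypothetical helper: print the letters of a word in sorted order.
signature() {
    # fold -w1 puts each character on its own line, sort orders the lines,
    # and tr removes the newlines so the letters are joined back together.
    printf '%s\n' "$1" | fold -w1 | sort | tr -d '\n'
    echo
}

signature bat     # abt
signature tab     # abt   (same signature as "bat", so it matches)
signature table   # abelt (different signature, so it is rejected)
```

Because "table" has two extra letters, its signature differs from that of "bat", which is why it must not be returned.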

Recommended answer

Here's an awk implementation. It finds the dictionary words that are made up of exactly the letters in W, by comparing per-letter counts.

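dict="/usr/share/dict/words"
word=$1
awk -v w="$word" 'BEGIN{
  # Count how many times each letter occurs in the target word.
  m=split(w,c,"")
  for(p=1;p<=m;p++){ chars[c[p]]++ }
}
length($0)==length(w){
  f=0;g=0
  n=split($0,t,"")
  for(o=1;o<=n;o++){
    # Reject the line as soon as it contains a letter that is not in w.
    if (!( t[o] in chars) ){
       f=1; break
    }else{ st[t[o]]++ }
  }
  if (!f){
      # Same length and same letter set: the per-letter counts must match too.
      for(z in st){
        if ( st[z] != chars[z] ) { g=1 ;break}
      }
      if(!g){ print "found: "$0 }
  }
  delete st
}' "$dict"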

Output

$ wc -l < /usr/share/dict/words
479829

$ time ./shell.sh look
found: kolo
found: look

real    0m1.361s
user    0m1.074s
sys     0m0.015s
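
The script above depends on `split(w, c, "")` breaking a string into single characters. gawk and several other awks do this, but it is a common extension rather than something every awk guarantees. The following is a minimal sketch of the same letter-count idea using `substr` instead; the function name `anagrams_by_count` and its optional second argument (the dictionary path) are assumptions made for illustration.

```shell
# Hypothetical wrapper: anagrams_by_count WORD [DICTIONARY]
anagrams_by_count() {
    awk -v w="$1" '
    BEGIN {
        wl = length(w)
        # Count how often each letter occurs in the target word.
        for (i = 1; i <= wl; i++) need[substr(w, i, 1)]++
    }
    length($0) == wl {
        # Start every candidate line from a fresh copy of the required counts.
        for (k in need) left[k] = need[k]
        ok = 1
        for (i = 1; i <= wl; i++) {
            c = substr($0, i, 1)
            # A letter that is absent, or already used up, rules the line out.
            if (!(c in left) || left[c]-- == 0) { ok = 0; break }
        }
        if (ok) print
    }' "${2:-/usr/share/dict/words}"
}
```

Since only lines of the same length are examined, a line that never runs out of any letter must have used every letter exactly as often as the target word, so no second pass over the counts is needed. A call such as `anagrams_by_count look` would scan the default dictionary path used in this answer.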

Update: changed the algorithm to use sorting (note that asort is a gawk extension)

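dict="/usr/share/dict/words"
awk 'BEGIN{
  w="table"
  # Split the target word into letters and sort them once.
  m=split(w,c,"")
  b=asort(c,chars)
}
length($0)==length(w){
  f=0
  # Sort the letters of the candidate and compare position by position.
  n=split($0,t,"")
  e=asort(t,d)
  for(i=1;i<=e;i++) {
    if(d[i]!=chars[i]){
        f=1;break
    }
  }
  if(!f) print $0
}' "$dict"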

Output

$ time ./shell.sh #looking for table
ablet
batel
belat
blate
bleat
tabel
table

real    0m1.416s
user    0m1.343s
sys     0m0.014s

$ time ./shell.sh #looking for chairs
chairs
ischar
rachis

real    0m1.697s
user    0m1.660s
sys     0m0.014s

$ time perl perl.pl #using beamrider's Perl script
table
tabel
ablet
batel
blate
bleat
belat

real    0m2.680s
user    0m1.633s
sys     0m0.881s

$ time perl perl.pl # looking for chairs
chairs
ischar
rachis

real    0m14.044s
user    0m8.328s
sys     0m5.236s
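
Since the question expressed a preference for basic commands, here is a rough sketch of how the sorting idea could be kept in plain shell while avoiding a pipeline for every dictionary line: `grep` first narrows the dictionary to lines of the right length that use only the word's letters, and only those few candidates are sorted and compared. The names `sorted_letters` and `anagrams_basic` are hypothetical, the word is assumed to consist of plain letters only (nothing special inside a grep bracket expression), and this variant was not part of the timings above.

```shell
# Hypothetical helper: print the letters of a word in sorted order.
sorted_letters() {
    printf '%s\n' "$1" | fold -w1 | sort | tr -d '\n'
}

# Hypothetical wrapper: anagrams_basic WORD [DICTIONARY]
# Assumes WORD contains only plain letters.
anagrams_basic() {
    word=$1
    dict=${2:-/usr/share/dict/words}
    target=$(sorted_letters "$word")
    # Prefilter: whole lines built only from the word's letters, same length.
    grep -x "[$word]\{${#word}\}" "$dict" |
    while IFS= read -r line; do
        # The prefilter ignores letter counts, so confirm with the sorted form.
        if [ "$(sorted_letters "$line")" = "$target" ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

The prefilter alone is not sufficient: for "bat" it would still let "att" through, which is why the sorted comparison is kept as the final check.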
