Find all words containing characters in UNIX
Question
Given a word W, I want to find all words containing the letters in W from /usr/dict/words. For example, "bat" should return "bat" and "tab" (but not "table").
Here is one solution which involves sorting the input word and matching:
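word=$1
# Signature of the input word: its letters in sorted order
sortedWord=$(printf '%s\n' "$word" | grep -o . | sort | tr -d '\n')
while IFS= read -r line
do
    sortedLine=$(printf '%s\n' "$line" | grep -o . | sort | tr -d '\n')
    # Use = rather than ==, which is not portable inside [ ]
    if [ "$sortedWord" = "$sortedLine" ]
    then
        printf '%s\n' "$line"
    fi
done < /usr/dict/words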
Is there a better way? I'd prefer using basic commands (instead of perl/awk etc), but all solutions are welcome!
To clarify, I want to find all permutations of the original word. Addition or deletion of characters is not allowed.
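Because only permutations count, any match must have the same length as W and use only W's letters. The sketch below uses that to pre-filter the dictionary with grep before running the sort-and-compare loop, so the slow per-line pipeline only runs on a handful of candidates. The function names `sig` and `find_anagrams` are hypothetical, and it assumes W consists of plain letters (characters such as `]`, `^` or `-` would need escaping in the bracket expression).

```shell
# sig WORD: print the letters of WORD in sorted order (its "signature")
sig() {
    printf '%s\n' "$1" | grep -o . | sort | tr -d '\n'
}

# find_anagrams WORD DICT: print the lines of DICT that are permutations of WORD
find_anagrams() {
    word=$1
    dict=$2
    target=$(sig "$word")
    # Pre-filter: whole-line matches made only of letters from WORD,
    # with exactly as many characters as WORD
    grep -x "[$word]\{${#word}\}" "$dict" | while IFS= read -r line
    do
        # The pre-filter ignores letter counts, so compare signatures
        if [ "$(sig "$line")" = "$target" ]
        then
            printf '%s\n' "$line"
        fi
    done
}
```

It would be called as `find_anagrams bat /usr/dict/words`; the dictionary path is passed as an argument because its location differs between systems.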
Recommended answer
Here's an awk implementation. It finds the dictionary words made up of exactly the letters in W.
dict="/usr/share/dict/words"
word=$1
awk -v w="$word" 'BEGIN{
    # Count how often each letter occurs in w (splitting on "" works in gawk)
    m=split(w,c,"")
    for(p=1;p<=m;p++){ chars[c[p]]++ }
}
length($0)==length(w){
    f=0;g=0
    n=split($0,t,"")
    for(o=1;o<=n;o++){
        # f=1: the line has a letter that is not in w
        if (!( t[o] in chars) ){
            f=1; break
        }else{ st[t[o]]++ }
    }
    if (!f){
        # g=1: some letter count differs from w
        for(z in st){
            if ( st[z] != chars[z] ) { g=1 ;break}
        }
        if(!g){ print "found: "$0 }
    }
    delete st
}' "$dict"
Output
$ wc -l < /usr/share/dict/words
479829
$ time ./shell.sh look
found: kolo
found: look
real 0m1.361s
user 0m1.074s
sys 0m0.015s
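The letter-counting idea can also be written without splitting on an empty string, which POSIX awk does not guarantee. The following is a minimal sketch under that assumption, not the answer's own code: it walks each word with `substr` and rejects a line as soon as a letter occurs more often than in W. Since the lengths are equal, a line that never exceeds any count must be a permutation. The function name `find_anagrams_awk` is hypothetical.

```shell
# find_anagrams_awk WORD DICT: print the lines of DICT that are permutations of WORD
find_anagrams_awk() {
    awk -v w="$1" '
    BEGIN {
        n = length(w)
        # want[ch] = number of times ch occurs in w
        for (i = 1; i <= n; i++) want[substr(w, i, 1)]++
    }
    length($0) == n {
        split("", seen)          # portable way to clear an array
        ok = 1
        for (i = 1; i <= n; i++) {
            ch = substr($0, i, 1)
            # Reject once a letter is used more often than w allows
            if (++seen[ch] > want[ch] + 0) { ok = 0; break }
        }
        if (ok) print
    }' "$2"
}
```

For example, `find_anagrams_awk look /usr/share/dict/words` should list the same words as the script above, without the `found: ` prefix.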
Update: changed the algorithm to use sorting
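dict="/usr/share/dict/words"
awk 'BEGIN{
    w="table"
    # Sort the letters of w into chars (asort is a gawk extension)
    m=split(w,c,"")
    b=asort(c,chars)
}
length($0)==length(w){
    f=0
    # Sort the letters of the current line and compare position by position
    n=split($0,t,"")
    e=asort(t,d)
    for(i=1;i<=e;i++) {
        if(d[i]!=chars[i]){
            f=1;break
        }
    }
    if(!f) print $0
}' "$dict"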
Output
$ time ./shell.sh #looking for table
ablet
batel
belat
blate
bleat
tabel
table
real 0m1.416s
user 0m1.343s
sys 0m0.014s
$ time ./shell.sh #looking for chairs
chairs
ischar
rachis
real 0m1.697s
user 0m1.660s
sys 0m0.014s
$ time perl perl.pl #using beamrider's Perl script
table
tabel
ablet
batel
blate
bleat
belat
real 0m2.680s
user 0m1.633s
sys 0m0.881s
$ time perl perl.pl # looking for chairs
chairs
ischar
rachis
real 0m14.044s
user 0m8.328s
sys 0m5.236s
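The sorting variant relies on `asort`, which is a gawk extension. On systems with a different awk, the same sorted-letters comparison could be sketched with a small hand-written insertion sort, as below. This is an illustration under that assumption rather than the answer's code, it has not been timed against the runs above, and the names `sorted_anagrams` and `sig` are hypothetical.

```shell
# sorted_anagrams WORD DICT: print the lines of DICT whose sorted letters match WORD
sorted_anagrams() {
    awk -v w="$1" '
    # sig(s): return the characters of s in sorted order (insertion sort);
    # the extra parameters after s are local variables
    function sig(s,   n, i, j, ch, a, out) {
        n = length(s)
        for (i = 1; i <= n; i++) {
            ch = substr(s, i, 1)
            # Shift larger characters right to make room for ch
            for (j = i - 1; j >= 1 && a[j] > ch; j--) a[j + 1] = a[j]
            a[j + 1] = ch
        }
        out = ""
        for (i = 1; i <= n; i++) out = out a[i]
        return out
    }
    BEGIN { target = sig(w) }
    # Only lines of the right length are sorted and compared
    length($0) == length(w) && sig($0) == target
    ' "$2"
}
```

Unlike the script above, the word is taken from the first argument instead of being hard-coded, for example `sorted_anagrams table /usr/share/dict/words`.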