How can I count the most frequently occurring sequence of 3 letters within a word with a bash script
Question
I have a sample file:
XYZAcc
ABCAccounting
Accounting firm
Accounting Aco
Accounting Acompany
Acoustical consultant
Here I need to grep the most frequently occurring sequences of 3 letters within a word.
The output should be:
acc = 5 aco = 3
Is that possible in Bash?
I have absolutely no idea how to accomplish this with awk, sed, or grep.
Any clue how it could be done?
PS: I haven't shown any attempt because I have no idea where to start, and I don't want to post a pointless awk -F, xyz abc ... that wouldn't help anyone.
Recommended answer
Here's how to get started with what I THINK you're trying to do:
$ cat tst.awk
BEGIN { stringLgth = 3 }    # length of the sequences to count
{
    # treat every whitespace-separated field as a word
    for (fldNr=1; fldNr<=NF; fldNr++) {
        field = $fldNr
        fieldLgth = length(field)
        if ( fieldLgth >= stringLgth ) {
            # last position at which a full-length sequence can start
            maxBegPos = fieldLgth - (stringLgth - 1)
            for (begPos=1; begPos<=maxBegPos; begPos++) {
                # lowercase so that "Acc" and "acc" are counted together
                string = tolower(substr(field,begPos,stringLgth))
                cnt[string]++
            }
        }
    }
}
END {
    for (string in cnt) {
        print string, cnt[string]
    }
}

$ awk -f tst.awk file | sort -k2,2nr
acc 5
cou 5
cco 4
ing 4
nti 4
oun 4
tin 4
unt 4
aco 3
abc 1
ant 1
any 1
bca 1
cac 1
cal 1
com 1
con 1
fir 1
ica 1
irm 1
lta 1
mpa 1
nsu 1
omp 1
ons 1
ous 1
pan 1
sti 1
sul 1
tan 1
tic 1
ult 1
ust 1
xyz 1
yza 1
zac 1
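
The listing above contains every 3-letter sequence, while the question asks only for the most frequent ones in a `seq = count` layout. The following is a minimal sketch of one way to trim the result, not part of the original answer. The function name `top_trigrams` and its two arguments (how many entries to keep, and the input file) are assumptions made for illustration. Note that in the sample data `cou` is tied with `acc` at 5, so a plain "top N by count" will not reproduce the question's `acc = 5 aco = 3` exactly.

```shell
# top_trigrams N FILE
# Hypothetical helper (name and arguments are assumptions, not from the answer).
# Prints the N most frequent 3-letter sequences in FILE as "seq = count".
top_trigrams() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                word = tolower($i)
                # slide a 3-character window across each word
                for (pos = 1; pos <= length(word) - 2; pos++)
                    cnt[substr(word, pos, 3)]++
            }
        }
        END { for (s in cnt) print s, cnt[s] }
    ' "$2" |
    # highest count first; ties broken alphabetically for a stable order
    sort -k2,2nr -k1,1 |
    head -n "$1" |
    awk '{ printf "%s = %d\n", $1, $2 }'
}
```

With the sample file from the question, `top_trigrams 3 file` would print `acc = 5`, `cou = 5` and `cco = 4`, one per line. The second sort key is only there to make the order of tied entries predictable.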