Counting palindromes in a text file
Problem description
Having followed the thread "BASH Finding palindromes in a .txt file", I can't figure out what I am doing wrong with my script.
#!/bin/bash
search() {
tr -d '[[:punct:][:digit:]@]' \
| sed -E -e '/^(.)\1+$/d' \
| tr -s '[[:space:]]' \
| tr '[[:space:]]' '\n'
}
search "$1"
paste <(search <"$1") <(search < "$1" | rev) \
| awk '$1 == $2 && (length($1) >=3) { print $1 }' \
| sort | uniq -c
All I'm getting from this script is the whole text file as output. I only want to output palindromes of length >= 3 and count them, such as
425 did
120 non
etc. My text file is called sample.txt, and every time I run the script with: cat sample.txt | source palindrome I get the message 'bash: : No such file or directory'.
Recommended answer
Using awk and sed
awk 'function palindrome(str) {len=length(str); for(k=1; k<=len/2+len%2; k++) { if(substr(str,k,1)!=substr(str,len+1-k,1)) return 0 } return 1 } {for(i=1; i<=NF; i++) {if(length($i)>=3){ gsub(/[^a-zA-Z]/,"",$i); if(length($i)>=3) {$i=tolower($i); if(palindrome($i)) arr[$i]++ }} } } END{for(i in arr) print arr[i],i}' file | sed -E '/^[0-9]+ (.)\1+$/d'
Tested on a 1.2GB file; execution time was ~4m 40s (i5-6440HQ @ 2.60GHz/4 cores/16GB).
Explanation:
awk '
function palindrome(str)                  # Function to check for a palindrome
{
    len=length(str);
    for(k=1; k<=len/2+len%2; k++)
    {
        if(substr(str,k,1)!=substr(str,len+1-k,1))
            return 0
    }
    return 1
}
{
    for(i=1; i<=NF; i++)                  # For each field in a record
    {
        if(length($i)>=3)                 # If length>=3
        {
            gsub(/[^a-zA-Z]/,"",$i);      # Remove non-alpha characters from it
            if(length($i)>=3)             # Check length again after removal
            {
                $i=tolower($i);           # Convert to lowercase
                if(palindrome($i))        # Check if it is a palindrome
                    arr[$i]++             # and count it in the array
            }
        }
    }
}
END{for(i in arr) print arr[i],i}' file | sed -E '/^[0-9]+ (.)\1+$/d'
sed -E '/^[0-9]+ (.)\1+$/d' : from the final result, finds the strings that consist of a single repeated character, such as AAA, BBB, etc., and removes them.
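To try the command on a small input, it can be wrapped in a shell function that reads stdin. This is only a sketch: count_palindromes is a hypothetical name, the sample line is made up, and the awk body is lightly tidied (len and k are declared as local variables, and the field is copied into w instead of being modified in place). The sed filter is unchanged.

```shell
#!/bin/bash
# count_palindromes is a hypothetical wrapper name; it reads text on stdin
# and prints "count word" for every palindrome of length >= 3.
count_palindromes() {
    awk '
    function palindrome(str,    len, k) {   # len and k are local to the function
        len = length(str)
        for (k = 1; k <= len / 2; k++)
            if (substr(str, k, 1) != substr(str, len + 1 - k, 1))
                return 0
        return 1
    }
    {
        for (i = 1; i <= NF; i++) {
            w = $i
            gsub(/[^a-zA-Z]/, "", w)        # strip everything except letters
            w = tolower(w)
            if (length(w) >= 3 && palindrome(w))
                arr[w]++
        }
    }
    END { for (w in arr) print arr[w], w }' |
    sed -E '/^[0-9]+ (.)\1+$/d'             # drop words such as aaa, bbb
}

# Made-up sample line; sort only makes the order predictable.
printf 'Kayak bob, dad! AAA 121 level kayak\n' | count_palindromes | sort
# 1 bob
# 1 dad
# 1 level
# 2 kayak
```

awk walks the array in an unspecified order, so pipe the result through sort (or sort -rn to rank by count) when the order matters.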
Old answer (before the edit)
You can try the steps below if you want to:
Step 1: Pre-processing
Remove all unnecessary characters and store the result in a temp file
tr -dc 'a-zA-Z\n\t ' <file | tr ' ' '\n' > temp
tr -dc 'a-zA-Z\n\t ' : removes everything except letters, \n, \t, and spaces
tr ' ' '\n' : converts each space to \n, so that every word ends up on its own line
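A quick way to check what Step 1 produces is to feed it a single line. In this sketch preprocess is a hypothetical name for the two tr stages, and the input line is made up.

```shell
# preprocess is a hypothetical name for the two tr stages of Step 1.
preprocess() {
    tr -dc 'a-zA-Z\n\t ' | tr ' ' '\n'
}

printf 'kayak bob, dad! 121 AAA\n' | preprocess
# kayak
# bob
# dad
#
# AAA
```

The empty line is where 121 used to be: the digits are deleted but the spaces around them remain, so tokens made only of removed characters leave blank lines in temp.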
Step 2: Processing
grep -wof temp <(rev temp) | sed -E -e '/^(.)\1+$/d' | awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }'
grep -wof temp <(rev temp) : this will give you all palindromes
-w : select only those lines containing matches that form whole words. For example, level won't match levelAAA
-o : print only the matched part of each line
-f : use each string in the temp file as a pattern to search for in <(rev temp)
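The effect of these flags is easiest to see on a tiny word list. In this sketch reversed_matches is a hypothetical name, the words are made up, and rev is piped into grep instead of using <(rev temp). It also shows a limitation worth knowing: every word in temp is a pattern, so a word whose reversal is a different word in the file (was/saw) is reported as well.

```shell
# reversed_matches is a hypothetical name; $1 is a file with one word per line.
reversed_matches() {
    rev "$1" | grep -wof "$1"
}

dir=$(mktemp -d)
printf 'level\nAAAlevel\nwas\nsaw\n' > "$dir/temp"   # made-up word list

reversed_matches "$dir/temp"
# level
# saw
# was
rm -rf "$dir"
```

levelAAA (the reversal of AAAlevel) is not printed because -w requires the match to be a whole word. saw and was are printed even though neither is a palindrome, so this step is only exact when the file contains no such pairs.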
sed -E -e '/^(.)\1+$/d' : removes words made of a single repeated letter, such as AAA, BBBBB
awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }' : keeps only words with length>=3, counts how often each one occurs, and finally prints the result
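The filtering and counting stages can be tried on their own with a hand-written word list. This is a minimal sketch: count_words is a hypothetical name and the input words are made up.

```shell
# count_words is a hypothetical name for the sed and awk stages of Step 2.
count_words() {
    sed -E -e '/^(.)\1+$/d' |
    awk 'length >= 3 { a[$1]++ } END { for (i in a) print a[i], i }'
}

printf 'kayak\nbob\nAAA\nkayak\nxx\n\n' | count_words | sort
# 1 bob
# 2 kayak
```

AAA and xx are dropped by sed, and the empty line is dropped by the length test in awk.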
Example:
Input file:
$ cat file
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
Output:
$ tr -dc 'a-zA-Z\n\t ' <file | tr ' ' '\n' > temp
$ grep -wof temp <(rev temp) | sed -E -e '/^(.)\1+$/d' | awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }'
3 dad
3 kayak
3 bob
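As for the script in the question, both symptoms seem to come from how it is invoked and from the stray search "$1" line. source palindrome runs the script in the current shell with no arguments, so $1 is empty and <"$1" fails with 'bash: : No such file or directory'. The standalone search "$1" ignores its argument, reads the piped file from stdin, and prints every word, which is why the whole file shows up. The sed filter also runs before the text is split into words, so it tests whole lines rather than words. Below is a minimal sketch of a corrected version, not a definitive fix: palindrome_counts is a hypothetical name, and it assumes rev, paste and mktemp are available.

```shell
#!/bin/bash
# Sketch of the script from the question with the issues above addressed.

# Split stdin into one word per line, dropping punctuation and digits.
search() {
    tr -d '[:punct:][:digit:]' | tr -s '[:space:]' '\n'
}

# palindrome_counts is a hypothetical name; $1 is the text file to scan.
palindrome_counts() {
    local words
    words=$(mktemp) || return 1
    search < "$1" > "$words"
    # Pair each word with its own reversal, keep exact matches of
    # length >= 3, drop single-letter runs such as AAA, then count.
    rev "$words" | paste "$words" - \
        | awk '$1 == $2 && length($1) >= 3 { print $1 }' \
        | sed -E '/^(.)\1+$/d' \
        | sort | uniq -c
    rm -f "$words"
}

# Only run when a file name was passed on the command line.
if [ "$#" -gt 0 ]; then
    palindrome_counts "$1"
fi
```

Saved as palindrome, it would be run as bash palindrome sample.txt, passing the file name as an argument instead of piping the file into source. Because each word is compared only with its own reversal, pairs such as was/saw are not counted.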