Counting palindromes in a text file

Question

Having followed the thread "BASH Finding palindromes in a .txt file", I can't figure out what I am doing wrong with my script.

#!/bin/bash
search() {
tr -d '[[:punct:][:digit:]@]' \
| sed -E -e '/^(.)\1+$/d'      \
| tr -s '[[:space:]]'           \
| tr '[[:space:]]' '\n'
}

search "$1"

paste <(search <"$1") <(search < "$1" | rev)     \
| awk '$1 == $2 && (length($1) >=3) { print $1 }' \
| sort | uniq -c

All I'm getting from this script is the output of the whole text file. I want to output only palindromes of length >= 3 and count them, for example:

425 did

120 non

etc. My text file is called sample.txt, and every time I run the script with cat sample.txt | source palindrome, I get the message 'bash: : No such file or directory'.
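The thread does not address the two symptoms described above, but both are consistent with how the script reads its input. Run as cat sample.txt | source palindrome, it receives no file name, so "$1" is normally empty and the redirection < "$1" fails with 'bash: : No such file or directory'; the debugging line search "$1" ignores its argument and reads the piped text from stdin, which is why the (filtered) text of the whole file is printed. Below is a hypothetical rework, a sketch assuming the goal is to keep the asker's tr/rev/paste idea: it reads a file argument or stdin, lowercases the words (an assumption, since the question does not say whether case matters), and applies the repeated-letter filter after the text is split into one word per line, because in the original sed runs on whole lines. The function name count_palindromes is made up for this example.

```shell
#!/bin/bash
# Hypothetical rework of the script in the question (not from the original thread).
# Source it ("source palindrome"), then run "count_palindromes sample.txt"
# or "cat sample.txt | count_palindromes".

# Turn text on stdin into one lowercase word per line.
search() {
    tr -d '[:punct:][:digit:]' |      # drop punctuation and digits
        tr -s '[:space:]' '\n' |      # each run of whitespace -> one newline
        tr '[:upper:]' '[:lower:]'    # assumption: case should not matter
}

# Count words of length >= 3 that read the same reversed,
# taken from the file named in "$1", or from stdin when no argument is given.
count_palindromes() {
    local tmp
    tmp=$(mktemp) || return 1
    if [ "$#" -gt 0 ]; then search < "$1"; else search; fi > "$tmp"
    rev "$tmp" | paste "$tmp" - |                       # word <TAB> reversed word
        awk '$1 == $2 && length($1) >= 3 { print $1 }' |
        sed -E '/^(.)\1+$/d' |                          # drop aaa, bbbb, ... (after splitting)
        sort | uniq -c
    rm -f "$tmp"
}
```

Writing the words to a temporary file lets the function read stdin only once while still feeding both columns of paste.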

Recommended answer

Using awk and sed:

awk 'function palindrome(str) {len=length(str); for(k=1; k<=len/2+len%2; k++) { if(substr(str,k,1)!=substr(str,len+1-k,1)) return 0 } return 1 } {for(i=1; i<=NF; i++) {if(length($i)>=3){ gsub(/[^a-zA-Z]/,"",$i); if(length($i)>=3) {$i=tolower($i); if(palindrome($i)) arr[$i]++ }} } } END{for(i in arr) print arr[i],i}' file | sed -E '/^[0-9]+ (.)\1+$/d'

Tested on a 1.2 GB file; the execution time was about 4 min 40 s (i5-6440HQ @ 2.60 GHz, 4 cores, 16 GB).

Explanation:

awk '
    function palindrome(str)               # Function to check Palindrome
    {
        len=length(str); 
        for(k=1; k<=len/2+len%2; k++) 
        { 
            if(substr(str,k,1)!=substr(str,len+1-k,1)) 
            return 0 
        } 
        return 1 
    } 

    {
        for(i=1; i<=NF; i++)               # For Each field in a record
        {
            if(length($i)>=3)              # if length>=3
            { 
                gsub(/[^a-zA-Z]/,"",$i);   # remove non-alphabetic characters from it
                if(length($i)>=3)          # Check length again after removal
                {
                    $i=tolower($i);        # Convert to lowercase
                    if(palindrome($i))     # Check if it's palindrome
                        arr[$i]++          # and store it in array
                }
            }
        } 
    } 

    END{for(i in arr) print arr[i],i}' file | sed -E '/^[0-9]+ (.)\1+$/d' 

sed -E '/^[0-9]+ (.)\1+$/d': removes from the final result every string composed of a single repeated character, such as AAA or BBB.
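A quick check of that filter on a few made-up lines in the "count word" format printed by the awk END block: only lines whose word is one letter repeated are deleted, so a palindrome such as abba survives.

```shell
# Made-up lines in the "count word" format printed by the awk END block.
printf '3 aaa\n3 bob\n2 bbbb\n1 abba\n' |
    sed -E '/^[0-9]+ (.)\1+$/d'    # deletes "3 aaa" and "2 bbbb"
```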


Old answer (before the edit)

If you want, you can try the steps below:

Step 1: Pre-processing
Remove all unnecessary characters and store the result in a temp file.

tr -dc 'a-zA-Z\n\t ' <file | tr ' ' '\n' > temp

tr -dc 'a-zA-Z\n\t ': this removes everything except letters, newlines (\n), tabs (\t) and spaces.

tr ' ' '\n': this converts each space to a newline, so every word ends up on its own line.
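Running Step 1 on one made-up line in the style of the sample input shows what ends up in temp: kayak, bob, dad and pikachu each on their own line, plus an empty line wherever removed punctuation or digits left two adjacent spaces (or a trailing space) behind. Tabs are kept rather than converted, so tab-separated words would stay on one line.

```shell
# One made-up line in the style of the sample input, run through Step 1.
printf 'kayak bob dad , pikachu. 121\n' |
    tr -dc 'a-zA-Z\n\t ' |    # keep only letters, newlines, tabs and spaces
    tr ' ' '\n'               # one word per line
```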

Step 2: Processing

grep -wof temp <(rev temp)  | sed -E -e '/^(.)\1+$/d' | awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }'

grep -wof temp <(rev temp): this gives you all the palindromes.
-w: select only those lines containing matches that form whole words. For example, level won't match levelAAA.
-o: print only the matched part.
-f: use each line of the temp file as a pattern to search for in <(rev temp).

sed -E -e '/^(.)\1+$/d': this removes words made of a single repeated letter, such as AAA or BBBBB.

awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }': this keeps words with length >= 3, counts how often each one occurs, and finally prints the counts.
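One property of Step 2 worth checking: grep -f only asks whether each word occurs somewhere in the reversed text, so a word is also reported when its reversal appears elsewhere in the file. A small sketch with a made-up three-word list (written with a pipe instead of <(...), which is equivalent here) reports dog and god next to bob, while comparing each line with its own reversal, as the question's paste/rev idea does, keeps only bob. The sample file in the example below contains no such pairs, so its output is unaffected.

```shell
# Made-up word list: "god" and "dog" are reversals of each other; "bob" is a palindrome.
tmp=$(mktemp)
printf 'god\ndog\nbob\n' > "$tmp"

# Step 2's search: prints dog, god and bob, because each word
# only has to occur somewhere in the reversed text.
rev "$tmp" | grep -wof "$tmp"

# Comparing each line with its own reversal reports only bob.
rev "$tmp" | paste "$tmp" - | awk '$1 == $2 { print $1 }'

rm -f "$tmp"
```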

Example:

Input file:

$ cat file
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done

Output:

$ tr -dc 'a-zA-Z\n\t ' <file | tr ' ' '\n' > temp
$ grep -wof temp <(rev temp)  | sed -E -e '/^(.)\1+$/d' | awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }' 
3 dad
3 kayak
3 bob
