How to make a comparison between sentences and calculate the similarity?

Posted: 2021/4/14 19:36:58. Tags: linux, bash, shell, unix, command-line


Question

How can I compare the first sentence with the second, the first with the third, and so on, and calculate the similarity of each pair using a shell script or bash?

I have sentences containing duplicate words, for example the input data in the file my_text.txt shown below. Duplicated words within a sentence, filler words, and non-alphabetical characters should be ignored.

Shell Script
Linux Shell Script
Shell or bash are fun

I used this shell script to find the similarity:
  words=$(
  < my_text.txt tr 'A-Z' 'a-z' |
  grep -Eon '\b[a-z]*\b' |
  grep -Fwvf <(printf %s\\n is a to be by the and for) |
  sort -u | cut -d: -f2 | sort
  )
  union=$(uniq <<< "$words" | wc -l)
  intersection=$(uniq -d <<< "$words" | wc -l)
  echo "similarity is $(bc -l <<< "$intersection/$union")"

The script above calculates a single similarity across all sentences at once, but I want the similarity of every pair of sentences (e.g. 1:2, 1:3, 1:4, …, 2:3, 2:4, …, 3:4, …).
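I want to find the similarity as in these two examples:

First sentence and second sentence:
Intersection of the two sentences: Shell + Script
Size of the union of the two sentences: 3
Similarity: 0.66666666

First sentence and third sentence:
Intersection of the two sentences: Shell
Size of the union of the two sentences: 4
Similarity: 0.25
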
Can anyone help?
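
The measure in both examples is the size of the intersection of two word sets divided by the size of their union, commonly known as the Jaccard index. Writing $A$ and $B$ for the sets of distinct, non-filler words in two sentences (symbols introduced here for illustration only), the first example works out as:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
J(\{\text{shell}, \text{script}\}, \{\text{linux}, \text{shell}, \text{script}\})
  = \frac{2}{3} \approx 0.667
```
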
Recommended answer
With a small tweak to my answer to your previous question, still using GNU awk for FPAT and arrays of arrays:

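$ cat tst.awk
BEGIN {
    # Build the set of filler (stop) words to ignore.
    split("is a to be by the and for",tmp)
    for (i in tmp) {
        stopwords[tmp[i]]
    }
    # GNU awk only: a field is any run of alphanumerics or underscores.
    FPAT="[[:alnum:]_]+"
}
{
    # Collect this line's distinct lowercased words: line 1 goes into
    # words[1], every later line into words[2].
    for (i=1; i<=NF; i++) {
        word = tolower($i)
        if ( !(word in stopwords) ) {
            words[NR>1?2:1][word]
        }
    }
}
NR > 1 {
    # Count the words shared by the first line and the current line.
    numCommon = 0
    for (word in words[1]) {
        if (word in words[2]) {
            numCommon++
        }
    }
    # Similarity = shared words / distinct words across both lines.
    totWords = length(words[1]) + length(words[2]) - numCommon
    print (totWords ? numCommon / totWords : 0)
    # Forget the current line's words before reading the next line.
    delete words[2]
}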


$ awk -f tst.awk file
0.666667
0.166667
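
The output above covers only the pairs 1:2 and 1:3, since the script compares the first line with each later one. For every pair, as the question asks, one possible approach is sketched below. It is not part of the original answer: the function name `pairwise_jaccard` is hypothetical, and the sketch assumes plain POSIX awk (composite `seen[line, word]` keys instead of GNU awk's arrays of arrays and FPAT) and treats only the letters a-z as word characters.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print the similarity
# of every pair of lines as "i:j value", reading files or standard input.
pairwise_jaccard() {
    awk '
    BEGIN {
        # Same stop-word list as in the question; extend it as needed.
        n = split("is a to be by the and for", tmp, " ")
        for (k = 1; k <= n; k++) stop[tmp[k]] = 1
    }
    {
        # Lowercase, turn every run of non-letters into a space, split.
        line = tolower($0)
        gsub(/[^a-z]+/, " ", line)
        m = split(line, w, " ")
        for (k = 1; k <= m; k++) {
            if (w[k] in stop) continue
            if (!((NR, w[k]) in seen)) {   # count each word once per line
                seen[NR, w[k]] = 1
                size[NR]++
            }
        }
    }
    END {
        for (i = 1; i < NR; i++) {
            for (j = i + 1; j <= NR; j++) {
                common = 0
                for (key in seen) {
                    split(key, part, SUBSEP)
                    if (part[1] == i && ((j, part[2]) in seen)) common++
                }
                total = size[i] + size[j] - common
                printf "%d:%d %.6f\n", i, j, (total ? common / total : 0)
            }
        }
    }' "$@"
}
```

Calling `pairwise_jaccard my_text.txt` on the three sample sentences should print one line per pair (1:2, 1:3, 2:3). The 1:3 value comes out as 0.166667 rather than the 0.25 given in the question, because "or" and "are" are not in the stop-word list; adding them to the list in the BEGIN block yields 1/4.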

