Similarity algorithm advice, using two dimensional associative array

Views: 77 Published: 2021/4/9 20:33:06 php arrays algorithm similarity

Problem description

The main goal of this algorithm is to find similar titles of news articles from different web sources and group them, say those above 55.55% similarity.
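To make the threshold concrete, here is a minimal sketch of how PHP's built-in similar_text() reports a percentage. The two strings are illustrative only and do not come from the question's data.

```php
<?php
// Illustrative strings, not taken from the question's database.
$a = "World";
$b = "Word";

// similar_text() returns the number of matching characters and writes
// the similarity percentage into the third (by-reference) argument.
$matching = similar_text($a, $b, $percent);

// The percentage is: matching * 2 / (strlen($a) + strlen($b)) * 100.
$expected = $matching * 2 / (strlen($a) + strlen($b)) * 100;

echo $matching, "\n";                        // 4
echo round($percent, 2), "\n";               // 88.89
var_dump(abs($percent - $expected) < 1e-9);  // bool(true)
var_dump(round($percent, 2) >= 55.55);       // bool(true)
```

Note that similar_text() is case-sensitive, and the PHP manual warns that swapping the two arguments may yield a different result, which matters when every title is compared against every other title.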

My current approach consists of the following steps:

Feed data from the MySQL database into a two-dimensional array, e.g. $arrayOne.
Make a copy of that array, e.g. $arrayTwo.
Create an empty array that will contain only similar titles and their other content, e.g. $array_smlr.
Loop: for each $arrayOne article_title, check its similarity with each $arrayTwo article_title.
If the similarity between two titles is above 55% and the articles are not from the same news source (this way I don't compare articles from the same source), add it to $array_smlr.
Sort $array_smlr by similarity percentage, so that similar titles end up grouped together.

Below is my code for the steps above.

$result = mysqli_query($conn,"SELECT id_articles,article_img,article_title,LEFT(article_content , 200),psource, date_fetched FROM project.articles WHERE " . rtrim($values,' or') . " ORDER BY date_fetched DESC LIMIT 70");

$arrayOne=array();
$arrayTwo=array();

while($row = mysqli_fetch_assoc($result)){
    $arrayOne[] = $row;
}
$arrayTwo = $arrayOne;
$array_smlr=array();
foreach ($arrayOne as $rowOne) {
    foreach($arrayTwo as $rowTwo){
        $compare = similar_text($rowOne['article_title'], $rowTwo['article_title'], $p);
        if ( round($p,2) >= 55.50 and $rowOne['psource'] != $rowTwo['psource'] ){
            $data =  array('percentage' => round($p,2), 'article_title' => $rowTwo['article_title'], 'psource' => $rowTwo['psource'], 'id_articles' => $rowTwo['id_articles'], 'date_fetched' =>$rowTwo['date_fetched']);
            $array_smlr[]=$data; 
        }
    }
}
array_multisort($array_smlr);
foreach($array_smlr as $row3){
    echo $row3['percentage'] . $row3['article_title'] . $row3['psource'] . $row3['id_articles'] . $row3['date_fetched'] . "<br><br>";
}
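The call array_multisort($array_smlr) on an array of rows relies on PHP's array comparison rules. A sort keyed explicitly on the percentage field may be easier to reason about. The following is only a sketch with made-up rows shaped like the entries of $array_smlr; the titles and source names are hypothetical.

```php
<?php
// Hypothetical rows shaped like the entries of $array_smlr.
$array_smlr = [
    ['percentage' => 61.54, 'article_title' => 'Title B', 'psource' => 'source-2'],
    ['percentage' => 92.31, 'article_title' => 'Title A', 'psource' => 'source-1'],
    ['percentage' => 75.00, 'article_title' => 'Title C', 'psource' => 'source-3'],
];

// Sort by the 'percentage' field only, highest similarity first.
usort($array_smlr, function (array $x, array $y): int {
    return $y['percentage'] <=> $x['percentage'];
});

foreach ($array_smlr as $row) {
    echo $row['percentage'], ' ', $row['article_title'], "\n";
}
```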

My code works only in a limited way: it is fine if I have two similar titles, but if I have, say, three similar titles, it adds duplicated rows of data to $array_smlr.
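The duplication can be reproduced in isolation. The sketch below runs the same nested loop over three made-up articles (the titles, ids and source names are assumptions for illustration) and counts how often each article id is appended.

```php
<?php
// Three made-up articles from different sources with near-identical titles.
$articles = [
    ['id_articles' => 1, 'article_title' => 'Storm hits the coast',       'psource' => 'source-1'],
    ['id_articles' => 2, 'article_title' => 'Storm hits the coast today', 'psource' => 'source-2'],
    ['id_articles' => 3, 'article_title' => 'Big storm hits the coast',   'psource' => 'source-3'],
];

$array_smlr = [];
foreach ($articles as $rowOne) {
    foreach ($articles as $rowTwo) {
        similar_text($rowOne['article_title'], $rowTwo['article_title'], $p);
        if (round($p, 2) >= 55.50 && $rowOne['psource'] != $rowTwo['psource']) {
            $array_smlr[] = [
                'id_articles' => $rowTwo['id_articles'],
                'percentage'  => round($p, 2),
            ];
        }
    }
}

// Count how many times each article id was appended.
$timesAdded = array_count_values(array_column($array_smlr, 'id_articles'));
print_r($timesAdded);
```

With three mutually similar titles, each article is appended once for every other article it matches, so each id shows up twice and the result holds six rows instead of three.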

I would appreciate any suggestions on optimizing this algorithm to improve its performance.

Thanks

Recommended answer

You don't really need two arrays. Instead of a foreach loop without a $key variable, you can loop over the same array with $key and skip the comparison when the keys are the same. Keying $array_smlr by id_articles then also avoids dupes.

$array_smlr = array();
foreach ($arrayOne as $key => $rowOne) {
    foreach ($arrayOne as $ikey => $rowTwo) {
        // Skip comparing an article with itself.
        if ($ikey != $key) {
            $compare = similar_text($rowOne['article_title'], $rowTwo['article_title'], $p);
            if (round($p, 2) >= 55.50 and $rowOne['psource'] != $rowTwo['psource']) {
                $data = array('percentage' => round($p, 2), 'article_title' => $rowTwo['article_title'], 'psource' => $rowTwo['psource'], 'id_articles' => $rowTwo['id_articles'], 'date_fetched' => $rowTwo['date_fetched']);
                // Keyed by article id, so a repeated match overwrites the row instead of duplicating it.
                $array_smlr[$rowTwo['id_articles']] = $data;
            }
        }
    }
}
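Since the question also asks about performance, one possible variation (not part of the answer above) is to visit each unordered pair only once by starting the inner index after the outer one. This roughly halves the number of similar_text() calls. The function name findSimilarPairs and the sample rows are hypothetical. Because similar_text() can return a different percentage when its arguments are swapped, comparing each pair in one direction only is a simplifying assumption.

```php
<?php
/**
 * Return every cross-source pair of articles whose titles reach $threshold.
 * Each unordered pair is compared once (the inner index starts at $i + 1).
 * Hypothetical helper, sketched for illustration.
 */
function findSimilarPairs(array $articles, float $threshold = 55.50): array
{
    $articles = array_values($articles); // ensure 0-based consecutive keys
    $pairs = [];
    $count = count($articles);
    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            if ($articles[$i]['psource'] == $articles[$j]['psource']) {
                continue; // same news source: skip the comparison
            }
            similar_text($articles[$i]['article_title'], $articles[$j]['article_title'], $p);
            if (round($p, 2) >= $threshold) {
                $pairs[] = [
                    'percentage' => round($p, 2),
                    'ids' => [$articles[$i]['id_articles'], $articles[$j]['id_articles']],
                ];
            }
        }
    }
    return $pairs;
}

// Made-up sample rows; only the fields the function reads are included.
$articles = [
    ['id_articles' => 1, 'article_title' => 'Storm hits the coast',       'psource' => 'source-1'],
    ['id_articles' => 2, 'article_title' => 'Storm hits the coast today', 'psource' => 'source-2'],
    ['id_articles' => 3, 'article_title' => 'Big storm hits the coast',   'psource' => 'source-3'],
];

$pairs = findSimilarPairs($articles);
echo count($pairs), "\n"; // 3
```

Each returned entry names both article ids, so a later grouping step can see which titles belong together without any row appearing twice for the same pair.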
