Affinity between a text and a list of keywords?
Question
I have a portion of text (500-1500 characters).
I also have a list of keywords (1000 records).
What should I do to find the keywords from that list that are related to my given text?
I was thinking of searching my text for occurrences of every keyword in the list, but I think that is a bit "expensive".
Thanks
Recommended answer
I'll throw my hat in the ring...
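function extractWords($text, $minWordLength = null, array $stopwords = array(), $caseIgnore = true)
{
    // Match runs of word characters, optionally at least $minWordLength long.
    $pattern = '/\w' . (is_null($minWordLength) ? '+' : '{' . $minWordLength . ',}') . '/';
    $matches = array();
    preg_match_all($pattern, $text, $matches);
    $words = $matches[0];
    if ($caseIgnore) {
        $words = array_map('strtolower', $words);
        // Lowercase the stop words as well so the comparison is case-insensitive.
        $stopwords = array_map('strtolower', $stopwords);
    }
    // Drop stop words; repeated words are kept so they can be counted later.
    $words = array_diff($words, $stopwords);
    return $words;
}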
function countKeywords(array $words, array $keywords, $threshold = null, $caseIgnore = true)
{
    if ($caseIgnore) {
        $keywords = array_map('strtolower', $keywords);
    }
    // Keep only the words that appear in the keyword list.
    $words = array_intersect($words, $keywords);
    // Count occurrences per keyword, most frequent first.
    $counts = array_count_values($words);
    arsort($counts, SORT_NUMERIC);
    if (!is_null($threshold)) {
        // Discard keywords that occur fewer than $threshold times.
        $counts = array_filter($counts, function ($count) use ($threshold) { return $count >= $threshold; });
    }
    return $counts;
}
Usage:
$text = 'a b c a'; // your text
$keywords = array('a', 'b'); // keywords from your database
$words = extractWords($text);
$count = countKeywords($words, $keywords);
print_r($count);
$total = array_sum($count);
var_dump($total);
// Affinity is the share of words in the text that match a keyword.
$affinity = ($total == 0 ? 0 : $total / count($words));
var_dump($affinity);
Prints:
Array ( [a] => 2 [b] => 1 )
int(3)
float(0.75)
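On the worry that checking every keyword is "expensive": one possible variation, shown here only as a sketch, is to flip the keyword list into array keys so that each word of the text is tested with a single isset() lookup. The function name keywordAffinity and its return shape are hypothetical and not part of the answer above; the sketch also assumes single-word keywords and skips stop words, minimum word length and thresholds.

```php
<?php
// Hypothetical helper (not from the answer above): counts keyword hits in a
// text using a hash lookup and returns the counts plus the affinity ratio.
function keywordAffinity(string $text, array $keywords): array
{
    // Split the lowercased text into word tokens.
    $matches = array();
    preg_match_all('/\w+/', strtolower($text), $matches);
    $words = $matches[0];

    // Turn the keyword list into array keys so isset() can test membership.
    $lookup = array_flip(array_map('strtolower', $keywords));

    // Count only the words that are present in the keyword lookup.
    $counts = array();
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts, SORT_NUMERIC);

    // Same ratio as above: matched words divided by all words in the text.
    $total = array_sum($counts);
    $affinity = count($words) === 0 ? 0.0 : $total / count($words);

    return array('counts' => $counts, 'affinity' => $affinity);
}

$result = keywordAffinity('a b c a', array('a', 'b'));
print_r($result['counts']);
var_dump($result['affinity']);
```

For the sample input this gives the same counts and the same 0.75 affinity as the usage example. Whether it is noticeably faster than array_intersect for 1000 keywords and a 1500-character text would need to be measured; at that size either approach is likely cheap.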