Optimizing a simple search algorithm
Question
I have been playing around with a fairly simple, home-made search engine, and I'm now tweaking some relevancy sorting code.
It's not very pretty, and I'm not very good at clever algorithms, so I was hoping I could get some advice :)
Basically, I want each search result to be scored by how many of its words match the search terms: 3 points per exact word match and 1 point per partial match.
For example, if I search for "winter snow", these would be the results:
- winter snow => 6 points
- winter snowing => 4 points
- winterland snow => 4 points
- winter sun => 3 points
- winterland snowing => 2 points
Here is the code:
    String[] resultWords = result.split(" ");
    String[] searchWords = searchStr.split(" ");
    int score = 0;
    for (String resultWord : resultWords) {
        for (String searchWord : searchWords) {
            if (resultWord.equalsIgnoreCase(searchWord))
                score += 3;
            else if (resultWord.toLowerCase().contains(searchWord.toLowerCase()))
                score++;
        }
    }
Recommended answer
Your code seems OK to me. I suggest a few small changes:
Since you are going through all possible combinations, you can move the toLowerCase() calls to the start, so that each string is lowercased once instead of on every comparison.
Also, if an exact match has already occurred for a result word, you don't need to perform another equals on it.
    result = result.toLowerCase();
    searchStr = searchStr.toLowerCase();
    String[] resultWords = result.split(" ");
    String[] searchWords = searchStr.split(" ");
    int score = 0;
    for (String resultWord : resultWords) {
        boolean exactMatch = false;
        for (String searchWord : searchWords) {
            if (!exactMatch && resultWord.equals(searchWord)) {
                exactMatch = true;
                score += 3;
            } else if (resultWord.contains(searchWord)) {
                score++;
            }
        }
    }
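To check the scoring against the example table in the question, the loop can be wrapped in a small method and run on the five sample results. This is only a minimal sketch: the class name RelevanceScorer, the method score, and the two point constants are made up for illustration, and splitting on "\\s+" with Locale.ROOT lowercasing is an assumption that goes slightly beyond the snippets above. It keeps the question's plain rules (3 points exact, 1 point partial) and leaves out the exactMatch flag.

```java
import java.util.Locale;

// Hypothetical helper class; the names are illustrative, not from the original post.
final class RelevanceScorer {
    static final int EXACT_POINTS = 3;   // exact word match
    static final int PARTIAL_POINTS = 1; // result word merely contains the search word

    static int score(String result, String searchStr) {
        // Lowercase once up front, as suggested in the answer.
        // Assumption: split on any run of whitespace rather than a single space.
        String[] resultWords = result.toLowerCase(Locale.ROOT).trim().split("\\s+");
        String[] searchWords = searchStr.toLowerCase(Locale.ROOT).trim().split("\\s+");

        int score = 0;
        for (String resultWord : resultWords) {
            for (String searchWord : searchWords) {
                if (searchWord.isEmpty()) {
                    continue; // an empty query splits to [""], and contains("") is always true
                }
                if (resultWord.equals(searchWord)) {
                    score += EXACT_POINTS;
                } else if (resultWord.contains(searchWord)) {
                    score += PARTIAL_POINTS;
                }
            }
        }
        return score;
    }

    public static void main(String[] args) {
        String query = "winter snow";
        String[] results = {
            "winter snow", "winter snowing", "winterland snow",
            "winter sun", "winterland snowing"
        };
        for (String r : results) {
            System.out.println(r + " => " + score(r, query) + " points");
        }
    }
}
```

Running main should print the same scores as the list in the question (6, 4, 4, 3, 2), which is a quick way to confirm that a refactoring has not changed the ranking.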
Of course, this is a very basic approach. If you are really interested in this area of computer science and want to learn more about implementing search engines, start with these terms:
- Natural Language Processing
- Information retrieval
- Text mining