Search suggestion in strings
Problem description
I have a text file containing:
mariam amr sara john jessy salma mkkkkkaooooorllll
The user enters a word to search for, for example: maram
As you can see, that word does not exist in my text file. I want to offer suggestions, for example that the word closest to maram is mariam.
I used the longest common subsequence, but it returns both mariam and mkkkkkaooooorllll, because both contain the longest common subsequence "mar".
I want to force the choice of mariam only. Any ideas?
Thanks in advance.
/**
 * Java program implementing the Longest Common Subsequence algorithm.
 */
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

/** Class LongestCommonSubsequence **/
public class LongestCommonSubsequence
{
    /** Returns one longest common subsequence of str1 and str2. **/
    public String lcs(String str1, String str2)
    {
        int l1 = str1.length();
        int l2 = str2.length();
        // arr[i][j] = LCS length of the suffixes str1[i..] and str2[j..]
        int[][] arr = new int[l1 + 1][l2 + 1];
        for (int i = l1 - 1; i >= 0; i--)
        {
            for (int j = l2 - 1; j >= 0; j--)
            {
                if (str1.charAt(i) == str2.charAt(j))
                    arr[i][j] = arr[i + 1][j + 1] + 1;
                else
                    arr[i][j] = Math.max(arr[i + 1][j], arr[i][j + 1]);
            }
        }
        // Walk the table from the top-left corner to rebuild the subsequence.
        int i = 0, j = 0;
        StringBuilder sb = new StringBuilder();
        while (i < l1 && j < l2)
        {
            if (str1.charAt(i) == str2.charAt(j))
            {
                sb.append(str1.charAt(i));
                i++;
                j++;
            }
            else if (arr[i + 1][j] >= arr[i][j + 1])
                i++;
            else
                j++;
        }
        // Next step (not implemented here): read the text file and print
        // every word that contains sb.toString().
        return sb.toString();
    }

    /** Main function **/
    public static void main(String[] args) throws IOException
    {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Longest Common Subsequence Algorithm Test\n");
        System.out.println("\nEnter string 1");
        String str1 = br.readLine();
        System.out.println("\nEnter string 2");
        String str2 = br.readLine();
        LongestCommonSubsequence obj = new LongestCommonSubsequence();
        String result = obj.lcs(str1, str2);
        System.out.println("\nLongest Common Subsequence : " + result);
    }
}
Recommended answer
There are a few techniques for fuzzy matching like this. Apache Commons provides some excellent tools for comparing how similar two strings are to one another. Check out the Javadoc for the Levenshtein Distance and Jaro Winkler Distance calculation methods.
With Levenshtein Distance, the lower the score, the more similar the strings are:
StringUtils.getLevenshteinDistance("frog", "fog") == 1
StringUtils.getLevenshteinDistance("fly", "ant") == 3
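To make those scores concrete, here is a minimal sketch of the classic dynamic-programming edit distance in plain Java, with no Apache Commons dependency. The class name LevenshteinSketch and the method name levenshtein are made up for this illustration and are not the library's API; the library's own implementation may differ in detail.

```java
// Illustrative sketch only: class and method names are hypothetical,
// not part of Apache Commons.
class LevenshteinSketch {

    /** Number of single-character inserts, deletes and substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        // d[i][j] = distance between the first i chars of a and the first j chars of b
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all i chars
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all j chars
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1,      // delete from a
                                 d[i][j - 1] + 1),     // insert into a
                        d[i - 1][j - 1] + cost);       // match or substitute
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("frog", "fog"));     // 1
        System.out.println(levenshtein("fly", "ant"));      // 3
        System.out.println(levenshtein("maram", "mariam")); // 1
        // The long word shares the subsequence "mar" but is many edits away.
        System.out.println(levenshtein("maram", "mkkkkkaooooorllll"));
    }
}
```

Unlike the longest common subsequence, this measure penalizes every extra character, which is why mkkkkkaooooorllll scores far worse than mariam for the query maram.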
You could also consider calculating the Double Metaphone for each string - this will allow you to determine how similar the strings 'sound' when spoken, even if they aren't necessarily spelt similarly.
Back to your question: using these tools, you could offer suggestions whenever the user's search term is within a certain threshold of any of the strings in your text file.
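The threshold idea could look roughly like the sketch below. It is only an illustration under some assumptions: the words are already loaded into a list, the distance is a hand-written Levenshtein (a two-row variant, so the example runs without Apache Commons), and the names SuggestionSketch, distance and suggest as well as the threshold values are hypothetical choices, not anything prescribed by the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: names and thresholds are assumptions.
class SuggestionSketch {

    /** Levenshtein distance, keeping only two rows of the DP table. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Words within maxDistance edits of query, closest first. */
    static List<String> suggest(String query, List<String> words, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (distance(query, w) <= maxDistance) result.add(w);
        }
        result.sort(Comparator.comparingInt(w -> distance(query, w)));
        return result;
    }

    public static void main(String[] args) {
        // Word list taken from the question's text file.
        List<String> words = Arrays.asList(
                "mariam amr sara john jessy salma mkkkkkaooooorllll".split("\\s+"));
        System.out.println(suggest("maram", words, 1)); // [mariam]
        System.out.println(suggest("maram", words, 2)); // [mariam, sara]
    }
}
```

With a threshold of 1 only mariam is suggested; loosening it to 2 also lets sara through, so the threshold (or simply taking the first, closest entry) is a tuning decision that depends on the data.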