Search suggestion in strings


Problem description

I have a text file containing the following words: mariam amr sara john jessy salma mkkkkkaooooorllll

The user enters a word to search for, for example: maram

As you can see, this word does not exist in my text file. I want to offer a suggestion instead: the closest word to maram is mariam.

I used the longest common subsequence, but it returns both mariam and mkkkkkaooooorllll, because both contain the longest common subsequence "mar".

I want to force the choice of mariam only. Any ideas?

Thanks in advance.

/**
 * Java program implementing the longest common subsequence (LCS) algorithm.
 */

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class LongestCommonSubsequence
{
    /** Returns one longest common subsequence of str1 and str2. */
    public String lcs(String str1, String str2)
    {
        int l1 = str1.length();
        int l2 = str2.length();

        // arr[i][j] holds the LCS length of str1.substring(i) and str2.substring(j)
        int[][] arr = new int[l1 + 1][l2 + 1];

        // Fill the table from the ends of both strings back to the start
        for (int i = l1 - 1; i >= 0; i--)
        {
            for (int j = l2 - 1; j >= 0; j--)
            {
                if (str1.charAt(i) == str2.charAt(j))
                    arr[i][j] = arr[i + 1][j + 1] + 1;
                else
                    arr[i][j] = Math.max(arr[i + 1][j], arr[i][j + 1]);
            }
        }

        // Walk the table from the top-left corner to rebuild the subsequence
        int i = 0, j = 0;
        StringBuilder sb = new StringBuilder();
        while (i < l1 && j < l2)
        {
            if (str1.charAt(i) == str2.charAt(j))
            {
                sb.append(str1.charAt(i));
                i++;
                j++;
            }
            else if (arr[i + 1][j] >= arr[i][j + 1])
                i++;
            else
                j++;
        }

        // Next step (not implemented here): read the text file and
        // print every word that contains sb.toString()
        return sb.toString();
    }

    /** Reads two strings from standard input and prints their LCS. */
    public static void main(String[] args) throws IOException
    {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Longest Common Subsequence Algorithm Test\n");

        System.out.println("\nEnter string 1");
        String str1 = br.readLine();

        System.out.println("\nEnter string 2");
        String str2 = br.readLine();

        LongestCommonSubsequence obj = new LongestCommonSubsequence();
        String result = obj.lcs(str1, str2);

        System.out.println("\nLongest Common Subsequence : " + result);
    }
}

Recommended answer

There are a few techniques for fuzzy matching like this. Apache Commons provides some excellent tools for comparing how similar two strings are to one another. Check out the Javadoc for the Levenshtein Distance and Jaro Winkler Distance calculation methods.

With Levenshtein Distance, the lower the score, the more similar the strings are:

StringUtils.getLevenshteinDistance("frog", "fog") == 1
StringUtils.getLevenshteinDistance("fly", "ant") == 3
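
To make the scoring concrete without adding a dependency, here is a minimal plain-Java sketch of the same metric. The class name LevenshteinSketch and the method name distance are made up for this illustration and are not part of Apache Commons; the library implementation may differ in detail.

```java
final class LevenshteinSketch {

    /**
     * Counts the minimum number of single-character insertions,
     * deletions, or substitutions needed to turn a into b.
     * (Hypothetical helper, not the Apache Commons method.)
     */
    static int distance(String a, String b) {
        // prev[j] is the distance between the first i-1 chars of a and the first j chars of b
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building b's prefix from an empty string takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing a's prefix to an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full table
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("frog", "fog"));     // prints 1
        System.out.println(distance("maram", "mariam")); // prints 1
        System.out.println(distance("maram", "mkkkkkaooooorllll"));
    }
}
```

Unlike the longest common subsequence, this score penalises every extra character, so for the words in the question mariam ends up far closer to maram than mkkkkkaooooorllll does.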

You could also consider calculating the Double Metaphone for each string. This lets you determine how similar the strings sound when spoken, even if they are not spelt similarly.

Back to your question: using these tools, you could offer suggestions whenever the user's search term is within a certain threshold of any of the strings in your text file.
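
The threshold idea can be sketched roughly as follows, using the word list from the question. The class SuggestionSketch, the method suggest, and the parameter maxDistance are hypothetical names chosen for this example, and the distance helper is a hand-written Levenshtein implementation standing in for the Apache Commons call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SuggestionSketch {

    /** Hand-written Levenshtein distance (stand-in for the library method). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns every word whose distance to query is at most maxDistance,
     * closest first. maxDistance is an assumed tuning knob, not a fixed rule.
     */
    static List<String> suggest(String query, List<String> words, int maxDistance) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            if (distance(query, word) <= maxDistance) {
                matches.add(word);
            }
        }
        // Stable sort: words with equal distance keep their file order
        matches.sort(Comparator.comparingInt(word -> distance(query, word)));
        return matches;
    }

    public static void main(String[] args) {
        // In a real program these words would be read from the text file
        List<String> words = Arrays.asList(
                "mariam", "amr", "sara", "john", "jessy", "salma", "mkkkkkaooooorllll");
        System.out.println(suggest("maram", words, 1)); // prints [mariam]
        System.out.println(suggest("maram", words, 2)); // prints [mariam, sara]
    }
}
```

The choice of threshold matters: with a limit of 1 only mariam is suggested, while a limit of 2 also lets sara through (substitute the first letter, drop the last). Sorting closest first means the top suggestion is still mariam in both cases.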
