Implementing a best match search in Java

Problem description

I am trying to get best-match string lookup working using existing Java data structures. My current approach is quite slow, so any suggestions to improve its performance are welcome.

The sample data looks like this:

Key           | V
--------------+---------
0060175559138 | VIP
006017555     | National
006017        | Local
0060          | X

So a best-match search on the key 0060175552020 should return 006017555.

One idea is to use several TreeMaps, with hashing to distribute the data across them, so that each search covers a smaller area.

private final TreeMap<String, V> index;

public Set<V> syncBestMatch(String key) {              
    Entry<String,V> entry = index.headMap(key, true)
                .descendingMap().entrySet().stream()
                .filter(e -> isPartiallyOrFullyMatching(key, e.getKey()))
                .findFirst()
                .orElseThrow(() -> new NoMatchException("No match found"));

    Set<V> results = new HashSet<>();
    results.add(entry.getValue());
    return results;
}

Solution

Use a TreeMap and the floorEntry(K key) method:

Returns a key-value mapping associated with the greatest key less than or equal to the given key, or null if there is no such key.

The following is simplified. Real code would need to keep searching when floorEntry returns an entry that is not a prefix of the search key, e.g. if the map also had the key 0060175551000. In that case you need to find the common prefix of the search key and the found key, then do the lookup again with that prefix. Rinse and repeat.

TreeMap<String, String> map = new TreeMap<>();
map.put("0060175559138", "VIP");
map.put("006017555"    , "National");
map.put("006017"       , "Local");
map.put("0060"         , "X");

String key = "0060175552020";
Entry<String, String> entry = map.floorEntry(key);
if (entry == null)
    System.out.println("Not found: " + key);
else {
    System.out.println(key);
    System.out.println(entry);
}

Output

0060175552020
006017555=National
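
To illustrate the caveat mentioned above, here is a small sketch of what a single floorEntry call returns once the map also contains 0060175551000. The class name FloorEntryPitfall and the helper naiveFloorKey are made up for this illustration; only TreeMap.floorEntry comes from the JDK.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical demo class: shows why one floorEntry call is not always enough.
class FloorEntryPitfall {

    // Returns the key found by a single floorEntry call, or null if there is none.
    static String naiveFloorKey(NavigableMap<String, String> map, String key) {
        Map.Entry<String, String> entry = map.floorEntry(key);
        return entry == null ? null : entry.getKey();
    }

    public static void main(String[] args) {
        TreeMap<String, String> map = new TreeMap<>();
        map.put("0060175559138", "VIP");
        map.put("0060175551000", "Other");   // sorts between "006017555" and the search key
        map.put("006017555", "National");
        map.put("006017", "Local");
        map.put("0060", "X");

        String key = "0060175552020";
        String found = naiveFloorKey(map, key);

        System.out.println(found);                  // prints 0060175551000
        System.out.println(key.startsWith(found));  // prints false: not a prefix of the search key
    }
}
```

The greatest key less than or equal to the search key is 0060175551000, which is not a prefix of it, so the lookup has to be repeated with the common prefix 006017555.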


UPDATE: Here is the full code, with a loop for the extended search.

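import java.util.Map.Entry;
import java.util.NavigableMap;

// Returns the entry whose key is the longest stored prefix of 'key', or null if none matches.
private static Entry<String, String> lookup(NavigableMap<String, String> map, String key) {
    String keyToFind = key;
    for (;;) {
        // Greatest stored key that is <= keyToFind
        Entry<String, String> entry = map.floorEntry(keyToFind);
        if (entry == null)
            return null;
        String foundKey = entry.getKey();
        // Length of the common prefix of keyToFind and foundKey
        int prefixLen = 0;
        while (prefixLen < keyToFind.length() && prefixLen < foundKey.length() &&
               keyToFind.charAt(prefixLen) == foundKey.charAt(prefixLen))
            prefixLen++;
        if (prefixLen == 0)
            return null;   // nothing in common, so no stored prefix can match
        if (prefixLen == foundKey.length())
            return entry;  // foundKey is a prefix of the search key
        // Retry with the shorter common prefix
        keyToFind = key.substring(0, prefixLen);
    }
}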

Test

TreeMap<String, String> map = new TreeMap<>();
map.put("0060175559138", "VIP");
map.put("0060175551000", "Other");
map.put("006017555"    , "National");
map.put("006017"       , "Local");
map.put("0060"         , "X");

System.out.println(lookup(map, "0060175559138"));
System.out.println(lookup(map, "0060175552020"));
System.out.println(lookup(map, "0055708570068"));
System.out.println(lookup(map, "8684064893870"));

Output

0060175559138=VIP
006017555=National
null
null
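
As a rough sketch of how the same loop could sit behind a generic index like the one in the question, the class below wraps a TreeMap<String, V> and returns an Optional instead of null. The names BestMatchIndex, put and bestMatch are assumptions made for this example and do not come from the question or the answer; the matching logic is the floorEntry loop shown above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical generic wrapper around the floorEntry-based lookup.
class BestMatchIndex<V> {
    private final TreeMap<String, V> index = new TreeMap<>();

    void put(String prefix, V value) {
        index.put(prefix, value);
    }

    // Returns the entry whose key is the longest stored prefix of 'key', if any.
    Optional<Map.Entry<String, V>> bestMatch(String key) {
        String keyToFind = key;
        while (!keyToFind.isEmpty()) {
            Map.Entry<String, V> entry = index.floorEntry(keyToFind);
            if (entry == null) {
                return Optional.empty();
            }
            String foundKey = entry.getKey();
            int prefixLen = commonPrefixLength(keyToFind, foundKey);
            if (prefixLen == 0) {
                return Optional.empty();      // nothing in common
            }
            if (prefixLen == foundKey.length()) {
                return Optional.of(entry);    // foundKey is a prefix of the search key
            }
            keyToFind = key.substring(0, prefixLen);  // retry with the common prefix
        }
        return Optional.empty();
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        BestMatchIndex<String> idx = new BestMatchIndex<>();
        idx.put("0060175559138", "VIP");
        idx.put("0060175551000", "Other");
        idx.put("006017555", "National");
        idx.put("006017", "Local");
        idx.put("0060", "X");

        for (String key : new String[] {"0060175559138", "0060175552020", "0055708570068"}) {
            // Prints VIP, National, no match
            System.out.println(idx.bestMatch(key).map(Map.Entry::getValue).orElse("no match"));
        }
    }
}
```

Each pass through the loop costs one O(log n) floorEntry call and shortens keyToFind, so a lookup needs at most as many passes as the search key has characters, instead of streaming over the whole head map.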
