Radix(Trie)树在Java中用于Cutomer搜索的实现 [英] Radix(Trie) Tree implementation for Cutomer search in Java

查看:246
本文介绍了Radix(Trie)树在Java中用于Cutomer搜索的实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从事一个项目,需要搜索数百万客户的数据.我想实现基数(trie)搜索算法.我已经阅读并实现了用于简单字符串收集的基数.但是在这里,我有一个客户集合,想按名称或手机号码进行搜索.

I am working on a project and need to search in data of millions of customers. I want to implement radix(trie) search algorithm. I have read and implement radix for a simple string collections. But Here I have a collection of customers and want to search it by name or by mobile number.

客户类别:

public class Customer {

    String name;
    String mobileNumer;


    public Customer (String name, String phoneNumer) {
        this.name = name;
        this.mobileNumer = phoneNumer;
    }

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getPhoneNumer() {
        return mobileNumer;
    }
    public void setPhoneNumer(String phoneNumer) {
        this.mobileNumer = phoneNumer;
    }



}

RadixNode类:

import java.util.HashMap;
import java.util.Map;

class RadixNode {
    private final Map<Character, RadixNode> child = new HashMap<>();
    private final Map<Customer, RadixNode> mobileNum = new HashMap<>();
    private boolean endOfWord;

    Map<Character, RadixNode> getChild() {
        return child;
    }

    Map<Customer, RadixNode> getChildPhoneDir() {
        return mobileNum;
    }

    boolean isEndOfWord() {
        return endOfWord;
    }

    void setEndOfWord(boolean endOfWord) {
        this.endOfWord = endOfWord;
    }
}

基数类别:

class Radix {
    private RadixNode root;

    Radix() {
        root = new RadixNode();
    }

    void insert(String word) {
        RadixNode current = root;

        for (int i = 0; i < word.length(); i++) {
            current = current.getChild().computeIfAbsent(word.charAt(i), c -> new RadixNode());
        }
        current.setEndOfWord(true);
    }

    void insert(Customer word) {
        RadixNode current = root;
        System.out.println("==========================================");
        System.out.println(word.mobileNumer.length());
        for (int i = 0; i < word.mobileNumer.length(); i++) {
            current = current.getChildPhoneDir().computeIfAbsent(word.mobileNumer.charAt(i), c -> new RadixNode());
            System.out.println(current);
        }
        current.setEndOfWord(true);
    }

    boolean delete(String word) {
        return delete(root, word, 0);
    }

    boolean containsNode(String word) {
        RadixNode current = root;

        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            RadixNode node = current.getChild().get(ch);
            if (node == null) {
                return false;
            }
            current = node;
        }
        return current.isEndOfWord();
    }

    boolean isEmpty() {
        return root == null;
    }

    private boolean delete(RadixNode current, String word, int index) {
        if (index == word.length()) {
            if (!current.isEndOfWord()) {
                return false;
            }
            current.setEndOfWord(false);
            return current.getChild().isEmpty();
        }
        char ch = word.charAt(index);
        RadixNode node = current.getChild().get(ch);
        if (node == null) {
            return false;
        }
        boolean shouldDeleteCurrentNode = delete(node, word, index + 1) && !node.isEndOfWord();

        if (shouldDeleteCurrentNode) {
            current.getChild().remove(ch);
            return current.getChild().isEmpty();
        }
        return false;
    }

    public void displayContactsUtil(RadixNode curNode, String prefix) 
    { 

        // Check if the string 'prefix' ends at this Node 
        // If yes then display the string found so far 
        if (curNode.isEndOfWord()) 
            System.out.println(prefix); 

        // Find all the adjacent Nodes to the current 
        // Node and then call the function recursively 
        // This is similar to performing DFS on a graph 
        for (char i = 'a'; i <= 'z'; i++) 
        { 
            RadixNode nextNode = curNode.getChild().get(i); 
            if (nextNode != null) 
            { 
                    displayContactsUtil(nextNode, prefix + i); 
            } 
        } 
    }


    public boolean displayContacts(String str) 
    { 
        RadixNode prevNode = root; 

        // 'flag' denotes whether the string entered 
        // so far is present in the Contact List 

        String prefix = ""; 
        int len = str.length(); 

        // Display the contact List for string formed 
        // after entering every character 
        int i; 
        for (i = 0; i < len; i++) 
        { 
            // 'str' stores the string entered so far 
            prefix += str.charAt(i); 

            // Get the last character entered 
            char lastChar = prefix.charAt(i); 

            // Find the Node corresponding to the last 
            // character of 'str' which is pointed by 
            // prevNode of the Trie 
            RadixNode curNode = prevNode.getChild().get(lastChar); 

            // If nothing found, then break the loop as 
            // no more prefixes are going to be present. 
            if (curNode == null) 
            { 
                System.out.println("No Results Found for \"" + prefix + "\""); 
                i++; 
                break; 
            } 

            // If present in trie then display all 
            // the contacts with given prefix. 
            System.out.println("Suggestions based on \"" + prefix + "\" are"); 
            displayContactsUtil(curNode, prefix); 

            // Change prevNode for next prefix 
            prevNode = curNode; 
        } 

        for ( ; i < len; i++) 
        { 
            prefix += str.charAt(i); 
            System.out.println("No Results Found for \""  + prefix + "\""); 
        }

        return true;
    }


    public void displayContactsUtil(RadixNode curNode, String prefix, boolean isPhoneNumber) 
    { 

        // Check if the string 'prefix' ends at this Node 
        // If yes then display the string found so far 
        if (curNode.isEndOfWord()) 
            System.out.println(prefix); 

        // Find all the adjacent Nodes to the current 
        // Node and then call the function recursively 
        // This is similar to performing DFS on a graph 
        for (char i = '0'; i <= '9'; i++) 
        { 
            RadixNode nextNode = curNode.getChildPhoneDir().get(i); 
            if (nextNode != null) 
            { 
                    displayContactsUtil(nextNode, prefix + i); 
            } 
        } 
    }

    public boolean displayContacts(String str, boolean isPhoneNumber) 
    { 
        RadixNode prevNode = root; 

        // 'flag' denotes whether the string entered 
        // so far is present in the Contact List 

        String prefix = ""; 
        int len = str.length(); 

        // Display the contact List for string formed 
        // after entering every character 
        int i; 
        for (i = 0; i < len; i++) 
        { 
            // 'str' stores the string entered so far 
            prefix += str.charAt(i); 

            // Get the last character entered 
            char lastChar = prefix.charAt(i); 

            // Find the Node corresponding to the last 
            // character of 'str' which is pointed by 
            // prevNode of the Trie 
            RadixNode curNode = prevNode.getChildPhoneDir().get(lastChar); 

            // If nothing found, then break the loop as 
            // no more prefixes are going to be present. 
            if (curNode == null) 
            { 
                System.out.println("No Results Found for \"" + prefix + "\""); 
                i++; 
                break; 
            } 

            // If present in trie then display all 
            // the contacts with given prefix. 
            System.out.println("Suggestions based on \"" + prefix + "\" are"); 
            displayContactsUtil(curNode, prefix, isPhoneNumber); 

            // Change prevNode for next prefix 
            prevNode = curNode; 
        } 

        for ( ; i < len; i++) 
        { 
            prefix += str.charAt(i); 
            System.out.println("No Results Found for \""  + prefix + "\""); 
        }

        return true;
    } 


}

我尝试在集合中进行搜索,但被卡住了.任何帮助/建议将不胜感激.

I have tried to search in a collection but got stuck. Any help / suggestion would be appreciated.

推荐答案

我为您提出了两种方法.

I propose you 2 ways of doing it.

第一种方法:使用单个树.
可以将所有需要的信息存储在一个trie中.您的客户类别很好,这是可能的RadixNode实现.
我认为不能有两个具有相同名称或电话号码的客户.如果不是这种情况(例如,可能有姓名相同且电话nb不同的人),请在我将要编辑的评论中告诉我.
需要了解的重要一点是,如果您想以两种不同的方式来寻找客户,并且您使用一个特里树,则每个客户在您的特里树中将出现两次.在对应于其名称的路径末尾出现一次,并在对应其电话号码的路径末尾出现一次.

First way: with a single trie.
It is possible to store all you need in a single trie. Your customer class is fine, and here is a possible RadixNode implementation.
I consider that there cannot be two customers with the same name, or with the same phone number. If it is not the case (possibility to have people with same name and different phone nb for instance) tell me in a comment I'll edit.
The thing that is important to understand, is that if you want to have two different ways of finding a customer, and you use a single trie, each customer will appear twice in your trie. Once at the end of the path corresponding to its name, and once after the end of the path corresponding to its phone number.

import java.util.HashMap;
import java.util.Map;

class RadixNode {
    private Map<Character, RadixNode> children;
    private Customer customer;

    public RadixNode(){
        this.children = new Map<Character, RadixNode>();
        this.Customer = NULL;
    }
    Map<Character, RadixNode> getChildren() {
        return children;
    }
    boolean hasCustomer() {
        return this.customer != NULL;
    }
    Customer getCustomer() {
        return customer;
    }
    void setCustomer(Customer customer) {
        this.customer = customer;
    }
}

如您所见,只有一张地图存储该节点的子节点.那是因为我们可以看到一个电话号码为一串数字,所以这个Trie将存储所有的客户……两次.每个名称一次,每个电话号码一次.
现在让我们看一下一个插入函数.您的特里需要一个根,我们称之为root.

As you can see, there is only one map storing the node's children. That is because we can see a phone number as a string of digits, so this trie will store all the customers ... twice. Once per name, once per phone number.
Now let's see an insert function. Your trie will need a root,n let's call it root.

public void insert(RadixNode root, Customer customer){
    insert_with_name(root, customer, 0);
    insert_with_phone_nb(root, customer, 0);
}

public void insert_with_name(RadixNode node, Customer customer, int idx){
    if (idx == customer.getName().length()){
        node.setCustomer(customer);
    } else {
        Character current_char = customer.getName().chatAt(idx);
        if (! node.getChlidren().containsKey(current_char){
            RadixNode new_child = new RadixNode();
            node.getChildren().put(current_char, new_child);
        }
        insert_with_name(node.getChildren().get(current_char), customer, idx+1);
    }
}

insert_with_phone_nb()方法与此类似.只要人们具有唯一的名称,唯一的电话号码,并且某人的名字不能是某人的电话号码,此功能就可以使用.
如您所见,该方法是递归的.我建议您递归地构建trie结构(以及通常基于树结构的所有内容),因为它可以简化和通用的更简洁的代码.
搜索功能几乎是插入功能的复制粘贴:

The insert_with_phone_nb() method is similar. This will work as long as people has unique names, unique phone numbers, and that someone's name cannot be someone's phone number.
As you can see, the method is recursive. I advice you to build your trie structure (and generally, everything based on tree structures) recursively, as it makes for simpler, and generallay cleaner code.
The search function is almost a copy-paste of the insert function:

public void search_by_name(RadixNode node, String name, int idx){
    // returns NULL if there is no user going by that name
    if (idx == name.length()){
        return node.getCustomer();
    } else {
        Character current_char =  name.chatAt(idx);
        if (! node.getChlidren().containsKey(current_char){
            return NULL;
        } else {
            return search_by_name(node.getChildren().get(current_char), name, idx+1);
        }
    }
}

第二种方式:尝试2次
原理是相同的,您所要做的就是重用上面的代码,但是保留两个不同的root节点,每个节点都将构建一个特里(一个用于名称,一个用于电话号码). 唯一的区别是insert函数(因为它将调用具有两个不同根的insert_with_nameinsert_with_phone_nb)以及搜索功能,该搜索功能也必须在正确的Trie中进行搜索.

Second way: with 2 tries
The principle is the same, all you have to do is reuse the code above, but keep two distinct root nodes, each of them will build a trie (one for names, one for phone numbers). The only difference will be the insert function (as it will call insert_with_name and insert_with_phone_nb with 2 different roots), and the search function which will have to search in the right trie as well.

public void insert(RadixNode root_name_trie, RadixNode root_phone_trie, Customer customer){
    insert_with_name(root_name_trie, customer, 0);
    insert_with_phone_nb(root_phone_trie, customer, 0);
}

编辑:在说明可能有相同名称的客户之后,这是一个替代实现,允许RadixNode包含对多个Customer的引用.
用例如Vector<Customer>替换RadixNode中的Customer customer属性.当然,必须对方法进行相应的修改,然后按名称进行搜索将向您返回客户向量(可能为空),因为这种搜索会导致多个结果.
在您的情况下,我会尝试一个单一的尝试,其中包含客户向量.因此,您既可以按姓名和电话进行搜索(将号码显示为String),也可以维护一个数据结构.

Edit: After comment precising there might be customers with the same name, here is an alternative implementation, to allow a RadixNode to contain references toward several Customer.
Replace the Customer customer attribute in RadixNode by, for example, a Vector<Customer>. The methods will have to be modified accordingly of course, and a search by name will then return to you a vector of customers (possibly empty), since this search can then lead to several results.
In your case, I'd go for a single trie, containing vectors of customers. So you can have both a search by name and phone (cast the number as a String), and a single data structure to maintain.

这篇关于Radix(Trie)树在Java中用于Cutomer搜索的实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆