如何查找给定集中的多集中的所有子集? [英] How to find all subsets of a multiset that are in a given set?

查看:85
本文介绍了如何查找给定集中的多集中的所有子集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说我有一组多集D:

D = {
  {d, g, o},
  {a, e, t, t},
  {a, m, t},
}

给出多重集M,例如

M = {a, a, m, t}

我希望算法f为我提供D的所有元素,这些元素是M的子集(或更准确地说,是子多重集"):

I would like an algorithm f to give me all elements of D that are subsets (or more precisely, "submultisets") of M:

f = {{a, m, t}}

如果仅执行一个这样的查询,则扫描D的所有元素(在O(#D)时间内)显然是最佳的.但是,如果我们想针对相同的D和不同的M回答许多这样的查询,则可以通过将D预处理为一些更智能的数据结构来使其更快.

If we do only one such query, scanning over all elements of D (in O(#D) time) is clearly optimal. But if we want to answer many such queries for the same D and different M, we might be able to make it faster by preprocessing D into some smarter data structure.

我们可以将所有D扔到哈希表中,并遍历M的所有可能子集,在哈希表中查找每个子集,但这就是O(2^#M).对于小型M很好,而对于中大型M则不太好.

We could toss all of D into a hashtable and iterate over all possible subsets of M, looking each up in the hashtable, but that's O(2^#M). Fine for small M, not so fine for medium to large M.

是否可以在#M的多项式时间内执行此操作?还是将已知的NP完全问题简化为这个问题,以证明不可能快速?

Is it possible to do this in polynomial time in #M? Or maybe to reduce a known NP-complete problem to this one, to prove that it's impossible to be fast?

我只是意识到,在最坏的情况下,我们需要输出所有D,因此#D仍会出现在时间复杂度中.因此,我们假设输出的大小受某个常数限制.

I just realized that in the worst case, we need to output all of D, so #D is still going to appear in the time complexity. So let's assume that the size of the output is bounded by some constant.

推荐答案

此处是TernarySearchTree(TST)的快速实现,可以帮助您解决问题.几年前,我受到DrDobbs中一篇文章的启发.您可以在 http://www.drdobbs.com/database/上了解有关此内容的更多信息. ternary-search-trees/184410528 .它提供了有关TST和性能分析的一些背景.

Here is a quick implementation of TernarySearchTree (TST) which can help in your problem. A number of years ago, I was inspired by an article in DrDobbs. You can read more about it at http://www.drdobbs.com/database/ternary-search-trees/184410528. It provides some background about TST and performance analysis.

在您的问题描述示例中,D将是您的字典,其中包含"dgo","aett"和"amt"键.这些值与键相同.

In your problem description example, D would be your dictionary containing "dgo","aett" and "amt" keys. The values are identical to the keys.

M是您的搜索字符串,它的基本含义是将包含子集中或所有这些字母的键提供给字典中的所有值".字符的顺序并不重要.性格 '.'在搜索中用作通配符.

M is your search string which basically says "Give me all the values in the dictionary with keys containing a subset or all of these alphabets". The order of the characters are not important. The character '.' is used as wildcard in the search.

对于任何给定的M,此算法和数据结构都不需要您查看D的所有元素.因此,在这方面它将是快速的.我还对访问的节点数进行了一些测试,并且在大多数情况下,即使对于未找到的键,访问的节点数也只是字典中总节点数的一小部分.

For any given M, this algorithm and data structure does not require you to look at all elements of D. So in that respect it will be fast. I have also done some tests on the number of nodes visited and most of the time, the number of nodes visited is just a small fraction of the total nodes in the dictionary even for keys that are not found.

此算法还可以选择允许您输入最小和最大长度,以限制字典返回的键.

This algorithm also optionally allows you to enter the minimum and maximum length which limits the keys returned by the dictionary.

很抱歉,冗长的代码,但是您可以进行测试了.

Sorry for the lengthy code but it is complete for you to be able to test.

import java.util.ArrayList;
import java.io.*;

public class TSTTree<T>
{
    private TSTNode<T> root;
    private int size = 0;
    private int totalNodes = 0;

    public int getSize() { return size; }

    public int getTotalNodes() { return totalNodes; }

    public TSTTree()
    {
    }

    public TSTNode<T> getRoot() { return root; }

    public void insert(String key, T value)
    {
        if(key==null || key.length()==0) return;

        char[] keyArray = key.toCharArray();

        if(root==null) root = new TSTNode<T>(keyArray[0]);
        TSTNode<T> currentNode = root;
        TSTNode<T> parentNode = null;

        int d = 0;
        int i = 0;

        while(currentNode!=null)
        {
            parentNode = currentNode;
            d = keyArray[i] - currentNode.getSplitChar();
            if(d==0)
            {
                if((++i) == keyArray.length) // Found the key
                {
                    if(currentNode.getValue()!=null)
                        System.out.println(currentNode.getValue() + " replaced with " + value);
                    else
                        size++;
                    currentNode.setValue(value);        // Replace value at Node
                    return;
                }
                else
                    currentNode = currentNode.getEqChild();
            }
            else if(d<0)
                currentNode = currentNode.getLoChild();
            else
                currentNode = currentNode.getHiChild();
        }

        currentNode = new TSTNode<T>(keyArray[i++]);
        totalNodes++;
        if(d==0)
            parentNode.setEqChild(currentNode);
        else if(d<0)
            parentNode.setLoChild(currentNode);
        else
            parentNode.setHiChild(currentNode);

        for(;i<keyArray.length;i++)
        {
            TSTNode<T> tempNode = new TSTNode<T>(keyArray[i]);
            totalNodes++;
            currentNode.setEqChild(tempNode);
            currentNode = tempNode;
        }

        currentNode.setValue(value);        // Insert value at Node
        size++;
    }

    public ArrayList<T> find(String charsToSearch) {
        return find(charsToSearch,1,charsToSearch.length());
    }

    // Return all values with keys between minLen and maxLen containing "charsToSearch".
    public ArrayList<T> find(String charsToSearch, int minLen, int maxLen) {
        ArrayList<T> list = new ArrayList<T>();
        char[] charArray = charsToSearch.toCharArray();
        int[] charFreq = new int[256];
        for(int i=0;i<charArray.length;i++) charFreq[charArray[i]]++;
        maxLen = charArray.length>maxLen ? maxLen : charArray.length;
        pmsearch(root,charFreq,minLen,maxLen,1, list);
        return list;
    }

    public void pmsearch(TSTNode<T> node, int[] charFreq, int minLen, int maxLen, int depth, ArrayList<T> list) {
        if(node==null) return;

        char c = node.getSplitChar();
        if(isSmaller(charFreq,c))
            pmsearch(node.getLoChild(),charFreq,minLen,maxLen,depth,list);
        if(charFreq[c]>0) {
            if(depth>=minLen && node.getValue()!=null) list.add(node.getValue());
            if(depth<maxLen) {
                charFreq[c]--;
                pmsearch(node.getEqChild(),charFreq,minLen,maxLen,depth+1,list);
                charFreq[c]++;
            }
        }
        else if(charFreq['.']>0) { // Wildcard
            if(depth>=minLen && node.getValue()!=null) list.add(node.getValue());
            if(depth<maxLen) {
                charFreq['.']--;
                pmsearch(node.getEqChild(),charFreq,minLen,maxLen,depth+1,list);
                charFreq['.']++;
            }
        }            
        if(isGreater(charFreq,c))
            pmsearch(node.getHiChild(),charFreq,minLen,maxLen,depth,list);
    }

    private boolean isGreater(int[] charFreq, char c) {
        if(charFreq['.']>0) return true;

        boolean retval = false;
        for(int i=c+1;i<charFreq.length;i++) {
            if(charFreq[i]>0) {
                retval = true;
                break;
            }
        }
        return retval;
    }

    private boolean isSmaller(int[] charFreq, char c) {
        if(charFreq['.']>0) return true;

        boolean retval = false;
        for(int i=c-1;i>-1;i--) {
            if(charFreq[i]>0) {
                retval = true;
                break;
            }
        }
        return retval;
    }
}

下面是一个小型测试程序.测试程序仅按确切顺序在示例中插入4个键/值对.如果您有一个包含很多元素的D,那么最好首先对其进行排序并以锦标赛的方式构建字典(即插入中间元素,然后递归地填充左半部分和右半部分).这样可以确保树是平衡的.

Below is a small test program. The test program just inserts the 4 key/value pairs in your example in the exact order. If you have a D with a lot of elements, then it would be best to sort it first and build the dictionary in a tournament fashion (ie. insert middle element, then recursively populate left half and right half). This will ensure the tree is balanced.

import org.junit.*;
import org.junit.runner.*;
import java.io.*;
import java.util.*;
import org.junit.runner.notification.Failure;

public class MyTest
{
    static TSTTree<String> dictionary = new TSTTree<String>();

    @BeforeClass
    public static void initialize() {
        dictionary.insert("dgo","dgo");
        dictionary.insert("aett","aett");
        dictionary.insert("amt","amt");
    }

    @Test
    public void testMethod() {
        System.out.println("testMethod");
        ArrayList<String> result = dictionary.find("aamt");
        System.out.println("Results: ");
        for(Iterator it=result.iterator();it.hasNext();) {
            System.out.println(it.next());
        }
    }

    @Test
    // Test with a wildcard which finds "dgo" value
    public void testMethod1() {
        System.out.println("testMethod1");
        ArrayList<String> result = dictionary.find("aamtdo.");
        System.out.println("Results: ");
        for(Iterator it=result.iterator();it.hasNext();) {
            System.out.println(it.next());
        }
    }

    public static void main(String[] args) {
        Result result = JUnitCore.runClasses(MyTest.class);
        for (Failure failure : result.getFailures()) {
        System.out.println(failure.toString());
        }
    }
}

这篇关于如何查找给定集中的多集中的所有子集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆