词至少2常用字母 [英] Words with at least 2 common letters

查看:155
本文介绍了词至少2常用字母的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

一个字符串,命名为 2一致如果每个字至少有2个字母的共同点与下一个单词。

A string is named 2-consistent if each word has at least 2 letters in common with the next word.

例如

原子另一个时代[原子 A T 的共同点与另一个和   另一个电子 A 的共同点与时代(答案不唯一)。

"Atom another era" [atom has a and t in common with another and another has e and a in common with era (the answer is not unique).

首先,我需要一个数据结构,这需要2个字和答案在固定时间内的问题你这句话至少有2个字母的共同点?

First of all I need a data structure which takes 2 words and answers in constant time at the question "Do these words have at least 2 letters in common?"

现在,给出一个字符串 N 的话,我需要找到最长2洽子。

Now, given a string of n words I need to find the longest 2-consistent substring.

我想不出用什么数据结构。我认为基数树 preFIX树,但我无法找到答案。你能帮助我吗?

I can't figure out what data structure to use. I thought to radix tree or prefix tree, but I could not find the answer. Can you help me?

推荐答案

假设重音的字母,忽略大小写,每个字,你可以在32位整数,其中0-25位被设置为1,如果储存一个位域从AZ相应的字母是present。

Assuming unaccented letters and ignoring capitalization, for each word you can store a bit-field in a 32-bit integer where bits 0-25 are set to 1 if the corresponding letter from a-z is present.

整数,可以计算线性时间是这样的:

The integer can be computed in linear time like this:

int getBitField(char* word)
{
    int bits = 0;
    while(*word)
        bits |= 1 << ((*word++) - 'a');
    return bits;
}

如果的话被认为是词语的英文或其他语言,最大字长然后不断的和线性时间之间的区别是没有意义的,因为(为了讨论的)超过最大长度减去所有的话就可以进行填充与非匹配字符,这将导致一个恒定的时间的算法。

If the words are assumed to be words in English or some other language, with a maximum word length then the difference between constant and linear time is fairly meaningless because (for the sake of argument) all words less than the maximum length can be padded out with non-matching characters, which will result in a constant time algorithm.

一旦你有两个词位字段,你可以测试,如果他们都在不断的时间与运算在一起,并检查2一致,如果结果不为零(这表明,在普通没有字母)和不电源2(其将表明只有一个共同的信作为仅单个位被置位)。可以通过与运算测试2的幂的一些与本身减去1

Once you have the bit fields for two words you can test if they are 2-consistent in constant time by ANDing them together and checking if the result is not zero (which would indicate no letters in common) and not a power of 2 (which would indicate only one letter in common as only a single bit is set). You can test for a power of 2 by ANDing a number with itself minus 1.

bool is2Consistent(int word1bits, int word2bits)
{
    int common = word1bits & word2bits;
    return (common & (common - 1)) != 0;
}

如果你打算定义像'满足'和'牛肉'已多次字母2一致的话这将无法正常工作。

This won't work if you intend to define words like 'meet' and 'beef' which have repeated letters as 2-consistent.

如果你想测试3一致性,你只需要一个额外的行添加到函数:

If you wanted to test for 3-consistency, you just need to add an extra line to the function:

bool is3Consistent(int word1bits, int word2bits)
{
    int common = word1bits & word2bits;
    common &= (common - 1);
    return (common & (common - 1)) != 0;
}

AND运算与自己减一只是删除了至少显著位,所以你可以使用它任意次数,以测试4一致性,5一致性等。一个整数

ANDing an integer with itself minus one just removes the least significant bit, so you could apply it an arbitrary number of times to test for 4-consistency, 5-consistency etc.

这篇关于词至少2常用字母的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆