添加字线索结构字典 [英] Adding word to Trie Structure dictionary

查看:131
本文介绍了添加字线索结构字典的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想创建一个可插入字线索结构,但结构必须是完全一样的:

I'm trying to create a trie structure which can be inserted words, but the struct has to be exactly like this:

typedef struct tNode_t {
    struct tNode_t **l;
    char *w;
  } tNode;

**→是一个指向27指针数组tNodes,那我不理解的部分。

**l is a pointer to an array of 27 pointers to tNodes, thats the part I don't understand.

如果数组指针tNodes的,我怎么插入的话成吗?而且,由于数组的大小为27(26小写字母A-Z和终止字符)你怎么知道在哪里输入单词取决于开始的盘符是什么?

If the array is of pointers to tNodes, how do I insert words into it? And since the size of the array is 27 (26 lower case letters a-z and the terminating character) how do you know where to input the word depending on what the beginning letter is?

推荐答案

让我们假设这个词是'出租车'。

Let's suppose the word is 'cab'.

粗略地讲,会员是W 指向字符。在列表中的节点26 是字母A到Z。

Roughly speaking, the member w points to a character. 26 of the nodes in the list l are for letters 'a' through 'z'.

由于词'驾驶室',第一个字母是'C',所以你看在根节点指针根 - →1 [3] ,它包含指向所有的话开始的'c';称之为 TNODE *信=根 - →1 [3]; 。然后你会寻找信纸→1 [1] 启动以CA字样; 信=信纸→1 [1]; 。然后你会寻找信纸→1 [2] 为开头的词语驾驶室。在这一点上,你知道你已经达到了你搜索的单词的末尾,而信纸>!W = 0 告诉你,这个词是啥为您提供了字的文本。也可能有其他的话进一步向下树(关于'驾驶室','线','小集团'等)。

Given the word 'cab', the first letter is 'c', so you'll look in the root node pointer for root->l[3], which contains pointers to all the words starting with 'c'; call it tNode *letter = root->l[3];. Then you'll look for letter->l[1] for the words starting with 'ca'; letter = letter->l[1];. Then you'll look for letter->l[2] for the words starting with 'cab'. At this point, you know that you've reached the end of the word your searching for, and letter->w != 0 tells you that the word is valid and gives you the text of the word. There may also be other words further down the tree (for 'cabs', 'cable', 'cabal', etc).

您已经教有所了解,或给予一定的规范。可能有其他的方式来补充细节。

You will have been taught something about this, or given some specification. There are probably other ways to fill in the details.

我不清楚,使用动态分配的数组比使用 TNODE * L [27]较好; 在结构(有适当调整的声明),但是这是一个单独的讨论。我愉快地假定在列表中的第27节点没有更深层次的意义。你可以查找网络上这样的事情,当然是:维基百科趋于一个合理的源涉及计算机科学无争议的问题,但我还没有研究过链接的页面呢。

I'm not clear that using a dynamically allocated array is better than using tNode *l[27]; in the structure (with appropriate adjustments to the declaration), but that's a separate discussion. I've blithely assumed the 27th node in the list has no deeper significance. You could look up such things on the web, of course: Wikipedia tends to be a reasonable source for non-contentious issues related to computer science, but I've not studied the linked page yet.

不是数组;这是一个指向指针数组,这是混淆了我。

l isn't an array; it's a pointer to an array of pointers, which is confusing to me.

指针变量过着双重生活 - 化身博士和海德先生 - 以及可视为指针或(随便)阵列

Pointer variables live a dual life — Dr Jekyll and Mr Hyde — and can be regarded as pointers or (casually) arrays.

如果你写的char * str中 STR 不是一个字符数组;它是一个指向(的第一个字符)的字符阵列。您可以使用海峡[I] 访问 I 字符的字符串中。以同样的方式,的char ** argv的不是字符串数组;它是一个指针(的第一字符串)字符串的数组。您可以使用的argv [I] 访问 I 参数主程序

If you write char *str, str isn't an array of characters; it is a pointer to (the first character of) an array of characters. You can use str[i] to access the ith character in the string. In the same way, char **argv isn't an array of strings; it is a pointer to (the first string of) an array of character strings. You can use argv[i] to access the ith argument to a main program.

而在这个线索结构,不是 TNODE * 的数组;它是一个指向(第一元件)线索结构的阵列。但是你可以使用 trie-→1 [I] 访问 I 线索结构列表中的指向 trie-方式>→

And in this trie structure, l isn't an array of tNode *; it is a pointer to (the first element of) an array of trie structures. But you can use trie->l[i] to access the ith trie structure in the list pointed to by trie->l.

我没有尝试用播放之前,所以我放在一起code下面根据你的数据结构。无论是正式的正确与否是你需要调查;低于code对我的作品对于给定的测试用例,而的valgrind 给人健康的codeA清洁提单(无泄漏的内存,无记忆滥用)。

I hadn't played with tries before, so I put together the code below based on your data structure. Whether it's formally correct or not is something you'll need to investigate; the code below works for me for the given test cases, and valgrind gives the code a clean bill of health (no leaked memory, no memory abuse).

在code强行包括一个通常来说是一个头文件。至于外界关注的是,线索类型( TNODE )是不透明的。

The code forcibly includes what would normally be in a header file. As far as the outside world is concerned, the trie type (tNode) is opaque.

#ifndef TRIE_H_INCLUDED
#define TRIE_H_INCLUDED

typedef struct tNode_t tNode;
extern void trie_add_word(tNode *trie, char const *word);
extern char const *trie_find_word(tNode *trie, char const *word);
extern void trie_print(tNode *trie);
extern tNode *trie_new(void);
extern void trie_free(tNode *trie);

#endif /* TRIE_H_INCLUDED */

/*#include "trie.h"*/
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>

struct tNode_t
{
    struct tNode_t **l;
    char            *w;
};

static void db_print(char const *fmt, ...);

tNode *trie_new(void)
{
    tNode *trie = malloc(sizeof(tNode));
    assert(trie != 0);      // Abysmal way to validate memory allocation
    trie->w = 0;
    trie->l = (tNode **)calloc(27, sizeof(tNode *));
    assert(trie->l != 0);   // Abysmal way to validate memory allocation
    return(trie);
}

void trie_free(tNode *trie)
{
    assert(trie != 0);
    assert(trie->l != 0);
    for (size_t i = 0; i < 27; i++)
    {
        if (trie->l[i] != 0)
            trie_free(trie->l[i]);
    }
    free(trie->l);
    free(trie->w);
    free(trie);
}

static void add_word_suffix(tNode *trie, char const *word, char const *suffix)
{
    int c;
    assert(trie != 0);
    assert(trie->l != 0);
    db_print("-->> %s: word [%s], suffix [%s]\n", __func__, word, suffix);
    while ((c = *suffix++) != '\0')
    {
        if (isalpha(c))
        {
            db_print("---- %s: letter %c (index %d)\n", __func__, c, c - 'a' + 1);
            c = tolower(c) - 'a' + 1;
            assert(trie->l != 0);
            if (trie->l[c] == 0)
                trie->l[c] = trie_new();
            db_print("---- %s: recurse: [%s]/[%s]\n", __func__, word, suffix);
            add_word_suffix(trie->l[c], word, suffix);
            db_print("<<-- %s\n", __func__);
            return;
        }
    }
    if (trie->w != 0)
    {
        db_print("---- %s: trie already contains word [%s] at [%s]\n", __func__, word, trie->w);
        return;
    }
    trie->w = strdup(word);
    db_print("<<-- %s: inserted word [%s]\n", __func__, trie->w);
}

void trie_add_word(tNode *trie, char const *word)
{
    add_word_suffix(trie, word, word);
}

static char const *find_word_suffix(tNode *trie, char const *word, char const *suffix)
{
    int c;
    db_print("-->> %s: word [%s] suffix[%s]\n", __func__, word, suffix);
    for ( ; (c = *suffix) != '\0'; suffix++)
    {
        if (isalpha(c))
        {
            db_print("---- %s: letter %c\n", __func__, c);
            c = tolower(c) - 'a' + 1;
            if (trie->l[c] == 0)
                return(0);
            char const *rv = find_word_suffix(trie->l[c], word, suffix+1);
            if (rv == 0)
            {
                db_print("<<-- %s: missing [%s]/[%s]\n", __func__, word, suffix);
                return 0;
            }
            db_print("<<-- %s: found [%s] for [%s]/[%s]\n", __func__, rv, word, suffix);
            return rv;
        }
    }
    if (trie->w == 0)
    {
        db_print("<<-- %s: missing [%s]/[%s]\n", __func__, word, suffix);
        return 0;
    }
    db_print("<<-- %s: found [%s] for [%s]/[%s]\n", __func__, trie->w, word, suffix);
    return(trie->w);
}

char const *trie_find_word(tNode *trie, char const *word)
{
    return find_word_suffix(trie, word, word);
}

void trie_print(tNode *trie)
{
    assert(trie != 0);
    assert(trie->l != 0);
    if (trie->w != 0)
        printf("%s\n", trie->w);
    for (size_t i = 0; i < 27; i++)
    {
        if (trie->l[i] != 0)
            trie_print(trie->l[i]);
    }
}

static int debug = 0;

static void db_print(char const *fmt, ...)
{
    if (debug > 0)
    {
        va_list args;
        va_start(args, fmt);
        vprintf(fmt, args);
        va_end(args);
    }
}

int main(int argc, char **argv)
{
    tNode *trie = trie_new();

    /* Set debugging - and use argv */
    if (argc > 1 && argv[argc] == 0)
        debug = 1;

    /* First test */
    char const *word = "cab";
    trie_add_word(trie, word);
    char const *leaf = trie_find_word(trie, word);
    printf("Leaf word = %s\n", leaf);
    trie_free(trie);

    /* Second, more comprehensive test */
    static char const *words[] =
    {
        "cabal",         "cabbie",
        "cab",           "centre",
        "cinema",        "cold",
        "culminate",     "culmination",
        "duck",          "cabs",
        "amniocentesis", "amniocentesis",
        "amniocentesis", "cam",
        "cab",           "cab",
        "zulu",          "alpha",
        "bravo",         "Charlie",
        "delta",         "echo",
        "foxtrot",       "golf",
        "hotel",         "India",
        "Juliet",        "kilo",
        "Lima",          "Mike",
        "November",      "Oscar",
        "Papa",          "Quebec",
        "Romeo",         "Sierra",
        "Tango",         "uMBRelLA",
        "Victor",        "Whisky",
        "X-ray",         "Yankee",
        "Zulu",          "Aquarius",
    };
    size_t num_words = sizeof(words) / sizeof(words[0]);
    size_t counter = 0;

    /* First time, add every other word; second time, every word */
    for (size_t mod = 2; mod > 0; mod--)
    {
        trie = trie_new();
        printf("\nTest %zu\n", ++counter);
        for (size_t i = 0; i < num_words; i++)
        {
            if (i % mod == 0)
                trie_add_word(trie, words[i]);
            char const *leaf = trie_find_word(trie, words[i]);
            if (leaf == 0)
                printf("Word [%s] is missing\n", words[i]);
            else
                printf("Word [%s] for [%s]\n", leaf, words[i]);
        }
        printf("\nTrie:\n");
        trie_print(trie);
        trie_free(trie);
    }

    return(0);
}

在code使用断言()内存分配检查。这是方便,但糟糕的风格。不要模仿。你可以决定,如果你想在 trie_find_word() trie_add_word()的功能组合成一个单一的功能。你可以通过一个参数(任何参数),以测试程序看到相当详细的诊断。

The code uses assert() for memory allocation checking. It is convenient but abysmal style. Do not mimic that. You could decide to combine the trie_find_word() and trie_add_word() functionality into a single function if you wished. You can see fairly detailed diagnostics by passing an argument (any argument) to the test program.

在code使用27(而不是在宏或枚举),但它会很好地工作在数组中只有26元,如果你删除 + 1 出现在几个地方进行转换'A' 1 (将其转换为 0 来代替)。

The code uses 27 (not in a macro or enum), but it would work fine with just 26 elements in the array if you remove the + 1 that appears in a couple of places to convert 'a' to 1 (it would convert it to 0 instead).

这篇关于添加字线索结构字典的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆