How to build an alphabetical tree from a list of words in R?

Question

My problem is simple. I have a long list of words, e.g. abbey, abbot, abbr, abide.

I would like to build a tree as follows:


Level 0                             A
                                    | 
Level 1                             B
                                  /   \
Level 2                         B       I
                              / | \     |
Level 3                     E   O   R   D   
                            |   |       |
Level 4                     Y   T       E

Is there an easy way to parse the wordlist and create such a structure in R?

Many thanks for your help!

Regards, Chris

Recommended answer

Here's an igraph-based solution that labels each node of the graph with the partial word, so that terminal nodes are named with full words:

library(igraph)
library(stringr)

initgraph = function(){
    # create a graph with one empty-named node and no edges
    g=graph.empty(n=1)
    V(g)$name=""
    g
}


wordtree <- function(g=initgraph(),wordlist){
    for(word in wordlist){
        # turns "word" into c("w","wo","wor","word")
        subwords = str_sub(word, 1, 1:nchar(word))
        # make a graph long enough to hold all those sub-words plus start node
        subg = graph.lattice(length(subwords)+1,directed=TRUE)
        # set vertex nodes to start node plus sub-words
        V(subg)$name=c("",subwords)
        # merge *by name* into the existing graph
        g = graph.union(g, subg)
    }
    g
}
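
To make the loop body above concrete, here is a minimal base-R sketch of what a single word contributes to the tree: a chain of prefixes starting at the empty root name. The helper name prefix_edges is hypothetical (it is not part of igraph or stringr), and base substring() stands in for str_sub().

```r
# Hypothetical helper: list the parent -> child prefix pairs for one word.
# Assumes `word` is a single non-empty string.
prefix_edges <- function(word) {
    # "abbey" -> "a", "ab", "abb", "abbe", "abbey"
    prefixes <- substring(word, 1, seq_len(nchar(word)))
    # each prefix hangs off the previous one; the first hangs off the root ""
    data.frame(from = c("", head(prefixes, -1)),
               to = prefixes,
               stringsAsFactors = FALSE)
}

e <- prefix_edges("abbey")
print(e)
```

Words that share a prefix produce identical rows at the top of this chain, which is why merging the per-word graphs by vertex name yields a tree rather than separate paths.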

With initgraph and wordtree loaded,

g = wordtree(initgraph(), c("abbey","abbot","abbr","abide"))
plot(g)

draws the resulting tree.

You can add words to an existing tree by passing it in as first parameter:

g = wordtree(g,c("now","accept","answer","please"))
plot(g)

The tree is always rooted at the node with name "" and all terminal nodes (those with no outgoing edges) have words. There are functions in igraph to pull those out if you need them. You haven't actually said what you want to do with this when you've done it... Or when we've done it for you :)
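
One way to pull the terminal nodes out is to filter on out-degree. This is a sketch only: leaf_words is a hypothetical helper name, and the small graph is built by hand so the snippet runs on its own instead of depending on wordtree.

```r
library(igraph)

# Hypothetical helper: names of the vertices that have no outgoing edges.
leaf_words <- function(g) {
    V(g)$name[degree(g, mode = "out") == 0]
}

# Hand-built demo tree: "" -> "a", "a" -> "ab", "a" -> "ac"
g_demo <- make_graph(c(1, 2, 2, 3, 2, 4), directed = TRUE)
V(g_demo)$name <- c("", "a", "ab", "ac")

leaves <- leaf_words(g_demo)
print(leaves)
```

Note that a word which is itself a prefix of another word in the list (say "abb" alongside "abbey") ends up as an internal node, so this filter will not return it. If every input word must be recoverable, it would need to be recorded separately, for example as a vertex attribute.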

Note there is a nice layout for plotting trees which looks like your ascii example:

plot(g,layout=layout.reingold.tilford)
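
The dotted names used in this answer (graph.empty, graph.lattice, graph.union, layout.reingold.tilford) are the older igraph spellings; newer releases document them as make_empty_graph, make_lattice, union and layout_as_tree. A sketch of the tree layout with the newer name, assuming igraph 1.0 or later, and passing the root explicitly instead of relying on igraph's automatic choice:

```r
library(igraph)

# Hand-built demo tree so the snippet is self-contained:
# "" -> "a", "a" -> "ab", "a" -> "ac"
g_demo <- make_graph(c(1, 2, 2, 3, 2, 4), directed = TRUE)
V(g_demo)$name <- c("", "a", "ab", "ac")

# Find the root by its empty name, as in the answer's convention
root_id <- which(V(g_demo)$name == "")

# One row of (x, y) coordinates per vertex
coords <- layout_as_tree(g_demo, root = root_id)

plot(g_demo, layout = coords)
```

Computing the coordinates first and passing the matrix to plot() also makes it easy to reuse the same layout after changing labels or colours.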
