What does the Brown clustering algorithm output mean?

Question

I've run the Brown clustering algorithm from https://github.com/percyliang/brown-cluster and also a Python implementation, https://github.com/mheilman/tan-clustering. Both output a binary string and an integer for each unique token. For example:

0        the        6
10        chased        3
110        dog        2
1110        mouse        2
1111        cat        2

What do the binary string and the integer mean?

From the first link, the binary string is known as a bit string; see http://saffron.deri.ie/acl_acl/document/ACL_ANTHOLOGY_ACL_P11-1053/

But how can I tell from the output that dog, mouse, and cat are in one cluster, and that the and chased are not in that cluster?

Recommended answer

If I understand correctly, the algorithm gives you a tree, and you need to truncate it at some level to get clusters. With these bit strings, you just take the first L characters of each one.

For example, cutting at the second character gives you two clusters (leaving aside the, whose bit string 0 is shorter than the cut):

10           chased     

11           dog        
11           mouse      
11           cat        

At the third character you get:

110           dog        

111           mouse      
111           cat        

The cutting strategy is a different subject though.
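
The truncation described above can be sketched in a few lines of Python. This is only an illustration, not code from either repository: the names `SAMPLE_OUTPUT`, `cut_clusters`, and `prefix_len` are made up here, and it assumes whitespace-separated columns with the bit string first and the token second, as in the example in the question. The third column is ignored.

```python
from collections import defaultdict

# Sample output copied from the question: bit string, token, integer.
SAMPLE_OUTPUT = """
0        the        6
10        chased        3
110        dog        2
1110        mouse        2
1111        cat        2
"""


def cut_clusters(lines, prefix_len):
    """Group tokens by the first `prefix_len` characters of their bit string.

    Hypothetical helper; assumes each line is "<bit string> <token> ...".
    """
    clusters = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        bits, token = fields[0], fields[1]
        # Slicing a bit string shorter than prefix_len returns it unchanged,
        # so such tokens stay under their full (shorter) bit string.
        clusters[bits[:prefix_len]].append(token)
    return dict(clusters)


if __name__ == "__main__":
    for prefix_len in (2, 3):
        print(prefix_len, cut_clusters(SAMPLE_OUTPUT.splitlines(), prefix_len))
```

With a cut at two characters, dog, mouse, and cat all fall under the prefix 11, chased under 10, and the stays under 0. With a cut at three characters, dog moves to 110 while mouse and cat share 111.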
