What does the Brown clustering algorithm output mean?
Question
I've run the Brown clustering algorithm from https://github.com/percyliang/brown-cluster and also a Python implementation, https://github.com/mheilman/tan-clustering. Both give a binary string and an integer for each unique token. For example:
0 the 6
10 chased 3
110 dog 2
1110 mouse 2
1111 cat 2
What do the binary string and the integer mean?
From the first link, the binary string is known as a bit-string; see http://saffron.deri.ie/acl_acl/document/ACL_ANTHOLOGY_ACL_P11-1053/
But how do I tell from the output that "dog", "mouse", and "cat" form one cluster, while "the" and "chased" are not in that cluster?
Recommended answer
If I understand correctly, the algorithm gives you a tree, and you need to truncate it at some level to get clusters. With these bit strings, you just take the first L characters of each one; tokens that share the same prefix belong to the same cluster.
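For example, cutting at the second character gives you two clusters for the words whose bit strings start with 1 ("the", with the one-character bit string 0, stays in a cluster of its own):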
10 chased
11 dog
11 mouse
11 cat
Cutting at the third character, you get:
110 dog
111 mouse
111 cat
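The truncation described above can be sketched in a few lines of Python. This is only an illustration, not code from either repository: the function name cut_clusters and the SAMPLE_OUTPUT string are made up for this example, and it assumes each output line holds a bit string, a token, and an integer separated by whitespace, as in the question. The integer column is ignored here.

```python
from collections import defaultdict

# Sample output copied from the question: bit string, token, integer.
SAMPLE_OUTPUT = """\
0 the 6
10 chased 3
110 dog 2
1110 mouse 2
1111 cat 2
"""


def cut_clusters(output_text, prefix_len):
    """Group tokens by the first `prefix_len` characters of their bit string.

    Hypothetical helper for illustration; not part of brown-cluster
    or tan-clustering.
    """
    clusters = defaultdict(list)
    for line in output_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        bits, token, _integer = line.split()
        # Slicing keeps bit strings shorter than prefix_len whole,
        # so "0" stays "0" even when prefix_len is 2 or 3.
        clusters[bits[:prefix_len]].append(token)
    return dict(clusters)


for level in (2, 3):
    print(level, cut_clusters(SAMPLE_OUTPUT, level))
```

With a prefix length of 2, "dog", "mouse", and "cat" all fall under the prefix 11; with a prefix length of 3, "dog" splits off under 110 while "mouse" and "cat" stay together under 111.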
The cutting strategy is a different subject, though.