Canopy clustering over strings: input format and Hadoop implementation

Views: 55 | Published: 2019/6/17 5:13:57 | Java


Problem description

I want to run canopy clustering over a set of strings to reduce the number of distance measurements, but I have no idea how to apply canopy clustering to strings.

When I searched, I found the Apache Hadoop implementation of text clustering. However, it says the input must be a sequence file of vectors, that is, the input has to be in a vector-readable format.

I have a single column of strings. How do I convert it into a sequence file and a vector file in Java, and how do I use Hadoop canopy clustering efficiently?

Example of one column of words:

quickly
need
close
this?
father-in-law
relatives
come
location?
little
specific
''where
exactly
chennai-bangalore
road?'',
far
road?
state
right
locality
in?
post
message
brahmma
weeks
max

Any help would be appreciated. Thanks.
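
To make the question concrete, here is a minimal in-memory sketch of canopy clustering applied directly to strings. It is not the Hadoop implementation and it does not produce sequence or vector files. The class name `CanopySketch`, the methods `canopies` and `distance`, the normalized Levenshtein distance, and the threshold values `t1` (loose) and `t2` (tight) are all assumptions chosen for illustration. Canopy clustering normally relies on a cheap approximate distance, so edit distance here is only a stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: an in-memory canopy pass over strings.
final class CanopySketch {

    // Levenshtein edit distance using two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed distance measure: edit distance scaled into [0, 1].
    static double distance(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 0.0 : (double) levenshtein(a, b) / maxLen;
    }

    // Builds canopies: every remaining word within t1 of the center joins the
    // canopy; words within t2 of the center are dropped from the candidate list.
    static List<List<String>> canopies(List<String> words, double t1, double t2) {
        if (t2 > t1) {
            throw new IllegalArgumentException("t2 must not exceed t1");
        }
        List<String> remaining = new ArrayList<>(words);
        List<List<String>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String center = remaining.remove(0);
            List<String> canopy = new ArrayList<>();
            canopy.add(center);
            List<String> stillCandidates = new ArrayList<>();
            for (String w : remaining) {
                double d = distance(center, w);
                if (d <= t1) {
                    canopy.add(w);            // loosely close: member of this canopy
                }
                if (d > t2) {
                    stillCandidates.add(w);   // not tightly close: may seed or join others
                }
            }
            remaining = stillCandidates;
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        // A few words taken from the example column above.
        List<String> words = Arrays.asList(
            "road?", "road?'',", "location?", "locality", "far", "max");
        for (List<String> canopy : canopies(words, 0.5, 0.3)) {
            System.out.println(canopy);
        }
    }
}
```

Because words that fall between `t2` and `t1` stay in the candidate list, a word can end up in more than one canopy. That overlap is expected in canopy clustering, and a more expensive clustering step would typically run inside each canopy afterwards.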
