How to find similar sentences / phrases in R?

Views: 110  Published: 2020/5/18 0:49:02  Tags: r, statistics, nlp

Question

For example, I have billions of short phrases, and I want to find clusters of them that are similar.

strings.to.cluster <- c("Best Toyota dealer in bay area. Drive out with a new car today",
                        "Largest Selection of Furniture. Stock updated everyday",
                        " Unique selection of Handcrafted Jewelry",
                        "Free Shipping for orders above $60. Offer Expires soon",
                        "XXXX is where smart men buy anniversary gifts",
                        "2012 Camrys on Sale. 0% APR for select customers",
                        "Closing Sale on office desks. All Items must go")

Assume that this vector has hundreds of thousands of rows. Is there a package in R to cluster these phrases by meaning? Or could someone suggest a way to rank phrases by how "similar" in meaning they are to a given phrase?

Recommended answer

You can treat your phrases as "bags of words": build a matrix (a "term-document" matrix) in which one dimension indexes the phrases and the other indexes the words, with 1 if the word occurs in the phrase and 0 otherwise. (You can replace the 1 with a weight that accounts for phrase length and word frequency, for example TF-IDF.) You can then apply any clustering algorithm to the phrase vectors. The tm package can help you build this matrix; its TermDocumentMatrix stores terms as rows and phrases as columns, so it has to be transposed to get one row per phrase before clustering.
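To make the shape of that matrix concrete, here is a minimal base-R sketch that builds the 0/1 bag-of-words matrix by hand. The `phrases` vector is hypothetical toy data (not the data from the question), and the whitespace tokenizer is a deliberate simplification; tm handles tokenization and sparse storage properly.

```r
# Hypothetical toy input; not the data from the question
phrases <- c("new car sale", "car dealer", "furniture sale")

# Split each phrase into lowercase words (naive whitespace tokenizer)
tokens <- strsplit(tolower(phrases), "[[:space:]]+")

# Vocabulary: every distinct word, sorted for a stable column order
vocab <- sort(unique(unlist(tokens)))

# One row per phrase, one column per word: 1 if the word occurs, else 0
bow <- t(vapply(tokens,
                function(w) as.numeric(vocab %in% w),
                numeric(length(vocab))))
dimnames(bow) <- list(phrases, vocab)

print(bow)
```

A dense matrix like this only works for small inputs; for hundreds of thousands of phrases a sparse representation is needed.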

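library(tm)      # corpus handling and term-document matrices
library(Matrix)  # sparse matrix classes
# Terms are rows, phrases (documents) are columns
x <- TermDocumentMatrix(Corpus(VectorSource(strings.to.cluster)))
# Convert tm's triplet representation to a sparse Matrix; dims keeps empty phrases
y <- sparseMatrix(i = x$i, j = x$j, x = x$v, dims = dim(x), dimnames = dimnames(x))
# Transpose to one row per phrase, then cluster hierarchically (Euclidean distance)
plot(hclust(dist(t(y))))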
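The question also asks how to rank phrases by similarity to a given phrase, and the answer mentions replacing the 1s with weights. The base-R sketch below combines the two: it applies one common TF-IDF variant (an assumption, since the answer does not name a specific weighting) and ranks phrases by cosine similarity to a chosen query phrase. The `phrases` vector is hypothetical toy data and `query` is an illustrative name. Note that this measures shared vocabulary rather than meaning in any deeper sense.

```r
# Hypothetical toy phrases, loosely modelled on the question's examples
phrases <- c("toyota dealer new car today",
             "camry on sale at toyota dealer",
             "furniture selection updated",
             "closing sale on office furniture")

tokens <- strsplit(tolower(phrases), "[[:space:]]+")
vocab  <- sort(unique(unlist(tokens)))

# Term counts: one row per phrase, one column per word
counts <- t(vapply(tokens,
                   function(w) as.numeric(table(factor(w, levels = vocab))),
                   numeric(length(vocab))))
colnames(counts) <- vocab

# TF-IDF weighting (one common variant, assumed here):
# tf = count / phrase length, idf = log(N / number of phrases containing the word)
tf    <- counts / rowSums(counts)
idf   <- log(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity between every pair of phrases
unit <- tfidf / sqrt(rowSums(tfidf^2))
sim  <- unit %*% t(unit)

# Rank the other phrases by similarity to the query phrase
query  <- 1
others <- setdiff(seq_along(phrases), query)
ranked <- others[order(sim[query, others], decreasing = TRUE)]
print(data.frame(phrase = phrases[ranked],
                 similarity = round(sim[query, ranked], 3)))
```

With this toy data the second phrase ranks first for the query because it shares "toyota" and "dealer" with it; the two furniture phrases share no words with the query and get a similarity of 0.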
