Using LSH in Spark to run a nearest neighbors query on every point in a DataFrame

Views: 483 Published: 2020/9/4 18:42:57 Tags: apache-spark pyspark apache-spark-mllib pyspark-sql

Problem description

I need the k nearest neighbors for each feature vector in the DataFrame. I'm using BucketedRandomProjectionLSHModel from pyspark.

Code used to create the model:

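from pyspark.ml.feature import BucketedRandomProjectionLSH

brp = BucketedRandomProjectionLSH(inputCol="features", outputCol="hashes", seed=12345, bucketLength=n)

# Fit the LSH model, then append the "hashes" column to every row.
model = brp.fit(data_df)
df_lsh = model.transform(data_df)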

Now, how do I run an approximate nearest neighbor query for each point in data_df?

I have tried broadcasting the model but got a pickle error. Also, defining a UDF to access the model gives the error: Method __getstate__([]) does not exist
