How to use long user ID in PySpark ALS

Published: 2021/11/14 21:07:45 | Tags: apache-spark, pyspark, apache-spark-mllib


Problem description


I am attempting to use long user/product IDs in the ALS model in PySpark MLlib (1.3.1) and have run into an issue. A simplified version of the code is given here:

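from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

# Run Spark locally, using as many worker threads as there are cores
sc = SparkContext("local[*]", "test")

# Load and parse the data: each line is "user,product,rating"
d = ["3661636574,1,1", "3661636574,2,2", "3661636574,3,3"]
data = sc.parallelize(d)
# Python 3's int is arbitrary precision (Python 2 code used long());
# the user ID 3661636574 is larger than a JVM Int can hold
ratings = (data.map(lambda l: l.split(','))
               .map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2]))))

# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 20
model = ALS.train(ratings, rank, numIterations)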


Running this code yields a java.lang.ClassCastException because the code attempts to convert the longs to ints. Looking through the source code, the ml ALS class in Spark allows long user/product IDs, but the mllib ALS class forces the use of ints.
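
The exception comes down to range: on the JVM side, the mllib Rating class stores user and product as a 32-bit signed Int, and the user ID in the example does not fit. A quick check in plain Python, as a minimal sketch (the helper name fits_in_java_int is hypothetical, for illustration only):

```python
# Bounds of a Java/Scala 32-bit signed Int.
JAVA_INT_MIN = -(2 ** 31)   # -2147483648
JAVA_INT_MAX = 2 ** 31 - 1  # 2147483647


def fits_in_java_int(value):
    """Return True if value can be stored in a Java int without overflow."""
    return JAVA_INT_MIN <= value <= JAVA_INT_MAX


user_id = 3661636574  # the user ID from the example above
print(fits_in_java_int(user_id))  # False
print(user_id - JAVA_INT_MAX)     # 1514152927, how far past the limit it is
```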


Question: Is there a workaround to use long user/product IDs in PySpark ALS?
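
One possible workaround, sketched here under the assumption that there are fewer than 2^31 distinct users and products, is to remap each raw long ID to a dense int index before building the ratings, and keep a reverse mapping to translate model output back. The sketch below is plain Python so it runs without Spark; the names build_id_index, user_index, and user_reverse are hypothetical.

```python
def build_id_index(raw_ids):
    """Map each distinct raw (possibly 64-bit) ID to a dense int in [0, n)."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    # Indices run from 0 to n - 1, so n may be at most 2**31 for a Java int.
    if len(index) > 2 ** 31:
        raise ValueError("too many distinct IDs for a 32-bit int index")
    return index


rows = ["3661636574,1,1", "3661636574,2,2", "3661636574,3,3"]
parsed = [(int(u), int(p), float(r)) for u, p, r in (line.split(",") for line in rows)]

user_index = build_id_index(u for u, _, _ in parsed)
product_index = build_id_index(p for _, p, _ in parsed)

# Ratings with int-safe IDs, which could then be wrapped in Rating objects.
remapped = [(user_index[u], product_index[p], r) for u, p, r in parsed]

# Reverse lookup to translate model output back to the original IDs.
user_reverse = {v: k for k, v in user_index.items()}

print(remapped)         # [(0, 0, 1.0), (0, 1, 2.0), (0, 2, 3.0)]
print(user_reverse[0])  # 3661636574
```

On a real dataset the same idea would more likely be applied on the RDD, for example by taking the distinct IDs, numbering them with RDD.zipWithIndex(), and joining the numbers back onto the ratings; collecting the mapping into a driver-side dict as above only works when it fits in memory.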
