PySpark 2: KMeans "The input data is not directly cached"
Question
I don't know why I receive the message
WARN KMeans: The input data is not directly cached, which may hurt performance if its parent RDDs are also uncached.
when I try to use Spark KMeans:
from pyspark.ml.clustering import KMeans

df_Part = assembler.transform(df_Part)
df_Part.cache()

wssse = float("inf")  # enter the loop on the first iteration
while (k <= max_cluster) and (wssse > seuilStop):
    kmeans = KMeans().setK(k)
    model = kmeans.fit(df_Part)
    wssse = model.computeCost(df_Part)  # within-set sum of squared errors
    k = k + 1
It says that my input (DataFrame) is not cached!
I tried printing df_Part.is_cached and it returned True, which means my DataFrame is cached, so why does Spark still warn me about this?
Answer
This message is generated by o.a.s.mllib.clustering.KMeans, and there is nothing you can really do about it without patching the Spark code.
Internally, o.a.s.ml.clustering.KMeans:

- converts the DataFrame to an RDD[o.a.s.mllib.linalg.Vector], and
- executes o.a.s.mllib.clustering.KMeans.
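The first step creates a brand-new RDD each time, so the cache status of the parent DataFrame does not carry over to it. A toy pure-Python analogy of that behavior (hypothetical classes, not Spark's actual API):

```python
# Toy analogy: caching a wrapper object does not cache objects derived from it.
class Frame:
    def __init__(self, data):
        self.data = data
        self.cached = False

    def cache(self):
        self.cached = True
        return self

    def to_rdd(self):
        # Returns a *new* derived object on every call,
        # with its own (unset) cache flag.
        return Frame(list(self.data))


df = Frame([1, 2, 3]).cache()
rdd = df.to_rdd()

assert df.cached is True    # like df_Part.is_cached == True
assert rdd.cached is False  # ...but the internally derived RDD is not cached
```

This mirrors what happens inside fit(): the conversion produces a fresh, uncached RDD regardless of whether the source DataFrame was cached.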
When you cache the DataFrame, the RDD used internally is not cached. This is why you see the warning. While it is annoying, I wouldn't worry too much about it.