Error: AttributeError: 'DataFrame' object has no attribute '_jdf'


Problem description

I want to perform k-fold cross-validation with PySpark to tune the model's parameters, and I'm using pyspark.ml. I am getting an AttributeError.

AttributeError: 'DataFrame' object has no attribute '_jdf'

I initially tried pyspark.mllib, but I could not get k-fold cross-validation to work with it.

import pandas as pd
from pyspark import SparkConf, SparkContext
from pyspark.ml.classification import DecisionTreeClassifier

data=pd.read_csv("file:///SparkCourse/wdbc.csv", header=None)
type(data)
print(data)

conf = SparkConf().setMaster("local").setAppName("SparkDecisionTree")
sc = SparkContext(conf = conf)

# Create initial Decision Tree Model
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features",
                            maxDepth=3)

# Train model with Training Data
dtModel = dt.fit(data)

# I expect the model to be trained, but I get the following error:
# AttributeError: 'DataFrame' object has no attribute '_jdf'

Note: I'm able to print the data. The error is raised on the dtModel line, at the call to dt.fit(data).

Recommended answer

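Convert the pandas DataFrame to a Spark DataFrame. pd.read_csv returns a pandas DataFrame, but pyspark.ml estimators such as DecisionTreeClassifier expect a Spark DataFrame. _jdf is the Spark DataFrame's internal reference to the underlying JVM DataFrame, which a pandas object does not have.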

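from pyspark import SparkContext
from pyspark.sql import SQLContext

# Reuse the running SparkContext, or create one if none exists
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# pandas_df is the pandas DataFrame (named `data` in the question)
spark_df = sqlContext.createDataFrame(pandas_df)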
