Create LabeledPoints from a Spark DataFrame in Python

Views: 575 Published: 2016/5/22 15:34:34 Tags: python pandas apache-spark apache-spark-mllib

Question

Which .map() function in Python do I use to create a set of LabeledPoints from a Spark DataFrame? What is the notation if the label/outcome is not the first column, but I can refer to it by its column name, 'status'?

I create the pandas DataFrames with this function, which I apply through .map():

import pandas as pd

def parsePoint(line):
    # Split the tab-separated record; the first field is skipped below.
    listmp = line.split('\t')
    # One-hot encode the remaining fields, then sum into a single row of counts.
    dataframe = pd.DataFrame(pd.get_dummies(listmp[1:]).sum()).transpose()
    # Put the label first. This assumes 'accepted' occurs in the record;
    # otherwise the lookup raises KeyError.
    dataframe.insert(0, 'status', dataframe['accepted'])
    # Drop placeholder and outcome columns if present, leaving only features.
    dataframe = dataframe.drop(
        columns=['NULL', '', 'rejected', 'accepted'], errors='ignore')
    return dataframe

I convert the result to a Spark DataFrame after the reduce function has recombined all the pandas DataFrames.

parsedData = sqlContext.createDataFrame(parsedData)

But now how do I create LabeledPoints from this in Python? I assume it may be another .map() function?

Recommended answer

If you already have numerical features that require no additional transformations, you can use VectorAssembler to combine the columns containing the independent variables into a single vector column:

from pyspark.ml.feature import VectorAssembler

assembler = VectorAssembler(
    inputCols=["your", "independent", "variables"],
    outputCol="features")

transformed = assembler.transform(parsedData)

Next you can simply map:

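from pyspark.mllib.regression import LabeledPoint
from pyspark.sql.functions import col

# .rdd is required on Spark 2.0+, where DataFrame no longer exposes .map directly.
# On Spark 2.0+ VectorAssembler yields pyspark.ml vectors; there, wrap
# row.features in pyspark.mllib.linalg.Vectors.fromML(...) before building the point.
(transformed.select(col("outcome_column").alias("label"), col("features"))
  .rdd
  .map(lambda row: LabeledPoint(row.label, row.features)))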
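
To make the label-by-name part of the question concrete, here is a minimal plain-Python sketch of what the mapping step does per row, written so it runs without Spark. It is an illustration under assumptions, not the answer's code: `LabeledPair` is a hypothetical stand-in for `LabeledPoint`, `to_labeled` is an invented helper name, and the sample rows are made up. The label is picked by its column name ('status', as in the question), so its position in the row does not matter.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.mllib.regression.LabeledPoint,
# used only so this sketch runs without Spark installed.
LabeledPair = namedtuple("LabeledPair", ["label", "features"])


def to_labeled(row, label_col, feature_cols):
    """Split one row (a dict of column name -> value) into label and features."""
    label = float(row[label_col])
    # Feature order follows feature_cols, much like VectorAssembler's inputCols.
    features = [float(row[c]) for c in feature_cols]
    return LabeledPair(label, features)


# Made-up rows; 'status' is the label and is deliberately not the first column.
rows = [
    {"a": 1, "status": 1, "b": 0},
    {"a": 0, "status": 0, "b": 2},
]

# Every column except the label is treated as a feature.
feature_cols = [c for c in rows[0] if c != "status"]
points = [to_labeled(r, "status", feature_cols) for r in rows]
print(points[0])
```

With Spark, each `Row` could supply the dict through `row.asDict()`, and `LabeledPoint` would take the place of `LabeledPair`; that substitution is left as an assumption rather than shown here.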
