Spark matrix multiplication with Python

Question

I am trying to do matrix multiplication using Apache Spark and Python.

Here is my data:

from pyspark.mllib.linalg.distributed import RowMatrix

My RDDs of vectors:

rows_1 = sc.parallelize([[1, 2], [4, 5], [7, 8]])
rows_2 = sc.parallelize([[1, 2], [4, 5]])

My matrices:

mat1 = RowMatrix(rows_1)
mat2 = RowMatrix(rows_2)

I would like to do something like this:

mat = mat1 * mat2

I wrote a function to do the matrix multiplication, but I'm afraid it will take a long time to run. Here is my function:

def matrix_multiply(df1, df2):
    nb_row = df1.count()    
    mat=[]
    for i in range(0, nb_row):
        row=list(df1.filter(df1['index']==i).take(1)[0])
        row_out = []
        for r in range(0, len(row)):
            r_value = 0
            # list_col is not defined in the post; presumably the list of df2's column names
            col = df2.select(df2[list_col[r]]).collect()
            col = [list(c)[0] for c in col]
            for c in range(0, len(col)): 
                r_value += row[c] * col[c]
            row_out.append(r_value)            
        mat.append(row_out)
    return mat 

My function performs a lot of Spark actions (take, collect, etc.). Will it take a lot of processing time? If anyone has another idea, it would be helpful.
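
To put a rough number on the concern above: each `count`, `take` and `collect` is a separate Spark action, so the loop structure of `matrix_multiply` determines how many are launched. The helper below, `count_spark_actions` (a hypothetical name, not part of Spark), simply tallies them from the shape of the loops. It assumes one action per call and says nothing about how long each one takes.

```python
def count_spark_actions(n_rows, row_len):
    """Tally the Spark actions issued by matrix_multiply as posted.

    n_rows  -- number of rows in df1 (the outer loop bound)
    row_len -- length of each row taken from df1 (the inner loop bound)
    """
    count_calls = 1                    # df1.count()
    take_calls = n_rows                # one filter(...).take(1) per row
    collect_calls = n_rows * row_len   # one select(...).collect() per row element
    return count_calls + take_calls + collect_calls


print(count_spark_actions(3, 2))       # 10
print(count_spark_actions(1000, 100))  # 101001
```

The count grows with the product of the two loop bounds, and every `collect` brings a column of `df2` back to the driver, so the approach is unlikely to scale beyond small inputs.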

Recommended answer

You cannot. Since RowMatrix has no meaningful row indices, it cannot be used for multiplication. Even ignoring that, the only distributed matrix that supports multiplication with another distributed structure is BlockMatrix.
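
To see why row indices matter, it can help to view multiplication as a join on indices: entry A[i][j] has to be paired with every B[j][k], and that pairing needs to know which row of B each value belongs to. The sketch below does this in plain Python on `(row, col, value)` triples. It only illustrates the idea and is not how `BlockMatrix` is implemented; `multiply_entries` and `to_entries` are hypothetical helper names.

```python
from collections import defaultdict


def to_entries(rows):
    """Attach explicit (row, col) indices to a list of rows."""
    return [(i, j, v) for i, row in enumerate(rows) for j, v in enumerate(row)]


def multiply_entries(a_entries, b_entries):
    """Multiply two matrices given as (row, col, value) entries.

    Entries of B are grouped by row index so that A[i][j] meets B[j][k]
    on the shared index j.
    """
    b_by_row = defaultdict(list)
    for j, k, value in b_entries:
        b_by_row[j].append((k, value))

    product = defaultdict(int)
    for i, j, a_value in a_entries:
        for k, b_value in b_by_row[j]:
            product[(i, k)] += a_value * b_value
    return dict(product)


a = to_entries([[1, 2], [4, 5], [7, 8]])
b = to_entries([[1, 2], [4, 5]])
result = multiply_entries(a, b)
print(result[(0, 0)], result[(2, 1)])  # 9 54
```

Without the index `j` on the rows of the second matrix there would be nothing to join on, which is the gap the indexed conversion in the answer's code fills.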

from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

def as_block_matrix(rdd, rowsPerBlock=1024, colsPerBlock=1024):
    # Attach an explicit index to each row, then convert to a BlockMatrix.
    return IndexedRowMatrix(
        rdd.zipWithIndex().map(lambda xi: IndexedRow(xi[1], xi[0]))
    ).toBlockMatrix(rowsPerBlock, colsPerBlock)

# 3x2 times 2x2 gives a 3x2 BlockMatrix
as_block_matrix(rows_1).multiply(as_block_matrix(rows_2))
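
For the small matrices in the question, the result can be checked against a plain-Python product. The sketch below computes the expected 3x2 result locally; `local_matmul` is a hypothetical helper, not a Spark function. On the Spark side, the `BlockMatrix` returned by `multiply` could be brought back to the driver with `toLocalMatrix()` for comparison, which is only reasonable for small matrices.

```python
def local_matmul(a, b):
    """Reference product of two small matrices given as lists of rows."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("columns of a must match rows of b")
    # zip(*b) iterates over the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


expected = local_matmul([[1, 2], [4, 5], [7, 8]], [[1, 2], [4, 5]])
print(expected)  # [[9, 12], [24, 33], [39, 54]]

# With Spark available, the distributed result could be compared like this
# (assumes `sc`, `rows_1`, `rows_2` and `as_block_matrix` from the answer):
#   product = as_block_matrix(rows_1).multiply(as_block_matrix(rows_2))
#   product.toLocalMatrix().toArray().tolist() == expected
```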
