Using py4j to send matrices from Python to Java as int[][] arrays

Question

I've been using py4j to build a user-friendly Python library around a less user-friendly Java library. For the most part, this has been a breeze, and py4j has been a great tool. However, I've come across a snag when sending matrices between Python and Java.

Specifically, I have a static Java method that takes an integer matrix as its argument:

public class MyClass {
   // ...
   public static MyObject create(int[][] matrix) {
      // ...
   }
}

I'd like to be able to call this from Py4j like so:

def create_java_object(numpy_matrix):
   # <code here checks that numpy_matrix is a (3 x n) integer matrix>
   # ...
   return java_instance.jvm.my.namespace.MyClass.create(numpy_matrix)

This doesn't work, which isn't too surprising, nor does it work if numpy_matrix is first converted to a list of plain Python lists. I had expected that the solution would be to construct a Java array and transfer the data over prior to the function call:

def create_java_object(numpy_matrix):
   # <code here checks that numpy_matrix is a (3 x n) integer matrix>
   # ...
   n = numpy_matrix.shape[1]
   java_matrix = java_instance.new_array(java_instance.jvm.int, 3, n)
   for i in range(n):
      java_matrix[0][i] = int(numpy_matrix[0, i])
      java_matrix[1][i] = int(numpy_matrix[1, i])
      java_matrix[2][i] = int(numpy_matrix[2, i])
   return java_instance.jvm.my.namespace.MyClass.create(java_matrix)

Now, this code runs correctly. However, it requires approximately two minutes to run. The matrices I'm working with, by the way, are on the order of (3 x ~300,000) elements.

Is there a canonical way to do this in Py4j that doesn't require incredible amounts of time just to convert a matrix? I don't mind it taking a second or two, but this is far too slow. If Py4j isn't set up for this kind of communication, is there a Java interop library for Python that is?

Note: The Java library treats the int[][] matrix as an immutable array; i.e., it never attempts to modify it.

Recommended answer

I found a solution that works for this particular case, though it is not terribly elegant:

Py4j supports efficiently passing a Python bytearray object to Java as a byte[] array. I worked around the problem by modifying the original library and my Python code.

The new Java code:

public class MyClass {
   // ...
   public static MyObject create(int[][] matrix) {
      // ...
   }
   public static MyObject createFromPy4j(byte[] data) {
      java.nio.ByteBuffer buf = java.nio.ByteBuffer.wrap(data);
      int n = buf.getInt(), m = buf.getInt();
      int[][] matrix = new int[n][m];
      for (int i = 0; i < n; ++i)
         for (int j = 0; j < m; ++j)
            matrix[i][j] = buf.getInt();
      return MyClass.create(matrix);
   }
}

The new Python code:

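import array
import sys

def create_java_object(numpy_matrix):
   # 'i' is assumed to be a 4-byte C int, matching Java's int.
   header = array.array('i', list(numpy_matrix.shape))
   body = array.array('i', numpy_matrix.flatten().tolist())
   # Java's ByteBuffer reads big-endian by default, so swap on little-endian hosts.
   if sys.byteorder != 'big':
      header.byteswap()
      body.byteswap()
   # tobytes() replaces tostring(), which was removed in Python 3.9.
   buf = bytearray(header.tobytes() + body.tobytes())
   return java_instance.jvm.my.namespace.MyClass.createFromPy4j(buf)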

This runs in a few seconds rather than a few minutes.
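
The byte layout that createFromPy4j reads can be made explicit with the standard struct module: two big-endian 32-bit ints (row count, then column count), followed by the values row by row. The following is only a sketch of that layout, not part of the original answer. The names pack_matrix and unpack_matrix are hypothetical, and unpack_matrix is a Python mirror of the Java decoding loop, included only so the format can be checked without a JVM.

```python
import struct

def pack_matrix(rows):
    """Pack a rectangular list of int rows for createFromPy4j.

    Layout (all big-endian 32-bit ints): row count, column count,
    then the values row by row.
    """
    n = len(rows)
    m = len(rows[0]) if n else 0
    if any(len(r) != m for r in rows):
        raise ValueError("matrix must be rectangular")
    flat = [int(v) for r in rows for v in r]
    header = struct.pack(">ii", n, m)
    body = struct.pack(">%di" % (n * m), *flat)
    return bytearray(header + body)

def unpack_matrix(data):
    """Hypothetical Python mirror of the Java loop, for checking the layout."""
    n, m = struct.unpack_from(">ii", data, 0)
    flat = struct.unpack_from(">%di" % (n * m), data, 8)
    return [list(flat[i * m:(i + 1) * m]) for i in range(n)]

if __name__ == "__main__":
    buf = pack_matrix([[1, 2, 3], [4, 5, 6]])
    print(len(buf), unpack_matrix(buf))
```

Because the ">" format prefix fixes both the byte order and the 4-byte int size, this version needs no explicit byteswap and does not depend on the platform's C int width. With a NumPy array, numpy_matrix.astype('>i4').tobytes() should produce the same body bytes without going through a Python list.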
