Not able to use Hibernate classes in Apache Spark


Question

I am trying to use Apache Spark to do high-speed computation. The results of this computation need to be stored in Oracle.

I am using Hibernate to do this, but certain Hibernate classes such as JDBCTransaction are not serializable: when they are used, they throw java.io.NotSerializableException: org.hibernate.transaction.JDBCTransaction, so the Spark cluster does not support using Hibernate.

Is there any workaround to make Hibernate work with Spark?

I tried using byte-code injection to mark the JDBCTransaction class as serializable, but then it throws a java.lang.IllegalStateException: unread block data exception.

Answer

It sounds like you're trying to create a Transaction object in your driver and execute transactions on RDD partitions. This is why you're getting the serialization exception: Spark is trying to send the transaction object to a remote process, which obviously won't work. Even if you could serialize it, it would not be valid to use the same transaction object in multiple remote, parallel transactions.

If you need to write data to the database in parallel, you should probably look at RDD.foreachPartition(), which would allow you to create a database connection and transaction locally for each separate partition/process; a sketch of this is shown below.
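
A minimal sketch of this approach in Java, assuming Hibernate is configured from a hibernate.cfg.xml on the executors' classpath and that the RDD elements are Hibernate-mapped entities (PartitionWriter and writeResults are hypothetical names, not from the original question):

import java.util.Iterator;
import org.apache.spark.api.java.JavaRDD;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

public class PartitionWriter {

    // Writes each partition of the RDD through its own Hibernate session and
    // transaction; T is assumed to be a serializable, Hibernate-mapped entity.
    public static <T> void writeResults(JavaRDD<T> results) {
        results.foreachPartition((Iterator<T> partition) -> {
            // Everything Hibernate-related is created inside the closure, on the
            // executor, so no Hibernate object ever has to be serialized by Spark.
            SessionFactory sessionFactory =
                    new Configuration().configure().buildSessionFactory();
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                while (partition.hasNext()) {
                    session.save(partition.next());
                }
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            } finally {
                session.close();
                sessionFactory.close();
            }
        });
    }
}

Since building a SessionFactory is expensive, in practice it is usually kept in a static singleton per executor JVM rather than rebuilt for every partition; the sketch rebuilds it each time only to keep the example short.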

If the data you want to write to the database is relatively small, you might instead collect() it, which returns the results as objects local to the driver; you can then write to the database from there (see the sketch below).
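
A corresponding sketch for the collect() approach, under the same assumptions (DriverWriter and writeCollected are hypothetical names):

import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

public class DriverWriter {

    // Collects the (presumed small) RDD to the driver and persists all results
    // in a single Hibernate transaction; T is assumed to be a mapped entity.
    public static <T> void writeCollected(JavaRDD<T> results) {
        List<T> collected = results.collect(); // safe only for small result sets

        SessionFactory sessionFactory =
                new Configuration().configure().buildSessionFactory();
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            for (T entity : collected) {
                session.save(entity);
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
            sessionFactory.close();
        }
    }
}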
