How to Serialize object in hadoop (in HDFS)


Problem Description



I have a HashMap<String, ArrayList<Integer>>. I want to serialize my HashMap object (hmap) to an HDFS location and later deserialize it in the Mapper and Reducers to use it.

To serialize my HashMap object to HDFS I used standard Java object serialization, as follows, but got a permission-denied error:

try
{
    // Note: FileOutputStream writes to the local filesystem of the node
    // running this code, not to HDFS.
    FileOutputStream fileOut = new FileOutputStream("hashmap.ser");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    out.writeObject(hm);
    out.close();
}
catch (Exception e)
{
    e.printStackTrace();
}

I got the following exception:

java.io.FileNotFoundException: hashmap.ser (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
    at KMerIndex.createIndex(KMerIndex.java:121)
    at MyDriverClass.formRefIndex(MyDriverClass.java:717)
    at MyDriverClass.main(MyDriverClass.java:768)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Can someone please suggest or share sample code for serializing an object to HDFS in Hadoop?

Solution

Please try using SerializationUtils from Apache Commons Lang.

Below are its methods:

static Object   clone(Serializable object)  //Deep clone an Object using serialization.
static Object   deserialize(byte[] objectData) //Deserializes a single Object from an array of bytes.
static Object   deserialize(InputStream inputStream)  //Deserializes an Object from the specified stream.
static byte[]   serialize(Serializable obj) //Serializes an Object to a byte array for storage/serialization.
static void serialize(Serializable obj, OutputStream outputStream) //Serializes an Object to the specified stream.

When storing to HDFS, you can write the byte[] returned by serialize. When reading the object back, you can type-cast the deserialized result to the corresponding class (for example, your HashMap) and get it back.
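For example, here is a minimal sketch of that idea (my own illustration, not code from the original answer), combining SerializationUtils with the Hadoop FileSystem API; the HDFS path /user/me/hashmap.ser and the sample map contents are placeholders:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

import org.apache.commons.lang.SerializationUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSerializeExample {
    public static void main(String[] args) throws Exception {
        HashMap<String, ArrayList<Integer>> hmap = new HashMap<String, ArrayList<Integer>>();
        hmap.put("key", new ArrayList<Integer>(Arrays.asList(1, 2, 3))); // placeholder data

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/me/hashmap.ser"); // placeholder HDFS path

        // Write: serialize the map to a byte[] and stream it into HDFS.
        byte[] bytes = SerializationUtils.serialize(hmap);
        FSDataOutputStream out = fs.create(path, true); // true = overwrite
        out.write(bytes);
        out.close();

        // Read: deserialize straight from the HDFS input stream
        // and cast back to the original type.
        FSDataInputStream in = fs.open(path);
        @SuppressWarnings("unchecked")
        HashMap<String, ArrayList<Integer>> restored =
                (HashMap<String, ArrayList<Integer>>) SerializationUtils.deserialize(in);
        in.close();

        System.out.println(restored);
    }
}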

In my case, I stored a HashMap in an HBase column and retrieved it back, as-is, as a HashMap in my mapper method, and that worked successfully.

You can surely do the same thing here.

Another option is to use Apache Commons IO (org.apache.commons.io.FileUtils), but you will then need to copy the resulting file to HDFS yourself, since you want HDFS as the datastore.

FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray);
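A short sketch of that two-step approach (again my own illustration; the local and HDFS paths are placeholders, and myByteArray would come from SerializationUtils.serialize):

import java.io.File;

import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfsExample {
    public static void main(String[] args) throws Exception {
        byte[] myByteArray = new byte[] { 1, 2, 3 }; // e.g. from SerializationUtils.serialize(...)

        // Step 1: write the bytes to the local filesystem.
        File localFile = new File("/tmp/hashmap.ser"); // placeholder local path
        FileUtils.writeByteArrayToFile(localFile, myByteArray);

        // Step 2: copy the local file into HDFS.
        FileSystem fs = FileSystem.get(new Configuration());
        fs.copyFromLocalFile(new Path(localFile.getAbsolutePath()),
                new Path("/user/me/hashmap.ser")); // placeholder HDFS path
    }
}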

Note: both jars, Apache Commons IO and Apache Commons Lang, are always available on a Hadoop cluster.
