Example and more explanation about LoadFunc


Question

Where can I find more information and examples about LoadFunc? Apart from http://web.archive.org/web/20130701024312/http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html, I don't see any examples that use the new LoadFunc APIs. Can anyone tell me where I can find some examples of writing a Load UDF?

Recommended answer

As of 0.7.0, Pig loaders extend the LoadFunc abstract class. This means they need to override four methods:

  • getInputFormat(): returns to the caller an instance of the InputFormat that the loader supports. The actual load process needs an instance to use at load time, and places no constraints on how that instance is created.

  • prepareToRead(): called before a split is read. It passes in the reader used while reading the split, as well as the split itself. A loader implementation usually keeps the reader, and can access the split if it needs to.

  • setLocation(): Pig calls this to communicate the load location to the loader, which is responsible for passing that information on to the underlying InputFormat object. This method can be called multiple times, so no state should be associated with it (unless that state is reset each time the method is called).

  • getNext(): Pig calls this to get the next tuple from the loader once all setup has been done. If this method returns null, Pig assumes that all the data in the split passed via prepareToRead() has been processed.
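The call order described above can be sketched in plain Java. This is only an illustration of the contract, not Pig's API: SketchLoadFunc, SketchInputFormat, CommaLoader and readSplit are hypothetical stand-ins so that the example runs without Pig or Hadoop on the classpath. In real Pig the same four methods work with Hadoop types (an InputFormat, a Job, a RecordReader with a PigSplit, and a Tuple), and the framework rather than user code drives the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LoadFuncLifecycleSketch {

    /** Hypothetical stand-in for a Hadoop InputFormat: it creates a reader for one split. */
    interface SketchInputFormat {
        Iterator<String> createReader(List<String> splitLines);
    }

    /** Hypothetical stand-in for Pig's LoadFunc; the real methods take Hadoop types. */
    abstract static class SketchLoadFunc {
        abstract SketchInputFormat getInputFormat();
        abstract void setLocation(String location);
        abstract void prepareToRead(Iterator<String> reader);
        abstract List<String> getNext();
    }

    /** A toy loader that turns each comma-separated line into one "tuple". */
    static class CommaLoader extends SketchLoadFunc {
        private String location;
        private Iterator<String> reader;

        @Override
        SketchInputFormat getInputFormat() {
            // The loader decides how its input format instance is created.
            return splitLines -> splitLines.iterator();
        }

        @Override
        void setLocation(String location) {
            // Plain overwrite, so calling this several times is harmless.
            this.location = location;
        }

        @Override
        void prepareToRead(Iterator<String> reader) {
            // Keep the reader; getNext() pulls records from it.
            this.reader = reader;
        }

        @Override
        List<String> getNext() {
            // Returning null signals that the split is exhausted.
            if (reader == null || !reader.hasNext()) {
                return null;
            }
            return Arrays.asList(reader.next().split(","));
        }
    }

    /** Plays the role of the framework: drives the loader over a single split. */
    static List<List<String>> readSplit(SketchLoadFunc loader, String location,
                                        List<String> splitLines) {
        loader.setLocation(location);
        loader.setLocation(location); // may legitimately be called more than once
        Iterator<String> reader = loader.getInputFormat().createReader(splitLines);
        loader.prepareToRead(reader);

        List<List<String>> tuples = new ArrayList<>();
        List<String> tuple;
        while ((tuple = loader.getNext()) != null) {
            tuples.add(tuple);
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<List<String>> tuples = readSplit(
                new CommaLoader(), "/data/input.txt", Arrays.asList("a,1", "b,2"));
        System.out.println(tuples);
    }
}
```

The point of the sketch is the division of labour: the loader supplies the input format and remembers the reader it is handed, while the caller owns the loop and stops at the first null.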

Here are a few nice articles on writing a custom load function for Pig:
