Insert rows into HBase from a Storm bolt

Question

I would like to write new entries into HBase from a distributed (not local) Storm topology. A few GitHub projects provide either HBase mappers or pre-made Storm bolts that write tuples into HBase. These projects provide instructions for running their samples on a LocalCluster.

The problem I am running into, both with these projects and when accessing the HBase API directly from the bolt, is that they all require the hbase-site.xml file to be on the classpath. With the direct API approach, and perhaps with the GitHub projects as well, calling HBaseConfiguration.create() tries to find the information it needs from an entry on the classpath.

How can I modify the classpath for the Storm bolts to include the HBase configuration file?

Update: Using danehammer's answer, this is how I got it working.

Copy the following files into your ~/.storm directory:

  • hbase-common-0.98.0.2.1.2.0-402-hadoop2.jar
  • hbase-site.xml
  • storm.yaml. NOTE: if you do not copy storm.yaml into that directory, the storm jar command will NOT put that directory on the classpath. You can see this logic for yourself in the storm.py Python script; it would be nice if this were documented.

Next, in your topology class's main method, get the HBase Configuration and serialize it:

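// Runs on the submitting machine, where ~/.storm (and hbase-site.xml) is on the classpath.
final Configuration hbaseConfig = HBaseConfiguration.create();
final DataOutputBuffer databufHbaseConfig = new DataOutputBuffer();
hbaseConfig.write(databufHbaseConfig);
// getData() returns the whole backing array; only the first getLength() bytes are valid.
// Arrays is java.util.Arrays.
final byte[] baHbaseConfigSerialized = Arrays.copyOf(databufHbaseConfig.getData(), databufHbaseConfig.getLength());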

Pass the byte array to your spout class through its constructor. The spout stores the byte array in a field. Do not deserialize it in the constructor: I found that if the spout has a Configuration field, you get a "cannot serialize" exception when running the topology.
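The reason for shipping bytes rather than a Configuration field can be reproduced without Storm or HBase. The following is a minimal, JDK-only sketch under stated assumptions: FakeConfig is a hypothetical stand-in for Hadoop's Configuration (not java.io.Serializable, but able to write itself to a DataOutput and read itself back), SketchSpout is a hypothetical stand-in for a spout, and ship() imitates the Java serialization step that moves the spout object to a worker. None of these names come from Storm or HBase, and the host name is a placeholder.

```java
import java.io.*;
import java.util.Map;
import java.util.TreeMap;

public class ConfigShippingSketch {

    /** Hypothetical stand-in for Hadoop's Configuration: NOT java.io.Serializable. */
    public static class FakeConfig {
        public final Map<String, String> props = new TreeMap<>();

        public void write(DataOutput out) throws IOException {
            out.writeInt(props.size());
            for (Map.Entry<String, String> e : props.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }

        public void readFields(DataInput in) throws IOException {
            props.clear();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                props.put(key, in.readUTF());
            }
        }
    }

    /** Hypothetical stand-in for a spout: it carries only bytes until open() runs. */
    public static class SketchSpout implements Serializable {
        private static final long serialVersionUID = 1L;
        private final byte[] configBytes;     // a byte[] is safe to serialize
        private transient FakeConfig config;  // skipped by serialization, rebuilt in open()

        public SketchSpout(byte[] configBytes) {
            this.configBytes = configBytes;
        }

        public void open() throws IOException {
            config = new FakeConfig();
            config.readFields(new DataInputStream(new ByteArrayInputStream(configBytes)));
        }

        public String get(String key) {
            return config.props.get(key);
        }
    }

    /** Serializes the config to exactly the bytes that were written. */
    public static byte[] toBytes(FakeConfig c) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        c.write(new DataOutputStream(bos));
        return bos.toByteArray();
    }

    /** Imitates shipping an object to a worker: Java-serialize it, then read it back. */
    public static <T extends Serializable> T ship(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) ois.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        FakeConfig submitSide = new FakeConfig();
        submitSide.props.put("hbase.zookeeper.quorum", "zk1.example.com"); // placeholder host

        SketchSpout onWorker = ship(new SketchSpout(toBytes(submitSide)));
        onWorker.open(); // the config object only comes into existence here
        System.out.println(onWorker.get("hbase.zookeeper.quorum"));
    }
}
```

If the transient FakeConfig field were instead a regular field populated in the constructor, ship() would throw java.io.NotSerializableException, which matches the failure described above.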

In the spout's open method, deserialize the config and access the HBase table:

Configuration hBaseConfiguration = new Configuration();
ByteArrayInputStream bas = new ByteArrayInputStream(baHbaseConfigSerialized);
hBaseConfiguration.readFields(new DataInputStream(bas));
HTable tbl = new HTable(hBaseConfiguration, HBASE_TABLE_NAME);

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("YOUR_COLUMN"));

scnrTbl = tbl.getScanner(scan);

Now, in your nextTuple method, you can use the scanner to get the next row:

Result rsltWaveform = scnrTbl.next();

Extract what you want from the result, and pass those values to the bolts in some serializable object.

Recommended answer

When you deploy a topology with the "storm jar" command, the ~/.storm folder is on the classpath (the Storm command-line client documentation describes this under the jar command). If you place the hbase-site.xml file (or related *-site.xml files) in that folder, HBaseConfiguration.create() will find the file while "storm jar" runs and correctly return an org.apache.hadoop.conf.Configuration. That configuration then needs to be stored and serialized within your topology so that it is distributed around the cluster.
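To confirm that the file is actually visible before relying on HBaseConfiguration.create(), you can ask the class loader for it directly, since a classpath resource lookup is how such a file gets found. The following is a small hypothetical diagnostic using only the JDK; the class name ClasspathCheck is an assumption for illustration and is not part of Storm or HBase.

```java
import java.net.URL;

public class ClasspathCheck {

    /** Returns the URL of the named classpath resource, or null if it is not visible. */
    public static URL find(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the loader that loaded this class.
            cl = ClasspathCheck.class.getClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "hbase-site.xml";
        URL url = find(name);
        System.out.println(url == null
                ? name + " is NOT on the classpath"
                : name + " found at " + url);
    }
}
```

Calling ClasspathCheck.find("hbase-site.xml") at the top of the topology's main method, and failing fast when it returns null, makes a missing ~/.storm entry obvious at submission time rather than later on a worker.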
