Making a storage plugin on Apache Drill for HDFS

Question

I'm trying to make a storage plugin for Hadoop (HDFS) and Apache Drill. Actually I'm confused: I don't know what to set as the port for the hdfs:// connection, or what to set for the location. This is my plugin:

{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://localhost:54310",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": ["tbl"],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": ["tsv"],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    }
  }
}

So, is it correct to set localhost:54310? I got that value with the command:

 hdfs getconf -nnRpcAddresses

Or should it be :8020?

Second question: what do I need to set for the location? My Hadoop folder is in:

/usr/local/hadoop

and there you can find /etc, /bin, /lib, /log ... So do I need to set the location on my datanode, or what?

Third question: when I'm connecting to Drill, I go through sqlline and then connect to my ZooKeeper, like:

  !connect jdbc:drill:zk=localhost:2181 

My question here is: after I make the storage plugin, and when I connect to Drill with zk, can I query HDFS files?

I'm very sorry if this is a noob question, but I haven't found anything useful on the internet, or at least it hasn't helped me. If you can explain some of this to me, I'll be very grateful.

Answer

In "connection", put the namenode server address.

If you are not sure about this address, check the fs.default.name or fs.defaultFS property in core-site.xml.
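As a minimal, standalone sketch of that lookup, the snippet below extracts the namenode URI that belongs in the plugin's "connection" field. On a real install you would read the core-site.xml under your Hadoop config directory (e.g. /usr/local/hadoop/etc/hadoop); here a sample file is written to a temp directory so the extraction logic can be demonstrated as-is.

```shell
# Create a sample core-site.xml in a temp dir (stand-in for the real config).
DEMO_CONF=$(mktemp -d)
cat > "$DEMO_CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF

# Print the <value> that follows an fs.defaultFS (or legacy fs.default.name) <name>.
grep -A1 -E 'fs\.defaultFS|fs\.default\.name' "$DEMO_CONF/core-site.xml" \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
```

This prints hdfs://localhost:54310, which is exactly the string to paste into "connection".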

Coming to "workspaces",

you can save workspaces in this. For example, take a workspace named root with the location /user/root/drill. This is your HDFS location.

If you have files under the /user/root/drill HDFS directory, you can query them using this workspace name.
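In the plugin JSON, such a workspace entry could look like the fragment below (a sketch; the path and the writable flag are illustrative and should match your own HDFS layout):

```json
"workspaces": {
  "root": {
    "location": "/user/root/drill",
    "writable": true,
    "defaultInputFormat": null
  }
}
```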

Example: abc.csv is under this directory.

 select * from dfs.root.`abc.csv`


After successfully creating the plugin, you can start Drill and start querying.

You can query any directory, irrespective of workspaces.

Say you want to query employee.json in the /tmp/data HDFS directory.

The query would be:

select * from dfs.`/tmp/data/employee.json`
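For that query to return rows, the file has to exist at that HDFS path first. A hypothetical staging sequence (requires a running HDFS; employee.json here is just the example file from above) might look like:

```
hdfs dfs -mkdir -p /tmp/data
hdfs dfs -put employee.json /tmp/data/
hdfs dfs -ls /tmp/data
```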
