Using Hadoop through a SOCKS proxy?

Question

So our Hadoop cluster runs on some nodes and can only be accessed from these nodes. You SSH into them and do your work.

Since that is quite annoying, but (understandably) nobody will even go near trying to configure access control so that it may be usable from outside for some, I'm trying the next best thing, i.e. using SSH to run a SOCKS proxy into the cluster:

$ ssh -D localhost:10000 the.gateway cat
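(Independent of Hadoop, the tunnel itself can be sanity-checked by opening a plain TCP connection through it using java.net's built-in SOCKS support. In the sketch below the namenode host and port are placeholders and the class name is made up:)

import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;

public class SocksTunnelCheck {
  public static void main(String[] args) throws Exception {
    // Route a plain TCP connection through the local SOCKS tunnel opened by ssh -D.
    Proxy socks = new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("localhost", 10000));
    try (Socket socket = new Socket(socks)) {
      // Placeholder namenode address; this only checks that the port is reachable via the proxy.
      socket.connect(new InetSocketAddress("reachable.from.behind.proxy", 1234), 5000);
      System.out.println("Connected to the namenode port through the SOCKS tunnel.");
    }
  }
}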

There are whispers of SOCKS support (naturally I haven't found any documentation), and apparently that goes into core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://reachable.from.behind.proxy:1234/</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>reachable.from.behind.proxy:5678</value>
</property>
<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
<property>
  <name>hadoop.socks.server</name>
  <value>localhost:10000</value>
</property>

Except hadoop fs -ls / still fails, without any mention of SOCKS.
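(A standalone client doing the equivalent listing, with the placeholder host names, ports and class name below, would look roughly like this; running it directly at least shows the full stack trace when the connection fails:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SocksHdfsLs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same values as in core-site.xml above (placeholders).
    conf.set("fs.default.name", "hdfs://reachable.from.behind.proxy:1234/");
    conf.set("hadoop.rpc.socket.factory.class.default",
        "org.apache.hadoop.net.SocksSocketFactory");
    conf.set("hadoop.socks.server", "localhost:10000");

    // Roughly what "hadoop fs -ls /" does: list the HDFS root.
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}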

Any hints?

I'm only trying to run jobs, not administer the cluster. I only need to access HDFS and submit jobs, through SOCKS (it seems there's an entirely separate thing about using SSL/Proxies between the cluster nodes etc; I don't want that, my machine shouldn't be part of the cluster, just a client.)

Is there any useful documentation on that? To illustrate my failure to turn up anything useful: I found the configuration values by running the hadoop client through strace -f and checking out the configuration files it read.

Is there a description anywhere of which configuration values it even reacts to? (I have literally found zero reference documentation, just differently outdated tutorials; I hope I've been missing something?)

Is there a way to dump the configuration values it is actually using?
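(One thing that should work for this, assuming only the standard org.apache.hadoop.conf.Configuration API: a Configuration is iterable over its effective key/value pairs, so a tiny program can print what the client actually ends up with.)

import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class DumpEffectiveConf {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml (and friends) from the classpath, just as the hadoop CLI would.
    Configuration conf = new Configuration();
    for (Map.Entry<String, String> entry : conf) {
      System.out.println(entry.getKey() + " = " + entry.getValue());
    }
    // Alternatively: conf.writeXml(System.out) dumps the merged configuration as XML.
  }
}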

Answer

https://issues.apache.org/jira/browse/HADOOP-1822

But this article also notes that you have to change the socket class to SOCKS

http://rainerpeter.wordpress.com/2014/02/12/connect-to-hdfs-using-a-proxy/

using:

<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>

Note that the properties go in different files:

  1. fs.default.name, hadoop.socks.server and hadoop.rpc.socket.factory.class.default need to go into core-site.xml
  2. mapred.job.tracker and mapred.job.tracker.http.address need to go into mapred-site.xml (for the map-reduce configuration)
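
For the map-reduce half, a small smoke test against the JobTracker address from the question, using the old org.apache.hadoop.mapred client API, might look like the sketch below (placeholder host and port; not a confirmed recipe):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JobTrackerSmokeTest {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml and mapred-site.xml from the classpath, including the
    // SOCKS socket factory and hadoop.socks.server settings discussed above.
    JobConf conf = new JobConf();
    conf.set("mapred.job.tracker", "reachable.from.behind.proxy:5678"); // placeholder

    JobClient client = new JobClient(conf);
    System.out.println("Task trackers visible through the proxy: "
        + client.getClusterStatus().getTaskTrackers());
    client.close();
  }
}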
