Is it possible to run Hadoop in Pseudo-Distributed operation without HDFS?

Problem description

I'm exploring the options for running a hadoop application on a local system.

As with many applications the first few releases should be able to run on a single node, as long as we can use all the available CPU cores (Yes, this is related to this question). The current limitation is that on our production systems we have Java 1.5 and as such we are bound to Hadoop 0.18.3 as the latest release (See this question). So unfortunately we can't use this new feature yet.

The first option is to simply run hadoop in pseudo distributed mode. Essentially: create a complete hadoop cluster with everything on it running on exactly 1 node.

The "downside" of this form is that it also uses a full fledged HDFS. This means that in order to process the input data this must first be "uploaded" onto the DFS ... which is locally stored. So this takes additional transfer time of both the input and output data and uses additional disk space. I would like to avoid both of these while we stay on a single node configuration.

So I was thinking: Is it possible to override the "fs.hdfs.impl" setting and change it from "org.apache.hadoop.dfs.DistributedFileSystem" into (for example) "org.apache.hadoop.fs.LocalFileSystem"?

If this works the "local" hadoop cluster (which can ONLY consist of ONE node) can use existing files without any additional storage requirements and it can start quicker because there is no need to upload the files. I would expect to still have a job and task tracker and perhaps also a namenode to control the whole thing.

Has anyone tried this before? Can it work or is this idea much too far off the intended use?

Or is there a better way of getting the same effect: Pseudo-Distributed operation without HDFS?

Thanks for your insights.

Edit 2:

This is the conf/hadoop-site.xml config I created for Hadoop 0.18.3 using the answer provided by bajafresh4life.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- Point the default filesystem at the local filesystem instead of HDFS -->
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>

  <!-- The host and port the jobtracker listens on -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:33301</value>
  </property>

  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>localhost:33302</value>
    <description>
    The job tracker http server address and port the server will listen on.
    If the port is 0 then the server will start on a free port.
    </description>
  </property>

  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>localhost:33303</value>
    <description>
    The task tracker http server address and port.
    If the port is 0 then the server will start on a free port.
    </description>
  </property>

</configuration>

Recommended answer

Yes, this is possible, although I'm using 0.19.2. I'm not too familiar with 0.18.3, but I'm pretty sure it shouldn't make a difference.

Just make sure that fs.default.name is set to the default (which is file:///), and mapred.job.tracker is set to point to where your jobtracker is hosted. Then start up your daemons using bin/start-mapred.sh. You don't need to start up the namenode or datanodes. At this point you should be able to run your map/reduce jobs using bin/hadoop jar ...
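
A minimal sketch of that sequence (the jar name, main class, and input/output paths below are placeholders):

# start only the MapReduce daemons; no namenode or datanodes are needed
bin/start-mapred.sh
# input and output paths now refer to the local filesystem directly
bin/hadoop jar myjob.jar com.example.MyJob /local/input /local/output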

We've used this configuration to run Hadoop over a small cluster of machines using a Netapp appliance mounted over NFS.
