Is it possible to save files in Hadoop without saving them in the local file system?


Question

Is it possible to save files in Hadoop without first saving them in the local file system? I would like to do something like the code below, but save the file directly in HDFS. At the moment I save files in the documents directory, and only then can I copy them into HDFS, for instance with hadoop fs -put.

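from subprocess import run

from django.core.files.storage import FileSystemStorage
from rest_framework.generics import GenericAPIView
from rest_framework.response import Response


class DataUploadView(GenericAPIView):

    def post(self, request):
        myfile = request.FILES['photo']
        # Step 1: save the upload to the local documents/ directory.
        fs = FileSystemStorage(location='documents/')
        filename = fs.save(myfile.name, myfile)
        local_path = fs.path(filename)  # absolute path of the file actually written
        hdfs_path = '/user/user1/' + filename
        # Step 2: copy the local file into HDFS (an argument list must not be combined with shell=True).
        run(['hadoop', 'fs', '-put', local_path, hdfs_path], check=True)
        return Response({'hdfs_path': hdfs_path})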


Recommended answer

Hadoop has REST APIs that allow you to create files via WebHDFS.

So you could write your own create function on top of the REST API, using a Python library such as requests for the HTTP calls. However, several Python libraries already support Hadoop/HDFS, either through the REST API or through the RPC mechanism via libhdfs:


  • pydoop
  • hadoopy
  • snakebite
  • pywebhdfs
  • hdfscli
  • pyarrow
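
As a rough illustration of the do-it-yourself route, the sketch below follows the two-step CREATE exchange described in the WebHDFS documentation: a first PUT to the NameNode that carries no data and answers with a 307 redirect, then a second PUT of the file contents to the DataNode URL given in the Location header. The function names, the session parameter (expected to be a requests.Session) and the host names are assumptions made for illustration; they are not part of the original answer.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(namenode, hdfs_path, user, overwrite=False):
    """Build the WebHDFS CREATE URL for hdfs_path on the given NameNode."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    return f"{namenode.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


def webhdfs_create(session, namenode, hdfs_path, fileobj, user, overwrite=False):
    """Stream fileobj into HDFS using the two-step WebHDFS CREATE exchange.

    session is assumed to behave like requests.Session.
    """
    # Step 1: ask the NameNode where to write. Send no data and do not
    # follow the redirect automatically.
    first = session.put(
        webhdfs_create_url(namenode, hdfs_path, user, overwrite),
        allow_redirects=False,
    )
    if first.status_code != 307:
        raise RuntimeError(f"expected a 307 redirect, got {first.status_code}")

    # Step 2: send the file contents to the DataNode named in Location.
    second = session.put(first.headers["Location"], data=fileobj)
    second.raise_for_status()
    return second.status_code
```

With a real cluster, a call might look like webhdfs_create(requests.Session(), "http://namenode.example.com:9870", "/user/user1/photo.jpg", request.FILES['photo'], user="user1"). The host is hypothetical; 9870 is the usual NameNode HTTP port on Hadoop 3 (50070 on Hadoop 2). Note that the redirect points at a DataNode, so the DataNode host names must be reachable from the machine running this code.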

Whichever library you choose, look for its call that creates or writes a file directly, rather than one that shells out to hdfs dfs -put or hadoop fs -put.
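
To make this concrete for the view in the question, here is a minimal sketch that assumes the hdfscli package (imported as hdfs), whose Client.write method accepts a generator and streams it over WebHDFS. The helper name save_upload_to_hdfs and the default target directory are hypothetical; the client object is passed in so that the helper itself depends only on the write call.

```python
import posixpath


def save_upload_to_hdfs(client, uploaded_file, hdfs_dir="/user/user1"):
    """Write an uploaded file straight to HDFS and return its HDFS path.

    client is assumed to be an hdfscli client (for example InsecureClient);
    uploaded_file is assumed to be a Django UploadedFile.
    """
    # Keep only the base name so a crafted file name cannot escape hdfs_dir.
    hdfs_path = posixpath.join(hdfs_dir, posixpath.basename(uploaded_file.name))
    # chunks() yields the upload piece by piece, so nothing is written to
    # the local file system by this helper.
    client.write(hdfs_path, data=uploaded_file.chunks(), overwrite=True)
    return hdfs_path
```

Inside the post method this could replace the FileSystemStorage and hadoop fs -put steps, for example with client = InsecureClient("http://namenode.example.com:9870", user="user1") followed by save_upload_to_hdfs(client, request.FILES['photo']). The NameNode address is a placeholder. Django may still spool large uploads to a temporary file on its own before the view runs; that depends on its upload handler settings rather than on this code.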

See the following for more information:

  • pydoop vs hadoopy - hadoop python client
  • List all files in HDFS Python without pydoop
  • A Guide to Python Frameworks for Hadoop
  • Native Hadoop file system (HDFS) connectivity in Python
  • PyArrow
  • https://github.com/pywebhdfs/pywebhdfs
  • https://github.com/spotify/snakebite
  • https://crs4.github.io/pydoop/api_docs/hdfs_api.html
  • https://hdfscli.readthedocs.io/en/latest/
  • WebHDFS REST API: Create and Write to a File
