Python write to HDFS file


Problem description

What is the best way to create/write/update a file in remote HDFS from a local Python script?

I am able to list files and directories, but writing seems to be a problem.

I have searched hdfs and snakebite, but neither of them gives a clean way to do this.

Recommended answer

Try the HDFS library; it is really good. You can use write(): https://hdfscli.readthedocs.io/en/latest/api.html#hdfs.client.Client.write

Example:

Create a connection:

from hdfs import InsecureClient
client = InsecureClient('http://host:port', user='ann')

from json import dump, dumps
records = [
  {'name': 'foo', 'weight': 1},
  {'name': 'bar', 'weight': 2},
]

# As a context manager:
with client.write('data/records.jsonl', encoding='utf-8') as writer:
  dump(records, writer)

# Or, passing the serialized string in directly:
client.write('data/records.jsonl', data=dumps(records), encoding='utf-8')
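The question also asks about updating an existing file. Client.write() accepts an append=True flag for that. A minimal sketch wrapped in a helper (the URL, user, and path are placeholders, and it assumes the hdfs package is installed and a WebHDFS endpoint is reachable):

```python
def append_records(url, path, lines, user='ann'):
    """Append newline-delimited records to an existing HDFS file.

    Requires `pip install hdfs` and a live cluster with append
    support enabled; nothing is written without one.
    """
    from hdfs import InsecureClient  # imported lazily so the helper can be defined anywhere
    client = InsecureClient(url, user=user)
    # append=True adds to the end of the file instead of overwriting it
    client.write(path, data='\n'.join(lines) + '\n',
                 encoding='utf-8', append=True)
```

The same call with overwrite=True instead of append=True replaces the file, which covers the "update" case when you want to rewrite it wholesale.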

For CSV, you can:

import pandas as pd

df = pd.read_csv("file.csv")
with client.write('path/output.csv', encoding='utf-8') as writer:
  df.to_csv(writer)
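The writer yielded by client.write() is just a file-like object, so df.to_csv(writer) behaves the same as writing to any open file. A local sketch with io.StringIO standing in for the HDFS writer (no cluster needed; the two-row DataFrame is made up for illustration):

```python
import io

import pandas as pd

df = pd.DataFrame([{'name': 'foo', 'weight': 1},
                   {'name': 'bar', 'weight': 2}])

# StringIO plays the role of the writer yielded by client.write(...)
buf = io.StringIO()
df.to_csv(buf, index=False)

print(buf.getvalue())
# name,weight
# foo,1
# bar,2
```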

