Change block size of dfs file

This article describes how to change the block size of a DFS file; the question and recommended answer below may be a useful reference for readers facing the same problem.

Problem description

My map is currently inefficient when parsing one particular set of files (a total of 2 TB). I'd like to change the block size of those files in the Hadoop DFS (from 64 MB to 128 MB). I can't find anything in the documentation about doing this for only one set of files rather than the entire cluster.

Which command changes the block size when I upload? (For example, when copying from the local filesystem to the DFS.)

Recommended answer

I've changed my answer! You just need to set the fs.local.block.size configuration setting appropriately when you use the command line:

hadoop fs -D fs.local.block.size=134217728 -put local_name remote_location
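Note that the property name controlling the HDFS block size of newly written files depends on the Hadoop version: dfs.block.size on 1.x and dfs.blocksize on 2.x and later. A variant of the command above, assuming a 2.x or newer cluster, would be:

hadoop fs -D dfs.blocksize=134217728 -put local_name remote_location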

Original answer

You can programmatically specify the block size when you create a file with the Hadoop API. Unfortunately, you can't do this on the command line with the hadoop fs -put command. To do what you want, you'll have to write your own code to copy the local file to the remote location; it's not hard: open a FileInputStream for the local file, create the remote OutputStream with FileSystem.create, and then use something like IOUtils.copy from Apache Commons IO to copy between the two streams.
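The following is a minimal sketch of that approach, assuming Hadoop's FileSystem API and Apache Commons IO are on the classpath; the class name CopyWithBlockSize and the 128 MB target size are illustrative, not part of the original answer.

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyWithBlockSize {
    public static void main(String[] args) throws Exception {
        String localPath = args[0];   // local file to upload
        String remotePath = args[1];  // destination path in the DFS

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        long blockSize = 128L * 1024 * 1024;              // 128 MB, per-file override
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);
        short replication = fs.getDefaultReplication();   // keep the cluster default

        // Open the local file, create the remote file with the desired block size,
        // and copy between the two streams.
        try (InputStream in = new FileInputStream(localPath);
             FSDataOutputStream out = fs.create(new Path(remotePath),
                     true,            // overwrite if it already exists
                     bufferSize,
                     replication,
                     blockSize)) {
            IOUtils.copy(in, out);    // Apache Commons IO
        }
    }
}

With that compiled into a jar, running something like hadoop jar copy-tool.jar CopyWithBlockSize local_name remote_location uploads the file with 128 MB blocks regardless of the cluster-wide default.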

