Change block size of dfs file
Problem Description
My map phase is currently inefficient when parsing one particular set of files (2 TB in total). I'd like to change the block size of these files in the Hadoop DFS from 64 MB to 128 MB, but I can't find anything in the documentation about doing this for just one set of files rather than the entire cluster.
Which command changes the block size when I upload (for example, when copying from local to the DFS)?
Recommended Answer
I've changed my answer! You just need to set the fs.local.block.size configuration setting appropriately when you use the command line:
hadoop fs -D fs.local.block.size=134217728 -put local_name remote_location
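If fs.local.block.size has no effect on your Hadoop version, the property that controls the per-file HDFS block size is dfs.block.size (renamed dfs.blocksize in Hadoop 2.x and later), and it can be passed the same way, for example:
hadoop fs -D dfs.blocksize=134217728 -put local_name remote_location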
Original Answer
You can programmatically specify the block size when you create a file with the Hadoop API. Unfortunately, you can't do this on the command line with the hadoop fs -put command. To do what you want, you'll have to write your own code to copy the local file to a remote location; it's not hard: open a FileInputStream for the local file, create the remote OutputStream with FileSystem.create, and then use something like IOUtils.copy from Apache Commons IO to copy between the two streams.
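A minimal sketch of that approach, assuming the Hadoop client and Apache Commons IO are on the classpath; the local_name / remote_location paths and the 128 MB block size are placeholders to adjust:

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutWithBlockSize {
    public static void main(String[] args) throws Exception {
        // Placeholder paths: point these at your own local file and HDFS destination.
        String localFile = "local_name";
        Path remotePath = new Path("remote_location");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // FileSystem.create lets you choose the block size for this one file:
        // create(path, overwrite, bufferSize, replication, blockSize)
        long blockSize = 128L * 1024 * 1024; // 128 MB
        short replication = fs.getDefaultReplication(remotePath);

        try (InputStream in = new FileInputStream(localFile);
             OutputStream out = fs.create(remotePath, true, 4096, replication, blockSize)) {
            // Stream the local file into the newly created HDFS file.
            IOUtils.copy(in, out);
        }
    }
}

Compile it against the Hadoop client libraries and run it with hadoop jar (or plain java with the Hadoop classpath) so the cluster configuration is picked up.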