How do I copy data from one HDFS to another HDFS?
Problem description
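I have two HDFS setups and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How do I copy data from one HDFS to another? Is it possible with Sqoop or some other command-line tool?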
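DistCp (distributed copy) is a tool for copying data between clusters. It uses MapReduce to handle distribution, error handling and recovery, and reporting. It expands a list of files and directories into the input for map tasks, and each map task copies one partition of the files specified in the source list.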
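To make the "partition of the files" idea concrete, here is a simplified sketch that splits a flat file listing into a fixed number of groups, one per map task, balancing total bytes greedily (largest file first). It is only an illustration under that assumption. DistCp's real listing and split logic is internal to Hadoop and differs in detail. The names `SourceFile` and `partition_for_maps` and the sample paths are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceFile:
    path: str
    size: int  # file size in bytes


def partition_for_maps(files: List[SourceFile], num_maps: int) -> List[List[SourceFile]]:
    """Split a file listing into num_maps groups of roughly equal total size.

    Greedy largest-first: each file goes to the group with the fewest bytes so far.
    Illustration only; this is not DistCp's actual split algorithm.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    groups: List[List[SourceFile]] = [[] for _ in range(num_maps)]
    totals = [0] * num_maps
    for sf in sorted(files, key=lambda sf: sf.size, reverse=True):
        i = totals.index(min(totals))  # index of the lightest group so far
        groups[i].append(sf)
        totals[i] += sf.size
    return groups


if __name__ == "__main__":
    # Hypothetical listing: two table directories with two part files each.
    listing = [
        SourceFile("/data/t1/part-0", 400),
        SourceFile("/data/t1/part-1", 300),
        SourceFile("/data/t2/part-0", 200),
        SourceFile("/data/t2/part-1", 100),
    ]
    for n, group in enumerate(partition_for_maps(listing, 2)):
        print(f"map {n}: {[sf.path for sf in group]} ({sum(sf.size for sf in group)} bytes)")
```

Each inner list stands for the work of one map task. Because the tasks run in parallel across the cluster, a large copy finishes much faster than a single-process copy would.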
Usage: $ hadoop distcp <src> <dst>
Example: $ hadoop distcp hdfs://nn1:8020/file1 hdfs://nn2:8020/file2
Here file1 on nn1 is copied to nn2 with the filename file2.
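Since the question is about copying several tables, one option is to wrap the command above in a small script that runs one `hadoop distcp <src> <dst>` per table directory. The sketch below assumes the table data lives in HDFS directories at the same path on both clusters. The NameNode addresses and the `TABLE_DIRS` paths are hypothetical placeholders, and so are the helper names `distcp_command` and `copy_tables`. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from typing import List, Optional

# Hypothetical NameNode addresses and table directories; replace with your own.
SRC_NN = "hdfs://nn1:8020"
DST_NN = "hdfs://nn2:8020"
TABLE_DIRS = ["/warehouse/sales", "/warehouse/customers"]


def distcp_command(path: str, num_maps: Optional[int] = None) -> List[str]:
    """Build `hadoop distcp <src> <dst>` copying `path` from SRC_NN to the same path on DST_NN."""
    cmd = ["hadoop", "distcp"]
    if num_maps is not None:
        cmd += ["-m", str(num_maps)]  # -m limits the number of simultaneous copies (map tasks)
    cmd += [SRC_NN + path, DST_NN + path]
    return cmd


def copy_tables(paths: List[str], dry_run: bool = True) -> List[List[str]]:
    """Run one DistCp job per directory; with dry_run=True only print the commands."""
    commands = [distcp_command(p) for p in paths]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if the job fails
    return commands


if __name__ == "__main__":
    copy_tables(TABLE_DIRS)
```

One caveat: if the target directory already exists on nn2, DistCp copies the source directory inside it rather than replacing it, so re-running the script can nest a second copy. DistCp's `-update` and `-overwrite` options change this behavior. Check the DistCp guide for your Hadoop version before relying on either option.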
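DistCp is currently the best tool for this. Sqoop copies data between relational databases and HDFS (in either direction), but it cannot copy data from one HDFS to another.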
More info:
- http://hadoop.apache.org/docs/r1.2.1/distcp.html
- http://hadoop.apache.org/docs/r1.2.1/distcp2.html
There are two versions available: distcp2 offers better runtime performance than the original distcp.