How to copy data from one HDFS to another HDFS?

Problem description

I have two HDFS setups and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How can I copy data from one HDFS to another? Is it possible via Sqoop or another command-line tool?

Solution

DistCp (distributed copy) is a tool used for copying data between clusters. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.
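
Because each map task copies a share of the file list, the number of map tasks bounds how many copies run at once. The sketch below only assembles and prints a command that caps this with DistCp's -m option (maximum number of simultaneous copies). The paths, the NameNode addresses, and the value 20 are illustrative assumptions, not details from the question.

```shell
# Hypothetical source and destination directories (placeholders).
SRC="hdfs://nn1:8020/data/logs"
DST="hdfs://nn2:8020/data/logs"

# Illustrative upper bound on simultaneous copy (map) tasks.
MAX_MAPS=20

# Assemble the command as a string and print it; nothing is executed here.
distcp_cmd="hadoop distcp -m $MAX_MAPS $SRC $DST"
echo "$distcp_cmd"
```

Running the printed command requires a Hadoop client that can reach both clusters.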

Usage: $ hadoop distcp <src> <dst>

Example: $ hadoop distcp hdfs://nn1:8020/file1 hdfs://nn2:8020/file2

This copies file1 from the cluster whose NameNode is nn1 to the cluster whose NameNode is nn2, where it is stored under the name file2.
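
Since the question is about tables, which HDFS stores as directories, a directory-level copy may be closer to what is needed. The following is a minimal sketch, assuming hypothetical warehouse paths and the same nn1/nn2 addresses as the example above. The run_distcp helper and the DRY_RUN variable are made up for illustration; -update is a standard DistCp option. By default the script only prints the command it would run.

```shell
# Hypothetical table directories; replace with the real locations.
SRC="hdfs://nn1:8020/user/hive/warehouse/mytable"
DST="hdfs://nn2:8020/user/hive/warehouse/mytable"

# Illustrative helper: prints the distcp command unless DRY_RUN=0,
# in which case it actually invokes hadoop.
run_distcp() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hadoop distcp $*"
  else
    hadoop distcp "$@"
  fi
}

# -update copies files that are missing at the destination or differ from it.
# With -update, the contents of SRC are copied into DST, rather than SRC
# itself being created as a subdirectory of DST.
run_distcp -update "$SRC" "$DST"
```

Setting DRY_RUN=0 performs the copy, which requires a Hadoop client with access to both clusters. Because -update skips files that already match, rerunning the same command after an interrupted copy transfers only what is still missing.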

DistCp is the best tool for this as of now. Sqoop copies data between relational databases and HDFS, in either direction, but it does not copy data from one HDFS to another.

More info:

Two versions are available, distcp and distcp2; distcp2 offers better runtime performance than distcp.
