About the files in DSC


Problem description

Hi,

When I added a local file to DSC using the command "dsc fileset add localdirectory filesetname", the local file was added to DSC. Is the whole file added to many nodes, with each node holding part of the file, or is the whole file added to one node? For example, if the file is 10 GB, is all 10 GB saved to one node, or is it spread across many nodes with each node holding part of it?

I want to save the 10 GB file to more than one node. However, I do not want to split the file into many small files before adding it to DSC. Is there a way to automatically partition the file across many nodes?

Looking forward to your reply!

Solution

Each file in the directory you specify is added to DSC. The files are not altered in any way. They are then replicated to as many nodes as the cluster's replication factor specifies (in the beta this is 1 by default, but for RTM it will be 3).
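To make this concrete, here is a minimal Python sketch of a toy cluster model, not DSC's real placement policy. It assumes that a file added as one unit is copied whole to as many nodes as the replication factor, so every node that holds it stores the full 10 GB. The function name place_file, the node names, and the CRC32-based round-robin choice of nodes are all hypothetical.

```python
import zlib


def place_file(file_name: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes that each hold a complete copy of file_name.

    Toy model only (hypothetical policy): the file is one indivisible unit,
    so every selected node stores all of its bytes.
    """
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    # Deterministic starting node derived from the file name, then the next
    # replication_factor - 1 nodes in round-robin order.
    start = zlib.crc32(file_name.encode("utf-8")) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4", "node5"]
    size_gb = 10
    for rf in (1, 3):  # beta default vs. the stated RTM default
        holders = place_file("big.dat", nodes, rf)
        print(f"replication factor {rf}: {len(holders)} node(s), {size_gb} GB each -> {holders}")
```

Raising the replication factor adds more complete copies. It never splits the file, so a single job reading that one file still reads it as one unit.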

If you want to get a 10 GB file onto the cluster, you can use DSC to add it as one file, but it will only be copied to a single node and then replicated according to the replication factor. This is not efficient, because Dryad cannot make good use of all the nodes in the cluster. The simplest way to spread a single file like this across all the nodes is to partition it using RangePartition or HashPartition. This creates a new set of files containing the same records, split up across all the nodes in the cluster.
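The following plain-Python sketch models what hash and range partitioning do to the records of one large file. It is not the DryadLINQ HashPartition or RangePartition API. The function names, the CRC32 hash, and the boundary list are hypothetical choices. In a real cluster, each resulting partition would be written as a separate file on a different node.

```python
import bisect
import zlib


def hash_partition(records, key, num_parts):
    """Assign each record to partition crc32(key(record)) % num_parts.

    Records with equal keys always land in the same partition.
    """
    parts = [[] for _ in range(num_parts)]
    for rec in records:
        k = str(key(rec)).encode("utf-8")
        parts[zlib.crc32(k) % num_parts].append(rec)
    return parts


def range_partition(records, key, boundaries):
    """Split records by sorted key boundaries.

    With boundaries [b0, b1, ...], partition 0 gets keys < b0,
    partition 1 gets b0 <= key < b1, and so on: len(boundaries) + 1 parts.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        parts[bisect.bisect_right(boundaries, key(rec))].append(rec)
    return parts


if __name__ == "__main__":
    # Stand-in for the lines of one large file, keyed by an integer.
    records = list(range(100))
    by_hash = hash_partition(records, lambda r: r, 4)
    by_range = range_partition(records, lambda r: r, [25, 50, 75])
    print("hash partition sizes: ", [len(p) for p in by_hash])
    print("range partition sizes:", [len(p) for p in by_range])
```

Both schemes keep every record exactly once. Hash partitioning spreads keys roughly evenly without needing to know their distribution. Range partitioning keeps each partition sorted by key range, but it only balances well when the boundaries match the actual data.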

The only way to get good performance out of Dryad is to partition your data across the cluster like this. I'm not sure I understand your requirement not to split up your file; that defeats the point of using a system like Dryad.

Ade

