Writing BLOBs (images or binary objects) to DSC - DSC Serialization

Question

Does writing images to DSC (say, a batch of JPEG images) require a custom serialization mechanism? I have seen the DupPic2 example, which uses FromEnumerable, which in turn uses the ObjectRecord wrapper to transfer files to DSC.

If I were to skip HPCLinqExtras.dll and transfer images to DSC using only the basic API, what would the challenge be?

This is more from a learning point of view: what exactly happens while data is transferred to DSC? Why does DSC require a separate serialization mechanism, and why do some cases require custom serialization?

The documentation also describes a certain bootstrapping problem, which I couldn't grasp clearly.

I just want to understand this better. I would appreciate it if you could point me to some documentation or other resource that explains it.

Recommended answer

The FileRecord class used in the DupPic2 example is only required because DupPic returns a list of the duplicate files by their original file names. Hence FileRecord is used to store both the binary file data and its original path. If this is not a requirement, then you can use the DSC.EXE command line tool or the DscService APIs to load images directly onto the cluster. To ensure that queries access files that are stored local to the vertex, you can use the Utilities.CreateFileSetNodeMap helper. An example of this can be found in the ProgrammingGuideSamples project, in DataImportExamplesWithUtilities.cs: the ReadLibrarySerializedDataFiles() example.
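To make the idea of a record that carries both the file bytes and the original path concrete, here is a minimal sketch. FileRecordSketch and its members are hypothetical names invented for illustration; this is not the FileRecord class from HPCLinqExtras.dll, and the on-disk layout (path, length prefix, raw bytes) is an assumption, not the format the samples use.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a FileRecord-style type; NOT the HPCLinqExtras class.
public sealed class FileRecordSketch
{
    public string OriginalPath { get; private set; }
    public byte[] Data { get; private set; }

    public FileRecordSketch(string originalPath, byte[] data)
    {
        if (originalPath == null) throw new ArgumentNullException("originalPath");
        if (data == null) throw new ArgumentNullException("data");
        OriginalPath = originalPath;
        Data = data;
    }

    // Assumed layout: length-prefixed path string, 4-byte data length, raw bytes.
    public void Write(BinaryWriter writer)
    {
        writer.Write(OriginalPath);
        writer.Write(Data.Length);
        writer.Write(Data);
    }

    // Reads back one record written by Write.
    public static FileRecordSketch Read(BinaryReader reader)
    {
        string path = reader.ReadString();
        int length = reader.ReadInt32();
        byte[] data = reader.ReadBytes(length);
        if (data.Length != length)
            throw new EndOfStreamException("Record is truncated.");
        return new FileRecordSketch(path, data);
    }
}
```

A wrapper like this is only worth the extra step when the query needs metadata, such as the original file name, that the raw bytes do not carry.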

To be clear, nothing happens to data when it's transferred to DSC. The files are copied directly to the cluster with no changes. You can create a file set containing (say) JPEG images and load those images directly off the DSC shares. They are unchanged, with the (obvious) exception of having a different file extension.
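If you want to check this for yourself, one approach is to hash the original image and the copy read back from the DSC share and compare the two. The sketch below uses only standard .NET calls and does not touch any DSC API; CopyCheckSketch and its method names are hypothetical, and you would supply the two paths yourself.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for comparing two files byte-for-byte via SHA-256.
public static class CopyCheckSketch
{
    // Returns the SHA-256 hash of the file's contents as a hex string.
    public static string HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // True when both files hash identically, regardless of name or extension.
    public static bool SameContent(string pathA, string pathB)
    {
        return HashFile(pathA) == HashFile(pathB);
    }
}
```

Calling CopyCheckSketch.SameContent with the source JPEG and the corresponding file on the share should return true if the copy is untouched, even though the extensions differ.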

The sample is somewhat unclear here. I've updated it for our next release. Here's the code. You'll see you don't have to use FileRecord:

public static void ReadLibrarySerializedDataFiles(HpcLinqContext context)
{
    CustomTypeFileGenerator.CreateFiles(SampleConfiguration.ImageFilesPath);
    RemoveFileSets(context);

    // Create a file set directly from the .jpg files and seal it.
    context.DscService.CreateFileSet(DataPartitionedFileSetName)
        .AddNewFiles(
            Directory.GetFiles(SampleConfiguration.ImageFilesPath, "*.jpg"))
        .Seal();

    // Use the node map so that each image is read where it is stored.
    var imageData = context.CreateFileSetNodeMap(DataPartitionedFileSetName)
        .Select(r => GetImageData(r.Line));

    Console.WriteLine("\n\nPartitioned data as DSC file set of Image records (first 10 records):");

    foreach (var rec in imageData.Take(10))
        Console.WriteLine(" {0}\t{1}x{2}", rec.Item1, rec.Item2, rec.Item3);
}

// Returns the file path together with the image width and height.
public static Tuple<string, int, int> GetImageData(string path)
{
    using (Image img = Image.FromFile(path))
    {
        return new Tuple<string, int, int>(path, img.Width, img.Height);
    }
}

