我自己的自定义二进制文件有NoSql吗? [英] NoSql With My Own Custom Binary Files?

查看:84
本文介绍了我自己的自定义二进制文件有NoSql吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

最初,我只需要处理1.5 [TB]的数据。由于我只需要快速的读写操作(无需任何SQL),因此我设计了自己的平面二进制文件格式(使用 python 实现),并轻松地(愉快地)保存了数据并在一台机器上操纵它。当然,出于备份目的,我添加了两台机器用作精确镜像(使用 rsync )。

Originally, I had to deal with just 1.5[TB] of data. Since I just needed fast write/reads (without any SQL), I designed my own flat binary file format (implemented using python) and easily (and happily) saved my data and manipulated it on one machine. Of course, for backup purposes, I added 2 machines to be used as exact mirrors (using rsync).

目前,我的需求正在增长,并且有必要构建一种能够成功扩展至20TB(甚至更多)数据的解决方案。 我很高兴继续使用我的平面文件格式进行存储

Presently, my needs are growing, and there's a need to build a solution that would successfully scale up to 20[TB] (and even more) of data. I would be happy to continue using my flat file format for storage. It is fast, reliable and gives me everything I need.

我担心的事情是复制,数据一致性等(显然,数据必须分发- -并非所有数据可以通过网络存储在一台计算机上。

The thing I am concerned about is replication, data consistency etc (as obviously, data will have to be distributed -- not all data can be stored on one machine) across the network.

是否有任何现成的解决方案(基于Linux / python的)可以让我使用文件格式进行存储,却可以处理 NoSql 解决方案通常提供的其他组件? (数据一致性/可用性/易于复制)?

Are there any ready-made solutions (Linux / python based) that would allow me to keep using my file format for storage, yet would handle the other components that NoSql solutions normally provide? (data consistency / availability / easy replication)?

基本上,我想确保所有二进制文件在整个网络中都是一致的。我正在使用由60个核心双核计算机组成的网络(每个计算机都具有 1GB RAM 1.5TB磁盘

basically, all I want to make sure is that my binary files are consistent throughout my network. I am using a network of 60 core-duo machines (each with 1GB RAM and 1.5TB disk)

推荐答案

方法:使用Disco Project在Python中进行分布式地图缩减

似乎是解决问题的一种好方法。我曾将迪斯科项目用于类似的问题。

Seems like a good way of approaching your problem. I have used the disco project with similar problems.

您可以在n个机器(进程)之间分配文件,并实现映射并减少适合您逻辑的功能。

You can distribute your files among n numbers of machines (processes), and implement the map and reduce functions that fit your logic.

迪斯科项目的教程,准确地描述了如何为您的问题实施解决方案。您将只需要编写很少的代码就可以打动您,并且绝对可以保留二进制文件的格式。

The tutorial of the disco project, exactly describes how to implement a solution for your problems. You'll be impressed about how little code you need to write and definitely you can keep the format of your binary file.

另一个类似的选择是使用亚马逊的Elastic MapReduce

Another similar option is to use Amazon's Elastic MapReduce

这篇关于我自己的自定义二进制文件有NoSql吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆