What are the pros or cons of storing json as text vs blob in cassandra?


Problem description

One problem with blob for me is that, in Java, ByteBuffer (which is mapped to blob in Cassandra) is not Serializable, hence it does not work well with EJBs.

Considering the JSON is fairly large, what would be the better type for storing JSON in Cassandra: text or blob?

Does the size of the JSON matter when deciding between text and blob?

If it were any other database like Oracle, it would be common to use blob/clob. But in Cassandra, where each cell can hold as much as 2GB, does it matter?

Please consider this question as a choice between text and blob for this case, instead of resorting to suggestions regarding whether to use a single column for the JSON.

Solution

I don't think there's any benefit to storing the literal JSON data as a BLOB in Cassandra. At best your storage costs are identical, and in general the APIs are less convenient for working with BLOB types than they are for working with strings/text.

For instance, if you're using the Java API, then in order to store the data as a BLOB using a parameterized PreparedStatement you first need to load it all into a ByteBuffer, for instance by packing your JSON data into an InputStream.
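To make the extra step concrete, here is a minimal sketch of the conversions a BLOB column would require, using only the JDK. The class and method names (`JsonBlobSketch`, `toBlob`, `fromBlob`, `toBytes`) are hypothetical, and the actual driver call that binds the value to a statement is deliberately left out. With a text column none of this is needed, since the JSON `String` can be bound directly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any Cassandra driver.
final class JsonBlobSketch {
    // Encode the JSON text as UTF-8 and wrap it in a ByteBuffer,
    // the Java type that is mapped to blob.
    static ByteBuffer toBlob(String json) {
        return ByteBuffer.wrap(json.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a blob value back into JSON text. duplicate() keeps the
    // caller's buffer position and limit untouched.
    static String fromBlob(ByteBuffer blob) {
        return StandardCharsets.UTF_8.decode(blob.duplicate()).toString();
    }

    // Copy the remaining bytes into a byte[]. Arrays are Serializable,
    // so this form can be passed where a ByteBuffer cannot.
    static byte[] toBytes(ByteBuffer blob) {
        ByteBuffer copy = blob.duplicate();
        byte[] out = new byte[copy.remaining()];
        copy.get(out);
        return out;
    }
}
```

The `toBytes` helper is one possible way around the Serializable problem raised in the question: keep a `byte[]` in the objects that cross the EJB boundary and wrap it in a `ByteBuffer` only at the point where the value is written.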

Unless you're dealing with very large JSON snippets that force you to stream your data anyway, that's a fair bit of extra work to get access to the BLOB type. And what would you gain from it? Essentially nothing.

However, I think there's some merit in asking 'Should I store JSON as text, or gzip it and store the compressed data as a BLOB?'.

And the answer to that comes down to how you've configured Cassandra and your table. In particular, as long as you're using Cassandra version 1.1 or later your tables have compression enabled by default. That may be adequate, particularly if your JSON data is fairly uniform across each row.

However, Cassandra's built-in compression is applied table-wide, rather than to individual rows. So you may get a better compression ratio by manually compressing your JSON data before storage, writing the compressed bytes into a ByteBuffer, and then shipping the data into Cassandra as a BLOB.
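As a rough illustration of the manual route, the sketch below gzips a JSON string into a `ByteBuffer` and reverses the process, using only `java.util.zip` from the JDK. The class name `GzipJsonSketch` and its methods are hypothetical, and binding the resulting buffer to a BLOB column through the driver is not shown. Whether this actually beats the table's built-in compression depends on your data, so it is worth measuring before committing to it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any Cassandra driver.
final class GzipJsonSketch {
    // Gzip the UTF-8 bytes of the JSON text and wrap them in a ByteBuffer.
    static ByteBuffer compress(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return ByteBuffer.wrap(bytes.toByteArray());
    }

    // Read a compressed blob back into JSON text.
    static String decompress(ByteBuffer blob) {
        ByteBuffer copy = blob.duplicate(); // leave the caller's position alone
        byte[] compressed = new byte[copy.remaining()];
        copy.get(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Note that every read and write now pays for compression or decompression in the application, which is the CPU side of the tradeoff.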

So it essentially comes down to a tradeoff in terms of storage space vs. programming convenience vs. CPU usage. I would decide the matter as follows:

  1. Is minimizing the amount of storage consumed your biggest concern?
    • If yes, compress the JSON data and store the compressed bytes as a BLOB;
    • Otherwise, proceed to #2.
  2. Is Cassandra's built-in compression available and enabled for your table?
    • If no (and if you can't enable the compression), compress the JSON data and store the compressed bytes as a BLOB;
    • Otherwise, proceed to #3.
  3. Is the data you'll be storing relatively uniform across each row?
    • Probably for JSON data the answer is 'yes', in which case you should store the data as text and let Cassandra handle the compression;
    • Otherwise proceed to #4.
  4. Do you want efficiency, or convenience?
    • Efficiency; compress the JSON data and store the compressed bytes as a BLOB.
    • Convenience; compress the JSON data, base64 the compressed data, and then store the base64-encoded data as text.
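The four steps above can be sketched as a small function, together with the gzip-then-base64 variant from the last bullet. Everything here is illustrative: the class `StorageDecisionSketch`, the `Strategy` enum, and the boolean parameters are hypothetical names for the questions in the list, and only JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper that mirrors the decision list; not a Cassandra API.
final class StorageDecisionSketch {
    enum Strategy { COMPRESSED_BLOB, PLAIN_TEXT, COMPRESSED_BASE64_TEXT }

    static Strategy choose(boolean storageIsTopConcern,
                           boolean tableCompressionEnabled,
                           boolean rowsAreUniform,
                           boolean preferEfficiency) {
        if (storageIsTopConcern) return Strategy.COMPRESSED_BLOB;      // step 1
        if (!tableCompressionEnabled) return Strategy.COMPRESSED_BLOB; // step 2
        if (rowsAreUniform) return Strategy.PLAIN_TEXT;                // step 3
        return preferEfficiency                                        // step 4
            ? Strategy.COMPRESSED_BLOB
            : Strategy.COMPRESSED_BASE64_TEXT;
    }

    // "Convenience" option: gzip the JSON, then base64 it so the result
    // can be stored in a text column.
    static String gzipToBase64(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of gzipToBase64: decode the text, then gunzip it.
    static String base64ToJson(String stored) {
        byte[] compressed = Base64.getDecoder().decode(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Keep in mind that base64 emits four characters for every three input bytes, so the text form is roughly a third larger than the compressed bytes it encodes. That overhead is the price of staying with a text column.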
