How to handle dynamic columns in Cassandra

Question

I am loading JSON data into a Cassandra table through a Python script. A few of the JSON files have more columns than usual. I have currently created the table with 100 columns and can insert everything, but some JSON files may have more than 100 columns. How should I handle this? Is there a way to create columns dynamically when the JSON has more columns than the table?

Recommended answer

If you're using CQL, you need to define all columns before inserting the data. In theory you can add new columns with ALTER TABLE ... ADD ..., but doing this programmatically is usually not recommended because it can cause schema disagreement and other problems.

You can work around this problem in one of the following ways:

  1. Store the JSON as text alongside the primary key and the most frequently used columns, and parse it when you read the data (the Java driver 3.x can even do this automatically by using extra codecs).
  2. Store the data in maps (a frozen map is best if you won't update individual values), with the key as text and the value matching the actual value type: int, text, etc. For example:

create table test (
  pk1 ..,
  pk2 ..,
  pkN ..,
  imap frozen<map<text, int>>,
  tmap frozen<map<text, text>>,
  ...
  primary key(pk1, pk2, ...)
);

Then, in your code, separate the columns by type and insert each one into the corresponding map.
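The step of separating columns by type could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `split_by_type`, the sample document, and the choice to serialize non-string, non-integer values back to JSON text are all hypothetical and not part of the original answer. The map names `imap` and `tmap` and the key names `pk1` and `pk2` follow the table definition above.

```python
import json


def split_by_type(record, key_columns):
    """Split a decoded JSON object into primary-key values and per-type maps.

    `record` is a dict produced by json.loads; `key_columns` names the
    fields that map to the table's primary key columns.
    """
    keys = {name: record[name] for name in key_columns}
    imap, tmap = {}, {}
    for name, value in record.items():
        if name in key_columns:
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            imap[name] = value
        else:
            # Assumption: anything that is not an integer goes into the text
            # map; non-string values are re-serialized as JSON text.
            tmap[name] = value if isinstance(value, str) else json.dumps(value)
    return keys, imap, tmap


# Hypothetical sample document with two key fields and three extra fields.
doc = json.loads('{"pk1": "a", "pk2": 7, "age": 42, "city": "Oslo", "tags": ["x", "y"]}')
keys, imap, tmap = split_by_type(doc, ("pk1", "pk2"))
```

The resulting `imap` and `tmap` dictionaries can then be passed as the values for the map columns of an INSERT statement. Adding more value types (for example a map of doubles) would mean another map column in the table and another branch in the loop. Note that a CQL int is 32 bits wide, so larger integers would need a bigint map instead.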
