Transforming a Cassandra OrderedMapSerializedKey to a Python dictionary

Question

I have a column in Cassandra whose type is a map of lists. When I query it with the Python driver, it returns an OrderedMapSerializedKey structure, which is a map of lists. I would like to load the whole query result into pandas.

To extract data from that OrderedMapSerializedKey structure, that is, to use each key as the label of a new column and keep only the first element of its list as the value, I use the approach mentioned here, with some complex/dirty manipulation in the factory before returning the built DataFrame.
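A minimal, stdlib-only sketch of the flattening described above: each map key becomes a column name and only the first element of its list is kept. The helper `flatten_first_elements` and its `map_col` parameter are hypothetical names; the sketch assumes the map column holds a Mapping of lists (OrderedMapSerializedKey is assumed to support `.items()` like a regular Mapping) and that an empty map may come back from the driver as `None`.

```python
def flatten_first_elements(colnames, rows, map_col):
    """Turn each row into a dict, replacing the map column `map_col` with one
    entry per map key whose value is the first element of that key's list.

    Hypothetical helper. Assumes map values are lists (map<..., list<...>>);
    an empty list yields None.
    """
    map_idx = colnames.index(map_col)
    records = []
    for row in rows:
        # Copy every ordinary column as-is
        record = {name: value for name, value in zip(colnames, row) if name != map_col}
        # Assumption: an empty/null map may be returned as None
        mapping = row[map_idx] or {}
        for key, values in mapping.items():
            record[key] = values[0] if values else None
        records.append(record)
    return records
```

Passing the result to `pd.DataFrame(records)` would give one column per map key, with NaN wherever a row lacks that key. Note that a map key equal to an existing column name would overwrite that column in this sketch.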

A similar question was asked here, without a real answer.

Is there a better way to turn such an OrderedMapSerializedKey structure into a Python dictionary that can be readily loaded into a pandas DataFrame?

Recommended answer

I think the most general solution is to store the OrderedMapSerializedKey Cassandra structure as a dict in your DataFrame column; you can then pass this value/column on to whatever needs it. It is the most general because you may not know the actual keys in the Cassandra rows (different rows may have different keys inserted).

So here is the solution I've tested; you only have to improve the pandas_factory function:

In the previous solution I replaced only the first (0th) row of the Cassandra dataset (rows is a list of tuples, where every tuple is one Cassandra row).

import pandas as pd
from cassandra.util import OrderedMapSerializedKey

def pandas_factory(colnames, rows):
    # Convert each row tuple into a list (tuple elements cannot be replaced)
    rows = [list(row) for row in rows]

    # Convert only OrderedMapSerializedKey values into plain dicts
    for row in rows:
        for idx_value, value in enumerate(row):
            if isinstance(value, OrderedMapSerializedKey):
                row[idx_value] = dict(value)

    return pd.DataFrame(rows, columns=colnames)
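Because each map is kept as a dict, a downstream consumer can still expand it into columns even when different rows carry different keys. A minimal stdlib sketch, assuming the rows are available as a list of dicts (for example via `df.to_dict("records")`); `expand_dict_column` and the column name used with it are hypothetical.

```python
def expand_dict_column(records, column):
    """Replace the dict stored under `column` in each record with one entry
    per map key, using the union of keys seen across all records
    (first-seen order). Records lacking a key get None for it.

    Hypothetical helper; `column` is assumed to hold a dict or None.
    """
    # Union of keys across all rows, preserving first-seen order
    keys = list(dict.fromkeys(
        key for rec in records for key in (rec.get(column) or {})
    ))

    expanded = []
    for rec in records:
        mapping = rec.get(column) or {}
        new_rec = {k: v for k, v in rec.items() if k != column}
        for key in keys:
            new_rec[key] = mapping.get(key)
        expanded.append(new_rec)
    return expanded
```

The output could then be loaded back with `pd.DataFrame(...)`; keeping the key discovery in the consumer means the factory never needs to know the map keys in advance.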

You may have to add an automatic check for whether there is at least one value before/after the Cassandra map field, or modify the pandas_factory script manually as needed.
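One way to make such a check automatic is to detect the map-valued column positions from the data instead of assuming where the map field sits. A minimal sketch, assuming the driver's OrderedMapSerializedKey registers as a `collections.abc.Mapping` (worth verifying for your driver version); `map_column_indices` and `convert_map_columns` are hypothetical helpers.

```python
from collections.abc import Mapping

def map_column_indices(rows):
    """Return the sorted positions of columns that hold a Mapping in any row."""
    found = set()
    for row in rows:
        for idx, value in enumerate(row):
            if isinstance(value, Mapping):
                found.add(idx)
    return sorted(found)

def convert_map_columns(rows):
    """Copy rows into lists, converting values at detected map positions to dict."""
    rows = list(rows)  # rows are iterated twice, so materialize them first
    indices = map_column_indices(rows)
    converted = []
    for row in rows:
        new_row = list(row)
        for idx in indices:
            # A null/empty map may be None; leave it as-is
            if new_row[idx] is not None:
                new_row[idx] = dict(new_row[idx])
        converted.append(new_row)
    return converted
```

Inside pandas_factory, `pd.DataFrame(convert_map_columns(rows), columns=colnames)` would then work regardless of how many columns come before or after the map field.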

Have a nice day!