Transforming a Cassandra OrderedMapSerializedKey to a Python dictionary
Problem description
I have a column in Cassandra that holds a map of lists. When queried with the Python driver, it comes back as an OrderedMapSerializedKey structure (still a map of lists). I would like to load the whole query result into pandas.
To extract data from that OrderedMapSerializedKey structure, I take each key as the label for a new column and keep only the first element of the corresponding list as the value. To do this I use the approach mentioned here, with some complex/dirty manipulation in the factory before returning the built DataFrame.
A similar question was asked here, but it never really got an answer.
Is there a better way to turn such an OrderedMapSerializedKey structure into a Python dictionary that can be readily loaded into a pandas DataFrame?
Recommended answer
I think the most general solution is to store the Cassandra OrderedMapSerializedKey structure as a dict in your DataFrame column; you can then convert that value or column into whatever form you need. I call it the most general because you may not know the actual keys in the Cassandra rows ahead of time (different keys may have been inserted into different rows).
So here is the solution I've tested; you only have to improve the pandas_factory function:
In my previous solution I replaced only the first (0th) row of the Cassandra dataset (rows is a list of tuples, where each tuple is one row in Cassandra).
import pandas as pd
from cassandra.util import OrderedMapSerializedKey

def pandas_factory(colnames, rows):
    # Convert each row tuple into a list (tuple elements cannot be replaced in place)
    rows = [list(row) for row in rows]
    # Convert only OrderedMapSerializedKey values into plain dicts
    for row in rows:
        for idx, value in enumerate(row):
            if isinstance(value, OrderedMapSerializedKey):
                row[idx] = dict(value)
    return pd.DataFrame(rows, columns=colnames)
You will need to add an automatic check for whether there is at least one value before/after the Cassandra map field, or modify the script above by hand accordingly.
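The question also asks to keep only the first element of each list under its key. Once the map is a plain dict, that step could be sketched roughly as below. This is only an illustration: first_elements is a hypothetical helper name, the sample data is made up, and an OrderedDict stands in for the OrderedMapSerializedKey value so the snippet runs without the Cassandra driver. Mapping an empty list to None is an assumption, not something the original question specifies.

```python
from collections import OrderedDict


def first_elements(map_of_lists):
    """Return {key: first element of its list}; an empty list maps to None (assumption)."""
    return {key: (values[0] if values else None)
            for key, values in map_of_lists.items()}


# Hypothetical stand-in for one map-of-lists cell returned by the driver.
cell = OrderedDict([
    ("temperature", [21.5, 22.0]),
    ("humidity", [40]),
    ("notes", []),
])

flat = first_elements(cell)
print(flat)
```

A list of such flattened dicts, one per row, can then be passed to pd.DataFrame, which uses the keys as column labels and fills keys missing from a row with NaN.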
Have a nice day!