Insert and categorize a Numpy array into a Django modelled database EAV schema

Views: 254 | Published: 2017/5/31 21:25:50 | Tags: python arrays django numpy


Problem description

I have a Pandas pivot table of the format:
income_category     age_category      income         age
High                Middle aged       123,564.235    23.456
Medium              Old               18,324.356     65.432
I have a category hierarchy with matching labels in a self-referencing table called dimension, i.e.:
dimension_id       label             parent_dimension_id
1                  Age categories
2                  Young             1
3                  Middle aged       1
4                  Old               1

...and similarly for income
I'm really struggling to pick a row at a time and access cells in that row randomly.

I have the parent category's dimension_id (in the code below it is already in cat_id_age). So I want to iterate through the Numpy array, get the matching category dimension_id for each row, and insert it into a value table along with its corresponding value. However, I've no idea how to do this Pythonically or Djangonically. (There are only a few categories, so I think the dictionary approach below for looking up dimension_id is best.) To my iterative mind the process is:
# populate a Dictionary to find dimension_ids
age_dims = Dimension.objects.filter(parent_id=cat_id_age).values('label', 'id')

for row in Numpy_array:

    dim_id = Dimension.get(row.age_category)

    # Or is the dict approach incorrect? I'm trying to do: SELECT dimension_id FROM dimension WHERE parent_dimension_id=cat_id_age AND label=row.age_category
    # Djangonically? dim = Dimension.objects.get(parent_id=cat_id_age, label=row.age_category)

    # Then insert the categorized value, i.e., INSERT INTO float_value (value, dimension_id) VALUES (row.age, dimension_id)
    float_val = FloatValue(value=row.age, dimension_id=dim_id)
    float_val.save()

...then repeat for income_category and income.
However, I'm struggling with iterating like this. That may be my only problem, but I've included the rest to communicate what I'm trying to do, as I often seem to be a paradigm away from Python (e.g., something like cursor.executemany("""insert values(?, ?, ?)""", map(tuple, numpy_arr[x:].tolist()))?).

Any pointers really appreciated. (I'm using Django 1.7 and Python 3.4.)
Solution

Anzel answered the iterating problem in a linked answer: use the Pandas to_csv() function. My dictionary syntax was also wrong. My final solution was therefore:
# populate a Dictionary to find dimension_ids for category labels
parent_dimension_age = Dimension.objects.get(name='Age')
parent_dimension_income = Dimension.objects.get(name='Income')
dims_age = {d.name: d.id for d in Dimension.objects.filter(parent_id=parent_dimension_age.id)}
dims_income = {d.name: d.id for d in Dimension.objects.filter(parent_id=parent_dimension_income.id)}

# Retrieve one row at a time as a tab-delimited string
for line in pandas_pivottable.to_csv(header=False, index=True, sep='\t').split('\n'):
    if line:
        # row[0] = income category, row[1] = age category, row[2] = age, row[3] = income
        row = line.split('\t')
        entity = Entity(name='data pivot row', dataset_id=dataset.id)
        entity.save()
        # dims_age.get(row[1]) gets the ID for the category whose name matches the contents of row[1]
        age_val = FloatValue(value=row[2], entity_id=entity.id, attribute_id=attrib_age.id, dimension_id=dims_age.get(row[1]))
        age_val.save()
        income_val = FloatValue(value=row[3], entity_id=entity.id, attribute_id=attrib_income.id, dimension_id=dims_income.get(row[0]))
        income_val.save()
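The parsing and lookup steps of the loop above can also be sketched without Django or pandas, which makes them easy to try in isolation. This is only an illustration under stated assumptions: the dictionaries stand in for dims_age and dims_income with made-up IDs, the tab-delimited text is hypothetical sample data in the column order the loop expects, and parse_rows is a name introduced here. It uses the standard-library csv module rather than split('\t'), since a hand-rolled split would likely mis-handle a label containing a tab or a quoted field.

```python
import csv
import io

# Hypothetical stand-ins for the dims_age / dims_income dictionaries built
# from Dimension.objects.filter(...) above; the IDs are made up.
dims_age = {"Young": 2, "Middle aged": 3, "Old": 4}
dims_income = {"Low": 6, "Medium": 7, "High": 8}

# Hypothetical tab-delimited text in the column order the loop above expects:
# income category, age category, age, income.
tsv_text = (
    "High\tMiddle aged\t23.456\t123564.235\n"
    "Medium\tOld\t65.432\t18324.356\n"
)


def parse_rows(text):
    """Return (age_dimension_id, age, income_dimension_id, income) tuples."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        income_label, age_label, age, income = row
        records.append(
            (dims_age[age_label], float(age), dims_income[income_label], float(income))
        )
    return records


records = parse_rows(tsv_text)
for record in records:
    print(record)
```

Indexing the dictionaries with square brackets raises a KeyError on an unknown label, whereas the dims_age.get(...) call in the solution returns None and would store a value with no dimension. Which behaviour is preferable depends on how clean the category labels are.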
For more on the Entity-Attribute-Value (EAV) schema, see the Wikipedia page (http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model); if you are considering it, see the Django-EAV extension. In the next iteration of this project, however, I will be replacing it with PostgreSQL's new JSONB type. This promises to make the data more legible and to perform equally well or better.
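For readers who want to see the cursor.executemany idea from the question outside the ORM, here is a minimal sketch using the standard-library sqlite3 module. It is not the author's implementation. The table and column names follow the SQL in the question's comments, while the in-memory database, the reduced two-table schema, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database; the schema is a simplified assumption based on the
# table and column names mentioned in the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dimension (
        dimension_id INTEGER PRIMARY KEY,
        label TEXT NOT NULL,
        parent_dimension_id INTEGER REFERENCES dimension (dimension_id)
    );
    CREATE TABLE float_value (
        value REAL NOT NULL,
        dimension_id INTEGER NOT NULL REFERENCES dimension (dimension_id)
    );
    """
)

# The age hierarchy from the question: one parent row and three children.
conn.executemany(
    "INSERT INTO dimension (dimension_id, label, parent_dimension_id) VALUES (?, ?, ?)",
    [(1, "Age categories", None), (2, "Young", 1), (3, "Middle aged", 1), (4, "Old", 1)],
)

cat_id_age = 1
# One query builds the label -> dimension_id lookup for this parent category.
age_dims = dict(
    conn.execute(
        "SELECT label, dimension_id FROM dimension WHERE parent_dimension_id = ?",
        (cat_id_age,),
    )
)

# (age_category, age) pairs taken from the pivot table in the question.
rows = [("Middle aged", 23.456), ("Old", 65.432)]

# Insert every categorized value in a single executemany call.
conn.executemany(
    "INSERT INTO float_value (value, dimension_id) VALUES (?, ?)",
    [(age, age_dims[label]) for label, age in rows],
)
conn.commit()

stored = conn.execute(
    "SELECT value, dimension_id FROM float_value ORDER BY dimension_id"
).fetchall()
print(stored)
```

In Django, bulk_create plays a broadly similar role to executemany by inserting a list of unsaved model instances in one call. Whether it fits here depends on needing each Entity's primary key before the related FloatValue rows are created.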
