Insert and categorize a Numpy array into a Django modelled database EAV schema
Problem description
I have a Pandas pivot table of the format:
income_category  age_category  income       age
High             Middle aged   123,564.235  23.456
Medium           Old           18,324.356   65.432
I have a category hierarchy with matching labels in a self-referencing table called dimension, i.e.:
dimension_id  label           parent_dimension_id
1             Age categories
2             Young           1
3             Middle aged     1
4             Old             1
...and similarly for income
I'm really struggling to pick a row at a time and access cells in that row randomly.
I have the parent category id dimension_id (in the code below it is already in cat_id_age). So I want to iterate through the Numpy array, get the matching category dimension_id for each row, and insert it into a value table along with its corresponding value. However, I have no idea how to do this Pythonically or Djangonically. (There are only a few categories, so I think the dictionary approach below for looking up dimension_id is best.) To my iterative mind the process is:
# populate a Dictionary to find dimension_ids
age_dims = Dimension.objects.filter(parent_id=cat_id_age).values('label', 'id')
for row in Numpy_array:
    dim_id = Dimension.get(row.age_category)
    # Or is the Dict approach incorrect? I'm trying to do: SELECT dimension_id FROM dimension WHERE parent_dimension_id=cat_id_age AND label=row.age_category
    # Djangonically? dim = Dimension.objects.get(parent_id=cat_id_age, label=row.age_category)
    # Then insert categorized value, ie, INSERT INTO float_value (value, dimension_id) VALUES (row.age, dimension_id)
    float_val = FloatValue(value=row.age, dimension_id=dim_id)
    float_val.save()
...then repeat for income_category and income.
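However, I am struggling with iterating like this. That may be my only problem, but I have included the rest to communicate what I am trying to do, since I often seem to be paradigms away from Python (e.g. should it be something like cursor.executemany("insert into values (?, ?, ?)", map(tuple, numpy_arr[x:].tolist()))?).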
Any pointers really appreciated. (I'm using Django 1.7 and Python 3.4.)
Recommended answer
Anzel answered the iterating problem here (http://stackoverflow.com/questions/28097319/how-can-i-iterate-over-pandas-pivot-table-a-multi-index-dataframe): use the Pandas to_csv() function. My dictionary syntax was also wrong. My final solution was therefore:
# populate a Dictionary to find dimension_ids for category labels
parent_dimension_age = Dimension.objects.get(name='Age')
parent_dimension_income = Dimension.objects.get(name='Income')
dims_age = dict([ (d.name, d.id) for d in Dimension.objects.filter(parent_id=parent_dimension_age.id) ])
dims_income = dict([ (d.name, d.id) for d in Dimension.objects.filter(parent_id=parent_dimension_income.id) ])
# Retrieves a row at a time into a tab-delimited string
for line in pandas_pivottable.to_csv(header=False, index=True, sep='\t').split('\n'):
    if line:
        # row[0] = income category, row[1] = age category, row[2] = age, row[3] = income
        row = line.split('\t')
        entity = Entity(name='data pivot row', dataset_id=dataset.id)
        entity.save()
        # dims_age.get(row[1]) gets the ID for the category whose name matches the contents of row[1]
        age_val = FloatValue(value=row[2], entity_id=entity.id, attribute_id=attrib_age.id, dimension_id=dims_age.get(row[1]))
        age_val.save()
        income_val = FloatValue(value=row[3], entity_id=entity.id, attribute_id=attrib_income.id, dimension_id=dims_income.get(row[0]))
        income_val.save()
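The lookup-and-parse step can be tried without a database. The following is only a sketch under stated assumptions: the dictionaries and the tab-separated text are hypothetical stand-ins for the Dimension queries and the to_csv() output, the dimension ids are made up, and categorized_rows is an illustrative helper name. It uses the standard csv module instead of splitting by hand, and indexes the dictionaries directly so that an unknown label raises KeyError instead of quietly producing None as .get() would.

```python
import csv
import io

# Hypothetical stand-ins for the dictionaries built from Dimension rows
# (label -> id); the ids are invented for illustration.
dims_age = {"Young": 2, "Middle aged": 3, "Old": 4}
dims_income = {"Low": 6, "Medium": 7, "High": 8}

# Stand-in for pandas_pivottable.to_csv(header=False, index=True, sep='\t').
# Column order: income category, age category, age, income.
tsv_text = (
    "High\tMiddle aged\t23.456\t123564.235\n"
    "Medium\tOld\t65.432\t18324.356\n"
)


def categorized_rows(text, dims_age, dims_income):
    """Yield (age_dimension_id, age, income_dimension_id, income) per row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row in reader:
        if not row:  # csv.reader returns [] for blank lines
            continue
        income_label, age_label, age, income = row
        # Direct indexing raises KeyError on an unknown label, so a
        # misspelled category is noticed rather than stored without one.
        yield (dims_age[age_label], float(age),
               dims_income[income_label], float(income))


rows = list(categorized_rows(tsv_text, dims_age, dims_income))
for age_dim, age, income_dim, income in rows:
    print(age_dim, age, income_dim, income)
```

Each yielded tuple carries what the two FloatValue rows above need, so the database writes could stay in a separate loop.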
For more on the Entity-Attribute-Value (EAV) schema, see the Wikipedia page (and, if you are considering it, the Django-EAV extension). In the next iteration of this project, however, I will be replacing it with PostgreSQL's new JSONB type (http://stackoverflow.com/questions/22654170/explanation-of-jsonb-introduced-by-postgresql). This promises to make the data more legible and to perform equally well or better.
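To illustrate the legibility point, one pivot row that takes an Entity plus two FloatValue rows in the EAV layout could be held as a single JSON document. This is only a sketch: the document layout and the row_to_document helper are assumptions made for illustration, not the schema actually used in the project.

```python
import json


def row_to_document(income_category, age_category, age, income):
    """Collapse one pivot row into a single JSON-serializable dict."""
    return {
        "age": {"category": age_category, "value": age},
        "income": {"category": income_category, "value": income},
    }


doc = row_to_document("High", "Middle aged", 23.456, 123564.235)
# The serialized text is what would be stored in a JSONB column.
payload = json.dumps(doc, sort_keys=True)
print(payload)
```

The category labels sit next to the values they describe, so no lookup against the dimension table is needed to read a row.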