Pandas pytable: how to specify min_itemsize of the elements of a MultiIndex
Question
I am storing a pandas DataFrame that has a MultiIndex as a pytable.
The first level of the MultiIndex is a string corresponding to a userID. Most of the userIDs are 13 characters long, but some are 15 characters long. When I append a record containing a long userID, pytables raises an error because it expects a 13-character field.
ValueError('Trying to store a string with len [15] in [user] column but\nthis column has a limit of [13]!\nConsider using min_itemsize to preset the sizes on these columns',)
However, I do not know how to set min_itemsize for the elements of a MultiIndex. I have tried {'index': 15}, but it does not work.
I know that I could force all IDs to be 15 characters long from the beginning by appending spaces, but I would prefer to avoid this if possible.
Thanks for your help!
Recommended answer
You need to specify the name of the multi-index level that you want to set a min_itemsize for. Here's an example:
Create 2 multi-indexed frames
In [1]: df1 = DataFrame(np.random.randn(4,2),index=MultiIndex.from_product([['abcdefghijklm','foo'],[1,2]],names=['string','number']))
In [2]: df2 = DataFrame(np.random.randn(4,2),index=MultiIndex.from_product([['abcdefghijklmop','foo'],[1,2]],names=['string','number']))
In [3]: df1
Out[3]:
                             0         1
string        number
abcdefghijklm 1       0.737976  0.840718
              2       0.605763  1.797398
foo           1       1.589278  0.104186
              2       0.029387  1.417195
[4 rows x 2 columns]
In [4]: df2
Out[4]:
                                0          1
string          number
abcdefghijklmop 1        0.539507  -1.059085
                2        1.263722  -1.773187
foo             1        1.625073   0.078650
                2       -0.030827  -1.691805
[4 rows x 2 columns]
Create a store
In [9]: store = pd.HDFStore('test.h5',mode='w')
In [10]: store.append('df1',df1)
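Here is the length that was computed for the string column (itemsize=13, taken from the longest value in df1)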
In [12]: store.get_storer('df1').table
Out[12]:
/df1/table (Table(4,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
  "number": Int64Col(shape=(), dflt=0, pos=2),
  "string": StringCol(itemsize=13, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (1456,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}
Here is the error you are currently getting
In [13]: store.append('df1',df2)
ValueError: Trying to store a string with len [15] in [string] column but
this column has a limit of [13]!
Consider using min_itemsize to preset the sizes on these columns
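The limit in this message is the width that was fixed when the table was first created. One way to choose a safe value up front is to measure the longest string in the relevant index level across all the frames you expect to append. The following is only a sketch: the helper name level_itemsize is hypothetical (not part of pandas), and it assumes the stored width is the UTF-8 encoded byte length, which equals the character count for plain ASCII IDs like the ones here.

```python
import numpy as np
import pandas as pd


def level_itemsize(frames, level, encoding="utf-8"):
    """Hypothetical helper: longest encoded byte length of `level` across frames."""
    widths = []
    for frame in frames:
        values = frame.index.get_level_values(level)
        # Measure the encoded length, since the on-disk width is counted in bytes
        # (assumption: UTF-8 encoding).
        widths.append(max(len(str(v).encode(encoding)) for v in values))
    return max(widths)


names = ["string", "number"]
idx1 = pd.MultiIndex.from_product([["abcdefghijklm", "foo"], [1, 2]], names=names)
idx2 = pd.MultiIndex.from_product([["abcdefghijklmop", "foo"], [1, 2]], names=names)
df1 = pd.DataFrame(np.random.randn(4, 2), index=idx1)
df2 = pd.DataFrame(np.random.randn(4, 2), index=idx2)

needed = level_itemsize([df1, df2], "string")
print(needed)  # 15 for these two frames
```

The result can then be passed as the value for the level name in min_itemsize on the first append. If longer IDs may show up later, it is probably worth adding some headroom, since the width cannot be raised by a later append.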
Specify min_itemsize with the name of the level
In [14]: store.append('df',df1,min_itemsize={ 'string' : 15 })
In [15]: store.get_storer('df').table
Out[15]:
/df/table (Table(4,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
  "number": Int64Col(shape=(), dflt=0, pos=2),
  "string": StringCol(itemsize=15, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (1394,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}
Append
In [16]: store.append('df',df2)
In [19]: store.df
Out[19]:
                                0          1
string          number
abcdefghijklm   1        0.737976   0.840718
                2        0.605763   1.797398
foo             1        1.589278   0.104186
                2        0.029387   1.417195
abcdefghijklmop 1        0.539507  -1.059085
                2        1.263722  -1.773187
foo             1        1.625073   0.078650
                2       -0.030827  -1.691805
[8 rows x 2 columns]
In [20]: store.close()
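For convenience, the session above can be condensed into a single script. This is a sketch rather than an exact replay: it assumes PyTables is installed, writes to a temporary file instead of test.h5, and uses the column names "a" and "b" (my choice, not in the original frames). Reading the width back through table.coldescrs is just one way to confirm the setting took effect.

```python
import os
import tempfile

import numpy as np
import pandas as pd

names = ["string", "number"]
short_idx = pd.MultiIndex.from_product([["abcdefghijklm", "foo"], [1, 2]], names=names)
long_idx = pd.MultiIndex.from_product([["abcdefghijklmop", "foo"], [1, 2]], names=names)
df1 = pd.DataFrame(np.random.randn(4, 2), index=short_idx, columns=["a", "b"])
df2 = pd.DataFrame(np.random.randn(4, 2), index=long_idx, columns=["a", "b"])

# Temporary location so the example does not overwrite an existing file.
path = os.path.join(tempfile.mkdtemp(), "test.h5")

with pd.HDFStore(path, mode="w") as store:
    # The key is the *name of the index level*, not 'index'.
    # The width must be set on the first append, when the table is created.
    store.append("df", df1, min_itemsize={"string": 15})
    # The 15-character IDs now fit in the existing column.
    store.append("df", df2)
    itemsize = store.get_storer("df").table.coldescrs["string"].itemsize
    combined = store["df"]

print(itemsize)
print(len(combined))
```

Note that min_itemsize only matters when the table is first created. A table already written with a 13-character column would need to be rewritten with the larger width.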