Pandas HDF5 Select with Where on non natural-named columns


Question

In my continuing spree of exotic pandas/HDF5 issues, I encountered the following:

I have a series of non-natural named columns (NB: for a good reason, with negative numbers being "system" ids etc.), which normally doesn't cause an issue:

fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'])

However, my select statement falls over as soon as I add a where clause on one of these columns:

>>> fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'], where=[('a-6', '=', [0, 25, 28])])
blablabla
File "/srv/www/li/venv/local/lib/python2.7/site-packages/tables/table.py", line 1251, in _required_expr_vars
    raise NameError("name ``%s`` is not defined" % var)
NameError: name ``a`` is not defined

Is there any way to work around it? I could rename my negative values from "a-1" to "a_1", but that means reloading all of the data in my system, which is rather a lot! :)

Suggestions are very welcome!

Recommended answer

Here is a test table:

In [1]: df = DataFrame({ 'a-6' : [1,2,3,np.nan] })

In [2]: df
Out[2]: 
   a-6
0    1
1    2
2    3
3  NaN

In [3]: df.to_hdf('test.h5','df',mode='w',table=True)

In [5]: df.to_hdf('test.h5','df',mode='w',table=True,data_columns=True)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_kind'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_dtype'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
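
The warnings above hint at why a where clause on this column fails: the condition is parsed as a Python-style expression, so `a-6` is read as the variable `a` minus 6 rather than as a single column name, which matches the "name ``a`` is not defined" error in the question. A minimal sketch of that parsing behaviour, using only the standard library `ast` module (the helper `names_in_condition` is hypothetical and does not call PyTables itself):

```python
import ast

def names_in_condition(condition):
    """Return the variable names that a Python-style expression refers to."""
    tree = ast.parse(condition, mode="eval")
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)})

# 'a-6' is parsed as the subtraction a - 6, so the only variable is 'a'
print(names_in_condition("a-6 == 0"))   # ['a']

# a natural name is kept as one variable
print(names_in_condition("a_6 == 0"))   # ['a_6']
```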

There is a way, but it would have to be built into the code itself. You can do a variable substitution on the column names as follows. Here is the existing routine (in master):

    def select(self):
        """
        generate the selection
        """
        if self.condition is not None:
            return self.table.table.readWhere(self.condition.format(), start=self.start, stop=self.stop)
        elif self.coordinates is not None:
            return self.table.table.readCoordinates(self.coordinates)
        return self.table.table.read(start=self.start, stop=self.stop)

Instead, you can do this:

(Pdb) self.table.table.readWhere("(x>2.0)",
      condvars={ 'x' : getattr(self.table.table.cols,'a-6')})
array([(2, 3.0)], 
      dtype=[('index', '<i8'), ('a-6', '<f8')])

That is, by substituting x with the column reference, you can get the data.

This could be done automatically on detection of invalid column names, but it is pretty tricky.
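
As a rough illustration of what such detection might look like, here is a minimal sketch that rewrites (column, operator, value) terms into a condition string, replacing any column that is not a natural name with a placeholder variable. The helper `substitute_columns` and the `_colN` placeholder scheme are assumptions made for this example, not part of pandas or PyTables; the pattern is the one quoted in the NaturalNameWarning. The sketch does not guard against a real column that happens to be called `_col0`.

```python
import re

# Pattern quoted in the NaturalNameWarning message
NATURAL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

def substitute_columns(terms):
    """Build a condition string plus a placeholder -> column-name mapping.

    terms is a list of (column, operator, value) tuples.
    """
    parts, placeholders = [], {}
    for i, (column, op, value) in enumerate(terms):
        if NATURAL_NAME.match(column):
            var = column                # usable in the expression as-is
        else:
            var = "_col%d" % i          # hypothetical placeholder name
            placeholders[var] = column
        parts.append("(%s %s %r)" % (var, op, value))
    return " & ".join(parts), placeholders

condition, placeholders = substitute_columns([("a-6", ">", 2.0), ("o", "==", 1)])
print(condition)      # (_col0 > 2.0) & (o == 1)
print(placeholders)   # {'_col0': 'a-6'}
```

With an open PyTables table, the mapping could then be resolved in the same way as the Pdb session above, for example `condvars = {var: getattr(table.cols, name) for var, name in placeholders.items()}`, and passed to `readWhere` together with the condition string.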

Unfortunately, I would suggest renaming your columns.
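
If you do go the renaming route, a small helper can generate the new names consistently. This is only a sketch: the function `to_natural_names` and its replace-with-underscore rule are assumptions for illustration, and it refuses to continue if two columns would end up with the same name (for example "a-6" and "a_6").

```python
import re

def to_natural_names(columns):
    """Map each column name to a valid Python identifier, refusing collisions."""
    mapping = {}
    for name in columns:
        # replace every character that is not allowed in an identifier
        new = re.sub(r"[^a-zA-Z0-9_]", "_", name)
        # identifiers cannot start with a digit
        if re.match(r"[0-9]", new):
            new = "_" + new
        if new in mapping.values():
            raise ValueError("%r and another column both map to %r" % (name, new))
        mapping[name] = new
    return mapping

print(to_natural_names(["o", "a-6", "m-13"]))
# {'o': 'o', 'a-6': 'a_6', 'm-13': 'm_13'}
```

The resulting dict can be passed to `df.rename(columns=mapping)` before the frame is written with `to_hdf`. As noted in the question, data that is already stored would still have to be rewritten.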
