Pandas: query string where column name contains special characters

Question

I am working with a data frame that has a structure like the following:

In[75]: df.head(2)
Out[75]: 
  statusdata             participant_id association  latency response  \
0   complete  CLIENT-TEST-1476362617727       seeya      715  dislike   
1   complete  CLIENT-TEST-1476362617727      welome      800     like   

   stimuli elementdata statusmetadata demo$gender  demo$question2  \
0  Sample B    semi_imp       complete        male              23   
1  Sample C    semi_imp       complete      female              23   

I want to be able to run a query string against the column demo$gender.

df.query("demo$gender=='male'")

But this fails because of the $ sign. If I replace the $ sign with another delimiter (like -), the problem persists. Can I fix up my query string to avoid this problem? I would prefer not to rename the columns, as they correspond tightly with other parts of my application.

I really want to stick with a query string, as it is supplied by another component of our tech stack, and writing a parser would be a heavy lift for what seems like a simple problem.

Thanks.

Recommended answer

For those interested, here is a simple procedure I used to accomplish the task:

# Assumes `df` is the DataFrame and `query` holds the incoming query string,
# and that all column names are strings.

# Identify column names that are not valid Python identifiers
invalid_column_names = [x for x in df.columns if not x.isidentifier()]

# Make replacements in the query and keep track of the mapping
# NOTE: This method fails if the frame already has columns called REPL_0 etc.
replacements = {}
for i, cn in enumerate(invalid_column_names):
    r = 'REPL_' + str(i)
    query = query.replace(cn, r)
    replacements[cn] = r

inv_replacements = {r: cn for cn, r in replacements.items()}

df = df.rename(columns=replacements)      # Rename the columns
df = df.query(query)                      # Carry out the query
df = df.rename(columns=inv_replacements)  # Restore the original names

This amounts to identifying the invalid column names, rewriting the query to use placeholder names, and renaming the columns to match. We then run the query and finally translate the column names back.
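The same steps can be wrapped in a reusable function. What follows is a minimal sketch rather than the answer's exact code: the names `substitute_invalid_names` and `query_special` are hypothetical, all column names are assumed to be strings, and the frame is assumed to have no existing columns named `REPL_0`, `REPL_1`, and so on. It replaces all names in a single regex pass, longest first, so that a name which is a prefix of another (say `a-b` and `a-b-c`) does not corrupt the longer one.

```python
import re


def substitute_invalid_names(query, columns, prefix="REPL_"):
    """Return (rewritten_query, mapping) for non-identifier column names.

    `prefix` is an assumed placeholder stem; it must not clash with
    real column names in the frame.
    """
    invalid = [c for c in columns
               if isinstance(c, str) and c and not c.isidentifier()]
    mapping = {name: f"{prefix}{i}" for i, name in enumerate(invalid)}
    if not mapping:
        return query, mapping
    # Longest names first so the alternation prefers the longer match.
    pattern = "|".join(
        re.escape(n) for n in sorted(mapping, key=len, reverse=True)
    )
    rewritten = re.sub(pattern, lambda m: mapping[m.group(0)], query)
    return rewritten, mapping


def query_special(df, query):
    """Rename invalid columns, run the query, then restore the names.

    `df` is expected to be a pandas DataFrame; only its `columns`,
    `rename` and `query` members are used.
    """
    new_query, mapping = substitute_invalid_names(query, df.columns)
    inverse = {placeholder: name for name, placeholder in mapping.items()}
    result = df.rename(columns=mapping).query(new_query)
    return result.rename(columns=inverse)
```

With the frame from the question, `query_special(df, "demo$gender=='male'")` would run the query against a temporarily renamed copy and return the matching rows under their original column names. One limitation carries over from the original approach: a column name that also appears inside a string literal in the query is rewritten as well.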

Credit to @chrisb for their answer, which pointed me in the right direction.
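As an alternative that is not part of the answer above: newer pandas releases accept backtick-quoted column names inside `query`. Support for names containing spaces arrived in pandas 0.25 and, as far as I know, version 1.0 extended it to other special characters such as `$`. If the installed version supports this, the frame does not need to be renamed at all; only the query string has to be adjusted. The sketch below does that adjustment. The helper name `backtick_invalid_names` is hypothetical, and column names are assumed to be strings.

```python
import re


def backtick_invalid_names(query, columns):
    """Wrap non-identifier column names found in `query` in backticks."""
    invalid = [c for c in columns
               if isinstance(c, str) and c and not c.isidentifier()]
    if not invalid:
        return query
    # Single pass, longest names first, so an already quoted name
    # is never matched again by a shorter name it contains.
    pattern = "|".join(
        re.escape(n) for n in sorted(invalid, key=len, reverse=True)
    )
    return re.sub(pattern, lambda m: f"`{m.group(0)}`", query)
```

It would be used as `df.query(backtick_invalid_names("demo$gender=='male'", df.columns))`. As with the renaming approach, a column name that appears inside a string literal in the query would also be quoted, so this is only safe when the incoming queries are known not to contain such literals.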
