pandas dataframe column name: remove special character

Question

Some joker made a Lotus database/applet thingy for tracking engineering issues in our company. The joke is that the key piece of information was named with a special character... a number sign (hash tag, pound sign, \u0023).

Abbreviated sample:

KA#         Issue Date      Current Position
27144       1/9/2014        Accounting
27194       12/20/2012      Engineering
32474       4/21/2008       Engineering
32623-HOLD  4/25/2016       Engineering
32745       11/13/2012      SEPE
32812       10/30/2013      Engineering
32817       12/7/2012       Purchasing
32839       1/8/2013        SEPE

I output this table (4K rows, 15 columns) to a csv file and process in python3 as a pandas dataframe.

I generate various outputs. If I use something like:

df.iloc[:,[0,3,1,8,9,10]]

I get appropriate output and the key column shows up as "KA#". (When I say "key column", I mean "most important"... NOT "index". I keep a serial index)

Unfortunately, people sometimes mess with the column order in Lotus between my exports to csv so I can not guarantee that "KA#" will be any particular column number. I would like to use column names:

df.loc[:,["KA#","Issue Date","Current Position"]]

But the "KA#" column is filled with NaN's.

Thanks for any help you can offer.

Finally, if I try to rename "KA#" to simply "KA":

df['KA#'].name = 'KA'

throws a KeyError and

df = df.rename(columns={"KA#": "ka"})

is completely ignored. The column shows up as "KA#".

Can anyone think of a way to get rid of or handle that symbol? I'd even settle for a regex at this point.

Recommended answer

Use str.replace:
df.columns = df.columns.str.replace('#', '')
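
To illustrate the replacement, here is a minimal sketch on a small frame built from a few of the sample rows in the question. The frame is constructed by hand for illustration (the real data comes from a csv export), and `regex=False` is an optional argument added here so that `'#'` is treated as plain text.

```python
import pandas as pd

# A few rows from the sample in the question, typed in by hand.
df = pd.DataFrame({
    "KA#": ["27144", "27194", "32623-HOLD"],
    "Issue Date": ["1/9/2014", "12/20/2012", "4/25/2016"],
    "Current Position": ["Accounting", "Engineering", "Engineering"],
})

# Remove the literal '#' from every column label.
# regex=False treats '#' as plain text rather than a pattern.
df.columns = df.columns.str.replace("#", "", regex=False)

# Columns can now be selected by name, regardless of their position.
subset = df.loc[:, ["KA", "Issue Date", "Current Position"]]
print(subset.columns.tolist())
```

Because the selection is by label, it should keep working if the column order changes between exports.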

You can find the details in the documentation.
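
The KeyError and the silently ignored rename in the question both suggest that the stored label may not be exactly "KA#". The following is a minimal sketch of how one might check, assuming the label carries an invisible leading character. The byte-order mark used here is a hypothetical cause chosen for illustration; the question does not confirm what the extra character is.

```python
import pandas as pd

# Hypothetical frame: the first label starts with an invisible
# byte-order mark (\ufeff). This is an assumption for illustration only.
df = pd.DataFrame({
    "\ufeffKA#": ["27144", "27194"],
    "Issue Date": ["1/9/2014", "12/20/2012"],
})

# repr() exposes characters that a plain print of the frame hides.
labels = [repr(c) for c in df.columns]
print(labels)

# By default, rename() skips keys it cannot find, so a label mismatch
# passes without any error and the columns stay as they were.
unchanged = df.rename(columns={"KA#": "KA"})
print(unchanged.columns.tolist() == df.columns.tolist())

# Regex variant: keep only word characters and spaces in each label,
# which drops both the '#' and the invisible character.
cleaned = df.copy()
cleaned.columns = cleaned.columns.str.replace(r"[^\w ]", "", regex=True)
print(cleaned.columns.tolist())
```

The regex variant is broader than removing `'#'` alone, so it is worth checking that no two labels become identical after cleaning.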
