Column dupe renaming in pandas

Question

I have the following csv file of data:

id,number,id
132605,1,1
132750,2,1

Pandas currently renames this to:

       id number id.1
0  132605      1    1
1  132750      2    1

Is there a way to customize how this is renamed? For example, I would prefer:

       id number id2
0  132605      1    1
1  132750      2    1

Recommended answer

rename: use period delimiter

Assuming duplicate column labels are the only instances where a column name contains a period (.), you can use a custom function with pd.DataFrame.rename:

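import pandas as pd
from io import StringIO

file = """id,number,id
132605,1,1
132750,2,1"""

def rename_func(x):
    # Labels without a period were not renamed by pandas; keep them as-is
    if '.' not in x:
        return x
    # pandas numbers duplicates from .1, so 'id.1' becomes 'id2'
    name, num = x.rsplit('.', 1)
    return f'{name}{int(num)+1}'

# replace StringIO(file) with 'file.csv'
df = pd.read_csv(StringIO(file))\
       .rename(columns=rename_func)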

print(df)

       id  number  id2
0  132605       1    1
1  132750       2    1

csv.reader: robust solution

A robust solution is possible with the csv module from the standard library:

from collections import defaultdict
import csv

# replace StringIO(file) with open('file.csv', 'r')
with StringIO(file) as fin:
    headers = next(csv.reader(fin))

def rename_duplicates(original_cols):
    count = defaultdict(int)
    for x in original_cols:
        count[x] += 1
        yield f'{x}{count[x]}' if count[x] > 1 else x

# materialize the generator so the new labels are an explicit list
df.columns = list(rename_duplicates(headers))
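
The two steps above (read the raw header with csv.reader, then relabel the duplicates) can also be folded into the read itself. The following is a minimal sketch, not part of the original answer: the helper names dedupe_headers and read_csv_custom_dupes are hypothetical, and it assumes the CSV text is held in memory and that passing header=0 together with names= to pd.read_csv replaces the header row in the file.

```python
import csv
from collections import defaultdict
from io import StringIO

import pandas as pd


def dedupe_headers(headers):
    """Suffix the 2nd, 3rd, ... occurrence of a label with its count."""
    count = defaultdict(int)
    result = []
    for name in headers:
        count[name] += 1
        result.append(f"{name}{count[name]}" if count[name] > 1 else name)
    return result


def read_csv_custom_dupes(text):
    # Hypothetical helper: parse the header row once with csv.reader,
    # then let pandas skip that row (header=0) and use our labels instead.
    headers = next(csv.reader(StringIO(text)))
    return pd.read_csv(StringIO(text), header=0, names=dedupe_headers(headers))


data = "id,number,id\n132605,1,1\n132750,2,1"
df = read_csv_custom_dupes(data)
print(list(df.columns))
```

Because the labels come from the raw header row, this variant does not depend on the assumption that only pandas-renamed columns contain a period.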
