Renaming columns in a dataframe with respect to another specific column


Problem description

BACKGROUND: A large Excel mapping file with about 100 columns and 200 rows was converted to .csv and then stored as a dataframe. The general format of the df is as below.

Each group starts with a named column (e.g. Sales), and the two columns that follow it need to be renamed. This pattern repeats for all columns in the Excel file.

Essentially: Link the subsequent 2 columns to the "parent" one preceding them.

 Sales Unnamed: 2  Unnamed: 3  Validation Unnamed: 5 Unnamed: 6
0       Commented  No comment             Commented  No comment                                   
1     x                                             x                        
2                            x          x                                                
3                x                                             x 

APPROACH FOR SOLUTION: I assume it would be possible to start from an index (e.g. the index of the Sales column, 1 = x) and then rename the following two columns at (x+1) and (x+2). Then read the text of the next named column (e.g. Validation), and so on.
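The fixed-stride approach described above can be sketched roughly as follows. This is a minimal sketch, assuming every group is exactly one named column followed by two sub-columns and that the sub-header labels ("Commented", "No comment") sit in row 0. The function name `rename_by_stride` and the sample data are hypothetical. Spaces in a label become underscores, so "No comment" turns into `No_comment`.

```python
import pandas as pd

def rename_by_stride(df, group_size=3):
    """Rename columns in fixed-size groups (hypothetical helper).

    The first column of each group is the parent; the following columns
    become parent + "_" + their row-0 label. Assumes every group has
    exactly `group_size` columns.
    """
    cols = list(df.columns)
    new_names = []
    for x in range(0, len(cols), group_size):
        parent = cols[x]
        new_names.append(parent)
        # columns x+1, x+2, ... belong to this parent
        for j in range(x + 1, min(x + group_size, len(cols))):
            label = str(df.iloc[0, j]).strip().replace(" ", "_")
            new_names.append(f"{parent}_{label}")
    out = df.copy()
    out.columns = new_names
    return out

# Hypothetical sample shaped like the table in the question
df = pd.DataFrame(
    [["", "Commented", "No comment", "", "Commented", "No comment"],
     ["x", "", "", "", "x", ""]],
    columns=["Sales", "Unnamed: 2", "Unnamed: 3",
             "Validation", "Unnamed: 5", "Unnamed: 6"],
)
print(list(rename_by_stride(df).columns))
# ['Sales', 'Sales_Commented', 'Sales_No_comment', 'Validation', 'Validation_Commented', 'Validation_No_comment']
```

The stride only holds when every department has exactly two sub-columns. If the group sizes vary between files, detecting the `Unnamed` prefix (as in the answer) is more robust.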

I know rename() is used for dataframes.

BUT, I'm not sure how to apply it iteratively to change the column titles.
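`rename()` accepts a dict that maps old column names to new ones, so the "iterative" part only has to build that dict. A minimal illustration with column names from the table above; the empty DataFrame is a hypothetical stand-in for the real data:

```python
import pandas as pd

# Empty stand-in DataFrame with the question's first three columns
df = pd.DataFrame(columns=["Sales", "Unnamed: 2", "Unnamed: 3"])

# rename() takes a {old_name: new_name} mapping; columns not in the
# mapping keep their current names
renamed = df.rename(columns={"Unnamed: 2": "Sales_Commented",
                             "Unnamed: 3": "Sales_No_Comment"})
print(list(renamed.columns))
# ['Sales', 'Sales_Commented', 'Sales_No_Comment']
```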

EXPECTED OUTPUT: Unnamed: 2 & Unnamed: 3 are changed to Sales_Commented and Sales_No_Comment, respectively.

Similarly, Unnamed: 5 & Unnamed: 6 are changed to Validation_Commented and Validation_No_Comment.

Again, this is repeated for all 100 columns of the file.

EDIT: Due to the large number of columns in the file, creating a manual list of column names is not a viable solution. I have already seen that suggested elsewhere on SO. Also, the number of columns and departments (Sales, Validation) changes between the different Excel files with the mapping, so a dynamic solution is required.

  Sales Sales_Commented Sales_No_Comment Validation Validation_Commented Validation_No_Comment
0             Commented       No comment                       Commented            No comment
1     x                                                                x                      
2                                      x                                                      
3                     x                           x                                          x

As a Python novice, I came up with a possible approach using the limited knowledge I have, but I'm not sure what it would look like as working code.

I would appreciate all help and guidance.

Recommended answer

1. Make a list with the column names that you want.
2. Turn it into a dict with the old column names as keys and the new column names as values.
3. Use df.rename(columns=your_dictionary).
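The three steps can be seen in isolation in this small sketch. The DataFrame is hypothetical data shaped like the Excel sheet, and the list in step 1 is hard-coded only to show the steps; the answer's code below builds that list from the data instead.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from Excel
df = pd.DataFrame(
    [[None, "Commented", "No comment", None, "Commented", "No comment"],
     [1.0, 2, 1, 1.0, 1, 1]],
    columns=["Sales", "Unnamed: 2", "Unnamed: 3",
             "Validation", "Unnamed: 5", "Unnamed: 6"],
)

# 1. list of wanted names (hard-coded here purely for illustration)
new_names = ["Sales", "Sales_Commented", "Sales_No_Comment",
             "Validation", "Validation_Commented", "Validation_No_Comment"]

# 2. dict with old names as keys and new names as values
mapping = dict(zip(df.columns, new_names))

# 3. rename using the dict
df = df.rename(columns=mapping)
print(list(df.columns))
```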

import pandas as pd

# file and sheet names are placeholders
df = pd.read_excel("name of the excel file", sheet_name="name of sheet")


print(df.head()) 
Output>>>
    Sales   Unnamed : 2     Unnamed : 3     Validation  Unnamed : 5     Unnamed : 6     Unnamed :7
0   NaN     Commented   No comment  NaN     Comment     No comment  Extra
1   1.0     2   1   1.0     1   1   1
2   3.0     1   1   1.0     1   1   1
3   4.0     3   4   5.0     5   6   6
4   5.0     1   1   1.0     21  3   6

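# get new names based on the name of the preceding "parent" column
# (assumes the first column is a named parent column)
new_column_names = []
for counter, col_name in enumerate(df.columns):

    if str(col_name).startswith("Unnamed"):
        # the sub-header text (e.g. "No comment") sits in the first row;
        # str() avoids a crash if that cell is empty (NaN)
        sub_name = str(df.iloc[0, counter]).strip().replace(" ", "_")
        new_column_names.append(base_name + "_" + sub_name)
    else:
        # a named column starts a new group
        base_name = col_name
        new_column_names.append(base_name)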


# map each old column name to its new name
dictionary = dict(zip(df.columns, new_column_names))

# rename columns
df = df.rename(columns=dictionary)

# drop the first row, which only held the sub-header labels
df = df.iloc[1:].reset_index(drop=True)

print(df.head())
Output>>>
    Sales   Sales_Commented     Sales_No_comment    Validation  Validation_Comment  Validation_No_comment   Validation_Extra
0   1.0     2   1   1.0     1   1   1
1   3.0     1   1   1.0     1   1   1
2   4.0     3   4   5.0     5   6   6
3   5.0     1   1   1.0     21  3   6
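The same result can also be reached without an explicit loop. The sketch below is an alternative technique, not part of the original answer. It marks the `Unnamed` columns, forward-fills the parent name over them with `Series.ffill()`, and appends the row-0 label. The small DataFrame mimics the `print(df.head())` output above and is hypothetical test data.

```python
import pandas as pd

# Hypothetical data mimicking the sheet: row 0 holds the sub-header labels
df = pd.DataFrame(
    [[None, "Commented", "No comment", None, "Comment", "No comment", "Extra"],
     [1.0, 2, 1, 1.0, 1, 1, 1]],
    columns=["Sales", "Unnamed: 2", "Unnamed: 3", "Validation",
             "Unnamed: 5", "Unnamed: 6", "Unnamed: 7"],
)

cols = pd.Series(df.columns, dtype="object")
is_sub = cols.str.startswith("Unnamed")

# parent name for every column: blank out "Unnamed" names, then forward-fill
parents = cols.mask(is_sub).ffill()

# row-0 labels with spaces turned into underscores, aligned to 0..n-1
labels = (df.iloc[0].astype(str).str.strip()
          .str.replace(" ", "_", regex=False)
          .reset_index(drop=True))

# keep named columns as-is; sub-columns become parent + "_" + label
new_cols = parents.where(~is_sub, parents + "_" + labels)
df.columns = new_cols.tolist()

# drop the label row
df = df.iloc[1:].reset_index(drop=True)
print(list(df.columns))
```

Like the loop version, this assumes the first column is a named parent column; otherwise the first parent value stays missing.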
