Append any further columns to the first three columns

Problem description

I am copying an excerpt of a badly formatted Excel sheet with pd.read_clipboard. It is about 120 columns wide, and the columns have different lengths. After every third column, the next group of three columns should be appended below the first three, so I should end up with three columns.

I set up a sample DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "1": np.random.randint(900000000, 999999999, size=5),
    "2": np.random.choice(["A", "B", "C", np.nan], 5),
    "3": np.random.choice([np.nan, 1], 5),

    "4": np.random.randint(900000000, 999999999, size=5),
    "5": np.random.choice(["A", "B", "C", np.nan], 5),
    "6": np.random.choice([np.nan, 1], 5)
})

This produces:

  1         2   3   4         5   6
0 925846412 nan 1.0 994235729 nan NaN 
1 991877917 B   1.0 970766032 nan NaN 
2 931608603 B   NaN 937096948 B   NaN 
3 977083128 A   NaN 974190653 B   1.0 
4 937344792 nan NaN 972948910 B   1.0 

This is what I have so far:

col_counter = 0
df_neu = pd.DataFrame(columns=["A", "B", "C"])

for column in df.columns:
    if col_counter == 3:
        col_counter = 0

    if col_counter == 0:
        # set_trace()
        df_neu["A"] = df_neu["A"].append(df[column]).reset_index(drop = True)
    elif col_counter == 1:
        df_neu["B"] = df_neu["B"].append(df[column]).reset_index(drop = True)
    elif col_counter == 2:
        df_neu["C"] = df_neu["C"].append(df[column]).reset_index(drop = True)

    col_counter +=1

The desired result would be:

  A         B   C
0 925846412 nan 1.0
1 991877917 B   1.0
2 931608603 B   NaN 
3 977083128 A   NaN
4 937344792 nan NaN 
5 994235729 nan NaN 
6 970766032 nan NaN 
7 937096948 B   NaN 
8 974190653 B   1.0 
9 972948910 B   1.0

But I get this instead:

  A         B   C
0 925846412 NaN NaN 
1 991877917 NaN NaN 
2 931608603 NaN NaN 
3 977083128 NaN NaN 
4 937344792 NaN NaN 

So only the first column from the very first iteration gets appended. All other columns are ignored.

So my questions are:

  1. What am I doing wrong?
  2. How can I fix it?
  3. Is there a better way? This feels like a not so "sexy" way of doing it.

Recommended answer

You can create a MultiIndex in the columns by applying integer division and modulo to an array built from the number of columns, then reshape with stack, sort with sort_index, and finally call reset_index to remove the MultiIndex:

np.random.seed(2019)

df = pd.DataFrame({
    "1": np.random.randint(900000000, 999999999, size=5),
    "2": np.random.choice( ["A","B","C", np.nan], 5),
    "3": np.random.choice( [np.nan, 1], 5),

    "4": np.random.randint(900000000, 999999999, size=5),
    "5": np.random.choice( ["A","B","C", np.nan], 5),
    "6": np.random.choice( [np.nan, 1], 5)
})
print (df)
           1    2    3          4  5    6
0  960189042    B  NaN  991581392  A  1.0
1  977655199  nan  1.0  964195250  A  1.0
2  961771966    A  NaN  969007327  B  1.0
3  955308022    C  1.0  973316485  A  NaN
4  933277976    A  1.0  976749175  A  NaN


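arr = np.arange(len(df.columns))
# arr // 3 is the group number (0, 0, 0, 1, 1, 1); arr % 3 is the position inside the group (0, 1, 2, 0, 1, 2)
df.columns = [arr // 3, arr % 3]

# move the group level into the rows, order by group and then by original row, and drop the index
df = df.stack(0).sort_index(level=[1, 0]).reset_index(drop=True)
df.columns = ['A','B','C']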
print (df)
           A    B    C
0  960189042    B  NaN
1  977655199  nan  1.0
2  961771966    A  NaN
3  955308022    C  1.0
4  933277976    A  1.0
5  991581392    A  1.0
6  964195250    A  1.0
7  969007327    B  1.0
8  973316485    A  NaN
9  976749175    A  NaN
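
The answer does not say why the loop in the question loses data. A likely explanation, sketched here under the assumption that standard pandas index alignment is at work: the first column assignment on the empty frame fixes its index at 0..4, and every later assignment of a longer Series is aligned to that index, so the extra rows are silently dropped. The names frame and longer are illustrative only, and pd.concat stands in for Series.append.

```python
import pandas as pd

# An empty frame: the first column assignment defines its row index.
frame = pd.DataFrame(columns=["A"])
frame["A"] = pd.Series([1, 2, 3])          # index becomes 0, 1, 2

# Build a longer Series, as the loop in the question does on later iterations.
longer = pd.concat([frame["A"], pd.Series([4, 5, 6])]).reset_index(drop=True)

# Assigning it back aligns on the existing index 0..2, so rows 3..5 are discarded.
frame["A"] = longer

print(len(longer))
print(len(frame))
```

The same alignment would explain why columns B and C stay NaN: by the time they are assigned, their first five rows already hold NaN, and only those five rows survive.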

Your solution works if you append to Series objects and create the DataFrame with the constructor at the end (note that Series.append was removed in pandas 2.0, so this code needs an older pandas version):

col_counter = 0
a,b,c = pd.Series(),pd.Series(),pd.Series()

for column in df.columns:
    if col_counter == 3:
        col_counter = 0

    if col_counter == 0:
        # set_trace()
        a = a.append(df[column]).reset_index(drop = True)
    elif col_counter == 1:
        b = b.append(df[column]).reset_index(drop = True)
    elif col_counter == 2:
        c = c.append(df[column]).reset_index(drop = True)

    col_counter +=1

df_neu = pd.DataFrame({"A":a, "B":b, "C":c})
print (df_neu)
           A    B    C
0  960189042    B  NaN
1  977655199  nan  1.0
2  961771966    A  NaN
3  955308022    C  1.0
4  933277976    A  1.0
5  991581392    A  1.0
6  964195250    A  1.0
7  969007327    B  1.0
8  973316485    A  NaN
9  976749175    A  NaN
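
For pandas versions without Series.append, the same idea can be sketched with pd.concat. This is not part of the original answer: the helper name stack_column_groups, its parameters, and the small wide frame are assumptions made for illustration. It slices the frame into consecutive groups of three columns, gives each group the same column names, and concatenates the groups row-wise.

```python
import numpy as np
import pandas as pd

def stack_column_groups(df, width=3, names=("A", "B", "C")):
    """Stack consecutive groups of `width` columns below one another (hypothetical helper)."""
    chunks = []
    for start in range(0, df.shape[1], width):
        chunk = df.iloc[:, start:start + width].copy()
        # Give every group identical labels so concat lines the columns up.
        chunk.columns = list(names)[:chunk.shape[1]]
        chunks.append(chunk)
    # ignore_index=True renumbers the rows 0..n-1 in group order.
    return pd.concat(chunks, ignore_index=True)

# Small illustrative frame with two groups of three columns.
wide = pd.DataFrame({
    "1": [10, 11], "2": ["a", "b"], "3": [1.0, np.nan],
    "4": [20, 21], "5": ["c", "d"], "6": [np.nan, 1.0],
})
tall = stack_column_groups(wide)
print(tall)
```

Because the real sheet has columns of different lengths, the shorter groups would presumably contribute rows that are entirely NaN; calling tall.dropna(how="all") afterwards should remove them.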
