Append any further columns to the first three columns
Question
I am copying an excerpt of a badly formatted Excel sheet (with pd.read_clipboard). It is about 120 columns wide, and the columns have different lengths. After every third column, the next column should be appended to the very first one, and so on, so that I end up with only three columns.
I set up a sample dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "1": np.random.randint(900000000, 999999999, size=5),
    "2": np.random.choice(["A", "B", "C", np.nan], 5),
    "3": np.random.choice([np.nan, 1], 5),
    "4": np.random.randint(900000000, 999999999, size=5),
    "5": np.random.choice(["A", "B", "C", np.nan], 5),
    "6": np.random.choice([np.nan, 1], 5)
})
Which results in:
           1    2    3          4    5    6
0  925846412  nan  1.0  994235729  nan  NaN
1  991877917    B  1.0  970766032  nan  NaN
2  931608603    B  NaN  937096948    B  NaN
3  977083128    A  NaN  974190653    B  1.0
4  937344792  nan  NaN  972948910    B  1.0
This is what I have so far:
col_counter = 0
df_neu = pd.DataFrame(columns=["A", "B", "C"])
for column in df.columns:
    if col_counter == 3:
        col_counter = 0
    if col_counter == 0:
        # set_trace()
        df_neu["A"] = df_neu["A"].append(df[column]).reset_index(drop = True)
    elif col_counter == 1:
        df_neu["B"] = df_neu["B"].append(df[column]).reset_index(drop = True)
    elif col_counter == 2:
        df_neu["C"] = df_neu["C"].append(df[column]).reset_index(drop = True)
    col_counter +=1
The desired outcome would be:
           A    B    C
0  925846412  nan  1.0
1  991877917    B  1.0
2  931608603    B  NaN
3  977083128    A  NaN
4  937344792  nan  NaN
5  994235729  nan  NaN
6  970766032  nan  NaN
7  937096948    B  NaN
8  974190653    B  1.0
9  972948910    B  1.0
But I get this instead:
           A    B    C
0  925846412  NaN  NaN
1  991877917  NaN  NaN
2  931608603  NaN  NaN
3  977083128  NaN  NaN
4  937344792  NaN  NaN
So only the first column from the very first iteration gets appended. All other columns are ignored.
So my questions are:
- What am I doing wrong?
- How can I fix it?
- Is there a better way? Doing it like this feels like a not very "sexy" approach.
Recommended answer
You can create a MultiIndex in the columns by applying integer division and modulo to an array built from the number of columns, then reshape with stack, sort with sort_index, and finally call reset_index to remove the MultiIndex:
import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame({
    "1": np.random.randint(900000000, 999999999, size=5),
    "2": np.random.choice(["A", "B", "C", np.nan], 5),
    "3": np.random.choice([np.nan, 1], 5),
    "4": np.random.randint(900000000, 999999999, size=5),
    "5": np.random.choice(["A", "B", "C", np.nan], 5),
    "6": np.random.choice([np.nan, 1], 5)
})
print(df)
           1    2    3          4  5    6
0  960189042    B  NaN  991581392  A  1.0
1  977655199  nan  1.0  964195250  A  1.0
2  961771966    A  NaN  969007327  B  1.0
3  955308022    C  1.0  973316485  A  NaN
4  933277976    A  1.0  976749175  A  NaN
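arr = np.arange(len(df.columns))
# Level 0 is the block number (0, 0, 0, 1, 1, 1); level 1 is the position inside the block (0, 1, 2, 0, 1, 2)
df.columns = [arr // 3, arr % 3]
# Move the block number into the row index, order by block and then by original row, and drop the index
df = df.stack(0).sort_index(level=[1, 0]).reset_index(drop=True)
df.columns = ['A','B','C']
print(df)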
           A    B    C
0  960189042    B  NaN
1  977655199  nan  1.0
2  961771966    A  NaN
3  955308022    C  1.0
4  933277976    A  1.0
5  991581392    A  1.0
6  964195250    A  1.0
7  969007327    B  1.0
8  973316485    A  NaN
9  976749175    A  NaN
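For comparison, here is a minimal sketch of a different technique that reaches the same shape: slicing the frame into blocks of three columns and stacking the blocks with pd.concat. It is not the stack-based method shown above. The names `wide`, `stack_blocks`, `width` and `names` are hypothetical and chosen only for illustration, and the sketch assumes the number of columns is an exact multiple of `width`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: six columns that really hold two blocks of three.
wide = pd.DataFrame({
    "1": [11, 12, 13],
    "2": ["A", "B", "C"],
    "3": [1.0, np.nan, 1.0],
    "4": [21, 22, 23],
    "5": ["D", "E", "F"],
    "6": [np.nan, 1.0, np.nan],
})


def stack_blocks(frame, width=3, names=("A", "B", "C")):
    """Cut `frame` into blocks of `width` columns and stack them vertically.

    Assumes frame.shape[1] is an exact multiple of `width`.
    """
    blocks = []
    for start in range(0, frame.shape[1], width):
        block = frame.iloc[:, start:start + width].copy()
        block.columns = list(names)  # give every block the same labels
        blocks.append(block)
    # ignore_index=True renumbers the rows consecutively across all blocks
    return pd.concat(blocks, ignore_index=True)


tall = stack_blocks(wide)
print(tall)
```

Because the real sheet has columns of different lengths, shorter blocks will probably carry trailing rows that are empty in all three columns; calling `tall.dropna(how="all")` afterwards is one way to discard them.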
Your solution works if you append to separate Series objects and create the DataFrame with the constructor only at the end:
col_counter = 0
# Note: Series.append was removed in pandas 2.0; there, pd.concat([a, df[column]]) does the same job
a,b,c = pd.Series(),pd.Series(),pd.Series()
for column in df.columns:
    if col_counter == 3:
        col_counter = 0
    if col_counter == 0:
        # set_trace()
        a = a.append(df[column]).reset_index(drop = True)
    elif col_counter == 1:
        b = b.append(df[column]).reset_index(drop = True)
    elif col_counter == 2:
        c = c.append(df[column]).reset_index(drop = True)
    col_counter +=1
df_neu = pd.DataFrame({"A":a, "B":b, "C":c})
print(df_neu)
           A    B    C
0  960189042    B  NaN
1  977655199  nan  1.0
2  961771966    A  NaN
3  955308022    C  1.0
4  933277976    A  1.0
5  991581392    A  1.0
6  964195250    A  1.0
7  969007327    B  1.0
8  973316485    A  NaN
9  976749175    A  NaN
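As for why the loop in the question kept only five rows: the most likely cause is index alignment. Assigning a Series to a column of an existing DataFrame aligns it on the frame's current index, so any values at labels the frame does not yet have are dropped. The small sketch below illustrates that behaviour with made-up data; `target`, `longer`, `a`, `b` and `result` are hypothetical names, and pd.concat stands in for Series.append so that the example also runs on pandas 2.x.

```python
import pandas as pd

target = pd.DataFrame({"A": [1, 2, 3]})       # index labels 0, 1, 2
longer = pd.Series([10, 20, 30, 40, 50])      # index labels 0..4

# Column assignment aligns on the frame's existing index,
# so the values at labels 3 and 4 are silently discarded.
target["A"] = longer
print(target)

# Finishing each Series first and building the frame at the end keeps every row.
a = pd.concat([pd.Series([1, 2, 3]), pd.Series([4, 5, 6])], ignore_index=True)
b = pd.concat([pd.Series(["x", "y", "z"]), pd.Series(["u", "v", "w"])], ignore_index=True)
result = pd.DataFrame({"A": a, "B": b})
print(result)
```

This is why building the three Series separately and calling the DataFrame constructor once avoids the truncation.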