Fill in same amount of characters where other column is NaN
Problem description
I have the following dummy dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Col1': ['a,b,c,d', 'e,f,g,h', 'i,j,k,l,m'],
                   'Col2': ['aa~bb~cc~dd', np.nan, 'ii~jj~kk~ll~mm']})
        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             NaN
2  i,j,k,l,m  ii~jj~kk~ll~mm
The real dataset has shape (500000, 90).
I need to unnest these values into rows, and I'm using the new explode method for this, which works fine.
The problem is the NaN values: they cause unequal lengths after the explode, so I need to fill in the same number of delimiters as the populated column has. In this case that is ~~~, since row 1 of Col1 has three commas.
Expected output
        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             ~~~
2  i,j,k,l,m  ii~jj~kk~ll~mm
Attempt 1:
df['Col2'].fillna(df['Col1'].str.count(',')*'~')
Attempt 2:
np.where(df['Col2'].isna(), df['Col1'].str.count(',')*'~', df['Col2'])
The following works, but I feel like there's an easier method for this:
# Strip every word character, then turn the remaining commas into tildes.
characters = df['Col1'].str.replace(r'\w', '', regex=True).str.replace(',', '~')
df['Col2'] = df['Col2'].fillna(characters)
print(df)
        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             ~~~
2  i,j,k,l,m  ii~jj~kk~ll~mm
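Attempts 1 and 2 multiply the integer Series returned by str.count by a string, which presumably is why they do not produce the filler. A minimal sketch of the same count-based idea, assuming it is acceptable to build the filler row by row with Series.map (the names filler and filled are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['a,b,c,d', 'e,f,g,h', 'i,j,k,l,m'],
                   'Col2': ['aa~bb~cc~dd', np.nan, 'ii~jj~kk~ll~mm']})

# Count the commas per row, then build a string of that many '~' per row.
filler = df['Col1'].str.count(',').map(lambda n: '~' * n)

# fillna aligns on the index, so only the NaN rows take the filler value.
filled = df['Col2'].fillna(filler)
print(filled.tolist())
```

This avoids the regex replacement, at the cost of a Python-level function call per row.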
d1 = df.assign(Col1=df['Col1'].str.split(',')).explode('Col1')[['Col1']]
d2 = df.assign(Col2=df['Col2'].str.split('~')).explode('Col2')[['Col2']]
final = pd.concat([d1,d2], axis=1)
print(final)
  Col1 Col2
0    a   aa
0    b   bb
0    c   cc
0    d   dd
1    e
1    f
1    g
1    h
2    i   ii
2    j   jj
2    k   kk
2    l   ll
2    m   mm
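Once Col2 has been filled so that both columns split into the same number of pieces per row, the two separate explodes and the concat can possibly be collapsed into one call. A sketch, assuming pandas 1.3 or newer, where DataFrame.explode accepts a list of columns:

```python
import pandas as pd

# Col2 is shown already filled, i.e. the state after the fillna step.
df = pd.DataFrame({'Col1': ['a,b,c,d', 'e,f,g,h', 'i,j,k,l,m'],
                   'Col2': ['aa~bb~cc~dd', '~~~', 'ii~jj~kk~ll~mm']})

# Split both columns into lists, then explode them together.
# This requires the lists in each row to have matching lengths.
final = (
    df.assign(Col1=df['Col1'].str.split(','),
              Col2=df['Col2'].str.split('~'))
      .explode(['Col1', 'Col2'])
)
print(final)
```

The filled row splits into four empty strings, so it lines up with the four values in Col1.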
Question: is there an easier and more generalized method for this? Or is my method fine as is?
Recommended answer
pd.concat
delims = {'Col1': ',', 'Col2': '~'}
pd.concat({
    k: df[k].str.split(delims[k], expand=True)
    for k in df}, axis=1
).stack()
    Col1 Col2
0 0    a   aa
  1    b   bb
  2    c   cc
  3    d   dd
1 0    e  NaN
  1    f  NaN
  2    g  NaN
  3    h  NaN
2 0    i   ii
  1    j   jj
  2    k   kk
  3    l   ll
  4    m   mm
This loops over the columns in df. It may be wiser to loop over the keys of the delims dictionary.
delims = {'Col1': ',', 'Col2': '~'}
pd.concat({
    k: df[k].str.split(delims[k], expand=True)
    for k in delims}, axis=1
).stack()
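One likely reason for preferring the keys of delims: the real frame has 90 columns, and any column without an entry in delims would break the lookup. A small sketch of that difference, where Col3 is a hypothetical extra column added only for illustration, and the explicit dropna is an assumption to keep the result the same regardless of whether stack() drops all-NaN rows itself:

```python
import numpy as np
import pandas as pd

# 'Col3' is a hypothetical extra column that has no entry in delims.
df = pd.DataFrame({'Col1': ['a,b,c,d', 'e,f,g,h', 'i,j,k,l,m'],
                   'Col2': ['aa~bb~cc~dd', np.nan, 'ii~jj~kk~ll~mm'],
                   'Col3': ['x', 'y', 'z']})
delims = {'Col1': ',', 'Col2': '~'}

# Looping over df reaches 'Col3' and looks up delims['Col3'].
try:
    pd.concat({k: df[k].str.split(delims[k], expand=True) for k in df}, axis=1)
    loop_over_df_failed = False
except KeyError:
    loop_over_df_failed = True

# Looping over delims only touches the columns that have a delimiter.
out = pd.concat(
    {k: df[k].str.split(delims[k], expand=True) for k in delims}, axis=1
).stack()
# Assumption: drop all-NaN rows explicitly so the result is the same
# whether or not stack() drops them itself.
out = out.dropna(how='all')
print(loop_over_df_failed)
print(out.shape)
```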
Same thing, different look
delims = {'Col1': ',', 'Col2': '~'}
def f(c): return df[c].str.split(delims[c], expand=True)
pd.concat(map(f, delims), keys=delims, axis=1).stack()
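The stacked result has a two-level index and NaN in the unfilled rows, while the output in the question has the original row labels and blanks. A sketch of one way to get from one to the other, wrapped in a hypothetical helper named split_to_rows (the name, the dropna call, and the fillna('') step are assumptions, not part of the answer):

```python
import numpy as np
import pandas as pd

def split_to_rows(df, delims):
    """Hypothetical helper: split each column in delims and align the pieces by position."""
    parts = {c: df[c].str.split(d, expand=True) for c, d in delims.items()}
    out = pd.concat(parts, axis=1).stack()
    # Drop positions that are empty in every column (padding from shorter rows).
    out = out.dropna(how='all')
    # Remove the position level and show missing pieces as empty strings.
    return out.droplevel(1).fillna('')

df = pd.DataFrame({'Col1': ['a,b,c,d', 'e,f,g,h', 'i,j,k,l,m'],
                   'Col2': ['aa~bb~cc~dd', np.nan, 'ii~jj~kk~ll~mm']})
result = split_to_rows(df, {'Col1': ',', 'Col2': '~'})
print(result)
```

With this shape there is no need to fill Col2 with delimiters first, since the NaN row is padded during the split.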