如何将一些行合并为一行 [英] How to combine some rows into a single row
本文介绍了如何将一些行合并为一行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
对不起,我应该删除旧问题,然后创建新问题。
我有一个包含两列的数据框。 df如下所示:
Sorry, I should delete the old question, and create the new one. I have a dataframe with two columns. The df looks as follows:
Word Tag
0 Asam O
1 instruksi O
2 - O
3 instruksi X
4 bahasa Y
5 Instruksi P
6 - O
7 instruksi O
8 sebuah Q
9 satuan K
10 - L
11 satuan O
12 meja W
13 Tiap Q
14 - O
15 tiap O
16 karakter P
17 - O
18 ke O
19 - O
20 karakter O
并且我想将包含破折号-
的某些行合并到一行。因此输出应为以下内容:
and I'd like to merge some rows which contain dash -
to one row. so the output should be the following:
Word Tag
0 Asam O
1 instruksi-instruksi O
2 bahasa Y
3 Instruksi-instruksi P
4 sebuah Q
5 satuan-satuan K
6 meja W
7 Tiap-tiap Q
8 karakter-ke-karakter P
有什么想法吗?提前致谢。我尝试了Jacob K的答案,它起作用了,然后在我的数据集中发现,中间有多个-
行。我已经输入了预期的输出,例如索引号8
Any ideas? Thanks in advance. I have tried the answer from Jacob K, it works, then I found in my dataset, there are more than one -
row in between. I have put the expected output, like index number 8
Jacob K的解决方案:
Solution from Jacob K:
# Import packages
import pandas as pd
import numpy as np
# Get 'Word' and 'Tag' columns as numpy arrays (for easy indexing)
words = df.Word.to_numpy()
tags = df.Tag.to_numpy()
# Create empty lists for new colums in output dataframe
newWords = []
newTags = []
# Use while (rather than for loop) since index i can change dynamically
i = 0 # To not cause any issues with i-1 index
while (i < words.shape[0] - 1):
if (words[i] == "-"):
# Concatenate the strings above and below the "-"
newWords.append(words[i-1] + "-" + words[i+1])
newTags.append(tags[i-1])
i += 2 # Don't repeat any concatenated values
else:
if (words[i+1] != "-"):
# If there is no "-" next, append the regular word and tag values
newWords.append(words[i])
newTags.append(tags[i])
i += 1 # Increment normally
# Create output dataframe output_df
d2 = {'Word': newWords, 'Tag': newTags}
output_df = pd.DataFrame(data=d2)
推荐答案
我对 GroupBy.agg
:
#df['Word'] = df['Word'].str.replace(' ', '') #if necessary
blocks = df['Word'].shift().ne('-').mul(df['Word'].ne('-')).cumsum()
new_df = df.groupby(blocks, as_index=False).agg({'Word' : ''.join, 'Tag' : 'first'})
print(new_df)
输出
Word Tag
0 Asam O
1 instruksi-instruksi O
2 bahasa Y
3 Instruksi-instruksi P
4 sebuah Q
5 satuan-satuan K
6 meja W
7 Tiap-tiap Q
8 karakter-ke-karakter P
块(详细信息)
print(blocks)
0 1
1 2
2 2
3 2
4 3
5 4
6 4
7 4
8 5
9 6
10 6
11 6
12 7
13 8
14 8
15 8
16 9
17 9
18 9
19 9
20 9
Name: Word, dtype: int64
这篇关于如何将一些行合并为一行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文