pandas row manipulation: if a startswith keyword is found, append the row to the end of the previous row

Problem description

I have a question about text file handling. My text file loads as a single column. The data is scattered across the rows and looks tidy and fairly uniform, but it is still just one column. Ultimately, I'd like to append each row where a keyword is found to the end of the previous row, until the data forms one long row. Then I'll use str.split() to cut it into columns as needed.

In Excel (code below), I took this same text file, removed the headers, aligned the text left, and searched for keywords. Excel has a handy property called Offset that lets you place or append a cell value almost anywhere relative to the active cell, using Offset(x, y).Value. Once done, I would delete the row. This allowed me to get the data into a tabular column format that I could work with.

What I need: the Python code below cycles through each row looking for the keyword 'Address:'. This part of the code works. Once it finds the keyword, the next statement should append that row to the end of the previous row. This is where my problem is. I cannot find a way to get the active row number into a variable, so that I can use it in place of [index] for the active row, or [index-1] for the previous row.

Excel code for a similar task

Do
    Set Rng = WorkRng.Find("Address", LookIn:=xlValues)
    If Not Rng Is Nothing Then
        Rng.Offset(-1, 2).Value = Rng.Value
        Rng.Value = ""
    End If
Loop While Not Rng Is Nothing

Python equivalent

import pandas as pd

# One-column sample that mimics the text file
file = {'Test': ['Last Name: Nobody', 'First Name: Tommy', 'Address: 1234 West Juniper St.',
                 'Fav Toy', 'Notes', 'Time Slot']}

df = pd.DataFrame(file)

                             Test
0               Last Name: Nobody
1               First Name: Tommy
2  Address: 1234 West Juniper St.
3                         Fav Toy
4                           Notes
5                       Time Slot

I tried the following:

for line in df.Test:
    if line.startswith('Address:'):
        df.loc[[index-1],:].values = df.loc[index-1].values + ' ' + df.loc[index].values
        # The line above does not work when I use index
    else:
        pass


# df.loc[[1],:] = df.loc[1].values + ' ' + df.loc[2].values  # copies row 2 at the end of row 1, 
                                                             # works with static row numbers only
# df.drop([2,0], inplace=True)  # Deletes row from df

Expected output:

                                               Test
0                                 Last Name: Nobody
1  First Name: Tommy Address: 1234 West Juniper St.
2                    Address: 1234 West Juniper St.
3                                           Fav Toy
4                                             Notes
5                                         Time Slot

I am trying to wrap my head around the whole Series vectorization approach, but I am still stuck trying loops, which I'm only semi-familiar with. If there is a way to achieve this, please point me in the right direction.

As always, I appreciate your time and your knowledge. Please let me know if you can help with this issue.

Thanks,

Recommended answer

Use Series.shift(-1) on the Test column so that each row is lined up with the row below it, then use Series.str.startswith on the shifted values to create a boolean mask. The mask is True for every row whose next row starts with the keyword. Finally, use boolean indexing with this mask to update the values in the Test column:

s = df['Test'].shift(-1)
m = s.str.startswith('Address', na=False)
df.loc[m, 'Test'] += (' ' + s[m])

Result:

                                               Test
0                                 Last Name: Nobody
1  First Name: Tommy Address: 1234 West Juniper St.
2                    Address: 1234 West Juniper St.
3                                           Fav Toy
4                                             Notes
5                                         Time Slot
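
The question also asked how to get the active row number inside a loop. One possible way, shown here only as a sketch alongside the vectorized answer, is to loop with enumerate() and write through df.iloc. The names pos, lines and col are illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({'Test': [
    'Last Name: Nobody',
    'First Name: Tommy',
    'Address: 1234 West Juniper St.',
    'Fav Toy',
    'Notes',
    'Time Slot',
]})

# Positional column number, needed because df.iloc is purely positional.
col = df.columns.get_loc('Test')

# Snapshot of the original values, so reads are unaffected by the writes below.
lines = df['Test'].tolist()

# enumerate() supplies the row position that the original loop was missing.
for pos, line in enumerate(lines):
    if pos > 0 and line.startswith('Address:'):
        # Append the keyword row to the end of the previous row.
        df.iloc[pos - 1, col] = lines[pos - 1] + ' ' + line

print(df)
```

This should give the same frame as the vectorized version for the sample data. On large frames the shift-and-mask approach is generally the faster choice, since it avoids a Python-level loop.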

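The expected output keeps the original 'Address:' row in place, but the question mentions deleting that row afterwards. A minimal sketch of that follow-up step, assuming the keyword never appears in the first row (there would be no previous row to receive it) and using the illustrative names is_kw, target and merged:

```python
import pandas as pd

df = pd.DataFrame({'Test': [
    'Last Name: Nobody',
    'First Name: Tommy',
    'Address: 1234 West Juniper St.',
    'Fav Toy',
    'Notes',
    'Time Slot',
]})

# True for the rows that start with the keyword.
is_kw = df['Test'].str.startswith('Address:', na=False)

# Shifting the mask up by one marks the rows directly above the keyword rows.
target = is_kw.shift(-1, fill_value=False)

# Append each keyword row to the row above it (same idea as the answer).
df.loc[target, 'Test'] += ' ' + df['Test'].shift(-1)[target]

# Drop the keyword rows that were just copied and renumber the index.
# Assumption: no keyword row sits at position 0, otherwise it would be lost here.
merged = df[~is_kw].reset_index(drop=True)

print(merged)
```

After this step each merged row holds both pieces of text, so str.split() can be applied to it as described in the question.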