如何将数据从合并的单元格拆分为Python数据框同一行中的其他单元格? [英] How to split data from a merged cell into other cells in its same row of a Python data frame?

查看:80
本文介绍了如何将数据从合并的单元格拆分为Python数据框同一行中的其他单元格?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据框示例,如下所示:

I have a sample of a data frame which looks like this:

+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
|   | Date                                                                                 | Professional  | Description                                |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 0 | 2019-12-19 00:00:00                                                                  | Katie Cool    | Travel to Space ...                        |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 1 | 2019-12-20 00:00:00                                                                  | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 2 | 2019-12-27 00:00:00                                                                  | Jenn Blossoms | Review lots of stuff/o...                  |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 3 | 2019-12-27 00:00:00                                                                  | Jenn Blossoms | Draft email to world leader...             |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 4 | 2019-12-30 00:00:00                                                                  | Jenn Blossoms | Review this thing.                         |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 5 | 12-30-2019 Jenn Blossoms Telephone   Call   to   A.   Bell   return   her   multiple | NaN           | NaN                                        |
|   | voicemails.                                                                          |               |                                            |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+

该行的很多数据都在日期单元格中。

Much of the row's data is in the date cell.

我希望样本看起来像这样:

I would like for the sample to look like this:

+---+---------------------+---------------+-------------------------------------------------------------+
|   | Date                | Professional  | Description                                                 |
+---+---------------------+---------------+-------------------------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool    | Travel to Space ...                                         |
+---+---------------------+---------------+-------------------------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ...                  |
+---+---------------------+---------------+-------------------------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o...                                   |
+---+---------------------+---------------+-------------------------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader...                              |
+---+---------------------+---------------+-------------------------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing.                                          |
+---+---------------------+---------------+-------------------------------------------------------------+
| 5 | 12-30-2019          | Jenn Blossoms | Telephone   Call   to   A.   Bell   return   her   multiple |
|   |                     |               | voicemails.                                                 |
+---+---------------------+---------------+-------------------------------------------------------------+

我尝试了以下代码:

date = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[0]
name = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[1]
description = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[2]

dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Professional'] = name
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Description'] = description
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Date'] = date

但是当我运行上面的代码时,数据框示例如下所示:

But when I run the above code, the data frame sample looks like this:

+---+------------+---------------+--------------------------------------------+
|   | Date       | Professional  | Description                                |
+---+------------+---------------+--------------------------------------------+
| 0 | 12/19/2019 | Katie Cool    | Travel to space ...                        |
+---+------------+---------------+--------------------------------------------+
| 1 | 12/20/2019 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+------------+---------------+--------------------------------------------+
| 2 | 12/27/2019 | Jenn Blossoms | Review lots of stuff/o…                    |
+---+------------+---------------+--------------------------------------------+
| 3 | 12/27/2019 | Jenn Blossoms | Draft email to world leader...             |
+---+------------+---------------+--------------------------------------------+
| 4 | 12/30/2019 | Jenn Blossoms | Review this thing.                         |
+---+------------+---------------+--------------------------------------------+
| 5 | NaN        | NaN           | NaN                                        |
+---+------------+---------------+--------------------------------------------+


推荐答案

您可以使用 str.split 方法将字符串拆分为单词。

You can use the str.split method to split the string into "words".

df['list_of_words'] = dftopdata['Date'].str.split()

如果有一种模式可以从中拆分 Professional Description 部分 list_of_words -您可以使用它。例如,如果 list_of_words 的前两个单词组成专业人员的名称,那么您可以-

If there is a pattern to split the Professional and Description parts from this list_of_words - you can use it. For instance, if the first 2 words of list_of_words make up the name of the professional then you can do -

df['Professional'] = df.apply(lambda x: ' '.join(x['list_of_words'][:2]), axis=1)
df['Description'] = df.apply(lambda x: ' '.join(x['list_of_words'][2:]), axis=1)

这篇关于如何将数据从合并的单元格拆分为Python数据框同一行中的其他单元格?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆