How to find repeated patients and add a new column


Question

I am dealing with a large medical dataset. I want to add a column that represents readmission: if a patient has had surgery within the previous 6 months, the "Readmission" column should hold the number of surgeries that patient has had in those 6 months. Otherwise, it should be 0. Here is part of the dataset:

Patient_ID Surgery_Date
1838       2017-01-05
1838       2018-04-26
87        2017-01-11
1838       2017-07-06
87        2017-03-17
1838       2018-08-02
87        2017-11-15
1838       2018-11-22
87        2017-02-01
87        2017-06-21
1838       2018-06-14

So, for this example, I want a new column like this:

Patient_ID Surgery_Date  Readmission
1838       2017-01-05    0
1838       2018-04-26    0
87         2017-01-11    0
1838       2017-07-06    0
87         2017-03-17    2
1838       2018-08-02    2
87         2017-11-15    1
1838       2018-11-22    2
87         2017-02-01    1
87         2017-06-21    3
1838       2018-06-14    1

Can anyone help me?

Recommended answer

Here is an edited answer to the question:

import pandas as pd
import datetime as dt
import numpy as np

# Your data plus a new patient that comes often
data = {'Patient_ID': [12, 1352, 55, 1352, 12, 6, 1352, 100, 100, 100, 100],
        'Surgery_Date': ['25/01/2009', '28/01/2009', '29/01/2009', '12/12/2008', '23/02/2008', '2/02/2009', '12/01/2009', '01/01/2009', '01/02/2009', '01/01/2010', '01/02/2010']}

df = pd.DataFrame(data, columns=['Patient_ID', 'Surgery_Date'])
# One readmission counter per row, initialised to 0
readmissions = pd.Series(np.zeros(len(df), dtype=int), index=df.index)

# Loop through all unique patient IDs
all_id = df['Patient_ID'].unique()
for pid in all_id:
    # All the surgeries of the patient with this ID, sorted by date
    patient = df.loc[df['Patient_ID'] == pid]
    admissions_sorted = pd.to_datetime(patient['Surgery_Date'], format='%d/%m/%Y').sort_values()

    # True where the previous surgery was less than 180 days earlier
    frequency = admissions_sorted.diff() < dt.timedelta(days=180)

    # Running count of consecutive readmissions; resets to 0 after a gap of 180 days or more
    n_admissions = [0]
    for v in frequency.values[1:]:
        n_admissions.append((n_admissions[-1] + 1) * int(v))

    # Write these values back at the patient's original row positions
    readmissions.loc[admissions_sorted.index] = n_admissions


df['Readmission'] = readmissions

This returns:

    Patient_ID Surgery_Date  Readmission
0           12   25/01/2009            0
1         1352   28/01/2009            2
2           55   29/01/2009            0
3         1352   12/12/2008            0
4           12   23/02/2008            0
5            6    2/02/2009            0
6         1352   12/01/2009            1
7          100   01/01/2009            0
8          100   01/02/2009            1
9          100   01/01/2010            0
10         100   01/02/2010            1
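
Note that the loop above counts a chain of consecutive surgeries that are each less than 180 days apart, whereas the expected column in the question appears to count every earlier surgery inside a 6-month window. The two differ on the question's data: for patient 87 on 2017-11-15 the chain count would be 4, while the question expects 1. A minimal sketch of the windowed count follows. It assumes the question's ISO-formatted dates and a calendar window of 6 months; the function name `window_readmissions` and its `months` parameter are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

def window_readmissions(df, months=6):
    """For each row, count the same patient's earlier surgeries in the last `months` months."""
    dates = pd.to_datetime(df['Surgery_Date'])
    out = pd.Series(0, index=df.index, dtype=int)
    for _, grp in dates.groupby(df['Patient_ID']):
        grp = grp.sort_values()
        # Start of the look-back window for each surgery
        starts = grp - pd.DateOffset(months=months)
        # Position of the first surgery on or after each window start
        first_in_window = np.searchsorted(grp.values, starts.values, side='left')
        # Earlier surgeries inside the window = own position minus that position
        out.loc[grp.index] = np.arange(len(grp)) - first_in_window
    return out

# The sample data from the question
question = pd.DataFrame({
    'Patient_ID': [1838, 1838, 87, 1838, 87, 1838, 87, 1838, 87, 87, 1838],
    'Surgery_Date': ['2017-01-05', '2018-04-26', '2017-01-11', '2017-07-06',
                     '2017-03-17', '2018-08-02', '2017-11-15', '2018-11-22',
                     '2017-02-01', '2017-06-21', '2018-06-14'],
})
question['Readmission'] = window_readmissions(question)
print(question)
```

On the question's sample this reproduces the Readmission column the asker listed. Whether "6 months" should mean calendar months or a fixed 180 days is a judgment call; swapping `pd.DateOffset(months=months)` for `pd.Timedelta(days=180)` gives the fixed-length variant.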

Hope this helps! This is probably not very Pythonic or pandas-like, but it should work as intended. I am convinced it could be made much more efficient and readable.
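
As one possible way to make the loop more compact, the same chain count can be sketched with `groupby`. This is only an illustration under the same assumptions as the answer (day/month/year date strings, a 180-day threshold, a unique row index); the function name `chain_readmissions` and its `days` parameter are hypothetical.

```python
import pandas as pd

def chain_readmissions(df, days=180):
    """Count consecutive surgeries per patient that are each less than `days` apart."""
    dates = pd.to_datetime(df['Surgery_Date'], format='%d/%m/%Y')
    tmp = pd.DataFrame({'pid': df['Patient_ID'], 'date': dates})
    tmp = tmp.sort_values(['pid', 'date'])
    # True where the patient's previous surgery was less than `days` earlier;
    # the first surgery of each patient has no predecessor and compares as False
    close = tmp.groupby('pid')['date'].diff() < pd.Timedelta(days=days)
    # Every False starts a new run, so the cumulative sum of ~close labels the runs
    run_id = (~close).cumsum()
    # Within each run, count the True values seen so far: 0, 1, 2, ...
    counts = close.astype(int).groupby(run_id).cumsum()
    # Return the counts in the original row order
    return counts.reindex(df.index)

data = {'Patient_ID': [12, 1352, 55, 1352, 12, 6, 1352, 100, 100, 100, 100],
        'Surgery_Date': ['25/01/2009', '28/01/2009', '29/01/2009', '12/12/2008',
                         '23/02/2008', '2/02/2009', '12/01/2009', '01/01/2009',
                         '01/02/2009', '01/01/2010', '01/02/2010']}
df = pd.DataFrame(data)
df['Readmission'] = chain_readmissions(df)
print(df)
```

Because each patient's first surgery always starts a new run, runs never span two patients, so a single cumulative sum over the sorted frame is enough.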
