Normalizing data by duplication

Question

Note: this question is indeed a duplicate of "Split pandas dataframe string entry to separate rows", but the answer provided here is more generic and informative, so, with all due respect, I chose not to delete the thread.

I have a dataset with the following format:

     id | value | ...
--------|-------|------
      a | 156   | ...
    b,c | 457   | ...
e,g,f,h | 346   | ...
    ... | ...   | ...

and I would like to normalize it by duplicating all the values for each id:

     id | value | ...
--------|-------|------
      a | 156   | ...
      b | 457   | ...
      c | 457   | ...
      e | 346   | ...
      g | 346   | ...
      f | 346   | ...
      h | 346   | ...
    ... | ...   | ...

What I'm doing is applying the split-apply-combine principle of pandas, using .groupby, which yields a tuple for each group: (groupby value, pd.DataFrame()).

I created a column to group by that simply counts the ids in each row:

df['count_ids'] = df['id'].str.split(',').apply(lambda x: len(x))

     id | value | count_ids
--------|-------|------
      a | 156   | 1
    b,c | 457   | 2
e,g,f,h | 346   | 4
    ... | ...   | ...

The way I duplicate the rows is as follows:

pd.DataFrame().append([group]*count_ids)

I'm making slow progress, but it is getting really complex, and I would appreciate any best practice or recommendation you can share for this type of problem.
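
For reference, the counting idea described above can be sketched without groupby at all. This is only an illustration under a few assumptions: the sample frame is rebuilt from the table in the question, `id_lists` and `repeated` are names made up for the example, and `Index.repeat` with `.loc` stands in for `pd.DataFrame().append`, which was removed in pandas 2.0.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({"id": ["a", "b,c", "e,g,f,h"], "value": [156, 457, 346]})

id_lists = df["id"].str.split(",")   # one list of ids per row
count_ids = id_lists.str.len()       # number of ids in each row

# Repeat every row count_ids times, then overwrite "id" with the
# flattened ids, which come out in the same row order.
repeated = df.loc[df.index.repeat(count_ids)].copy()
repeated["id"] = [single_id for ids in id_lists for single_id in ids]

print(repeated)
```

The repeated rows keep their original index labels, so each output row can still be traced back to the row it came from.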

Recommended answer

Try this:

In [44]: df
Out[44]:
        id  value
0        a    156
1      b,c    457
2  e,g,f,h    346

In [45]: (df['id'].str.split(',', expand=True)
   ....:          .stack()
   ....:          .reset_index(level=0)
   ....:          .set_index('level_0')
   ....:          .rename(columns={0:'id'})
   ....:          .join(df.drop(columns='id'), how='left')
   ....: )
Out[45]:
  id  value
0  a    156
1  b    457
1  c    457
2  e    346
2  g    346
2  f    346
2  h    346
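
To run the same idea outside an IPython session, the chain can be wrapped in a small function. The sketch below is a slightly condensed variant rather than the exact chain above: `split_ids` and its parameters are hypothetical names, the sample frame is rebuilt from `Out[44]`, and an explicit `.dropna()` is added on the assumption that newer pandas releases may keep the padding values that `stack()` used to drop.

```python
import pandas as pd

# Sample data rebuilt from Out[44].
df = pd.DataFrame({"id": ["a", "b,c", "e,g,f,h"], "value": [156, 457, 346]})


def split_ids(frame, column="id", sep=","):
    """Return one row per separated entry in `column`, repeating the other columns."""
    ids = (
        frame[column]
        .str.split(sep, expand=True)       # one column per position, padded with missing values
        .stack()                           # long format, indexed by (row label, position)
        .dropna()                          # remove padding if stack() kept it
        .reset_index(level=1, drop=True)   # keep only the original row label
        .rename(column)
    )
    # Join the remaining columns back on the original row labels.
    return ids.to_frame().join(frame.drop(columns=column), how="left")


result = split_ids(df)
print(result)
```

Because the join happens on the original row labels, any number of extra columns in `frame` is carried along without being listed explicitly.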

Explanation:

In [48]: df['id'].str.split(',', expand=True).stack()
Out[48]:
0  0    a
1  0    b
   1    c
2  0    e
   1    g
   2    f
   3    h
dtype: object

In [49]: df['id'].str.split(',', expand=True).stack().reset_index(level=0)
Out[49]:
   level_0  0
0        0  a
0        1  b
1        1  c
0        2  e
1        2  g
2        2  f
3        2  h

In [50]: df['id'].str.split(',', expand=True).stack().reset_index(level=0).set_index('level_0')
Out[50]:
         0
level_0
0        a
1        b
1        c
2        e
2        g
2        f
2        h

In [51]: df['id'].str.split(',', expand=True).stack().reset_index(level=0).set_index('level_0').rename(columns={0:'id'})
Out[51]:
        id
level_0
0        a
1        b
1        c
2        e
2        g
2        f
2        h

In [52]: df.drop(columns='id')
Out[52]:
   value
0    156
1    457
2    346
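
On pandas 0.25 or later there is also a shorter route that is not part of the original answer: split each cell into a list and let `DataFrame.explode` produce one row per element. A minimal sketch, assuming the same sample frame as above (`exploded` and `flat` are illustrative names):

```python
import pandas as pd

# Sample data rebuilt from Out[44].
df = pd.DataFrame({"id": ["a", "b,c", "e,g,f,h"], "value": [156, 457, 346]})

exploded = (
    df.assign(id=df["id"].str.split(","))  # each cell becomes a list of ids
      .explode("id")                        # one row per list element; index labels repeat
)
print(exploded)

# Optional: replace the repeated labels with a fresh 0..n-1 index.
flat = exploded.reset_index(drop=True)
```

Like the join-based version, `explode` keeps the original index labels, so `reset_index(drop=True)` is only needed when a unique index matters.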
