Combine similar rows to one row in python dataframe

Views: 1103  Published: 2017/3/26 4:46:49  python sql pandas dataframe pandasql

Problem description

I have a dataframe like the one below. I want to combine the rows that share the same "yyyymmdd" and "hr" into one row (several rows have the same "yyyymmdd" and "hr").

       yyyymmdd  hr ariel cat kiki mmax vicky gaolie shiu nick ck
10   2015-12-27   9     0   0    0    0     0      0    0   23  0
181  2015-12-27  10     0   0    0    0     0      0    0    2  0
65   2015-12-27  11     0   0    0    0     0      0    0   20  0
4    2015-12-27  12     0   0    0    0     0      0    0    4  0
0    2015-12-27  17     0   0    0    0     0      0    0    2  0
141  2015-12-27  19     1   0    0    0     0      0    0    0  0
160  2015-12-28   8     0   8    0    0     0      0    0    0  0
82   2015-12-28   9     0   0    0    0     0      0   19    0  0
113  2015-12-28   9    11   0    0    0     0      0    0    0  0
180  2015-12-28   9     0  11    0    0     0      0    0    0  0
9    2015-12-28  10     0  13    0    0     0      0    0    0  0
76   2015-12-28  10    85   0    0    0     0      0    0    0  0
107  2015-12-28  10     0   0    0    0     0      0   15    0  0
188  2015-12-28  10     0   0    0    0     2      0    0    0  0
34   2015-12-28  11     0   0    0    0     0      0   14    0  0
69   2015-12-28  11     0   0    0    0     2      0    0    0  0
134  2015-12-28  11     0  11    0    0     0      0    0    0  0
158  2015-12-28  11     2   0    0    0     0      0    0    0  0

For instance, part of the output I want should look like this:

  yyyymmdd  hr ariel cat kiki mmax vicky gaolie shiu nick ck
2015-12-28  10    85  13    0    0     2      0   15    0  0

Please share some ideas that I can use in Python pandas or SQL. Thanks!

=========================================================================

Now I have two more questions:

How can I "fill" the "hr" index of the dataframe? It should look something like this:

       yyyymmdd  hr ariel cat kiki mmax vicky gaolie shiu nick  ck
0    2015-12-27   8   NaN NaN  NaN  NaN   NaN    NaN  NaN  NaN NaN
1    2015-12-27   9     0   0    0    0     0      0    0   23   0
2    2015-12-27  10     0   0    0    0     0      0    0    2   0
3    2015-12-27  11     0   0    0    0     0      0    0   20   0
4    2015-12-27  12     0   0    0    0     0      0    0    4   0
5    2015-12-27  13   NaN NaN  NaN  NaN   NaN    NaN  NaN  NaN NaN
6    2015-12-27  14   NaN NaN  NaN  NaN   NaN    NaN  NaN  NaN NaN
7    2015-12-27  15   NaN NaN  NaN  NaN   NaN    NaN  NaN  NaN NaN
8    2015-12-27  16   NaN NaN  NaN  NaN   NaN    NaN  NaN  NaN NaN
9    2015-12-27  17     0   0    0    0     0      0    0    2   0
10   2015-12-27  18   NaN NaN  NaN  NaN   NaN    NaN  NaN  NaN NaN
11   2015-12-27  19     1   0    0    0     0      0    0    0   0
12   2015-12-27  20   NaN NaN  NaN  NaN   NaN    NaN  NaN  NaN NaN
13   2015-12-28   8     0   8    0    0     0      0    0    0   0
14   2015-12-28   9    11  11    0    0     0      0   19    0   0
15   2015-12-28  10    85  13    0    0     2      0   15    0   0
16   2015-12-28  11     2  11    0    0     2      0   14    0   0
17   2015-12-28  12     2  20    0    4     0      0   10    0   0
18   2015-12-28  13     8   9    0    9     3      0    9    0   0
19   2015-12-28  14     4  10    0    8     0      0   22    0   0
20   2015-12-28  15     3   3    0    2     0      0   16    0   0
21   2015-12-28  16    14   5    1    1     0      0   19    0   0
22   2015-12-28  17    15   1    2    0     0      0   19    0   0
23   2015-12-28  18     0   0    0    6     0      0    0    0   0
24   2015-12-28  19     0   0    0    5     0      0    0    0   0
25   2015-12-28  20     0   0    0    1     0      0    0    0   0

How can I plot line charts based on the columns and "hr"? (x-axis = columns, i.e. ariel, cat, kiki...) (y-axis = hr, i.e. 8, 9, 10...20) Every chart represents one date (i.e. 2015-12-27, 2015-12-28...).

Thanks!!

Solution

Put your data into a pandas DataFrame, then group by "yyyymmdd" and "hr" and take the max of each group. After copy-pasting your example into a CSV file, the code looks like this:

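import pandas as pd

# Read the pasted sample; the first CSV column holds the original row labels
df = pd.read_csv('df.csv', index_col=0)
# Collapse rows that share the same date and hour, keeping each column's maximum
df_combined = df.groupby(['yyyymmdd', 'hr']).max()
df_combined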

Output:

Use reset_index() in case you don't want the multi-index.
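
For readers who want to try this without creating df.csv, here is a minimal sketch that inlines a few rows from the question and applies the same groupby/max step, followed by reset_index(). The last part is only one possible way to approach the follow-up question about missing hours: it assumes every date should list hours 8 through 20 and reindexes against that full grid. The names csv_text, idx, df_flat, full_index and df_filled are made up for this illustration and do not come from the original answer.

```python
import io
import pandas as pd

# A few rows from the question, inlined so the sketch runs without df.csv.
# "idx" is a hypothetical header for the original row labels.
csv_text = """idx,yyyymmdd,hr,ariel,cat,kiki,mmax,vicky,gaolie,shiu,nick,ck
160,2015-12-28,8,0,8,0,0,0,0,0,0,0
9,2015-12-28,10,0,13,0,0,0,0,0,0,0
76,2015-12-28,10,85,0,0,0,0,0,0,0,0
107,2015-12-28,10,0,0,0,0,0,0,15,0,0
188,2015-12-28,10,0,0,0,0,2,0,0,0,0
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Same step as the answer: one row per (yyyymmdd, hr), column-wise maximum.
df_combined = df.groupby(['yyyymmdd', 'hr']).max()

# Turn the (yyyymmdd, hr) MultiIndex back into ordinary columns.
df_flat = df_combined.reset_index()

# Assumption: every date should list hours 8 through 20 inclusive.
full_index = pd.MultiIndex.from_product(
    [df_combined.index.get_level_values('yyyymmdd').unique(), range(8, 21)],
    names=['yyyymmdd', 'hr'],
)

# Hours that are absent from the data become rows of NaN.
df_filled = df_combined.reindex(full_index).reset_index()
```

On the sample in the question, each column has at most one nonzero value per (yyyymmdd, hr) group, so max() and sum() give the same result here. They would differ if several rows in a group carried nonzero values in the same column, so pick whichever matches what the counts mean.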
