Pandas - unstack column values into new columns

Published: 2017/3/26 3:31:29 | Tags: python, pandas, dataframe


Question

I have a large dataframe and I am storing a lot of redundant values that are making it hard to handle my data. I have a dataframe of the form:

import pandas as pd

df = pd.DataFrame([["a","g","n1","y1"], ["a","g","n2","y2"], ["b","h","n1","y3"], ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])

>>> df

  meta1 meta2 name data
    a     g   n1   y1
    a     g   n2   y2
    b     h   n1   y3
    b     h   n2   y4

where the name column holds the names of the new columns I want and the data column holds the corresponding values.

I would like to produce a dataframe of the form:

df = pd.DataFrame([["a","g","y1","y2"], ["b","h","y3","y4"]], columns=["meta1", "meta2", "n1", "n2"])

>>> df

meta1 meta2  n1  n2
  a     g  y1  y2
  b     h  y3  y4

The meta columns stand in for 15+ other columns that contain most of the data, and I don't think they are particularly well suited for indexing. The idea is that I currently store a lot of repeated/redundant data in the meta columns, and I would like to produce the more compact dataframe shown above.

I have found some similar questions but can't pinpoint which operation I need: pivot, reindex, stack, unstack, etc.
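Of the operations listed, pivot expresses this reshape most directly. A minimal sketch, assuming pandas 1.1 or later (earlier versions of DataFrame.pivot do not accept a list for index). Like unstack, pivot raises a ValueError if any (meta1, meta2, name) combination appears more than once:

```python
import pandas as pd

df = pd.DataFrame(
    [["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

# One row per (meta1, meta2) pair, one column per distinct value of 'name'
wide = df.pivot(index=["meta1", "meta2"], columns="name", values="data")

# Clear the leftover 'name' label on the column axis
wide.columns.name = None

# Turn meta1/meta2 back into ordinary columns with a fresh 0..n-1 index
wide = wide.reset_index()
print(wide)
```

Because values is a single column name, the result has a flat column header, so no extra step is needed to drop an outer "data" level.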

PS - the original index values are unimportant for my purposes.

Any help would be much appreciated.

A question I think is related:

The following question seems related to what I am trying to do, but I can't see how to apply it, since I don't want to produce more indexes.

Python Pandas- how to unstack a pivot table with two values with each value becoming a new column?

Solution

If you collect the names of your meta columns in a list, you can do this:

metas = ['meta1', 'meta2']

# Move 'name' and the meta columns into the index, then pivot 'name' out into columns
new_df = df.set_index(['name'] + metas).unstack('name')
print(new_df)

            data    
name          n1  n2
meta1 meta2         
a     g       y1  y2
b     h       y3  y4

That gets you most of the way there. What remains is to drop the outer data column level and the name axis label, and to move meta1 and meta2 back into regular columns:

# Select the 'data' sub-frame, clear the 'name' axis label, and restore meta1/meta2 as columns
print(new_df['data'].rename_axis([None], axis=1).reset_index())

  meta1 meta2  n1  n2
0     a     g  y1  y2
1     b     h  y3  y4
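With the 15+ meta columns described in the question, typing out metas by hand gets tedious. A minimal sketch that builds the list from every column other than name and data and wraps the same set_index/unstack steps. The function name widen and its parameters are hypothetical, and, like the answer above, it assumes each (meta..., name) combination occurs only once (unstack raises a ValueError on duplicates):

```python
import pandas as pd


def widen(df, name_col="name", value_col="data"):
    """Hypothetical helper: spread the values of name_col into columns holding value_col."""
    # Every column that is neither the new-column labels nor the values is a meta column
    metas = [c for c in df.columns if c not in (name_col, value_col)]
    wide = (
        df.set_index(metas + [name_col])[value_col]  # Series indexed by metas + name
        .unstack(name_col)                            # one column per distinct name
    )
    wide.columns.name = None   # drop the leftover 'name' axis label
    return wide.reset_index()  # meta columns become ordinary columns again


df = pd.DataFrame(
    [["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)
out = widen(df)
print(out)
```

Selecting the data column as a Series before unstacking avoids the two-level column header, which is why this version needs no equivalent of the new_df['data'] step.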
