pandas: convert some columns into rows
Question
My dataset has some information by location for n dates. The problem is that each date is actually a different column header. For example, the CSV looks like
location name Jan-2010 Feb-2010 March-2010
A "test" 12 20 30
B "foo" 18 20 25
What I want is for it to look like
location name Date Value
A "test" Jan-2010 12
A "test" Feb-2010 20
A "test" March-2010 30
B "foo" Jan-2010 18
B "foo" Feb-2010 20
B "foo" March-2010 25
The problem is that I don't know how many date columns there are (though I know they will always come after name).
Recommended answer
UPDATE
From v0.20, melt is available as a DataFrame method, so you can now use
df.melt(id_vars=["location", "name"],
        var_name="Date",
        value_name="Value")
location name Date Value
0 A "test" Jan-2010 12
1 B "foo" Jan-2010 18
2 A "test" Feb-2010 20
3 B "foo" Feb-2010 20
4 A "test" March-2010 30
5 B "foo" March-2010 25
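As a minimal, self-contained sketch of the call above: the DataFrame below is built by hand to mirror the question's sample CSV (the variable names `df` and `long_df` are illustrative, not from the original). Because every column not listed in `id_vars` is treated as a value column, the number of date columns does not need to be known in advance.

```python
import pandas as pd

# Sample frame mirroring the question's CSV; only "location" and "name"
# are assumed to be fixed identifier columns.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

# Columns not named in id_vars (here, all the date columns, however many
# there are) are unpivoted into "Date"/"Value" pairs.
long_df = df.melt(id_vars=["location", "name"],
                  var_name="Date",
                  value_name="Value")
print(long_df)
```

Adding or removing date columns in `df` changes only the number of output rows; the call itself stays the same.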
OLD(ER) VERSIONS: < 0.20
You can use pd.melt to get most of the way there, and then sort:
>>> df
location name Jan-2010 Feb-2010 March-2010
0 A test 12 20 30
1 B foo 18 20 25
>>> df2 = pd.melt(df, id_vars=["location", "name"],
...               var_name="Date", value_name="Value")
>>> df2
location name Date Value
0 A test Jan-2010 12
1 B foo Jan-2010 18
2 A test Feb-2010 20
3 B foo Feb-2010 20
4 A test March-2010 30
5 B foo March-2010 25
>>> df2 = df2.sort(["location", "name"])
>>> df2
location name Date Value
0 A test Jan-2010 12
2 A test Feb-2010 20
4 A test March-2010 30
1 B foo Jan-2010 18
3 B foo Feb-2010 20
5 B foo March-2010 25
(Might want to throw in a .reset_index(drop=True), just to keep the output clean.)
Note: pd.DataFrame.sort has been deprecated in favour of pd.DataFrame.sort_values, so on recent versions use df2.sort_values(["location", "name"]) instead.
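For reference, the melt-then-sort steps can be sketched with the current API as follows. This is an illustrative example, not the answer's original code: the sample data is rebuilt by hand, and `tidy` is a hypothetical variable name.

```python
import pandas as pd

# Hand-built sample data matching the question's layout.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

# Unpivot the date columns, group rows by location/name with sort_values
# (the replacement for the deprecated DataFrame.sort), then renumber the
# index so it runs 0..n-1 again.
tidy = (
    pd.melt(df, id_vars=["location", "name"],
            var_name="Date", value_name="Value")
    .sort_values(["location", "name"])
    .reset_index(drop=True)
)
print(tidy)
```

Note that the Date values are plain strings here, so sorting on that column would be alphabetical rather than chronological; parsing them as dates first would be a separate step.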