Computing spell lengths of data based on equality in pandas
Question
I would like to compute spell lengths based on the equality of adjacent columns in a pandas dataframe. What is the best way to do this?
An example:
import pandas as pd
d1 = pd.DataFrame([['4', '4', '4', '5'], ['23', '23', '24', '24'], ['112', '112', '112', '112']],
                  index=['c1', 'c2', 'c3'], columns=[1962, 1963, 1964, 1965])
This produces a dataframe that looks like:
    1962  1963  1964  1965
c1     4     4     4     5
c2    23    23    24    24
c3   112   112   112   112
I would like to return a dataframe that documents the spells that occur on each row. In this case, c1 has 2 spells: the first runs from 1962 to 1964, and the second starts and finishes in 1965.
I would also like a dataframe that describes the spell lengths. For example, c1 has one spell lasting 3 years and a second spell lasting 1 year.
This re-coding is useful in survival analysis.
Recommended answer
The following works for your dataset. I had to ask a question of my own in order to reduce my original answer to list comprehensions and itertools:
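In [153]:
def num_spells(x):
    # Distinct values of the row, in order of first appearance
    t = list(x.unique())
    # Number each cell by its value's position in t; returning a Series
    # lets apply() expand the result back into the year columns
    return pd.Series([t.index(el) + 1 for el in x], index=x.index)

d1.apply(num_spells, axis=1)
Out[153]:
    1962  1963  1964  1965
c1     1     1     1     2
c2     1     1     2     2
c3     1     1     1     1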
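Note that `num_spells` numbers cells by distinct value, so a value that reappears later in a row is put back into its earlier spell. If a reappearing value should instead start a new spell (an assumption; the question does not say either way), a minimal sketch is to compare each cell with its left neighbour and take a running count of the changes. The helper name `run_spell_numbers` and the extra row `c4` are hypothetical and used only for illustration.

```python
import pandas as pd

d1 = pd.DataFrame(
    [['4', '4', '4', '5'], ['23', '23', '24', '24'], ['112', '112', '112', '112']],
    index=['c1', 'c2', 'c3'], columns=[1962, 1963, 1964, 1965])


def run_spell_numbers(df):
    # Hypothetical helper: a cell starts a new spell when it differs from
    # the cell to its left. The first column is compared against a missing
    # value, so it always counts as the start of spell 1.
    starts = df.ne(df.shift(axis=1))
    # Running count of spell starts along each row gives the spell number.
    return starts.astype(int).cumsum(axis=1)


spell_no = run_spell_numbers(d1)

# Illustrative row (not from the question) in which '4' reappears after '5'.
recurring = pd.DataFrame([['4', '5', '4', '4']], index=['c4'], columns=d1.columns)
recurring_no = run_spell_numbers(recurring)
```

On `d1` this gives the same numbering as `num_spells`. On the illustrative row `c4` it yields 1, 2, 3, 3, whereas numbering by distinct value would yield 1, 2, 1, 1.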
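In [144]:
from itertools import chain, repeat

def spell_len(x):
    # Occurrence count of each value (value_counts sorts by count, largest first)
    t = list(x.value_counts())
    # Repeat each count i exactly i times; returning a Series lets apply()
    # expand the result back into the year columns
    return pd.Series(list(chain.from_iterable(repeat(i, i) for i in t)), index=x.index)

d1.apply(spell_len, axis=1)
Out[144]:
    1962  1963  1964  1965
c1     3     3     3     1
c2     2     2     2     2
c3     4     4     4     4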
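Because `value_counts` orders its result by count rather than by position, `spell_len` matches the example data but can misplace lengths when a shorter spell comes before a longer one, or when a value reappears. A hedged alternative, staying with itertools, is to measure consecutive runs with `itertools.groupby`. This is a sketch that assumes a spell means an unbroken run of equal values; `run_spell_lengths` is a hypothetical name, not part of the original answer.

```python
from itertools import groupby

import pandas as pd

d1 = pd.DataFrame(
    [['4', '4', '4', '5'], ['23', '23', '24', '24'], ['112', '112', '112', '112']],
    index=['c1', 'c2', 'c3'], columns=[1962, 1963, 1964, 1965])


def run_spell_lengths(row):
    # Hypothetical helper: groupby() yields one group per unbroken run of
    # equal adjacent values, in left-to-right order.
    lengths = []
    for _, run in groupby(row):
        n = len(list(run))
        # Every cell in the run is labelled with the run's length.
        lengths.extend([n] * n)
    # Return a Series so apply() keeps the original year columns.
    return pd.Series(lengths, index=row.index)


run_lengths = d1.apply(run_spell_lengths, axis=1)

# Illustrative row (not from the question): a short spell before a longer one.
short_first = pd.Series(['4', '5', '5', '5'], index=d1.columns)
short_first_lengths = run_spell_lengths(short_first)
```

On `d1` the result matches the output above. For the illustrative row the run-based lengths are 1, 3, 3, 3, while the `value_counts` version would return 3, 3, 3, 1.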