Adding calculated column(s) to a dataframe in pandas


Problem description

I have an OHLC price data set that I have parsed from CSV into a Pandas dataframe and resampled to 15-minute bars:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 500047 entries, 1998-05-04 04:45:00 to 2012-08-07 00:15:00
Freq: 15T
Data columns:
Close    363152  non-null values
High     363152  non-null values
Low      363152  non-null values
Open     363152  non-null values
dtypes: float64(4)

I would like to add various calculated columns, starting with simple ones such as the period Range (H-L), and then booleans that indicate the occurrence of price patterns that I will define, e.g. a hammer candle pattern, for which a sample definition follows:

def closed_in_top_half_of_range(h,l,c):
    return c > l + (h-l)/2

def lower_wick(o,l,c):
    return min(o,c)-l

def real_body(o,c):
    return abs(c-o)

def lower_wick_at_least_twice_real_body(o,l,c):
    return lower_wick(o,l,c) >= 2 * real_body(o,c)

def is_hammer(row):
    return lower_wick_at_least_twice_real_body(row["Open"],row["Low"],row["Close"]) \
    and closed_in_top_half_of_range(row["High"],row["Low"],row["Close"])

Basic problem: how do I map the function to the column, specifically where I would like to reference more than one other column or the whole row or whatever?

This post deals with adding two calculated columns off of a single source column, which is close, but not quite it.

And slightly more advanced: for price patterns that are determined with reference to more than a single bar (T), how can I reference different rows (e.g. T-1, T-2 etc.) from within the function definition?

Recommended answer

The exact code will vary for each of the columns you want to compute, but it's likely you'll want to use the map and apply functions. In some cases you can just compute using the existing columns directly: the columns are Pandas Series objects, which behave much like NumPy arrays, so the usual mathematical and comparison operators work element-wise.

>>> d
    A   B  C
0  11  13  5
1   6   7  4
2   8   3  6
3   4   8  7
4   0   1  7
>>> (d.A + d.B) / d.C
0    4.800000
1    3.250000
2    1.833333
3    1.714286
4    0.142857
>>> d.A > d.C
0     True
1     True
2     True
3    False
4    False
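
Applied to the OHLC frame from the question, the same element-wise arithmetic can add the calculated columns directly. The following is a minimal sketch, assuming a small frame `df` that uses the question's column names; the prices and the new column names `Range` and `ClosedInTopHalf` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample bars; the values are invented for illustration.
df = pd.DataFrame({
    "Open":  [10.0, 10.8, 11.0],
    "High":  [11.0, 11.0, 12.0],
    "Low":   [9.0, 9.0, 10.5],
    "Close": [10.5, 10.9, 10.6],
})

# Period range (H - L), computed element-wise over whole columns.
df["Range"] = df["High"] - df["Low"]

# Boolean column: did the bar close in the top half of its range?
df["ClosedInTopHalf"] = df["Close"] > df["Low"] + df["Range"] / 2
```

Assigning to `df["new_name"]` adds the column in place, so later calculations can build on earlier ones, as `ClosedInTopHalf` does with `Range` here.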

If you need to use operations like max and min within a row, you can use apply with axis=1 to apply any function you like to each row. Here's an example that computes min(A, B)-C, which seems to be like your "lower wick":

>>> d.apply(lambda row: min([row['A'], row['B']])-row['C'], axis=1)
0    6
1    2
2   -3
3   -3
4   -7
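
For the hammer pattern from the question, both styles might look roughly like the sketch below. It assumes the same made-up sample bars as before, condenses the question's helper functions into a single `is_hammer`, and uses illustrative column names (`Hammer`, `HammerVectorized`). The second variant swaps Python's `min` and `and` for `np.minimum` and `&` so that it can operate on whole columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample bars; the values are invented for illustration.
df = pd.DataFrame({
    "Open":  [10.0, 10.8, 11.0],
    "High":  [11.0, 11.0, 12.0],
    "Low":   [9.0, 9.0, 10.5],
    "Close": [10.5, 10.9, 10.6],
})

def is_hammer(row):
    # Row-wise version, condensed from the question's definitions.
    lower_wick = min(row["Open"], row["Close"]) - row["Low"]
    real_body = abs(row["Close"] - row["Open"])
    top_half = row["Close"] > row["Low"] + (row["High"] - row["Low"]) / 2
    return lower_wick >= 2 * real_body and top_half

# Option 1: call the function once per row.
df["Hammer"] = df.apply(is_hammer, axis=1)

# Option 2: the same logic on whole columns. np.minimum is the
# element-wise minimum, and & replaces `and` for boolean Series.
lower_wick = np.minimum(df["Open"], df["Close"]) - df["Low"]
real_body = (df["Close"] - df["Open"]).abs()
top_half = df["Close"] > df["Low"] + (df["High"] - df["Low"]) / 2
df["HammerVectorized"] = (lower_wick >= 2 * real_body) & top_half
```

On a frame with hundreds of thousands of rows, the column-wise variant is typically much faster than calling a Python function per row, so it may be worth preferring where the logic allows it.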

Hopefully that gives you some idea of how to proceed.

To compare rows against neighboring rows, the simplest approach is to slice the columns you want to compare, leaving off the beginning/end, and then compare the resulting slices by position. Pandas matches Series on their index labels rather than by position, and in current versions comparing two differently-labeled Series slices directly raises an error, so take the underlying NumPy arrays first. For instance, this will tell you for which rows the element in column A is less than the next row's element in column C:

d['A'].to_numpy()[:-1] < d['C'].to_numpy()[1:]

and this does it the other way, telling you which rows have A less than the preceding row's C:

d['A'].to_numpy()[1:] < d['C'].to_numpy()[:-1]

In the first comparison, [:-1] slices off the last element of column A and [1:] slices off the first element of column C, so when you line these two up and compare them, you're comparing each element in A with the C from the following row. The result is a plain boolean array that is one element shorter than the original columns.
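
An alternative to slicing, not part of the original answer, is `Series.shift`, which moves a column up or down by a number of rows while keeping the original index. The result has the same length as the frame and can be stored as a new column. A minimal sketch using the toy frame `d` from above; the new column names are made up for illustration:

```python
import pandas as pd

# Same toy frame as in the answer above.
d = pd.DataFrame({
    "A": [11, 6, 8, 4, 0],
    "B": [13, 7, 3, 8, 1],
    "C": [5, 4, 6, 7, 7],
})

# Is A less than the NEXT row's C? shift(-1) moves C up by one row.
d["A_lt_next_C"] = d["A"] < d["C"].shift(-1)

# Is A less than the PREVIOUS row's C? shift(1) moves C down by one row.
d["A_lt_prev_C"] = d["A"] < d["C"].shift(1)
```

The row that has no neighbor gets NaN from `shift`, and any comparison with NaN is False, so the last row of `A_lt_next_C` and the first row of `A_lt_prev_C` are False. For patterns spanning several bars, `shift(1)`, `shift(2)` and so on give access to T-1, T-2, etc.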
