Cythonising Pandas: ctypes for content, index and columns

Views: 54. Published: 2021/4/28 18:34:11. Tags: pandas, cython

Problem description

I am very new to Cython, yet I am already seeing extraordinary speedups just by copying my .py to a .pyx (adding cimport cython, numpy etc.) and importing it into ipython3 with pyximport. Many tutorials start this way, with the next step being to add cdef declarations for every data type, which I can do for the iterators in my for loops. But unlike most Pandas Cython tutorials or examples, I am not applying functions as such; I am mostly manipulating data using slices, sums, division and so on.

So the question is: can I make my code run faster by declaring that my DataFrame contains only floats (double), with column labels that are int and row labels that are int?

How do I define the type of a nested list, i.e. [[int, int], [int]]?

Here is an example that generates the AIC score for a partitioning of a DataFrame; sorry it is so verbose:

    cimport cython
    import numpy as np
    cimport numpy as np
    import pandas as pd

    offcat = [
        "breakingPeace", 
        "damage", 
        "deception", 
        "kill", 
        "miscellaneous", 
        "royalOffences", 
        "sexual", 
        "theft", 
        "violentTheft"
        ]

    def partitionAIC(EmpFrame, part, OffenceEstimateFrame, ReturnDeathEstimate=False):
        """EmpFrame is a DataFrame of ints, part is a nested list of ints,
        OffenceEstimateFrame is a DataFrame of floats.
        partOf/block is a list of ints; ll and AIC are Series/DataFrames of floats."""
        ##Cython cdefs
        cdef int DFlen
        cdef int puns
        cdef int DeathPun    
        cdef int k
        cdef int pId
        cdef int punish

        DFlen = EmpFrame.shape[1]
        puns = 2
        DeathPun = 0
        PartitionModel = pd.DataFrame(index = EmpFrame.index, columns = EmpFrame.columns)

        for partOf in part:
            Grouping = [puns*x + y for x in partOf for y in list(range(0,puns))]
            PartGroupSum = EmpFrame.iloc[:,Grouping].sum(axis=1)

            for punish in range(0,puns):
                PunishGroup = [x*puns+punish for x in partOf]
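                # 1.0/puns keeps the smoothing term a float; with puns typed as a C int,
                # 1/puns can be truncated to 0 by integer division
                punishPunishment = ((EmpFrame.iloc[:,PunishGroup].sum(axis = 1) + 1.0/puns).div(PartGroupSum+1)).values[np.newaxis].T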
                PartitionModel.iloc[:,PunishGroup] = punishPunishment
        PartitionModel = PartitionModel*OffenceEstimateFrame

        if ReturnDeathEstimate:
            DeathProbFrame = pd.DataFrame([[part]], index=EmpFrame.index, columns=['Partition'])
            for pId,block in enumerate(part):
                DeathProbFrame[pId] = PartitionModel.iloc[:,block[::puns]].sum(axis=1)
            DeathProbFrame = DeathProbFrame.apply(lambda row: sorted( [ [format("%6.5f"%row[idx])]+[offcat[X] for X in  x ] 
                for idx,x in enumerate(row['Partition'])],
                key=lambda x: x[0], reverse=True),axis=1)
        # convert_objects() was removed from pandas; cast the object-dtype frame to float instead
        ll = (EmpFrame*np.log(PartitionModel.astype(float))).sum(axis=1)
        k = (len(part))*(puns-1)
        AIC = 2*k-2*ll

        if ReturnDeathEstimate:
            return AIC, DeathProbFrame
        else:
            return AIC
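
To make the calculation easier to follow, here is a minimal pure-Python sketch of what the function above computes for a single row. It is not the Cython version and uses no Pandas. The name partition_aic_row and the sample numbers are made up for illustration, and it assumes the columns are laid out as offence * puns + punishment, as the Grouping comprehension above implies.

```python
import math

def partition_aic_row(counts, part, offence_estimate, puns=2):
    """AIC for one row of counts under a given partition of offence categories.

    counts and offence_estimate are flat lists with one entry per
    (offence, punishment) pair, stored at index offence * puns + punishment.
    part is a nested list of offence indices, e.g. [[0, 1], [2]].
    """
    model = [0.0] * len(counts)
    for block in part:
        # Every column that belongs to an offence in this block.
        grouping = [puns * x + y for x in block for y in range(puns)]
        group_sum = sum(counts[c] for c in grouping)
        for punish in range(puns):
            punish_group = [x * puns + punish for x in block]
            # Smoothed share of this punishment within the block.
            share = (sum(counts[c] for c in punish_group) + 1.0 / puns) / (group_sum + 1)
            for c in punish_group:
                model[c] = share * offence_estimate[c]
    # Log-likelihood of the observed counts under the model.
    ll = sum(n * math.log(p) for n, p in zip(counts, model))
    k = len(part) * (puns - 1)  # number of free parameters
    return 2 * k - 2 * ll

counts = [3, 1, 2, 2, 0, 4]       # made-up counts: 3 offences x 2 punishments
estimate = [1.0] * len(counts)    # made-up offence estimates
aic = partition_aic_row(counts, [[0, 1], [2]], estimate)
print(aic)
```

Everything here is a plain int, float or list, which is roughly the shape the data would need to take before C-level type declarations could apply to it.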
