Keep pandas structure with numpy/scikit functions
Question
I'm using the excellent read_csv() function from pandas, which gives:
In [31]: data = pandas.read_csv("lala.csv", delimiter=",")
In [32]: data
Out[32]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12083 entries, 0 to 12082
Columns: 569 entries, REGIONC to SCALEKER
dtypes: float64(51), int64(518)
But when I apply a function from scikit-learn, I lose the information about the columns:
from sklearn import preprocessing
preprocessing.scale(data)
This returns a plain NumPy array.
Is there a way to apply scikit or numpy functions to DataFrames without losing this information?
Recommended answer
A (slightly naive) way would be to store the structure of your data frame, i.e. its columns and index, separately, and then create a new data frame from your preprocessed results like so:
# assumes numpy was imported earlier: import numpy as np
In [15]: data = np.zeros((2,2))
In [16]: data
Out[16]:
array([[ 0.,  0.],
       [ 0.,  0.]])
In [17]: from pandas import DataFrame
In [21]: df = DataFrame(data, index = ['first', 'second'], columns=['c1','c2'])
In [22]: df
Out[22]:
        c1  c2
first    0   0
second   0   0
In [26]: i = df.index
In [27]: c = df.columns
# generate new data as a numpy array
In [29]: df = DataFrame(np.random.rand(2,2), index=i, columns=c)
In [30]: df
Out[30]:
              c1        c2
first   0.821354  0.936703
second  0.138376  0.482180
As you can see in Out[22], we start off with a data frame, and then in In[29] we place some new data inside the frame, leaving the rows and columns unchanged. I am assuming your preprocessing will not shuffle the rows/columns of the data.
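The same idea can be wrapped in a small reusable function. This is a minimal sketch, not part of the original answer: `apply_keep_structure` and `standardize` are hypothetical names, and `standardize` is a plain NumPy stand-in for `preprocessing.scale` with its default settings (zero mean, unit variance per column), used so the example runs without scikit-learn. It assumes a pandas version that has `DataFrame.to_numpy()` (0.24 or later; older versions can use `.values`). The shape check catches functions that add or drop columns, but, as noted above, it cannot detect rows or columns being reordered.

```python
import numpy as np
import pandas as pd


def standardize(values):
    """Hypothetical stand-in for sklearn.preprocessing.scale (defaults):
    centre each column to zero mean and scale it to unit variance.
    Like the scikit-learn function, it returns a plain NumPy array."""
    values = np.asarray(values, dtype=float)
    std = values.std(axis=0)   # population std (ddof=0)
    std[std == 0] = 1.0        # leave constant columns unscaled
    return (values - values.mean(axis=0)) / std


def apply_keep_structure(func, df):
    """Hypothetical helper: call an array -> array function on the
    DataFrame's values and re-attach the original index and columns.

    Assumes func keeps the shape and does not reorder rows or columns;
    the shape check below cannot detect reordering."""
    result = np.asarray(func(df.to_numpy()))
    if result.shape != df.shape:
        raise ValueError(
            f"func changed the shape from {df.shape} to {result.shape}; "
            "the original labels no longer apply"
        )
    return pd.DataFrame(result, index=df.index, columns=df.columns)


# Small mixed float/int frame, similar in spirit to the read_csv() result
df = pd.DataFrame(
    {"c1": [1.0, 2.0, 3.0], "c2": [10, 20, 30]},
    index=["first", "second", "third"],
)
scaled = apply_keep_structure(standardize, df)
print(scaled)  # same index and column labels, standardized values
```

With scikit-learn installed, the question's call would presumably become `apply_keep_structure(preprocessing.scale, data)`, since `preprocessing.scale` accepts an array and returns one of the same shape.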