What is the equivalent to a Matlab cell array?

Question

I am new to Python and trying to create something equivalent to Matlab's "cell array". Let's say I have 100 customers indexed 'C001', 'C002', etc., and I have different data for each customer:

  • Size of premises in square meters [real number]
  • Categorical data showing whether they are 'commercial', 'residential' or 'other'
  • Hourly time series of their electricity consumption in 2014, i.e. a datetime-indexed array of 8760 real values

What is the best way to build such a dataset in Python 2.7 that combines single values, categorical data and time-indexed arrays? I am trying to use pandas for this but have had no success so far.

Thank you very much in advance

Solution

The equivalent of a MATLAB cell array is a numpy object array (an array with dtype=object). However, these are rarely used because they are rarely what you want in practice. In most cases where someone would use a cell array in MATLAB, a list or nested list would suffice:

>>> obj1, obj2, obj3, obj4 = 1.5, 'text', [1, 2], None  # placeholder objects of mixed types
>>> a = [obj1, obj2, obj3, obj4]
>>> b = [[obj1, obj2], [obj3, obj4]]
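For completeness, here is a minimal sketch of the numpy object array mentioned above, which is the closest literal counterpart to a cell array. The variable name `cell` and the stored values are made up for illustration only.

```python
import numpy as np

# A 2x2 object array: each element can hold any Python object,
# much like the cells of a MATLAB cell array.
cell = np.empty((2, 2), dtype=object)

cell[0, 0] = 123.4              # a single real number
cell[0, 1] = 'residential'      # a string
cell[1, 0] = np.arange(5)       # a whole numeric array stored in one cell
cell[1, 1] = [1, 'two', 3.0]    # a mixed-type list stored in one cell

# Elements come back as the original Python objects.
print(type(cell[1, 0]), cell[1, 0].sum())
```

Because every element is a generic Python object, vectorized numeric operations no longer apply to the array as a whole, which is one reason plain lists are usually the simpler choice.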

However, that is not what you want to do in your case. Your question is a classic example of the XY problem. You are asking how to implement a particular solution to your problem, rather than asking how to solve the problem itself. Python can do a lot of things MATLAB can't, so trying to make Python behave like MATLAB will often result in sub-optimal solutions.

In this case, what you want is a pandas DataFrame. It is nothing at all like a MATLAB cell array, but it fits your data set much better. You can use a MultiIndex on the columns to store each customer's parameters, and the columns themselves to store the time series data. This allows you to index by name, size, category, date, etc. For example, you can calculate the mean energy usage for each category of property in the third quarter, restricted to properties over 500 square meters, in just one line of code.

Here is an example of how you could structure such data:

>>> import numpy as np
>>> import pandas as pd
>>>
>>> # Per-customer metadata: one entry per customer
>>> names = ['C001', 'C002', 'C003', 'C004']
>>> sizes = np.random.random(4) * 1000
>>> category = ['Commercial', 'Residential', 'Residential', 'Other']
>>> # Time series: one row per timestamp, one column per customer
>>> ts = np.random.random([100, 4])
>>> timestamps = pd.date_range('1/1/2011', periods=100, freq='W')
>>>
>>> # Each column is labelled by a (name, size, category) tuple
>>> cols = pd.MultiIndex.from_arrays([names, sizes, category])
>>>
>>> df = pd.DataFrame(ts, index=timestamps, columns=cols)
>>> df.columns.names = ['Name', 'Size', 'Category']
>>> df.index.name = 'Time'
>>>
>>> print(df)  # values are random, so yours will differ
Name             C001        C002        C003       C004
Size       36.719201   732.278278  795.755755 551.383120
Category   Commercial Residential Residential      Other
Time                                                    
2011-01-02   0.108720    0.018492    0.057233   0.694548
2011-01-09   0.959845    0.968857    0.422210   0.975767
2011-01-16   0.709676    0.119963    0.004481   0.830328
2011-01-23   0.084271    0.535408    0.209943   0.668001
2011-01-30   0.626125    0.052301    0.212636   0.995429
2011-02-06   0.376399    0.199327    0.482884   0.632472
2011-02-13   0.302807    0.353679    0.599427   0.993996
2011-02-20   0.185445    0.005769    0.755981   0.923540
2011-02-27   0.109611    0.994292    0.873782   0.542741
2011-03-06   0.561404    0.778414    0.595238   0.082001
2011-03-13   0.056986    0.869344    0.459753   0.450071
2011-03-20   0.261320    0.675317    0.603043   0.371950
2011-03-27   0.890803    0.061619    0.831677   0.801890
2011-04-03   0.498199    0.846559    0.370336   0.225477
2011-04-10   0.248914    0.693038    0.145255   0.233058
2011-04-17   0.621441    0.683213    0.048944   0.650139
2011-04-24   0.459869    0.055751    0.912097   0.457605
2011-05-01   0.814447    0.780415    0.184241   0.429139
2011-05-08   0.586905    0.209121    0.428080   0.246584
2011-05-15   0.754021    0.909181    0.846984   0.948835
2011-05-22   0.513610    0.203925    0.338072   0.596325
2011-05-29   0.497080    0.557908    0.916812   0.680242
2011-06-05   0.646791    0.641024    0.399427   0.308346
2011-06-12   0.573922    0.539285    0.098703   0.461480
2011-06-19   0.062978    0.939339    0.713087   0.380326
2011-06-26   0.422484    0.109185    0.459734   0.800468
2011-07-03   0.962368    0.632361    0.388565   0.503425
2011-07-10   0.802551    0.261161    0.590494   0.526307
2011-07-17   0.261447    0.686405    0.636970   0.622476
2011-07-24   0.634331    0.630028    0.069925   0.504036
...               ...         ...         ...        ...
2012-05-06   0.185331    0.375717    0.658463   0.697377
2012-05-13   0.273510    0.665318    0.756944   0.083542
2012-05-20   0.895984    0.850881    0.680869   0.987420
2012-05-27   0.450593    0.262195    0.458893   0.199141
2012-06-03   0.696102    0.332312    0.419764   0.338074
2012-06-10   0.113108    0.167605    0.812625   0.329429
2012-06-17   0.527418    0.087454    0.868973   0.744649
2012-06-24   0.977674    0.831538    0.410719   0.598423
2012-07-01   0.577802    0.141307    0.310356   0.276271
2012-07-08   0.772117    0.288240    0.820701   0.548857
2012-07-15   0.699628    0.467952    0.429433   0.304482
2012-07-22   0.782641    0.337854    0.561191   0.572241
2012-07-29   0.010225    0.962770    0.793041   0.166877
2012-08-05   0.895516    0.628526    0.782264   0.908301
2012-08-12   0.787210    0.698185    0.255306   0.741693
2012-08-19   0.042833    0.556469    0.165885   0.408108
2012-08-26   0.942076    0.377714    0.927170   0.119004
2012-09-02   0.567978    0.007891    0.777752   0.869950
2012-09-09   0.120134    0.417996    0.328654   0.484447
2012-09-16   0.833769    0.946456    0.594471   0.569707
2012-09-23   0.515544    0.090017    0.344200   0.498175
2012-09-30   0.419152    0.315412    0.683195   0.498630
2012-10-07   0.879582    0.958591    0.531812   0.051948
2012-10-14   0.488241    0.683242    0.096560   0.197295
2012-10-21   0.425213    0.279539    0.476436   0.492512
2012-10-28   0.238334    0.836782    0.901589   0.132700
2012-11-04   0.030562    0.797666    0.238895   0.550427
2012-11-11   0.875454    0.973046    0.457116   0.154175
2012-11-18   0.557967    0.895320    0.478239   0.448102
2012-11-25   0.075152    0.047344    0.650615   0.293129

[100 rows x 4 columns]
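As a rough illustration of the indexing and the one-line aggregation described in the answer, the sketch below rebuilds a small frame of the same shape, selects by name, category, and date, and then computes the mean third-quarter usage per category for premises over 500 square meters. The fixed sizes, the seeded random generator, and the variable names (`big_q3`, `mean_by_category`, and so on) are assumptions made for this example, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one above (fixed sizes and a seeded generator
# are used here only so the example is reproducible).
rng = np.random.default_rng(0)
names = ['C001', 'C002', 'C003', 'C004']
sizes = [36.7, 732.3, 795.8, 551.4]
category = ['Commercial', 'Residential', 'Residential', 'Other']
timestamps = pd.date_range('2011-01-01', periods=100, freq='W')
cols = pd.MultiIndex.from_arrays([names, sizes, category],
                                 names=['Name', 'Size', 'Category'])
df = pd.DataFrame(rng.random((100, 4)), index=timestamps, columns=cols)
df.index.name = 'Time'

# Index by customer name (top level of the column MultiIndex).
c001 = df['C001']

# Index by category: all residential customers.
residential = df.xs('Residential', axis=1, level='Category')

# Index by date: all rows in March 2011.
march = df.loc['2011-03']

# Third-quarter rows, restricted to premises larger than 500 square meters.
big_q3 = df.loc[df.index.quarter == 3,
                df.columns.get_level_values('Size') > 500]

# Mean usage per category: average over time, then within each category.
mean_by_category = big_q3.mean().groupby(level='Category').mean()
print(mean_by_category)
```

The aggregation first averages each customer's series over time and then averages those per-customer means within each category. Since every column has the same number of rows, this matches the overall mean per category.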
