Deedle Frame.mapRows: how to use it properly and how to construct an ObjectSeries properly

Problem description

I noticed something strange about the Deedle mapRows function that I can't explain:

open Deedle

// NaN values are treated as missing by Deedle
let col1 = Series.ofObservations [ 1 => 10.0; 2 => System.Double.NaN; 3 => System.Double.NaN; 4 => 10.0; 5 => System.Double.NaN; 6 => 10.0 ]
let col2 = Series.ofObservations [ 1 => 9.0; 2 => 5.5; 3 => System.Double.NaN; 4 => 9.0; 5 => System.Double.NaN; 6 => 9.0 ]
let f1 = Frame.ofColumns [ "c1" => col1; "c2" => col2 ]

// f2 returns each row unchanged
let f2 = f1 |> Frame.mapRows (fun k r -> r) |> Frame.ofRows

// f3 reads both cells with Get, then returns the row unchanged
let f3 =
  f1
  |> Frame.mapRows (fun k r ->
       let x = r.Get("c1")
       let y = r.Get("c2")
       r)
  |> Frame.ofRows


val f1 : Frame<int,string> =

      c1        c2        
 1 -> 10        9         
 2 -> <missing> 5.5       
 3 -> <missing> <missing> 
 4 -> 10        9         
 5 -> <missing> <missing> 
 6 -> 10        9         

 val f2 : Frame<int,string> =

      c1        c2        
 1 -> 10        9         
 2 -> <missing> 5.5       
 3 -> <missing> <missing> 
 4 -> 10        9         
 5 -> <missing> <missing> 
 6 -> 10        9         

 val f3 : Frame<int,string> =

      c1        c2        
 1 -> 10        9         
 2 -> <missing> <missing> 
 3 -> <missing> <missing> 
 4 -> 10        9         
 5 -> <missing> <missing> 
 6 -> 10        9         

How can f3 have different values than f2? All I did with f3 was read values from the ObjectSeries.

I am trying to use mapRows to do row-based processing that produces an ObjectSeries per row, so that the result can be turned into a new frame with the same row keys. The processing has to be row-based because each column value needs to be updated based on its own value and the neighboring values in the row.

The calculation can't be done directly column by column, because it changes depending on the values in each row.

I would appreciate any advice.

Update

Since posting the original question, I have used Deedle from C#. To my surprise, row-based calculation is very easy in C#, and the way the C# Frame.Rows property handles missing values is very different from the F# mapRows function. The following is a very basic example I used to test the logic. It might be useful to anyone looking for a similar application.

Things to pay attention to:
1. Rows does not drop a row even when the values in both columns are missing.
2. Mean is smart enough to calculate the mean from the data points that are available.

using System;                      // DateTime
using System.Collections.Generic;  // KeyValuePair
using Deedle;

namespace TestDeedleRowProcessWithMissingValues
{
    class Program
    {
        static void Main(string[] args)
        {
            var s1 = new SeriesBuilder<DateTime, double>(){
                 {DateTime.Today.Date.AddDays(-5),10.0},
                 {DateTime.Today.Date.AddDays(-4),9.0},
                 {DateTime.Today.Date.AddDays(-3),8.0},
                 {DateTime.Today.Date.AddDays(-2),double.NaN},
                 {DateTime.Today.Date.AddDays(-1),6.0},
                 {DateTime.Today.Date.AddDays(-0),5.0}
             }.Series;

            var s2 = new SeriesBuilder<DateTime, double>(){
                 {DateTime.Today.Date.AddDays(-5),10.0},
                 {DateTime.Today.Date.AddDays(-4),double.NaN},
                 {DateTime.Today.Date.AddDays(-3),8.0},
                 {DateTime.Today.Date.AddDays(-2),double.NaN},
                 {DateTime.Today.Date.AddDays(-1),6.0}                 
             }.Series;

            var f = Frame.FromColumns(new KeyValuePair<string, Series<DateTime, double>>[] { 
                KeyValue.Create("s1",s1),
                KeyValue.Create("s2",s2)
            });

            s1.Print();
            f.Print();


            f.Rows.Select(kvp => kvp.Value).Print();

//            29/05/2015 12:00:00 AM -> series [ s1 => 10; s2 => 10]
//            30/05/2015 12:00:00 AM -> series [ s1 => 9; s2 => <missing>]
//            31/05/2015 12:00:00 AM -> series [ s1 => 8; s2 => 8]
//            1/06/2015 12:00:00 AM  -> series [ s1 => <missing>; s2 => <missing>]
//            2/06/2015 12:00:00 AM  -> series [ s1 => 6; s2 => 6]
//            3/06/2015 12:00:00 AM  -> series [ s1 => 5; s2 => <missing>]


            f.Rows.Select(kvp => kvp.Value.As<double>().Mean()).Print();

//            29/05/2015 12:00:00 AM -> 10
//            30/05/2015 12:00:00 AM -> 9
//            31/05/2015 12:00:00 AM -> 8
//            1/06/2015 12:00:00 AM  -> <missing>
//            2/06/2015 12:00:00 AM  -> 6
//            3/06/2015 12:00:00 AM  -> 5


            //Console.ReadLine();
        }
    }
}
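
For comparison, here is a rough F# counterpart of the row-wise mean above. It is only a sketch: it assumes that row.As<float>() and Stats.mean behave like the As<double>().Mean() calls in the C# code, and it uses a shortened version of the frame from the question. The function passed to Frame.mapRows never calls Get on a cell, so a missing cell does not cause the whole row to be dropped.

```fsharp
#r "nuget: Deedle"
open Deedle

// Shortened version of the frame from the question; nan marks a missing cell
let col1 = Series.ofObservations [ 1 => 10.0; 2 => nan; 3 => nan ]
let col2 = Series.ofObservations [ 1 => 9.0; 2 => 5.5; 3 => nan ]
let f1 = Frame.ofColumns [ "c1" => col1; "c2" => col2 ]

// Convert each row to a float series and average the values that are present.
// No cell is read with Get, so nothing throws for rows with missing cells.
let rowMeans =
  f1
  |> Frame.mapRows (fun _ row -> Stats.mean (row.As<float>()))
```

Row 1 averages both columns, while row 2 averages only c2 because c1 is missing there.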

Solution

The reason why f3 differs follows from the way mapRows handles missing values.

When you access a value using r.Get("c1"), you either get the value or you get a ValueMissingException. The mapRows function handles this exception and marks the entire row as missing. That is why row 2, where only c1 is missing, ends up entirely missing in f3. If you write just:

let f3 = f1 |> Frame.mapRows (fun k r -> 
  let x = r.Get("c1"); 
  let y = r.Get("c2");  
  r)

Then the result will be:

1 -> series [ c1 => 10; c2 => 9] 
2 -> <missing>                   
3 -> <missing>                   
4 -> series [ c1 => 10; c2 => 9] 
5 -> <missing>                   
6 -> series [ c1 => 10; c2 => 9] 

If you want to write a function that returns the frame as it was (reading the data from original rows and producing new rows), you could do something like:

f1 
|> Frame.mapRows (fun k r -> 
  [ "X" => OptionalValue.asOption(r.TryGet("c1")); 
    "Y" => OptionalValue.asOption(r.TryGet("c2")) ] 
  |> Series.ofOptionalObservations )
|> Frame.ofRows
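
The same pattern extends to the kind of row-based update described in the question. The sketch below uses a made-up rule (take c1 when it is present, otherwise fall back to c2 from the same row) purely for illustration; the rule and the name filled are assumptions, not part of the original question. It reads cells with TryGetAs<float> so that no exception is raised for missing values.

```fsharp
#r "nuget: Deedle"
open Deedle

// Shortened version of the frame from the question; nan marks a missing cell
let col1 = Series.ofObservations [ 1 => 10.0; 2 => nan; 3 => nan; 4 => 10.0 ]
let col2 = Series.ofObservations [ 1 => 9.0; 2 => 5.5; 3 => nan; 4 => 9.0 ]
let f1 = Frame.ofColumns [ "c1" => col1; "c2" => col2 ]

// Hypothetical rule: keep c1 when present, otherwise use c2 from the same row
let filled =
  f1
  |> Frame.mapRows (fun _ r ->
       // TryGetAs returns an OptionalValue instead of throwing on missing cells
       let c1 = r.TryGetAs<float>("c1") |> OptionalValue.asOption
       let c2 = r.TryGetAs<float>("c2") |> OptionalValue.asOption
       [ "c1" => (match c1 with Some _ -> c1 | None -> c2)
         "c2" => c2 ]
       |> Series.ofOptionalObservations)
  |> Frame.ofRows
```

Because every row produces a series (possibly with missing entries) rather than throwing, no row is marked as missing as a whole, and the resulting frame keeps the original row keys.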
