F#类型提供程序和数据处理 [英] F# Type Providers and Data Processing

查看:135
本文介绍了F#类型提供程序和数据处理的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在上一个问题中(使用异构数据(以静态类型的语言)),我问到F#如何处理数据分析中的标准任务,例如处理无类型的CSV文件.动态语言擅长处理诸如

In a previous question (Working with heterogenous data in a statically typed language), I asked about how F# handles standard tasks in data analysis, such as manipulating an untyped CSV file. Dynamic langauges excel at basic tasks like

data = load('income.csv')
data.log_income = log(income)

在F#中,最优雅的方法似乎是问号(?)运算符.不幸的是,在此过程中,我们丢失了静态类型输入,并且仍然需要在此处和此处进行类型注释.

In F#, the most elegant approach seems to be the question mark (?) operator. Unfortunately in the process we lose static typing and still need type annotations here and there.

F#最令人兴奋的未来功能之一是类型提供程序.在类型安全性损失最小的情况下,CSV类型提供程序可以通过动态检查文件来提供类型.

One of the most exciting future feature of F# are Type Providers. With minimal loss of type safety, a CSV type provider could provide types by dynamically examining the file.

但是数据分析通常不止于此.我们经常通过一系列操作来转换数据并创建新的数据集.我的问题是,如果我们主要操作数据,类型提供者可以提供帮助吗?例如:

But data analysis typically doesn't stop there. We often transform the data and create new datasets, via a pipeline of operations. My question is, can Type Providers help if we mostly manipulate data? For instance:

open CSV // Type provider
let data = CSV(file='income.csv') // Type provider magic (syntax?)
let log_income = log(data.income) // works!

这有效,但会污染全局名称空间.考虑添加列而不是创建新变量通常更自然.有什么办法吗?

This works but pollutes the global namespace. It is often more natural to think about adding a column, rather than creating a new variable. Is there some way to do?

let data.logIncome = log(data.income) // won't work, sadly.

当目标是创建新的派生数据或清理后的数据集时,类型提供程序是否可以避免使用(?)运算符?

Do type providers provide an escape from using the (?) operator when the goal is creating new derivative or cleaned-up datasets?

也许是这样的:

let newdata = colBind data {logIncome = log(data.income)}  // ugly, does it work?

其他想法?

推荐答案

简短的回答为否",长回答为是"(但您不希望得到结果).要记住的关键是 F#是一种静态类型的语言,句号.

The short answer is no, the long answer is yes (but you wouldn't like the result). The key thing to remember is that F# is a statically typed language, full stop.

对于您提供的代码,newData具有什么类型?如果无法在编译时将其固定下来,则需要诉诸于Obj或从Obj进行强制转换.

For the code that you provided, what type does newData have? If it cannot be pinned down at compile-time, then you need to resort to casting to/from Obj.

// newdata MUST have a static type, even if obj
let newdata = colBind data {logIncome = log(data.income)}  

想象colBind具有以下特征:

Imagine colBind has the following sinature:

val colBind: Thingey<'a> -> 'b -> Thingey2<'a, 'b>

这实际上可以以某种方式起作用,但是它不能普遍地起作用.因为最终您将需要一个在编译时不存在的类型.

That would actually work for a ways, but it wouldn't work universally. Because eventually you would need a type that would not exist at compile time.

F#类型提供程序允许您静态键入来自标准编译时环境之外的数据.但是,类型仍然是静态的.无法在运行时动态更改这些类型*.

F# type providers allow you to statically type data originating from outside of the standard compile-time environment. However, the types are still static. There is no way to alter those types dynamically at runtime*.

*您可以在运行时使用诸如以下的恶作剧来修改对象 DynamicObject .但是,一旦你 开始走那条迷路的路 静态类型的所有好处 语言(例如Intellisense). (这是首先使用F#的主要原因.)

*You can modify the object at runtime using shenanigans such as DynamicObject. However, once you start going down that path you lose all the benefits of a statically typed language such as Intellisense. (Which is a major reason for using F# in the first place.)

从概念上讲,您想要做的很简单. System.Data.DataTable 类型已经具有使用以下方式存储表格数据的概念:动态添加列的能力.但是由于添加的列的类型信息在编译时未知,因此必须将存储在这些列中的内容视为Obj并在运行时进行强制转换.

Conceptually, what you want to do is straight forward. The System.Data.DataTable type already has the notion of storing tabular data with the ability to dynamically add columns. But since the type information for added columns is not known at compile time it follows that things stored in those columns must be treated as Obj and casted at runtime.

这篇关于F#类型提供程序和数据处理的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆