Scala: What is the generic way to calculate standard deviation

Question

I am curious as to how I can write a generic method to calculate standard deviation and variance in scala. I have a generic method for calculating mean (stolen from here: Writing a generic mean function in Scala)

I have tried to convert the mean calculation to get standard deviation and variance but it looks wrong to me. Generics are WAY beyond my skills in Scala programming at the moment.

The code for calculating the mean, standard deviation and variance is this:

package ca.mikelavender

import scala.math.{Fractional, Integral, Numeric, _}

package object genericstats {

  def stdDev[T: Numeric](xs: Iterable[T]): Double = sqrt(variance(xs))

  def variance[T: Numeric](xs: Iterable[T]): Double = implicitly[Numeric[T]] match {
    case num: Fractional[_] => {
      val avg = mean(xs)
      num.toDouble(
        xs.foldLeft(num.zero)((b, a) =>
          num.plus(b, num.times(num.minus(a, avg), num.minus(a, avg))))) /
        xs.size
    }
    case num: Integral[_] => {
      val avg = mean(xs)
      num.toDouble(
        xs.foldLeft(num.zero)((b, a) =>
          num.plus(b, num.times(num.minus(a, avg), num.minus(a, avg))))) /
        xs.size
    }
  }

  /**
    * https://stackoverflow.com/questions/6188990/writing-a-generic-mean-function-in-scala
    */
  def mean[T: Numeric](xs: Iterable[T]): T = implicitly[Numeric[T]] match {
    case num: Fractional[_] => import num._; xs.sum / fromInt(xs.size)
    case num: Integral[_] => import num._; xs.sum / fromInt(xs.size)
    case _ => sys.error("Undivisable numeric!")
  }

}

I feel like the match in the variance method is not needed, or could be more elegant. The duplication in the code seems very wrong to me: I should be able to use the match just to get the numeric type and then pass it on to a single block of code that does the calculation.

The other thing I don't like is that it always returns a Double. I feel like it should return the same input numeric type, at least for the Fractional values.

So, are there any suggestions on how to improve the code and make it prettier?

Recommended answer

The goal of a type class like Numeric is to provide a set of operations for a type so that you can write code that works generically on any types that have an instance of the type class. Numeric provides one set of operations, and its subclasses Integral and Fractional additionally provide more specific ones (but they also characterize fewer types). If you don't need these more specific operations, you can simply work at the level of Numeric, but unfortunately in this case you do.
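
To make the distinction concrete, here is a small sketch. The object and method names are illustrative and not from the answer. An operation that only needs zero, plus and times can stay at the Numeric level. Halving a value needs either Fractional.div or Integral.quot, and those two behave differently.

```scala
import scala.math.{Fractional, Integral, Numeric}

// Hypothetical helper object, for illustration only.
object NumericLevel {
  // Uses only zero, plus and times, all of which Numeric provides,
  // so this works for Int, Long, Double, BigDecimal, ...
  def sumOfSquares[T](xs: Iterable[T])(implicit num: Numeric[T]): T =
    xs.foldLeft(num.zero)((acc, x) => num.plus(acc, num.times(x, x)))

  // Numeric has no division at all; Fractional adds div...
  def halfFractional[T](x: T)(implicit frac: Fractional[T]): T =
    frac.div(x, frac.fromInt(2))

  // ...while Integral adds quot, which truncates toward zero.
  def halfIntegral[T](x: T)(implicit integral: Integral[T]): T =
    integral.quot(x, integral.fromInt(2))
}
```

For example, halfFractional(3.0) is 1.5 but halfIntegral(3) is 1. That is exactly the "division means different things" problem discussed next.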

Let's start with mean. The problem here is that division means different things for integral and fractional types, and isn't provided at all for types that are only Numeric. The answer you've linked from Daniel gets around this issue by dispatching on the runtime type of the Numeric instance, and just crashing (at runtime) if the instance isn't either a Fractional or Integral.

I'm going to disagree with Daniel (or at least Daniel five years ago) and say this isn't really a great approach—it's both papering over a real difference and throwing out a lot of type safety at the same time. There are three better solutions in my view.

You might decide that taking the mean isn't meaningful for integral types, since integral division loses the fractional part of the result, and only provide it for fractional types:

def mean[T: Fractional](xs: Iterable[T]): T = {
  val T = implicitly[Fractional[T]]

  T.div(xs.sum, T.fromInt(xs.size))
}

Or with the nice implicit syntax:

def mean[T: Fractional](xs: Iterable[T]): T = {
  val T = implicitly[Fractional[T]]
  import T._

  xs.sum / T.fromInt(xs.size)
}

One last syntactic point: if I find I have to write implicitly[SomeTypeClass[A]] to get a reference to a type class instance, I tend to desugar the context bound (the [A: SomeTypeClass] part) to clean things up a bit:

def mean[T](xs: Iterable[T])(implicit T: Fractional[T]): T =
  T.div(xs.sum, T.fromInt(xs.size))

This is entirely a matter of taste, though.
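
If you go the Fractional-only route, the same idea extends to variance. The variance then returns the input type T rather than Double, which is what the question asked for. This is a minimal sketch. It assumes population variance (dividing by n, as the question's code does), and the object name FractionalStats is hypothetical. Fractional has no square-root operation, so the standard deviation in this sketch still has to go through Double.

```scala
import scala.math.Fractional

// Hypothetical object name; mean is the Fractional-only version from the answer.
object FractionalStats {
  def mean[T](xs: Iterable[T])(implicit T: Fractional[T]): T =
    T.div(xs.sum, T.fromInt(xs.size))

  // Population variance (divides by n), computed entirely in T.
  def variance[T](xs: Iterable[T])(implicit T: Fractional[T]): T = {
    import T._ // brings the -, * and / operator syntax for T into scope
    val avg = mean(xs)
    xs.map(x => (x - avg) * (x - avg)).sum / T.fromInt(xs.size)
  }

  // Fractional offers no square root, so this step leaves T and uses Double.
  def stdDev[T](xs: Iterable[T])(implicit T: Fractional[T]): Double =
    math.sqrt(T.toDouble(variance(xs)))
}
```

With this version, variance on a List[BigDecimal] returns a BigDecimal. Calling mean or variance on a List[Int] is rejected at compile time, because there is no Fractional[Int] instance.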

You could also make mean return a concrete fractional type like Double, and simply convert the Numeric values to that type before performing the operation:

def mean[T](xs: Iterable[T])(implicit T: Numeric[T]): Double =
  T.toDouble(xs.sum) / xs.size

Or, equivalently but with the toDouble syntax for Numeric:

import Numeric.Implicits._

def mean[T: Numeric](xs: Iterable[T]): Double = xs.sum.toDouble / xs.size

This provides correct results for both integral and fractional types (up to the precision of Double), but at the expense of making your operation less generic.
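
To see why this is "correct" for integral input, compare the Double-returning mean with dividing inside Int itself. This is a sketch, and DoubleMean and truncatedMean are illustrative names. For List(1, 2), the former gives 1.5 while the latter gives 1.

```scala
import scala.math.Numeric.Implicits._

// Hypothetical object name, for illustration only.
object DoubleMean {
  // Converts the sum to Double before dividing, so Int input keeps its
  // fractional part (up to the precision of Double).
  def mean[T: Numeric](xs: Iterable[T]): Double = xs.sum.toDouble / xs.size

  // For contrast: dividing in Int itself is integer division, which truncates.
  def truncatedMean(xs: Iterable[Int]): Int = xs.sum / xs.size
}
```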

Lastly, you could create a new type class that provides a shared division operation for Fractional and Integral:

trait Divisible[T] {
  def div(x: T, y: T): T
}

object Divisible {
  implicit def divisibleFromIntegral[T](implicit T: Integral[T]): Divisible[T] =
    new Divisible[T] {
      def div(x: T, y: T): T = T.quot(x, y)
    }

  implicit def divisibleFromFractional[T](implicit T: Fractional[T]): Divisible[T] =
    new Divisible[T] {
      def div(x: T, y: T): T = T.div(x, y)
    }
}

Then:

def mean[T: Numeric: Divisible](xs: Iterable[T]): T =
  implicitly[Divisible[T]].div(xs.sum, implicitly[Numeric[T]].fromInt(xs.size))

This is essentially a more principled version of the original mean—instead of dispatching on subtype at runtime, you're characterizing the subtypes with a new type class. There's more code, but no possibility of runtime errors (unless of course xs is empty, etc., but that's an orthogonal problem that all of these approaches run into).
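
Putting the Divisible pieces together in one place shows the trade-off. The same mean compiles for both Int and Double, but for integral types it divides with quot, so the result is truncated. Below is a self-contained sketch using the trait, instances and mean from the answer. Everything is nested in a hypothetical DivisibleMean object so that the trait and its companion stay together.

```scala
// Hypothetical wrapper object; the contents are the answer's code.
object DivisibleMean {
  // A single division operation shared by Integral and Fractional types.
  trait Divisible[T] {
    def div(x: T, y: T): T
  }

  object Divisible {
    // Integral types divide with quot, which truncates toward zero.
    implicit def divisibleFromIntegral[T](implicit T: Integral[T]): Divisible[T] =
      new Divisible[T] {
        def div(x: T, y: T): T = T.quot(x, y)
      }

    // Fractional types divide with div.
    implicit def divisibleFromFractional[T](implicit T: Fractional[T]): Divisible[T] =
      new Divisible[T] {
        def div(x: T, y: T): T = T.div(x, y)
      }
  }

  def mean[T: Numeric: Divisible](xs: Iterable[T]): T =
    implicitly[Divisible[T]].div(xs.sum, implicitly[Numeric[T]].fromInt(xs.size))
}
```

Here mean(List(1, 2)) returns 1 and mean(List(1.0, 2.0)) returns 1.5. A type with neither an Integral nor a Fractional instance simply has no Divisible instance, so the call does not compile, instead of failing at runtime.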

Of these three approaches, I'd probably choose the second, which in your case seems especially appropriate since your variance and stdDev already return Double. In that case the entire thing would look like this:

import Numeric.Implicits._

def mean[T: Numeric](xs: Iterable[T]): Double = xs.sum.toDouble / xs.size

def variance[T: Numeric](xs: Iterable[T]): Double = {
  val avg = mean(xs)

  xs.map(_.toDouble).map(a => math.pow(a - avg, 2)).sum / xs.size
}

def stdDev[T: Numeric](xs: Iterable[T]): Double = math.sqrt(variance(xs))

...and you're done.
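
As a sanity check, the final version reproduces a common textbook example: for 2, 4, 4, 4, 5, 5, 7, 9, the mean is 5 and the standard deviation is 2. Note that variance divides by xs.size, so it computes the population variance. If you need the sample variance instead, the sketch also includes a hypothetical sampleVariance that divides by n - 1 (Bessel's correction). It is not part of the original answer, and the object name GenericStats is likewise only illustrative.

```scala
import scala.math.Numeric.Implicits._

// Hypothetical object name; mean, variance and stdDev are the answer's code.
object GenericStats {
  def mean[T: Numeric](xs: Iterable[T]): Double = xs.sum.toDouble / xs.size

  // Population variance: divides by n.
  def variance[T: Numeric](xs: Iterable[T]): Double = {
    val avg = mean(xs)
    xs.map(_.toDouble).map(a => math.pow(a - avg, 2)).sum / xs.size
  }

  def stdDev[T: Numeric](xs: Iterable[T]): Double = math.sqrt(variance(xs))

  // Hypothetical addition: sample variance with Bessel's correction (n - 1).
  def sampleVariance[T: Numeric](xs: Iterable[T]): Double = {
    val avg = mean(xs)
    xs.map(_.toDouble).map(a => math.pow(a - avg, 2)).sum / (xs.size - 1)
  }
}
```

For the sample above, variance gives 4.0 and stdDev gives 2.0, while sampleVariance gives 32.0 / 7.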

In real code I'd probably look at a library like Spire instead of using the standard library's type classes, though.
