Building a function to add checks to the Amazon Deequ framework

Problem description

Using the Amazon Deequ library, I'm trying to build a function that takes three parameters: the check object, a string naming the constraint to run, and another string providing the constraint criteria. I have a bunch of checks that I want to read from a MySQL table. My intention is to iterate through all the checks from the MySQL table, build a check object using the function described above, and run the checks on a source DataFrame. Here is an example of Amazon Deequ: https://towardsdatascience.com/automated-data-quality-testing-at-scale-using-apache-spark-93bb1e2c5cd0

So the function call looks something like this:

var _check = build_check_object_function(check_object, "hasSize", "10000")

This function should add a new hasSize check to check_object and return the result.

The part where I'm stuck is how to translate the string "hasSize" into a call to the hasSize function.

    var _check = Check(CheckLevel.Error, "Data Validation Check")
    val listOfFunctions = _check.getClass.getMethods.filter(!_.getName().contains('$'))
    for (function <- listOfFunctions) {
      if (function.getName().toLowerCase().contains(row(2).asInstanceOf[String].toLowerCase())) {
        _check = _check.function(row(3))
      } else {
        println("Not a match")
      }
    }

This is the error I get:

<console>:38: error: value function is not a member of com.amazon.deequ.checks.Check
   if( function.getName().toLowerCase().contains(row(2).asInstanceOf[String].toLowerCase())) {_check = _check.function(row(3))                                                          

Recommended answer

You can either use runtime reflection or build a thin translation layer between your database and the deequ declarations.

I would suggest translating the database constraint/check strings explicitly to deequ declarations, e.g.:

if (constraint == "hasSize") {
  // as Constraint
  Constraint.sizeConstraint(_ <= 10)
  // as Check
  Check(CheckLevel.Error, "name").hasSize(_ <= 10)
}
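
To make the translation-layer idea more concrete, here is a minimal sketch of what such a function might look like. The helper name `addConstraint`, the sample rows, and the reading of the criteria string (an exact row count for `hasSize`, a column name for `isComplete` and `isUnique`) are assumptions for illustration, not something taken from the question or from deequ itself. Only `Check`, `CheckLevel`, `hasSize`, `isComplete` and `isUnique` come from the library.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

// Hypothetical helper: maps a constraint name and its criteria string
// (both read from the database) onto an explicit deequ call.
def addConstraint(check: Check, name: String, criteria: String): Check =
  name match {
    case "hasSize" =>
      // Assumption: the criteria string holds the exact expected row count.
      val expected = criteria.toLong
      check.hasSize(_ == expected)
    case "isComplete" =>
      // Assumption: the criteria string holds a column name.
      check.isComplete(criteria)
    case "isUnique" =>
      check.isUnique(criteria)
    case other =>
      throw new IllegalArgumentException(s"Unsupported constraint: $other")
  }

// Made-up rows standing in for what the MySQL table might return.
val rows = Seq(("hasSize", "10000"), ("isComplete", "id"))

// Fold over the rows, adding one constraint per row to the same check.
val check = rows.foldLeft(Check(CheckLevel.Error, "Data Validation Check")) {
  case (acc, (name, criteria)) => addConstraint(acc, name, criteria)
}
```

The resulting check could then be run against a DataFrame `df` with `VerificationSuite().onData(df).addCheck(check).run()`. Listing the supported constraints explicitly keeps the mapping type-safe and makes unsupported names fail loudly, which name-based reflection lookup would not do.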
