Spark SQL's where clause excludes null values
Question
I am trying to run queries on Apache Spark SQL. The first query works fine, but the second query removes null values as well.
Code:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

def main(args: Array[String]) {
  val sc = new SparkContext("local[*]", "Spark")
  val sqlContext = new SQLContext(sc)
  val pageViewsDF = getDataframe(sc, sqlContext)
  println("RUNNING SQL QUERIES ")
  sqlContext.sql("select name , count(*) from pageviews_by_second group by name").show(10)
  sqlContext.sql("select name , count(*) from pageviews_by_second where name not in (\"Rose\") group by name").show(10)
}

def getDataframe(sc: SparkContext, sqlContext: SQLContext): DataFrame = {
  Logger.getLogger("org").setLevel(Level.OFF)
  Logger.getLogger("akka").setLevel(Level.OFF)

  val dataArray = List(
    List("David", null),
    List("David", null),
    List("Charlie", "23"),
    List("Rose", null),
    List("Ben", null),
    List("Harry", "43"),
    List(null, "25"),
    List(null, "21"),
    List("David", "15"),
    List("Rose", null),
    List("Alan", "26"))

  val separator = ","
  // Create an RDD from the in-memory data
  val dataRDD = sc.parallelize(dataArray)
  // The schema is encoded in a string
  val header = "name,age"
  // Generate the schema based on the schema string
  val schema =
    StructType(
      header.split(separator).map { fieldName =>
        StructField(fieldName, StringType, true)
      })
  // Convert each record to a Row
  val rowRDD = dataRDD.map(p => Row(p(0), p(1)))
  // Apply the schema to the RDD and register the result as a temp table
  val df = sqlContext.createDataFrame(rowRDD, schema)
  df.registerTempTable("pageviews_by_second")
  df
}
The result of the first query:
+-------+---+
| name|_c1|
+-------+---+
| Alan| 1|
| Ben| 1|
| David| 3|
|Charlie| 1|
| Rose| 2|
| Harry| 1|
| null| 2|
+-------+---+
The output of the second query:
+-------+---+
| name|_c1|
+-------+---+
| Alan| 1|
| Ben| 1|
| David| 3|
|Charlie| 1|
| Harry| 1|
+-------+---+
In the second query I am excluding only "Rose", but "null" is getting excluded as well.
If my query is wrong, please help me with the correct query.
Answer
This happens because NULL in SQL is equivalent to "unknown". It means that any comparison with NULL, other than IS NULL / IS NOT NULL, is undefined and returns NULL.
case class Record(id: Integer, value: String)

// toDF needs the SQLContext implicits when this runs outside spark-shell
import sqlContext.implicits._

val df = sc.parallelize(Seq(Record(1, "foo"), Record(2, null))).toDF
df.registerTempTable("df")
sqlContext.sql("""SELECT value = "foo" FROM df""").show
// +----+
// | _c0|
// +----+
// |true|
// |null|
// +----+
sqlContext.sql("""SELECT value != "foo" FROM df""").show
// +-----+
// | _c0|
// +-----+
// |false|
// | null|
// +-----+
Because of that, IN / NOT IN is undefined as well:
sqlContext.sql("""SELECT value IN ("foo", "bar") FROM df""").show
// +----+
// | _c0|
// +----+
// |true|
// |null|
// +----+
This is standard SQL behavior, and a system that correctly implements the SQL standard should behave the same way. If you want to filter and still keep NULLs, you have to make it explicit:
sqlContext.sql(
"""SELECT value IN ("foo", "bar") OR value IS NULL FROM df""").show
// +----+
// | _c0|
// +----+
// |true|
// |true|
// +----+
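Applying the same fix to the original query keeps the NULL names while still excluding "Rose". The query below is a sketch based on the answer above, reusing the table and column names from the question; it has not been run against the original data:
// Exclude "Rose" but keep the rows whose name is NULL
sqlContext.sql("select name , count(*) from pageviews_by_second where name not in (\"Rose\") or name is null group by name").show(10)
With the DataFrame API the equivalent filter would be something like df.filter(!$"name".isin("Rose") || $"name".isNull), assuming Spark 1.5 or later (where isin is available) and that the sqlContext implicits are imported.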