Filter by array value in Spark DataFrame

Views: 701 | Published: 2017/8/7 3:33:24 | Tags: scala, elasticsearch, apache-spark, spark-dataframe

Problem description

I am using Apache Spark 1.5 DataFrames with Elasticsearch, and I am trying to filter on an id in a column that contains a list (array) of ids.

For example, the Elasticsearch mapping for the column looks like this:

    {
        "people": {
            "properties": {
                "artist": {
                    "properties": {
                        "id": {
                            "index": "not_analyzed",
                            "type": "string"
                        },
                        "name": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }

The data look like the following two example documents:

    {
        "people": {
            "artist": [
                {
                    "id": "153",
                    "name": "Tom"
                },
                {
                    "id": "15389",
                    "name": "Cok"
                }
            ]
        }
    }

    {
        "people": {
            "artist": [
                {
                    "id": "369",
                    "name": "Carl"
                },
                {
                    "id": "15389",
                    "name": "Cok"
                },
                {
                    "id": "698",
                    "name": "Sol"
                }
            ]
        }
    }

In Spark I tried this:

    val peopleId  = 152
    val dataFrame = sqlContext.read
         .format("org.elasticsearch.spark.sql")
         .load("index/type")

    dataFrame.filter(dataFrame("people.artist.id").contains(peopleId))
        .select("people.artist.id")

This returns every id that contains 152 as a substring, for example 1523 and 152978, rather than only the rows where id == 152.

Then I tried:

    dataFrame.filter(dataFrame("people.artist.id").equalTo(peopleId))
        .select("people.artist.id")

This returns nothing. I understand why: people.artist.id is an array of ids, so it never equals a single value.
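
The difference between the two attempts can be reproduced without Spark. The following is a minimal sketch in plain Scala that uses ordinary collections as a stand-in for the DataFrame rows. The Artist case class, the rows value, and the wanted id are illustrative assumptions modeled on the sample data, not part of the original code. It contrasts substring matching, whole-value equality, and element membership.

```scala
// Plain-Scala model of the rows; Artist and rows are hypothetical stand-ins
// for the Elasticsearch documents, not part of the original question.
case class Artist(id: String, name: String)

val rows: Seq[Seq[Artist]] = Seq(
  Seq(Artist("153", "Tom"), Artist("15389", "Cok")),
  Seq(Artist("369", "Carl"), Artist("15389", "Cok"), Artist("698", "Sol"))
)

val wanted = "153"

// Substring semantics (what a string `contains` does): "15389" also matches.
val substringHits = rows.flatten.map(_.id).filter(_.contains(wanted)).distinct

// Whole-value equality: a list of ids is never equal to a single id.
val equalityHits = rows.filter(artists => artists.map(_.id).equals(wanted))

// Element membership: keep rows whose id list has an element equal to `wanted`.
val membershipHits = rows.filter(artists => artists.map(_.id).contains(wanted))
```

Only the third predicate selects exactly the rows that hold the wanted id, which is the behavior the question is asking for.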

Can anyone tell me how to filter when the column holds a list of ids?

Solution

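In Spark 1.5+ you can use the array_contains function (the $"..." column syntax needs import sqlContext.implicits._):

    import org.apache.spark.sql.functions.array_contains
    df.where(array_contains($"people.artist.id", "153"))

If you use an earlier version, you can try a UDF like this:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{lit, udf}
    // True when any struct in the array has an "id" field equal to v
    val containsId = udf(
      (rs: Seq[Row], v: String) => rs.map(_.getAs[String]("id")).exists(_ == v))
    df.where(containsId($"people.artist", lit("153")))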
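
The predicate inside the UDF can be checked on its own. The sketch below is plain Scala with no Spark dependency. ArtistRow stands in for Spark's Row, and containsId here is an ordinary function rather than a registered UDF; both are assumptions made for illustration. It also suggests why the answer passes the string "153": the mapping declares id as a string, and comparing a string id against an integer never matches.

```scala
// Hypothetical stand-in for a Row with "id" and "name" fields.
case class ArtistRow(id: String, name: String)

// Same logic as the UDF body, written with exists directly instead of map + exists.
def containsId(rs: Seq[ArtistRow], v: String): Boolean = rs.exists(_.id == v)

val artists = Seq(ArtistRow("153", "Tom"), ArtistRow("15389", "Cok"))

val exact = containsId(artists, "153")      // true: an element equals "153"
val prefixOnly = containsId(artists, "152") // false: no element equals "152"

// Seq.contains accepts any supertype, so an Int argument compiles but never
// matches string ids.
val wrongType = artists.map(_.id).contains(153) // false
```

With the question's peopleId = 152 declared as an Int, converting it first (for example peopleId.toString) keeps the comparison between strings.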


                        