Filtering out nested array entries in a DataFrame (JSON) by passing in a list of values to match against
Problem description
I read a huge file into a DataFrame; each line of the file holds a JSON object like the following:
{
"userId": "12345",
"vars": {
"test_group": "group1",
"brand": "xband"
},
"modules": [
{
"id": "New"
},
{
"id": "Default"
},
{
"id": "BestValue"
},
{
"id": "Rating"
},
{
"id": "DeliveryMin"
},
{
"id": "Distance"
}
]
}
I would like to pass a list of module ids to a method and remove every entry of modules whose id does not match any value in that list.
Do you have a solution?
Recommended answer
You already know from "Deleting nested array entries in a DataFrame (JSON) on a condition" how to read your json file and manipulate the modules column, which has the schema
root
|-- modules: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: string (nullable = true)
which says that modules is a collection of struct[String]. For the current requirement you will have to convert the Array[struct[String]] to Array[String]:
val finaldf = df.withColumn("modules", explode($"modules.id"))
.groupBy("userId", "vars").agg(collect_list("modules").as("modules"))
The next step is to define a udf function as
import org.apache.spark.sql.functions.udf
import scala.collection.mutable

def contains = udf((list: mutable.WrappedArray[String]) => {
  val validModules = ??? // your array definition here, for example: Array("Default", "BestValue")
  list.filter(validModules.contains(_))
})
Then call the udf function as
finaldf.withColumn("modules", contains($"modules")).show(false)
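The question asks for a method that receives the list of module ids, so the udf can be built inside such a method and close over the list. The following is only a minimal sketch under some assumptions: modules has already been flattened to an array of strings as in finaldf, the names ModuleFilter and keepModules are hypothetical, and Seq[String] is used as the udf parameter type in place of mutable.WrappedArray.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object ModuleFilter {
  // Hypothetical helper: keeps only the module ids that appear in validModules.
  // Assumes the "modules" column is already array<string>, as in finaldf.
  def keepModules(df: DataFrame, validModules: Seq[String]): DataFrame = {
    // A Set gives constant-time lookups and is serializable for the udf closure.
    val valid = validModules.toSet
    val keep = udf((ids: Seq[String]) =>
      if (ids == null) null else ids.filter(valid.contains))
    df.withColumn("modules", keep(col("modules")))
  }
}
```

With this in place, a call such as ModuleFilter.keepModules(finaldf, Seq("Default", "BestValue")).show(false) would take the role of the hard-coded validModules array.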
That should be it. I hope the answer is helpful.
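If the struct shape of modules should be preserved, the explode/groupBy step may not be needed at all. The sketch below is an alternative to the udf approach, not part of the original answer, and assumes Spark 3.0 or later, where org.apache.spark.sql.functions.filter accepts a Column => Column predicate. The names ModuleStructFilter and keepModuleStructs are hypothetical, and the helper is meant to be applied to the original df rather than to finaldf.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, filter}

object ModuleStructFilter {
  // Hypothetical helper; assumes Spark 3.0+ and a "modules" column of
  // type array<struct<id: string>>, as in the schema printed above.
  def keepModuleStructs(df: DataFrame, validModules: Seq[String]): DataFrame =
    df.withColumn(
      "modules",
      // Keep an element only when its id is one of the passed-in values.
      filter(col("modules"), m => m.getField("id").isin(validModules: _*))
    )
}
```

Because no rows are exploded and regrouped, userId and vars pass through untouched and each remaining module stays a struct with its id field.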