Parse JSON file in U-SQL

Problem description

I'm trying to parse the JSON file below using U-SQL, but I keep getting an error.

JSON file:

{"dimBetType_SKey":1,"BetType_BKey":1,"BetTypeName":"Test1"}
{"dimBetType_SKey":2,"BetType_BKey":2,"BetTypeName":"Test2"}
{"dimBetType_SKey":3,"BetType_BKey":3,"BetTypeName":"Test3"}

Below is the U-SQL script I'm using to try to extract the data from the file above.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @Full_Path string =
"adl://xxxx.azuredatalakestore.net/2017/03/28/00_0_66ffdd26541742fab57139e95080e704.json";

DECLARE @Output_Path string = "adl://xxxx.azuredatalakestore.net/Output/Output.csv";

@logSchema =
EXTRACT dimBetType_SKey int
FROM @Full_Path
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

OUTPUT @logSchema
TO @Output_Path 
USING Outputters.Csv();

But the U-SQL script keeps failing with a vertex error.

Any help?

Recommended answer

This is probably because each line of the file is its own JSON block. That means you need to parse it slightly differently from a file that holds a single JSON document.
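To see why the layout matters, here is a minimal sketch in Python (used purely for illustration; it is not part of the original answer and does not run in U-SQL). It feeds the three lines from the question to a standard JSON parser twice: once as a single document and once line by line. The helper names `parse_whole` and `parse_lines` are made up for this example.

```python
import json

# The three lines from the question, exactly as they appear in the file.
raw = (
    '{"dimBetType_SKey":1,"BetType_BKey":1,"BetTypeName":"Test1"}\n'
    '{"dimBetType_SKey":2,"BetType_BKey":2,"BetTypeName":"Test2"}\n'
    '{"dimBetType_SKey":3,"BetType_BKey":3,"BetTypeName":"Test3"}\n'
)

def parse_whole(text):
    """Treat the text as one JSON document; return None if that fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # A second top-level object after the first is not valid JSON.
        return None

def parse_lines(text):
    """Treat each non-empty line as its own JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

whole = parse_whole(raw)   # fails, because the file is not one JSON document
rows = parse_lines(raw)    # succeeds, one dict per line
```

The whole-file parse fails while the per-line parse succeeds, which is the same split the steps below perform in U-SQL: read each line as raw text first, then parse each line as JSON.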

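Try using a text extractor first to bring in each line of the file as a raw string, one JSON element per row. The column delimiter is set to a character that should not occur in the data ('\b'), so each whole line lands in a single string column. Like this...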

DECLARE @Full_Path string = "etc";

@RawExtract = 
    EXTRACT 
        [RawString] string, 
        [FileName] string //optional, see below
    FROM
        @Full_Path
    USING 
        Extractors.Text(delimiter:'\b', quoting : false);

Then shred the JSON with the assembly you've referenced, but using the JsonTuple method. Like this...

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@ParsedJSONLines = 
    SELECT 
        Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple([RawString]) AS JSONLine,
        [FileName]
    FROM 
        @RawExtract;

Next, get the values out. Like this...

@StagedData =
    SELECT 
        JSONLine["dimBetType_SKey"] AS dimBetType_SKey,
        JSONLine["BetType_BKey"] AS BetType_BKey,
        JSONLine["BetTypeName"] AS BetTypeName,
        [FileName]
    FROM 
        @ParsedJSONLines;

Finally, write your output to CSV, or whatever format you need.

DECLARE @Output_Path string = "etc";

OUTPUT @StagedData
TO @Output_Path 
USING Outputters.Csv();

As a side note, you don't need to reference the complete Data Lake Store path. The analytics engine knows where the root of the storage is, so you can probably replace your variables with just this. The {FileName} pattern in the path is what populates the optional [FileName] column in the extract above.

DECLARE @Full_Path string = "/2017/03/28/{FileName}";

Hope this helps sort out your issue.
