列命名中的U-SQL错误 [英] U-SQL Error in Naming the Column

查看:114
本文介绍了列命名中的U-SQL错误的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个JSON,其中字段的顺序不固定.

I have a JSON where the order of fields is not fixed.

即我可以有[A, B, C] or [B, C, A]

所有A,B,C都是json对象,其形式为{名称:x,值:y}.

All A, B, C are json objects are of the form {Name: x, Value:y}.

因此,当我使用USQL提取JSON(我不知道它们的顺序)并将其放入CSV(为此我需要列名称)时:

So, when I use USQL to extract the JSON (I don't know their order) and put it into a CSV (for which I will need column name):

@output =
    SELECT
        A["Value"] ?? "0" AS CAST ### (("System_" + A["Name"]) AS STRING), 
        B["Value"] ?? "0" AS "System_" + B["Name"],
        System_da

因此,我试图将列名称作为JSON中的名称"字段.

So, I am trying to put column name as the "Name" field in the JSON.

但是在上面####处出现了错误:

But am getting the error at #### above:

Message

syntax error. Expected one of: FROM ',' EXCEPT GROUP HAVING INTERSECT OPTION ORDER OUTER UNION UNION WHERE ';' ')' 

Resolution
Correct the script syntax, using expected token(s) as a guide.

Description

Invalid syntax found in the script.

Details
at token '(', line 74
near the ###:
**************

我不允许动态地"输入正确的列名,这绝对是我的问题.

I am not allowed to put the correct column name "dynamically" and it is an absolute necessity of my issue.

Input: [A, B, C,], [C, B, A]

Output:        A.name B.name C.name
Row 1's values
Row 2's values

推荐答案

@output =
SELECT
    A["Value"] ?? "0" AS CAST ### (("System_" + A["Name"]) AS STRING), 
    B["Value"] ?? "0" AS "System_" + B["Name"],
    System_da

不是有效的SELECT子句(在U-SQL或我所知道的任何其他SQL方言中都不是.)

is not a valid SELECT clause (neither in U-SQL nor any other SQL dialect I am aware of).

什么是JSON数组?它是键/值对吗?还是位置?还是您想要在数组中包含一个是否存在于数组中的标记的单个值?

What is the JSON Array? Is it a key/value pair? Or positional? Or a single value in the array that you want to have a marker for whether it is present in the array?

从您的示例来看,您似乎想要类似的东西:

From your example, it seems that you want something like:

输入:

[["A","B","C"],["C","D","B"]]

输出:

 A     B    C    D
 true  true true false
 false true true true

如果是这样,我将其写为:

If that is the case, I would write it as:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

@input = 
  SELECT "[[\"A\", \"B\", \"C\"],[\"C\", \"D\", \"B\"]]" AS json 
  FROM (VALUES (1)) AS T(x);

@data = 
  SELECT JsonFunctions.JsonTuple(arrstring) AS a 
  FROM @input CROSS APPLY EXPLODE( JsonFunctions.JsonTuple(json).Values) AS T(arrstring);

@data = 
  SELECT a.Contains("A") AS A, a.Contains("B") AS B, a.Contains("C") AS C, a.Contains("D") AS D 
  FROM (SELECT a.Values AS a FROM @data) AS t;

OUTPUT @data
TO "/output/data.csv"
USING Outputters.Csv(outputHeader : true);

如果您需要更动态的内容,请使用生成的SqlArraySqlMap或使用上述方法来生成脚本.

If you need something more dynamic, either use the resulting SqlArray or SqlMap or use the above approach to generate the script.

但是,我想知道为什么您首先要以这种方式对信息进行建模.我建议您找到一种更合适的方法来标记JSON中值的存在.

However, I wonder why you would model your information this way in the first place. I would recommend finding a more appropriate way to mark the presence of the value in the JSON.

更新:我想念您关于内部数组成员是具有两个键值对的对象的评论,其中一个始终被称为name(对于属性),而一个始终被称为value.适当的价值.因此,这是该情况的答案.

UPDATE: I missed your comment about that the inner array members are an object with two key-value pairs, where one is always called name (for property) and one is always called value for the property value. So here is the answer for that case.

首先:使用{"Name": "propname", "Value" : "value"}在JSON中对键值对进行建模完全滥用了JSON的灵活建模功能,因此不应该这样做.如果可以,请使用{"propname" : "value"}.

First: Modelling key value pairs in JSON using {"Name": "propname", "Value" : "value"} is a complete misuse of the flexible modelling capabilities of JSON and should not be done. Use {"propname" : "value"} instead if you can.

因此,更改输入后,以下内容将为您提供枢轴值.请注意,您将需要提前知道这些值,并且有几种方法可以进行数据透视.我在创建新的SqlMap实例以减少过度建模的语句中执行此操作,然后在下一个SELECT从语句中获取值的语句中完成此操作.

So changing the input, the following will give you the pivoted values. Note that you will need to know the values ahead of time and there are several options on how to do the pivot. I do it in the statement where I create the new SqlMap instance to reduce the over-modelling, and then in the next SELECT where I get the values from the map.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

@input = 
  SELECT "[[{\"Name\":\"A\", \"Value\": 1}, {\"Name\": \"B\", \"Value\": 2}, {\"Name\": \"C\", \"Value\":3 }], [{\"Name\":\"C\", \"Value\": 4}, {\"Name\":\"D\", \"Value\": 5}, {\"Name\":\"B\", \"Value\": 6}]]" AS json 
  FROM (VALUES (1)) AS T(x);

@data = 
  SELECT JsonFunctions.JsonTuple(arrstring) AS a
  FROM @input CROSS APPLY EXPLODE( JsonFunctions.JsonTuple(json)) AS T(rowid, arrstring);

@data =
  SELECT new SqlMap<string, string>( 
             a.Values.Select((kvp) => 
                 new KeyValuePair<string, string>(
                       JsonFunctions.JsonTuple(kvp)["Name"]
                     , JsonFunctions.JsonTuple(kvp)["Value"])
         )) AS kvp
  FROM @data;

@data =
  SELECT kvp["A"] AS A,
         kvp["B"] AS B,
         kvp["C"] AS C,
         kvp["D"] AS D
  FROM @data;

OUTPUT @data
TO "/output/data.csv"
USING Outputters.Csv(outputHeader : true);

这篇关于列命名中的U-SQL错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆