Use Azure Data Factory to parse JSON string from a column

Problem Description

Here is my situation:

I have Azure Table as a source, and my target is Azure SQL database. The source table looks something like this:

id | file_name  | metadata
---|------------|---------
1  | file_1.txt | {"Company":{"id":555,"Name":"Company A"},"quality":[{"quality":3,"file_name":"file_1.txt"},{"quality":4,"file_name":"unkown"}]}
2  | file_2.txt | {"Company":{"id":231,"Name":"Company B"},"quality":[{"quality":4,"file_name":"file_2.txt"},{"quality":3,"file_name":"unkown"}]}
3  | file_3.txt | {"Company":{"id":111,"Name":"Company C"},"quality":[{"quality":5,"file_name":"unknown"},{"quality":4,"file_name":"file_3.txt"}]}

The target table is supposed to look like this:

id | file_name  | company   | quality
---|------------|-----------|--------
1  | file_1.txt | Company A | 3
2  | file_2.txt | Company B | 4
3  | file_3.txt | Company C | 4

That means I need to parse the data from this string to get the new column values, as well as pick the quality value depending on the file_name column from the source.

The first thing I've done is create a Copy pipeline to transfer the data 1:1 from Azure Table to a Parquet file on Azure Data Lake Store, so I can use it as a source in a Data Flow. Next, the idea was to use a derived column with some expression to get the data, but as far as I can see, there is no expression that treats this string as a JSON object.
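For reference, a minimal sketch of what such a Copy activity could look like in pipeline JSON; the activity and dataset names (CopyTableToParquet, AzureTableInput, AdlsParquetOutput) are placeholders, not from the original post:

    {
        "name": "CopyTableToParquet",
        "type": "Copy",
        "inputs": [ { "referenceName": "AzureTableInput", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AdlsParquetOutput", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "AzureTableSource" },
            "sink": { "type": "ParquetSink" }
        }
    }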

So, the next idea was to maybe add a step before this process where I would extract the contents of the metadata column to a separate file on ADLS and use that file as a source or lookup, defining it as a JSON file to begin with. This would imply that I need to add the id value to the JSON file so I'm able to tie the data back to the record.

The first solution looked like the more promising idea, but if that's not an option, I'll look into other possible solutions.

Thanks

Recommended Answer

My debugging results are as follows:

  1. I think you can use OPENJSON to parse the JSON string.

    -- Flatten the JSON metadata column into relational columns
    select t.id, A.company, A.quality, A.file_name, A.quality_s, A.file_name_s
    from dbo.test t
    CROSS APPLY OPENJSON(t.metadata)
    WITH (
        company     varchar(255) '$.Company.Name',
        quality     varchar(255) '$.quality[0].quality',
        file_name   varchar(255) '$.quality[0].file_name',
        quality_s   varchar(255) '$.quality[1].quality',
        file_name_s varchar(255) '$.quality[1].file_name'
    ) A
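(Note: OPENJSON is available only under database compatibility level 130 or higher, which current Azure SQL Database defaults satisfy.)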

The query result is as follows:
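Based on the sample data above, the query should return something like:

id | company   | quality | file_name  | quality_s | file_name_s
---|-----------|---------|------------|-----------|------------
1  | Company A | 3       | file_1.txt | 4         | unkown
2  | Company B | 4       | file_2.txt | 3         | unkown
3  | Company C | 5       | unknown    | 4         | file_3.txt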

Then use a data flow to do further processing. I will show you the details when I'm back at my PC.

  2. Use a Copy activity in ADF to copy the query result into a CSV file.
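A rough sketch of such a Copy activity, with the OPENJSON query from step 1 as the source query; the activity and dataset names (CopyQueryResultToCsv, AzureSqlInput, AdlsCsvOutput) are illustrative assumptions:

    {
        "name": "CopyQueryResultToCsv",
        "type": "Copy",
        "inputs": [ { "referenceName": "AzureSqlInput", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AdlsCsvOutput", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": {
                "type": "AzureSqlSource",
                "sqlReaderQuery": "select t.id, A.company, A.quality, A.file_name, A.quality_s, A.file_name_s from dbo.test t CROSS APPLY OPENJSON(t.metadata) WITH (company varchar(255) '$.Company.Name', quality varchar(255) '$.quality[0].quality', file_name varchar(255) '$.quality[0].file_name', quality_s varchar(255) '$.quality[1].quality', file_name_s varchar(255) '$.quality[1].file_name') A"
            },
            "sink": { "type": "DelimitedTextSink" }
        }
    }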

  3. Use a data flow to process this CSV file. Set the CSV file generated by the Copy activity as the source; the data preview is as follows:

Use DerivedColumn1 to generate new columns:

    FileName : case(equalsIgnoreCase(file_name,'unknown'), file_name_s, file_name)
    QualityS : case(equalsIgnoreCase(file_name,'unknown'), quality_s, quality)

The data preview is as follows:
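In data flow script form, this transformation might look like the following sketch, where source1 is an assumed name for the CSV source stream:

    source1 derive(
        FileName = case(equalsIgnoreCase(file_name, 'unknown'), file_name_s, file_name),
        QualityS = case(equalsIgnoreCase(file_name, 'unknown'), quality_s, quality)
    ) ~> DerivedColumn1

Here case() returns its second argument when the condition is true and the last argument otherwise, so rows whose first quality entry has file_name 'unknown' take their values from the second entry instead.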

Use a Select1 activity to filter the columns we want. The data preview is as follows:
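As a sketch in the same script form, the select transformation could be:

    DerivedColumn1 select(mapColumn(
        id,
        FileName,
        company,
        QualityS
    )) ~> Select1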

Then we can sink the result to a SQL table.
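If the target table doesn't exist yet, a possible schema for the sink, with the table name and column types guessed from the sample data, is:

    -- Illustrative target table; adjust name and types as needed
    CREATE TABLE dbo.target (
        id        int,
        file_name varchar(255),
        company   varchar(255),
        quality   int
    );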
