powerquery 按可变字段长度拆分列 [英] powerquery split column by variable field lengths

查看:40
本文介绍了powerquery 按可变字段长度拆分列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 PowerQuery 中,我需要导入一个固定宽度的 txt 文件(每行是多个字段的串联,每个字段都有固定的特定长度).当我导入它时,我会得到一个包含 txt 行的单列的表,例如格式如下:

In PowerQuery I need to import a fixed width txt file (each line is the concatenation of a number of fields, each field has a fixed specific length). When I import it I get a table with one single column that contains the txt lines, e.g. in the following format:

AAAABBCCCCCDDD

我想以这种方式添加更多列:

I want to add more columns in this way:

Column1: AAAA
Column2: BB
Column3: CCCCC
Column4: DDD

换句话说,组成源列的字段的长度是已知的,但所有字段的长度都不相同(在上面的示例中,长度为:4,2,5,3).

In other words the fields composing the source column are of known length, but this length is not the same for all fields (in the example above the lengths are: 4,2,5,3).

我想使用拆分列">按字符数"实用程序,但我一次只能插入一个长度,为了获得所需的输出,我必须重复该过程 3 次,每次添加一列,并为拆分列">按字符数"实用程序使用一次,尽可能向左"选项.

I'd like to use the "Split Column">"By number of character" utility but I can only insert one single length at a time, and to get the desired output I'd have to repeat the process 3 times, adding one column each time and using the "Once, as far left as possible" option for the "Split Column">"By number of character" utility.

我的实际案例有许多不同的行类型(文件)要导入和转换,每个都有超过 20 个字段,因此需要较少的手动方法;我想以某种方式指定记录结构(每个字段的长度)并自动分割行:)

My real life case has many different line types (files) to import and convert, each with more then 20 fields, so a less manual approach is needed; I'd like to somehow specify the record structure (the length of each field) and get the lines split automagically :)

可能需要一些 M 代码,我对此一无所知:有人能指出我正确的方向吗?

There would probably be the need for some M code, which I know nothing about: can anybody point me to the right direction?

谢谢!

推荐答案

使用以下公式创建查询.我们将此查询称为 SplitText:

Create a query with the formula below. Let's call this query SplitText:

let
    SplitText = (text, lengths) => 
    let
        LengthsCount = List.Count(lengths),
        // Keep track of the index in the lengths list and the position in the text to take the next characters from. Use this information to get the next segment and put it into a list.
        Split = List.Generate(() => {0, 0}, each _{0} < LengthsCount, each {_{0} + 1, _{1} + lengths{_{0}}}, each Text.Range(text, _{1}, lengths{_{0}}))
    in
        Split,
    // Convert the list to a record to 
    ListToRecord = (text, lengths) => 
    let
        List = SplitText(text, lengths),
        Record = Record.FromList(List, List.Transform({1 .. List.Count(List)}, each Number.ToText(_)))
    in
        Record
in
    ListToRecord

然后,在您的表格中,添加一个使用此公式的自定义列:

Then, in your table, add a custom column that uses this formula:

每个 SplitText([Column1], {4, 2, 5, 3})

第一个参数是要拆分的文本,第二个参数是要拆分的长度列表.

The first argument is the text to split, and the second argument is a list of lengths to split by.

最后,展开列以将拆分的文本值放入表格中.您可能需要重命名列,因为它们将被命名为 1、2 等.

Finally, expand the column to get the split text values into your table. You may want to rename the columns since they will be named 1, 2, etc.

这篇关于powerquery 按可变字段长度拆分列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆