Split string on arbitrary-length substrings (Powershell)

Problem description

I have formatted text files from other sources; I can't control those sources or ask them to generate a more sensible-for-my-purposes format like CSV. I can look at the header lines of the files to determine the column widths (and names, but they're not at issue here). Once I've done that, I'll have an array of widths. I'd like to be able to split subsequent lines in that file based on the widths I've determined from the header.
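
The question doesn't show how the widths are read from the header. As a minimal sketch, assuming the widths come from a dashed ruler line like the one in the full file further down, each column can be taken to start where a run of dashes starts, with the first column starting at offset 0 so that the leading '#' belongs to it. Get-ColumnWidth is a hypothetical helper written only for this illustration, not a built-in cmdlet:

```powershell
# Hypothetical helper: derive column widths from a dashed ruler line.
# Assumptions: the ruler contains at least one run of dashes, each column
# starts where a run of dashes starts, the first column starts at offset 0
# (so a leading '#' belongs to it), and the last column runs to the end.
function Get-ColumnWidth {
    param([string]$Ruler)

    # Offsets at which each run of one or more dashes begins
    $starts = @([regex]::Matches($Ruler, '-+') | ForEach-Object { $_.Index })
    $starts[0] = 0

    for ($i = 0; $i -lt $starts.Count; $i++) {
        if ($i -lt $starts.Count - 1) {
            $starts[$i + 1] - $starts[$i]    # distance to the next column start
        } else {
            $Ruler.Length - $starts[$i]      # last column: rest of the ruler
        }
    }
}

$ruler  = '#----------   ---- ---------  - --------------- -  --- -- --- -'
$widths = @(Get-ColumnWidth $ruler)
$widths -join ', '
```

For the ruler as reproduced here, this yields 14, 5, 11, 2, 16, 3, 4, 3, 4, 1. The first seven values agree with the $cols array in the question; Al comes out as 3 instead of 2 because here it is measured up to the start of the LRX column rather than being treated as the last column.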

Obviously, I can loop through the array of widths, and bite off the initial substring of the appropriate length, but I'm hoping there's a more efficient way - for example, if I wanted to use fixed-width columns, I could just use -split "(\w{$foo})", where $foo is the variable that contains the width of the column.
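
For comparison, the loop described above (walk through the widths and bite off a substring for each) might look like the sketch below. Split-FixedWidth is a hypothetical name used only here. It keeps a running offset instead of modifying the line, and the [Math]::Min guard is an added assumption so that a line shorter than the total width yields truncated or empty fields rather than an exception:

```powershell
# Hypothetical helper: split one line into fixed-width fields by walking a
# running offset through the widths. A column that is only partly present is
# truncated; a column that starts past the end of the line comes back as ''.
function Split-FixedWidth {
    param([string]$Line, [int[]]$Widths)

    $offset = 0
    foreach ($width in $Widths) {
        if ($offset -ge $Line.Length) {
            ''    # the line ended before this column started
        } else {
            $Line.Substring($offset, [Math]::Min($width, $Line.Length - $offset))
        }
        $offset += $width
    }
}

$cols   = @(14, 5, 11, 2, 16, 3, 4, 2)
$fields = @(Split-FixedWidth 'Junction      0122 D150441-4    Ni Po De           210 Na' $cols)
```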

Is there, in fact, a more efficient way of doing this?

Example data:

Junction      0122 D150441-4    Ni Po De           210 Na

Column widths $cols=@(14, 5, 11, 2, 16, 3, 4, 2)

(Note: I don't care about trailing spaces in the chopped-up data; I can manage those later. I'm simply looking to chop the data at this point.)

(At iRon's request, so that he can demonstrate his ConvertFrom-SourceTable, here is a full file that might need to be parsed:)

@SUB-SECTOR: sec_C   SECTOR: reft
#
# Trade routes within the subsector
#
#--------1---------2---------3---------4---------5---------6---
#PlanetName   Loc. UPP Code   B   Notes         Z  PBG Al LRX *
#----------   ---- ---------  - --------------- -  --- -- --- -
Lemente       1907 B897563-B    Ag Ni              824 Na
Zamoran       2108 B674675-A  Q Ag Ni              904 Dr
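
As a rough illustration of parsing a whole file like this one, the sketch below finds the ruler line, derives column start offsets from its dash runs, takes the column names from the line just above it, and emits one object per data line. It is not iRon's ConvertFrom-SourceTable, the layout rules it relies on are assumptions drawn only from this sample (they are listed in the comments), and Get-Column is a hypothetical helper:

```powershell
# Assumptions: the ruler is the only line made up solely of '#', dashes and
# spaces; the column names are on the line just above it; each column starts
# where a dash run starts (the first at offset 0); data lines follow the ruler.
# The file is inlined here; real code would more likely read it with Get-Content.
$text = @'
@SUB-SECTOR: sec_C   SECTOR: reft
#
# Trade routes within the subsector
#
#--------1---------2---------3---------4---------5---------6---
#PlanetName   Loc. UPP Code   B   Notes         Z  PBG Al LRX *
#----------   ---- ---------  - --------------- -  --- -- --- -
Lemente       1907 B897563-B    Ag Ni              824 Na
Zamoran       2108 B674675-A  Q Ag Ni              904 Dr
'@
$lines = $text -split '\r?\n'

# Locate the ruler and the column-name line above it
$rulerIndex = 0..($lines.Count - 1) |
    Where-Object { $lines[$_] -match '^#[- ]+$' } |
    Select-Object -First 1
$ruler  = $lines[$rulerIndex]
$header = $lines[$rulerIndex - 1]

# Column start offsets; the first column owns the leading '#'
$starts = @([regex]::Matches($ruler, '-+') | ForEach-Object { $_.Index })
$starts[0] = 0

# Text of column $Index in $Line; '' if a short line ends before the column
function Get-Column([string]$Line, [int[]]$Starts, [int]$Index) {
    $start = $Starts[$Index]
    if ($start -ge $Line.Length) { return '' }
    $end = if ($Index -lt $Starts.Count - 1) { $Starts[$Index + 1] } else { $Line.Length }
    $Line.Substring($start, [Math]::Min($end, $Line.Length) - $start)
}

$names = for ($i = 0; $i -lt $starts.Count; $i++) {
    (Get-Column $header $starts $i).Trim().TrimStart('#')
}

$records = foreach ($line in ($lines | Select-Object -Skip ($rulerIndex + 1))) {
    if ($line.Trim() -eq '' -or $line.StartsWith('#')) { continue }
    $row = [ordered]@{}
    for ($i = 0; $i -lt $starts.Count; $i++) {
        # Trailing blanks are dropped here; remove .TrimEnd() to keep them
        $row[$names[$i]] = (Get-Column $line $starts $i).TrimEnd()
    }
    [pscustomobject]$row
}

$records | Format-Table
```

Because columns are located by their start offsets rather than by splitting on spaces, a blank field such as B on the Lemente line comes out as an empty string instead of shifting the later columns, and columns beyond the end of a line (LRX and * here) are empty.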

Solution

Is there, in fact, a more efficient way of doing this?

If by "more efficient" you mean "something that takes fewer CPU cycles", then yes:

$string = 'Junction      0122 D150441-4    Ni Po De           210 Na'
$cols = @(14, 5, 11, 2, 16, 3, 4, 2)
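$substrings = @(
  # -SkipLast requires PowerShell 5.0 or later
  $cols |Select-Object -SkipLast 1 |ForEach-Object {
    # Remove($_) keeps only the first $_ characters: the current column
    $string.Remove($_)
    # drop that column from the front; note this overwrites $string itself
    $string = $string.Substring($_)
  }
  # whatever remains is the last column
  $string
)

# $substrings now contains the individual column values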

The code above grabs the first n-1 substrings by repeatedly chopping the current column off the front of the string; whatever is left at the end is the last column.


If by "more efficient" you mean "less code", you can concatenate your constructed regex patterns and grab all capture groups in one go:

$string = 'Junction      0122 D150441-4    Ni Po De           210 Na'
$cols = @(14, 5, 11, 2, 16, 3, 4, 2)

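# generate the regex pattern
# in this case '(.{14})(.{5})(.{11})(.{2})(.{16})(.{3})(.{4})(.{2})'
$pattern = $cols.ForEach({"(.{$_})"}) -join ''

# use `-match` and $Matches to grab the individual groups;
# $Matches[0] is the whole match, so the columns are groups 1 through $cols.Length
# (the match fails if the line is shorter than the sum of the widths)
$substrings = if($string -match $pattern){
  $Matches[1..$cols.Length]
}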

# $substrings again holds all our substrings
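
One limitation of the -match version is that it only succeeds when the line is at least as long as the sum of the widths, which may not hold if trailing blanks were stripped from the file. A possible variant, assuming missing trailing columns should simply be treated as empty, anchors the pattern with '^' and lets each group match zero to N characters. Because the groups are greedy, each still takes N characters while the line lasts. This sketch reads the groups through [regex]::Match and its Groups collection instead of $Matches:

```powershell
# Variant (not from the answer): '(.{0,N})' groups instead of '(.{N})', with
# a '^' anchor, so the match cannot fail on a short line; columns that lie
# past the end of the line come back as ''.
$cols    = @(14, 5, 11, 2, 16, 3, 4, 2)
$pattern = '^' + (($cols | ForEach-Object { "(.{0,$_})" }) -join '')

# A line whose last three columns are missing entirely
$line = 'Junction      0122 D150441-4    Ni Po De'
$m    = [regex]::Match($line, $pattern)

# Group 0 is the whole match, so the columns are groups 1..$cols.Count
$substrings = foreach ($i in 1..$cols.Count) { $m.Groups[$i].Value }
```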
