Why is `-lt` behaving differently for chars and strings?

Question

I recently answered an SO question about using -lt or -gt with strings. My answer was based on something I had read earlier, which said that -lt compares one char from each string at a time until the ASCII values differ. At that point the result (lower/equal/greater) decides. By that logic, "Less" -lt "less" should return True because L has a lower ASCII byte value than l, but it doesn't:

[System.Text.Encoding]::ASCII.GetBytes("Less".ToCharArray())
76
101
115
115

[System.Text.Encoding]::ASCII.GetBytes("less".ToCharArray())
108
101
115
115

"Less" -lt "less"
False

It seems that I may have been missing a crucial piece: the test is case-insensitive.

#L has a lower ASCII-value than l. PS doesn't care. They're equal
"Less" -le "less"
True

#The last s has a lower ASCII-value than t. PS cares.
"Less" -lt "lest"
True

#T has a lower ASCII-value than t. PS doesn't care
"LesT" -lt "lest"
False

#Again PS doesn't care. They're equal
"LesT" -le "lest"
True

I then tried comparing chars versus single-character strings:

[int][char]"L"
76

[int][char]"l"
108


#Using string it's case-insensitive. L = l
"L" -lt "l"
False

"L" -le "l"
True

"L" -gt "l"
False

#Using chars it's case-sensitive! L < l
([char]"L") -lt ([char]"l")
True

([char]"L") -gt ([char]"l")
False

For comparison, I tried the case-sensitive less-than operator, but it says L > l, which is the opposite of what -lt returned for chars.

"L" -clt "l"
False

"l" -clt "L"
True

How does the comparison work, given that it clearly isn't based on ASCII values, and why does it behave differently for chars vs. strings?

Recommended answer

A big thank-you to PetSerAl for all his invaluable input.

tl;dr:

  • -lt and -gt compare [char] instances numerically by Unicode codepoint.

  • Confusingly, so do -ilt, -clt, -igt, and -cgt, even though they only make sense with string operands; this is a quirk of the PowerShell language itself (see the bottom).

  • -eq (and its alias -ieq), by contrast, compares [char] instances case-insensitively, which is typically, but not necessarily, like a case-insensitive string comparison (-ceq again compares strictly numerically).

    • -eq/-ieq ultimately also compares numerically, but first converts the operands to their uppercase equivalents using the invariant culture; as a result, this comparison is not fully equivalent to PowerShell's string comparison, which additionally recognizes so-called compatible sequences (distinct characters or even sequences considered to have the same meaning; see Unicode equivalence) as equal.
    • In other words: PowerShell special-cases the behavior of only -eq / -ieq with [char] operands, and does so in a manner that is almost, but not quite, the same as case-insensitive string comparison.

This distinction leads to counter-intuitive behavior such as [char] 'A' -eq [char] 'a' and [char] 'A' -lt [char] 'a' both returning $true.

To be safe:

  • always cast to [int] if you want numeric (Unicode codepoint) comparison.
  • always cast to [string] if you want string comparison.
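
As a minimal sketch of the two casts recommended above (the variable names are illustrative only, and the expected results in the comments follow from the behavior described in this answer):

```powershell
$upper = [char] 'A'
$lower = [char] 'a'

# Numeric (Unicode codepoint) comparison: 65 vs. 97
$numericLess = [int] $upper -lt [int] $lower          # $true

# String comparison: case-insensitive by default, so "A" and "a" are equal
$stringLess  = [string] $upper -lt [string] $lower    # $false
$stringEqual = [string] $upper -eq [string] $lower    # $true

# Case-sensitive string comparison tells the two apart
$stringCEqual = [string] $upper -ceq [string] $lower  # $false
```

Casting both operands makes the intent explicit and avoids relying on the special-casing of [char] operands.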

For background information, read on.

PowerShell's usually helpful operator overloading can be tricky at times.

Note that in a numeric context (whether implicit or explicit), PowerShell treats characters ([char] ([System.Char]) instances) numerically, by their Unicode codepoint (not ASCII).

[char] 'A' -eq 65  # $true, in the 'Basic Latin' Unicode range, which coincides with ASCII
[char] 'Ā' -eq 256 # $true; 0x100, in the 'Latin Extended-A' Unicode range

What makes [char] unusual is that its instances are compared to each other numerically as-is, by Unicode codepoint, EXCEPT with -eq/-ieq.

  • -ceq, -lt, and -gt compare directly by Unicode codepoints, and (counter-intuitively) so do -ilt, -clt, -igt and -cgt:
[char] 'A' -lt [char] 'a'  # $true; Unicode codepoint 65 ('A') is less than 97 ('a')

  • -eq (and its alias -ieq) first transforms the characters to uppercase, then compares the resulting Unicode codepoints:
[char] 'A' -eq [char] 'a' # !! ALSO $true; equivalent of 65 -eq 65
      

      It's worth reflecting on this Buddhist "this and that" turn: in the world of PowerShell, the character 'A' is both less than and equal to 'a', depending on how you compare.

      Also, directly or indirectly - after transformation to uppercase - comparing Unicode codepoints is NOT the same as comparing them as strings, because PowerShell's string comparison additionally recognizes so-called compatible sequences, where characters (or even character sequences) are considered "the same" if they have the same meaning (see Unicode equivalence); e.g.:

      # Distinct Unicode characters U+2126 (Ohm Sign) and U+03A9 (Greek Capital Letter Omega)
      # ARE recognized as the "same thing" in a *string* comparison:
      "Ω" -ceq "Ω"  # $true, despite having distinct Unicode codepoints
      
      # -eq/-ieq: with [char], by only applying transformation to uppercase, the results
      # are still different codepoints, which - compared numerically - are NOT equal:
      [char] 'Ω' -eq [char] 'Ω' # $false: uppercased codepoints differ
      
      # -ceq always applies direct codepoint comparison.
      [char] 'Ω' -ceq [char] 'Ω' # $false: codepoints differ
      

      Note that use of prefixes i or c to explicitly specify case-matching behavior is NOT sufficient to force string comparison, even though conceptually operators such as -ceq, -ieq, -clt, -ilt, -cgt, -igt only make sense with strings.

      Effectively, the i and c prefixes are simply ignored when applied to -lt and -gt while comparing [char] operands; as it turns out (unlike what I originally thought), this is a general PowerShell pitfall - see below for an explanation.

      As an aside: -lt and -gt logic in string comparison is not numeric, but based on collation order (a human-centric way of ordering independent of codepoints / byte values), which in .NET terms is controlled by cultures (either by default by the one currently in effect, or by passing a culture parameter to methods).
      As @PetSerAl demonstrates in a comment (and unlike what I originally claimed), PS string comparisons use the invariant culture, not the current culture, so their behavior is the same, irrespective of what culture is the current one.
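
The difference between collation order and codepoint order can be sketched with the .NET String.Compare and String.CompareOrdinal methods. This assumes, as stated above, that PowerShell's string operators behave like a culture-sensitive comparison using the invariant culture; it is an illustration of that claim, not a description of PowerShell's internals.

```powershell
$invariant = [cultureinfo]::InvariantCulture

# Culture-sensitive, ignoring case: 'L' and 'l' compare as equal (result 0),
# which mirrors "L" -le "l" being $true and "L" -lt "l" being $false.
$ignoreCase = [string]::Compare('L', 'l', $true, $invariant)

# Culture-sensitive, case-sensitive: lowercase sorts before uppercase in the
# collation order, so the result is positive, mirroring "L" -clt "l" being $false.
$caseSensitive = [string]::Compare('L', 'l', $false, $invariant)

# Ordinal comparison goes purely by code unit value, 76 ('L') vs. 108 ('l'),
# so the result is negative: the ordering that [char] 'L' -lt [char] 'l' reflects.
$ordinal = [string]::CompareOrdinal('L', 'l')
```

The opposite signs of $caseSensitive and $ordinal are what the question ran into: -clt on strings and -lt on chars order 'L' and 'l' in opposite directions.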

      Behind the scenes:

      As @PetSerAl explains in the comments, PowerShell's parsing doesn't distinguish between the base form of an operator and its i-prefixed form; e.g., both -lt and -ilt are translated to the same value, Ilt.
      Thus, PowerShell cannot implement differing behavior for -lt vs. -ilt, -gt vs. -igt, ..., because it treats them the same at the syntax level.
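
One way to look at this yourself is to inspect the parsed AST of a script block. The sketch below uses a hypothetical helper, Get-OperatorToken (not part of PowerShell), which returns the Operator property of the first binary expression it finds; per the explanation above, -lt and -ilt should report the same token kind, while -clt reports a different one.

```powershell
# Hypothetical helper for illustration: returns the token kind of the first
# binary operator found in the given script block.
function Get-OperatorToken {
    param([scriptblock] $ScriptBlock)

    # Predicate that matches binary expressions such as `1 -lt 2`
    $isBinary = {
        param($node)
        $node -is [System.Management.Automation.Language.BinaryExpressionAst]
    }

    # Search the script block's AST without descending into nested script blocks
    $binary = $ScriptBlock.Ast.Find($isBinary, $false)
    $binary.Operator
}

$ltToken  = Get-OperatorToken { 1 -lt 2 }
$iltToken = Get-OperatorToken { 1 -ilt 2 }
$cltToken = Get-OperatorToken { 1 -clt 2 }

# Base form and i-prefixed form are indistinguishable after parsing
$sameToken = $ltToken -eq $iltToken
```

Because the distinction is already gone at this point, nothing later in the pipeline can treat -lt and -ilt differently.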

      This leads to somewhat counter-intuitive behavior in that operator prefixes are effectively ignored when comparing data types where case-sensitivity has no meaning - as opposed to getting coerced to strings, as one might expect; e.g.:

      "10" -cgt "2"  # $false, because "2" comes after "1" in the collation order
      
      10 -cgt 2  # !! $true; *numeric* comparison still happens; the `c` is ignored.
      

      In the latter case I would have expected the use of -cgt to coerce the operands to strings, given that case-sensitive comparison is only a meaningful concept in string comparison, but that is NOT how it works.

      If you want to dig deeper into how PowerShell operates, see @PetSerAl's comments below.
