Why does findstr not handle case properly (in some circumstances)?


Question

While writing some scripts in cmd.exe recently, I needed to use findstr with regular expressions. The customer required standard cmd.exe commands only (no GnuWin32, Cygwin, VBS, or PowerShell).

I just wanted to know whether a variable contained any upper-case characters, so I tried:

> set myvar=abc
> echo %myvar%|findstr /r "[A-Z]"
abc
> echo %errorlevel%
0

With %myvar% set to abc, that actually outputs the string and sets errorlevel to 0, indicating that a match was found.

However, the full-list variant:

> echo %myvar%|findstr /r "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]"
> echo %errorlevel%
1

does not output the line, and it correctly sets errorlevel to 1.

In addition:

> echo %myvar%|findstr /r "^[A-Z]*$"
> echo %errorlevel%
1

also works as expected.

I'm obviously missing something here, even if it's only the fact that findstr is somehow broken.

Why does the first (range) regex not work in this case?

Yet more weirdness:

> echo %myvar%|findstr /r "[A-Z]"
abc
> echo %myvar%|findstr /r "[A-Z][A-Z]"
abc
> echo %myvar%|findstr /r "[A-Z][A-Z][A-Z]"
> echo %myvar%|findstr /r "[A]"

The last two above also do not output the string!!

Recommended answer

I believe this is mostly a horrible design flaw.

We all expect the ranges to collate based on the ASCII code value. But they don't. Instead, the ranges are based on a collation sequence that nearly matches the default sequence used by SORT. EDIT - The exact collation sequence used by FINDSTR is now available at the bottom of this answer: http://stackoverflow.com/a/8844873/1012053.

I prepared a text file containing one line for each extended ASCII character from 1 to 255, excluding 10 (LF), 13 (CR), and 26 (EOF on Windows). Each line holds the character, followed by a space, followed by the decimal code for the character. I then ran the file through SORT and captured the output in a sortedChars.txt file.
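The answer does not say how the input file was produced. As a minimal sketch, assuming any tool that can write raw bytes is acceptable for building test data, the file described above could be generated like this. The names `build_lines`, `write_chars_file`, and `chars.txt` are hypothetical, and the three-digit zero padding is inferred from listings such as `0 048`.

```python
from pathlib import Path

# Codes left out of the experiment: LF, CR, and Ctrl-Z (EOF on Windows).
EXCLUDED = {10, 13, 26}


def build_lines():
    """Return one bytes line per code 1..255: raw byte, space, 3-digit decimal code."""
    lines = []
    for code in range(1, 256):
        if code in EXCLUDED:
            continue
        lines.append(bytes([code]) + b" " + f"{code:03d}".encode("ascii"))
    return lines


def write_chars_file(path="chars.txt"):
    """Write the lines with CRLF endings, in binary mode so no byte is re-encoded."""
    Path(path).write_bytes(b"\r\n".join(build_lines()) + b"\r\n")
```

After calling `write_chars_file()`, running `sort chars.txt > sortedChars.txt` at the cmd.exe prompt should give a file comparable to the one used here, although the exact order may depend on the system locale and code page.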

I can now easily test any regex range against this sorted file and demonstrate that the range is determined by a collation sequence that is nearly the same as the one SORT uses.

>findstr /nrc:"^[0-9]" sortedChars.txt
137:0 048
138:½ 171
139:¼ 172
140:1 049
141:2 050
142:² 253
143:3 051
144:4 052
145:5 053
146:6 054
147:7 055
148:8 056
149:9 057

The results are not quite what we expected, in that chars 171, 172, and 253 are thrown into the mix. But the results make perfect sense. The line number prefix corresponds to the SORT collation sequence, and you can see that the range matches exactly according to the SORT sequence.
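To make the difference concrete, here is a small sketch that contrasts membership by code value with membership by position in a collation list. It is a model of the observed behaviour, not FINDSTR's actual implementation. The list `COLLATION_EXCERPT` is copied from lines 137 to 149 of the listing above, and the function names are hypothetical.

```python
# Excerpt of the SORT order around the digits (lines 137-149 of sortedChars.txt).
COLLATION_EXCERPT = ["0", "½", "¼", "1", "2", "²", "3", "4", "5", "6", "7", "8", "9"]


def in_range_by_collation(ch, lo, hi, order=COLLATION_EXCERPT):
    """True if ch sits between lo and hi by position in `order`."""
    if ch not in order:
        return False
    return order.index(lo) <= order.index(ch) <= order.index(hi)


def in_range_by_code(ch, lo, hi):
    """True if ch sits between lo and hi by code value, as most people expect."""
    return ord(lo) <= ord(ch) <= ord(hi)


if __name__ == "__main__":
    for ch in COLLATION_EXCERPT:
        print(ch, in_range_by_collation(ch, "0", "9"), in_range_by_code(ch, "0", "9"))
```

Under this model every character in the excerpt falls inside `[0-9]`, including the two fractions and the superscript, whereas a comparison by code value accepts only the ten digits.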

Here is another range test that exactly follows the SORT sequence:

>findstr /nrc:"^[!-=]" sortedChars.txt
34:! 033
35:" 034
36:# 035
37:$ 036
38:% 037
39:& 038
40:( 040
41:) 041
42:* 042
43:, 044
44:. 046
45:/ 047
46:: 058
47:; 059
48:? 063
49:@ 064
50:[ 091
51:\ 092
52:] 093
53:^ 094
54:_ 095
55:` 096
56:{ 123
57:| 124
58:} 125
59:~ 126
60:¡ 173
61:¿ 168
62:¢ 155
63:£ 156
64:¥ 157
65:₧ 158
66:+ 043
67:∙ 249
68:< 060
69:= 061

There is one small anomaly with alpha characters. Character "a" sorts between "A" and "Z", yet it does not match [A-Z]. "z" sorts after "Z", yet it matches [A-Z]. There is a corresponding problem with [a-z]. "A" sorts before "a", yet it matches [a-z]. "Z" sorts between "a" and "z", yet it does not match [a-z].

Here are the [A-Z] results:

>findstr /nrc:"^[A-Z]" sortedChars.txt
151:A 065
153:â 131
154:ä 132
155:à 133
156:å 134
157:Ä 142
158:Å 143
159:á 160
160:ª 166
161:æ 145
162:Æ 146
163:B 066
164:b 098
165:C 067
166:c 099
167:Ç 128
168:ç 135
169:D 068
170:d 100
171:E 069
172:e 101
173:é 130
174:ê 136
175:ë 137
176:è 138
177:É 144
178:F 070
179:f 102
180:ƒ 159
181:G 071
182:g 103
183:H 072
184:h 104
185:I 073
186:i 105
187:ï 139
188:î 140
189:ì 141
190:í 161
191:J 074
192:j 106
193:K 075
194:k 107
195:L 076
196:l 108
197:M 077
198:m 109
199:N 078
200:n 110
201:ñ 164
202:Ñ 165
203:ⁿ 252
204:O 079
205:o 111
206:ô 147
207:ö 148
208:ò 149
209:Ö 153
210:ó 162
211:º 167
212:P 080
213:p 112
214:Q 081
215:q 113
216:R 082
217:r 114
218:S 083
219:s 115
220:ß 225
221:T 084
222:t 116
223:U 085
224:u 117
225:û 150
226:ù 151
227:ú 163
228:ü 129
229:Ü 154
230:V 086
231:v 118
232:W 087
233:w 119
234:X 088
235:x 120
236:Y 089
237:y 121
238:ÿ 152
239:Z 090
240:z 122

And the [a-z] results:

>findstr /nrc:"^[a-z]" sortedChars.txt
151:A 065
152:a 097
153:â 131
154:ä 132
155:à 133
156:å 134
157:Ä 142
158:Å 143
159:á 160
160:ª 166
161:æ 145
162:Æ 146
163:B 066
164:b 098
165:C 067
166:c 099
167:Ç 128
168:ç 135
169:D 068
170:d 100
171:E 069
172:e 101
173:é 130
174:ê 136
175:ë 137
176:è 138
177:É 144
178:F 070
179:f 102
180:ƒ 159
181:G 071
182:g 103
183:H 072
184:h 104
185:I 073
186:i 105
187:ï 139
188:î 140
189:ì 141
190:í 161
191:J 074
192:j 106
193:K 075
194:k 107
195:L 076
196:l 108
197:M 077
198:m 109
199:N 078
200:n 110
201:ñ 164
202:Ñ 165
203:ⁿ 252
204:O 079
205:o 111
206:ô 147
207:ö 148
208:ò 149
209:Ö 153
210:ó 162
211:º 167
212:P 080
213:p 112
214:Q 081
215:q 113
216:R 082
217:r 114
218:S 083
219:s 115
220:ß 225
221:T 084
222:t 116
223:U 085
224:u 117
225:û 150
226:ù 151
227:ú 163
228:ü 129
229:Ü 154
230:V 086
231:v 118
232:W 087
233:w 119
234:X 088
235:x 120
236:Y 089
237:y 121
238:ÿ 152
240:z 122

SORT sorts upper case before lower case. (EDIT - I just read the help for SORT and learned that it does not differentiate between upper and lower case. The fact that my SORT output consistently put upper before lower is probably a result of the order of the input.) But regex apparently sorts lower case before upper case. All of the following ranges fail to match any characters.

>findstr /nrc:"^[A-a]" sortedChars.txt

>findstr /nrc:"^[B-b]" sortedChars.txt

>findstr /nrc:"^[C-c]" sortedChars.txt

>findstr /nrc:"^[D-d]" sortedChars.txt

Reversing the order finds the characters.

>findstr /nrc:"^[a-A]" sortedChars.txt
151:A 065
152:a 097

>findstr /nrc:"^[b-B]" sortedChars.txt
163:B 066
164:b 098

>findstr /nrc:"^[c-C]" sortedChars.txt
165:C 067
166:c 099

>findstr /nrc:"^[d-D]" sortedChars.txt
169:D 068
170:d 100
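The outputs above are consistent with a simple model in which each lower-case letter sorts immediately before its upper-case partner (a, A, b, B, ..., z, Z). The sketch below encodes that assumption for unaccented letters only. It is inferred from the listings, not taken from any FINDSTR documentation, and the names `ORDER`, `in_range`, and `matches` are hypothetical.

```python
import string

# Assumed order for unaccented letters: a, A, b, B, ..., z, Z.
ORDER = [c for pair in zip(string.ascii_lowercase, string.ascii_uppercase) for c in pair]
POS = {ch: i for i, ch in enumerate(ORDER)}


def in_range(ch, lo, hi):
    """True if ch lies between lo and hi by position in the assumed order."""
    return ch in POS and POS[lo] <= POS[ch] <= POS[hi]


def matches(text, ranges):
    """True if some substring of text matches the consecutive (lo, hi) ranges."""
    n = len(ranges)
    return any(
        all(in_range(text[i + k], lo, hi) for k, (lo, hi) in enumerate(ranges))
        for i in range(len(text) - n + 1)
    )


if __name__ == "__main__":
    az = ("A", "Z")
    print(matches("abc", [az]))          # one [A-Z]
    print(matches("abc", [az, az]))      # two in a row
    print(matches("abc", [az, az, az]))  # three in a row
    print(matches("abc", [("A", "A")]))  # [A]
```

Under this assumption, `[A-Z]` covers every letter except "a", and `[a-z]` covers every letter except "Z". That also accounts for the behaviour in the question: in "abc", the letters "b" and "c" fall inside `[A-Z]`, so one or two consecutive ranges match, while three consecutive ranges or `[A]` alone do not.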

There are additional characters that regex sorts differently than SORT, but I don't have a precise list.
