What are the undocumented features and limitations of the Windows FINDSTR command?

Problem description

The Windows FINDSTR command is horribly documented. There is very basic command line help available through FINDSTR /?, or HELP FINDSTR, but it is woefully inadequate. There is a wee bit more documentation online at https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/findstr.

There are many FINDSTR features and limitations that are not even hinted at in the documentation. Nor could they be anticipated without prior knowledge and/or careful experimentation.

So the question is - What are the undocumented FINDSTR features and limitations?

The purpose of this question is to provide a one stop repository of the many undocumented features so that:

A) Developers can take full advantage of the features that are there.

B) Developers don't waste their time wondering why something doesn't work when it seems like it should.

Please make sure you know the existing documentation before responding. If the information is covered by the HELP, then it does not belong here.

Neither is this a place to show interesting uses of FINDSTR. If a logical person could anticipate the behavior of a particular usage of FINDSTR based on the documentation, then it does not belong here.

Along the same lines, if a logical person could anticipate the behavior of a particular usage based on information contained in any existing answers, then again, it does not belong here.

Solution

Preface
Much of the information in this answer has been gathered based on experiments run on a Vista machine. Unless explicitly stated otherwise, I have not confirmed whether the information applies to other Windows versions.

FINDSTR output
The documentation never bothers to explain the output of FINDSTR. It alludes to the fact that matching lines are printed, but nothing more.

The format of matching line output is as follows:

fileName:lineNumber:lineOffset:text

where

fileName: = The name of the file containing the matching line. The file name is not printed if the request was explicitly for a single file, or if searching piped input or redirected input. When printed, the fileName will always include any path information provided. Additional path information will be added if the /S option is used. The printed path is always relative to the provided path, or relative to the current directory if none provided.

Note - The filename prefix can be avoided when searching multiple files by using the non-standard (and poorly documented) wildcards < and >. The exact rules for how these wildcards work can be found here. Finally, you can look at this example of how the non-standard wildcards work with FINDSTR.

lineNumber: = The line number of the matching line represented as a decimal value with 1 representing the 1st line of the input. Only printed if /N option is specified.

lineOffset: = The decimal byte offset of the start of the matching line, with 0 representing the 1st character of the 1st line. Only printed if /O option is specified. This is not the offset of the match within the line. It is the number of bytes from the beginning of the file to the beginning of the line.

text = The binary representation of the matching line, including any <CR> and/or <LF>. Nothing is left out of the binary output, such that this example that matches all lines will produce an exact binary copy of the original file.

FINDSTR "^" FILE >FILE_COPY
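When FINDSTR output is consumed by a script, the prefixes described above have to be peeled off in order. The following is a minimal Python sketch, not part of FINDSTR itself: the function name and the three flags are hypothetical, and the caller must know which prefixes were requested. It also assumes that a leading drive letter followed by a slash (such as C:\) belongs to the file name.

```python
def parse_findstr_line(line, has_name=False, has_number=False, has_offset=False):
    """Split one FINDSTR output line into its optional prefixes and the text.

    has_name   - several files were searched, so fileName: is printed
    has_number - /N was used, so lineNumber: is printed
    has_offset - /O was used, so lineOffset: is printed
    """
    result = {"fileName": None, "lineNumber": None, "lineOffset": None}
    rest = line
    if has_name:
        # Assumption: "X:\" or "X:/" at the start is a drive prefix, so the
        # colon that ends the file name is searched for after it.
        start = 2 if rest[:1].isalpha() and rest[1:3] in (":\\", ":/") else 0
        cut = rest.index(":", start)
        result["fileName"], rest = rest[:cut], rest[cut + 1:]
    if has_number:
        number, rest = rest.split(":", 1)
        result["lineNumber"] = int(number)
    if has_offset:
        offset, rest = rest.split(":", 1)
        result["lineOffset"] = int(offset)
    # Everything left over is the matching line; it may itself contain colons.
    result["text"] = rest
    return result
```

Because the text is split off last, colons inside the matching line are preserved. A file name that itself contains a colon-like ambiguity cannot be resolved by this sketch.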

The /A option sets the color of the fileName:, lineNumber:, and lineOffset: output only. The text of the matching line is always output with the current console color. The /A option only has effect when output is displayed directly to the console. The /A option has no effect if the output is redirected to a file or piped. See the 2018-08-18 edit in Aacini's answer for a description of the buggy behavior when output is redirected to CON.

Most control characters and many extended ASCII characters display as dots on XP
FINDSTR on XP displays most non-printable control characters from matching lines as dots (periods) on the screen. The following control characters are exceptions; they display as themselves: 0x09 Tab, 0x0A LineFeed, 0x0B Vertical Tab, 0x0C Form Feed, 0x0D Carriage Return.

XP FINDSTR also converts a number of extended ASCII characters to dots. The extended ASCII characters that display as dots on XP are the same as those that are transformed when supplied on the command line. See the "Character limits for command line parameters - Extended ASCII transformation" section, later in this post.

Control characters and extended ASCII are not converted to dots on XP if the output is piped, redirected to a file, or within a FOR IN() clause.

Vista and Windows 7 always display all characters as themselves, never as dots.

Return Codes (ERRORLEVEL)

  • 0 (success)
    • Match was found in at least one line of at least one file.
  • 1 (failure)
    • No match was found in any line of any file.
    • Invalid color specified by /A:xx option
  • 2 (error)
    • Incompatible options /L and /R both specified
    • Missing argument after /A:, /F:, /C:, /D:, or /G:
    • File specified by /F:file or /G:file not found
  • 255 (error)

Source of data to search (Updated based on tests with Windows 7)
Findstr can search data from only one of the following sources:

  • filenames specified as arguments and/or using the /F:file option.

  • stdin via redirection: findstr "searchString" <file

  • data stream from a pipe: type file | findstr "searchString"

Arguments/options take precedence over redirection, which takes precedence over piped data.

File name arguments and /F:file may be combined. Multiple file name arguments may be used. If multiple /F:file options are specified, then only the last one is used. Wild cards are allowed in filename arguments, but not within the file pointed to by /F:file.

Source of search strings (Updated based on tests with Windows 7)
The /G:file and /C:string options may be combined. Multiple /C:string options may be specified. If multiple /G:file options are specified, then only the last one is used. If either /G:file or /C:string is used, then all non-option arguments are assumed to be files to search. If neither /G:file nor /C:string is used, then the first non-option argument is treated as a space delimited list of search terms.

File names must not be quoted within the file when using the /F:FILE option.
File names may contain spaces and other special characters. Most commands require that such file names are quoted. But the FINDSTR /F:files.txt option requires that filenames within files.txt must NOT be quoted. The file will not be found if the name is quoted.
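If the list file for /F: is generated by a script, it is easy to quote names out of habit. A minimal Python sketch, assuming a hypothetical helper name, that produces the file body with one unquoted name per line:

```python
def format_file_list(paths):
    """Return the bytes of a /F: list file: one unquoted name per line."""
    lines = []
    for path in paths:
        # Remove surrounding whitespace and any enclosing double quotes,
        # because FINDSTR will not find a quoted name inside the list file.
        lines.append(path.strip().strip('"'))
    # Assumption: Windows-style <CR><LF> line endings, with a final terminator.
    return "".join(name + "\r\n" for name in lines).encode("ascii")
```

The result can be written in binary mode to a file such as files.txt and passed as /F:files.txt. The ASCII encoding is a simplification chosen for this sketch; names with other characters would need the console's code page instead.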

BUG - Short 8.3 filenames can break the /D and /S options
As with all Windows commands, FINDSTR will attempt to match both the long name and the short 8.3 name when looking for files to search. Assume the current folder contains the following non-empty files:

b1.txt
b.txt2
c.txt

The following command will successfully find all 3 files:

findstr /m "^" *.txt

b.txt2 matches because the corresponding short name B9F64~1.TXT matches. This is consistent with the behavior of all other Windows commands.

But a bug with the /D and /S options causes the following commands to only find b1.txt

findstr /m /d:. "^" *.txt
findstr /m /s "^" *.txt

The bug prevents b.txt2 from being found, as well as all file names that sort after b.txt2 within the same directory. Additional files that sort before, like a.txt, are found. Additional files that sort later, like d.txt, are missed once the bug has been triggered.

Each directory searched is treated independently. For example, the /S option would successfully begin searching in a child folder after failing to find files in the parent, but once the bug causes a short file name to be missed in the child, then all subsequent files in that child folder would also be missed.

The commands work bug free if the same file names are created on a machine that has NTFS 8.3 name generation disabled. Of course b.txt2 would not be found, but c.txt would be found properly.

Not all short names trigger the bug. All instances of bugged behavior I have seen involve an extension that is longer than 3 characters with a short 8.3 name that begins the same as a normal name that does not require an 8.3 name.

The bug has been confirmed on XP, Vista, and Windows 7.

Non-Printable characters and the /P option
The /P option causes FINDSTR to skip any file that contains any of the following decimal byte codes:
0-7, 14-25, 27-31.

Put another way, the /P option will only skip files that contain non-printable control characters. Control characters are codes less than or equal to 31 (0x1F). FINDSTR treats the following control characters as printable:

 8  0x08  backspace
 9  0x09  horizontal tab
10  0x0A  line feed
11  0x0B  vertical tab
12  0x0C  form feed
13  0x0D  carriage return
26  0x1A  substitute (end of text)

All other control characters are treated as non-printable, the presence of which causes the /P option to skip the file.
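The byte ranges above can be checked ahead of time to predict whether /P would skip a file. This is a small Python sketch of that rule only; the names are hypothetical and it does not call FINDSTR.

```python
# Decimal byte codes whose presence makes /P skip a file: 0-7, 14-25, 27-31.
NON_PRINTABLE = frozenset(
    list(range(0, 8)) + list(range(14, 26)) + list(range(27, 32))
)


def skipped_by_p_option(data: bytes) -> bool:
    """Return True if the content holds a byte that /P treats as non-printable."""
    return any(byte in NON_PRINTABLE for byte in data)
```

Backspace, tab, line feed, vertical tab, form feed, carriage return and 0x1A are absent from the set, so content using only those control characters is not skipped.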

Piped and Redirected input may have <CR><LF> appended
If the input is piped in and the last character of the stream is not <LF>, then FINDSTR will automatically append <CR><LF> to the input. This has been confirmed on XP, Vista and Windows 7. (I used to think that the Windows pipe was responsible for modifying the input, but I have since discovered that FINDSTR is actually doing the modification.)

The same is true for redirected input on Vista. If the last character of a file used as redirected input is not <LF>, then FINDSTR will automatically append <CR><LF> to the input. However, XP and Windows 7 do not alter redirected input.

FINDSTR hangs on XP and Windows 7 if redirected input does not end with <LF>
This is a nasty "feature" on XP and Windows 7. If the last character of a file used as redirected input does not end with <LF>, then FINDSTR will hang indefinitely once it reaches the end of the redirected file.
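One way to avoid this trigger is to make sure the data ends with <LF> before it is used as redirected input. A minimal Python sketch with a hypothetical helper name; it appends <CR><LF>, mirroring what FINDSTR itself is described as appending to piped input.

```python
def with_trailing_lf(data: bytes) -> bytes:
    """Return data unchanged if it ends with <LF>, else with <CR><LF> appended."""
    # Assumption: empty input is left alone, since the behaviour for an
    # empty redirected file is not described in this post.
    if data and not data.endswith(b"\n"):
        return data + b"\r\n"
    return data
```

Writing the result to a temporary file and redirecting that file keeps the original untouched.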

Last line of Piped data may be ignored if it consists of a single character
If the input is piped in and the last line consists of a single character that is not followed by <LF>, then FINDSTR completely ignores the last line.

Example - The first command with a single character and no <LF> fails to match, but the second command with 2 characters works fine, as does the third command that has one character with terminating newline.

> set /p "=x" <nul | findstr "^"

> set /p "=xx" <nul | findstr "^"
xx

> echo x| findstr "^"
x

Reported by DosTips user Sponge Belly in the thread "new findstr bug". Confirmed on XP, Windows 7 and Windows 8. I have not heard whether Vista is affected (I no longer have Vista to test).

Option syntax
Option letters are not case sensitive, so /i and /I are equivalent.

Options can be prefixed with either / or -. Options may be concatenated after a single / or -. However, the concatenated option list may contain at most one multi-character option such as OFF or F:, and the multi-character option must be the last option in the list.

The following are all equivalent ways of expressing a case insensitive regex search for any line that contains both "hello" and "goodbye" in any order

  • /i /r /c:"hello.*goodbye" /c:"goodbye.*hello"

  • -i -r -c:"hello.*goodbye" /c:"goodbye.*hello"

  • /irc:"hello.*goodbye" /c:"goodbye.*hello"

Options may also be quoted. So /i, -i, "/i" and "-i" are all equivalent. Likewise, /c:string, "/c":string, "/c:"string and "/c:string" are all equivalent.

If a search string begins with a / or - literal, then the /C or /G option must be used. Thanks to Stephan for reporting this in a comment (since deleted).

Search String length limits
On Vista the maximum allowed length for a single search string is 511 bytes. If any search string exceeds 511 then the result is a FINDSTR: Search string too long. error with ERRORLEVEL 2.

When doing a regular expression search, the maximum search string length is 254. A regular expression with length between 255 and 511 will result in a FINDSTR: Out of memory error with ERRORLEVEL 2. A regular expression length >511 results in the FINDSTR: Search string too long. error.

On Windows XP the search string length limit is apparently shorter; see Findstr error: "Search string too long": How to extract and match substring in "for" loop? The XP limit is 127 bytes for both literal and regex searches.
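The limits in this section can be collected into a small pre-flight check. This Python sketch only encodes the numbers stated above; the function name and the "vista"/"xp" labels are hypothetical, and it assumes that XP reports the same "Search string too long" message when its 127 byte limit is exceeded.

```python
def predict_search_string_result(search: bytes, regex: bool, windows: str = "vista") -> str:
    """Predict the outcome for one search string of the given byte length."""
    length = len(search)
    if windows == "xp":
        # XP: 127 bytes for both literal and regex searches.
        return "ok" if length <= 127 else "Search string too long"
    if windows != "vista":
        raise ValueError("only the limits for 'vista' and 'xp' are described")
    if length > 511:
        return "Search string too long"
    if regex and length > 254:
        # Regex strings of 255 to 511 bytes fail differently on Vista.
        return "Out of memory"
    return "ok"
```

Anything other than "ok" corresponds to ERRORLEVEL 2 in the descriptions above.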

Line Length limits
Files specified as a command line argument or via the /F:FILE option have no known line length limit. Searches were successfully run against a 128MB file that did not contain a single <LF>.

Piped data and redirected input are limited to 8191 bytes per line. This limit is a "feature" of FINDSTR. It is not inherent to pipes or redirection. FINDSTR using redirected stdin or piped input will never match any line that is >=8k bytes. Lines >= 8k generate an error message to stderr, but ERRORLEVEL is still 0 if the search string is found in at least one line of at least one file.
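Before piping or redirecting data into FINDSTR, the over-long lines can be located in advance. A Python sketch under one stated assumption: the length is measured without the line terminator, which this post does not specify. The names are hypothetical.

```python
STDIN_LINE_LIMIT = 8191  # bytes per line for piped or redirected input


def lines_too_long_for_stdin(data: bytes):
    """Return the 1-based numbers of lines that exceed the stdin limit."""
    too_long = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        # Assumption: a trailing <CR> is not counted as part of the line.
        if len(line.rstrip(b"\r")) > STDIN_LINE_LIMIT:
            too_long.append(number)
    return too_long
```

If the list is non-empty, passing the file name as an argument instead of using stdin avoids the limit.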

Default type of search: Literal vs Regular Expression
/C:"string" - The default is /L literal. Explicitly combining the /L option with /C:"string" certainly works but is redundant.

"string argument" - The default depends on the content of the very first search string. (Remember that <space> is used to delimit search strings.) If the first search string is a valid regular expression that contains at least one un-escaped meta-character, then all search strings are treated as regular expressions. Otherwise all search strings are treated as literals. For example, "51.4 200" will be treated as two regular expressions because the first string contains an un-escaped dot, whereas "200 51.4" will be treated as two literals because the first string does not contain any meta-characters.

/G:file - The default depends on the content of the first non-empty line in the file. If the first search string is a valid regular expression that contains at least one un-escaped meta-character, then all search strings are treated as regular expressions. Otherwise all search strings are treated as literals.

Recommendation - Always explicitly specify /L literal option or /R regular expression option when using "string argument" or /G:file.
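The "string argument" rule can be approximated in a few lines, which makes the "51.4 200" versus "200 51.4" difference easy to see. This Python sketch is a rough heuristic and not FINDSTR's real logic: the meta-character set is an assumption, the function name is hypothetical, and it does not check that the first string is a valid regular expression.

```python
# Assumption: these are the characters counted as regex meta-characters.
META_CHARACTERS = set(".*^$[")


def default_search_type(argument: str) -> str:
    """Guess 'regex' or 'literal' from the first space-delimited search string."""
    first = next((token for token in argument.split(" ") if token), "")
    index = 0
    while index < len(first):
        if first[index] == "\\":
            index += 2  # a backslash escapes the following character
            continue
        if first[index] in META_CHARACTERS:
            return "regex"
        index += 1
    return "literal"
```

Only the first search string is inspected, which is exactly why reordering the strings changes how all of them are interpreted.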

BUG - Specifying multiple literal search strings can give unreliable results

The following simple FINDSTR example fails to find a match, even though it should.

echo ffffaaa|findstr /l "ffffaaa faffaffddd"

This bug has been confirmed on Windows Server 2003, Windows XP, Vista, and Windows 7.

Based on experiments, FINDSTR may fail if all of the following conditions are met:

  • The search is using multiple literal search strings
  • The search strings are of different lengths
  • A short search string has some amount of overlap with a longer search string
  • The search is case sensitive (no /I option)

In every failure I have seen, it is always one of the shorter search strings that fails.

For more info see "Why doesn't this FINDSTR example with multiple literal search strings find a match?"

Quotes and backslashes within command line arguments
Note - User MC ND's comments reflect the actual horrifically complicated rules for this section. There are 3 distinct parsing phases involved:

  • First cmd.exe may require some quotes to be escaped as ^" (really nothing to do with FINDSTR)
  • Next FINDSTR uses the pre 2008 MS C/C++ argument parser, which has special rules for " and \
  • After the argument parser finishes, FINDSTR additionally treats \ followed by an alpha-numeric character as a literal \, but \ followed by a non-alpha-numeric character as an escape character

The remainder of this highlighted section is not 100% correct. It can serve as a guide for many situations, but the above rules are required for total understanding.

Escaping Quote within command line search strings
Quotes within command line search strings must be escaped with backslash like \". This is true for both literal and regex search strings. This information has been confirmed on XP, Vista, and Windows 7.

Note: The quote may also need to be escaped for the CMD.EXE parser, but this has nothing to do with FINDSTR. For example, to search for a single quote you could use:

FINDSTR \^" file && echo found || echo not found

Escaping Backslash within command line literal search strings
Backslash in a literal search string can normally be represented as \ or as \\. They are typically equivalent. (There may be unusual cases in Vista where the backslash must always be escaped, but I no longer have a Vista machine to test).

But there are some special cases:

When searching for consecutive backslashes, all but the last must be escaped. The last backslash may optionally be escaped.

  • \\ can be coded as \\\ or \\\\
  • \\\ can be coded as \\\\\ or \\\\\\

Searching for one or more backslashes before a quote is bizarre. Logic would suggest that the quote must be escaped, and each of the leading backslashes would need to be escaped, but this does not work! Instead, each of the leading backslashes must be double escaped, and the quote is escaped normally:

  • \" must be coded as \\\\\"
  • \\" must be coded as \\\\\\\\\"

As previously noted, one or more escaped quotes may also require escaping with ^ for the CMD parser

The info in this section has been confirmed on XP and Windows 7.
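The literal-string rules in this section can be applied mechanically. The Python sketch below is a hypothetical helper that covers only the FINDSTR layer: a quote becomes \", each backslash directly before a quote becomes four backslashes, and every other backslash is escaped as two (escaping the last one is optional, so this is always allowed). It does not add the ^ escaping that cmd.exe may need, and the rules themselves are described above as not 100% correct.

```python
import re


def escape_literal_for_findstr(text: str) -> str:
    """Escape quotes and backslashes for a command line literal search string."""

    def encode(match):
        run = match.group(0)
        if run.endswith('"'):
            # Backslashes directly before a quote are double escaped
            # (4 per backslash); the quote itself is escaped normally.
            return "\\" * (4 * (len(run) - 1)) + '\\"'
        # Any other run of backslashes: escape every one of them.
        return "\\" * (2 * len(run))

    # Either a (possibly empty) backslash run ending in a quote, or a
    # plain backslash run.
    return re.sub(r'\\*"|\\+', encode, text)
```

For example, a search for one backslash followed by a quote is encoded as five backslashes followed by a quote, which matches the first bullet of the quote rule above.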

Escaping Backslash within command line regex search strings

  • Vista only: Backslash in a regex must be either double escaped like \\\\, or else single escaped within a character class set like [\\]

  • XP and Windows 7: Backslash in a regex can always be represented as [\\]. It can normally be represented as \\. But this never works if the backslash precedes an escaped quote.

    One or more backslashes before an escaped quote must either be double escaped, or else coded as [\\]

    • \" may be coded as \\\\\" or [\\]\"
    • \\" may be coded as \\\\\\\\\" or [\\][\\]\" or \\[\\]\"

Escaping Quote and Backslash within /G:FILE literal search strings
Standalone quotes and backslashes within a literal search string file specified by /G:file need not be escaped, but they can be.

" and \" are equivalent.

\ and \\ are equivalent.

If the intent is to find \\, then at least the leading backslash must be escaped. Both \\\ and \\\\ work.

If the intent is to find \", then at least the leading backslash must be escaped. Both \\" and \\\" work.

Escaping Quote and Backslash within /G:FILE regex search strings
This is the one case where the escape sequences work as expected based on the documentation. Quote is not a regex metacharacter, so it need not be escaped (but can be). Backslash is a regex metacharacter, so it must be escaped.

Character limits for command line parameters - Extended ASCII transformation
The null character (0x00) cannot appear in any string on the command line. Any other single byte character can appear in the string (0x01 - 0xFF). However, FINDSTR converts many extended ASCII characters it finds within command line parameters into other characters. This has a major impact in two ways:

  1. Many extended ASCII characters will not match themselves if used as a search string on the command line. This limitation is the same for literal and regex searches. If a search string must contain extended ASCII, then the /G:FILE option should be used instead.

  2. FINDSTR may fail to find a file if the name contains extended ASCII characters and the file name is specified on the command line. If a file to be searched contains extended ASCII in the name, then the /F:FILE option should be used instead.

Here is a complete list of extended ASCII character transformations that FINDSTR performs on command line strings. Each character is represented as the decimal byte code value. The first code represents the character as supplied on the command line, and the second code represents the character it is transformed into. Note - this list was compiled on a U.S. machine. I do not know what impact other languages may have on this list.

158 treated as 080     199 treated as 221     226 treated as 071
169 treated as 170     200 treated as 043     227 treated as 112
176 treated as 221     201 treated as 043     228 treated as 083
177 treated as 221     202 treated as 045     229 treated as 115
178 treated as 221     203 treated as 045     231 treated as 116
179 treated as 221     204 treated as 221     232 treated as 070
180 treated as 221     205 treated as 045     233 treated as 084
181 treated as 221     206 treated as 043     234 treated as 079
182 treated as 221     207 treated as 045     235 treated as 100
183 treated as 043     208 treated as 045     236 treated as 056
184 treated as 043     209 treated as 045     237 treated as 102
185 treated as 221     210 treated as 045     238 treated as 101
186 treated as 221     211 treated as 043     239 treated as 110
187 treated as 043     212 treated as 043     240 treated as 061
188 treated as 043     213 treated as 043     242 treated as 061
189 treated as 043     214 treated as 043     243 treated as 061
190 treated as 043     215 treated as 043     244 treated as 040
191 treated as 043     216 treated as 043     245 treated as 041
192 treated as 043     217 treated as 043     247 treated as 126
193 treated as 045     218 treated as 043     249 treated as 250
194 treated as 045     219 treated as 221     251 treated as 118
195 treated as 043     220 treated as 095     252 treated as 110
196 treated as 045     222 treated as 221     254 treated as 221
197 treated as 043     223 treated as 095
198 treated as 221     224 treated as 097

Any character >0 not in the list above is treated as itself, including <CR> and <LF>. The easiest way to include odd characters like <CR> and <LF> is to get them into an environment variable and use delayed expansion within the command line argument.
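A script can use the table above to decide whether a string is safe to pass on the command line or should go through /G:FILE or /F:FILE instead. This Python sketch only encodes the left-hand codes of the table; the names are hypothetical, and the bytes are assumed to be in the same code page as the table (compiled on a U.S. machine).

```python
# Byte codes that FINDSTR transforms when they appear in a command line argument.
TRANSFORMED_CODES = frozenset(
    [158, 169]
    + list(range(176, 221))  # 176-220
    + [222, 223, 224, 226, 227, 228, 229]
    + list(range(231, 241))  # 231-240
    + [242, 243, 244, 245, 247, 249, 251, 252, 254]
)


def transformed_bytes(argument: bytes):
    """Return the sorted byte codes in argument that would be transformed."""
    return sorted({byte for byte in argument if byte in TRANSFORMED_CODES})
```

An empty result means every byte is treated as itself; otherwise the search string or file name is a candidate for the /G:FILE or /F:FILE route.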

Character limits for strings found in files specified by /G:FILE and /F:FILE options
The nul (0x00) character can appear in the file, but it functions like the C string terminator. Any characters after a nul character are treated as a different string as if they were on another line.

The <CR> and <LF> characters are treated as line terminators that terminate a string, and are not included in the string.

All other single byte characters are included perfectly within a string.

Searching Unicode files
FINDSTR cannot properly search most Unicode (UTF-16, UTF-16LE, UTF-16BE, UTF-32) because it cannot search for nul bytes and Unicode typically contains many nul bytes.

However, the TYPE command converts UTF-16LE with BOM to a single byte character set, so a command like the following will work with UTF-16LE with BOM.

type unicode.txt|findstr "search"

Note that Unicode code points that are not supported by your active code page will be converted to ? characters.

It is possible to search UTF-8 as long as your search string contains only ASCII. However, the console output of any multi-byte UTF-8 characters will not be correct. But if you redirect the output to a file, then the result will be correctly encoded UTF-8. Note that if the UTF-8 file contains a BOM, then the BOM will be considered as part of the first line, which could throw off a search that matches the beginning of a line.

It is possible to search multi-byte UTF-8 characters if you put your search string in a UTF-8 encoded search file (without BOM), and use the /G option.

End Of Line
FINDSTR breaks lines immediately after every <LF>. The presence or absence of <CR> has no impact on line breaks.
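This line-breaking rule, together with the /N and /O definitions given earlier, can be modelled directly. A minimal Python sketch with a hypothetical function name; it splits only after <LF> and reports the line number and byte offset the way those options are described.

```python
def findstr_lines(data: bytes):
    """Yield (lineNumber, lineOffset, line) for each line, breaking after <LF>."""
    offset = 0
    number = 1
    while offset < len(data):
        end = data.find(b"\n", offset)
        # The line includes its <LF>; the last line may lack one.
        end = len(data) if end == -1 else end + 1
        yield number, offset, data[offset:end]
        number += 1
        offset = end
```

Note that a lone <CR> does not start a new line, so Python's own bytes.splitlines() would give different results for such data.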

Searching across line breaks
As expected, the . regex metacharacter will not match <CR> or <LF>. But it is possible to search across a line break using a command line search string. Both the <CR> and <LF> characters must be matched explicitly. If a multi-line match is found, only the 1st line of the match is printed. FINDSTR then doubles back to the 2nd line in the source and begins the search all over again - sort of a "look ahead" type feature.

Assume TEST.TXT has these contents (could be Unix or Windows style)

A
A
A
B
A
A

Then this script

@echo off
setlocal
::Define LF variable containing a linefeed (0x0A)
set LF=^


::Above 2 blank lines are critical - do not remove

::Define CR variable containing a carriage return (0x0D)
for /f %%a in ('copy /Z "%~dpf0" nul') do set "CR=%%a"

setlocal enableDelayedExpansion
::regex "!CR!*!LF!" will match both Unix and Windows style End-Of-Line
findstr /n /r /c:"A!CR!*!LF!A" TEST.TXT

gives these results

1:A
2:A
5:A
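The result above can be reproduced with a small model of the described behaviour: a line is reported when a match starts inside it, and the search then restarts at the next line. This Python sketch is a simulation only; it uses Python's re module rather than FINDSTR's regex engine, and the function name is hypothetical.

```python
import re


def simulate_multiline_findstr(data: bytes, pattern: bytes):
    """Return the 1-based numbers of lines in which a match begins."""
    regex = re.compile(pattern)
    matched = []
    offset, number = 0, 1
    while offset < len(data):
        end = data.find(b"\n", offset)
        end = len(data) if end == -1 else end + 1
        # Search from the start of this line; the match may run into later
        # lines, but it only counts if it starts before this line ends.
        hit = regex.search(data, offset)
        if hit and hit.start() < end:
            matched.append(number)
        offset, number = end, number + 1
    return matched
```

With the six-line sample and the pattern rb"A\r*\nA", the model reports lines 1, 2 and 5 for both Unix and Windows line endings, in agreement with the FINDSTR output shown.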

Searching across line breaks using the /G:FILE option is imprecise because the only way to match <CR> or <LF> is via a regex character class range expression that sandwiches the EOL characters.

  • [<TAB>-<0x0B>] matches <LF>, but it also matches <TAB> and <0x0B>

  • [<0x0C>-!] matches <CR>, but it also matches <0x0C> and !

Note - the above are symbolic representations of the regex byte stream since I can't graphically represent the characters.

Answer continued in part 2 below...
