How to detect if a file is not a text file in C#
Problem description
I need to read through many files and search for specific text in them. I want to open only text files, not images, movies, and so on. I am looking for a way to identify non-text files. Since I will be using a FileStream and doing a byte search, it seems to me I can stop reading and close a file as soon as I encounter a byte whose decimal value is greater than 127, that is, outside the ASCII range. Does this seem like a good approach?
There's no foolproof answer for this. If you know that every text file will contain only ASCII characters (encoded in ASCII, UTF-8 or something similar), then yes, that will work, although it may not catch all non-text files.
However:
- It will wrongly reject any text file that contains non-ASCII text, for example UTF-8 text with accented letters.
- It will wrongly accept a file that is a valid binary file in some format but happens not to contain any byte values of 128 or above.
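For reference, the check discussed above might be sketched roughly as follows. This is only an illustration under assumptions: the class name `TextFileHeuristic`, the method name `LooksLikeAscii`, and the `maxBytes` cap are hypothetical and do not come from the question, and the result is subject to both failure modes listed above.

```csharp
using System;
using System.IO;

static class TextFileHeuristic
{
    // Hypothetical helper: returns true if every inspected byte is in the
    // 7-bit ASCII range (0-127). maxBytes is an assumed cap on how much of
    // the stream is inspected; it is not part of the original question.
    public static bool LooksLikeAscii(Stream stream, int maxBytes = 64 * 1024)
    {
        var buffer = new byte[4096];
        int total = 0;
        while (total < maxBytes)
        {
            int toRead = Math.Min(buffer.Length, maxBytes - total);
            int read = stream.Read(buffer, 0, toRead);
            if (read == 0)
            {
                break; // end of stream
            }
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] > 127)
                {
                    return false; // non-ASCII byte: stop reading early
                }
            }
            total += read;
        }
        return true;
    }

    // Convenience overload that opens the file with a FileStream, as in the question.
    public static bool LooksLikeAscii(string path, int maxBytes = 64 * 1024)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            return LooksLikeAscii(fs, maxBytes);
        }
    }
}
```

A caller could skip any file for which `TextFileHeuristic.LooksLikeAscii(path)` returns false, keeping in mind that a true result only means no byte above 127 was seen in the inspected portion.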
Does the sequence of bytes { 34, 87, 23, 10 } represent text or binary data? There's simply no way of knowing for sure. Anything you do will be heuristic.
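To make the ambiguity concrete, here is a small sketch that inspects that exact sequence. The names `ByteSequenceExample`, `AllAscii`, and `CountUnusualControlBytes` are hypothetical, and counting control bytes is an extra heuristic assumed here for illustration, not something the answer proposes.

```csharp
using System;
using System.Linq;

static class ByteSequenceExample
{
    // The sequence from the answer: '"' (34), 'W' (87), a control byte (23), line feed (10).
    public static readonly byte[] Sample = { 34, 87, 23, 10 };

    // True if every byte is in the 7-bit ASCII range (0-127).
    public static bool AllAscii(byte[] data)
    {
        return data.All(b => b <= 127);
    }

    // Assumed extra heuristic: counts bytes below 32 other than
    // tab (9), line feed (10), and carriage return (13).
    public static int CountUnusualControlBytes(byte[] data)
    {
        return data.Count(b => b < 32 && b != 9 && b != 10 && b != 13);
    }
}
```

For `Sample`, `AllAscii` returns true, so the high-byte check would treat the data as text, while `CountUnusualControlBytes` returns 1 because of the byte 23. Whether that single control byte means "binary" is still a judgment call, which is the point: any such rule is a heuristic.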