Check whether a TEXT file contains any printable characters

Problem description

Hi,

My application deals with thousands of TEXT files in a folder, and I am working on a filter to remove blank files.

So far, I have been checking the byte length and treating a file as blank if it is below a threshold (to accommodate files that contain only blank spaces). This worked as expected, but I have now come across a few files that contain only line breaks and no printable characters.

My users want to filter out those files as well, along with the blank files.

Please suggest the best way to check whether a file contains a printable character. Alternatively, is there any way to find the byte size of a TEXT file excluding blank space?

Please note that performance is my primary concern, as I am dealing with thousands of TEXT files.

Recommended answer

I see no better way than scanning each file character by character (byte by byte), stopping at either the first printing character or the end of the file.

To achieve good performance, I'd recommend using a low-level API, such as the Read method of a FileStream object, with a large buffer for block transfers. (Benchmark and tune the buffer size.)

To check for printing characters, you can rely on .NET utility functions like Char::IsControl, or implement your own lookup table that covers all 256 byte values.
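
As a rough illustration of the approach described above (block reads, a 256-entry lookup table, and an early exit on the first printing byte), here is a minimal sketch. It is written in standard C++ with `std::ifstream` rather than the .NET `FileStream` API the answer names, so treat it as an outline of the technique and not as the answer's own code. The function names, the 64 KiB default buffer, and the choice of which bytes count as "printable" are all assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Build a lookup table covering all 256 byte values.
// Assumption for this sketch: bytes 0x21..0x7E (visible ASCII) and
// 0x80..0xFF (possible non-ASCII text bytes) count as printable; space,
// tab, CR, LF, other control bytes, and DEL (0x7F) do not.
std::array<bool, 256> make_printable_table() {
    std::array<bool, 256> table{};  // value-initialized: all false
    for (int b = 0x21; b <= 0x7E; ++b) table[static_cast<std::size_t>(b)] = true;
    for (int b = 0x80; b <= 0xFF; ++b) table[static_cast<std::size_t>(b)] = true;
    return table;
}

// Return true as soon as one byte in the block is marked printable.
bool buffer_has_printable(const unsigned char* data, std::size_t size,
                          const std::array<bool, 256>& table) {
    for (std::size_t i = 0; i < size; ++i) {
        if (table[data[i]]) return true;
    }
    return false;
}

// Scan a file in large blocks and stop at the first printable byte.
// buffer_size is a tunable assumption; benchmark it on real data.
bool file_has_printable(const std::string& path,
                        std::size_t buffer_size = 64 * 1024) {
    assert(buffer_size > 0);
    static const std::array<bool, 256> table = make_printable_table();

    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open file: " + path);

    std::vector<char> buffer(buffer_size);
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // bytes actually read
        if (got <= 0) break;                      // end of file reached
        const auto* bytes =
            reinterpret_cast<const unsigned char*>(buffer.data());
        if (buffer_has_printable(bytes, static_cast<std::size_t>(got), table)) {
            return true;  // early exit: no need to read the rest
        }
    }
    return false;  // only whitespace/control bytes, or an empty file
}
```

A file would then be filtered out when `file_has_printable(path)` returns false. Because the scan stops at the first printing byte, files with real content are usually rejected after a single block read, and only files that really are blank get read in full. The table here assumes single-byte or UTF-8 text without a byte-order mark; files in other encodings (for example UTF-16) would need a different table or a decoding step.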


这篇关于检查TEXT文件是否包含任何可打印字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆