txt file format validation in Java


Question






    What is the best way to validate whether a .txt file is:

    • In fact a .txt file and not another type of file with only the extension changed.

    • The format of the .txt file matches the specified format (so it is able to be parsed correctly, contains all the relevant information, etc.)

    This is all being done in Java, where a file will be retrieved and then needs to be checked to make sure it is what it is supposed to be. So far I have only found JHOVE (and now JHOVE2) as tools for this task but have not found much in the way of documentation for implementing it within Java code as opposed to through the command line. Thanks for your help.
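The answer below only addresses the second bullet. For the first one, a common heuristic is to ignore the extension and inspect the bytes: accept the file as text only if it contains no NUL bytes and decodes as strict UTF-8. This heuristic is an assumption and is not something JHOVE or the answer describes. The following is a minimal JDK-only sketch, and `TextContentCheck` and its methods are hypothetical names:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Heuristic sketch (an assumption, not JHOVE's method): content counts as
// "text" if it has no NUL bytes and decodes cleanly as UTF-8 (ASCII included).
class TextContentCheck {

    static boolean looksLikeText(byte[] bytes) {
        // NUL bytes are common in binary formats and very rare in plain text.
        for (byte b : bytes) {
            if (b == 0) {
                return false;
            }
        }
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static boolean looksLikeTextFile(Path path) throws IOException {
        // Reads the whole file into memory.
        return looksLikeText(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            Path path = Paths.get(args[0]);
            System.out.println(path + (looksLikeTextFile(path) ? ": looks like text" : ": not text"));
            return;
        }
        byte[] text = "Food  Food Store  -80.31  12/28/2011\n".getBytes(StandardCharsets.UTF_8);
        // Start of a PNG file: 0x89 is not valid UTF-8 and the bytes include NULs.
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n', 0, 0};
        System.out.println(looksLikeText(text)); // true
        System.out.println(looksLikeText(png));  // false
    }
}
```

This is only a heuristic. It rejects UTF-16 text, which contains NUL bytes, and Windows-1252 text that has accented characters. It also accepts any binary file whose bytes happen to form valid UTF-8.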

    Answer

    Since it sounds like you're looking for a general-purpose way to check formatting, could I recommend regular expressions? You can do all sorts of different matching with regex. I've written a simple example below [for all those regex experts out there, have mercy on me if I didn't use the perfect expression ;) ]. You could put the REGEX and MAX_LINES_TO_READ constants into a properties file and modify that to make it even more generalized.
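The properties-file idea can be sketched with `java.util.Properties`. The key names `format.regex` and `format.maxLines` and the `FormatSettings` class are hypothetical. One pitfall is that `Properties.load` treats a backslash as an escape character, so every backslash in the regex has to be doubled in the file:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical settings holder; the keys "format.regex" and "format.maxLines"
// are made up for this sketch.
class FormatSettings {

    final Pattern pattern;
    final int maxLines;

    FormatSettings(Pattern pattern, int maxLines) {
        this.pattern = pattern;
        this.maxLines = maxLines;
    }

    static FormatSettings load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source); // backslash is an escape character in .properties files
        String regex = props.getProperty("format.regex");
        if (regex == null) {
            throw new IllegalArgumentException("format.regex is missing");
        }
        // Fall back to 5 lines, the answer's MAX_LINES_TO_READ value.
        int maxLines = Integer.parseInt(props.getProperty("format.maxLines", "5").trim());
        return new FormatSettings(Pattern.compile(regex), maxLines);
    }

    public static void main(String[] args) throws IOException {
        // File contents as they would appear on disk: each regex backslash is doubled.
        String file = "format.regex=.{15}[ ]{5}.{15}[ ]{5}[-]\\\\d{2}\\\\.\\\\d{2}[ ]{9}\\\\d{2}/\\\\d{2}/\\\\d{4}\n"
                + "format.maxLines=5\n";
        FormatSettings settings = load(new StringReader(file));
        String line = String.format("%-20s%-20s%-15s%s", "Food", "Food Store", "-80.31", "12/28/2011");
        System.out.println(settings.maxLines);                        // 5
        System.out.println(settings.pattern.matcher(line).matches()); // true
    }
}
```

In a real program you would pass a `FileReader` (or `Files.newBufferedReader`) for the properties file instead of the in-memory `StringReader`.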

    You would basically test up to a maximum number of lines of your ".txt" file (however many lines are needed to establish that the formatting is good; you could also use a separate regular expression for a header line, or several different regular expressions, as needed to test the formatting), and if all of those lines match, the file would be flagged as "valid".
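The `ValidateTxtFile` example in this answer prints a message for each line. To get the single "valid" flag described above, you could return a boolean instead. Here is a minimal sketch, in which `TxtFormatCheck` and `firstLinesMatch` are hypothetical names and treating an empty file as invalid is an assumption:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical helper: one valid/invalid verdict for the first maxLines lines.
class TxtFormatCheck {

    // The answer's REGEX, compiled once.
    static final Pattern TRANSACTION = Pattern.compile(
            ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");

    static boolean firstLinesMatch(Reader source, Pattern linePattern, int maxLines) throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            int checked = 0;
            String line;
            while (checked < maxLines && (line = br.readLine()) != null) {
                if (!linePattern.matcher(line).matches()) {
                    return false; // a single bad line flags the whole file
                }
                checked++;
            }
            // Assumption: a file with no lines at all is not "valid".
            return checked > 0;
        }
    }

    public static void main(String[] args) throws IOException {
        String good = String.format("%-20s%-20s%-15s%s%n", "Food", "Food Store", "-80.31", "12/28/2011");
        String bad = String.format("%-20s%-20s%-15s%s%n", "Restaurant", "Mcdonalds", "-10.35", "12/28/11");
        System.out.println(firstLinesMatch(new StringReader(good + good), TRANSACTION, 5)); // true
        System.out.println(firstLinesMatch(new StringReader(good + bad), TRANSACTION, 5));  // false
    }
}
```

Taking a `Reader` rather than a file name keeps the check easy to exercise with in-memory strings. For a real file, pass `new FileReader(fileName)`.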

    This is just an example for you to build on. For one thing, you should implement proper exception handling rather than just catching "Exception".

    For testing your regular expressions in Java, http://www.regexplanet.com/simple/index.html works very nicely.

    Here's the "ValidateTxtFile" source...

    import java.io.*;
    import java.util.regex.Pattern;
    
    public class ValidateTxtFile {
    
        private static final int MAX_LINES_TO_READ = 5;
    
        // Compiled once instead of on every String.matches() call
        private static final Pattern REGEX = Pattern.compile(
                ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");
    
        public void testFile(String fileName) {
    
            int lineCounter = 1;
    
            // try-with-resources closes the reader even if an exception is thrown
            try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    
                String line = br.readLine();
    
                while ((line != null) && (lineCounter <= MAX_LINES_TO_READ)) {
    
                    // Validate the line is formatted correctly based on regular expressions
                    if (REGEX.matcher(line).matches()) {
                        System.out.println("Line " + lineCounter + " formatted correctly");
                    }
                    else {
                        System.out.println("Invalid format on line " + lineCounter + " (" + line + ")");
                    }
    
                    line = br.readLine();
                    lineCounter++;
                }
    
            } catch (Exception ex) {
                System.out.println("Exception occurred: " + ex.toString());
            }
        }
    
        public static void main(String[] args) {
    
            ValidateTxtFile vtf = new ValidateTxtFile();
    
            vtf.testFile("transactions.txt");
        }
    }
    

    Here's what's in "transactions.txt"...

    Electric            Electric Co.        -50.99         12/28/2011
    Food                Food Store          -80.31         12/28/2011
    Clothes             Clothing Store      -99.36         12/28/2011
    Entertainment       Bowling             -30.4393       12/28/2011
    Restaurant          Mcdonalds           -10.35         12/28/11
    

    The output when I ran the app was...

    Line 1 formatted correctly
    Line 2 formatted correctly
    Line 3 formatted correctly
    Invalid format on line 4 (Entertainment       Bowling             -30.4393       12/28/2011)
    Invalid format on line 5 (Restaurant          Mcdonalds           -10.35         12/28/11)
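Because the regex is all-or-nothing, the output says that a line is invalid but not why. The pattern implies fixed-width columns of 20, 20, and 15 characters followed by a 10-character date. You can slice each line at those offsets and check each field separately, which gives the same verdict as the full regex but names the failing field. The column widths are read off the answer's REGEX. The field labels and the `FieldDiagnosis` class are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Column widths read off the answer's REGEX: 20 + 20 + 15 characters, then a
// 10-character date. Field names are hypothetical labels for this sketch.
class FieldDiagnosis {

    private static final Pattern TEXT_COLUMN = Pattern.compile(".{15}[ ]{5}");
    private static final Pattern AMOUNT = Pattern.compile("[-]\\d{2}\\.\\d{2}[ ]{9}");
    private static final Pattern DATE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");

    /** Lists the fields that break the format; an empty list means the line is valid. */
    static List<String> problems(String line) {
        List<String> found = new ArrayList<>();
        if (line.length() < 55) {
            found.add("line too short (" + line.length() + " chars)");
            return found;
        }
        if (!TEXT_COLUMN.matcher(line.substring(0, 20)).matches()) {
            found.add("category");
        }
        if (!TEXT_COLUMN.matcher(line.substring(20, 40)).matches()) {
            found.add("payee");
        }
        if (!AMOUNT.matcher(line.substring(40, 55)).matches()) {
            found.add("amount");
        }
        if (!DATE.matcher(line.substring(55)).matches()) {
            found.add("date");
        }
        return found;
    }

    public static void main(String[] args) {
        String line4 = String.format("%-20s%-20s%-15s%s", "Entertainment", "Bowling", "-30.4393", "12/28/2011");
        String line5 = String.format("%-20s%-20s%-15s%s", "Restaurant", "Mcdonalds", "-10.35", "12/28/11");
        System.out.println(problems(line4)); // [amount]
        System.out.println(problems(line5)); // [date]
    }
}
```

This pinpoints the two failures shown above. Line 4 has four decimal places in the amount, and line 5 has a two-digit year.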
    


    EDIT 12/29/2011 about 10:00am
    Not sure whether performance is a concern here, but just as an FYI: I duplicated the entries in "transactions.txt" several times to build a text file with about 1.3 million rows, and I was able to get through the whole file in about 7 seconds on my PC. I changed the System.out calls to show only the grand totals at the end: 524,288 invalid and 786,432 valid entries. "transactions.txt" was about 85 MB in size.
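The edit describes a variant that reports only grand totals over the whole file. The author's modified code isn't shown, so the following is just one way it might look. `FormatTotals` and `count` are hypothetical names. It reads every line instead of stopping at MAX_LINES_TO_READ, and it compiles the pattern once, which is the usual advice for large inputs:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Pattern;

// Sketch of the "grand totals only" variant: every line is checked (no
// MAX_LINES_TO_READ cap) and nothing is printed per line.
class FormatTotals {

    // Compiled once; String.matches() compiles the regex again on every call.
    static final Pattern TRANSACTION = Pattern.compile(
            ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");

    /** Returns {validCount, invalidCount}. */
    static long[] count(Reader source) throws IOException {
        long valid = 0;
        long invalid = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (TRANSACTION.matcher(line).matches()) {
                    valid++;
                } else {
                    invalid++;
                }
            }
        }
        return new long[] {valid, invalid};
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FormatTotals <file>");
            return;
        }
        long[] totals = count(new FileReader(args[0]));
        System.out.println("Valid: " + totals[0] + ", invalid: " + totals[1]);
    }
}
```

As a consistency check on the totals in the edit, repeating the five sample lines 262,144 times gives 1,310,720 lines. The valid-to-invalid split is 3:2, or 786,432 valid and 524,288 invalid.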

