Check for CSV file correctness in Perl

Question

I have a process that reads a CSV file, and I want to make sure the file is correct before I start parsing it.

I get a file name, check whether the file exists, then check its integrity. If it's not there or is not a proper CSV file, I try the file from the previous day instead.

Is there a way to check that the file is a proper CSV file? I am using Text::CSV_XS to parse it.

Googling a bit, I found the csv-check example code in the Text::CSV_XS Git repo. It looks like something I could use.

Recommended answer

As others have noted, you have to parse the entire file to determine whether it's valid. You may as well kill two birds with one stone and do your data processing and error checking at the same time.

getline() returns undef when it reaches EOF or when it fails to parse a line. You can use this to parse a file, halting if there are any parse errors:

while ( my $row = $csv->getline($io) ) {
    # Process row
}
$csv->eof or do_something();
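
As a minimal sketch, the same loop can be wrapped in a helper that only answers "does this file parse?". The name csv_is_valid and the binary option are assumptions for illustration and do not come from the original answer. As I understand the Text::CSV_XS documentation, eof can still be true when the very last line fails to parse, so this sketch also inspects the error code (2012 is the code reported for a normal end of file).

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: returns 1 only if every line of $path parses.
sub csv_is_valid {
    my ($path) = @_;
    open my $io, '<', $path or return 0;    # a missing file counts as invalid
    my $csv = Text::CSV_XS->new({ binary => 1 });
    while ( my $row = $csv->getline($io) ) {
        # Nothing to do here: we only care whether the line parsed
    }
    my $at_eof = $csv->eof;                  # false if the loop stopped early
    my ($code) = $csv->error_diag;           # list context: (code, message, ...)
    close $io;
    # Valid only if we reached EOF and the last error is "none" or plain EOF
    return ( $at_eof && ( $code == 0 || $code == 2012 ) ) ? 1 : 0;
}
```

A caller could try today's file first and fall back to the previous day's file when this returns 0.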

You can also

use autodie;

or set the auto_diag option in Text::CSV_XS->new() so that it dies on errors:

$csv = Text::CSV_XS->new({ auto_diag => 2 });

You can handle the errors by wrapping your parsing code in an eval block. This method will automatically call error_diag() before dying, printing the error to stderr; this may not be what you want.
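
A minimal sketch of the eval approach follows. The helper name parse_or_error and its return convention are hypothetical; only getline() and the auto_diag option come from the answer. The error text raised by the parser ends up in $@.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: returns (\@rows, undef) on success,
# or (undef, $error_text) if any line fails to parse.
sub parse_or_error {
    my ($io) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });
    my @rows;
    my $ok = eval {
        while ( my $row = $csv->getline($io) ) {
            push @rows, $row;    # keep each parsed row (an array reference)
        }
        1;                       # reached only if nothing died
    };
    return $ok ? ( \@rows, undef ) : ( undef, $@ );
}
```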

How do you "revert" the processing you did for previous rows if you detect an error? One possibility, if your database engine supports them, is to use database transactions. When you start processing a file, start a transaction. If you get a parse error, simply roll back the transaction and move on to the next file; otherwise, commit the transaction.
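
The transaction idea could look roughly like the sketch below. It assumes $dbh is a connected DBI handle created with RaiseError => 1 and AutoCommit => 1; the helper name load_file, the table records, and its columns name and value are all made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: insert every row of $path, or nothing at all.
# Assumes $dbh is a DBI handle with RaiseError enabled.
sub load_file {
    my ($dbh, $path) = @_;
    open my $io, '<', $path or return 0;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });
    # Table and column names are placeholders for this sketch
    my $sth = $dbh->prepare('INSERT INTO records (name, value) VALUES (?, ?)');
    $dbh->begin_work;                        # start the transaction
    my $ok = eval {
        while ( my $row = $csv->getline($io) ) {
            $sth->execute(@$row);            # dies on failure under RaiseError
        }
        1;
    };
    if ($ok) { $dbh->commit } else { $dbh->rollback }
    close $io;
    return $ok ? 1 : 0;
}
```

When this returns 0, nothing from the file has been committed, so the caller can move on to the next file.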

As an aside, I haven't seen your code for inserting database records, so I'm not sure if this applies, but it's not very efficient to have a separate insert statement for each row. Instead, consider either constructing a compound insert statement as you parse the file or, for very large files, letting the database do the parsing with something like MySQL's LOAD DATA INFILE (just an example, since I don't know what DBMS you're using).

To use a compound insert, build the query statement in memory, as Borodin suggested. If you get to the end of the file without any parse errors, execute the statement; otherwise, throw it out and move on to the next file.
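
One way to build such a statement is sketched below. It assumes your database accepts multi-row VALUES lists; the helper name build_compound_insert and the table and column names in the usage note are hypothetical, and this is not necessarily how Borodin's suggestion was written.

```perl
use strict;
use warnings;

# Hypothetical helper: build one multi-row INSERT with placeholders.
# $columns is an array ref of column names, $rows an array ref of row array refs.
sub build_compound_insert {
    my ($table, $columns, $rows) = @_;
    # One "(?, ?, ...)" group with a placeholder per column
    my $tuple = '(' . join( ', ', ('?') x @$columns ) . ')';
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES %s',
        $table,
        join( ', ', @$columns ),
        join( ', ', ($tuple) x @$rows );     # one group per row
    my @bind = map { @$_ } @$rows;           # flatten rows into bind values
    return ( $sql, @bind );
}
```

After a clean parse, the result could be run with something like $dbh->do($sql, undef, @bind); if a parse error occurred, simply discard the collected rows.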

For very large files, it might be fastest to let the database do the parsing, especially if you're doing minimal processing before inserting the data. MySQL's LOAD DATA INFILE, for example, will halt if it detects data interpretation or duplicate key errors. If you wrap the statement in a transaction, you can roll back if there are errors and try to load the next file. The advantage of this approach is that loading valid files will be extremely fast, much faster than if you had to parse them with Perl first.
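
A rough sketch of this approach through DBI is shown below. It assumes a MySQL connection with RaiseError => 1 and AutoCommit => 1, a transactional storage engine, and a file the server is allowed to read; the helper name load_with_mysql, the table records, and the field options are illustrative assumptions only.

```perl
use strict;
use warnings;

# Hypothetical helper: let MySQL parse and load the file in one transaction.
# Assumes $dbh is a DBI handle with RaiseError enabled.
sub load_with_mysql {
    my ($dbh, $path) = @_;
    # The file name is quoted as a string literal; the table name is a placeholder
    my $sql = sprintf q{LOAD DATA INFILE %s INTO TABLE records }
        . q{FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'},
        $dbh->quote($path);
    $dbh->begin_work;
    my $ok = eval { $dbh->do($sql); 1 };     # dies on a load error
    if ($ok) { $dbh->commit } else { $dbh->rollback }
    return $ok ? 1 : 0;
}
```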
