How can I detect a blank line in Perl?


Problem description

  • How do I check whether a line (the value of $_) is a blank line in Perl? Or is there a better way to check it than using $_?

I want to code it like this:

if ($_ eq '') # Check whether the current line is a blank line (no characters at all)
{
    $x = 0;
}

I have updated the code below with the solution to this question.

My test.txt, the file to parse:

   constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
       vid = 0
       name = ""
       units = ""

   constant fixup private GemConfigAlarms = <U1 0>         /*  my Comment  */
       vid = 1
       name = "CONFIGALARMS"
   units = ""
   min = <U1 0>
   max = <U1 2>
   default = <U1 0>

My code is below.

That's why I need to set $x = 0 initially. I am not sure whether this is a normal solution or not.

    sub ConstantParseAndPrint
    {
        if (/^$/)   # SOLUTION!
        {
            $x = 0;
        }

        if ($x == 0)
        {
            if (/^\s*(constant)\s*(fixup|\/\*fixup\*\/|)\s*(private|)\s*(\w+)\s+=\s+<([a-zA-Z0-9]+)\s+(["']?)([a-zA-Z0-9.:\\]+)\6>\s*(\/\*\s*(.*?)\s*\*\/|)(\r|\n|\s)/)
            {
                $name1 = $1; # Constant
                $name2 = $2; # Fixup
                $name3 = $3; # Private
                $name4 = $4;
                $name5 = $5;
                $name6 = $7;
                $name7 = $8;
                # start print
                if ($name7 ne '')
                {
                    print DEST_XML_FILE "<!-- $name7-->\n";
                }
                print DEST_XML_FILE  "  <ECID";
                print DEST_XML_FILE " logicalName=\"$name4\"";
                print DEST_XML_FILE " valueType=\"$name5\"";
                print DEST_XML_FILE " value=\"$name6\"";
                $x = 1;
            }
        }
        elsif ($x == 1)
        {
            if(/\s*vid\s*=\s*(.*?)(\s|\n|\r)/)
            {
                $nID = $1;
                print DEST_XML_FILE " vid=\"$nID\"";
                $x = 2;
            }
        }
        elsif ($x == 2)
        {
            if(/\s*name\s*=\s*(.*?)(\s|\n|\r)/)
            {
                $nName = $1;
                print DEST_XML_FILE " name=$nName";
                $x = 3;
            }
        }
        elsif ($x == 3)
        {
            if (/\s*units\s*=\s*(.*?)(\s|\n|\r)/)
            {
                $nUnits = $1;
                print DEST_XML_FILE " units=$nUnits";
                $x = 4;
            }
        }
        elsif ($x == 4)
        {
            # \s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>\
            if (/\s*min\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
            {
                #$nMinName1 = $1;
                $nMinName2 = $2; # Find the nMin Value
                #$nMinName3 = $3;
                #$nMinName4 = $4;
                print DEST_XML_FILE " min=\"$nMinName2\"";
                $x = 5;
            }
            else
            {
                print DEST_XML_FILE  "></ECID>\n";
                $x = 0; # There is no line 4 and line 5
            }
        }
        elsif ($x == 5)
        {
            if (/\s*max\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
            {
                #$nMaxName1 = $1;
                $nMaxName2 = $2; # Find the nMax Value
                #$nMaxName3 = $3;
                #$nMaxName4 = $4;
                print DEST_XML_FILE " max=\"$nMaxName2\"";
                $x = 6;
            }
        }
        elsif ($x == 6)
        {
            if (/\s*default\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
            {
                #$nDefault1 = $1;
                $nDefault2 = $2; # Find the default Value
                #$nDefault3 = $3;
                #$nDefault4 = $4;
                print DEST_XML_FILE " default=\"$nDefault2\">";
                print DEST_XML_FILE  "</ECID>\n";
                $x = 0;
            }
        }
    }

Recommended answer

Against my better judgment I will try to help you again.

The issue is not how to find a blank line. The issue is not which regex to use. The fundamental issue is understanding how to analyze a problem and turn that analysis into code.

In this case the problem is "How do I parse this format?"

I've written a parser for you. I have also taken the time to write a detailed description of the process I used to write it.

WARNING: The parser is not carefully tested for all cases. It does not have enough error handling built in. For those features, you can request a rate card or write them yourself.

Here's the data sample you provided (I'm not sure which of your several questions I pulled this from):

constant fixup GemEstabCommDelay = <U2 20>
    vid = 6
    name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
    units = "s"
    min = <U2 0>
    max = <U2 1800>
    default = <U2 20>


constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
    vid = 4
    name = ""  units = ""


constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
    vid = 0
    name = ""
    units = ""  

Before you can write a parser for a data file, you need a description of the structure of the file. If you are using a standard format (say, XML), you can read the existing specification. If you are using some home-grown format, you get to write it yourself.

So, based on the sample data, we can see that:

  1. Data is broken into blocks.
  2. Each block starts with the word constant in column 0.
  3. Each block ends with a blank line.
  4. A block consists of a start line and zero or more additional lines.
  5. The start line consists of the keyword constant followed by one or more whitespace-delimited words, an '=' sign, and a <>-quoted data value.
    • The last word appears to be the name of the constant. Call it constant_name.
    • The <>-quoted data appears to be a combined type/value specifier.
    • Earlier words appear to specify additional metadata about the constant. Let's call those options.
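
As a minimal sketch of items 2 and 3 of this rough spec, the helper below labels a line as a block start, a blank line, or anything else. The name classify_line is hypothetical and is not part of the parser in this answer. One detail worth noting: a line read from a file still carries its newline, so `$_ eq ''` is only true after chomp, and even then it misses lines that contain only spaces or tabs. The pattern `/^\s*$/` covers both cases.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one input line according to the rough spec.
sub classify_line {
    my ($line) = @_;
    return 'blank' if $line =~ /^\s*$/;        # empty or whitespace only (newline included)
    return 'start' if $line =~ /^constant\b/;  # the word "constant" in column 0
    return 'other';                            # attribute lines and anything else
}

# Sample lines, each still ending in a newline as it would when read from a file.
for my $line ( "constant fixup Foo = <U1 0>\n", "    vid = 1\n", "   \n", "\n" ) {
    print classify_line($line), ': ', $line;
}
```
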

Okay, so now we have a rough spec. What do we do with it?

How is the format structured? Consider the logical units of organization from largest to smallest. These will determine the structure and flow of our code.

  • A file is made of blocks.
  • A block is made of lines.

So our parser should decompose a file into blocks, and then handle the blocks.
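
One way to sketch the "file into blocks" step, as an alternative to the approach this answer takes, is Perl's paragraph mode: setting `$/` to the empty string makes readline return one blank-line-separated record at a time. The helper name read_blocks_paragraph_mode and the sample text are assumptions for illustration. Paragraph mode only treats truly empty lines as separators, so a separator line containing spaces or tabs would not end a block here.

```perl
use strict;
use warnings;

# Hypothetical helper: split a handle into blocks using paragraph mode.
sub read_blocks_paragraph_mode {
    my ($fh) = @_;
    local $/ = '';                # paragraph mode: a record ends at one or more empty lines
    my @blocks;
    while ( my $block = <$fh> ) {
        $block =~ s/\n+\z//;      # drop the trailing newlines of the record
        push @blocks, [ split /\n/, $block ] if length $block;
    }
    return @blocks;               # list of array refs, one per block
}

# Made-up sample input with two blocks separated by empty lines.
my $text = <<'END';
constant fixup A = <U1 0>
    vid = 1


constant fixup B = <U1 2>
    vid = 2
END

open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @blocks = read_blocks_paragraph_mode($fh);
printf "%d blocks\n", scalar @blocks;
```
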

Now we rough out a parser in comments:

# Parse a constant spec file.

# Until file is done:
    # Read in a whole block
    # Parse the block and return key/value pairs for a hash.

    # Store a ref to the hash in a big hash of all blocks, keyed by constant_name.

# Return ref to big hash with all block data

Now we start to fill in some code:

# Parse a constant spec file.
sub parse_constant_spec {
    my $fh = shift;

    my %spec;

    # Until file is done:
        # Read in a whole block
    while( my $block = read_block($fh) ) {

        # Parse the block and return key/value pairs for a hash.
        my %constant = parse_block( $block );

        # Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
        $spec{ $constant{const_name} } = \%constant;

    }

    # Return ref to big hash with all block data
    return \%spec;
}

But it won't work. The parse_block and read_block subs haven't been written yet. At this stage that's OK. The point is to rough in features in small, understandable chunks. Every once in a while, to keep things readable, you need to gloss over the details and drop in a subroutine; otherwise you wind up with monstrous 1000-line subs that are impossible to debug.

Now we know we need to write a couple of subs to finish up, et voilà:

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my $fh = \*DATA;

print Dumper parse_constant_spec( $fh );


# Parse a constant spec file.
# Pass in a handle to process.
# As long as it acts like a file handle, it will work.
sub parse_constant_spec {
    my $fh = shift;

    my %spec;

    # Until file is done:
        # Read in a whole block
    while( my $block = read_block($fh) ) {

        # Parse the block and return key/value pairs for a hash.
        my %constant = parse_block( $block );

        # Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
        $spec{ $constant{const_name} } = \%constant;

    }

    # Return ref to big hash with all block data
    return \%spec;
}

# Read a constant definition block from a file handle.
# void return when there is no data left in the file.
# Otherwise return an array ref containing the lines in the block.
sub read_block {
    my $fh = shift;

    my @lines;
    my $block_started = 0;

    while( my $line = <$fh> ) {

        $block_started++ if $line =~ /^constant/;

        if( $block_started ) {

            last if $line =~ /^\s*$/;

            push @lines, $line;
        }
    }

    return \@lines if @lines;

    return;
}


sub parse_block {
    my $block = shift;
    my ($start_line, @attribs) = @$block;

    my %constant;

    # Break down first line:
    # First separate assignment from option list.
    my ($start_head, $start_tail) = split /=/, $start_line;

    # work on option list
    my @options = split /\s+/, $start_head;

    # Recover constant_name from options:
    $constant{const_name} = pop @options;
    $constant{options} = \@options;

    # Now we parse the value/type specifier
    @constant{'type', 'value' } = parse_type_value_specifier( $start_tail );

    # Parse attribute lines.
    # since we've already got multiple per line, get them all at once.
    chomp @attribs;
    my $attribs = join ' ', @attribs;

    #  we have one long line of mixed key = "value", key = <TYPE VALUE>,
    #  or bare key = value (such as vid = 6) pairs

    @attribs = $attribs =~ /(\w+\s+=\s+(?:".*?"|<.*?>|\S+))/g;

    for my $attrib ( @attribs ) {
        warn "$attrib\n";    # debug output: show each attribute as it is parsed
        my ($name, $value) = split /\s*=\s*/, $attrib, 2;

        if( $value =~ /^"/ ) { 
            $value =~ s/^"|"\s*$//g;
        }
        elsif( $value =~ /^</ ) {
           # Parse this attribute's own specifier, not the start line's.
           $value = [ parse_type_value_specifier( $value ) ];
        }
        # Anything else is a bare value; keep it as-is.

        $constant{ $name } = $value;
    }

    return %constant;
}

sub parse_type_value_specifier {
    my $tvs = shift;

    my ($type, $value) = $tvs =~ /<(\w+)\s+(.*?)>/;

    return $type, $value;
}

__DATA__
constant fixup GemEstabCommDelay = <U2 20>
    vid = 6
    name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
    units = "s"
    min = <U2 0>
    max = <U2 1800>
    default = <U2 20>


constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
    vid = 4
    name = ""  units = ""


constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
    vid = 0
    name = ""
    units = ""  

The above code is far from perfect. IMO, parse_block is too long and ought to be broken into smaller subs. Also, there isn't nearly enough validation and enforcement of well-formed input. Variable names and descriptions could be clearer, but I don't really understand the semantics of your data format. Better names would more closely match the semantics of the data format.

Despite these issues, it does parse your format and produce a big handy data structure that can be stuffed into whatever output format you want.
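
As a rough illustration of stuffing that structure into an output format, the sketch below renders one parsed constant as an ECID element like the one the question prints. The helper names xml_escape and ecid_element are hypothetical, the attribute names (logicalName, valueType, and so on) are taken from the question's code, and the sample hash is hand-written under the assumption that `<TYPE VALUE>` attributes are stored as [ type, value ] array refs.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in XML attributes.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: render one parsed constant as a self-closing <ECID/> element.
sub ecid_element {
    my ($constant) = @_;
    my @pairs = (
        [ logicalName => $constant->{const_name} ],
        [ valueType   => $constant->{type} ],
        [ value       => $constant->{value} ],
    );
    for my $key (qw( vid name units min max default )) {
        next unless defined $constant->{$key};
        my $value = $constant->{$key};
        # Assumption: <TYPE VALUE> attributes are stored as [ type, value ].
        $value = $value->[1] if ref($value) eq 'ARRAY';
        push @pairs, [ $key => $value ];
    }
    my $attrs = join ' ',
        map { sprintf '%s="%s"', $_->[0], xml_escape( $_->[1] ) } @pairs;
    return "<ECID $attrs/>";
}

# Hand-written sample in the assumed shape of one parsed block.
my %constant = (
    const_name => 'GemEstabCommDelay', type => 'U2', value => 20,
    units => 's', min => [ 'U2', 0 ], max => [ 'U2', 1800 ],
);
print ecid_element( \%constant ), "\n";
```

Keeping the escaping in one small sub means values such as the quoted file paths in the sample data cannot break the generated markup.
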

If you use this format in many places, I recommend putting the parsing code into a module. See perldoc perlmod for more info.
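
A minimal sketch of what such a module could look like follows. The package name ConstantSpec is an assumption, and only one sub from the parser is shown. In real use the package would live in its own file (ConstantSpec.pm) ending with a true value, and scripts would load it with use; here it sits in the same file so the example runs on its own.

```perl
use strict;
use warnings;

# In real use this package would live in its own file, ConstantSpec.pm
# (a hypothetical name), ending with a true value such as "1;".
package ConstantSpec {
    use Exporter 'import';
    our @EXPORT_OK = qw( parse_type_value_specifier );

    # Split "<TYPE VALUE>" into its two parts.
    sub parse_type_value_specifier {
        my ($tvs) = @_;
        my ( $type, $value ) = $tvs =~ /<(\w+)\s+(.*?)>/;
        return ( $type, $value );
    }
}

# A script would normally write:
#     use ConstantSpec qw( parse_type_value_specifier );
# Because the package is in this same file, the sub is called by its full name.
my ( $type, $value ) = ConstantSpec::parse_type_value_specifier('<U2 1800>');
print "$type $value\n";
```
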

Now, please stop using global variables and ignoring good advice. Please start reading the perldoc, read Learning Perl and Perl Best Practices, use strict, use warnings. While I am throwing reading lists around go read Global Variables are Bad and then wander around the wiki to read and learn. I learned more about writing software by reading c2 than I did in school.

If you have questions about how this code works, why it is laid out as it is, what other choices could have been made, speak up and ask. I am willing to help a willing student.

Your English is good, but it is clear you are not a native speaker. I may have used too many complex sentences. If you need parts of this written in simple sentences, I can try to help. I understand that working in a foreign language is very difficult.
