Removing comments with a sliding window without nested while loops


Question

I'm trying to remove comments and strings from a C source file. I'll stick to comments for the examples. I have a sliding window, so I only have characters n and n-1 at any given moment. I'm trying to find an algorithm that avoids nested while loops if possible, although I will need one while loop to getchar through the input. My first thought was to loop until n = '*' and n-1 = '/', then loop until n = '/' and n-1 = '*', but since that nests while loops I feel it is inefficient. I can do it this way if I have to, but I was wondering whether anyone has a better solution.

Recommended answer

Doing this correctly is more complicated than one may at first think, as ably pointed out by the other comments here. I would strongly recommend writing a table-driven FSM, using a state transition diagram to get the transitions right. Trying to do anything more than a few states with case statements is horribly error-prone IMO.

Here's a diagram in dot/graphviz format from which you could probably directly code a state table. Note that I haven't tested this at all, so YMMV.

The semantics of the diagram are that <ch> is a fall-through, taken if none of the other inputs listed for that state match. End of file is an error in any state except S0, and so is any character not covered by an explicit label or by <ch>. Every character scanned is printed, except while in a comment (S4 and S5) and while detecting the start of a comment (S1). You will have to buffer characters while detecting the start of a comment, then print them if it turns out to be a false start, or throw them away once it is certain that a comment has really begun.
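The buffering rule can be sketched in C with a single loop, as a rough illustration rather than a tested implementation. The function name strip_comments and the state names are hypothetical. The sketch models only the comment states and ignores string and character literals, so a "//" inside a string would be stripped incorrectly. It also assumes that the newline ending a line comment is kept in the output.

```c
#include <stddef.h>

/* Hypothetical state names, loosely following S0, S1, S4, S5 and S7. */
enum state { INIT, BEGIN_CMT, IN_SLC, IN_MLC, END_MLC };

/* Copy src to dst without comments. dst must hold strlen(src) + 1 bytes.
 * Returns 0 on success, -1 if the input ends inside a block comment.
 * Assumption: string and character literals are NOT handled here. */
int strip_comments(const char *src, char *dst)
{
    enum state st = INIT;
    size_t o = 0;

    for (const char *p = src; *p != '\0'; p++) {
        char c = *p;
        switch (st) {
        case INIT:
            if (c == '/')
                st = BEGIN_CMT;          /* hold the '/' back, print nothing yet */
            else
                dst[o++] = c;
            break;
        case BEGIN_CMT:
            if (c == '/') {
                st = IN_SLC;             /* real comment: drop the held '/' */
            } else if (c == '*') {
                st = IN_MLC;             /* real comment: drop the held '/' */
            } else {
                dst[o++] = '/';          /* false start: flush the held '/' */
                dst[o++] = c;
                st = INIT;
            }
            break;
        case IN_SLC:
            if (c == '\n') {
                dst[o++] = c;            /* assumption: keep the line break */
                st = INIT;
            }
            break;
        case IN_MLC:
            if (c == '*')
                st = END_MLC;
            break;
        case END_MLC:
            if (c == '/')
                st = INIT;
            else if (c != '*')
                st = IN_MLC;             /* the '*' did not close the comment */
            break;
        }
    }
    if (st == BEGIN_CMT)
        dst[o++] = '/';                  /* input ended on a lone '/' */
    dst[o] = '\0';
    return (st == IN_MLC || st == END_MLC) ? -1 : 0;
}
```

Because only one '/' can ever be pending, the "buffer" here is implicit in the BEGIN_CMT state itself. A version reading with getchar could do the same and would not need a separate array.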

In the dot diagram, sq is a single quote ', dq is a double quote ".

digraph state_machine {
    rankdir=LR;
    size="8,5";

    node [shape=doublecircle]; S0 /* init */;
    node [shape=circle];

    S0  /* init */      -> S1  /* begin_cmt */ [label = "'/'"];
    S0  /* init */      -> S2  /* in_str */    [label = dq];
    S0  /* init */      -> S3  /* in_ch */     [label = sq];
    S0  /* init */      -> S0  /* init */      [label = "<ch>"];
    S1  /* begin_cmt */ -> S4  /* in_slc */    [label = "'/'"];
    S1  /* begin_cmt */ -> S5  /* in_mlc */    [label = "'*'"];
    S1  /* begin_cmt */ -> S0  /* init */      [label = "<ch>"];
    S1  /* begin_cmt */ -> S1  /* begin_cmt */ [label = "'\\n'"]; // handle "/\n/" and "/\n*"
    S2  /* in_str */    -> S0  /* init */      [label = dq];
    S2  /* in_str */    -> S6  /* str_esc */   [label = "'\\'"];
    S2  /* in_str */    -> S2  /* in_str */    [label = "<ch>"];
    S3  /* in_ch */     -> S0  /* init */      [label = sq];
    S4  /* in_slc */    -> S4  /* in_slc */    [label = "<ch>"];
    S4  /* in_slc */    -> S0  /* init */      [label = "'\\n'"];
    S5  /* in_mlc */    -> S7  /* end_mlc */   [label = "'*'"];
    S5  /* in_mlc */    -> S5  /* in_mlc */    [label = "<ch>"];
    S7  /* end_mlc */   -> S7  /* end_mlc */   [label = "'*'|'\\n'"];
    S7  /* end_mlc */   -> S0  /* init */      [label = "'/'"];
    S7  /* end_mlc */   -> S5  /* in_mlc */    [label = "<ch>"];
    S6  /* str_esc */   -> S8  /* oct */       [label = "[0-3]"];
    S6  /* str_esc */   -> S9  /* hex */       [label = "'x'"];
    S6  /* str_esc */   -> S2  /* in_str */    [label = "<ch>"];
    S8  /* oct */       -> S10 /* o1 */        [label = "[0-7]"];
    S10 /* o1 */        -> S2  /* in_str */    [label = "[0-7]"];
    S9  /* hex */       -> S11 /* h1 */        [label = hex];
    S11 /* h1 */        -> S2  /* in_str */    [label = hex];
    S3  /* in_ch */     -> S12 /* ch_esc */    [label = "'\\'"];
    S3  /* in_ch */     -> S13 /* out_ch */    [label = "<ch>"];
    S13 /* out_ch */    -> S0  /* init */      [label = sq];
    S12 /* ch_esc */    -> S3  /* in_ch */     [label = sq];
    S12 /* ch_esc */    -> S12 /* ch_esc */    [label = "<ch>"];
}
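As a rough illustration of coding a state table from the diagram, the sketch below covers only the comment-related states (S0, S1, S4, S5, S7). The identifiers classify, next_state and final_state are hypothetical names chosen for this example. One deliberate deviation, labelled as an assumption: a newline in S1 or S7 is treated like any other <ch> rather than as a self-loop. String and character-literal states are left out, and no output is produced; the function only tracks the state.

```c
#include <stddef.h>

/* State names reuse the diagram's labels; only the comment subset is modelled. */
enum state { S0_INIT, S1_BEGIN_CMT, S4_IN_SLC, S5_IN_MLC, S7_END_MLC, N_STATES };

/* Input classes: one table column per distinct edge label in this subset. */
enum cls { C_SLASH, C_STAR, C_NEWLINE, C_OTHER, N_CLASSES };

/* Map a character to its input class. */
enum cls classify(int ch)
{
    switch (ch) {
    case '/':  return C_SLASH;
    case '*':  return C_STAR;
    case '\n': return C_NEWLINE;
    default:   return C_OTHER;
    }
}

/* One row per state, one column per input class. C_OTHER plays the role
 * of <ch>. Assumption: newline in S1 and S7 behaves like <ch>. */
static const unsigned char next_state[N_STATES][N_CLASSES] = {
    /*                 '/'            '*'            '\n'          <ch>       */
    [S0_INIT]      = { S1_BEGIN_CMT,  S0_INIT,       S0_INIT,      S0_INIT   },
    [S1_BEGIN_CMT] = { S4_IN_SLC,     S5_IN_MLC,     S0_INIT,      S0_INIT   },
    [S4_IN_SLC]    = { S4_IN_SLC,     S4_IN_SLC,     S0_INIT,      S4_IN_SLC },
    [S5_IN_MLC]    = { S5_IN_MLC,     S7_END_MLC,    S5_IN_MLC,    S5_IN_MLC },
    [S7_END_MLC]   = { S0_INIT,       S7_END_MLC,    S5_IN_MLC,    S5_IN_MLC },
};

/* Run the table over a NUL-terminated string in one loop and
 * return the state reached at the end of the input. */
enum state final_state(const char *src)
{
    enum state st = S0_INIT;
    for (size_t i = 0; src[i] != '\0'; i++)
        st = (enum state)next_state[st][classify((unsigned char)src[i])];
    return st;
}
```

Since the answer treats end of input in any state other than S0 as an error, a caller could compare the result against S0_INIT. Extending the sketch to strings and character literals would mean adding rows for the remaining states and columns for the quote, backslash, digit and hex classes.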
