Scanner terminating early


Question

I am trying to write a scanner in Go that scans continuation lines and also cleans each line up before returning it, so that it returns logical lines. So, given the following split function (Play):

func ScanLogicalLines(data []byte, atEOF bool) (int, []byte, error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }

    i := bytes.IndexByte(data, '\n')
    for i > 0 && data[i-1] == '\\' {
        fmt.Printf("i: %d, data[i] = %q\n", i, data[i])
        i = i + bytes.IndexByte(data[i+1:], '\n')
    }

    var match []byte = nil
    advance := 0
    switch {
    case i >= 0:
        advance, match = i + 1, data[0:i]
    case atEOF: 
        advance, match = len(data), data
    }
    token := bytes.Replace(match, []byte("\\\n"), []byte(""), -1)
    return advance, token, nil
}

func main() {
    simple := `
Just a test.

See what is returned. \
when you have empty lines.

Followed by a newline.
`

    scanner := bufio.NewScanner(strings.NewReader(simple))
    scanner.Split(ScanLogicalLines)
    for scanner.Scan() {
        fmt.Printf("line: %q\n", scanner.Text())
    }
}

I expected the code to return something like:

line: "Just a test."
line: ""
line: "See what is returned, when you have empty lines."
line: ""
line: "Followed by a newline."

However, it stops after returning the first line. The second call returns 1, "", nil.

Does anybody have any ideas, or is this a bug?

Recommended answer

I would regard this as a bug, because an advance value > 0 is not intended to trigger a further read call, even when the returned token is nil (bufio.SplitFunc):


If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, SplitFunc can return (0, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.



What happens

The input buffer of the bufio.Scanner defaults to 4096 bytes. That means it reads up to this amount at once if it can, and then executes the split function. In your case the scanner can read your input all at once, as it is well below 4096 bytes. This means that the next read it attempts results in EOF, which is the main problem here.


  1. scanner.Scan reads all of your data.
  2. Your split function receives all of the text that is there.
  3. You look for a token and find the first newline, on a line that consists of just that newline.
  4. Removing the newline from the match leaves nothing, so you return nil as the token.
  5. scanner.Scan assumes that the split function needs more data.
  6. scanner.Scan attempts to read more.
  7. EOF happens.
  8. scanner.Scan tries to tokenize one last time.
  9. You find "Just a test."
  10. On the next call, scanner.Scan tries to tokenize the remaining data.
  11. You look for a token and find the third line, which consists of just a newline.
  12. Again you return nil as the token after removing the newline from the match.
  13. scanner.Scan sees the nil token together with the error (EOF) and gives up.
  14. Execution ends.



How to circumvent

Any non-nil token will prevent this. As long as you return non-nil tokens, the scanner will not check for EOF and will continue executing your tokenizer.

The reason why your code returns nil tokens is that bytes.Replace returns nil when its input is empty and there is nothing to replace: it copies the input with append, and append([]byte(nil), nil...) == nil. You could prevent this by returning a slice with a capacity and no elements, as this would be non-nil: make([]byte, 0, 1) != nil.
