Scanner terminating early
Problem description
I am trying to write a scanner in Go that handles continuation lines and cleans each line up before returning it, so that it returns logical lines. Given the following split function (Play):
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "strings"
)

// ScanLogicalLines is a bufio.SplitFunc that joins lines ending in a backslash.
func ScanLogicalLines(data []byte, atEOF bool) (int, []byte, error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    // Look for a newline that is not preceded by a backslash.
    i := bytes.IndexByte(data, '\n')
    for i > 0 && data[i-1] == '\\' {
        fmt.Printf("i: %d, data[i] = %q\n", i, data[i])
        i = i + bytes.IndexByte(data[i+1:], '\n')
    }
    var match []byte = nil
    advance := 0
    switch {
    case i >= 0:
        advance, match = i+1, data[0:i]
    case atEOF:
        advance, match = len(data), data
    }
    // Drop the backslash-newline pairs from the matched line.
    token := bytes.Replace(match, []byte("\\\n"), []byte(""), -1)
    return advance, token, nil
}

func main() {
    simple := `
Just a test.

See what is returned. \
when you have empty lines.

Followed by a newline.
`
    scanner := bufio.NewScanner(strings.NewReader(simple))
    scanner.Split(ScanLogicalLines)
    for scanner.Scan() {
        fmt.Printf("line: %q\n", scanner.Text())
    }
}
I expected the code to return something like:
line: "Just a test."
line: ""
line: "See what is returned, when you have empty lines."
line: ""
line: "Followed by a newline."
However, it stops after returning the first line. The second call returns 1, "", nil.
Does anybody have any ideas, or is it a bug?
Recommended answer
I would regard this as a bug, because returning an advance value > 0 is not meant to trigger a further read, even when the returned token is nil (see bufio.SplitFunc):
"If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, SplitFunc can return (0, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input."
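To make the quoted contract concrete, here is a minimal sketch of a split function that asks for more data by returning (0, nil, nil). The names splitLines and scanAll are made up for this illustration, and iotest.OneByteReader is used only to force many short reads so that the "read more" path is actually exercised.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
	"testing/iotest"
)

// splitLines is a hypothetical SplitFunc used only for illustration.
// It returns (0, nil, nil) whenever data holds no complete line yet,
// which asks the Scanner to read more input and call it again.
func splitLines(data []byte, atEOF bool) (int, []byte, error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil // nothing left to tokenize
	}
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		return i + 1, data[:i], nil // complete line, newline dropped
	}
	if atEOF {
		return len(data), data, nil // final line without a trailing newline
	}
	return 0, nil, nil // incomplete line: request more data
}

// scanAll collects every token the Scanner produces for input.
func scanAll(input string) []string {
	// OneByteReader delivers one byte per Read call.
	sc := bufio.NewScanner(iotest.OneByteReader(strings.NewReader(input)))
	sc.Split(splitLines)
	var tokens []string
	for sc.Scan() {
		tokens = append(tokens, sc.Text())
	}
	return tokens
}

func main() {
	fmt.Printf("%q\n", scanAll("alpha\nbeta\ngamma"))
}
```

Here advance stays 0 every time the token is nil, which is the case the documentation describes. The question's function differs in that it returns a nil token together with an advance of 1.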
What happens is this
The input buffer of the bufio.Scanner defaults to 4096 bytes. That means that it reads up to this amount at once if it can, and then executes the split function. In your case the scanner can read your input all at once, as it is well below 4096 bytes. This means that the next read it performs results in EOF, which is the main problem here.
- scanner.Scan reads all your data
- You get all the text that is there
- You look for a token, you find the first newline, which is only one newline
- You return nil as a token by removing the newline from the match
- scanner.Scan assumes: user needs more data
- scanner.Scan attempts to read more
- EOF happens
- scanner.Scan tries to tokenize one last time
- You find "Just a test."
- scanner.Scan tries to tokenize one last time
- You look for a token, you find the third line, which is only one newline
- You return nil as a token by removing the newline from the match
- scanner.Scan sees the nil token and sets the error (EOF)
- Execution ends
How to circumvent
Any token that is non-nil will prevent this. As long as you return non-nil tokens, the scanner will not check for EOF and continues executing your tokenizer.
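One possible way to apply this to the split function from the question is sketched below. It is a rewrite under my own assumptions, not the asker's code: the names scanLogicalLinesFixed, logicalLines and sample are made up for illustration, the continuation search is rewritten to track absolute offsets into data, and a nil result from bytes.Replace is swapped for an empty, non-nil slice.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// sample mirrors the input from the question, written with explicit escapes.
const sample = "\nJust a test.\n\nSee what is returned. \\\nwhen you have empty lines.\n\nFollowed by a newline.\n"

// scanLogicalLinesFixed is a hypothetical variant of the question's split
// function that never returns a nil token together with advance > 0.
func scanLogicalLinesFixed(data []byte, atEOF bool) (int, []byte, error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Find the first newline that is not preceded by a backslash.
	end := -1
	for off := 0; off < len(data); {
		j := bytes.IndexByte(data[off:], '\n')
		if j < 0 {
			break
		}
		k := off + j // absolute index of this newline in data
		if k > 0 && data[k-1] == '\\' {
			off = k + 1 // continuation line: keep searching
			continue
		}
		end = k
		break
	}
	var match []byte
	advance := 0
	switch {
	case end >= 0:
		advance, match = end+1, data[:end]
	case atEOF:
		advance, match = len(data), data
	default:
		return 0, nil, nil // no complete logical line yet: request more data
	}
	token := bytes.Replace(match, []byte("\\\n"), nil, -1)
	if token == nil {
		token = []byte{} // empty but non-nil, so the Scanner reports an empty line
	}
	return advance, token, nil
}

// logicalLines collects all tokens produced for input.
func logicalLines(input string) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(scanLogicalLinesFixed)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines
}

func main() {
	for _, l := range logicalLines(sample) {
		fmt.Printf("line: %q\n", l)
	}
}
```

With this sample the loop should yield six lines, three of them empty, because the input also starts with a newline. The joined line keeps the text exactly as written, so it reads "See what is returned. when you have empty lines.".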
The reason your code returns nil tokens is that bytes.Replace returns nil when there is nothing to be done on an empty match: append([]byte(nil), nil...) == nil.
You could prevent this by returning a slice with a capacity and no elements, as this would be non-nil: make([]byte, 0, 1) != nil.
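The nil versus empty distinction can be checked with a small program. This is only a sketch: replaceIsNil is a made-up helper, and the nil result relies on how bytes.Replace copies its input when nothing matches, which is an implementation detail rather than a documented guarantee.

```go
package main

import (
	"bytes"
	"fmt"
)

// replaceIsNil reports whether bytes.Replace returns a nil slice for match
// when removing backslash-newline pairs, as the question's code does.
func replaceIsNil(match []byte) bool {
	return bytes.Replace(match, []byte("\\\n"), []byte(""), -1) == nil
}

func main() {
	data := []byte("\nJust a test.\n")
	empty := data[0:0] // an empty line: zero length, but not nil

	fmt.Println(empty == nil)              // false
	fmt.Println(replaceIsNil(empty))       // true: the copy of an empty slice is nil
	fmt.Println(replaceIsNil(data[1:13]))  // false: a non-empty line is copied
	fmt.Println(make([]byte, 0, 1) == nil) // false: capacity without elements
}
```

So the empty match itself is not nil. The token only becomes nil after it has passed through bytes.Replace.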