Read log file from the end and get the offset of a particular string


Problem description

For example, given this log file:

  • Start
  • Line1
  • Line2
  • Line3
  • End

I am able to get the seek position of Line1 when I read the file from the beginning.

// getSeekLocation scans logFile from the start and returns the distance in
// bytes from the end of the first line containing "Line1" to EOF.
// logFile and getFileSize are not shown in the question.
func getSeekLocation() int64 {
    start := int64(0)
    input, err := os.Open(logFile)
    if err != nil {
        fmt.Println(err)
    }
    defer input.Close()
    if _, err := input.Seek(start, io.SeekStart); err != nil {
        fmt.Println(err)
    }
    scanner := bufio.NewScanner(input)

    // Wrap bufio.ScanLines to track how many bytes have been consumed.
    pos := start
    scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
        advance, token, err = bufio.ScanLines(data, atEOF)
        pos += int64(advance)
        return
    }
    scanner.Split(scanLines)
    for scanner.Scan() {
        if strings.Contains(scanner.Text(), "Line1") {
            break
        }
    }
    size, err := getFileSize()
    if err != nil {
        fmt.Println(err)
    }
    return size - pos
}

But this is not an efficient way to solve the problem, because the time to get the location grows with the file size. I would like to get the location of the line starting from EOF, which I think would be more efficient.

Recommended answer

Note: I optimized and improved the solution below, and released it as a library here: github.com/icza/backscanner

bufio.Scanner uses an io.Reader as its source, which does not support seeking or reading from arbitrary positions, so it is not capable of scanning lines from the end. bufio.Scanner can only read a part of the input once all data preceding it has already been read (that is, it can only read the end of the file if it reads all of the file's content first).

So we need a custom solution to implement such functionality. Fortunately, os.File does support reading from arbitrary positions, as it implements both io.Seeker and io.ReaderAt (either of them would be sufficient to do what we need).
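
To make the point about arbitrary positions concrete, here is a minimal sketch of reading only the last few bytes of an input through io.ReaderAt. The helper names tail and tailString are assumptions made for this illustration; they are not part of the answer or of any library.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// tail (hypothetical helper) returns the last n bytes of r, whose total
// length is size. The bytes before the requested range are never read.
func tail(r io.ReaderAt, size, n int64) ([]byte, error) {
	if n > size {
		n = size
	}
	buf := make([]byte, n)
	// io.ReaderAt permits returning io.EOF together with a full read when
	// the read ends exactly at the end of the input, so tolerate that case.
	read, err := r.ReadAt(buf, size-n)
	if err != nil && !(err == io.EOF && int64(read) == n) {
		return nil, err
	}
	return buf, nil
}

// tailString (hypothetical helper) is a convenience wrapper for in-memory input.
func tailString(s string, n int64) (string, error) {
	b, err := tail(strings.NewReader(s), int64(len(s)), n)
	return string(b), err
}

func main() {
	const src = "Start\nLine1\nLine2\nLine3\nEnd"
	last, err := tailString(src, 3)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%q\n", last) // prints "End"
}
```

The same tail function would accept an *os.File, since it only relies on the io.ReaderAt interface.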

Let's construct a Scanner which scans lines backward, starting with the last line. For this, we'll utilize an io.ReaderAt. The following implementation uses an internal buffer into which data is read in chunks, starting from the end of the input. The size of the input must also be passed (it is basically the position where we want to start reading from, which need not be the end position).

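package main

import (
    "bytes"
    "fmt"
    "io"
    "strings"
)

// Scanner reads lines from an io.ReaderAt backward, last line first.
type Scanner struct {
    r   io.ReaderAt
    pos int    // offset in the input of the first byte held in buf
    err error  // sticky error; io.EOF once the start has been reached
    buf []byte // data read so far that has not been returned yet
}

// NewScanner returns a Scanner that reads backward from offset pos.
func NewScanner(r io.ReaderAt, pos int) *Scanner {
    return &Scanner{r: r, pos: pos}
}

// readMore prepends the preceding chunk (up to 1024 bytes) to buf.
func (s *Scanner) readMore() {
    if s.pos == 0 {
        s.err = io.EOF
        return
    }
    size := 1024
    if size > s.pos {
        size = s.pos
    }
    s.pos -= size
    buf2 := make([]byte, size, size+len(s.buf))

    // ReadAt attempts to read the full buffer!
    _, s.err = s.r.ReadAt(buf2, int64(s.pos))
    if s.err == nil {
        s.buf = append(buf2, s.buf...)
    }
}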

func (s *Scanner) Line() (line string, start int, err error) {
    if s.err != nil {
        return "", 0, s.err
    }
    for {
        lineStart := bytes.LastIndexByte(s.buf, '\n')
        if lineStart >= 0 {
            // We have a complete line:
            var line string
            line, s.buf = string(dropCR(s.buf[lineStart+1:])), s.buf[:lineStart]
            return line, s.pos + lineStart + 1, nil
        }
        // Need more data:
        s.readMore()
        if s.err != nil {
            if s.err == io.EOF {
                if len(s.buf) > 0 {
                    return string(dropCR(s.buf)), 0, nil
                }
            }
            return "", 0, s.err
        }
    }
}

// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
    if len(data) > 0 && data[len(data)-1] == '\r' {
        return data[0 : len(data)-1]
    }
    return data
}

Example usage:

func main() {
    scanner := NewScanner(strings.NewReader(src), len(src))
    for {
        line, pos, err := scanner.Line()
        if err != nil {
            fmt.Println("Error:", err)
            break
        }
        fmt.Printf("Line start: %2d, line: %s\n", pos, line)
    }
}

const src = `Start
Line1
Line2
Line3
End`

Output (try it on the Go Playground):

Line start: 24, line: End
Line start: 18, line: Line3
Line start: 12, line: Line2
Line start:  6, line: Line1
Line start:  0, line: Start
Error: EOF

Notes:

  • The above Scanner does not limit the maximum length of lines; it handles lines of any length.
  • The above Scanner handles both \n and \r\n line endings (ensured by the dropCR() function).
  • You may pass any starting position, not just the size / length, and lines will be listed from there (continuation).
  • The above Scanner does not reuse buffers; it always creates new ones when needed. It would be enough to (pre)allocate 2 buffers and use those wisely, but the implementation would become more complex, and it would introduce a maximum line length limit.
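
The line-ending note can be seen in isolation with a small sketch of the core step inside Line(): cut the buffer at its last \n and strip a trailing \r. The helpers cutLastLine and linesBackward are hypothetical names introduced here for illustration; dropCR is the same function as in the answer.

```go
package main

import (
	"bytes"
	"fmt"
)

// dropCR drops a terminal \r from the data (same helper as in the answer).
func dropCR(data []byte) []byte {
	if len(data) > 0 && data[len(data)-1] == '\r' {
		return data[:len(data)-1]
	}
	return data
}

// cutLastLine (hypothetical helper) splits buf at its last '\n'.
// ok is false when buf holds no '\n', meaning more data would be needed.
func cutLastLine(buf []byte) (line string, rest []byte, ok bool) {
	i := bytes.LastIndexByte(buf, '\n')
	if i < 0 {
		return "", buf, false
	}
	return string(dropCR(buf[i+1:])), buf[:i], true
}

// linesBackward (hypothetical helper) returns the lines of s, last line first.
func linesBackward(s string) []string {
	var out []string
	buf := []byte(s)
	for {
		line, rest, ok := cutLastLine(buf)
		if !ok {
			// Whatever remains is the first line of the input.
			return append(out, string(dropCR(buf)))
		}
		out = append(out, line)
		buf = rest
	}
}

func main() {
	// Mixed \r\n and \n endings in one input.
	for _, line := range linesBackward("Start\r\nLine1\nEnd") {
		fmt.Printf("%q\n", line)
	}
}
```

This sketch holds the whole input in memory, so it only illustrates the splitting logic, not the chunked reading that makes the Scanner efficient on large files.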

To use this Scanner with a file, you may use os.Open() to open the file. Note that *os.File implements io.ReaderAt. Then you may use File.Stat() to obtain info about the file (os.FileInfo), including its size (length):

f, err := os.Open("a.txt")
if err != nil {
    panic(err)
}
// Defer Close right after a successful Open so the file is also closed
// if Stat fails.
defer f.Close()

fi, err := f.Stat()
if err != nil {
    panic(err)
}

scanner := NewScanner(f, int(fi.Size()))
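
As a rough end-to-end check that a real file behaves the same way as the in-memory reader, the sketch below writes a small temporary file, opens it, obtains its size with Stat(), and reads its last bytes with ReadAt(). The helpers openWithSize and writeTemp and the temp file name pattern are assumptions made for this illustration only.

```go
package main

import (
	"fmt"
	"os"
)

// openWithSize (hypothetical helper) opens path and reports its current size.
// The caller is responsible for closing the returned file.
func openWithSize(path string) (*os.File, int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, 0, err
	}
	fi, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, 0, err
	}
	return f, fi.Size(), nil
}

// writeTemp (hypothetical helper) creates a temporary file holding content.
// It exists only to make this demo self-contained.
func writeTemp(content string) (string, func(), error) {
	f, err := os.CreateTemp("", "backscan-*.log")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.Remove(f.Name()) }
	_, err = f.WriteString(content)
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		cleanup()
		return "", nil, err
	}
	return f.Name(), cleanup, nil
}

func main() {
	path, cleanup, err := writeTemp("Start\nLine1\nEnd")
	if err != nil {
		panic(err)
	}
	defer cleanup()

	f, size, err := openWithSize(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Read the last 3 bytes directly; nothing before them is read.
	buf := make([]byte, 3)
	if _, err := f.ReadAt(buf, size-3); err != nil {
		panic(err)
	}
	fmt.Printf("size=%d tail=%q\n", size, buf) // prints size=15 tail="End"
}
```

Note that the size is a snapshot: if the log is still being written to, lines appended after Stat() will not be seen by a scanner started at that size.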

Looking for a substring in a line

If you're looking for a substring in a line, then simply use the above Scanner, which reads lines from the end and returns the starting position of each line.

You may check for the substring in each line using strings.Index(), which returns the substring's position inside the line. If it is found, add the line's start position to that value to get the global position.

Let's say we're looking for the "ine2" substring (which is part of the "Line2" line). Here's how you can do that:

scanner := NewScanner(strings.NewReader(src), len(src))
what := "ine2"
for {
    line, pos, err := scanner.Line()
    if err != nil {
        fmt.Println("Error:", err)
        break
    }
    fmt.Printf("Line start: %2d, line: %s\n", pos, line)

    if i := strings.Index(line, what); i >= 0 {
        fmt.Printf("Found %q at line position: %d, global position: %d\n",
            what, i, pos+i)
        break
    }
}

Output (try it on the Go Playground):

Line start: 24, line: End
Line start: 18, line: Line3
Line start: 12, line: Line2
Found "ine2" at line position: 1, global position: 13
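
The question's own function returns a distance from the end of the file (size minus position) rather than an absolute offset. The conversion is a single subtraction, and the result can be checked by seeking relative to the end. Below is a minimal sketch of that check; the helper names readFromEnd and checkOffset are assumptions introduced here, not part of the answer.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readFromEnd (hypothetical helper) reads n bytes starting fromEnd bytes
// before the end of r.
func readFromEnd(r io.ReadSeeker, fromEnd int64, n int) (string, error) {
	if _, err := r.Seek(-fromEnd, io.SeekEnd); err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// checkOffset (hypothetical helper) converts the global offset off into a
// distance from the end of src and reports whether what is found there.
func checkOffset(src, what string, off int) (bool, error) {
	fromEnd := int64(len(src) - off)
	got, err := readFromEnd(strings.NewReader(src), fromEnd, len(what))
	return got == what, err
}

func main() {
	const src = "Start\nLine1\nLine2\nLine3\nEnd"
	// 13 is the global position reported for "ine2" above; len(src) is 27,
	// so the substring starts 14 bytes before the end.
	ok, err := checkOffset(src, "ine2", 13)
	fmt.Println(ok, err) // prints: true <nil>
}
```

Since *os.File also implements io.ReadSeeker, readFromEnd should work unchanged on an opened log file.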
