Read log file from the end and get the offset of a particular string
Question
For example: 1. logfile
- Start
- Line1
- Line2
- Line3
- End
I am able to get the seek position of Line1 when I read the file from beginning.
func getSeekLocation() int64 {
	start := int64(0)
	input, err := os.Open(logFile)
	if err != nil {
		fmt.Println(err)
		return 0
	}
	defer input.Close()
	if _, err := input.Seek(start, io.SeekStart); err != nil {
		fmt.Println(err)
	}
	scanner := bufio.NewScanner(input)
	pos := start
	// Wrap bufio.ScanLines to count the bytes consumed so far.
	scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		advance, token, err = bufio.ScanLines(data, atEOF)
		pos += int64(advance)
		return
	}
	scanner.Split(scanLines)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "Line1") {
			break
		}
	}
	size, err := getFileSize()
	if err != nil {
		fmt.Println(err)
	}
	return size - pos
}
But this is not an efficient way to solve the problem because as the file size increases the time to get the location will also increase. I would like to get the location of the line from the EOF location which I think would be more efficient.
Recommended answer
Note: I optimized and improved the below solution, and released it as a library here: github.com/icza/backscanner
bufio.Scanner uses an io.Reader as its source, which does not support seeking and / or reading from arbitrary positions, so it is not capable of scanning lines from the end. bufio.Scanner can only read any part of the input once all data preceding it has already been read (that is, it can only read the end of the file if it reads all the file's content first).
So we need a custom solution to implement such functionality. Fortunately, os.File does support reading from arbitrary positions, as it implements both io.Seeker and io.ReaderAt (either of them would be sufficient for what we need).
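To illustrate reading from an arbitrary position, here is a minimal sketch that fetches only the last few bytes of an input through io.ReaderAt, without reading anything that precedes them. The helpers readTail and tailOfString are hypothetical names introduced for this illustration; they are not part of the answer's code or of any library. An in-memory strings.Reader stands in for a file here.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readTail returns the last n bytes of an input of the given size, using
// only io.ReaderAt. It is a hypothetical helper for illustration.
func readTail(r io.ReaderAt, size int64, n int) ([]byte, error) {
	if int64(n) > size {
		n = int(size)
	}
	buf := make([]byte, n)
	read, err := r.ReadAt(buf, size-int64(n))
	// ReadAt may report io.EOF even when it filled the whole buffer,
	// because the read ends exactly at the end of the input.
	if err == io.EOF && read == n {
		err = nil
	}
	return buf[:read], err
}

// tailOfString is a convenience wrapper for in-memory input.
func tailOfString(s string, n int) (string, error) {
	b, err := readTail(strings.NewReader(s), int64(len(s)), n)
	return string(b), err
}

func main() {
	tail, err := tailOfString("Start\nLine1\nLine2\nLine3\nEnd", 9)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%q\n", tail) // prints "Line3\nEnd"
}
```

Since *os.File also implements io.ReaderAt, the same readTail call should work with an opened file and the size reported by File.Stat().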
Let's construct a Scanner which scans lines backward, starting with the last line. For this, we'll utilize an io.ReaderAt. The following implementation uses an internal buffer into which data is read in chunks, starting from the end of the input. The size of the input must also be passed (this is basically the position where we want to start reading from, which need not be the end position).
type Scanner struct {
	r   io.ReaderAt
	pos int
	err error
	buf []byte
}

func NewScanner(r io.ReaderAt, pos int) *Scanner {
	return &Scanner{r: r, pos: pos}
}

func (s *Scanner) readMore() {
	if s.pos == 0 {
		s.err = io.EOF
		return
	}
	size := 1024
	if size > s.pos {
		size = s.pos
	}
	s.pos -= size
	buf2 := make([]byte, size, size+len(s.buf))
	// ReadAt attempts to read full buff!
	_, s.err = s.r.ReadAt(buf2, int64(s.pos))
	if s.err == nil {
		s.buf = append(buf2, s.buf...)
	}
}

func (s *Scanner) Line() (line string, start int, err error) {
	if s.err != nil {
		return "", 0, s.err
	}
	for {
		lineStart := bytes.LastIndexByte(s.buf, '\n')
		if lineStart >= 0 {
			// We have a complete line:
			var line string
			line, s.buf = string(dropCR(s.buf[lineStart+1:])), s.buf[:lineStart]
			return line, s.pos + lineStart + 1, nil
		}
		// Need more data:
		s.readMore()
		if s.err != nil {
			if s.err == io.EOF {
				if len(s.buf) > 0 {
					return string(dropCR(s.buf)), 0, nil
				}
			}
			return "", 0, s.err
		}
	}
}

// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
	if len(data) > 0 && data[len(data)-1] == '\r' {
		return data[0 : len(data)-1]
	}
	return data
}
Example using it:
func main() {
	scanner := NewScanner(strings.NewReader(src), len(src))
	for {
		line, pos, err := scanner.Line()
		if err != nil {
			fmt.Println("Error:", err)
			break
		}
		fmt.Printf("Line start: %2d, line: %s\n", pos, line)
	}
}

const src = `Start
Line1
Line2
Line3
End`
Output (try it on the Go Playground):
Line start: 24, line: End
Line start: 18, line: Line3
Line start: 12, line: Line2
Line start: 6, line: Line1
Line start: 0, line: Start
Error: EOF
Notes:
- The above Scanner does not limit the maximum length of lines; it handles lines of any length.
- The above Scanner handles both \n and \r\n line endings (ensured by the dropCR() function).
- You may pass any starting position, not just the size / length, and lines will be listed from there (continuation).
- The above Scanner does not reuse buffers; it always creates new ones when needed. It would be enough to (pre)allocate 2 buffers and use those wisely. The implementation would become more complex, and it would introduce a max line length limit.
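The note on line endings can be checked in isolation. The following small sketch copies the dropCR() logic from the answer and applies it to pieces obtained by splitting on \n; the input string and the use of bytes.Split are assumptions made for this illustration only.

```go
package main

import (
	"bytes"
	"fmt"
)

// dropCR drops a terminal \r from the data (same logic as in the answer).
func dropCR(data []byte) []byte {
	if len(data) > 0 && data[len(data)-1] == '\r' {
		return data[:len(data)-1]
	}
	return data
}

func main() {
	// Mixed line endings: the first line ends in \r\n, the second in \n.
	data := []byte("Line1\r\nLine2\nLine3")
	for _, raw := range bytes.Split(data, []byte{'\n'}) {
		// Show each raw piece next to its cleaned form.
		fmt.Printf("%q -> %q\n", raw, dropCR(raw))
	}
}
```

Splitting on \n alone leaves a trailing \r on lines that used \r\n, which is why the Scanner passes every line through dropCR() before returning it.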
To use this Scanner with a file, you may use os.Open() to open the file. Note that *os.File implements io.ReaderAt. Then you may use File.Stat() to obtain info about the file (os.FileInfo), including its size (length):
f, err := os.Open("a.txt")
if err != nil {
	panic(err)
}
// Defer the close right after a successful open.
defer f.Close()
fi, err := f.Stat()
if err != nil {
	panic(err)
}
scanner := NewScanner(f, int(fi.Size()))
Looking for a substring in a line
If you're looking for a substring in a line, then simply use the above Scanner, which returns the starting position of each line, reading lines from the end.
You may check for the substring in each line using strings.Index(), which returns the substring's position inside the line; if it is found, add the line's start position to it.
Let's say we're looking for the "ine2" substring (which is part of the "Line2" line). Here's how you can do that:
scanner := NewScanner(strings.NewReader(src), len(src))
what := "ine2"
for {
	line, pos, err := scanner.Line()
	if err != nil {
		fmt.Println("Error:", err)
		break
	}
	fmt.Printf("Line start: %2d, line: %s\n", pos, line)
	if i := strings.Index(line, what); i >= 0 {
		fmt.Printf("Found %q at line position: %d, global position: %d\n",
			what, i, pos+i)
		break
	}
}
Output (try it on the Go Playground):
Line start: 24, line: End
Line start: 18, line: Line3
Line start: 12, line: Line2
Found "ine2" at line position: 1, global position: 13
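The question asks for the offset measured from EOF, which is the input size minus the global position. As an alternative to the line-based Scanner, the sketch below searches raw chunks backward with bytes.LastIndex, carrying a few bytes across chunk boundaries so that a match split between two chunks is still found. This is a different technique from the answer's, and lastOffset, lastOffsetInString and the chunkSize parameter are hypothetical names introduced for this illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// lastOffset is a hypothetical helper (not part of the answer above).
// It reads the first size bytes of r backward in chunks of chunkSize bytes
// and returns the offset of the last occurrence of what, or -1 if absent.
func lastOffset(r io.ReaderAt, size int64, what []byte, chunkSize int) (int64, error) {
	if len(what) == 0 || chunkSize <= 0 {
		return -1, errors.New("empty search string or non-positive chunk size")
	}
	overlap := len(what) - 1 // bytes a match may extend past a chunk's end
	var carry []byte         // head of the data that follows the current chunk
	for end := size; end > 0; {
		start := end - int64(chunkSize)
		if start < 0 {
			start = 0
		}
		n := int(end - start)
		buf := make([]byte, n, n+len(carry))
		read, err := r.ReadAt(buf, start)
		// A full read that ends exactly at EOF may still report io.EOF.
		if err != nil && !(err == io.EOF && read == n) {
			return -1, err
		}
		buf = append(buf, carry...)
		if i := bytes.LastIndex(buf, what); i >= 0 {
			return start + int64(i), nil
		}
		// Keep the head of this window so that a match straddling the
		// boundary with the preceding chunk is still found.
		if len(buf) > overlap {
			buf = buf[:overlap]
		}
		carry, end = buf, start
	}
	return -1, nil
}

// lastOffsetInString runs lastOffset over an in-memory string.
func lastOffsetInString(s, what string, chunkSize int) int64 {
	off, err := lastOffset(strings.NewReader(s), int64(len(s)), []byte(what), chunkSize)
	if err != nil {
		return -1
	}
	return off
}

func main() {
	const src = "Start\nLine1\nLine2\nLine3\nEnd"
	// A tiny chunk size is used here only to exercise chunk boundaries.
	off := lastOffsetInString(src, "ine2", 4)
	fmt.Printf("offset from start: %d, from EOF: %d\n", off, int64(len(src))-off)
	// prints: offset from start: 13, from EOF: 14
}
```

For a real log file, an *os.File and the size from File.Stat() could be passed to lastOffset in place of the strings.Reader, with a larger chunk size such as a few kilobytes.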