How to read a file starting from a specific line number using Scanner?

Problem description

I am new to Go and I am trying to write a simple script that reads a file line by line. I also want to save the progress (i.e. the last line number that was read) on the filesystem somewhere so that if the same file was given as the input to the script again, it starts reading the file from the line where it left off. Following is what I have started off with.

package main

// Package Imports
import (
    "bufio"
    "flag"
    "fmt"
    "log"
    "os"
)

// Variable Declaration
var (
    ConfigFile = flag.String("configfile", "../config.json", "Path to json configuration file.")
)

// The main function that reads the file and parses the log entries
func main() {
    flag.Parse()
    settings := NewConfig(*ConfigFile)

    inputFile, err := os.Open(settings.Source)
    if err != nil {
        log.Fatal(err)
    }
    defer inputFile.Close()

    scanner := bufio.NewScanner(inputFile)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }

    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}

// Saves the current progress
func SaveProgress() {

}

// Get the line count from the progress to make sure
func GetCounter() {

}

I could not find any methods that deal with line numbers in the bufio package. I know I can declare an integer, say counter := 0, and increment it each time a line is read, like counter++. But the next time, how do I tell the scanner to start from a specific line? For example, if I read up to line 30, how can I make the scanner start reading from line 31 the next time I run the script with the same input file?

Update

One solution I can think of here is to use the counter as I stated above and use an if condition like the following.

    scanner := bufio.NewScanner(inputFile)
    for scanner.Scan() {
        counter++ // count every line, including the ones being skipped
        if counter > progress {
            fmt.Println(scanner.Text())
        }
    }

I am pretty sure something like this would work, but it is still going to loop over the lines that we have already read. Please suggest a better way.

Solution

If you don't want to read but just skip the lines you read previously, you need to acquire the position where you left off.

The different solutions are presented in the form of a function which takes the input to read from and the start position (a byte position) to start reading lines from, e.g.:

func solution(input io.ReadSeeker, start int64) error

A special io.Reader input is used which also implements io.Seeker, the common interface which allows skipping data without having to read it. *os.File implements this, so you can pass an *os.File to these functions. The "merged" interface of both io.Reader and io.Seeker is io.ReadSeeker.

If you want a clean start (to start reading from the beginning of the file), simply pass start = 0. To resume earlier processing, pass the byte position where the last run stopped or was aborted. This position is the value of the pos local variable in the functions (solutions) below.

All the examples below with their testing code can be found on the Go Playground.

1. With bufio.Scanner

bufio.Scanner does not maintain the position, but we can very easily extend it to maintain the position (the read bytes), so when we want to restart next, we can seek to this position.

In order to do this with minimal effort, we can use a new split function which splits the input into tokens (lines). We can use Scanner.Split() to set the split function (the logic that decides where the boundaries of tokens/lines are). The default split function is bufio.ScanLines().

Let's take a look at the split function declaration: bufio.SplitFunc

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)

It returns the number of bytes to advance: advance. Exactly what we need to maintain the file position. So we can create a new split function using the builtin bufio.ScanLines(), so we don't even have to implement its logic, just use the advance return value to maintain position:

func withScanner(input io.ReadSeeker, start int64) error {
    fmt.Println("--SCANNER, start:", start)
    if _, err := input.Seek(start, 0); err != nil {
        return err
    }
    scanner := bufio.NewScanner(input)

    pos := start
    scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
        advance, token, err = bufio.ScanLines(data, atEOF)
        pos += int64(advance)
        return
    }
    scanner.Split(scanLines)

    for scanner.Scan() {
        fmt.Printf("Pos: %d, Scanned: %s\n", pos, scanner.Text())
    }
    return scanner.Err()
}
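withScanner only prints pos. To resume later, the caller needs that value back. The sketch below is one possible variant under that assumption: scanFrom (a hypothetical name) passes each line to a callback, stops when the callback returns false, and returns the byte offset just past the last line delivered. resumeDemo is likewise only an illustration.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// scanFrom is a hypothetical variant of withScanner. It seeks to start,
// hands each line to handle, and returns the byte offset just past the
// last line delivered. handle returns false to stop early.
func scanFrom(input io.ReadSeeker, start int64, handle func(line string) bool) (int64, error) {
	if _, err := input.Seek(start, io.SeekStart); err != nil {
		return start, err
	}
	pos := start
	scanner := bufio.NewScanner(input)
	scanner.Split(func(data []byte, atEOF bool) (int, []byte, error) {
		advance, token, err := bufio.ScanLines(data, atEOF)
		pos += int64(advance) // track how many bytes the scanner consumed
		return advance, token, err
	})
	for scanner.Scan() {
		if !handle(scanner.Text()) {
			break // pos already points past the line just delivered
		}
	}
	return pos, scanner.Err()
}

// resumeDemo stops after the first line, then resumes from the returned offset.
func resumeDemo() (int64, []string) {
	const content = "first\r\nsecond\nthird\nfourth"
	pos, _ := scanFrom(strings.NewReader(content), 0, func(string) bool { return false })
	var rest []string
	scanFrom(strings.NewReader(content), pos, func(line string) bool {
		rest = append(rest, line)
		return true
	})
	return pos, rest
}
```

Note that bufio.Scanner limits the token size (bufio.MaxScanTokenSize, 64 KiB by default), so very long lines need Scanner.Buffer or the bufio.Reader approach.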

2. With bufio.Reader

In this solution we use the bufio.Reader type instead of the Scanner. bufio.Reader already has a ReadBytes() method which is very similar to the "read a line" functionality if we pass the '\n' byte as the delimiter.

This solution is similar to JimB's, with the addition of handling all valid line terminator sequences and also stripping them off from the read line (it is very rare they are needed); in regular expression notation, it is \r?\n.

func withReader(input io.ReadSeeker, start int64) error {
    fmt.Println("--READER, start:", start)
    if _, err := input.Seek(start, 0); err != nil {
        return err
    }

    r := bufio.NewReader(input)
    pos := start
    for {
        data, err := r.ReadBytes('\n')
        pos += int64(len(data))
        if err == nil || err == io.EOF {
            if len(data) > 0 && data[len(data)-1] == '\n' {
                data = data[:len(data)-1]
            }
            if len(data) > 0 && data[len(data)-1] == '\r' {
                data = data[:len(data)-1]
            }
            fmt.Printf("Pos: %d, Read: %s\n", pos, data)
        }
        if err != nil {
            if err != io.EOF {
                return err
            }
            break
        }
    }
    return nil
}
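The two suffix checks in withReader could also be factored into a small helper built on bytes.TrimSuffix. This is only a sketch; the name trimEOL is hypothetical and not part of the original answer.

```go
package main

import "bytes"

// trimEOL removes one trailing "\n" and then one trailing "\r" from line,
// mirroring the two checks in withReader. It is a hypothetical helper.
func trimEOL(line []byte) []byte {
	line = bytes.TrimSuffix(line, []byte("\n"))
	return bytes.TrimSuffix(line, []byte("\r"))
}
```

Inside the loop, the two if blocks would then become data = trimEOL(data), while pos must still be advanced by the untrimmed length.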

Note: If the content ends with an empty line (line terminator), this solution will process an empty line. If you don't want this, you can simply check it like this:

if len(data) != 0 {
    fmt.Printf("Pos: %d, Read: %s\n", pos, data)
} else {
    // Last line is empty, omit it
}

Testing the solutions:

The testing code simply uses the content "first\r\nsecond\nthird\nfourth", which contains multiple lines with varying line terminators. We will use strings.NewReader() to obtain an io.ReadSeeker whose source is a string.

The test code first calls withScanner() and withReader() passing a start position of 0: a clean start. In the next round we pass a start position of start = 14, which is the position of the third line ("first\r\n" and "second\n" are 7 bytes each), so we won't see the first 2 lines processed (printed): a resume simulation.

func main() {
    const content = "first\r\nsecond\nthird\nfourth"

    if err := withScanner(strings.NewReader(content), 0); err != nil {
        fmt.Println("Scanner error:", err)
    }
    if err := withReader(strings.NewReader(content), 0); err != nil {
        fmt.Println("Reader error:", err)
    }

    if err := withScanner(strings.NewReader(content), 14); err != nil {
        fmt.Println("Scanner error:", err)
    }
    if err := withReader(strings.NewReader(content), 14); err != nil {
        fmt.Println("Reader error:", err)
    }
}

Output:

--SCANNER, start: 0
Pos: 7, Scanned: first
Pos: 14, Scanned: second
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 0
Pos: 7, Read: first
Pos: 14, Read: second
Pos: 20, Read: third
Pos: 26, Read: fourth
--SCANNER, start: 14
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 14
Pos: 20, Read: third
Pos: 26, Read: fourth
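The start position 14 used in the second round can be double-checked with a few lines of code. The sketch below computes the byte offset at which a given line begins by counting '\n' bytes; lineOffset is a hypothetical helper written for this check, not part of the original answer.

```go
package main

import "strings"

// lineOffset returns the byte offset at which the n-th line (1-based) of
// content begins, or -1 if content has fewer than n lines.
// Hypothetical helper used only to verify the offsets above.
func lineOffset(content string, n int) int {
	off := 0
	for i := 1; i < n; i++ {
		j := strings.IndexByte(content[off:], '\n')
		if j < 0 {
			return -1
		}
		off += j + 1 // skip the line and its '\n' terminator
	}
	return off
}
```

For the test content, lineOffset(content, 3) gives 14, which matches the Pos value printed after "second" in the output above.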

Try the solutions and testing code on the Go Playground.
