How to read a file starting from a specific line number using Scanner?
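Question
I am new to Go and I am trying to write a simple script that reads a file line by line. I also want to save the progress (i.e. the last line number that was read) somewhere on the filesystem, so that if the same file is given as input to the script again, it starts reading from the line where it left off. The following is what I have started with.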
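package main

// Package imports
import (
	"bufio"
	"flag"
	"fmt"
	"log"
	"os"
)

// Variable declaration
var (
	ConfigFile = flag.String("configfile", "../config.json", "Path to json configuration file.")
)

// The main function that reads the file and parses the log entries
func main() {
	flag.Parse()
	// NewConfig is not shown in the question; it supplies settings.Source.
	settings := NewConfig(*ConfigFile)
	inputFile, err := os.Open(settings.Source)
	if err != nil {
		log.Fatal(err)
	}
	defer inputFile.Close()
	scanner := bufio.NewScanner(inputFile)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}

// Saves the current progress
func SaveProgress() {
}

// Get the line count from the progress to make sure
func GetCounter() {
}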
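I could not find any methods on bufio.Scanner that deal with line numbers. I know I can declare an integer, say counter := 0, and increment it each time a line is read, like counter++. But the next time, how do I tell the scanner to start from a specific line? For example, if I read up to line 30, the next time I run the script with the same input file, how can I make the scanner start reading from line 31?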
Update
One solution I can think of is to use the counter as described above, together with an if condition like the following.
scanner := bufio.NewScanner(inputFile)
for scanner.Scan() {
	counter++ // count every line, including the ones being skipped
	if counter > progress {
		fmt.Println(scanner.Text())
	}
}
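I am pretty sure something like this would work, but it is still going to loop over the lines that we have already read. Please suggest a better way.
Solution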
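If you don't want to read the lines you read previously but just skip them, you need to record the position where you left off.
The different solutions are presented in the form of a function which takes the input to read from and the start position (byte position) to start reading lines from, e.g.:
func solution(input io.ReadSeeker, start int64) error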
A special io.Reader input is used which also implements io.Seeker, the common interface which allows skipping data without having to read it. *os.File implements this, so you can pass an *os.File to these functions. The "merged" interface of io.Reader and io.Seeker is io.ReadSeeker.
If you want a clean start (to start reading from the beginning of the file), simply pass start = 0. If you want to resume previous processing, pass the byte position where the last processing was stopped or aborted. This position is the value of the pos local variable in the functions (solutions) below.
All the examples below, with their testing code, can be found on the Go Playground.
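1. With bufio.Scanner
bufio.Scanner does not maintain the position, but we can very easily extend it to maintain the position (the number of bytes read), so when we want to restart next time, we can seek to this position.
In order to do this with minimal effort, we can use a new split function, which splits the input into tokens (lines). We can use Scanner.Split() to set the split function (the logic that decides where the boundaries of tokens/lines are). The default split function is bufio.ScanLines().
Let's take a look at the split function declaration, bufio.SplitFunc:
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
It returns the number of bytes to advance: advance. This is exactly what we need to maintain the file position. So we can create a new split function using the built-in bufio.ScanLines(), which means we don't even have to implement its logic; we just use the advance return value to maintain the position: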
func withScanner(input io.ReadSeeker, start int64) error {
	fmt.Println("--SCANNER, start:", start)
	// Skip the part that was already processed.
	if _, err := input.Seek(start, io.SeekStart); err != nil {
		return err
	}
	scanner := bufio.NewScanner(input)

	pos := start
	// Wrap bufio.ScanLines and add up the bytes it consumes.
	scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		advance, token, err = bufio.ScanLines(data, atEOF)
		pos += int64(advance)
		return
	}
	scanner.Split(scanLines)

	for scanner.Scan() {
		fmt.Printf("Pos: %d, Scanned: %s\n", pos, scanner.Text())
	}
	return scanner.Err()
}
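2. With bufio.Reader
In this solution we use the bufio.Reader type instead of the Scanner. bufio.Reader already has a ReadBytes() method, which is very similar to the "read a line" functionality if we pass the '\n' byte as the delimiter.
This solution is similar to JimB's, with the addition of handling all valid line terminator sequences and also stripping them off from the read line (it is very rare that they are needed); in regular expression notation, the terminator is \r?\n.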
func withReader(input io.ReadSeeker, start int64) error {
	fmt.Println("--READER, start:", start)
	// Skip the part that was already processed.
	if _, err := input.Seek(start, io.SeekStart); err != nil {
		return err
	}

	r := bufio.NewReader(input)
	pos := start
	for {
		data, err := r.ReadBytes('\n')
		pos += int64(len(data)) // count the terminator bytes as well
		if err == nil || err == io.EOF {
			// Strip the line terminator: "\n" or "\r\n".
			if len(data) > 0 && data[len(data)-1] == '\n' {
				data = data[:len(data)-1]
			}
			if len(data) > 0 && data[len(data)-1] == '\r' {
				data = data[:len(data)-1]
			}
			fmt.Printf("Pos: %d, Read: %s\n", pos, data)
		}
		if err != nil {
			if err != io.EOF {
				return err
			}
			break
		}
	}
	return nil
}
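Note: If the content ends with an empty line (line terminator), this solution will process an empty line. If you don't want this, you can simply check for it like this:
if len(data) != 0 {
	fmt.Printf("Pos: %d, Read: %s\n", pos, data)
} else {
	// Last line is empty, omit it
}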
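Testing the solutions:
Testing code will simply use the content "first\r\nsecond\nthird\nfourth", which contains multiple lines with varying line terminators. We will use strings.NewReader() to obtain an io.ReadSeeker whose source is a string.
Test code first calls withScanner() and withReader() passing a start position of 0: a clean start. In the next round we pass a start position of start = 14, which is the position of the 3rd line, so we won't see the first 2 lines processed (printed): a resume simulation.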
func main() {
	const content = "first\r\nsecond\nthird\nfourth"

	// Clean start: read everything from the beginning.
	if err := withScanner(strings.NewReader(content), 0); err != nil {
		fmt.Println("Scanner error:", err)
	}
	if err := withReader(strings.NewReader(content), 0); err != nil {
		fmt.Println("Reader error:", err)
	}

	// Resume: start at byte 14, the beginning of the 3rd line.
	if err := withScanner(strings.NewReader(content), 14); err != nil {
		fmt.Println("Scanner error:", err)
	}
	if err := withReader(strings.NewReader(content), 14); err != nil {
		fmt.Println("Reader error:", err)
	}
}
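Output:
--SCANNER, start: 0
Pos: 7, Scanned: first
Pos: 14, Scanned: second
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 0
Pos: 7, Read: first
Pos: 14, Read: second
Pos: 20, Read: third
Pos: 26, Read: fourth
--SCANNER, start: 14
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 14
Pos: 20, Read: third
Pos: 26, Read: fourth
Try the solutions and testing code on the Go Playground.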