Way to check for duplicates before writing into a file?


Problem description

I wrote a small script that takes text files as input, reads every line, and tries to validate it as an email address. If the line passes, the script writes it to a new ('clean') file. If it doesn't pass, the script strips the spaces from it and validates it again: if it passes this time, the line is written to the new file; if it fails again, the line is ignored.

The problem is that, as it stands, my script may write duplicate emails to the output file. How should I work around that and check for duplicates already in the output file before writing?

Here is the relevant code:

// create reading and writing buffers
    scanner := bufio.NewScanner(r)
    writer := bufio.NewWriter(w)

    for scanner.Scan() {
        email := scanner.Text()

        // validate each email
        if !correctEmail.MatchString(email) {
            // if validation didn't pass, strip all spaces from the email
            email = strings.Replace(email, " ", "", -1)
            // validate the email again after cleaning
            if !correctEmail.MatchString(email) {
                // if validation didn't pass, ignore this email
                continue
            } else {
                // if validation passed, write clean email into file
                _, err = writer.WriteString(email + "\r\n")
                if err != nil {
                    return err
                }
            }

        } else {
            // if validation passed, write the email into file
            _, err = writer.WriteString(email + "\r\n")
            if err != nil {
                return err
            }
        }

    }

    err = writer.Flush()
    if err != nil {
        return err
    }

Recommended answer

You can use a Go built-in map as a set, like this:

package main

import (
    "fmt"
)

// emailSet holds every email seen so far; the map is used as a set.
var emailSet = make(map[string]bool)

func emailExists(email string) bool {
    _, ok := emailSet[email]
    return ok
}

func addEmail(email string) {
    emailSet[email] = true
}

func main() {
    emails := []string{
        "duplicated@golang.org",
        "abc@golang.org",
        "stackoverflow@golang.org",
        "duplicated@golang.org", // <- Duplicated!
    }
    for _, email := range emails {
        if !emailExists(email) {
            fmt.Println(email)
            addEmail(email)
        }
    }
}

Here is the output:

duplicated@golang.org
abc@golang.org
stackoverflow@golang.org

You can try the same code on The Go Playground.
