Python Sorting Script for Library Book Call No. (CSV file)

Problem description

I am writing a Python script to find duplicate entries in a CSV list of call numbers and titles. Here is the format of the CSV file:

920.105,George Mueller
920.105,George Mueller
920.105,George Mueller
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 3
286.003,This Day in Baptist History 3
286.003,This Day in Baptist History 3

What I need to do is find all of the duplicated call numbers that have different titles. I don't care about most of the entries, because they are duplicates of the same book; I'm looking for different books that were given the same call number. The script I have runs to completion with no errors, but when I open the file it created, the file is empty.
Here's my code:

#!/usr/bin/python3

import csv


def readerObject(csvFileName):
    """
    Opens and returns a reader object.
    """
    libFile = open(csvFileName)
    libReader = csv.reader(libFile)
    libData = list(libReader)
    return libData


def main():

    # Initialize the state variable
    state = 0

    # Prompt the user for the CSV file name
    fileName = input('Enter the CSV file to be read (Please use the full path): \n')
    # Open readerObject and copy its contents into a list
    csvToList = readerObject(fileName)
    loopList1 = list(csvToList)

    # Create writer object to... Write to
    fileToWrite = input('Enter the name of the file to write to: \n')
    libOutputFile = open(fileToWrite, 'w', newline='')
    libOutputWriter = csv.writer(libOutputFile)

    # Loop 1:
    for a in range(len(loopList1)):
        if state == 1:
            libOutputWriter.writerow(loopList2[0])
            del loopList1[0]
        loopList2 = list(csvToList)
        state = 0
        # Loop 2:
        for b in range(len(loopList2)):
            if loopList2[0][0] == loopList2[1][0]:
                if loopList2[0][1] != loopList2[1][1]:
                    libOutputWriter.writerow(loopList2[1])
                    del loopList2[1]
                    state = 1

    libOutputFile.close()

if __name__ == "__main__":
    main()
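Tracing the loops above suggests why the output file stays empty: the inner loop always compares loopList2[0] with loopList2[1] and never moves past them. In the sample data those first two rows are identical (920.105,George Mueller), so the title check never succeeds, state never becomes 1, and writerow is never called. As a hedged alternative (not the poster's approach), collecting the titles for each call number in a dictionary avoids the index bookkeeping and does not need sorted input. The function name conflicting_call_numbers and the sample rows are made up for illustration.

```python
import csv
from collections import defaultdict
from io import StringIO


def conflicting_call_numbers(rows):
    """Map each call number that has more than one distinct title to its titles.

    Titles are collected per call number in a dictionary, so the rows do not
    need to be sorted or grouped beforehand.
    """
    titles_by_number = defaultdict(set)
    for row in rows:
        if len(row) < 2:  # skip blank or malformed lines
            continue
        number, title = row[0], row[1]
        titles_by_number[number].add(title)
    # Keep only call numbers shared by different titles; sort for stable output
    return {number: sorted(titles)
            for number, titles in titles_by_number.items()
            if len(titles) > 1}


# Hypothetical, deliberately unsorted sample
sample = '''920.105,George Mueller
327.373,The Letters to the Galatians and Ephesians
920.105,George Mueller 1
920.105,George Mueller'''

result = conflicting_call_numbers(csv.reader(StringIO(sample)))
print(result)  # {'920.105': ['George Mueller', 'George Mueller 1']}
```

Because the dictionary holds every call number at once, memory grows with the number of distinct call numbers, which should be unproblematic for a typical library list.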

Thanks in advance!

Recommended answer

If your input is sorted by the book numbers, you could use itertools.groupby:

import csv
from io import StringIO
from itertools import groupby

text = '''920.105,George Mueller
920.105,George Mueller
920.105,George Mueller 1
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 1
286.003,This Day in Baptist History 2
286.003,This Day in Baptist History 3'''

with StringIO(text) as in_file, StringIO() as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)

    for number, group in groupby(reader, key=lambda x: x[0]):

        titles = set(item[1] for item in group)
        if len(titles) != 1:
            writer.writerow((number, *titles))

    print(out_file.getvalue())

This will output:

920.105,George Mueller 1,George Mueller
286.003,This Day in Baptist History 2,This Day in Baptist History 3,This Day in Baptist History 1
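The order of the titles inside each output row comes from iterating a set, and Python randomizes string hashing per process by default, so the titles may come out in a different order on another run. If a stable order matters, sorting the set before writing is one option. A small sketch of just that change, assuming the rows are grouped by call number as above; the generator name rows_with_conflicts is hypothetical:

```python
import csv
from io import StringIO
from itertools import groupby


def rows_with_conflicts(reader):
    """Yield (number, *titles) for each run of rows that shares a call number
    but has more than one distinct title. Titles are sorted, so the output is
    identical on every run. Assumes rows are already grouped by call number."""
    for number, group in groupby(reader, key=lambda x: x[0]):
        titles = sorted(set(item[1] for item in group))
        if len(titles) != 1:
            yield (number, *titles)


text = '''920.105,George Mueller
920.105,George Mueller 1
286.003,This Day in Baptist History 2
286.003,This Day in Baptist History 1'''

with StringIO(text) as in_file, StringIO() as out_file:
    # lineterminator='\n' only keeps the printed output tidy
    writer = csv.writer(out_file, lineterminator='\n')
    writer.writerows(rows_with_conflicts(csv.reader(in_file)))
    output = out_file.getvalue()

print(output)
```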

Note that I had to change your input, because as given it would not have generated any output: in your sample, every repeated call number carries the same title each time.

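In order to use that on your real data, you'd need to replace StringIO(text) and StringIO() in the with statement with something like open('infile.txt', 'r', newline='') and open('outfile.txt', 'w', newline='') so the program reads your actual file and writes a real output file. The final print(out_file.getvalue()) line would also have to go, since getvalue() exists only on StringIO, not on regular file objects.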
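Put together, a file-based version of the answer's code might look like the sketch below. It keeps the same groupby logic and the same sorted-input assumption; the function name write_conflicts is hypothetical, and the file names are only the placeholders used in the answer.

```python
import csv
from itertools import groupby


def write_conflicts(in_path, out_path):
    """Read call-number/title rows from in_path and write each call number
    that has more than one distinct title to out_path.

    Like the StringIO version, this assumes the rows are already sorted
    (or at least grouped) by call number.
    """
    # The csv module documentation recommends newline='' for files
    # handed to csv.reader and csv.writer
    with open(in_path, 'r', newline='') as in_file, \
            open(out_path, 'w', newline='') as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        for number, group in groupby(reader, key=lambda x: x[0]):
            titles = set(item[1] for item in group)
            if len(titles) != 1:
                writer.writerow((number, *titles))


# Hypothetical usage with the placeholder names from the answer:
# write_conflicts('infile.txt', 'outfile.txt')
```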

Again: this will only work if your input is sorted by the numbers, because groupby only groups consecutive rows that share the same key.
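If the file cannot be relied on to be sorted, one option is to sort the rows by call number before grouping, so that equal numbers become adjacent. A minimal sketch under that assumption; it reads all rows into memory, and the function name duplicates_with_different_titles is made up:

```python
import csv
from io import StringIO
from itertools import groupby
from operator import itemgetter


def duplicates_with_different_titles(rows):
    """Return (number, [titles]) pairs for call numbers with >1 distinct title.

    Rows are sorted by call number first, so equal numbers end up next to
    each other and groupby treats them as one group. The sort is plain
    string order, which is enough here because grouping only needs equality.
    """
    rows = [row for row in rows if len(row) >= 2]  # drop blank lines
    rows.sort(key=itemgetter(0))
    result = []
    for number, group in groupby(rows, key=itemgetter(0)):
        titles = sorted({row[1] for row in group})
        if len(titles) > 1:
            result.append((number, titles))
    return result


# Hypothetical unsorted sample
text = '''286.003,This Day in Baptist History 1
920.105,George Mueller
286.003,This Day in Baptist History 2
920.105,George Mueller'''

result = duplicates_with_different_titles(csv.reader(StringIO(text)))
print(result)
# [('286.003', ['This Day in Baptist History 1', 'This Day in Baptist History 2'])]
```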
