Python hash not being updated in CSV file output


Problem description


I have working code that takes a directory of CSV files, hashes one column of each line, and then aggregates all the files together. The problem is that the output shows the same hash value on every row; the hash is not recomputed for each line. Here is the code:

 import glob
 import hashlib

 files = glob.glob( '*.csv' )
 output="combined.csv"

 with open(output, 'w' ) as result:
     for thefile in files:
        f = open(thefile)
        m = f.readlines()
        for line in m[1:]:
            fields = line.split()       
            hash_object = hashlib.md5(b'(fields[2])')
            newline = fields[0],fields[1],hash_object.hexdigest(),fields[3]
            joined_line = ','.join(newline)
            result.write(joined_line+ '\n')
        f.close()

Solution

You are creating a hash of a fixed bytestring b'(fields[2])'. That value has no relationship to your CSV data, even though it uses the same characters as are used in your row variable name.
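To make the symptom concrete, here is a minimal sketch with made-up sample rows (the row contents are assumptions, not data from the question). It shows that the bytes literal is hashed identically no matter what `fields` holds:

```python
import hashlib

# Hypothetical sample rows standing in for lines of the CSV files.
rows = [["a", "b", "alice", "x"], ["c", "d", "bob", "y"]]

# The bytes literal never looks at `fields`; it is just the 11 characters
# "(fields[2])", so every iteration hashes exactly the same input.
literal_digests = [hashlib.md5(b'(fields[2])').hexdigest() for fields in rows]

print(len(set(literal_digests)))  # 1: a single digest repeated for every row
```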

You need to pass in bytes from your actual row:

hash_object = hashlib.md5(fields[2].encode('utf8'))

I am assuming your fields[2] column is a string, so you need to encode it first to get bytes. The UTF-8 encoding can handle all codepoints that could possibly be contained in a string.
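As a small illustration of why the encoding step is needed, the sketch below uses a hypothetical column value containing a non-ASCII character. In Python 3, `hashlib.md5` rejects a `str` with a `TypeError`, while the UTF-8 encoded bytes hash without trouble:

```python
import hashlib

value = "caf\u00e9"  # hypothetical column value ("café"), not from the question

error = None
try:
    hashlib.md5(value)  # passing a str directly is rejected in Python 3
except TypeError as exc:
    error = exc

encoded = value.encode('utf8')             # b'caf\xc3\xa9'
digest = hashlib.md5(encoded).hexdigest()  # 32 hexadecimal characters
```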

You also appear to be re-inventing the CSV reading and writing wheel; you probably should use the csv module instead:

 import csv

 # ...

 with open(output, 'w', newline='') as result:
     writer = csv.writer(result)

     for thefile in files:
        with open(thefile, newline='') as f:
            reader = csv.reader(f)
            next(reader, None)  # skip first row
            for fields in reader:
                hash_object = hashlib.md5(fields[2].encode('utf8'))
                newrow = fields[:2] + [hash_object.hexdigest()] + fields[3:]
                writer.writerow(newrow)
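Putting the pieces together, one possible self-contained version is sketched below. The function name `combine_with_hashed_column` and its `pattern`, `output`, and `column` parameters are assumptions introduced for illustration. The sketch also skips the output file if it happens to match the glob pattern, which is an extra precaution not present in the code above.

```python
import csv
import glob
import hashlib
import os


def combine_with_hashed_column(pattern, output, column=2):
    """Combine CSV files matching `pattern`, replacing one column with its MD5 hex digest."""
    # Resolve the input list first and drop the output file itself, so an
    # output left over from an earlier run is not read back in as input.
    out_path = os.path.abspath(output)
    files = sorted(f for f in glob.glob(pattern) if os.path.abspath(f) != out_path)

    with open(output, 'w', newline='') as result:
        writer = csv.writer(result)
        for thefile in files:
            with open(thefile, newline='') as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the header row
                for fields in reader:
                    # Hash the actual column value, encoded to bytes.
                    digest = hashlib.md5(fields[column].encode('utf8')).hexdigest()
                    writer.writerow(fields[:column] + [digest] + fields[column + 1:])
    return files
```

Calling `combine_with_hashed_column('*.csv', 'combined.csv')` would mirror the setup in the question. Like the code above, it assumes every data row has at least `column + 1` fields.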
