我如何唯一标识Python中的媒体文件,而不是元数据的内容? [英] How do I uniquely identify the content of a media file in Python, not the metadata?

查看:250
本文介绍了我如何唯一标识Python中的媒体文件,而不是元数据的内容?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的媒体文件,主要是音乐,他们大多已经从CD多年前进口的集合。这个系列已经被不同的媒体播放器,不同的文件系统,不同的计算机等,多次之间传输。在这个过程中,有些轨道被意外复制。我也在不断尝试策划对这些元数据和得到的一切标签正确,因为当大部分是原先进口的,我没有花哨的媒体播放软件,甚至没有意识到,ID3标签显示,一切都只是 的经典专辑追踪%d个相册。

I have a collection of media files, mostly music, most of them having been imported from CD many years ago. This collection has been transferred between different media players, different filesystems, different computers, etc, many times. In that process, some tracks have been accidentally duplicated. I'm also constantly trying to curate the metadata on these and get everything properly tagged, since when much of it was originally imported, I did not have fancy media playback software and did not even realize that the ID3 tags indicated that everything was just "Track %d" on the classic album "Album".

这将创建一个情况我有最多最新的元数据的一些文件,但我想删除其元数据没有被正确地更新相同的媒体文件的副本。由于元数据文件中的present,这些文件的内容现在不同,喜欢的 liten2 不起作用。

This creates a situation where I have some files with up-to-date metadata, but "duplicates" of the same media file that I'd like to delete, whose metadata has not been properly updated. Since the metadata is present within the file, the contents of these files now differ and tools like liten2 don't work.

我的问题是:有没有我可以使用将可以方便地提取一个唯一识别的指纹库(可能是一个SHA-1散列,但是这并不是一个硬性要求)的的媒体内容仅供的中该文件,忽略了元数据?如果是这样,我怎么使用它?

My question is: is there a library I can use that will conveniently extract a uniquely identifying fingerprint (probably a SHA-1 hash, but that's not a hard requirement) of the media content only of the file, ignoring the metadata? If so, how do I use it?

推荐答案

Echoprint 是指纹通过音频一个免费的方式它的内容 - 即它不依赖于元数据,也没有对字节确切数据匹配。他们的常见问题有一个条目我要重复数据删除大集合

Echoprint is one free way to fingerprint audio by its content - i.e. it doesn't depend on metadata, nor on byte-exact data matches. Their FAQ has an entry "I want to deduplicate a big collection".

我觉得它的核心本身不是蟒蛇,但网络API - 但他们提供 pyechonest

I think the core of it is not itself python but a web API - but they provide pyechonest library.

这篇关于我如何唯一标识Python中的媒体文件,而不是元数据的内容?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆