How to create a movie database starting from a list of files


Question


I have a large number of movies on my home server (about 4,000). The files are all named Title - Subtitle (year).extension. I would like to create a database of all my movies (even an Excel sheet would be fine). The database should contain these columns: title, subtitle (if one exists), year, and the location of the file on the server (some movies are organized in folders by genre or actor). So far I have a bash script that just writes a txt file listing the files on each hard drive (one txt file per drive). How can I create this kind of database automatically on my home server (which runs Debian)?

It would also be great to automatically retrieve other information about the movies using some movie database API, but I guess that would be very complicated.

Solution

This is a pretty broad question and not really appropriate here (this is more of a tutorial than a quick code question), but here's some strategic advice:

  • Excel will open a .csv file, treating each comma as a cell boundary and each new line as a new row, so a CSV file is all you need to produce
  • You need to iterate, maybe recursively, over the directory(ies)
  • Expand the path name; if you use a high-level language like Python, this is achieved by standard functions. Then use regular expressions to parse the final bit (the file name)
  • Store the formatted contents of each path as rows in a list
  • Print that list to a text file, joining each element by commas and each row by a new line character
  • Provide said file with a .csv suffix and open it in Excel
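
The path-splitting and parsing steps above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the function name `parse_movie_path` and the pattern `MOVIE_RE` are made up for illustration, and the pattern assumes names of the form Title - Subtitle (year).extension with a four-digit year and an optional subtitle. Real libraries usually have more exceptions than that.

```python
import os
import re

# Assumed pattern: "Title - Subtitle (year).ext", where " - Subtitle" is optional.
MOVIE_RE = re.compile(
    r"^(?P<title>.+?)(?: - (?P<subtitle>.+?))?\s*\((?P<year>\d{4})\)\.(?P<ext>[^.]+)$"
)

def parse_movie_path(path):
    """Return a dict of fields for one file path, or None if the name does not fit."""
    folder, fname = os.path.split(path)   # separate the folder from the file name
    match = MOVIE_RE.match(fname)
    if match is None:
        return None                       # leave odd names for manual review
    fields = match.groupdict()
    fields["subtitle"] = fields["subtitle"] or ""  # missing subtitle becomes ""
    fields["folder"] = folder
    return fields
```

Calling `parse_movie_path` on each path and collecting the results that are not `None` gives the list of rows; the paths that return `None` are the ones worth checking by hand.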

Note that if you really want a database proper, Python again is a nice choice—SQLite is part of the standard install.
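
For the SQLite route, something along these lines might be a starting point. It is a sketch assuming Python 3 and the standard `sqlite3` module; the table name `movies`, its columns, and the helper names `save_movies` and `movies_from_year` are assumptions chosen for illustration, not anything prescribed.

```python
import sqlite3

def save_movies(db_path, rows):
    """Create the table if needed and insert (title, subtitle, year, folder) tuples."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an error is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS movies ("
                "title TEXT NOT NULL, subtitle TEXT, year INTEGER, folder TEXT)"
            )
            conn.executemany(
                "INSERT INTO movies (title, subtitle, year, folder) VALUES (?, ?, ?, ?)",
                rows,
            )
    finally:
        conn.close()

def movies_from_year(db_path, year):
    """Return the titles stored for one year, sorted alphabetically."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT title FROM movies WHERE year = ? ORDER BY title", (year,)
        )
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

The `?` placeholders let SQLite handle quoting, so titles containing apostrophes or commas need no special treatment.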

Cheers, good luck


UPDATE: Haha, you edited the question whilst I answered. It seems like everything you need is in the file name, but if you're planning on using metadata, here's a caution. Pulling the metadata out of your files can get trickier if they've not all come from the same source; not every media type has the same metadata structure, not every application that creates the files provides the same. So the logic of getting your metadata can get messy.

Is there a reason you can't use extant programs to do this?

Finally, you mention getting it onto your web server; once again deferring to Python, the ability to make the requests to your server that you need is also built into the standard library.


Final Update

Can't help you with bash; I'm all thumbs there, and I'm no expert in Python either but your goals are pretty simple. I haven't tested this—there is probably a typo or two, consider it pseudo-code that is mostly python-ready.

# import the standard libraries you'll need
import os # https://docs.python.org/2/library/os.html
import re # https://docs.python.org/2/library/re.html

# this function will walk your directories and output a list of file paths
def getFilePaths(directory):
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)
    return file_paths



video_file_paths = getFilePaths("path/to/video/library")
output_to_csv = []

# This is a super simple bit of regex that, provided your files are all formatted as
# written, will parse out title, subtitle, year and file extension. If your file names
# turn out to have more exceptions than you expect (I'd be shocked if not), you may need
# to make this part more robust, either with much more savvy regex, or else some
# conditional logic (maybe a try/except block). It is compiled once, outside the loop.
reg_ex = re.compile(r"^(.*) - (.*) \((.*)\)\.(.*)$")

for video_file in video_file_paths:
    base_path, fname = os.path.split(video_file)

    # now apply the compiled regex to each file name
    name_components = reg_ex.match(fname)
    if name_components is None:
        continue  # the name doesn't fit the pattern (e.g. no subtitle), so skip it

    # Each output is a row of your CSV file; ",".join() joins the 4 groups of the regex
    # match, and then the base path is added, so this loop builds a list of rows like:
    # title,subtitle,year,file_extension,base path
    output_to_csv.append("{0},{1}".format(",".join(name_components.groups()), base_path))

# create the file, making sure the location is writeable
csv_doc = open("my_video_database.csv", "w")

# now join all the rows with line breaks and write the compiled text to the file
csv_doc.write("\n".join(output_to_csv))

# close your new database
csv_doc.close()
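
One fragile spot in joining fields by hand is a title that itself contains a comma, which would shift the columns in Excel. A possible alternative, sketched here assuming Python 3, is to let the standard `csv` module do the writing, since it quotes such fields automatically. The function name `write_rows` and the header names are assumptions for illustration.

```python
import csv

def write_rows(csv_path, rows):
    """Write a header plus one row per movie; fields containing commas get quoted."""
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["title", "subtitle", "year", "extension", "folder"])
        writer.writerows(rows)
```

Here `rows` would be a list of 5-item tuples or lists, one per movie, in the same order as the header.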
