Concatenating strings retrieved by regex from a subprocess STDERR results in disorder
Problem description
I have an audio file, Sample.flac. Its title and length can be read with ffprobe, which sends its output to STDERR.
I want to run ffprobe through subprocess, and have done so successfully. I then retrieve the output (piped to subprocess.PIPE) with *.communicate()[1].decode(), as the Python docs indicate I should.
communicate() returns a tuple, (stdout, stderr), with the output from the Popen() object. The stderr element is then accessed by index and decoded from a byte string (as UTF-8, the default for decode()) into a Python 3 string.
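As a minimal sketch of that step, the snippet below uses a small Python child process as a stand-in for ffprobe (the child command and its output line are made up for illustration) and shows that communicate() hands back bytes for each piped stream:

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical stand-in for ffprobe: a child that writes one line to stderr only.
child_code = "import sys; sys.stderr.write('title : Example\\n')"

proc = Popen([sys.executable, "-c", child_code], stdout=PIPE, stderr=PIPE)
stdout_bytes, stderr_bytes = proc.communicate()  # tuple: (stdout, stderr)

# Both elements are bytes; decode() turns them into str (UTF-8 by default).
stderr_text = stderr_bytes.decode()
print(type(stderr_bytes).__name__, "->", type(stderr_text).__name__)
```

Note that decode() does not touch line endings, so whatever the child wrote as a newline sequence is still present in the resulting string.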
This decoded output is then parsed with a multiline regex pattern matching the format of the ffprobe metadata output. The match groups are then placed into a dictionary: the first group of each match is converted to lowercase and used as the key, and the second group is the value.
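The parsing step can be sketched in isolation as follows. The sample text is a hand-written approximation of ffprobe's metadata block (an assumption, not captured output), and it uses plain \n line endings:

```python
from re import findall, MULTILINE

# Hand-written sample resembling ffprobe's "KEY : value" metadata lines.
sample = (
    "Metadata:\n"
    "    TITLE           : Believer (Kaskade Remix)\n"
    "    LENGTH          : 190\n"
)

metadata = {}
for key, value in findall(r"(\w+)\s+:\s(.+)$", sample, MULTILINE):
    # First group becomes the lowercase key, second group the value.
    metadata[key.lower()] = value

print(metadata)
```

The "Metadata:" header line is skipped because the pattern requires whitespace between the key and the colon.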
The data can be accessed through the dictionary keys as expected. But upon concatenating the values together (all are strings), the output appears mangled.
This is the output I expect:
Believer (Kaskade Remix) 190
Instead, this is what I get:
190ever (Kaskade Remix)
I don't understand why the strings appear to "overlap" each other and result in a mangled form. Can anyone explain this and what I have done wrong?
Below is the complete code that was run to produce the results above. It is a reduced section of my full project.
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
import os
from re import findall, MULTILINE
from subprocess import Popen, PIPE


def media_metadata(file_path):
    """Use FFPROBE to get information about a media file."""
    stderr = Popen(("ffprobe", file_path), shell=True, stderr=PIPE).communicate()[1].decode()
    metadata = {}
    for match in findall(r"(\w+)\s+:\s(.+)$", stderr, MULTILINE):
        metadata[match[0].lower()] = match[1]
    return metadata


if __name__ == "__main__":
    meta = media_metadata("C:/Users/spike/Music/Sample.flac")
    print(meta["title"], meta["length"])
    # The above and below have the same result in the console
    # print(meta["title"] + " " + meta["length"])
    # print("{title} {length}".format(**meta))
Can anyone explain this unpredictable output?
I asked this question here earlier, but I don't think it was very clear. In the raw output produced when this is run on multiple files, you can see that towards the end the strings become so unpredictable that part of the title value is not printed at all.
Thanks.
Recommended answer
Reproduce:
print('Believer (Kaskade Remix)\r 190')
Output:
190ever (Kaskade Remix)
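Problem:
The end of line is \r\n. In multiline mode, the regex $ matches just before the \n, and . matches \r, so the \r remains in the matching group. When printed, the \r moves the cursor back to the start of the line, and the text that follows overwrites what was already there.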
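To see the stray carriage return directly, it helps to inspect the match groups with repr() rather than printing them. This is a small sketch on a hand-written string with \r\n line endings (the sample text is an assumption, not real ffprobe output):

```python
from re import findall, MULTILINE

# Hand-written sample with Windows-style \r\n line endings.
raw = "TITLE : Believer (Kaskade Remix)\r\nLENGTH : 190\r\n"

matches = findall(r"(\w+)\s+:\s(.+)$", raw, MULTILINE)

# repr() makes the trailing \r visible instead of letting the terminal act on it.
for key, value in matches:
    print(key, repr(value))
```

Each value ends in \r, which is invisible in normal console output but resets the cursor to column zero.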
Fix:
Insert \r before the $ in the pattern, i.e. (\w+)\s+:\s(.+)\r$
Or use universal_newlines=True as a Popen argument and remove .decode(), as the output will then be text with \n instead of \r\n.
Or run stderr = stderr.replace('\r', '') and then apply the regex.
Alternative:
ffprobe can output a JSON string. Use the json module, which loads the string and returns a dictionary.
i.e. the command
['ffprobe', '-show_format', '-of', 'json', file_path]
The JSON string will be on the stdout stream.
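A possible shape for this approach is sketched below. The function names are made up for illustration, and the sample JSON is a hand-written approximation of the "format" section rather than captured ffprobe output; tag names and their case depend on the file, so treat the keys as assumptions:

```python
import json
from subprocess import run, PIPE


def parse_ffprobe_json(json_text):
    """Return the "format" section of ffprobe's JSON output as a dict."""
    return json.loads(json_text).get("format", {})


def media_metadata(file_path):
    """Run ffprobe with the command from the answer and parse its stdout."""
    cmd = ["ffprobe", "-show_format", "-of", "json", file_path]
    # JSON arrives on stdout; the usual banner still goes to stderr.
    result = run(cmd, stdout=PIPE, stderr=PIPE, universal_newlines=True)
    return parse_ffprobe_json(result.stdout)


# Hand-written sample so the parsing can be tried without ffprobe installed.
sample = (
    '{"format": {"duration": "190.000000", '
    '"tags": {"TITLE": "Believer (Kaskade Remix)"}}}'
)
info = parse_ffprobe_json(sample)
print(info["tags"]["TITLE"], info["duration"])
```

Because json.loads handles the string parsing, line endings in the output no longer matter and no regex is needed.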