Concatenating strings retrieved by regex from a subprocess STDERR results in disorder


Problem Description


I have an audio file, Sample.flac. Its title and length can be read with ffprobe, which sends its output to STDERR.


I want to run ffprobe through subprocess, and have done so successfully. I then retrieve the output (piped to subprocess.PIPE) with *.communicate()[1].decode(), as the Python docs indicate.


communicate() returns a tuple, (stdout, stderr), with the output from the Popen() object. Index 1 selects stderr, which is then decoded from a byte string into a Python 3 str (decode() uses UTF-8 by default).


This decoded output is then parsed with a multiline regex pattern matching the format of ffprobe's metadata output. The matches are placed into a dictionary: the first group of each match is converted to lowercase and used as the key for the second group (the value).

Here is an example of the output and the regex.


The data can be accessed through the dictionary keys as expected. But upon concatenating the values together (all are strings), the output appears mangled.

This is the output I expect:

Believer (Kaskade Remix) 190


Instead, this is what I get:

 190ever (Kaskade Remix)


I don't understand why the strings appear to "overlap" each other and result in a mangled form. Can anyone explain this and what I have done wrong?


Below is the complete code that was run to produce the results above. It is a reduced section of my full project.

#! /usr/bin/env python3
# -*- coding: utf-8 -*-

import os

from re import findall, MULTILINE
from subprocess import Popen, PIPE


def media_metadata(file_path):
    """Use FFPROBE to get information about a media file."""
    stderr = Popen(("ffprobe", file_path), shell=True, stderr=PIPE).communicate()[1].decode()

    metadata = {}

    for match in findall(r"(\w+)\s+:\s(.+)$", stderr, MULTILINE):
        metadata[match[0].lower()] = match[1]

    return metadata


if __name__ == "__main__":
    meta = media_metadata("C:/Users/spike/Music/Sample.flac")
    print(meta["title"], meta["length"])
    # The above and below have the same result in the console
    # print(meta["title"] + " " + meta["length"])
    # print("{title} {length}".format(**meta))


Can anyone explain this unpredictable output?


I asked this question earlier, but I don't think it was very clear. In the raw output from running this on multiple files, you can see that towards the end the strings become so unpredictable that not even part of the title value is printed.

Thanks.

Recommended Answer

Reproduce:

print('Believer (Kaskade Remix)\r 190')

Output:

 190ever (Kaskade Remix)

Problem:


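The line ending is \r\n. In MULTILINE mode, re's $ matches just before \n, so the \r stays at the end of the second matching group. When printed, \r moves the cursor back to the start of the line, so " 190" overwrites the first four characters of the title ("Beli").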
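A quick way to confirm this diagnosis is to run the question's pattern over a hand-written sample of CRLF-terminated, ffprobe-style metadata lines and print the captured values with repr(). The sample text is a hypothetical stand-in, not real ffprobe output.

```python
import re

# Hypothetical stand-in for ffprobe's STDERR on Windows: every line ends in \r\n.
SAMPLE_STDERR = (
    "  Metadata:\r\n"
    "    TITLE           : Believer (Kaskade Remix)\r\n"
    "    LENGTH          : 190\r\n"
)

# The pattern from the question. With MULTILINE, $ matches just before each \n,
# so the greedy (.+) also captures the \r that precedes it.
matches = re.findall(r"(\w+)\s+:\s(.+)$", SAMPLE_STDERR, re.MULTILINE)
metadata = {key.lower(): value for key, value in matches}

# repr() makes the hidden carriage return visible.
print(repr(metadata["title"]))   # 'Believer (Kaskade Remix)\r'
print(repr(metadata["length"]))  # '190\r'
```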

Fix:

Insert \r before the $ in the pattern, i.e. (\w+)\s+:\s(.+)\r$
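The sketch below applies the suggested pattern to a hypothetical CRLF sample. It also shows a variant that is my own addition, not part of the answer: (.+?)\r?$ makes the \r optional, so lines ending in a bare \n still match. The lazy (.+?) matters here, because a greedy (.+) would swallow the \r before the optional \r? got a chance to match it.

```python
import re

# Hypothetical sample: the first line ends in \r\n, the second in a bare \n.
MIXED_STDERR = (
    "    TITLE           : Believer (Kaskade Remix)\r\n"
    "    LENGTH          : 190\n"
)

# The answer's fix: \r sits outside the capture group, so it is not captured.
# Lines without a \r (here, LENGTH) no longer match at all.
STRICT = r"(\w+)\s+:\s(.+)\r$"
# Variant (assumption, not from the answer): optional \r with a lazy value group.
TOLERANT = r"(\w+)\s+:\s(.+?)\r?$"

def parse(pattern, text):
    """Build the question's lowercase-key dictionary using the given pattern."""
    return {k.lower(): v for k, v in re.findall(pattern, text, re.MULTILINE)}

strict = parse(STRICT, MIXED_STDERR)
tolerant = parse(TOLERANT, MIXED_STDERR)
print(strict)                                 # {'title': 'Believer (Kaskade Remix)'}
print(tolerant["title"], tolerant["length"])  # Believer (Kaskade Remix) 190
```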


Or pass universal_newlines=True (or its alias text=True on Python 3.7+) to Popen and remove .decode(), since the output will then be text with \n line endings instead of \r\n.
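Here is a minimal sketch of this approach that runs without ffprobe. A child Python process (a hypothetical stand-in for ffprobe) writes CRLF-terminated lines to its STDERR as raw bytes. Popen(..., universal_newlines=True) then returns those lines as text with \n endings, so the question's original pattern works unchanged. The helper name read_stderr_text is hypothetical.

```python
import re
import subprocess
import sys

# Child program: writes two CRLF-terminated metadata lines to STDERR as bytes.
CHILD_CODE = (
    "import sys; "
    "sys.stderr.buffer.write(b'    TITLE           : Believer (Kaskade Remix)\\r\\n'"
    " b'    LENGTH          : 190\\r\\n')"
)

def read_stderr_text(args):
    """Run a command and return its STDERR as text with universal newlines."""
    # universal_newlines=True decodes the bytes and translates \r\n to \n,
    # so no .decode() call is needed.
    proc = subprocess.Popen(args, stderr=subprocess.PIPE, universal_newlines=True)
    _, stderr = proc.communicate()
    return stderr

stderr = read_stderr_text([sys.executable, "-c", CHILD_CODE])
# The question's original pattern, unchanged.
metadata = {k.lower(): v for k, v in re.findall(r"(\w+)\s+:\s(.+)$", stderr, re.MULTILINE)}
print(metadata["title"], metadata["length"])  # Believer (Kaskade Remix) 190
```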

Or strip the carriage returns with stderr = stderr.replace('\r', '') and then process the output as before.

Alternative:


ffprobe can output JSON. Use the json module, which loads the string and returns a dictionary.

For example, with the command:

['ffprobe', '-show_format', '-of', 'json', file_path]


The JSON string is written to stdout, not stderr.
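Here is a sketch of the JSON approach, assuming ffprobe is on PATH. The helper names ffprobe_format and format_tags are hypothetical. With -show_format, the container metadata sits under format, then tags, in the JSON. The case of the tag keys can differ between files, so the sketch lowercases them to match the question's dictionary. The sample JSON string is hand-written for illustration, not captured ffprobe output.

```python
import json
import subprocess

def ffprobe_format(file_path):
    """Run ffprobe (assumed to be on PATH) and return its parsed JSON output."""
    # The JSON document goes to stdout; ffprobe's banner still goes to stderr.
    result = subprocess.run(
        ["ffprobe", "-show_format", "-of", "json", file_path],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True,
    )
    return json.loads(result.stdout)

def format_tags(probe):
    """Return the container tags with lowercased keys."""
    tags = probe.get("format", {}).get("tags", {})
    return {key.lower(): value for key, value in tags.items()}

# Hypothetical sample shaped like -show_format JSON (not real ffprobe output).
SAMPLE = ('{"format": {"filename": "Sample.flac", '
          '"tags": {"TITLE": "Believer (Kaskade Remix)", "LENGTH": "190"}}}')
tags = format_tags(json.loads(SAMPLE))
print(tags["title"], tags["length"])  # Believer (Kaskade Remix) 190
```

With a real file, the call would be format_tags(ffprobe_format("Sample.flac")). No regex or newline handling is needed, because json.loads does the parsing.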
