Diagnosing when I'm being limited by disk I/O


Question

I'm running Python 2.7 on a Linux machine, and by far the slowest part of my script is loading a large json file from disk (a SSD) using the ujson library. When I check top during this loading process, my cpu usage is basically at 100%, leading me to believe that I'm being bottlenecked by parsing the json rather than by transferring the bytes from disk into memory. Is this a valid assumption to be making, or will ujson burn empty loops or something while waiting for the disk? I'm interested in knowing because I'm not sure whether dedicating another core of my cpu for another script that does a lot of disk i/o will significantly slow down the first script.
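The 100%-CPU reading from top is the right signal to look at. A way to cross-check it in-process (a sketch, not from the original post) is to compare the process's CPU time against wall-clock time around the slow section: if the two are close, the work is CPU-bound; if CPU time is much smaller than wall time, the process is mostly waiting, e.g. on disk. `os.times()` works on both Python 2.7 and 3; the JSON payload here is a stand-in for the real file load:

```python
from __future__ import print_function  # keeps this runnable on 2.7 and 3
import os
import time
import json

start_wall = time.time()
start_cpu = sum(os.times()[:2])  # user + system CPU time for this process

# Stand-in workload: parse a generated JSON string (replace with the real load)
payload = json.dumps({'values': list(range(200000))})
data = json.loads(payload)

cpu = sum(os.times()[:2]) - start_cpu
wall = time.time() - start_wall
print('cpu %.3f sec / wall %.3f sec' % (cpu, wall))
# cpu close to wall -> CPU-bound; cpu much smaller than wall -> likely blocked on I/O
```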

Answer

Without seeing your code, I'm going to assume you are doing something like this:

# The question uses ujson; the stdlib json module exposes the same loads() call
import ujson as json

with open('data.json') as datafile:
    data = json.loads(datafile.read())

Instead, you could split the steps of reading the file and parsing it:

with open('data.json') as datafile:
    raw_data = datafile.read()
    data = json.loads(raw_data)

If you add some timing calls, you can determine how long each step is taking:

# Timing decorator from https://www.andreas-jung.com/contents/a-python-decorator-for-measuring-the-execution-time-of-methods
import time                                                

def timeit(method):

    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()

        print '%r (%r, %r) %2.2f sec' % \
              (method.__name__, args, kw, te-ts)
        return result

    return timed

# A decorator applies to a function definition, not to an assignment,
# so wrap each step in its own function:

@timeit
def read_file(path):
    with open(path) as datafile:
        return datafile.read()

@timeit
def parse_json(raw_data):
    return json.loads(raw_data)

raw_data = read_file('data.json')
data = parse_json(raw_data)
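If you'd rather not restructure the code around a decorator, the same read-vs-parse split can be timed directly with `time.time()` calls around each step. This self-contained sketch (the sample file and its contents are illustrative, and it uses `print()` so it runs on both 2.7 and 3) generates a small JSON file so it runs standalone:

```python
from __future__ import print_function  # keeps this runnable on 2.7 and 3
import json
import os
import time

# Create a small sample file so the sketch runs standalone
with open('data.json', 'w') as f:
    json.dump({'values': list(range(100000))}, f)

t0 = time.time()
with open('data.json') as datafile:
    raw_data = datafile.read()   # step 1: pull bytes off disk
t1 = time.time()
data = json.loads(raw_data)      # step 2: parse in memory (pure CPU work)
t2 = time.time()

print('read:  %.4f sec' % (t1 - t0))
print('parse: %.4f sec' % (t2 - t1))
os.remove('data.json')
```

If the parse step dominates, the script is CPU-bound and a second I/O-heavy script on another core should not slow it down much.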

