Jupyter 笔记本:每个笔记本的内存使用情况 [英] Jupyter notebook: memory usage for each notebook

查看:40
本文介绍了Jupyter 笔记本:每个笔记本的内存使用情况的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于用户从不关闭旧笔记本,我实验室服务器 (Ubuntu) 上的内存不断填满.我想更好地了解每个笔记本占用多少内存.我可以总结每个用户运行的所有 jupyter 笔记本的(粗略)内存使用情况,但我想获得每个单独笔记本的总内存使用情况,以便我可以关闭那些特定的内存猪(或告诉另一个用户关闭他的/她倒下了).我很快把下面的代码放在一起得到大约.内存.每个 jupyter 内核的使用情况,但我不知道如何将内核 ID 与特定笔记本相关联.

The memory on my lab's server (Ubuntu) is constantly filling up due to users never shutting down old notebooks. I would like to get a better idea of how much memory each notebook is taking up. I can summarize (rough) memory usage for all jupyter notebooks run by each user, but I would like to get the total memory usage of each individual notebook so that I can shut down those particular memory hogs (or tell another user to shut his/her's down). I quickly put together the following code to get approx. mem. usage per jupyter kernel, but I don't know how to associate the kernel IDs to a particular notebook.

import os
import pwd
import pandas as pd

UID   = 1
EUID  = 2

pids = [pid for pid in os.listdir('/proc') if pid.isdigit()]

df = []
for pid in pids:
    try:
        ret = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
    except IOError: # proc has already terminated
        continue

    # jupyter notebook processes
    if len(ret) > 0 and 'share/jupyter/runtime' in ret:
        process = psutil.Process(int(pid))
        mem = process.memory_info()[0] 

        # user name for pid
        for ln in open('/proc/%d/status' % int(pid)):
            if ln.startswith('Uid:'):
                uid = int(ln.split()[UID])
                uname = pwd.getpwuid(uid).pw_name

        # user, pid, memory, proc_desc
        df.append([uname, pid, mem, ret])

df = pd.DataFrame(df)
df.columns = ['user', 'pid', 'memory', 'proc_desc']
df

推荐答案

为了可移植性,我对 sharchaea 的脚本做了一些改进和速度.

I made some improvements to sharchaea's script for portability and speed.

主要是只检查笔记本运行的端口,检查不同的主机名选项,改进内核进程检查并检查 ipython 或 jupyter.

Mainly, only check ports that notebooks are running on, check different hostname options, improve the kernel process check and check for ipython or jupyter.

import argparse
import re
import subprocess

import pandas as pd
import psutil
import requests
import tabulate

kernel_regex = re.compile(r".+kernel-(.+).json")
notebook_regex = re.compile(r"(https?://([^:/]+):?(d+)?)/?(?token=([a-z0-9]+))?")


def get_proc_info():
    pids = psutil.pids()

    # memory info from psutil.Process
    df_mem = []

    for pid in pids:
        try:
            proc = psutil.Process(pid)
            cmd = " ".join(proc.cmdline())
        except psutil.NoSuchProcess:
            continue

        if len(cmd) > 0 and ("jupyter" in cmd or "ipython" in cmd) and "kernel" in cmd:
            # kernel
            kernel_ID = re.sub(kernel_regex, r"1", cmd)

            # memory
            mem = proc.memory_info()[0] / float(1e9)

            uname = proc.username()

            # user, pid, memory, kernel_ID
            df_mem.append([uname, pid, mem, kernel_ID])

    df_mem = pd.DataFrame(df_mem)
    df_mem.columns = ["user", "pid", "memory_GB", "kernel_ID"]
    return df_mem


def get_running_notebooks():
    notebooks = []

    for n in subprocess.Popen(
        ["jupyter", "notebook", "list"], stdout=subprocess.PIPE
    ).stdout.readlines()[1:]:
        match = re.match(notebook_regex, n.decode())
        if match:
            base_url, host, port, _, token = match.groups()
            notebooks.append({"base_url": base_url, "token": token})
        else:
            print("Unknown format: {}".format(n.decode()))

    return notebooks


def get_session_info(password=None):
    df_nb = []
    kernels = []

    for notebook in get_running_notebooks():
        s = requests.Session()
        if notebook["token"] is not None:
            s.get(notebook["base_url"] + "/?token=" + notebook["token"])
        else:
            # do a get to the base url to get the session cookies
            s.get(notebook["base_url"])
        if password is not None:
            # Seems jupyter auth process has changed, need to first get a cookie,
            # then add that cookie to the data being sent over with the password
            data = {"password": password}
            data.update(s.cookies)
            s.post(notebook["base_url"] + "/login", data=data)

        res = s.get(notebook["base_url"] + "/api/sessions")

        if res.status_code != 200:
            raise Exception(res.json())

        for sess in res.json():
            kernel_ID = sess["kernel"]["id"]
            if kernel_ID not in kernels:
                kernel = {
                    "kernel_ID": kernel_ID,
                    "kernel_name": sess["kernel"]["name"],
                    "kernel_state": sess["kernel"]["execution_state"],
                    "kernel_connections": sess["kernel"]["connections"],
                    # "notebook_url": notebook["base_url"] + "/notebook/" + sess["id"],
                    "notebook_path": sess["path"],
                }
                kernel.update(notebook)
                df_nb.append(kernel)
                kernels.append(kernel_ID)

    df_nb = pd.DataFrame(df_nb)
    del df_nb["token"]
    return df_nb


def parse_args():
    parser = argparse.ArgumentParser(description="Find memory usage.")
    parser.add_argument("--password", help="password (only needed if pass-protected)")

    return parser.parse_args()


def main(password=None, print_ascii=False):
    df_mem = get_proc_info()
    df_nb = get_session_info(password)

    # joining tables
    df = pd.merge(df_nb, df_mem, on=["kernel_ID"], how="inner")
    df = df.sort_values("memory_GB", ascending=False).reset_index(drop=True)
    if print_ascii:
        print(tabulate.tabulate(df, headers=(df.columns.tolist())))
    return df


if __name__ == "__main__":
    args = vars(parse_args())
    main(args["password"], print_ascii=True)

我可能会继续在此要点

代码已更新为使用令牌身份验证与较新版本的 Jupyter 一起使用,仅利用 psutil 使其与 Windows 兼容,并在 Python 3 上工作.

edit: Code has been updated to work with newer versions of Jupyter using token authentication, to leverage only psutil making it Windows compatible, and to work on Python 3.

这篇关于Jupyter 笔记本:每个笔记本的内存使用情况的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆