Jupyter 笔记本:每个笔记本的内存使用情况 [英] Jupyter notebook: memory usage for each notebook
问题描述
由于用户从不关闭旧笔记本,我实验室服务器 (Ubuntu) 上的内存不断填满.我想更好地了解每个笔记本占用多少内存.我可以总结每个用户运行的所有 jupyter 笔记本的(粗略)内存使用情况,但我想获得每个单独笔记本的总内存使用情况,以便我可以关闭那些特定的内存猪(或告诉另一个用户关闭他的/她倒下了).我很快把下面的代码放在一起得到大约.内存.每个 jupyter 内核的使用情况,但我不知道如何将内核 ID 与特定笔记本相关联.
The memory on my lab's server (Ubuntu) is constantly filling up due to users never shutting down old notebooks. I would like to get a better idea of how much memory each notebook is taking up. I can summarize (rough) memory usage for all jupyter notebooks run by each user, but I would like to get the total memory usage of each individual notebook so that I can shut down those particular memory hogs (or tell another user to shut his/her's down). I quickly put together the following code to get approx. mem. usage per jupyter kernel, but I don't know how to associate the kernel IDs to a particular notebook.
import os
import pwd
import pandas as pd
UID = 1
EUID = 2
pids = [pid for pid in os.listdir('/proc') if pid.isdigit()]
df = []
for pid in pids:
try:
ret = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
except IOError: # proc has already terminated
continue
# jupyter notebook processes
if len(ret) > 0 and 'share/jupyter/runtime' in ret:
process = psutil.Process(int(pid))
mem = process.memory_info()[0]
# user name for pid
for ln in open('/proc/%d/status' % int(pid)):
if ln.startswith('Uid:'):
uid = int(ln.split()[UID])
uname = pwd.getpwuid(uid).pw_name
# user, pid, memory, proc_desc
df.append([uname, pid, mem, ret])
df = pd.DataFrame(df)
df.columns = ['user', 'pid', 'memory', 'proc_desc']
df
推荐答案
为了可移植性,我对 sharchaea 的脚本做了一些改进和速度.
I made some improvements to sharchaea's script for portability and speed.
主要是只检查笔记本运行的端口,检查不同的主机名选项,改进内核进程检查并检查 ipython 或 jupyter.
Mainly, only check ports that notebooks are running on, check different hostname options, improve the kernel process check and check for ipython or jupyter.
import argparse
import re
import subprocess
import pandas as pd
import psutil
import requests
import tabulate
kernel_regex = re.compile(r".+kernel-(.+).json")
notebook_regex = re.compile(r"(https?://([^:/]+):?(d+)?)/?(?token=([a-z0-9]+))?")
def get_proc_info():
pids = psutil.pids()
# memory info from psutil.Process
df_mem = []
for pid in pids:
try:
proc = psutil.Process(pid)
cmd = " ".join(proc.cmdline())
except psutil.NoSuchProcess:
continue
if len(cmd) > 0 and ("jupyter" in cmd or "ipython" in cmd) and "kernel" in cmd:
# kernel
kernel_ID = re.sub(kernel_regex, r"1", cmd)
# memory
mem = proc.memory_info()[0] / float(1e9)
uname = proc.username()
# user, pid, memory, kernel_ID
df_mem.append([uname, pid, mem, kernel_ID])
df_mem = pd.DataFrame(df_mem)
df_mem.columns = ["user", "pid", "memory_GB", "kernel_ID"]
return df_mem
def get_running_notebooks():
notebooks = []
for n in subprocess.Popen(
["jupyter", "notebook", "list"], stdout=subprocess.PIPE
).stdout.readlines()[1:]:
match = re.match(notebook_regex, n.decode())
if match:
base_url, host, port, _, token = match.groups()
notebooks.append({"base_url": base_url, "token": token})
else:
print("Unknown format: {}".format(n.decode()))
return notebooks
def get_session_info(password=None):
df_nb = []
kernels = []
for notebook in get_running_notebooks():
s = requests.Session()
if notebook["token"] is not None:
s.get(notebook["base_url"] + "/?token=" + notebook["token"])
else:
# do a get to the base url to get the session cookies
s.get(notebook["base_url"])
if password is not None:
# Seems jupyter auth process has changed, need to first get a cookie,
# then add that cookie to the data being sent over with the password
data = {"password": password}
data.update(s.cookies)
s.post(notebook["base_url"] + "/login", data=data)
res = s.get(notebook["base_url"] + "/api/sessions")
if res.status_code != 200:
raise Exception(res.json())
for sess in res.json():
kernel_ID = sess["kernel"]["id"]
if kernel_ID not in kernels:
kernel = {
"kernel_ID": kernel_ID,
"kernel_name": sess["kernel"]["name"],
"kernel_state": sess["kernel"]["execution_state"],
"kernel_connections": sess["kernel"]["connections"],
# "notebook_url": notebook["base_url"] + "/notebook/" + sess["id"],
"notebook_path": sess["path"],
}
kernel.update(notebook)
df_nb.append(kernel)
kernels.append(kernel_ID)
df_nb = pd.DataFrame(df_nb)
del df_nb["token"]
return df_nb
def parse_args():
parser = argparse.ArgumentParser(description="Find memory usage.")
parser.add_argument("--password", help="password (only needed if pass-protected)")
return parser.parse_args()
def main(password=None, print_ascii=False):
df_mem = get_proc_info()
df_nb = get_session_info(password)
# joining tables
df = pd.merge(df_nb, df_mem, on=["kernel_ID"], how="inner")
df = df.sort_values("memory_GB", ascending=False).reset_index(drop=True)
if print_ascii:
print(tabulate.tabulate(df, headers=(df.columns.tolist())))
return df
if __name__ == "__main__":
args = vars(parse_args())
main(args["password"], print_ascii=True)
我可能会继续在此要点
代码已更新为使用令牌身份验证与较新版本的 Jupyter 一起使用,仅利用 psutil
使其与 Windows 兼容,并在 Python 3 上工作.
edit: Code has been updated to work with newer versions of Jupyter using token authentication, to leverage only psutil
making it Windows compatible, and to work on Python 3.
这篇关于Jupyter 笔记本:每个笔记本的内存使用情况的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!