Jupyter notebook: memory usage for each notebook
Problem description
The memory on my lab's server (Ubuntu) is constantly filling up because users never shut down old notebooks. I would like a better idea of how much memory each notebook is taking up. I can summarize (rough) memory usage for all Jupyter notebooks run by each user, but I would like to get the total memory usage of each individual notebook so that I can shut down the particular memory hogs (or ask another user to shut theirs down). I quickly put together the following code to get the approximate memory usage per Jupyter kernel, but I don't know how to associate the kernel IDs with a particular notebook.
import os
import pwd
import psutil
import pandas as pd

UID = 1   # index of the real UID on the 'Uid:' line of /proc/<pid>/status
EUID = 2  # index of the effective UID (unused here)

pids = [pid for pid in os.listdir('/proc') if pid.isdigit()]

df = []
for pid in pids:
    try:
        with open(os.path.join('/proc', pid, 'cmdline'), 'rb') as f:
            ret = f.read()
    except IOError:  # proc has already terminated
        continue

    # jupyter notebook processes (cmdline is read as bytes)
    if len(ret) > 0 and b'share/jupyter/runtime' in ret:
        process = psutil.Process(int(pid))
        mem = process.memory_info()[0]  # resident set size, in bytes

        # user name for pid
        for ln in open('/proc/%d/status' % int(pid)):
            if ln.startswith('Uid:'):
                uid = int(ln.split()[UID])
                uname = pwd.getpwuid(uid).pw_name

        # user, pid, memory, proc_desc
        df.append([uname, pid, mem, ret])

# passing columns here also works when no process matched
df = pd.DataFrame(df, columns=['user', 'pid', 'memory', 'proc_desc'])
df
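The per-user summary mentioned in the question can be sketched without pandas. This is only a minimal illustration, not the asker's actual code: the function name `memory_by_user` and the sample rows are hypothetical, and each row is assumed to follow the same `[user, pid, memory, proc_desc]` layout as the list built above.

```python
from collections import defaultdict


def memory_by_user(rows):
    """Sum memory (in bytes) per user from [user, pid, memory, proc_desc] rows."""
    totals = defaultdict(int)
    for user, _pid, mem, _desc in rows:
        totals[user] += mem
    # largest consumers first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# hypothetical sample rows, for illustration only
sample_rows = [
    ['alice', '101', 2000000000, b'python -m ipykernel'],
    ['bob', '202', 500000000, b'python -m ipykernel'],
    ['alice', '303', 1000000000, b'python -m ipykernel'],
]
summary = memory_by_user(sample_rows)
```

This gives one total per user, which is exactly why it cannot point at a specific notebook: the kernel-to-notebook mapping is still missing.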
Recommended answer
I seem to have figured out a working solution for my own problem:
import os
import pwd
import psutil
import re
import json
import urllib.request
import urllib.error
import pandas as pd

UID = 1   # index of the real UID on the 'Uid:' line of /proc/<pid>/status
EUID = 2  # index of the effective UID (unused here)

# the kernel ID is part of the connection file name: .../kernel-<id>.json
regex = re.compile(r'kernel-([^/\s]+)\.json')

pids = [pid for pid in os.listdir('/proc') if pid.isdigit()]

# memory info from psutil.Process
df_mem = []
for pid in pids:
    try:
        with open(os.path.join('/proc', pid, 'cmdline'), 'rb') as f:
            ret = f.read()
    except IOError:  # proc has already terminated
        continue

    # cmdline arguments are NUL-separated bytes; turn them into readable text
    cmdline = ret.decode('utf-8', 'replace').replace('\x00', ' ')

    # jupyter notebook processes
    match = regex.search(cmdline)
    if 'share/jupyter/runtime' in cmdline and match:
        # kernel
        kernel_ID = match.group(1)

        # memory (resident set size, converted from bytes to GB)
        process = psutil.Process(int(pid))
        mem = process.memory_info()[0] / float(1e9)

        # user name for pid
        for ln in open('/proc/{}/status'.format(int(pid))):
            if ln.startswith('Uid:'):
                uid = int(ln.split()[UID])
                uname = pwd.getpwuid(uid).pw_name

        # user, pid, memory, kernel_ID
        df_mem.append([uname, pid, mem, kernel_ID])

df_mem = pd.DataFrame(df_mem, columns=['user', 'pid', 'memory_GB', 'kernel_ID'])

# notebook info from querying each local port for a notebook server
df_nb = []
for port in range(5000, 30000):
    sessions = None
    try:
        url = 'http://127.0.0.1:{}/api/sessions'.format(port)
        sessions = json.load(urllib.request.urlopen(url))
    except urllib.error.URLError:  # nothing listening, or request rejected
        sessions = None

    if sessions:
        for sess in sessions:
            kernel_ID = str(sess['kernel']['id'])
            notebook_path = sess['notebook']['path']
            df_nb.append([port, kernel_ID, notebook_path])

df_nb = pd.DataFrame(df_nb, columns=['port', 'kernel_ID', 'notebook_path'])

# joining tables
df = pd.merge(df_nb, df_mem, on=['kernel_ID'], how='inner')
df = df.sort_values('memory_GB', ascending=False)
df
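The two steps this answer relies on, pulling the kernel ID out of a process command line and matching it against the `/api/sessions` response, can be checked in isolation without a running server. The sketch below is a simplified illustration under stated assumptions: the helper names, the sample command line, and the sample session record are all made up, and the session dictionary is assumed to have the same `kernel` / `notebook` shape that the code above reads.

```python
import re

# kernel connection files are assumed to be named kernel-<id>.json
KERNEL_RE = re.compile(r'kernel-([^/\s]+)\.json')


def kernel_id_from_cmdline(raw):
    """Return the kernel ID found in a NUL-separated cmdline, or None."""
    text = raw.decode('utf-8', 'replace').replace('\x00', ' ')
    match = KERNEL_RE.search(text)
    return match.group(1) if match else None


def notebooks_by_memory(mem_rows, sessions):
    """Join (kernel_id, memory_gb) pairs with session records on kernel ID."""
    mem_by_kernel = dict(mem_rows)
    joined = [
        (sess['notebook']['path'], mem_by_kernel[sess['kernel']['id']])
        for sess in sessions
        if sess['kernel']['id'] in mem_by_kernel
    ]
    # largest memory consumers first
    return sorted(joined, key=lambda row: row[1], reverse=True)


# hypothetical inputs, for illustration only
sample_cmdline = (
    b'python\x00-m\x00ipykernel\x00-f\x00'
    b'/home/alice/.local/share/jupyter/runtime/kernel-abc123.json\x00'
)
sample_sessions = [
    {'kernel': {'id': 'abc123'}, 'notebook': {'path': 'analysis.ipynb'}},
]

kernel_id = kernel_id_from_cmdline(sample_cmdline)
ranked = notebooks_by_memory([(kernel_id, 1.5)], sample_sessions)
```

Keeping the parsing and the join as small pure functions makes it easy to test them on captured command lines before pointing the full scan at a shared server.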