How to determine at which point in a python script step memory was exceeded in SLURM

Question
I have a python script that I am running on a SLURM cluster for multiple input files:
#!/bin/bash
#SBATCH -p standard
#SBATCH -A overall
#SBATCH --time=12:00:00
#SBATCH --output=normalize_%A.out
#SBATCH --error=normalize_%A.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=240000
HDF5_DIR=...
OUTPUT_DIR=...
NORM_SCRIPT=...
norm_func () {
  local file=$1
  echo "$file"
  python $NORM_SCRIPT -data "$file" -path "$OUTPUT_DIR"
}
# Doing normalization in parallel
for file in $HDF5_DIR/*; do norm_func "$file" & done
wait
The python script just loads a dataset (scRNAseq), normalizes it and saves the result as a .csv file. The major lines of code in it are:
import csv
import h5py
import numpy as np

f = h5py.File(path_to_file, 'r')
# rawcounts is read from the HDF5 file; this materializes it in memory
rawcounts = np.array(rawcounts)
unique_code = np.unique(split_code)

for code in unique_code:
    mask = np.equal(split_code, code)
    curr_counts = rawcounts[:, mask]
    # Actual TMM normalization
    mtx_norm = gmn.tmm_normalization(curr_counts)
    # Writing the results into a .csv file
    csv_path = path_to_save + "/" + file_name + "_" + str(code) + ".csv"
    with open(csv_path, 'w', encoding='utf8') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(["", cell_ids])
        for idx, row in enumerate(mtx_norm):
            writer.writerow([gene_symbols[idx], row])
I keep getting a step memory exceeded error for datasets that are above 10 GB, and I am not sure why. How can I change my .slurm script or python code to reduce its memory usage? How can I actually identify what causes the memory problem? Is there a particular way of debugging memory in this case? Any suggestions would be greatly appreciated.
Answer

You can get more fine-grained information by using srun to start the python script:
srun python $NORM_SCRIPT -data $file -path $OUTPUT_DIR
Slurm will then create one 'step' per instance of your python script and report information (errors, return codes, memory used, etc.) for each step independently in the accounting, which you can interrogate with the sacct command.
If it has been configured by the administrators, use the --profile option to get a timeline of the memory usage of each step.
In your python script you can use the memory_profiler module to get feedback on the memory usage of your script.
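memory_profiler is a third-party package; if it is not installed on the cluster, the standard-library tracemalloc module gives similar feedback on where memory is allocated. A minimal sketch (load_block here is a hypothetical stand-in for one of the loading or normalization steps in the script):

```python
import tracemalloc

def measure_peak(func, *args):
    """Run func and return (result, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    result = func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

def load_block(n):
    # Hypothetical stand-in for a step that materializes a large object
    return [0] * n

block, peak = measure_peak(load_block, 1_000_000)
print(f"peak during load_block: {peak / 1e6:.1f} MB")
```

Wrapping each major step (file load, np.array conversion, normalization, CSV write) this way shows which one drives the peak; in the code above, the np.array(rawcounts) conversion is a likely candidate, since it copies the whole dataset into memory.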