Python:Pycharm 运行时 [英] Python: Pycharm runtimes

查看：51 发布时间：2021/6/15 19:34:03 python performance

本文介绍了Python:Pycharm 运行时的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我目睹了 PyCharm 的一些奇怪的运行时问题，如下所述.该代码已在具有 20 个内核和 256 GB RAM 的机器上运行，并且有足够的内存可用.我没有展示任何实际功能，因为它是一个相当大的项目，但我非常乐意根据要求添加详细信息.

I am witnessing some strange run time issues with PyCharm that are explained below. The code has been run on a machine with 20 cores and 256 GB RAM and there is sufficient memory to spare. I am not showing any of the real functions as it is a reasonably large project, but am more than happy to add details upon request.

简而言之，我有一个具有以下结构的 .py 文件项目:

In short, I have a .py file project with the following structure:

import ...
import ...

cpu_cores = control_parameters.cpu_cores
prng = RandomState(123)

def collect_results(result_list):
    return pd.DataFrame({'start_time': result_list[0::4],
                  'arrival_time': result_list[1::4],
                  'tour_id': result_list[2::4],
                  'trip_id': result_list[3::4]})

if __name__ == '__main__':

    # Run the serial code
    st = starttimes.StartTimesCreate(prng)
    temp_df, two_trips_df, time_dist_arr = st.run()

     # Prepare the dataframe to sample start times. Create groups from the input dataframe
    temp_df1 = st.prepare_two_trips_more_df(temp_df, two_trips_df)
    validation.logger.info("Dataframe prepared for multiprocessing")

    grp_list = []
    for name, group in temp_df1.groupby('tour_id'):  ### problem lies here in runtimes
        grp_list.append(group)
    validation.logger.info("All groups have been prepared for multiprocessing, "
                           "for a total of %s groups" %len(grp_list))

################ PARALLEL CODE BELOW #################

for 循环 在 1050 万行和 18 列的数据帧上运行.在当前表单中，创建群组列表(280 万个群组)大约需要 25 分钟.这些组被创建，然后被馈送到多进程池，其代码未显示.

The for loop is run on a dataframe of 10.5million rows and 18 columns. In the current form it takes about 25 mins to create the list of groups (2.8M groups). These groups are created and then fed to a multiprocess pool, code for which is not shown.

花费的 25 分钟相当长，因为我也完成了以下测试运行，只需要 7 分钟.本质上，我将 temp_df1 文件保存为 CSV，然后在预先保存的文件中进行批处理，并像以前一样运行 for 循环.

The 25 mins it is taking is quite long for I have done the following test run as well, which takes only 7 mins. Essentially, I saved the temp_df1 file to a CSV and then just batched in the pre-saved file and run the same for loop as before.

import ...
import ...

cpu_cores = control_parameters.cpu_cores
prng = RandomState(123)

def collect_results(result_list):
    return pd.DataFrame({'start_time': result_list[0::4],
                  'arrival_time': result_list[1::4],
                  'tour_id': result_list[2::4],
                  'trip_id': result_list[3::4]})

if __name__ == '__main__':

    # Run the serial code
    st = starttimes.StartTimesCreate(prng)

    temp_df1 = pd.read_csv(r"c:\\...\\temp_df1.csv")
    time_dist = pd.read_csv(r"c:\\...\\start_time_distribution_treso_1.csv")
    time_dist_arr = np.array(time_dist.to_records())

    grp_list = []
    for name, group in temp_df1.groupby('tour_id'):
        grp_list.append(group)
    validation.logger.info("All groups have been prepared for multiprocessing, "
                           "for a total of %s groups" %len(grp_list))

问题那么，是什么导致代码在我只批处理文件时比在更上游的函数中创建文件时运行速度快 3 倍?

QUESTION So, what is it that is causing the code to run 3 times faster when I just batch in the file versus when the file is created as part of a function further upstream?

提前致谢，请让我知道如何进一步澄清.

Thanks in advance and please let me know how I can further clarify.

Python:Pycharm 运行时 [英] Python: Pycharm runtimes

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

Python:Pycharm 运行时 [英] Python: Pycharm runtimes

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭