What do the low_memory and memory_map flags do in pd.read_csv?
Problem description
The function signature for pandas.read_csv gives, among others, the following options:
read_csv(filepath_or_buffer, low_memory=True, memory_map=False, iterator=False, chunksize=None, ...)
I couldn't find any documentation for either the low_memory or the memory_map flag. I am confused about whether these features are implemented yet and, if so, how they work.
Specifically,
- memory_map: If implemented, does it use np.memmap, and if so, does it store the individual columns or the rows as memmap?
- low_memory: Does it specify something like a cache to store in memory?
- Can we convert an existing DataFrame to a memmapped DataFrame?
P.S.: Versions of the relevant modules:
pandas==0.14.0
scipy==0.14.0
numpy==1.8.1
I will attempt to sum up the comments on this question and combine them with my own research in one comprehensive answer.
- The low_memory option is more or less deprecated, in that it no longer actually does anything (source).
- memory_map does not seem to use the NumPy memory map, as far as I can tell from the source code. It seems to be an option for how the incoming stream of data is parsed, not something that affects how the DataFrame you receive works.
- Since my assumption in the second point is that this option only affects parsing, the question about converting an existing DataFrame to a memmapped one is largely moot.
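One way to check the second point is to read the same file with and without the flags and compare the results. The following is only a minimal sketch: the sample CSV contents and the variable names are made up for illustration, and it assumes a pandas version in which read_csv accepts both memory_map and low_memory as keyword arguments.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Write a tiny CSV to a temporary file (hypothetical sample data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("a,b\n1,x\n2,y\n3,z\n")
    path = f.name

try:
    plain = pd.read_csv(path)                    # default settings
    mapped = pd.read_csv(path, memory_map=True)  # file is memory-mapped while parsing
    whole = pd.read_csv(path, low_memory=False)  # low_memory switched off
finally:
    os.remove(path)

# All three reads should yield the same data.
same_data = plain.equals(mapped) and plain.equals(whole)

# The column data of the "mapped" frame is checked against np.memmap.
column_is_memmap = isinstance(mapped["a"].values, np.memmap)

print(same_data, column_is_memmap)
```

If the flags only influence parsing, the three frames should compare equal, and the column data of the frame read with memory_map=True should be an ordinary in-memory array rather than an np.memmap.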