Multithreaded read from disk?


Question

Suppose I need to read many distinct, independent chunks of data from the same file saved on disk.

Is it possible to multi-thread this read?

Related: Do all threads on the same processor use the same IO device to read from disk? If so, multi-threading would not speed up the read at all; the threads would just be waiting in line.

(I am currently multi-threading with OpenMP.)

Answer

Yes, it is possible. However:

Do all threads on the same processor use the same IO device to read from disk?

Yes. The read head on the disk. As an example, try copying two files in parallel as opposed to in series. It will take significantly longer in parallel, because the OS uses scheduling algorithms to make sure the IO rate is "fair," or equal between the two threads/processes. Because of this, the read head will jump back and forth between different parts of the disk, slowing the process down A LOT. The time to actually read the data is pretty small compared to the time to seek to it, and when you're reading two different parts of the disk at once, you spend most of the time seeking.
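To put rough numbers on the seek-versus-read argument, here is a back-of-the-envelope model in Python. The figures (10 ms per seek, 150 MB/s sequential throughput, 1 MB read per turn before the scheduler switches files) are illustrative assumptions, not measurements, and the function names are made up for this sketch.

```python
def serial_read_time(total_bytes, n_files, seek_s, bytes_per_s):
    """One seek per file, then a sequential transfer of all the data."""
    return n_files * seek_s + total_bytes / bytes_per_s


def interleaved_read_time(total_bytes, slice_bytes, seek_s, bytes_per_s):
    """The head seeks before every slice because it alternates between files."""
    n_slices = -(-total_bytes // slice_bytes)  # ceiling division
    return n_slices * seek_s + total_bytes / bytes_per_s


# Assumed, illustrative numbers (not measured on any real drive).
SEEK_S = 0.010               # 10 ms average seek
BYTES_PER_S = 150e6          # 150 MB/s sequential throughput
TOTAL = 2 * 1_000_000_000    # two 1 GB files
SLICE = 1_000_000            # 1 MB read per turn before switching files

serial = serial_read_time(TOTAL, 2, SEEK_S, BYTES_PER_S)
parallel = interleaved_read_time(TOTAL, SLICE, SEEK_S, BYTES_PER_S)
print(f"serial:      {serial:.1f} s")
print(f"interleaved: {parallel:.1f} s")
```

Under these assumptions the serial copy takes about 13.4 s and the interleaved one about 33.3 s, of which 20 s is pure seeking. Real drives and OS schedulers batch requests, so treat this as an intuition aid rather than a prediction.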

Note that all of this assumes you're using a hard disk. If you're using an SSD, parallel reads will not be slower, and according to the comments they are actually faster. With RAID the situation becomes more complicated, and (obviously) depends on what kind of RAID you're using.

This is what it looks like (I've unwrapped the circular disk into a rectangle because ascii circles are hard, and simplified the data layout to make it easier to read):

Assume the files are separated by some space on the platter like so:

|         |

A serial read will look like (* indicates reading):

space ----->
|        *|  t
|        *|  i
|        *|  m
|        *|  e
|        *|  |
|       / |  |
|     /   |  |
|   /     |  V
|  /      |
|*        |
|*        |
|*        |
|*        |

While a parallel read will look like

|       \ |
|        *|
|       / |
|     /   |
|   /     |
|  /      |
|*        |
|  \      |
|    \    |
|     \   |
|       \ |
|        *|
|       / |
|     /   |
|   /     |
|  /      |
|*        |
|  \      |
|    \    |
|     \   |
|       \ |
|        *|

etc
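For the "yes, it is possible" part, here is a minimal sketch of reading independent chunks of one file from several threads. It is written in Python rather than OpenMP and assumes a POSIX system, where `os.pread` reads at an explicit offset so the threads do not fight over a shared file position. The helper names `read_chunk` and `read_chunks_parallel`, the worker count, and the demo file are all made up for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_chunk(fd, offset, length):
    """Read `length` bytes at `offset` without moving a shared file position."""
    data = b""
    while len(data) < length:
        part = os.pread(fd, length - len(data), offset + len(data))
        if not part:  # reached end of file
            break
        data += part
    return data


def read_chunks_parallel(path, chunks, workers=4):
    """Read a list of (offset, length) chunks from one file with a thread pool."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(read_chunk, fd, off, n) for off, n in chunks]
            # Results come back in the same order as `chunks`.
            return [f.result() for f in futures]
    finally:
        os.close(fd)


# Demo on a small temporary file (4096 bytes of a repeating 0..255 pattern).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 16)
    demo_path = tmp.name

demo_chunks = [(0, 4), (256, 4), (1000, 8)]
results = read_chunks_parallel(demo_path, demo_chunks)
os.remove(demo_path)
print(results)
```

Whether this is any faster than a plain loop depends on the device, as described above: on a single hard disk the threads mostly add seeks, while on an SSD overlapping requests may help. The same pattern (one positional read per thread, each with its own offset) should carry over to an OpenMP loop calling POSIX `pread`.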
