Blog Content

Using h5py and mpi4py for multi-core programming in Python

Python 2018-08-31 23:08:48

HDF5 for Python

http://docs.h5py.org/en/latest/index.html

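The h5py library can read and write data sets that are larger than memory. For simple data we usually read everything into memory in one go. With data that exceeds memory this is not possible: because memory is limited, reads and writes have to target a specific position and region, and avoid touching unrelated data. h5py supports exactly this.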
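The region-based access described above can be sketched as follows. This is a minimal illustration, not the author's code: the file name `big_data.h5`, the dataset name `measurements` and the array sizes are assumptions made up for the example, and the array is kept small so the sketch runs quickly.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "big_data.h5")

with h5py.File(path, "w") as f:
    # Create the dataset on disk without building the full array in memory.
    dset = f.create_dataset("measurements", shape=(1000, 100), dtype="f8")
    # Write one block of rows at a time; only that block is held in memory.
    for start in range(0, 1000, 250):
        block = np.full((250, 100), float(start))
        dset[start:start + 250, :] = block

with h5py.File(path, "r") as f:
    dset = f["measurements"]       # opening the dataset does not load its data
    region = dset[250:260, 10:20]  # slicing reads just this 10x10 region

print(region.shape)
```

Slicing an h5py dataset returns a NumPy array holding only the selected region, so the same pattern applies when the full dataset is far larger than memory.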

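Advantages of h5py: it is fast and compresses efficiently. An HDF5 file opened with h5py is a container for two kinds of objects, datasets and groups. A dataset is an array-like collection of data, much like a NumPy array. A group is a folder-like container that behaves like a Python dictionary, with keys and values. A group can hold datasets or other groups. The "key" is the name of a group member, and the "value" is the member object itself (a group or a dataset).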
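A short sketch of the group/dataset layout described above. The names `experiment`, `run1`, `temperature` and `labels` are hypothetical, and gzip compression is enabled on one dataset only to show where that option goes.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "layout.h5")

with h5py.File(path, "w") as f:
    grp = f.create_group("experiment")   # a group acts like a folder
    sub = grp.create_group("run1")       # groups can contain other groups
    # A dataset behaves much like a NumPy array; gzip compression is optional.
    sub.create_dataset("temperature", data=np.arange(5, dtype="f8"),
                       compression="gzip")
    grp.create_dataset("labels", data=np.array([1, 2, 3]))

with h5py.File(path, "r") as f:
    top_keys = sorted(f.keys())                    # names at the file root
    member_keys = sorted(f["experiment"].keys())   # dictionary-style keys
    is_group = isinstance(f["experiment/run1"], h5py.Group)
    temperature = f["experiment/run1/temperature"][...]  # read whole dataset

print(top_keys, member_keys, is_group, temperature)
```

Members can be addressed either step by step (`f["experiment"]["run1"]`) or with a slash-separated path as in the last lookup.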

apt-get install python-mpi4py

brew install hdf5

Installing h5py:

python setup.py configure --mpi --hdf5=/path_parallel_hdf5_lib/

python setup.py build

python setup.py install --prefix=/wheretoinstall_h5py

export PYTHONPATH=$PYTHONPATH:/wheretoinstall_h5py

You might also need to install Cython beforehand.

pip install h5py # this automatic install has problems; the manual build described above is recommended

pip install mpi4py
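Because the manual build and the pip install can produce different results, it may help to check how the h5py that Python actually imports was built. A minimal sketch, assuming h5py is importable; `h5py.get_config().mpi` reports whether that build has MPI support.

```python
import h5py

# Inspect the build configuration of the h5py module that was imported.
config = h5py.get_config()
mpi_enabled = bool(config.mpi)

print("h5py version:", h5py.__version__)
print("HDF5 version:", h5py.version.hdf5_version)
print("built with MPI support:", mpi_enabled)
```

If the last line reports False, the imported h5py is a serial build, which is what one would expect when a plain pip install is picked up instead of the manual `--mpi` build.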

Testing whether mpi4py is installed correctly

Now we can write a short program to test whether mpi4py is installed and works properly:

# mpi_helloworld.py

from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
node_name = MPI.Get_processor_name() # get the name of the node

print('Hello world from process %d at %s.' % (rank, node_name))

Running an mpi4py program

Use the following command to run an MPI program written in Python:

$ mpiexec -n 3 python mpi_helloworld.py
Hello world from process 2 at node1.
Hello world from process 0 at node1.
Hello world from process 1 at node1.

The older form also works:

$ mpirun -np 3 python mpi_helloworld.py
Hello world from process 2 at node1.
Hello world from process 0 at node1.
Hello world from process 1 at node1.

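Here -n or -np specifies how many MPI processes are used to run the program.

The commands above launch 3 MPI processes on a single node (one machine) to run mpi_helloworld.py in parallel. To run the program in parallel across multiple nodes (multiple machines), use the following command: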

$ mpiexec -n 3 -host node1,node2,node3 python mpi_helloworld.py
Hello world from process 1 at node2.
Hello world from process 2 at node3.
Hello world from process 0 at node1.

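The -host (or -H) option is followed by the nodes to use, separated by commas. If there are many nodes, the -hostfile or -machinefile option can instead name a file that lists the compute nodes you want to use. More run options can be listed with the following command: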

$ mpiexec --help
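Once a program is started with several processes, each process usually uses its rank and the total process count to pick its own share of the work. The following is a minimal sketch of one common way to do that split, not something taken from the original post: the helper `block_range` and the item count are hypothetical, and the fallback to a single process when mpi4py cannot be imported exists only so the sketch also runs without MPI.

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    size = comm.Get_size()   # total number of MPI processes (-n / -np)
    rank = comm.Get_rank()   # index of this process, from 0 to size - 1
except ImportError:
    # Assumption for illustration: behave like a single process without mpi4py.
    size, rank = 1, 0


def block_range(n_items, size, rank):
    """Return the half-open range [start, stop) of items owned by one rank.

    Items are split into contiguous blocks; the first n_items % size ranks
    each receive one extra item.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


n_items = 10  # hypothetical amount of work
start, stop = block_range(n_items, size, rank)
print('Process %d of %d handles items [%d, %d).' % (rank, size, start, stop))
```

Saved under a name of your choice and launched with `mpiexec -n 3 python <script>`, each of the three processes reports a different, non-overlapping range. The same ranges could be used as row slices when each process reads its own part of an HDF5 dataset.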
