pgl.utils.mp_reader module: MultiProcessing reader helper function for Paddle.

Optimized Multiprocessing Reader for PaddlePaddle

pgl.utils.mp_reader.deserialize_data(data)
pgl.utils.mp_reader.log = <Logger pgl.utils.mp_reader (DEBUG)>
pgl.utils.mp_reader.multiprocess_reader(readers, use_pipe=True, queue_size=1000, pipe_size=10)

multiprocess_reader uses Python multiprocessing to read data from the given readers in parallel, then merges all the data through a multiprocessing.Queue or a multiprocessing.Pipe. The number of processes equals the number of input readers, and each process calls one reader.

multiprocessing.Queue requires read/write access to /dev/shm, which some platforms do not support.

Create the readers first. They should be independent of one another so that each process can work on its own.

An example:

.. code-block:: python

    # `reader` stands for a user-defined reader creator; it is not part of this module.
    reader0 = reader(["file01", "file02"])
    reader1 = reader(["file11", "file12"])
    reader2 = reader(["file21", "file22"])
    reader = multiprocess_reader([reader0, reader1, reader2],
                                 queue_size=100, use_pipe=False)
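The queue-based merging described for multiprocess_reader can be pictured with a minimal standard-library sketch. This is not the PGL implementation: the names `merge_readers` and `make_reader` are hypothetical, the `None` end-of-stream sentinel is an assumption (so samples themselves must not be `None`), and the "fork" start method is assumed, which limits the sketch to POSIX platforms.

```python
import multiprocessing


def merge_readers(readers, queue_size=1000):
    """Hypothetical stand-in that merges readers through one shared queue."""
    # Assumption: "fork" is available (POSIX), so nested functions and
    # closures can be handed to child processes without pickling.
    ctx = multiprocessing.get_context("fork")

    def _worker(reader, queue):
        # Each process drains exactly one reader.
        for sample in reader():
            queue.put(sample)
        queue.put(None)  # assumed sentinel: this reader is exhausted

    def merged():
        queue = ctx.Queue(queue_size)
        procs = [
            ctx.Process(target=_worker, args=(r, queue), daemon=True)
            for r in readers
        ]
        for p in procs:
            p.start()
        finished = 0
        # Keep yielding until every worker has sent its sentinel.
        while finished < len(readers):
            sample = queue.get()
            if sample is None:
                finished += 1
            else:
                yield sample
        for p in procs:
            p.join()

    return merged


def make_reader(values):
    """Hypothetical reader creator: returns a callable that yields samples."""
    def reader():
        for v in values:
            yield v
    return reader


if __name__ == "__main__":
    reader = merge_readers([make_reader([1, 2, 3]), make_reader([10, 20])],
                           queue_size=4)
    # Samples from different processes may interleave in any order.
    print(sorted(reader()))
```

Because every process writes to the same queue, the relative order of samples from different readers is not deterministic; only the order within a single reader is preserved.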

pgl.utils.mp_reader.numpy_deserialize_data(data)

deserialize_data

pgl.utils.mp_reader.numpy_serialize_data(data)

serialize_data

pgl.utils.mp_reader.serialize_data(data)