mindpandas.channel
- class mindpandas.channel.DataSender(address, namespace='default', num_shards=1, dataset_name='dataset', full_batch=False, max_queue_size=10)
The sender (input side) of the channel. It can be used for sending new object through the channel.
- Parameters
address (str) - The ip address of the node current sender runs on.
namespace (str, optional) - The namespace that the channel belongs to. By default the value is
'default'
and the sender will be running in namespacedefault
. DataSender and DataReceiver in different namespaces cannot connect to each other.num_shards (int, optional) - Specifies how many shards the data will be divided into. By default the value is
1
.dataset_name (str, optional) - The name of the dataset. By default the value is
'dataset'
.full_batch (bool, optional) - If
True
, each shard will get complete data sent by the sender. Otherwise each shard only gets part of the data. By default the value isFalse
.max_queue_size (int, optional) - The maximum number of data that can be cached in the queue. By default the value is
10
.
- Raises
ValueError - When num_shards is an invalid value.
Note
Distributed executor has to be started in advance.
- send(obj)
Send object through the channel.
- Parameters
obj (Union[numpy.ndarray, list, mindpandas.DataFrame]) - The object to send.
- Raises
TypeError - If the type of the obj is invalid.
ValueError - If the length of the obj is not a positive integer or cannot be evenly divided by the number of shards.
- property num_shards
Returns the num_shards of current channel.
- property full_batch
Returns the value of full_batch.
- get_queue(shard_id=None)
Returns the object references that haven’t been consumed in the shard specified by shard_id.
- Parameters
shard_id (int, optional) - The id of the requested shard. By default the value is
None
and it will return all shards.
- Returns
List,stores references of the data in the shard.
- class mindpandas.channel.DataReceiver(address, namespace='default', shard_id=0, dataset_name='dataset')
The receiver (output side) of the channel. It can be used for receiving new object from the channel.
- Parameters
address (str) - The ip address of the node current receiver runs on.
namespace (str, optional) - he namespace that the channel belongs to. By default the value is
'default'
and the receiver will be running in namespacedefault
. DataSender and DataReceiver in different namespaces cannot connect to each other.shard_id (int, optional) - Specifies the shard of data that is received by current receiver. By default the value is
0
and the receiver will get data from the shard with id 0.dataset_name (str, optional) - The name of the dataset. By default the value is
'dataset'
.
Note
Distributed executor has to be started and a DataSender has to be initialized in advance. To pair with the correct DataSender, the namespace and dataset_name have to be identical to the DataSender.
- recv()
Get data from the channel.
- Returns
object,the least recent object in the shard that haven’t been consumed.
- Raises
ValueError - When the shard_id of current receiver is invalid.
- property shard_id
Returns the shard_id of current receiver.
- property num_shards
Returns the num_shards of current channel.