mindpandas.config
MindPandas config file
- mindpandas.config.get_adaptive_concurrency()
Get the flag that indicates whether adaptive concurrency is enabled.
- Returns
bool, value of adaptive_concurrency flag.
Examples
>>> # Get the adaptive concurrency flag.
>>> import mindpandas as pd
>>> adaptive = pd.get_adaptive_concurrency()
- mindpandas.config.get_concurrency_mode()
Get the current concurrency mode. It is one of {‘multithread’, ‘multiprocess’}.
- Returns
str, current concurrency mode.
Examples
>>> # Get the current concurrency mode.
>>> import mindpandas as pd
>>> mode = pd.get_concurrency_mode()
- mindpandas.config.get_min_block_size()
Get the current minimum block size of each partition.
- Returns
int, current min_block_size of each partition in config.
Examples
>>> # Get the current min block size.
>>> import mindpandas as pd
>>> min_block_size = pd.get_min_block_size()
- mindpandas.config.get_partition_shape()
Get the current partition shape.
- Returns
tuple, the expected number of partitions along each axis. It is a tuple of two positive integers: the first element is the row-wise number of partitions and the second element is the column-wise number of partitions.
Examples
>>> # Get the current partition shape.
>>> import mindpandas as pd
>>> shape = pd.get_partition_shape()
- mindpandas.config.set_adaptive_concurrency(adaptive)
Set adaptive concurrency, which allows read_csv to automatically select the concurrency mode based on the file size. Available options are True and False. When set to True, files larger than 18 MB that are read with read_csv, and DataFrames initialized from a pandas DataFrame that uses more than 1 GB of CPU memory, use the multiprocess mode; all other data uses the multithread mode. When set to False, the current concurrency mode is used.
- Parameters
adaptive (bool) – True to turn on adaptive concurrency, False to turn off adaptive concurrency.
- Raises
ValueError – If adaptive is not True or False.
Examples
>>> # Set adaptive concurrency to True.
>>> import mindpandas as pd
>>> pd.set_adaptive_concurrency(True)
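The selection rule described for set_adaptive_concurrency can be sketched in plain Python. This is only an illustration of the documented thresholds, not the MindPandas implementation: the helper name `pick_concurrency_mode`, its parameters, and the reading of 1 MB as 1024 * 1024 bytes are all assumptions.

```python
# Illustrative sketch only; not the MindPandas implementation.
MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB

CSV_SIZE_THRESHOLD = 18 * MB  # documented threshold for files read by read_csv
MEMORY_THRESHOLD = 1 * GB     # documented threshold for pandas DataFrames


def pick_concurrency_mode(adaptive, current_mode, csv_bytes=None, frame_bytes=None):
    """Return the mode the documented rule would select (hypothetical helper)."""
    if not isinstance(adaptive, bool):
        raise ValueError("adaptive must be True or False")
    if not adaptive:
        # Adaptive concurrency is off: keep whatever mode is currently set.
        return current_mode
    if csv_bytes is not None and csv_bytes > CSV_SIZE_THRESHOLD:
        return "multiprocess"
    if frame_bytes is not None and frame_bytes > MEMORY_THRESHOLD:
        return "multiprocess"
    return "multithread"


if __name__ == "__main__":
    print(pick_concurrency_mode(True, "multithread", csv_bytes=20 * MB))
    print(pick_concurrency_mode(False, "multiprocess", csv_bytes=1 * MB))
```

With adaptive set to False, the helper simply returns the current mode, which mirrors the last sentence of the description.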
- mindpandas.config.set_concurrency_mode(mode, **kwargs)
Set the backend concurrency mode used to parallelize computation. The default mode is multithread. Available options are {‘multithread’, ‘multiprocess’}. For instructions on using the two modes, see the MindPandas execution mode introduction and configuration instructions.
- Parameters
mode (str) – This parameter can be set to ‘multithread’ for multithread backend, or ‘multiprocess’ for distributed multiprocess backend.
**kwargs –
In multithread mode, no additional kwargs are needed. In multiprocess mode, additional parameters include:
address: The IP address of the master node, required.
- Raises
ValueError – If mode is not ‘multithread’ or ‘multiprocess’.
Examples
>>> # Change the mode to multiprocess.
>>> import mindpandas as pd
>>> pd.set_concurrency_mode('multiprocess', address='127.0.0.1')
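The argument rules for set_concurrency_mode can be summarized with a small stand-alone check. This is a sketch under stated assumptions rather than the library's code: `check_concurrency_args` is a hypothetical helper, and raising ValueError when `address` is missing is an assumption (the documentation only says the argument is required).

```python
# Illustrative sketch only; not the MindPandas implementation.
VALID_MODES = ("multithread", "multiprocess")


def check_concurrency_args(mode, **kwargs):
    """Validate arguments the way the documentation describes (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if mode == "multiprocess" and "address" not in kwargs:
        # The docs mark 'address' as required; the exception type is an assumption.
        raise ValueError("multiprocess mode requires address=<master node IP>")
    return mode, kwargs


if __name__ == "__main__":
    print(check_concurrency_args("multithread"))
    print(check_concurrency_args("multiprocess", address="127.0.0.1"))
```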
- mindpandas.config.set_min_block_size(min_block_size)
Set the minimum block size of each partition, that is, the minimum size of each partition along each axis. Each partition is at least (min_block_size, min_block_size) in size, unless the original data is smaller than that. For example, if min_block_size is 32 and a DataFrame has only 16 columns, then even with a partition shape of (2, 2) the columns are not split further during partitioning.
- Parameters
min_block_size (int) – Minimum size of a partition’s number of rows and number of columns during partitioning.
- Raises
ValueError – If min_block_size is not an int.
Examples
>>> # Set the min block size of each partition to 8.
>>> import mindpandas as pd
>>> pd.set_min_block_size(8)
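To make the interaction between min_block_size and the partition shape concrete, here is a minimal sketch of one way the limit could be applied along a single axis. The helper `effective_partitions` and its formula are assumptions chosen to reproduce the 16-column example in the description; MindPandas may compute this differently.

```python
# Illustrative sketch only; not the MindPandas implementation.
def effective_partitions(axis_length, requested, min_block_size):
    """Partitions along one axis so that each block keeps at least
    min_block_size items; small data stays in one block (assumed rule)."""
    if not isinstance(min_block_size, int) or min_block_size < 1:
        # The docs only require an int; rejecting values below 1 is an assumption.
        raise ValueError("min_block_size must be a positive int")
    max_by_block_size = axis_length // min_block_size
    return max(1, min(requested, max_by_block_size))


if __name__ == "__main__":
    # 16 columns, min_block_size 32, partition shape asks for 2 column blocks.
    print(effective_partitions(16, 2, 32))
    # 1000 rows, min_block_size 32, partition shape asks for 2 row blocks.
    print(effective_partitions(1000, 2, 32))
```

Under this assumed rule, the 16-column axis stays in a single block, while the 1000-row axis is split into the two requested blocks.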
- mindpandas.config.set_partition_shape(shape)
Set the partition shape of the data, where shape[0] is the expected number of partitions along axis 0 (row-wise) and shape[1] is the expected number of partitions along axis 1 (column-wise). For example, if the shape is (16, 16), MindPandas will try to slice the original data into 16 * 16 partitions.
- Parameters
shape (tuple) – Number of expected partitions along each axis. It should be a tuple of two positive integers. The first element is the row-wise number of partitions and the second element is the column-wise number of partitions.
- Raises
ValueError – If shape is not a tuple or its elements are not integers.
Examples
>>> # Set the partition shape to (16, 16).
>>> import mindpandas as pd
>>> pd.set_partition_shape((16, 16))
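As a rough illustration of what a partition shape means, the sketch below validates a shape and cuts each axis into contiguous, near-equal slices. Both helpers (`validate_partition_shape`, `split_bounds`) are hypothetical, and the even-split strategy is an assumption; the library's actual slicing may differ.

```python
# Illustrative sketch only; not the MindPandas implementation.
def validate_partition_shape(shape):
    """Check that shape is a tuple of two positive integers (hypothetical helper)."""
    if not isinstance(shape, tuple) or len(shape) != 2:
        raise ValueError("shape must be a tuple of two elements")
    if not all(isinstance(n, int) and n > 0 for n in shape):
        raise ValueError("shape elements must be positive integers")
    return shape


def split_bounds(length, parts):
    """Split range(length) into at most `parts` contiguous, near-equal slices."""
    parts = min(parts, length) or 1  # never more slices than items
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` slices take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


if __name__ == "__main__":
    rows, cols = validate_partition_shape((16, 16))
    row_slices = split_bounds(64, rows)
    col_slices = split_bounds(32, cols)
    print(len(row_slices) * len(col_slices))  # number of partitions in the grid
```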