mindspore.dataset.DatasetCache

class mindspore.dataset.DatasetCache(session_id, size=0, spilling=False, hostname=None, port=None, num_connections=None, prefetch_size=None)[source]

A client to interface with the tensor caching service.

For details, please refer to the tutorial.

Parameters
  • session_id (int) – A user assigned session id for the current pipeline.

  • size (int, optional) – Size of the memory set aside for row caching. Default: 0, which means unlimited. Note that an unlimited cache may risk running out of memory on the machine.

  • spilling (bool, optional) – Whether or not to spill to disk if the cache runs out of memory. Default: False.

  • hostname (str, optional) – Host name. Default: None, which uses the default hostname '127.0.0.1'.

  • port (int, optional) – Port to connect to the server. Default: None, which uses the default port 50052.

  • num_connections (int, optional) – Number of TCP/IP connections. Default: None, which uses the default value 12.

  • prefetch_size (int, optional) – The size of the cache queue between operations. Default: None, which uses the default value 20.

Examples

>>> import mindspore.dataset as ds
>>>
>>> # Create a cache instance, in which session_id is generated from command line `cache_admin -g`
>>> # In the following code, suppose the session_id is 780643335
>>> some_cache = ds.DatasetCache(session_id=780643335, size=0)
>>>
>>> dataset_dir = "/path/to/image_folder_dataset_directory"
>>> ds1 = ds.ImageFolderDataset(dataset_dir, cache=some_cache)
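
The optional parameters above can be combined to tune cache behavior. A hypothetical variation (assuming the same session id and a cache server started with a matching non-default port) might look like:

>>> # A bounded cache that spills overflow rows to disk and connects to a non-default port
>>> spilling_cache = ds.DatasetCache(session_id=780643335, size=512, spilling=True, port=50053)
>>> ds2 = ds.ImageFolderDataset(dataset_dir, cache=spilling_cache)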
get_stat()[source]

Get the statistics from a cache. After the data pipeline has run, three types of statistics can be obtained: average cache size (avg_cache_sz), number of rows cached in memory (num_mem_cached), and number of rows cached on disk (num_disk_cached).
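
A sketch of reading these statistics, assuming the cache server is running and the pipeline from the example above has been iterated at least once; the attribute names mirror the statistic names listed here:

>>> # Iterate the pipeline once so the cache is populated
>>> for _ in ds1.create_dict_iterator(num_epochs=1):
...     pass
>>> stat = some_cache.get_stat()
>>> cache_sz = stat.avg_cache_sz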