mindspore.dataset.DatasetCache
- class mindspore.dataset.DatasetCache(session_id, size=0, spilling=False, hostname=None, port=None, num_connections=None, prefetch_size=None)[source]
A client to interface with tensor caching service.
For details, please check the Tutorial.
- Parameters
  - session_id (int) – A user-assigned session id for the current pipeline.
  - size (int, optional) – Size of the memory set aside for row caching. Default: 0, which means unlimited; note that this may bring a risk of running out of memory on the machine.
  - spilling (bool, optional) – Whether to spill to disk if out of memory. Default: False.
  - hostname (str, optional) – Host name. Default: None, use default hostname '127.0.0.1'.
  - port (int, optional) – Port to connect to the server. Default: None, use default port 50052.
  - num_connections (int, optional) – Number of TCP/IP connections. Default: None, use default value 12.
  - prefetch_size (int, optional) – The size of the cache queue between operations. Default: None, use default value 20.
Examples
>>> import subprocess
>>> import mindspore.dataset as ds
>>>
>>> # Create a cache instance with command line `dataset-cache --start`
>>> # Create a session with `dataset-cache -g`
>>> # After creating cache with a valid session, get session id with command `dataset-cache --list_sessions`
>>> command = "dataset-cache --list_sessions | tail -1 | awk -F ' ' '{print $1;}'"
>>> session_id = subprocess.getoutput(command).split('\n')[-1]
>>> some_cache = ds.DatasetCache(session_id=int(session_id), size=0)
>>>
>>> dataset_dir = "/path/to/image_folder_dataset_directory"
>>> dataset = ds.ImageFolderDataset(dataset_dir, cache=some_cache)
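The shell pipeline above (tail plus awk) can also be expressed in pure Python, which avoids a dependency on those tools. This is a minimal sketch: the `last_session_id` helper and the sample output string are hypothetical illustrations, not part of the MindSpore API; real output comes from running `dataset-cache --list_sessions`.

```python
def last_session_id(list_sessions_output: str) -> int:
    """Return the session id from the last non-empty line of the output.

    Equivalent to the `tail -1 | awk -F ' ' '{print $1;}'` pipeline:
    take the last line and keep its first whitespace-separated field.
    """
    lines = [ln for ln in list_sessions_output.strip().splitlines() if ln.strip()]
    return int(lines[-1].split()[0])

# Hypothetical output resembling what `dataset-cache --list_sessions` prints.
sample = ("Session    Cache Id   Mem cached  Disk cached  Avg cache size\n"
          "780643335  n/a        n/a         n/a          n/a")
print(last_session_id(sample))  # prints 780643335
```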
- get_stat()[source]
Get the statistics from a cache. After the data pipeline runs, three types of statistics can be obtained: the average cache size (avg_cache_sz), the number of rows cached in memory (num_mem_cached), and the number of rows cached on disk (num_disk_cached).
Examples
>>> import subprocess
>>> import mindspore.dataset as ds
>>>
>>> # In the example above, we created a cache with a valid session id
>>> command = "dataset-cache --list_sessions | tail -1 | awk -F ' ' '{print $1;}'"
>>> id = subprocess.getoutput(command).split('\n')[-1]
>>> some_cache = ds.DatasetCache(session_id=int(id), size=0)
>>>
>>> # Run the dataset pipeline to trigger the cache
>>> dataset = ds.ImageFolderDataset("/path/to/image_folder_dataset_directory", cache=some_cache)
>>> data = list(dataset)
>>>
>>> # Get the status of the cache
>>> stat = some_cache.get_stat()
>>> # Average cache size
>>> cache_sz = stat.avg_cache_sz
>>> # Number of rows cached in memory
>>> num_mem_cached = stat.num_mem_cached
>>> # Number of rows spilled to disk
>>> num_disk_cached = stat.num_disk_cached
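The three fields returned by get_stat() can be combined to judge how much of the cache is being served from memory versus spilled to disk. The sketch below uses a hypothetical namedtuple stand-in for the stat object (so it runs without a cache server); the field names match those described above, but `CacheStat` and `summarize` are illustrative helpers, not part of the MindSpore API.

```python
from collections import namedtuple

# Hypothetical stand-in for the object returned by DatasetCache.get_stat();
# it exposes the same three fields documented above.
CacheStat = namedtuple("CacheStat", ["avg_cache_sz", "num_mem_cached", "num_disk_cached"])

def summarize(stat):
    """Return (total cached rows, fraction held in memory; 0.0 when empty)."""
    total = stat.num_mem_cached + stat.num_disk_cached
    mem_fraction = stat.num_mem_cached / total if total else 0.0
    return total, mem_fraction

stat = CacheStat(avg_cache_sz=4096, num_mem_cached=900, num_disk_cached=100)
total, frac = summarize(stat)
print(total, frac)  # prints 1000 0.9
```

A low memory fraction with spilling enabled suggests the `size` parameter (or available memory) is too small for the workload.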