mindspore.dataset.Dataset.save

Dataset.save(file_name, num_files=1, file_type='mindrecord')

Save the dynamic data processed by the dataset pipeline to files in a common dataset format. The only supported format is 'mindrecord'. Use the mindspore.dataset.MindDataset API to read the saved file(s).

Implicit type casting occurs when saving data as 'mindrecord'. The table below shows how each type is cast.

Implicit Type Casting when Saving as mindrecord
Type in dataset	Type in mindrecord	Details
bool	int32	Cast to int32
int8	int32
uint8	int32
int16	int32
uint16	int32
int32	int32
uint32	int64
int64	int64
uint64	int64	Sign may be reversed (large values can overflow int64)
float16	float32
float32	float32
float64	float64
string	string	Multi-dimensional string not supported
bytes	bytes	Multi-dimensional bytes not supported
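
The casts in the table above can be reproduced with plain NumPy to see their effect on values. This is only a sketch: CAST_TABLE and cast_like_table are hypothetical names introduced here, the code does not call MindRecord, and it reads the "reversed" note for uint64 as ordinary integer wrap-around, which is an assumption.

```python
import numpy as np

# Numeric rows of the table above as NumPy dtypes (illustration only,
# not MindRecord's internal implementation).
CAST_TABLE = {
    np.dtype("bool"): np.dtype("int32"),
    np.dtype("int8"): np.dtype("int32"),
    np.dtype("uint8"): np.dtype("int32"),
    np.dtype("int16"): np.dtype("int32"),
    np.dtype("uint16"): np.dtype("int32"),
    np.dtype("int32"): np.dtype("int32"),
    np.dtype("uint32"): np.dtype("int64"),
    np.dtype("int64"): np.dtype("int64"),
    np.dtype("uint64"): np.dtype("int64"),
    np.dtype("float16"): np.dtype("float32"),
    np.dtype("float32"): np.dtype("float32"),
    np.dtype("float64"): np.dtype("float64"),
}


def cast_like_table(array):
    """Hypothetical helper: cast an array to the dtype listed in the table."""
    return array.astype(CAST_TABLE[array.dtype])


flags = cast_like_table(np.array([True, False]))            # bool -> int32
halves = cast_like_table(np.array([1.5], dtype=np.float16))  # float16 -> float32

# uint64 values above the int64 maximum wrap around to negative numbers
# when NumPy casts them to int64.
big = cast_like_table(np.array([2**64 - 1], dtype=np.uint64))
print(flags.dtype, halves.dtype, big.dtype, big[0])
```

If a uint64 column may hold values above the int64 maximum, it is worth checking the range before saving rather than relying on the implicit cast.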

Note

To save the samples in order, set the dataset’s shuffle to False and num_files to 1.
Before calling this function, do not apply batch or repeat operations, or map operations that use random data augmentation.
When array shapes vary between samples, only one-dimensional arrays and multi-dimensional arrays whose dimension 0 varies are supported.
MindRecord does not support multi-dimensional string or multi-dimensional bytes.
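
The variable-shape rule in the notes above can be checked before saving. The following is a minimal sketch, assuming the shapes of one column have already been collected from the pipeline; has_supported_variable_shape is a hypothetical helper and not part of MindSpore.

```python
def has_supported_variable_shape(shapes):
    """Hypothetical check: True if the shapes differ only in dimension 0.

    `shapes` is an iterable of array shapes (tuples or lists) collected
    from one column of the dataset.
    """
    shapes = [tuple(shape) for shape in shapes]
    if not shapes:
        return True
    first = shapes[0]
    # Every shape must have the same rank and identical trailing dimensions.
    return all(
        len(shape) == len(first) and shape[1:] == first[1:]
        for shape in shapes
    )


print(has_supported_variable_shape([(3,), (5,)]))      # 1-D, variable length
print(has_supported_variable_shape([(2, 4), (7, 4)]))  # only dimension 0 varies
print(has_supported_variable_shape([(2, 4), (2, 5)]))  # dimension 1 varies
```

Under this reading of the note, the first two calls describe columns that can be saved and the third describes one that cannot.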

Parameters

file_name (str) – Path to dataset file.
num_files (int, optional) – Number of dataset files. Default: 1.
file_type (str, optional) – Dataset format. Default: 'mindrecord'.

Examples

>>> import mindspore.dataset as ds
>>> import numpy as np
>>>
>>> def generator_1d():
...     for i in range(10):
...         yield (np.array([i]),)
>>>
>>> # apply dataset operations
>>> d1 = ds.GeneratorDataset(generator_1d, ["data"], shuffle=False)
>>> d1.save('/path/to/save_file')
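
The saved file can be read back with mindspore.dataset.MindDataset, as the description mentions. The sketch below extends the example above into a round trip. It assumes a writable temporary directory; save_and_reload and the file name saved.mindrecord are hypothetical names chosen for illustration.

```python
import os
import tempfile

import numpy as np


def generator_1d():
    # Same generator as in the example above: ten one-element arrays.
    for i in range(10):
        yield (np.array([i]),)


def save_and_reload(directory):
    """Hypothetical helper: save a pipeline as MindRecord and read it back."""
    # Imported here so the helper can be defined without MindSpore installed.
    import mindspore.dataset as ds

    file_name = os.path.join(directory, "saved.mindrecord")

    # shuffle=False and num_files=1 keep the samples in their original order.
    d1 = ds.GeneratorDataset(generator_1d, ["data"], shuffle=False)
    d1.save(file_name, num_files=1, file_type="mindrecord")

    # Read the saved file back without shuffling.
    d2 = ds.MindDataset(dataset_files=file_name, shuffle=False)
    return [
        row["data"].tolist()
        for row in d2.create_dict_iterator(num_epochs=1, output_numpy=True)
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(save_and_reload(tmp_dir))
```

Writing into a fresh temporary directory avoids clashing with a file left over from an earlier run.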