mindspore.dataset.transforms
General
This module supports common data augmentations. Some operations are implemented in C++ for high performance; others are implemented in Python, some of them using NumPy.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
import mindspore.dataset.transforms as transforms
Note: Legacy c_transforms and py_transforms are deprecated but can still be imported as follows:
from mindspore.dataset.transforms import c_transforms
from mindspore.dataset.transforms import py_transforms
See Common Transforms tutorial for more details.
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
PyTensorOperation, the base class of all data processing operations implemented in Python.
Note: In eager mode, non-NumPy input is implicitly converted to NumPy format and sent to MindSpore.
Transforms
Compose a list of transforms into a single transform. |
|
Concatenate data with input array along given axis, only 1D data is supported. |
|
Duplicate the input tensor to the output; only one column can be transformed at a time. |
|
Tensor operation to fill all elements in the tensor with the specified value. |
|
Mask content of the input tensor with the given predicate. |
|
Apply One-Hot encoding to the input labels. |
|
Pad input tensor according to pad_shape, input tensor needs to have same rank. |
|
Randomly perform a series of transforms with a given probability. |
|
Randomly select one transform from a list to apply. |
|
Perform a series of transforms to the input image in a random order. |
|
Extract a slice from the input. |
|
Tensor operation to cast to a given MindSpore data type or NumPy data type. |
|
Perform the unique operation on the input tensor; only one column can be transformed at a time. |
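As a rough illustration of what composing transforms and One-Hot encoding mean, the plain-Python sketch below chains callables the way a composed transform would. The helpers `compose`, `one_hot`, and `type_cast` are hypothetical stand-ins written for this example; they are not the MindSpore operations and make no claim about how the library implements them.

```python
def compose(transforms):
    """Chain callables left to right into a single callable (hypothetical helper)."""
    def apply(data):
        for transform in transforms:
            data = transform(data)
        return data
    return apply


def one_hot(num_classes):
    """Return a callable that maps an integer label to a one-hot list."""
    def encode(label):
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside [0, {num_classes})")
        return [1 if index == label else 0 for index in range(num_classes)]
    return encode


def type_cast(target_type):
    """Return a callable that casts every element to target_type."""
    def cast(values):
        return [target_type(value) for value in values]
    return cast


# One-hot encode a label, then cast the result to float.
label_pipeline = compose([one_hot(4), type_cast(float)])
encoded = label_pipeline(2)
print(encoded)  # [0.0, 0.0, 1.0, 0.0]
```

The order of the list matters: each callable receives the output of the previous one.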
Utilities
Relational operator. |
Vision
This module supports vision augmentations. Some image augmentations are implemented in C++ with OpenCV for high performance; additional image augmentations are implemented in Python with PIL.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
import mindspore.dataset.vision as vision
import mindspore.dataset.vision.utils as utils
Note: Legacy c_transforms and py_transforms are deprecated but can still be imported as follows:
import mindspore.dataset.vision.c_transforms as c_vision
import mindspore.dataset.vision.py_transforms as py_vision
See Vision Transforms tutorial for more details.
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
ImageTensorOperation, the base class of all image processing operations. It is a derived class of TensorOperation.
PyTensorOperation, the base class of all data processing operations implemented in Python.
The data transform operation can be executed in the data processing pipeline or in the eager mode:
Pipeline mode is generally used to process large datasets. For examples, refer to the introduction to the data processing pipeline.
Eager mode works more like a function call that processes data directly. For examples, refer to Lightweight Data Processing.
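The difference between the two modes can be sketched in plain Python. This is only a conceptual illustration under stated assumptions: `hwc_to_chw` and `map_pipeline` are hypothetical helpers operating on nested lists, not MindSpore APIs. Eager mode corresponds to calling the transform directly on one sample, while pipeline mode applies the same transform lazily to every sample of a dataset.

```python
def hwc_to_chw(image):
    """Transpose a nested-list image from <H, W, C> to <C, H, W>."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


def map_pipeline(dataset, operations):
    """Lazily apply each operation, in order, to every sample (hypothetical helper)."""
    for sample in dataset:
        for operation in operations:
            sample = operation(sample)
        yield sample


image = [[[1, 2, 3], [4, 5, 6]]]  # H=1, W=2, C=3

# Eager style: call the transform like a function on a single sample.
eager_result = hwc_to_chw(image)

# Pipeline style: declare the operations once, then iterate over the dataset.
pipeline_results = list(map_pipeline([image, image], [hwc_to_chw]))

print(eager_result)  # [[[1, 4]], [[2, 5]], [[3, 6]]]
```

In the pipeline style nothing is computed until the results are iterated, which is why this form suits datasets too large to hold in memory.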
Example Gallery
For an example gallery of the vision transform APIs, see Load & Process Data With Dataset Pipeline. This guide presents various transforms with their input and output results.
Transforms
Adjust the brightness of the input image. |
|
Adjust the contrast of the input image. |
|
Apply gamma correction on input image. |
|
Adjust the hue of the input image. |
|
Adjust the saturation of the input image. |
|
Adjust the sharpness of the input image. |
|
Apply Affine transformation to the input image, keeping the center of the image unchanged. |
|
Apply AutoAugment data augmentation method based on AutoAugment: Learning Augmentation Strategies from Data. |
|
Apply automatic contrast on input image. |
|
Apply a given image processing operation on a random selection of bounding box regions of a given image. |
|
Crop the input image at the center to the given size. |
|
Change the color space of the image. |
|
Crop the input image at a specific location. |
|
Apply CutMix transformation on input batch of images and labels. |
|
Randomly cut (mask) out a given number of square patches from the input image array. |
|
Decode the input image in RGB mode. |
|
Decode the input raw video bytes. |
|
Apply histogram equalization on input image. |
|
Erase the input image with given value. |
|
Crop the given image into one central crop and four corners. |
|
Blur input image with the specified Gaussian kernel. |
|
Convert the input PIL Image to grayscale. |
|
Flip the input image horizontally. |
|
Convert the input numpy.ndarray images from HSV to RGB. |
|
Transpose the input image from shape <H, W, C> to <C, H, W>. |
|
Invert the colors of the input RGB image. |
|
Linearly transform the input numpy.ndarray image with a square transformation matrix and a mean vector. |
|
Randomly mix up a batch of numpy.ndarray images together with its labels. |
|
Apply MixUp transformation on input batch of images and labels. |
|
Normalize the input image with respect to mean and standard deviation. |
|
Normalize the input image with respect to mean and standard deviation then pad an extra channel with value zero. |
|
Pad the image according to padding parameters. |
|
Pad the image to a fixed size. |
|
Apply perspective transformation on input image. |
|
Reduce the bit depth of the color channels of image to create a high contrast and vivid color effect, similar to that seen in posters or printed materials. |
|
Apply RandAugment data augmentation method on the input image. |
|
Randomly adjust the sharpness of the input image with a given probability. |
|
Apply Random affine transformation to the input image. |
|
Automatically adjust the contrast of the image with a given probability. |
|
Adjust the color of the input image by a fixed or random degree. |
|
Randomly adjust the brightness, contrast, saturation, and hue of the input image. |
|
Crop the input image at a random location. |
|
A combination of Crop, Decode and Resize. |
|
Crop the input image at a random location and adjust bounding boxes accordingly. |
|
Apply histogram equalization on the input image with a given probability. |
|
Randomly erase pixels within a randomly selected rectangular area of the input numpy.ndarray image. |
|
Randomly convert the input PIL Image to grayscale. |
|
Randomly flip the input image horizontally with a given probability. |
|
Randomly flip the input image and its bounding box horizontally with a given probability. |
|
Randomly invert the colors of image with a given probability. |
|
Add AlexNet-style PCA-based noise to an image. |
|
Randomly apply perspective transformation to the input PIL Image with a given probability. |
|
Reduce the bit depth of the color channels of image with a given probability to create a high contrast and vivid color image. |
|
This operation will crop the input image randomly, and resize the cropped image using a selected interpolation mode |
|
Crop the input image to a random size and aspect ratio and adjust bounding boxes accordingly. |
|
Resize the input image using |
|
Tensor operation to resize the input image using a randomly selected interpolation mode |
|
Rotate the input image randomly within a specified range of degrees. |
|
Choose a random sub-policy from a policy list to be applied on the input image. |
|
Adjust the sharpness of the input image by a fixed or random degree. |
|
Randomly selects a subrange within the specified threshold range and sets the pixel value within the subrange to (255 - pixel). |
|
Randomly flip the input image vertically with a given probability. |
|
Randomly flip the input image vertically with a given probability and adjust bounding boxes accordingly. |
|
Rescale the input image with the given rescale and shift. |
|
Resize the input image to the given size with a given interpolation mode |
|
Crop the input image at a specific region and resize it to desired size. |
|
Resize the input image to the given size and adjust bounding boxes accordingly. |
|
Convert the input numpy.ndarray images from RGB to HSV. |
|
Rotate the input image by specified degrees. |
|
Slice Tensor to multiple patches in horizontal and vertical directions. |
|
Solarize the image by inverting all pixel values within the threshold. |
|
Crop the given image into one central crop and four corners with the flipped version of these. |
|
Convert the PIL input image to numpy.ndarray image. |
|
Convert the input decoded numpy.ndarray image to PIL Image. |
|
Convert the input PIL Image or numpy.ndarray to numpy.ndarray of the desired dtype, rescale the pixel value range from [0, 255] to [0.0, 1.0] and change the shape from <H, W, C> to <C, H, W>. |
|
Cast the input to a given MindSpore data type or NumPy data type. |
|
Apply TrivialAugmentWide data augmentation method on the input image. |
|
Uniformly select a number of transformations from a sequence and apply them sequentially and randomly, which means that there is a chance that a chosen transformation will not be applied. |
|
Flip the input image vertically. |
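To make the rescale and normalize descriptions above concrete, here is a minimal plain-Python sketch on a single row of pixel values. It assumes the common formulations `output = pixel * rescale + shift` and `output = (pixel - mean) / std`; the function names are hypothetical and the library's own operations may differ in details such as dtype handling and per-channel parameters.

```python
def rescale(pixels, scale, shift):
    """Apply pixel * scale + shift to every value (assumed formulation)."""
    return [pixel * scale + shift for pixel in pixels]


def normalize(pixels, mean, std):
    """Apply (pixel - mean) / std to every value (assumed formulation)."""
    if std == 0:
        raise ValueError("std must be non-zero")
    return [(pixel - mean) / std for pixel in pixels]


row = [0, 128, 255]

# Map [0, 255] to approximately [0.0, 1.0].
scaled = rescale(row, 1.0 / 255.0, 0.0)

# Centre on 0.5 and divide by 0.5, giving approximately [-1.0, 1.0].
normalized = normalize(scaled, 0.5, 0.5)

print(scaled)
print(normalized)
```

Rescaling before normalizing keeps the mean and standard deviation in the same [0.0, 1.0] units as the pixel values.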
Utilities
AutoAugment policy for different datasets. |
|
Padding Mode, Border Type. |
|
The color conversion mode. |
|
Data Format of images after batch operation. |
|
The read mode used for the image file. |
|
Interpolation methods. |
|
Mode to Slice Tensor into multiple parts. |
|
Encode the input image as JPEG data. |
|
Encode the input image as PNG data. |
|
Get the number of input image channels. |
|
Get the size of input image as [height, width]. |
|
Read a file in binary mode. |
|
Read an image file and decode it into one channel grayscale data or RGB color data. |
|
Read the video, audio, metadata from a video file. |
|
Read the timestamps and frames per second of a video file. |
|
Write one-dimensional uint8 data to a file in binary mode. |
|
Write the image data into a JPEG file. |
|
Write the image into a PNG file. |
Text
This module supports text processing for NLP. It includes two parts: text transforms and utils. text transforms is a high-performance NLP text processing module developed with ICU4C and cppjieba. utils provides general methods for NLP text processing.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
import mindspore.dataset.text as text
See Text Transforms tutorial for more details.
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
TextTensorOperation, the base class of all text processing operations. It is a derived class of TensorOperation.
The data transform operation can be executed in the data processing pipeline or in the eager mode:
Pipeline mode is generally used to process large datasets. For examples, refer to the introduction to the data processing pipeline.
Eager mode works more like a function call that processes data directly. For examples, refer to Lightweight Data Processing.
Example Gallery
For an example gallery of the text transform APIs, see Illustration of text transforms. This guide presents various transforms with their input and output results.
Transforms
API Name |
Description |
Note |
Add token to beginning or end of sequence. |
None |
|
Tokenize the input UTF-8 encoded string by specific rules. |
BasicTokenizer is not supported on Windows platform yet. |
|
Tokenizer used for BERT text processing. |
BertTokenizer is not supported on Windows platform yet. |
|
Apply case fold operation on UTF-8 string tensor, which is aggressive that can convert more characters into lower case than |
CaseFold is not supported on Windows platform yet. |
|
Filter Wikipedia XML dumps to "clean" text consisting only of lowercase letters (a-z, converted from A-Z), and spaces (never consecutive). |
FilterWikipediaXML is not supported on Windows platform yet. |
|
Use Jieba tokenizer to tokenize Chinese strings. |
||
Look up the id of a word in the input vocabulary table. |
None |
|
Generate n-gram from a 1-D string Tensor. |
None |
|
Normalize the input UTF-8 encoded strings. |
NormalizeUTF8 is not supported on Windows platform yet. |
|
Class that applies a user-defined string tokenizer to the input string. |
None |
|
Replace part of the input UTF-8 string with a different text string using regular expressions. |
RegexReplace is not supported on Windows platform yet. |
|
Tokenize a scalar tensor of UTF-8 string by a regular expression pattern. |
RegexTokenizer is not supported on Windows platform yet. |
|
Tokenize scalar token or 1-D tokens to tokens by sentencepiece. |
None |
|
Construct a tensor from given data (only support 1-D for now), where each element in the dimension axis is a slice of data starting at the corresponding position, with a specified width. |
None |
|
Tensor operation to convert every element of a string tensor to a number. |
None |
|
Look up the vector of a token in the input vector table. |
None |
|
Truncate the input sequence so that it does not exceed the maximum length. |
None |
|
Truncate a pair of 1-D string input so that their total length is less than the specified length. |
None |
|
Unpack the Unicode characters in the input strings. |
None |
|
Tokenize a scalar tensor of UTF-8 string based on Unicode script boundaries. |
UnicodeScriptTokenizer is not supported on Windows platform yet. |
|
Tokenize a scalar tensor of UTF-8 string on ICU4C defined whitespaces, such as: ' ', '\t', '\r', '\n'. |
WhitespaceTokenizer is not supported on Windows platform yet. |
|
Tokenize the input text to subword tokens. |
None |
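Two of the descriptions in this table, n-gram generation and the sliding window, are easiest to read with a small worked example. The sketch below uses plain Python lists rather than tensors; `ngrams` and `sliding_window` are hypothetical helpers, and options such as padding that a library operation might offer are deliberately left out.

```python
def ngrams(tokens, n, separator=" "):
    """Join every run of n consecutive tokens into one string."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sliding_window(items, width):
    """Return every slice of the given width, one per start position."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return [items[i:i + width] for i in range(len(items) - width + 1)]


bigrams = ngrams(["the", "quick", "brown", "fox"], 2)
windows = sliding_window([1, 2, 3, 4, 5], 3)

print(bigrams)  # ['the quick', 'quick brown', 'brown fox']
print(windows)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Both helpers return an empty list when the input is shorter than the requested size.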
Utilities
API Name |
Description |
Note |
CharNGram pre-trained word embeddings. |
None |
|
FastText pre-trained word embeddings. |
None |
|
Global Vectors (GloVe) pre-trained word embeddings. |
None |
|
An enumeration for |
None |
|
None |
||
Subword algorithms for SentencePiece. |
None |
|
SentencePiece object that is used to do words segmentation. |
None |
|
Model input type for the SentencePiece tokenizer. |
None |
|
An enumeration for |
None |
|
Pre-trained word embeddings. |
None |
|
Create Vocab for training NLP models. |
None |
|
Convert NumPy array of str to array of bytes by encoding each element based on charset encoding. |
None |
|
Convert NumPy array of bytes to array of str by decoding each element based on charset encoding. |
None |
Audio
This module supports audio augmentations. It includes two parts: audio transforms and utils. audio transforms is a high-performance processing module with common audio operations. utils provides general methods for audio processing.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import utils
An alternative, equivalent import of the audio module is as follows:
import mindspore.dataset.audio.transforms as audio
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
AudioTensorOperation, the base class of all audio processing operations. It is a derived class of TensorOperation.
The data transform operation can be executed in the data processing pipeline or in the eager mode:
Pipeline mode is generally used to process large datasets. For examples, refer to the introduction to the data processing pipeline.
Eager mode works more like a function call that processes data directly. For examples, refer to Lightweight Data Processing.
Example Gallery
For an example gallery of the audio transform APIs, see Illustration of audio transforms. This guide presents various transforms with their input and output results.
Transforms
Design two-pole all-pass filter with central frequency and bandwidth for audio waveform. |
|
Turn the input audio waveform from the amplitude/power scale to decibel scale. |
|
Calculate the angle of complex number sequence. |
|
Design two-pole band-pass filter for audio waveform. |
|
Design two-pole Butterworth band-pass filter for audio waveform. |
|
Design two-pole Butterworth band-reject filter for audio waveform. |
|
Design a bass tone-control effect, also known as two-pole low-shelf filter for audio waveform. |
|
Perform a biquad filter of input audio. |
|
Compute the norm of complex number sequence. |
|
Compute delta coefficients, also known as differential coefficients, of a spectrogram. |
|
Apply contrast effect for audio waveform. |
|
Turn a waveform from the decibel scale to the power/amplitude scale. |
|
Apply a DC shift to the audio. |
|
Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation shelving filter) to the audio waveform. |
|
Detect pitch frequency. |
|
Dither increases the perceived dynamic range of audio stored at a particular bit-depth by eliminating nonlinear truncation distortion. |
|
Design biquad equalizer filter and perform filtering. |
|
Add a fade in and/or fade out to a waveform. |
|
Apply an IIR filter forward and backward to a waveform. |
|
Apply a flanger effect to the audio. |
|
Apply masking to a spectrogram in the frequency domain. |
|
Apply amplification or attenuation to the whole waveform. |
|
Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation. |
|
Design biquad highpass filter and perform filtering. |
|
Solve for a normal STFT from a mel frequency STFT, using a conversion matrix. |
|
Create an inverse spectrogram to recover an audio signal from a spectrogram. |
|
Create LFCC for a raw audio signal. |
|
Perform an IIR filter by evaluating a difference equation. |
|
Design two-pole low-pass filter for audio waveform. |
|
Separate a complex-valued spectrogram with shape \((..., 2)\) into its magnitude and phase. |
|
Apply a mask along axis . |
|
Apply a mask along axis . |
|
Convert normal STFT to STFT at the Mel scale. |
|
Create MelSpectrogram for a raw audio signal. |
|
Create MFCC for a raw audio signal. |
|
Decode mu-law encoded signal, refer to mu-law algorithm. |
|
Encode signal based on mu-law companding. |
|
Apply an overdrive effect to the audio waveform. |
|
Apply a phasing effect to the audio. |
|
Given a STFT spectrogram, speed up in time without modifying pitch by a factor of rate. |
|
Shift the pitch of a waveform by n_steps steps. |
|
Resample a signal from one frequency to another. |
|
Apply RIAA vinyl playback equalization. |
|
Apply sliding-window cepstral mean (and optionally variance) normalization per utterance. |
|
Compute the spectral centroid for each channel along the time axis. |
|
Create a spectrogram from an audio signal. |
|
Apply masking to a spectrogram in the time domain. |
|
Stretch Short Time Fourier Transform (STFT) in time without modifying pitch for a given rate. |
|
Design a treble tone-control effect. |
|
Voice activity detector. |
|
Adjust volume of waveform. |
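The mu-law encoding and decoding entries above can be illustrated with a scalar sketch. It assumes the usual companding formulation with `mu = quantization_channels - 1` and samples in [-1.0, 1.0]; the function names and the rounding step are choices made for this example, and the library's own operations may round or clip differently.

```python
import math


def mu_law_encode(sample, quantization_channels=256):
    """Compress a sample in [-1, 1] and quantize it to an integer code."""
    mu = quantization_channels - 1
    sample = max(-1.0, min(1.0, sample))  # clip to the valid range
    magnitude = math.log1p(mu * abs(sample)) / math.log1p(mu)
    compressed = math.copysign(magnitude, sample)
    # Map [-1, 1] to [0, mu] and round to the nearest integer code.
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(code, quantization_channels=256):
    """Expand an integer code back to an approximate sample in [-1, 1]."""
    mu = quantization_channels - 1
    compressed = 2.0 * code / mu - 1.0
    magnitude = ((1.0 + mu) ** abs(compressed) - 1.0) / mu
    return math.copysign(magnitude, compressed)


code = mu_law_encode(0.5)
restored = mu_law_decode(code)
print(code, restored)
```

The round trip is lossy by design: quiet samples are quantized more finely than loud ones, so the restored value is close to, but not exactly, the original.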
Utilities
Padding mode. |
|
Density function type. |
|
Fade Shapes. |
|
Gain Types. |
|
Interpolation Type. |
|
Mel scale implementation type. |
|
Modulation Type. |
|
Normalization mode. |
|
Normalization type. |
|
Resample method. |
|
Scale Types. |
|
Window function type. |
|
Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized depending on norm. |
|
Creates a linear triangular filterbank. |
|
Create a frequency transformation matrix. |