mindspore.dataset.transforms
General
This module supports common data augmentations. Some operations are implemented in C++ for high performance; others are implemented in Python, including with NumPy.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
import mindspore.dataset.transforms as transforms
Note: Legacy c_transforms and py_transforms are deprecated but can still be imported as follows:
from mindspore.dataset.transforms import c_transforms
from mindspore.dataset.transforms import py_transforms
See the Common Transforms tutorial for more details.
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
PyTensorOperation, the base class of all data processing operations implemented in Python.
Note: In eager mode, non-NumPy input is implicitly converted to NumPy format and sent to MindSpore.
Transforms
Compose a list of transforms into a single transform. |
|
Tensor operation that concatenates all columns into a single tensor; only 1-D tensors are supported. |
|
Duplicate the input tensor to output; only one column can be transformed at a time. |
|
Tensor operation to fill all elements in the tensor with the specified value. |
|
Mask content of the input tensor with the given predicate. |
|
Tensor operation to apply one hot encoding. |
|
Pad input tensor according to pad_shape, input tensor needs to have same rank. |
|
Randomly perform a series of transforms with a given probability. |
|
Randomly select one transform from a list of transforms to perform operation. |
|
Perform a series of transforms to the input image in a random order. |
|
Slice operation to extract a tensor out using the given n slices. |
|
Tensor operation to cast to a given MindSpore data type or NumPy data type. |
|
Perform the unique operation on the input tensor; only one column can be transformed at a time. |
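The idea of composing several transforms into one can be illustrated with a minimal plain-Python sketch. It is not MindSpore's implementation: `compose`, `one_hot`, and `type_cast` are hypothetical stand-ins that only mirror the descriptions in the table above.

```python
from typing import Callable, List, Sequence


def compose(transforms: Sequence[Callable]) -> Callable:
    """Return one callable that applies each transform in order."""
    def composed(value):
        for transform in transforms:
            value = transform(value)
        return value
    return composed


def one_hot(num_classes: int) -> Callable[[int], List[int]]:
    """Return a transform that maps a class index to a one-hot list."""
    def encode(label: int) -> List[int]:
        if not 0 <= label < num_classes:
            raise ValueError("label must be in [0, num_classes)")
        return [1 if i == label else 0 for i in range(num_classes)]
    return encode


def type_cast(target_type: type) -> Callable:
    """Return a transform that casts every element to target_type."""
    def cast(values):
        return [target_type(v) for v in values]
    return cast


# Hypothetical pipeline: one-hot encode a label, then cast to float.
pipeline = compose([one_hot(4), type_cast(float)])
print(pipeline(2))  # [0.0, 0.0, 1.0, 0.0]
```

Because each transform is just a callable, the composed object can be used anywhere a single transform is expected.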
Utilities
Relationship operator. |
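Masking with a relationship operator amounts to an element-wise comparison against a constant. The sketch below shows that idea in plain Python; the `RELATIONS` table and `mask` function are hypothetical names chosen for illustration, not MindSpore's API.

```python
import operator

# Hypothetical lookup table from a relation name to a comparison function.
RELATIONS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "GE": operator.ge,
    "LT": operator.lt,
    "LE": operator.le,
}


def mask(values, relation, constant):
    """Return a boolean list: True where `value <relation> constant` holds."""
    compare = RELATIONS[relation]
    return [compare(value, constant) for value in values]


print(mask([1, 2, 3], "GE", 2))  # [False, True, True]
```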
Vision
This module supports vision augmentations. Some image augmentations are implemented with C++ OpenCV for high performance; additional image augmentations are implemented with Python PIL.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
import mindspore.dataset.vision as vision
import mindspore.dataset.vision.utils as utils
Note: Legacy c_transforms and py_transforms are deprecated but can still be imported as follows:
import mindspore.dataset.vision.c_transforms as c_vision
import mindspore.dataset.vision.py_transforms as py_vision
See the Vision Transforms tutorial for more details.
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
ImageTensorOperation, the base class of all image processing operations. It is a derived class of TensorOperation.
PyTensorOperation, the base class of all data processing operations implemented in Python.
The data transform operation can be executed in the data processing pipeline or in the eager mode:
Pipeline mode is generally used to process datasets. For examples, refer to the introduction to the data processing pipeline.
Eager mode is generally used for scattered samples. Examples of image preprocessing are as follows:
import numpy as np
import mindspore.dataset.vision as vision
from PIL import Image, ImageFont, ImageDraw

# draw circle
img = Image.new("RGB", (300, 300), (255, 255, 255))
draw = ImageDraw.Draw(img)
draw.ellipse(((0, 0), (100, 100)), fill=(255, 0, 0), outline=(255, 0, 0), width=5)
img.save("./1.jpg")
with open("./1.jpg", "rb") as f:
    data = f.read()

data_decoded = vision.Decode()(data)
data_croped = vision.RandomCrop(size=(250, 250))(data_decoded)
data_resized = vision.Resize(size=(224, 224))(data_croped)
data_normalized = vision.Normalize(mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                   std=[0.229 * 255, 0.224 * 255, 0.225 * 255])(data_resized)
data_hwc2chw = vision.HWC2CHW()(data_normalized)
print("data: {}, shape: {}".format(data_hwc2chw, data_hwc2chw.shape), flush=True)
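The arithmetic behind the normalization and HWC-to-CHW steps in the example above can be sketched in plain Python on a tiny nested-list image. This is only an illustration of the per-channel formula (pixel - mean) / std and of the axis reordering; `normalize_hwc` and `hwc_to_chw` are hypothetical helper names, and the mean and std values are chosen to keep the output easy to read.

```python
from typing import Sequence


def normalize_hwc(image, mean: Sequence[float], std: Sequence[float]):
    """Apply (pixel - mean) / std independently to each channel."""
    return [
        [[(value - m) / s for value, m, s in zip(pixel, mean, std)] for pixel in row]
        for row in image
    ]


def hwc_to_chw(image):
    """Reorder an <H, W, C> image into <C, H, W>."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 RGB image: one red pixel and one green pixel.
tiny = [[[255, 0, 0], [0, 255, 0]]]
scaled = normalize_hwc(tiny, mean=[0, 0, 0], std=[255, 255, 255])
planes = hwc_to_chw(scaled)
print(planes)  # [[[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 0.0]]]
```

After the reordering, the outer list holds one plane per channel, so the shape changes from (1, 2, 3) to (3, 1, 2).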
Transforms
Adjust the brightness of the input image. |
|
Adjust the contrast of the input image. |
|
Apply gamma correction on input image. |
|
Adjust the hue of the input image. |
|
Adjust the saturation of the input image. |
|
Adjust the sharpness of the input image. |
|
Apply Affine transformation to the input image, keeping the center of the image unchanged. |
|
Apply the AutoAugment data augmentation method based on "AutoAugment: Learning Augmentation Strategies from Data". |
|
Apply automatic contrast on input image. |
|
Apply a given image processing operation on a random selection of bounding box regions of a given image. |
|
Crop the input image at the center to the given size. |
|
Change the color space of the image. |
|
Crop the input image at a specific location. |
|
Apply CutMix transformation on input batch of images and labels. |
|
Randomly cut (mask) out a given number of square patches from the input image array. |
|
Decode the input image in RGB mode. |
|
Apply histogram equalization on input image. |
|
Erase the input image with given value. |
|
Crop the given image into one central crop and four corners. |
|
Blur input image with the specified Gaussian kernel. |
|
Convert the input PIL Image to grayscale. |
|
Flip the input image horizontally. |
|
Convert the input numpy.ndarray images from HSV to RGB. |
|
Transpose the input image from shape <H, W, C> to <C, H, W>. |
|
Invert the colors of the input image in RGB mode. |
|
Linearly transform the input numpy.ndarray image with a square transformation matrix and a mean vector. |
|
Randomly mix up a batch of numpy.ndarray images together with its labels. |
|
Apply MixUp transformation on input batch of images and labels. |
|
Normalize the input image with respect to mean and standard deviation. |
|
Normalize the input image with respect to mean and standard deviation then pad an extra channel with value zero. |
|
Pad the image according to padding parameters. |
|
Pad the image to a fixed size. |
|
Apply perspective transformation on input image. |
|
Reduce the bit depth of the color channels of image to create a high contrast and vivid color effect, similar to that seen in posters or printed materials. |
|
Apply RandAugment data augmentation method on the input image. |
|
Randomly adjust the sharpness of the input image with a given probability. |
|
Apply Random affine transformation to the input image. |
|
Automatically adjust the contrast of the image with a given probability. |
|
Adjust the color of the input image by a fixed or random degree. |
|
Randomly adjust the brightness, contrast, saturation, and hue of the input image. |
|
Crop the input image at a random location. |
|
A combination of Crop, Decode and Resize. |
|
Crop the input image at a random location and adjust bounding boxes accordingly. |
|
Apply histogram equalization on the input image with a given probability. |
|
Randomly erase pixels within a randomly selected rectangular area on the input numpy.ndarray image. |
|
Randomly convert the input PIL Image to grayscale. |
|
Randomly flip the input image horizontally with a given probability. |
|
Randomly flip the input image horizontally with a given probability and adjust bounding boxes accordingly. |
|
Randomly invert the colors of image with a given probability. |
|
Add AlexNet-style PCA-based noise to an image. |
|
Randomly apply perspective transformation to the input PIL Image with a given probability. |
|
Reduce the bit depth of the color channels of image with a given probability to create a high contrast and vivid color image. |
|
This operation will crop the input image randomly, and resize the cropped image using a selected interpolation mode |
|
Crop the input image to a random size and aspect ratio and adjust bounding boxes accordingly. |
|
Resize the input image using |
|
Tensor operation to resize the input image using a randomly selected interpolation mode |
|
Rotate the input image randomly within a specified range of degrees. |
|
Choose a random sub-policy from a policy list to be applied on the input image. |
|
Adjust the sharpness of the input image by a fixed or random degree. |
|
Randomly selects a subrange within the specified threshold range and sets the pixel value within the subrange to (255 - pixel). |
|
Randomly flip the input image vertically with a given probability. |
|
Randomly flip the input image vertically with a given probability and adjust bounding boxes accordingly. |
|
Rescale the input image with the given rescale and shift. |
|
Resize the input image to the given size with a given interpolation mode |
|
Crop the input image at a specific region and resize it to desired size. |
|
Resize the input image to the given size and adjust bounding boxes accordingly. |
|
Convert the input numpy.ndarray images from RGB to HSV. |
|
Rotate the input image by specified degrees. |
|
Slice Tensor to multiple patches in horizontal and vertical directions. |
|
Solarize the image by inverting all pixel values within the threshold. |
|
Crop the given image into one central crop and four corners with the flipped version of these. |
|
Convert the PIL input image to numpy.ndarray image. |
|
Convert the input decoded numpy.ndarray image to PIL Image. |
|
Convert the input PIL Image or numpy.ndarray to numpy.ndarray of the desired dtype, rescale the pixel value range from [0, 255] to [0.0, 1.0] and change the shape from <H, W, C> to <C, H, W>. |
|
Cast the input to a given MindSpore data type or NumPy data type. |
|
Apply TrivialAugmentWide data augmentation method on the input image. |
|
Uniformly select a number of transformations from a sequence and apply them sequentially and randomly, which means that there is a chance that a chosen transformation will not be applied. |
|
Flip the input image vertically. |
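Two of the pixel-level operations described in the table above, solarizing and reducing bit depth, come down to simple per-pixel arithmetic on uint8 values. The sketch below works under stated assumptions: `solarize` treats the threshold range as inclusive, and `posterize` keeps the most significant bits of each value, which is the convention Pillow's `ImageOps.posterize` uses. Both function names are hypothetical, and MindSpore's exact boundary handling may differ.

```python
def solarize(pixels, low, high):
    """Invert (255 - p) every pixel with low <= p <= high (inclusive range assumed)."""
    return [255 - p if low <= p <= high else p for p in pixels]


def posterize(pixels, bits):
    """Keep only the `bits` most significant bits of each 8-bit pixel."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in [1, 8]")
    keep_mask = 0xFF & ~((1 << (8 - bits)) - 1)
    return [p & keep_mask for p in pixels]


print(solarize([10, 128, 250], 100, 255))  # [10, 127, 5]
print(posterize([10, 128, 250], 4))        # [0, 128, 240]
```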
Utilities
AutoAugment policy for different datasets. |
|
Padding Mode, Border Type. |
|
The color conversion mode. |
|
Data Format of images after batch operation. |
|
The read mode used for the image file. |
|
Interpolation Modes. |
|
Mode to Slice Tensor into multiple parts. |
|
Encode the input image as JPEG data. |
|
Encode the input image as PNG data. |
|
Get the number of input image channels. |
|
Get the size of input image as [height, width]. |
|
Read a file in binary mode. |
|
Read an image file and decode it into one channel grayscale data or RGB color data. |
|
Write one-dimensional uint8 data into a file using binary mode. |
|
Write the image data into a JPEG file. |
|
Write the image into a PNG file. |
Text
This module supports text processing for NLP. It has two parts: text transforms and utils. The text transforms are a high-performance NLP text processing module developed with ICU4C and cppjieba; utils provides general methods for NLP text processing.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
import mindspore.dataset.text as text
See the Text Transforms tutorial for more details.
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
TextTensorOperation, the base class of all text processing operations. It is a derived class of TensorOperation.
The data transform operation can be executed in the data processing pipeline or in the eager mode:
Pipeline mode is generally used to process datasets. For examples, refer to the introduction to the data processing pipeline.
Eager mode is generally used for scattered samples. Examples of text preprocessing are as follows:
import mindspore.dataset.text as text
from mindspore.dataset.text import NormalizeForm

# construct vocab
vocab_list = {"music": 1, "Opera": 2, "form": 3, "theatre": 4, "which": 5, "in": 6,
              "fundamental": 7, "dramatic": 8, "component": 9, "taken": 10, "roles": 11,
              "singers": 12, "is": 13, "are": 14, "of": 15, "UNK": 16}
vocab = text.Vocab.from_dict(vocab_list)
tokenizer_op = text.BertTokenizer(vocab=vocab, suffix_indicator='##', max_bytes_per_token=100,
                                  unknown_token='[UNK]', lower_case=False, keep_whitespace=False,
                                  normalization_form=NormalizeForm.NONE, preserve_unused_token=True,
                                  with_offsets=False)

# tokenizer
tokens = tokenizer_op("Opera is a form of theatre in which music is a fundamental "
                      "component and dramatic roles are taken by singers.")
print("token: {}".format(tokens), flush=True)

# token to ids
ids = vocab.tokens_to_ids(tokens)
print("token to id: {}".format(ids), flush=True)

# ids to token
tokens_from_ids = vocab.ids_to_tokens([15, 3, 7])
print("id to token: {}".format(tokens_from_ids), flush=True)
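The role of the suffix indicator and the unknown token in the example above can be illustrated with a greedy longest-match-first subword split, the scheme commonly called WordPiece. This plain-Python sketch is an assumption about the general technique, not a description of how MindSpore implements its tokenizer; `wordpiece` and `demo_vocab` are hypothetical names.

```python
def wordpiece(word, vocab, suffix_indicator="##", unknown_token="[UNK]"):
    """Split one word into subwords by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = suffix_indicator + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is known: the whole word is unknown.
            return [unknown_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical vocabulary mapping tokens to ids.
demo_vocab = {"sing": 0, "##ers": 1, "##er": 2, "music": 3}
pieces = wordpiece("singers", demo_vocab)
print(pieces)                           # ['sing', '##ers']
print([demo_vocab[p] for p in pieces])  # [0, 1]
print(wordpiece("opera", demo_vocab))   # ['[UNK]']
```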
Note: In eager mode, non-NumPy input is implicitly converted to NumPy format and sent to MindSpore.
Transforms
API Name |
Description |
Note |
Add token to beginning or end of sequence. |
None |
|
Tokenize the input UTF-8 encoded string by specific rules. |
BasicTokenizer is not supported on Windows platform yet. |
|
Tokenizer used for BERT text processing. |
BertTokenizer is not supported on Windows platform yet. |
|
Apply case fold operation on UTF-8 string tensor, which is aggressive that can convert more characters into lower case than |
CaseFold is not supported on Windows platform yet. |
|
Filter Wikipedia XML dumps to "clean" text consisting only of lowercase letters (a-z, converted from A-Z), and spaces (never consecutive). |
FilterWikipediaXML is not supported on Windows platform yet. |
|
Tokenize Chinese string into words based on dictionary. |
The integrity of the HMMSegment algorithm and MPSegment algorithm files must be confirmed. |
|
Look up the id of a word in the input vocabulary table. |
None |
|
Generate n-gram from a 1-D string Tensor. |
None |
|
Apply normalize operation on UTF-8 string tensor. |
NormalizeUTF8 is not supported on Windows platform yet. |
|
Apply a user-defined string tokenizer to the input string. |
None |
|
Replace a part of UTF-8 string tensor with given text according to regular expressions. |
RegexReplace is not supported on Windows platform yet. |
|
Tokenize a scalar tensor of UTF-8 string by a regular expression pattern. |
RegexTokenizer is not supported on Windows platform yet. |
|
Tokenize scalar token or 1-D tokens to tokens by sentencepiece. |
None |
|
Construct a tensor from given data (only support 1-D for now), where each element in the dimension axis is a slice of data starting at the corresponding position, with a specified width. |
None |
|
Tensor operation to convert every element of a string tensor to a number. |
None |
|
Look up the vector of a token in the input vector table. |
None |
|
Truncate the input sequence so that it does not exceed the maximum length. |
None |
|
Truncate a pair of rank-1 tensors such that the total length is less than max_length. |
None |
|
Tokenize a scalar tensor of UTF-8 string to Unicode characters. |
None |
|
Tokenize a scalar tensor of UTF-8 string based on Unicode script boundaries. |
UnicodeScriptTokenizer is not supported on Windows platform yet. |
|
Tokenize a scalar tensor of UTF-8 string on ICU4C defined whitespaces, such as: ' ', '\t', '\r', '\n'. |
WhitespaceTokenizer is not supported on Windows platform yet. |
|
Tokenize the input text to subword tokens. |
None |
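The n-gram and sliding-window operations described in the table above both slide a fixed-size window over a 1-D sequence. A minimal plain-Python sketch follows; `ngrams` and `sliding_window` are hypothetical helper names, and padding options and edge cases (such as a window longer than the input) are simplified here and may behave differently in MindSpore.

```python
def ngrams(tokens, n, separator=" "):
    """Join every run of n consecutive tokens into one string."""
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sliding_window(data, width):
    """Return every slice of `width` consecutive elements, one per start position."""
    return [data[i:i + width] for i in range(len(data) - width + 1)]


print(ngrams(["a", "b", "c"], 2))          # ['a b', 'b c']
print(sliding_window([1, 2, 3, 4, 5], 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```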
Utilities
API Name |
Description |
Note |
CharNGram object that is used to map tokens into pre-trained vectors. |
None |
|
FastText object that is used to map tokens into vectors. |
None |
|
GloVe object that is used to map tokens into vectors. |
None |
|
An enumeration for |
None |
|
Enumeration class for Unicode normalization forms. |
None |
|
An enumeration for SentencePieceModel. |
None |
|
SentencePiece object that is used to perform word segmentation. |
None |
|
An enumeration for loading type of |
None |
|
An enumeration for |
None |
|
Vectors object that is used to map tokens into vectors. |
None |
|
Vocab object that is used to save pairs of words and ids. |
None |
|
Convert NumPy array of str to array of bytes by encoding each element based on charset encoding. |
None |
|
Convert NumPy array of bytes to array of str by decoding each element based on charset encoding. |
None |
Audio
This module supports audio augmentations. It has two parts: audio transforms and utils. The audio transforms are a high-performance processing module with common audio operations; utils provides general methods for audio processing.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import utils
An alternative, equivalent way to import the audio module is as follows:
import mindspore.dataset.audio.transforms as audio
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
AudioTensorOperation, the base class of all audio processing operations. It is a derived class of TensorOperation.
The data transform operation can be executed in the data processing pipeline or in the eager mode:
Pipeline mode is generally used to process datasets. For examples, refer to the introduction to the data processing pipeline.
Eager mode is generally used for scattered samples. Examples of audio preprocessing are as follows:
import numpy as np
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import ResampleMethod

# audio sample
waveform = np.random.random([1, 30])

# transform
resample_op = audio.Resample(orig_freq=48000, new_freq=16000,
                             resample_method=ResampleMethod.SINC_INTERPOLATION,
                             lowpass_filter_width=6, rolloff=0.99, beta=None)
waveform_resampled = resample_op(waveform)
print("waveform resampled: {}".format(waveform_resampled), flush=True)
Transforms
Design two-pole all-pass filter with central frequency and bandwidth for audio waveform. |
|
Turn the input audio waveform from the amplitude/power scale to decibel scale. |
|
Calculate the angle of complex number sequence. |
|
Design two-pole band-pass filter for audio waveform. |
|
Design two-pole Butterworth band-pass filter for audio waveform. |
|
Design two-pole Butterworth band-reject filter for audio waveform. |
|
Design a bass tone-control effect, also known as two-pole low-shelf filter for audio waveform. |
|
Perform a biquad filter of input audio. |
|
Compute the norm of complex number sequence. |
|
Compute delta coefficients, also known as differential coefficients, of a spectrogram. |
|
Apply contrast effect for audio waveform. |
|
Turn a waveform from the decibel scale to the power/amplitude scale. |
|
Apply a DC shift to the audio. |
|
Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation shelving filter) to the audio waveform. |
|
Detect pitch frequency. |
|
Dither increases the perceived dynamic range of audio stored at a particular bit-depth by eliminating nonlinear truncation distortion. |
|
Design biquad equalizer filter and perform filtering. |
|
Add a fade in and/or fade out to a waveform. |
|
Apply an IIR filter forward and backward to a waveform. |
|
Apply a flanger effect to the audio. |
|
Apply masking to a spectrogram in the frequency domain. |
|
Apply amplification or attenuation to the whole waveform. |
|
Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation. |
|
Design biquad highpass filter and perform filtering. |
|
Solve for a normal STFT from a mel frequency STFT, using a conversion matrix. |
|
Create an inverse spectrogram to recover an audio signal from a spectrogram. |
|
Create LFCC for a raw audio signal. |
|
Perform an IIR filter by evaluating a difference equation. |
|
Design two-pole low-pass filter for audio waveform. |
|
Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase. |
|
Apply a mask along axis . |
|
Apply a mask along axis . |
|
Convert normal STFT to STFT at the Mel scale. |
|
Create MelSpectrogram for a raw audio signal. |
|
Create MFCC for a raw audio signal. |
|
Decode mu-law encoded signal, refer to mu-law algorithm. |
|
Encode signal based on mu-law companding. |
|
Apply an overdrive effect to the audio waveform. |
|
Apply a phasing effect to the audio. |
|
Given a STFT spectrogram, speed up in time without modifying pitch by a factor of rate. |
|
Shift the pitch of a waveform by n_steps steps. |
|
Resample a signal from one frequency to another. |
|
Apply RIAA vinyl playback equalization. |
|
Apply sliding-window cepstral mean (and optionally variance) normalization per utterance. |
|
Compute the spectral centroid for each channel along the time axis. |
|
Create a spectrogram from an audio signal. |
|
Apply masking to a spectrogram in the time domain. |
|
Stretch Short Time Fourier Transform (STFT) in time without modifying pitch for a given rate. |
|
Design a treble tone-control effect. |
|
Voice activity detector. |
|
Adjust volume of waveform. |
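The mu-law encoding and decoding entries in the table above rest on the standard mu-law companding formula, F(x) = sign(x) * ln(1 + mu * |x|) / ln(1 + mu). The following plain-Python sketch applies it to a single sample in [-1, 1]; the function names and the rounding to integer codes are assumptions made for illustration, not MindSpore's implementation.

```python
import math


def mu_law_encode(x, quantization_channels=256):
    """Compress a sample in [-1, 1] and quantize it to an integer code."""
    mu = quantization_channels - 1
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] to [0, mu] and round to the nearest code.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(code, quantization_channels=256):
    """Expand an integer code back to an approximate sample in [-1, 1]."""
    mu = quantization_channels - 1
    y = code / mu * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


code = mu_law_encode(0.5)
restored = mu_law_decode(code)
print(code, round(restored, 3))
```

The round trip is lossy by design: small amplitudes get finer quantization steps than large ones.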
Utilities
Padding mode. |
|
Density function type. |
|
Fade Shapes. |
|
Gain Types. |
|
Interpolation Type. |
|
Mel scale implementation type. |
|
Modulation Type. |
|
Normalization mode. |
|
Normalization type. |
|
Resample method. |
|
Scale Types. |
|
Window function type. |
|
Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized depending on norm. |
|
Creates a linear triangular filterbank. |
|
Create a frequency transformation matrix. |
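A DCT transformation matrix of shape (n_mels, n_mfcc), as described above, can be sketched with the DCT-II basis. The sketch assumes the common convention in which the unnormalized matrix is scaled by 2 and the orthonormal variant scales the first column by sqrt(1 / n_mels) and the others by sqrt(2 / n_mels). `create_dct_sketch` is a hypothetical name, and the library's actual normalization options may differ.

```python
import math


def create_dct_sketch(n_mfcc, n_mels, ortho=False):
    """Build a DCT-II matrix with n_mels rows and n_mfcc columns (nested lists)."""
    matrix = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            if ortho:
                # Orthonormal scaling: the k = 0 column gets a smaller factor.
                value *= math.sqrt((1.0 if k == 0 else 2.0) / n_mels)
            else:
                value *= 2.0
            row.append(value)
        matrix.append(row)
    return matrix


dct = create_dct_sketch(n_mfcc=3, n_mels=5)
print(len(dct), len(dct[0]))  # 5 3
```

Multiplying a log-mel spectrogram frame of length n_mels by this matrix yields n_mfcc cepstral coefficients.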