Function differences with torchvision.transforms.ToTensor

torchvision.transforms.ToTensor

class torchvision.transforms.ToTensor

For more information, see torchvision.transforms.ToTensor.

mindspore.dataset.vision.ToTensor

class mindspore.dataset.vision.ToTensor(
    output_type=np.float32
    )

For more information, see mindspore.dataset.vision.ToTensor.

Differences

PyTorch: Convert the PIL Image or numpy array to tensor. The input numpy array is usually in the format of <H, W, C> and the value is in the range of [0, 255], and the output is <C, H, W > Torch Tensor with format and value in [0.0, 1.0].

MindSpore: The input is an image of PIL type or a numpy array with a value in the range of [0, 255] in the format of <H, W, C>, and the output is in the range of [0.0, 1.0] with <C, H, W> Format numpy array; it is equivalent to two operations of channel conversion and pixel value normalization on the original input image.

Code Example

import numpy as np
from PIL import Image
from torchvision import transforms
import mindspore.dataset.vision as vision

# In MindSpore, ToTensor convert PIL Image into numpy array.
img_path =  "/path/to/test/1.jpg"

img = Image.open(img_path)
to_tensor = vision.ToTensor()
img_data = to_tensor(img)
print("img_data type:", type(img_data))
print("img_data dtype:", img_data.dtype)

# Out:
#img_data type: <class 'numpy.ndarray'>
#img_data dtype: float32

# In torch, ToTensor transforms the input to tensor.
img_path = "/path/to/test/1.jpg"

image_transform = transforms.Compose([transforms.ToTensor()])
img = np.array(Image.open(img_path))
img_data = image_transform(img)
print(img_data.shape)
# Out:
# torch.Size([3, 2268, 4032])