mindspore.dataset.vision.RandomCropWithBBox

class mindspore.dataset.vision.RandomCropWithBBox(size, padding=None, pad_if_needed=False, fill_value=0, padding_mode=Border.CONSTANT)

Crop the input image at a random location and adjust bounding boxes accordingly.
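As an illustration of what adjusting a bounding box to a crop can involve, here is a minimal pure-Python sketch. It is not MindSpore's implementation. It assumes boxes are stored as (x, y, w, h) in pixels, that a box is clipped to the crop window, and that a box with no overlap is dropped. The helper name crop_bbox is hypothetical, and the library's exact rules may differ.

```python
def crop_bbox(bbox, crop_x, crop_y, crop_w, crop_h):
    """Re-express one (x, y, w, h) box relative to a crop window.

    Hypothetical helper for illustration only. Returns None when the
    box does not overlap the crop window at all.
    """
    x, y, w, h = bbox
    # Intersect the box with the crop window in the original coordinates.
    x0 = max(x, crop_x)
    y0 = max(y, crop_y)
    x1 = min(x + w, crop_x + crop_w)
    y1 = min(y + h, crop_y + crop_h)
    if x1 <= x0 or y1 <= y0:
        return None  # no overlap with the crop window
    # Shift the origin so coordinates are relative to the crop's top-left corner.
    return (x0 - crop_x, y0 - crop_y, x1 - x0, y1 - y0)


# A box covering a 100x100 image, shifted by 20 pixels of padding on each side,
# then cropped with a 64x64 window whose top-left corner is at (30, 10).
print(crop_bbox((20, 20, 100, 100), 30, 10, 64, 64))  # (0, 10, 64, 54)
```

The box count can shrink under this scheme, because boxes that fall entirely outside the crop are discarded rather than kept with zero area.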

Parameters

size (Union[int, Sequence[int]]) – The output size of the cropped image. The size value(s) must be positive. If size is an integer, a square crop of size (size, size) is returned. If size is a sequence of length 2, an image of size (height, width) will be cropped.
padding (Union[int, Sequence[int]], optional) – The number of pixels to pad the image. The padding value(s) must be non-negative. If padding is not None, the image is padded first and then cropped. If a single number is provided, all borders are padded with this value. If a tuple or list of 2 values is provided, the left and right borders are padded with the first value and the top and bottom borders with the second value. If a tuple or list of 4 values is provided, they pad the left, top, right and bottom borders respectively. Default: None.
pad_if_needed (bool, optional) – Pad the image if either side is smaller than the given output size. Default: False.
fill_value (Union[int, tuple[int]], optional) – The pixel intensity of the borders, only used when padding_mode is Border.CONSTANT. If it is a 3-tuple, it is used to fill the R, G, B channels respectively. If it is an integer, it is used for all RGB channels. The fill_value value(s) must be in range [0, 255]. Default: 0.
padding_mode (Border, optional) – The method of padding. It can be any of Border.CONSTANT, Border.EDGE, Border.REFLECT, Border.SYMMETRIC. Default: Border.CONSTANT.
- Border.CONSTANT: fills the border with constant values.
- Border.EDGE: pads with the last value on the edge.
- Border.REFLECT: reflects the values on the edge, omitting the last value of the edge.
- Border.SYMMETRIC: reflects the values on the edge, repeating the last value of the edge.
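The padding and padding_mode descriptions above can be made concrete with a small pure-Python sketch. It is illustrative only and does not call MindSpore. The helper names normalize_padding and pad_row are hypothetical, and the mode strings merely stand in for the Border enum members. pad_row works on a single row of values and pads the same amount on both sides.

```python
def normalize_padding(padding):
    """Expand an int, 2-sequence or 4-sequence to (left, top, right, bottom)."""
    if isinstance(padding, int):
        values = (padding,) * 4
    elif len(padding) == 2:
        left_right, top_bottom = padding
        values = (left_right, top_bottom, left_right, top_bottom)
    elif len(padding) == 4:
        values = tuple(padding)
    else:
        raise ValueError("padding must be an int or a sequence of 2 or 4 ints")
    if any(v < 0 for v in values):
        raise ValueError("padding values must be non-negative")
    return values


def pad_row(row, n, mode, fill=0):
    """Pad a 1-D list with n values on each side; requires 0 <= n < len(row)."""
    row = list(row)
    if n == 0:
        return row
    if mode == "constant":      # stands in for Border.CONSTANT
        left, right = [fill] * n, [fill] * n
    elif mode == "edge":        # stands in for Border.EDGE
        left, right = [row[0]] * n, [row[-1]] * n
    elif mode == "reflect":     # mirror without repeating the edge value
        left, right = row[1:n + 1][::-1], row[-n - 1:-1][::-1]
    elif mode == "symmetric":   # mirror and repeat the edge value
        left, right = row[:n][::-1], row[-n:][::-1]
    else:
        raise ValueError("unknown mode: " + mode)
    return left + row + right


print(normalize_padding([10, 5]))           # (10, 5, 10, 5)
print(pad_row([1, 2, 3, 4], 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad_row([1, 2, 3, 4], 2, "symmetric"))  # [2, 1, 1, 2, 3, 4, 4, 3]
```

With padding [20, 20, 20, 20], a 100 x 100 image grows to 140 x 140 before the 64 x 64 crop window is chosen, which is the setup used in the examples below.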

Raises

TypeError – If size is not of type int or Sequence[int].
TypeError – If padding is not of type int or Sequence[int].
TypeError – If pad_if_needed is not of type bool.
TypeError – If fill_value is not of type int or tuple[int].
TypeError – If padding_mode is not of type mindspore.dataset.vision.Border.
ValueError – If size is not positive.
ValueError – If padding is negative.
ValueError – If fill_value is not in range [0, 255].
RuntimeError – If given tensor shape is not <H, W> or <H, W, C>.

Supported Platforms: CPU

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.vision as vision
>>>
>>> # Use the transform in dataset pipeline mode
>>> data = np.random.randint(0, 255, size=(100, 100, 3)).astype(np.float32)
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data, ["image"])
>>> # Return the full image together with one bounding box spanning the whole image
>>> func = lambda img: (data, np.array([[0, 0, data.shape[1], data.shape[0]]]).astype(np.float32))
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=[func],
...                                                 input_columns=["image"],
...                                                 output_columns=["image", "bbox"])
>>> random_crop_with_bbox_op = vision.RandomCropWithBBox([64, 64], [20, 20, 20, 20])
>>> transforms_list = [random_crop_with_bbox_op]
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=transforms_list, input_columns=["image", "bbox"])
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["image"].shape, item["image"].dtype)
...     print(item["bbox"].shape, item["bbox"].dtype)
...     break
(64, 64, 3) float32
(1, 4) float32
>>>
>>> # Use the transform in eager mode
>>> data = np.random.randint(0, 255, size=(100, 100, 3)).astype(np.float32)
>>> func = lambda img: (data, np.array([[0, 0, data.shape[1], data.shape[0]]]).astype(data.dtype))
>>> func_data, func_bboxes = func(data)
>>> output = vision.RandomCropWithBBox([64, 64], [20, 20, 20, 20])(func_data, func_bboxes)
>>> print(output[0].shape, output[0].dtype)
(64, 64, 3) float32
>>> print(output[1].shape, output[1].dtype)
(1, 4) float32

Tutorial Examples:

Illustration of vision transforms