mindspore.nn.Conv3dTranspose
- class mindspore.nn.Conv3dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, output_padding=0, has_bias=False, weight_init='normal', bias_init='zeros', data_format='NCDHW')
Compute a 3D transposed convolution, which is also known as a deconvolution (although it is not an actual deconvolution). The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes. This module can be seen as the gradient of Conv3d with respect to its input.
x is typically of shape \((N, C, D, H, W)\), where \(N\) is the batch size, \(C\) is the number of channels, \(D\) is the depth, \(H\) is the height, and \(W\) is the width of the feature map. The calculation process of transposed convolution is equivalent to the reverse calculation of convolution.
The padding argument effectively adds \(dilation * (kernel\_size - 1) - padding\) amount of zero padding to both sides of the input, so that when a Conv3d and a Conv3dTranspose are initialized with the same parameters, they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv3d maps multiple input shapes to the same output shape. Conv3dTranspose provides the output_padding argument to increase the calculated output shape on one or more sides.
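The shape ambiguity for stride > 1 can be seen with plain integer arithmetic. The sketch below is not MindSpore code: `conv_out_size` uses the standard forward-convolution size formula (an assumption, since it is not stated on this page), `transposed_out_size` applies the “pad” output formula from this page to a single dimension, and both helper names are hypothetical.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Forward convolution output size along one dimension (standard formula, assumed)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def transposed_out_size(size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    """Transposed convolution output size along one dimension for pad_mode='pad'."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


# With kernel=3, stride=2, padding=1, inputs of size 7 and 8 both shrink to 4.
assert conv_out_size(7, kernel=3, stride=2, padding=1) == 4
assert conv_out_size(8, kernel=3, stride=2, padding=1) == 4

# Going back from 4, output_padding selects which of the two sizes is produced.
print(transposed_out_size(4, kernel=3, stride=2, padding=1))                    # 7
print(transposed_out_size(4, kernel=3, stride=2, padding=1, output_padding=1))  # 8
```

Under these assumptions, output_padding=0 recovers the smaller candidate size and output_padding=1 the larger one.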
The depth, height and width of the output are defined as:
if the ‘pad_mode’ is set to be “pad”,
\[ \begin{align}\begin{aligned}D_{out} = (D_{in} - 1) \times \text{stride_d} - 2 \times \text{padding_d} + \text{dilation_d} \times (\text{kernel_size_d} - 1) + \text{output_padding_d} + 1\\H_{out} = (H_{in} - 1) \times \text{stride_h} - 2 \times \text{padding_h} + \text{dilation_h} \times (\text{kernel_size_h} - 1) + \text{output_padding_h} + 1\\W_{out} = (W_{in} - 1) \times \text{stride_w} - 2 \times \text{padding_w} + \text{dilation_w} \times (\text{kernel_size_w} - 1) + \text{output_padding_w} + 1\end{aligned}\end{align} \]
if the ‘pad_mode’ is set to be “same”,
\[\begin{split}D_{out} = (D_{in} + \text{stride_d} - 1)/\text{stride_d} \\ H_{out} = (H_{in} + \text{stride_h} - 1)/\text{stride_h} \\ W_{out} = (W_{in} + \text{stride_w} - 1)/\text{stride_w}\end{split}\]
if the ‘pad_mode’ is set to be “valid”,
\[\begin{split}D_{out} = (D_{in} - 1) \times \text{stride_d} + \text{dilation_d} \times (\text{kernel_size_d} - 1) + 1 \\ H_{out} = (H_{in} - 1) \times \text{stride_h} + \text{dilation_h} \times (\text{kernel_size_h} - 1) + 1 \\ W_{out} = (W_{in} - 1) \times \text{stride_w} + \text{dilation_w} \times (\text{kernel_size_w} - 1) + 1\end{split}\]
- Parameters
in_channels (int) – The number of input channels \(C_{in}\).
out_channels (int) – The number of output channels \(C_{out}\).
kernel_size (Union[int, tuple[int]]) – The kernel size of the 3D convolution.
stride (Union[int, tuple[int]]) – The distance the kernel moves. Either an int, in which case the same stride is used for depth, height and width, or a tuple of three ints giving the stride for depth, height and width respectively. Its value must be equal to or greater than 1. Default: 1.
pad_mode (str) –
Specifies the padding mode. The optional values are “pad”, “same” and “valid”. Default: “same”.
same: Pads the input so that the depth, height and width of the output are the same as those of the input x. The total amount of padding is calculated for the depth, horizontal and vertical directions and distributed evenly to head and tail, top and bottom, and left and right where possible. Otherwise, the extra padding is added to the tail, bottom and right side. If this mode is set, padding and output_padding must be 0.
valid: Applies no padding. The largest possible depth, height and width of the output are returned, and extra pixels are discarded. If this mode is set, padding and output_padding must be 0.
pad: Pads the input x implicitly on both sides in depth, height and width. The amount given by padding is added to the borders of the input Tensor. padding must be greater than or equal to 0.
padding (Union[int, tuple[int]]) – The pad value to be filled. Default: 0. If padding is an integer, the paddings of head, tail, top, bottom, left and right are all equal to padding. If padding is a tuple of six integers, the paddings of head, tail, top, bottom, left and right are equal to padding[0], padding[1], padding[2], padding[3], padding[4] and padding[5] respectively.
dilation (Union[int, tuple[int]]) – Specifies the dilation rate to use for dilated convolution. The data type is int or a tuple of 3 integers \((dilation_d, dilation_h, dilation_w)\). Currently, dilation on depth only supports the value 1. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater than or equal to 1 and bounded by the height and width of the input x. Default: 1.
group (int) – Splits the filter into groups; in_channels and out_channels must be divisible by the number of groups. Default: 1. Only 1 is currently supported.
output_padding (Union[int, tuple[int]]) – Adds extra size to each dimension of the output. Must be greater than or equal to 0. Default: 0.
has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.
weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from the ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as the constant ‘One’ and ‘Zero’ distributions are possible. The aliases ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.
bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. The possible Initializer and string values are the same as for ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.
data_format (str) – The data format. Currently only ‘NCDHW’ is supported. Default: ‘NCDHW’.
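The two accepted forms of padding can be normalized to a single six-element tuple. The helper below, `expand_padding`, is a hypothetical illustration of the rules described for padding and is not part of MindSpore; it only mirrors the documented ordering (head, tail, top, bottom, left, right) and the documented error conditions.

```python
def expand_padding(padding):
    """Return padding as (head, tail, top, bottom, left, right).

    Hypothetical helper, not a MindSpore API.
    """
    if not isinstance(padding, (int, tuple)):
        raise TypeError("padding must be an int or a tuple of ints")
    if isinstance(padding, int):
        # A single int applies to all six borders.
        padding = (padding,) * 6
    if len(padding) != 6:
        raise ValueError("a padding tuple must have exactly 6 elements")
    if any(p < 0 for p in padding):
        raise ValueError("padding must be greater than or equal to 0")
    return padding


print(expand_padding(1))                   # (1, 1, 1, 1, 1, 1)
print(expand_padding((0, 1, 2, 3, 4, 5)))  # (0, 1, 2, 3, 4, 5)
```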
- Inputs:
x (Tensor) - Tensor of shape \((N, C_{in}, D_{in}, H_{in}, W_{in})\). Currently the input data type only supports float16 and float32.
- Outputs:
Tensor, the shape is \((N, C_{out}, D_{out}, H_{out}, W_{out})\).
- Supported Platforms:
Ascend
GPU
- Raises
TypeError – If in_channels, out_channels or group is not an int.
TypeError – If kernel_size, stride, padding, dilation or output_padding is neither an int nor a tuple of three.
TypeError – If input data type is not float16 or float32.
ValueError – If in_channels, out_channels, kernel_size, stride or dilation is less than 1.
ValueError – If padding is less than 0.
ValueError – If pad_mode is not one of ‘same’, ‘valid’, ‘pad’.
ValueError – If padding is a tuple whose length is not equal to 6.
ValueError – If pad_mode is not equal to ‘pad’ and padding is not equal to (0, 0, 0, 0, 0, 0).
ValueError – If data_format is not ‘NCDHW’.
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, nn
>>> x = Tensor(np.ones([32, 16, 10, 32, 32]), mindspore.float32)
>>> conv3d_transpose = nn.Conv3dTranspose(in_channels=16, out_channels=3, kernel_size=(4, 6, 2),
...                                       pad_mode='pad')
>>> output = conv3d_transpose(x)
>>> print(output.shape)
(32, 3, 13, 37, 33)
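The printed shape can be cross-checked by hand with the “pad” formula. The snippet below is a plain-Python sketch that needs no MindSpore installation; `pad_mode_output_shape` is a hypothetical helper name, and it assumes the defaults from the class signature (stride=1, padding=0, dilation=1, output_padding=0) applied uniformly to all three spatial dimensions.

```python
def pad_mode_output_shape(in_shape, out_channels, kernel_size,
                          stride=1, padding=0, dilation=1, output_padding=0):
    """Apply the pad_mode='pad' formula to an (N, C, D, H, W) shape.

    Hypothetical helper; the scalar stride, padding, dilation and
    output_padding are used for every spatial dimension.
    """
    n, _, *spatial = in_shape
    out_spatial = tuple(
        (size - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1
        for size, k in zip(spatial, kernel_size)
    )
    return (n, out_channels) + out_spatial


print(pad_mode_output_shape((32, 16, 10, 32, 32), out_channels=3, kernel_size=(4, 6, 2)))
# (32, 3, 13, 37, 33)
```

For example, the depth works out as (10 - 1) × 1 - 0 + 1 × (4 - 1) + 0 + 1 = 13.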