mindspore.mint.nn.functional.conv3d
- mindspore.mint.nn.functional.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)[source]
Applies a 3D convolution over an input tensor. The input tensor is typically of shape \((N, C_{in}, D_{in}, H_{in}, W_{in})\) or \((C_{in}, D_{in}, H_{in}, W_{in})\), where \(N\) is the batch size, \(C\) is the number of channels, and \(D, H, W\) are the depth, height and width of the feature map, respectively.
The output is calculated based on formula:
\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})\]
where \(bias\) is the output channel bias, \(ccor\) is the cross-correlation, \(weight\) is the convolution kernel value and \(X\) represents the input feature map.
Here are the indices' meanings:
\(i\) corresponds to the batch number, the range is \([0, N-1]\), where \(N\) is the batch size of the input.
\(j\) corresponds to the output channel, the range is \([0, C_{out}-1]\), where \(C_{out}\) is the number of output channels, which is also equal to the number of kernels.
\(k\) corresponds to the input channel, the range is \([0, C_{in}-1]\), where \(C_{in}\) is the number of input channels, which is also equal to the number of channels in the convolutional kernels.
Therefore, in the above formula, \({bias}(C_{\text{out}_j})\) represents the bias of the \(j\)-th output channel, \({weight}(C_{\text{out}_j}, k)\) represents the slice of the \(j\)-th convolutional kernel in the \(k\)-th channel, and \({X}(N_i, k)\) represents the slice of the \(k\)-th input channel in the \(i\)-th batch of the input feature map.
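The formula above can be illustrated with a minimal pure-Python sketch for a single sample. It assumes stride 1, no padding, no dilation and groups equal to 1. The helper names `ccor3d` and `conv3d_reference` are hypothetical and are not part of MindSpore. It is meant only to make the index structure concrete, not to reflect how the operator is implemented.

```python
def ccor3d(x, w):
    """Valid-mode 3D cross-correlation of one input channel with one kernel slice.

    x and w are nested lists indexed as [depth][height][width].
    """
    D, H, W = len(x), len(x[0]), len(x[0][0])
    kd, kh, kw = len(w), len(w[0]), len(w[0][0])
    out = []
    for d in range(D - kd + 1):
        plane = []
        for h in range(H - kh + 1):
            row = []
            for c in range(W - kw + 1):
                acc = 0.0
                # Slide the kernel without flipping it (cross-correlation).
                for a in range(kd):
                    for b in range(kh):
                        for e in range(kw):
                            acc += x[d + a][h + b][c + e] * w[a][b][e]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def conv3d_reference(x, weight, bias=None):
    """out[j] = bias[j] + sum over k of ccor(weight[j][k], x[k]), for one sample.

    x: [C_in][D][H][W], weight: [C_out][C_in][kd][kh][kw], bias: [C_out] or None.
    """
    out = []
    for j, kernel in enumerate(weight):
        b = 0.0 if bias is None else bias[j]
        total = None
        for k, w_slice in enumerate(kernel):
            part = ccor3d(x[k], w_slice)
            if total is None:
                total = part
            else:
                # Element-wise sum over the input channels k.
                total = [[[t + p for t, p in zip(tr, pr)]
                          for tr, pr in zip(tp, pp)]
                         for tp, pp in zip(total, part)]
        out.append([[[v + b for v in row] for row in plane] for plane in total])
    return out


# Two input channels of 2x2x2 ones, one output channel with a 2x2x2 kernel of ones.
ones = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
x = [ones, ones]
weight = [[ones, ones]]
out = conv3d_reference(x, weight, bias=[0.5])
print(out)  # [[[[16.5]]]]
```

Each input channel contributes 8 to the single output element, and the bias adds 0.5.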
The shape of the convolutional kernel is given by \((kd, kh, kw)\), where \(kd\), \(kh\) and \(kw\) are the depth, height and width of the kernel, respectively. If we consider the input and output channels as well as the groups parameter, the complete kernel shape is \((C_{out}, C_{in} / \text{groups}, kd, kh, kw)\), where groups is the number of groups into which the input channels are divided when applying group convolution.
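As a small illustration of the complete kernel shape, the sketch below derives it from the channel counts, the kernel size and the number of groups. The function name `conv3d_weight_shape` is hypothetical. The divisibility check mirrors the constraint that both \(C_{in}\) and \(C_{out}\) must be divisible by groups.

```python
def conv3d_weight_shape(c_in, c_out, kernel_size, groups=1):
    """Return the expected weight shape (C_out, C_in / groups, kd, kh, kw)."""
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("C_in and C_out must both be divisible by groups")
    kd, kh, kw = kernel_size
    return (c_out, c_in // groups, kd, kh, kw)


print(conv3d_weight_shape(8, 16, (2, 3, 3), groups=4))  # (16, 2, 2, 3, 3)
```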
For more details about convolution layers, please refer to Gradient Based Learning Applied to Document Recognition.
The following lists some of the limitations of the parameters.
input – The input to the conv3d. The input must have each dimension size within the range [1, int32_max].
weight – Filters of shape \((C_{out}, C_{in} / groups, kd, kh, kw)\). The values of \(kh\) and \(kw\) are in the range [1, 511]. The remaining values are in the range [1, int32_max]. In addition, \(kh*kw*k0\) must be less than 65536, where k0 is 16, or 8 if the data type is float32.
bias – Bias Tensor with shape \((C_{out})\). The shape must equal the first dimension of the weight.
stride – The distance of kernel moving. It can be an int number or tuple (noted by \((stride_d, stride_h, stride_w)\)). stride_h and stride_w are in the range [1, 63]. stride_d is in the range [1, 255].
padding – If padding is an int number, it is in the range [0, 255].
dilation – The value is in the range [1, 255].
groups – The value is in the range [1, 65535].
\(C_{in} \% \text{groups} == 0 \quad \text{and} \quad C_{out} \% \text{groups} == 0\) .
\(weight[1] == C_{in} / groups\) .
\(H_{in} + PadUp + PadDown >= (kh - 1) * DilationH + 1\) .
\(W_{in} + PadLeft + PadRight >= (kw - 1) * DilationW + 1\) .
\(D_{in} + PadFront + PadBack >= (kd - 1) * DilationD + 1\) .
\(H_{out} = (H_{in} + PadUp + PadDown - ((kh - 1) * DilationH + 1)) / StrideH + 1\) .
\(W_{out} = (W_{in} + PadLeft + PadRight - ((kw - 1) * DilationW + 1)) / StrideW + 1\) .
\(D_{out} = (D_{in} + PadFront + PadBack - ((kd - 1) * DilationD + 1)) / StrideD + 1\) .
\((D_{in}+PadFront+PadBack - ((kd-1)*DilationD+1)) \% StrideD <= PadBack\) .
\((H_{in}+PadUp+PadDown - ((kh-1)*DilationH+1)) \% StrideH <= PadDown\) .
\(stride_d <= kernel_d\) .
\(PadUp < kh\) and \(PadDown < kh\) . When padding = 'valid', both PadUp and PadDown are zeros. When padding = 'same', the pad can be calculated by \(floor(((H_{out}-1) * strideH + (kh - 1) * DilationH + 1 - H_{in}) / 2)\) for the height dimension. The padding for the depth and width dimensions is calculated in a similar way, and the same constraints apply to those dimensions.
\(((kh - 1) * DilationH - PadUp)\) should be in [0, 255]. The same constraint applies to the depth and width dimensions.
If padding is 'same', stride must be 1.
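The output-size formulas and the minimum-size constraints listed above can be checked with a short sketch. The function name `conv3d_out_size` is hypothetical, and the division in the formulas is assumed here to be integer floor division. The same function applies to the depth, height and width dimensions.

```python
def conv3d_out_size(size_in, kernel, stride=1, pad_before=0, pad_after=0, dilation=1):
    """Output size along one spatial dimension (depth, height or width)."""
    effective_kernel = (kernel - 1) * dilation + 1
    padded = size_in + pad_before + pad_after
    if padded < effective_kernel:
        raise ValueError("padded input is smaller than the dilated kernel")
    # Assumption: the '/' in the formulas is floor division.
    return (padded - effective_kernel) // stride + 1


# Input (D, H, W) = (60, 50, 8) with kernel (2, 4, 4), default stride/padding/dilation.
shape = tuple(conv3d_out_size(s, k) for s, k in zip((60, 50, 8), (2, 4, 4)))
print(shape)  # (59, 47, 5)
```

With stride 2 and one element of padding on each side, an input of size 7 and a kernel of size 3 give `conv3d_out_size(7, 3, stride=2, pad_before=1, pad_after=1)`, which evaluates to 4.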
Warning
This API does not support Atlas series products. This is an experimental API that is subject to change or deletion.
- Parameters
input (Tensor) – Tensor of shape \((N, C_{in}, D_{in}, H_{in}, W_{in})\).
weight (Tensor) – If the kernel size is \((kd, kh, kw)\), the shape is \((C_{out}, C_{in} / groups, kd, kh, kw)\).
bias (Tensor, optional) – Bias Tensor with shape \((C_{out})\). When bias is None, zeros will be used. Default: None.
stride (Union[int, tuple[int]], optional) – The distance of kernel moving. Either an int, in which case the same stride is used for depth, height and width, or a tuple of three ints that give the strides for depth, height and width respectively. Default: 1.
padding (Union[int, tuple[int], str], optional) – Implicit paddings on both sides of the input. Can be a string, one integer or a tuple/list with 3 integers. If padding is a string, the optional values are "same" and "valid".
same: Pads the input. The height and width of the output will be equal to those of the input divided by stride. The padding is distributed as evenly as possible between top and bottom, and between left and right. If it cannot be split evenly, the extra padding is added to the bottom and the right side. If this mode is set, stride must be 1.
valid: Discards instead of padding. The largest possible height and width of the output are returned without padding. Extra pixels are discarded.
If padding is one integer, the paddings of top, bottom, left and right are the same, equal to padding. If padding is a tuple/list with 3 integers, the paddings of head, tail, top, bottom, left and right are equal to pad[0], pad[0], pad[1], pad[1], pad[2] and pad[2] respectively. Default: 0.
dilation (Union[int, tuple[int]], optional) – Controls the spacing between the kernel points. Default: 1.
groups (int, optional) – Splits input into groups. Default: 1.
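The different forms of the padding argument can be expanded into six per-side values, as in the sketch below. The helper `normalize_padding` is hypothetical and not part of the MindSpore API. For "same" it assumes stride 1, as the description requires, and places any odd leftover on the tail, bottom and right side.

```python
def normalize_padding(padding, kernel_size, dilation=(1, 1, 1)):
    """Return (head, tail, top, bottom, left, right) padding amounts."""
    if padding == "valid":
        return (0, 0, 0, 0, 0, 0)
    if padding == "same":
        # With stride 1, the total padding per dimension is (k - 1) * dilation.
        pads = []
        for k, d in zip(kernel_size, dilation):
            total = (k - 1) * d
            before = total // 2
            pads += [before, total - before]
        return tuple(pads)
    if isinstance(padding, int):
        padding = (padding, padding, padding)
    if len(padding) != 3:
        raise ValueError("padding must be an int, a 3-tuple, 'same' or 'valid'")
    pd, ph, pw = padding
    return (pd, pd, ph, ph, pw, pw)


print(normalize_padding(1, (3, 3, 3)))          # (1, 1, 1, 1, 1, 1)
print(normalize_padding((0, 1, 2), (3, 3, 3)))  # (0, 0, 1, 1, 2, 2)
print(normalize_padding("same", (2, 3, 4)))     # (0, 1, 1, 1, 1, 2)
```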
- Returns
Tensor, the same dtype as the input, with the shape \((N, C_{out}, D_{out}, H_{out}, W_{out})\) or \((C_{out}, D_{out}, H_{out}, W_{out})\).
- Raises
- Supported Platforms:
Ascend
Examples
>>> import mindspore
>>> import numpy as np
>>> from mindspore import mint
>>> x = mindspore.Tensor(np.random.randn(12, 1, 60, 50, 8), mindspore.float16)
>>> w = mindspore.Tensor(np.random.randn(26, 1, 2, 4, 4), mindspore.float16)
>>> out = mint.nn.functional.conv3d(x, w)
>>> print(out.shape)
(12, 26, 59, 47, 5)