mindspore.mint.nn.functional.conv3d

mindspore.mint.nn.functional.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)

Applies a 3D convolution over an input tensor. The input tensor is typically of shape $(N, C_{in}, D_{in}, H_{in}, W_{in})$ or $(C_{in}, D_{in}, H_{in}, W_{in})$, where $N$ is the batch size, $C$ is the number of channels, and $D$, $H$, $W$ are the depth, height and width of the feature map, respectively.

The output is calculated based on the formula:

$$\text{out}(N_{i}, C_{\text{out}_{j}}) = \text{bias}(C_{\text{out}_{j}}) + \sum_{k = 0}^{C_{in} - 1} \text{ccor}(\text{weight}(C_{\text{out}_{j}}, k), X(N_{i}, k))$$

where $\text{bias}$ is the output channel bias, $\text{ccor}$ is the cross-correlation, $\text{weight}$ is the convolution kernel value and $X$ represents the input feature map.

Here are the indices' meanings:

$i$ corresponds to the batch number, the range is $[0, N - 1]$, where $N$ is the batch size of the input.
$j$ corresponds to the output channel, the range is $[0, C_{out} - 1]$, where $C_{out}$ is the number of output channels, which is also equal to the number of kernels.
$k$ corresponds to the input channel, the range is $[0, C_{in} - 1]$, where $C_{in}$ is the number of input channels, which is also equal to the number of channels in the convolutional kernels.

Therefore, in the above formula, $\text{bias}(C_{\text{out}_{j}})$ represents the bias of the $j$-th output channel, $\text{weight}(C_{\text{out}_{j}}, k)$ represents the slice of the $j$-th convolutional kernel in the $k$-th channel, and $X(N_{i}, k)$ represents the slice of the $k$-th input channel in the $i$-th batch of the input feature map.
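
To make the index notation concrete, the formula above can be sketched in plain Python on nested lists. This is an illustrative reference only, not the MindSpore implementation: it assumes stride 1, no padding, no dilation and groups=1, and the names `full` and `conv3d_reference` are hypothetical helpers introduced for this example.

```python
def full(shape, value):
    """Build a nested list of the given shape filled with value (helper for the demo)."""
    if not shape:
        return value
    return [full(shape[1:], value) for _ in range(shape[0])]


def conv3d_reference(x, weight, bias):
    """Naive conv3d: out[i][j] = bias[j] + sum over k of ccor(weight[j][k], x[i][k]).

    Assumes stride 1, no padding, no dilation, groups=1.
    x has shape (N, C_in, D, H, W); weight has shape (C_out, C_in, kd, kh, kw).
    """
    n, c_in = len(x), len(x[0])
    d_in, h_in, w_in = len(x[0][0]), len(x[0][0][0]), len(x[0][0][0][0])
    c_out = len(weight)
    kd, kh, kw = len(weight[0][0]), len(weight[0][0][0]), len(weight[0][0][0][0])
    d_out, h_out, w_out = d_in - kd + 1, h_in - kh + 1, w_in - kw + 1

    def element(i, j, d, h, w):
        total = bias[j]
        for k in range(c_in):              # sum over input channels
            for a in range(kd):            # cross-correlation: the kernel is not flipped
                for b in range(kh):
                    for c in range(kw):
                        total += weight[j][k][a][b][c] * x[i][k][d + a][h + b][w + c]
        return total

    return [[[[[element(i, j, d, h, w) for w in range(w_out)]
               for h in range(h_out)]
              for d in range(d_out)]
             for j in range(c_out)]
            for i in range(n)]


# Two input channels of ones, one 2x2x2 kernel of ones per channel, bias 0.5.
out = conv3d_reference(full((1, 2, 2, 2, 2), 1.0), full((1, 2, 2, 2, 2), 1.0), [0.5])
print(out)
```

Each output element is the bias of its output channel plus one kernel-sized window sum per input channel, which is what the summation over $k$ expresses.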

The shape of the convolutional kernel is given by $(kd, kh, kw)$, where $kd$, $kh$ and $kw$ are the depth, height and width of the kernel, respectively. If we consider the input and output channels as well as the group parameter, the complete kernel shape will be $(C_{out}, C_{in} / group, kd, kh, kw)$, where group is the number of groups dividing x's input channel when applying group convolution.
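
As a small illustration of the complete kernel shape, the hypothetical helper below (not part of MindSpore) builds $(C_{out}, C_{in} / group, kd, kh, kw)$ from the channel counts. It assumes both channel counts must be divisible by the number of groups, as the limitations on this page require.

```python
def grouped_weight_shape(c_in, c_out, groups, kernel_size):
    """Return the complete kernel shape (C_out, C_in / groups, kd, kh, kw)."""
    if c_in % groups or c_out % groups:
        raise ValueError("C_in and C_out must both be divisible by groups")
    kd, kh, kw = kernel_size
    return (c_out, c_in // groups, kd, kh, kw)


print(grouped_weight_shape(4, 6, 2, (3, 3, 3)))  # each kernel spans 4 / 2 = 2 input channels
```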

For more details about the convolution layer, please refer to Gradient-Based Learning Applied to Document Recognition.

The following lists some of the limitations of the parameters.

input – The input to the conv3d. Each dimension size of the input must be within the range [1, int32_max].
weight – Filters of shape $(C_{out}, C_{in} / groups, kd, kh, kw)$. The values of $kh$ and $kw$ are in the range [1, 511]. The remaining values are in the range [1, int32_max]. In addition, $kh * kw * k0$ must be less than 65536, where $k0$ is 16, or 8 if the data type is float32.
bias – Bias Tensor with shape $(C_{out})$. $C_{out}$ must equal the size of the first dimension of the weight.
stride – The distance of kernel moving. It can be an int number or a tuple (denoted by $(stride_{d}, stride_{h}, stride_{w})$). stride_h and stride_w are in the range [1, 63]. stride_d is in the range [1, 255].
padding – If padding is an int number, it is in the range [0, 255].
dilation – The value is in the range [1, 255].
groups – The value is in the range [1, 65535].
$C_{in} \% groups == 0$ and $C_{out} \% groups == 0$.
$weight[1] == C_{in} / groups$.
$H_{in} + PadUp + PadDown >= (kh - 1) * DilationH + 1$.
$W_{in} + PadLeft + PadRight >= (kw - 1) * DilationW + 1$.
$D_{in} + PadFront + PadBack >= (kd - 1) * DilationD + 1$.
$H_{out} = (H_{in} + PadUp + PadDown - ((kh - 1) * DilationH + 1)) / StrideH + 1$.
$W_{out} = (W_{in} + PadLeft + PadRight - ((kw - 1) * DilationW + 1)) / StrideW + 1$.
$D_{out} = (D_{in} + PadFront + PadBack - ((kd - 1) * DilationD + 1)) / StrideD + 1$.
$(D_{in} + PadFront + PadBack - ((kd - 1) * DilationD + 1)) /$ .
$(H_{in} + PadUp + PadDown - ((kh - 1) * DilationH + 1)) /$ .
$stride_{d} <= kernel_{d}$.
$PadUp < kh$ and $PadDown < kh$. When padding = 'valid', both PadUp and PadDown are zero. When padding = 'same', the padding for the height dimension can be calculated as $floor(((H_{out} - 1) * strideH + (kh - 1) * DilationH + 1 - H_{in}) / 2)$. The padding for the depth and width dimensions is calculated in the same way, and the same constraints apply to those dimensions.
$((kh - 1) * DilationH - PadUp)$ must be in [0, 255]. The same constraint applies to the depth and width dimensions.
If padding is 'same', stride must be 1.
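
The output-size formulas and a few of the constraints above can be sketched as a small pure-Python check. This is a hedged illustration rather than MindSpore code: the function name `conv3d_output_shape` is hypothetical, padding is assumed to be symmetric (the same amount on both sides of each dimension), and the division in the formulas is assumed to be integer (floor) division. The hardware-specific range limits are not checked.

```python
def _triple(value):
    """Expand an int to (value, value, value); pass three ints through as a tuple."""
    if isinstance(value, int):
        return (value, value, value)
    value = tuple(value)
    if len(value) != 3:
        raise ValueError("expected an int or three ints")
    return value


def conv3d_output_shape(input_shape, weight_shape, stride=1, padding=0,
                        dilation=1, groups=1):
    """Return (N, C_out, D_out, H_out, W_out) for symmetric integer padding."""
    n, c_in, *spatial_in = input_shape
    c_out, c_in_per_group, *kernel = weight_shape
    stride, padding, dilation = _triple(stride), _triple(padding), _triple(dilation)

    if c_in % groups or c_out % groups:
        raise ValueError("C_in and C_out must both be divisible by groups")
    if c_in_per_group != c_in // groups:
        raise ValueError("weight.shape[1] must equal C_in / groups")

    spatial_out = []
    for size, k, s, p, d in zip(spatial_in, kernel, stride, padding, dilation):
        effective_kernel = (k - 1) * d + 1   # kernel extent after dilation
        padded = size + 2 * p                # same padding on both sides (assumption)
        if padded < effective_kernel:
            raise ValueError("padded input is smaller than the dilated kernel")
        spatial_out.append((padded - effective_kernel) // s + 1)
    return (n, c_out, *spatial_out)


# Shapes taken from the Examples section of this page.
print(conv3d_output_shape((12, 1, 60, 50, 8), (26, 1, 2, 4, 4)))
```

With the shapes from the Examples section, this reproduces the documented output shape `(12, 26, 59, 47, 5)`.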

Warning

This API does not support Atlas series products. This is an experimental API that is subject to change or deletion.

Parameters

input (Tensor) – Tensor of shape $(N, C_{in}, D_{in}, H_{in}, W_{in})$.
weight (Tensor) – If the kernel size is $(kd, kh, kw)$, the shape is $(C_{out}, C_{in} / groups, kd, kh, kw)$.
bias (Tensor, optional) – Bias Tensor with shape $(C_{out})$. When bias is None, zeros will be used. Default: None.
stride (Union[int, tuple[int]], optional) – The distance of kernel moving. Either an int, which is used as the stride for depth, height and width alike, or a tuple of three ints giving the strides for depth, height and width respectively. Default: 1.
padding (Union[int, tuple[int], str], optional) –
Implicit paddings on both sides of the input x. Can be a string, one integer or a tuple/list of 3 integers. If padding is a string, the optional values are "same" and "valid".
- same: Pads the input. The height and width of the output will be equal to those of the input x divided by stride. The padding is distributed as evenly as possible between top and bottom, left and right. Otherwise, the extra padding is added to the bottom and the right side. If this mode is set, stride must be 1.
- valid: Applies no padding. The largest possible height and width of the output are returned, and extra pixels are discarded.
If padding is one integer, the paddings of top, bottom, left and right are the same, equal to padding. If padding is a tuple/list of 3 integers, the paddings of head, tail, top, bottom, left and right are equal to pad[0], pad[0], pad[1], pad[1], pad[2] and pad[2], respectively. Default: 0.
dilation (Union[int, tuple[int]], optional) – Controls the spacing between the kernel points. Default: 1.
groups (int, optional) – Splits input into groups. Default: 1.
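
The padding forms described above can be illustrated with a short pure-Python sketch that expands each form into six per-side amounts in the order head, tail, top, bottom, left, right. The helper `expand_padding` is hypothetical and not part of MindSpore. For "same" it assumes stride 1, applies the rule that any odd remainder goes to the trailing side, and extends that rule to the depth dimension as well, which is an assumption.

```python
def expand_padding(padding, kernel_size=(1, 1, 1), dilation=(1, 1, 1)):
    """Return (head, tail, top, bottom, left, right) padding amounts."""
    if padding == "valid":
        return (0, 0, 0, 0, 0, 0)
    if padding == "same":
        # Assumes stride 1, which this mode requires.
        pads = []
        for k, d in zip(kernel_size, dilation):
            total = (k - 1) * d               # total padding that keeps the size unchanged
            before = total // 2
            pads += [before, total - before]  # an odd remainder goes to the trailing side
        return tuple(pads)
    if isinstance(padding, int):
        return (padding,) * 6
    pad_d, pad_h, pad_w = padding             # tuple/list of 3 integers
    return (pad_d, pad_d, pad_h, pad_h, pad_w, pad_w)


print(expand_padding((1, 2, 3)))
print(expand_padding("same", kernel_size=(2, 4, 4)))
```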

Returns

Tensor, the same dtype as the input, with the shape $(N, C_{out}, D_{out}, H_{out}, W_{out})$ or $(C_{out}, D_{out}, H_{out}, W_{out})$.

Raises

TypeError – If stride, padding or dilation is neither an int nor a tuple.
TypeError – If groups is not an int.
TypeError – If bias is not a Tensor.

Supported Platforms: Ascend

Examples

>>> import mindspore
>>> import numpy as np
>>> from mindspore import mint
>>> x = mindspore.Tensor(np.random.randn(12, 1, 60, 50, 8), mindspore.float16)
>>> w = mindspore.Tensor(np.random.randn(26, 1, 2, 4, 4), mindspore.float16)
>>> out = mint.nn.functional.conv3d(x, w)
>>> print(out.shape)
(12, 26, 59, 47, 5)