# Differences with torch.nn.MultiheadAttention

[![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.4.10/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/r2.4.10/docs/mindspore/source_zh_cn/note/api_mapping/pytorch_diff/MultiheadAttention.md)

## torch.nn.MultiheadAttention

```python
class torch.nn.MultiheadAttention(
    embed_dim,
    num_heads,
    dropout=0.0,
    bias=True,
    add_bias_kv=False,
    add_zero_attn=False,
    kdim=None,
    vdim=None
)(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None)
```

For more information, see [torch.nn.MultiheadAttention](https://pytorch.org/docs/1.8.1/generated/torch.nn.MultiheadAttention.html).

## mindspore.nn.MultiheadAttention

```python
class mindspore.nn.MultiheadAttention(
    embed_dim,
    num_heads,
    dropout=0.0,
    has_bias=True,
    add_bias_kv=False,
    add_zero_attn=False,
    kdim=None,
    vdim=None,
    batch_first=False,
    dtype=mstype.float32
)(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None, average_attn_weights=True)
```

For more information, see [mindspore.nn.MultiheadAttention](https://mindspore.cn/docs/zh-CN/r2.4.10/api_python/nn/mindspore.nn.MultiheadAttention.html).

## Differences

The usage of `torch.nn.MultiheadAttention` and `mindspore.nn.MultiheadAttention` is basically the same.

| Categories | Subclasses   | PyTorch          | MindSpore            | Difference |
| ---------- | ------------ | ---------------- | -------------------- | ---------- |
| Parameters | Parameter 1  | embed_dim        | embed_dim            | Consistent |
|            | Parameter 2  | num_heads        | num_heads            | Consistent |
|            | Parameter 3  | dropout          | dropout              | Consistent |
|            | Parameter 4  | bias             | has_bias             | Same function, different parameter name |
|            | Parameter 5  | add_bias_kv      | add_bias_kv          | Consistent |
|            | Parameter 6  | add_zero_attn    | add_zero_attn        | Consistent |
|            | Parameter 7  | kdim             | kdim                 | Consistent |
|            | Parameter 8  | vdim             | vdim                 | Consistent |
|            | Parameter 9  |                  | batch_first          | MindSpore can configure whether the first dimension of the inputs and outputs is the batch dimension. PyTorch does not have this parameter (see the last code example below). |
|            | Parameter 10 |                  | dtype                | MindSpore can configure the dtype of the network parameters. PyTorch does not have this parameter. |
| Inputs     | Input 1      | query            | query                | Consistent |
|            | Input 2      | key              | key                  | Consistent |
|            | Input 3      | value            | value                | Consistent |
|            | Input 4      | key_padding_mask | key_padding_mask     | In MindSpore the mask can be a float or bool Tensor; in PyTorch it can be a byte or bool Tensor (see the mask sketch below). |
|            | Input 5      | need_weights     | need_weights         | Consistent |
|            | Input 6      | attn_mask        | attn_mask            | In MindSpore the mask can be a float or bool Tensor; in PyTorch it can be a float, byte, or bool Tensor (see the mask sketch below). |
|            | Input 7      |                  | average_attn_weights | If True, the returned attn_output_weights are averaged across the attention heads; if False, attn_output_weights contains the weights of each head separately. PyTorch does not have this input (see the last code example below). |
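
The dtype differences for `key_padding_mask` and `attn_mask` (Input 4 and Input 6 above) mean that a bool mask is the portable choice: both frameworks accept it, while a byte mask works only in PyTorch. The sketch below uses hypothetical small shapes, a padding mask that hides the last key of every sequence, and a causal attention mask; in both frameworks, True marks a position that is masked out.

```python
# PyTorch
import torch
from torch import nn

embed_dim, num_heads = 16, 4
seq_length, batch_size = 5, 2
query = torch.rand(seq_length, batch_size, embed_dim)
key = torch.rand(seq_length, batch_size, embed_dim)
value = torch.rand(seq_length, batch_size, embed_dim)
# bool masks: True means the position is ignored
key_padding_mask = torch.zeros(batch_size, seq_length, dtype=torch.bool)
key_padding_mask[:, -1] = True  # hide the last key of every sequence
attn_mask = torch.triu(torch.ones(seq_length, seq_length), diagonal=1).bool()  # causal mask
multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
attn_output, _ = multihead_attn(query, key, value,
                                key_padding_mask=key_padding_mask, attn_mask=attn_mask)

# MindSpore
import mindspore as ms
import numpy as np

embed_dim, num_heads = 16, 4
seq_length, batch_size = 5, 2
query = ms.Tensor(np.random.randn(seq_length, batch_size, embed_dim), ms.float32)
key = ms.Tensor(np.random.randn(seq_length, batch_size, embed_dim), ms.float32)
value = ms.Tensor(np.random.randn(seq_length, batch_size, embed_dim), ms.float32)
# the same bool masks, built with NumPy and passed as bool Tensors
key_padding_mask = np.zeros((batch_size, seq_length), dtype=np.bool_)
key_padding_mask[:, -1] = True  # hide the last key of every sequence
attn_mask = np.triu(np.ones((seq_length, seq_length), dtype=np.bool_), k=1)  # causal mask
multihead_attn = ms.nn.MultiheadAttention(embed_dim, num_heads)
attn_output, _ = multihead_attn(query, key, value,
                                key_padding_mask=ms.Tensor(key_padding_mask),
                                attn_mask=ms.Tensor(attn_mask))
```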

### Code Example

```python
# PyTorch
import torch
from torch import nn

embed_dim, num_heads = 128, 8
seq_length, batch_size = 10, 8
query = torch.rand(seq_length, batch_size, embed_dim)
key = torch.rand(seq_length, batch_size, embed_dim)
value = torch.rand(seq_length, batch_size, embed_dim)
multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
attn_output, attn_output_weights = multihead_attn(query, key, value)
print(attn_output.shape)
# torch.Size([10, 8, 128])
print(attn_output_weights.shape)
# torch.Size([8, 10, 10])

# MindSpore
import mindspore as ms
import numpy as np

embed_dim, num_heads = 128, 8
seq_length, batch_size = 10, 8
query = ms.Tensor(np.random.randn(seq_length, batch_size, embed_dim), ms.float32)
key = ms.Tensor(np.random.randn(seq_length, batch_size, embed_dim), ms.float32)
value = ms.Tensor(np.random.randn(seq_length, batch_size, embed_dim), ms.float32)
multihead_attn = ms.nn.MultiheadAttention(embed_dim, num_heads)
attn_output, attn_output_weights = multihead_attn(query, key, value)
print(attn_output.shape)
# (10, 8, 128)
print(attn_output_weights.shape)
# (8, 10, 10)
```
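
The MindSpore-only `batch_first` and `dtype` parameters and the `average_attn_weights` input (Parameter 9, Parameter 10, and Input 7 in the table) can be exercised together, as in the MindSpore-only sketch below. The shapes in the comments are the expected ones assuming the documented layouts: with `batch_first=True` the output follows (batch, seq, feature), and with `average_attn_weights=False` the attention weights are returned per head as (batch, num_heads, L, S).

```python
# MindSpore
import mindspore as ms
import numpy as np

embed_dim, num_heads = 128, 8
seq_length, batch_size = 10, 8
# batch_first=True: inputs and outputs use the (batch, seq, feature) layout
query = ms.Tensor(np.random.randn(batch_size, seq_length, embed_dim), ms.float32)
key = ms.Tensor(np.random.randn(batch_size, seq_length, embed_dim), ms.float32)
value = ms.Tensor(np.random.randn(batch_size, seq_length, embed_dim), ms.float32)
multihead_attn = ms.nn.MultiheadAttention(embed_dim, num_heads,
                                          batch_first=True, dtype=ms.float32)
# average_attn_weights=False: keep one weight matrix per attention head
attn_output, attn_output_weights = multihead_attn(query, key, value,
                                                  average_attn_weights=False)
print(attn_output.shape)
# expected: (8, 10, 128)
print(attn_output_weights.shape)
# expected: (8, 8, 10, 10)
```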