Using BNN to Implement an Image Classification Application
Deep learning models have strong fitting capability, while Bayesian theory offers good interpretability. MindSpore Deep Probabilistic Programming combines deep learning and Bayesian learning. By treating network weights as distributions and introducing latent-space distributions, it samples from these distributions during forward propagation, which introduces uncertainty and thereby improves the robustness and interpretability of the model.
This chapter describes in detail how to apply Bayesian neural networks (BNNs) in deep probabilistic programming on MindSpore. Before starting the practice, make sure that you have correctly installed MindSpore 0.7.0-beta or later.
This example is for the GPU or Ascend 910 AI processor platform. You can download the complete sample code from https://gitee.com/mindspore/mindspore/tree/r1.5/tests/st/probability/bnn_layers.
BNN only supports GRAPH mode now, so please set context.set_context(mode=context.GRAPH_MODE) in your code.
Using BNN
A BNN is a basic model that combines a probabilistic model with a neural network. Its weights are not fixed values but distributions. The following example describes how to use the bnn_layers module in MDP to implement a BNN, and then use the BNN to implement a simple image classification function. The overall process is as follows:
Process the MNIST dataset.
Define the Bayes LeNet.
Define the loss function and optimizer.
Load the dataset and train the network.
Environment Preparation
Set the training mode to graph mode and the computing platform to GPU.
from mindspore import context
context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target="GPU")
Data Preparation
Downloading the Dataset
To download the MNIST dataset and place it in the specified location, execute the following commands:
mkdir -p ./datasets/MNIST_Data/train ./datasets/MNIST_Data/test
wget -NP ./datasets/MNIST_Data/train https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/train-labels-idx1-ubyte --no-check-certificate
wget -NP ./datasets/MNIST_Data/train https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/train-images-idx3-ubyte --no-check-certificate
wget -NP ./datasets/MNIST_Data/test https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/t10k-labels-idx1-ubyte --no-check-certificate
wget -NP ./datasets/MNIST_Data/test https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/t10k-images-idx3-ubyte --no-check-certificate
tree ./datasets/MNIST_Data
./datasets/MNIST_Data
├── test
│ ├── t10k-images-idx3-ubyte
│ └── t10k-labels-idx1-ubyte
└── train
├── train-images-idx3-ubyte
└── train-labels-idx1-ubyte
2 directories, 4 files
Defining the Dataset Enhancement Method
The original MNIST training dataset consists of 60,000 single-channel digit images. The custom create_dataset function enhances the original data into batches of shape (32, 1, 32, 32) that meet the training requirements. For an explanation of the specific augmentation operations, refer to Quick Start for Beginners.
import mindspore.dataset.vision.c_transforms as CV
import mindspore.dataset.transforms.c_transforms as C
from mindspore.dataset.vision import Inter
from mindspore import dataset as ds
from mindspore import dtype as mstype

def create_dataset(data_path, batch_size=32, repeat_size=1,
                   num_parallel_workers=1):
    # define dataset
    mnist_ds = ds.MnistDataset(data_path)

    # define some parameters needed for data enhancement
    resize_height, resize_width = 32, 32
    rescale = 1.0 / 255.0
    shift = 0.0
    rescale_nml = 1 / 0.3081
    shift_nml = -1 * 0.1307 / 0.3081

    # according to the parameters, generate the corresponding data enhancement operations
    c_trans = [
        CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR),
        CV.Rescale(rescale_nml, shift_nml),
        CV.Rescale(rescale, shift),
        CV.HWC2CHW()
    ]
    type_cast_op = C.TypeCast(mstype.int32)

    # use map to apply the operations to the dataset
    mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=c_trans, input_columns="image", num_parallel_workers=num_parallel_workers)

    # shuffle, batch, and repeat the generated dataset
    buffer_size = 10000
    mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)
    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
    mnist_ds = mnist_ds.repeat(repeat_size)

    return mnist_ds
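As a quick sanity check, you can build a dataset from the downloaded files and inspect one batch. The expected shape (32, 1, 32, 32) follows from the default batch_size=32, the single channel, and the 32x32 resize; this snippet is only a verification sketch, not part of the original tutorial:

ds_train = create_dataset('./datasets/MNIST_Data/train')
batch = next(ds_train.create_dict_iterator())
print(batch['image'].shape)  # expect (32, 1, 32, 32)
print(batch['label'].shape)  # expect (32,)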
Defining the BNN
In the classic LeNet5 network, the data goes through the following calculation process: convolution 1 -> activation -> pooling -> convolution 2 -> activation -> pooling -> flatten -> fully connected 1 -> fully connected 2 -> fully connected 3.
In this example, a probabilistic programming method is introduced: the bnn_layers module is used to transform the convolutional layers and the fully connected layers into Bayesian layers.
from mindspore.common.initializer import Normal
import mindspore.nn as nn
from mindspore.nn.probability import bnn_layers
import mindspore.ops as ops
from mindspore import dtype as mstype

class BNNLeNet5(nn.Cell):
    def __init__(self, num_class=10):
        super(BNNLeNet5, self).__init__()
        self.num_class = num_class
        self.conv1 = bnn_layers.ConvReparam(1, 6, 5, stride=1, padding=0, has_bias=False, pad_mode="valid")
        self.conv2 = bnn_layers.ConvReparam(6, 16, 5, stride=1, padding=0, has_bias=False, pad_mode="valid")
        self.fc1 = bnn_layers.DenseReparam(16 * 5 * 5, 120)
        self.fc2 = bnn_layers.DenseReparam(120, 84)
        self.fc3 = bnn_layers.DenseReparam(84, self.num_class)
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()

    def construct(self, x):
        x = self.max_pool2d(self.relu(self.conv1(x)))
        x = self.max_pool2d(self.relu(self.conv2(x)))
        x = self.flatten(x)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

network = BNNLeNet5(num_class=10)
for layer in network.trainable_params():
    print(layer.name)
conv1.weight_posterior.mean
conv1.weight_posterior.untransformed_std
conv2.weight_posterior.mean
conv2.weight_posterior.untransformed_std
fc1.weight_posterior.mean
fc1.weight_posterior.untransformed_std
fc1.bias_posterior.mean
fc1.bias_posterior.untransformed_std
fc2.weight_posterior.mean
fc2.weight_posterior.untransformed_std
fc2.bias_posterior.mean
fc2.bias_posterior.untransformed_std
fc3.weight_posterior.mean
fc3.weight_posterior.untransformed_std
fc3.bias_posterior.mean
fc3.bias_posterior.untransformed_std
The printed information shows that the convolutional layers and the fully connected layers of the LeNet network constructed with the bnn_layers module are both Bayesian layers.
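Because the weights of these Bayesian layers are distributions rather than fixed values, each forward pass samples a new set of weights. The following sketch makes this observable; the dummy input of shape (1, 1, 32, 32) is an arbitrary choice for illustration:

import numpy as np
from mindspore import Tensor

# an arbitrary single-channel 32x32 input
dummy = Tensor(np.random.rand(1, 1, 32, 32).astype(np.float32))

# each call samples new weights from the posterior distributions,
# so the two outputs are generally not identical
out1 = network(dummy)
out2 = network(dummy)
print(out1.asnumpy() - out2.asnumpy())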
Defining the Loss Function and Optimizer
Next, you need to define the loss function and the optimizer. The loss function is the training target of deep learning, also called the objective function. It can be understood as the distance between the output of the neural network (Logits) and the labels (Labels), and it is a scalar. Common loss functions include mean squared error, L2 loss, hinge loss, and cross entropy. Cross entropy is usually used for image classification.
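To make this concrete, the following toy check (with arbitrary logits and labels, not part of the original tutorial) evaluates the cross-entropy loss that is used later in this example:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor

loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
logits = Tensor(np.array([[2.0, 0.5, 0.3], [0.1, 2.2, 0.7]], dtype=np.float32))
labels = Tensor(np.array([0, 1], dtype=np.int32))
print(loss_fn(logits, labels))  # a scalar: the mean cross entropy of the batch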
The optimizer is used to solve (train) the neural network. Because of the large scale of neural network parameters, deep learning uses the stochastic gradient descent (SGD) algorithm and its improved variants. MindSpore encapsulates common optimizers, such as SGD, Adam, and Momentum. In this example, the Adam optimizer is used. Generally, two parameters need to be set: the learning rate (learning_rate) and weight decay (weight_decay).
An example of the code for defining the loss function and optimizer in MindSpore is as follows:
import mindspore.nn as nn
# loss function definition
criterion = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
# optimizer definition
optimizer = nn.AdamWeightDecay(params=network.trainable_params(), learning_rate=0.0001)
Training the Network
The training process of a BNN is similar to that of a DNN. The only difference is that WithLossCell is replaced with WithBNNLossCell, which is applicable to BNNs. In addition to the backbone and loss_fn parameters, WithBNNLossCell adds the dnn_factor and bnn_factor parameters. dnn_factor is the coefficient of the overall network loss computed by the loss function, and bnn_factor is the coefficient of the KL divergence of each Bayesian layer. The two parameters balance the overall network loss against the KL divergence of the Bayesian layers, preventing the overall network loss from being overwhelmed by a large KL divergence.
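Conceptually, the loss minimized by WithBNNLossCell is a weighted sum of the task loss and the KL divergences. The helper below is only an illustrative sketch of that weighting, not the actual MindSpore implementation:

def combined_loss(task_loss, kl_terms, dnn_factor=60000, bnn_factor=0.000001):
    # weighted sum: the two factors keep the KL terms from dominating the task loss
    return dnn_factor * task_loss + bnn_factor * sum(kl_terms)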
The code examples of train_model and validate_model in MindSpore are as follows:
def train_model(train_net, net, dataset):
    accs = []
    loss_sum = 0
    for _, data in enumerate(dataset.create_dict_iterator()):
        train_x = Tensor(data['image'].asnumpy().astype(np.float32))
        label = Tensor(data['label'].asnumpy().astype(np.int32))
        loss = train_net(train_x, label)
        output = net(train_x)
        log_output = ops.LogSoftmax(axis=1)(output)
        acc = np.mean(log_output.asnumpy().argmax(axis=1) == label.asnumpy())
        accs.append(acc)
        loss_sum += loss.asnumpy()

    loss_sum = loss_sum / len(accs)
    acc_mean = np.mean(accs)
    return loss_sum, acc_mean

def validate_model(net, dataset):
    accs = []
    for _, data in enumerate(dataset.create_dict_iterator()):
        train_x = Tensor(data['image'].asnumpy().astype(np.float32))
        label = Tensor(data['label'].asnumpy().astype(np.int32))
        output = net(train_x)
        log_output = ops.LogSoftmax(axis=1)(output)
        acc = np.mean(log_output.asnumpy().argmax(axis=1) == label.asnumpy())
        accs.append(acc)

    acc_mean = np.mean(accs)
    return acc_mean
Perform training.
from mindspore.nn import TrainOneStepCell
from mindspore import Tensor
import numpy as np

net_with_loss = bnn_layers.WithBNNLossCell(network, criterion, dnn_factor=60000, bnn_factor=0.000001)
train_bnn_network = TrainOneStepCell(net_with_loss, optimizer)
train_bnn_network.set_train()

train_set = create_dataset('./datasets/MNIST_Data/train', 64, 1)
test_set = create_dataset('./datasets/MNIST_Data/test', 64, 1)

epoch = 10

for i in range(epoch):
    train_loss, train_acc = train_model(train_bnn_network, network, train_set)
    valid_acc = validate_model(network, test_set)
    print('Epoch: {} \tTraining Loss: {:.4f} \tTraining Accuracy: {:.4f} \tvalidation Accuracy: {:.4f}'.
          format(i+1, train_loss, train_acc, valid_acc))
Epoch: 1 Training Loss: 21444.8605 Training Accuracy: 0.8928 validation Accuracy: 0.9513
Epoch: 2 Training Loss: 9396.3887 Training Accuracy: 0.9536 validation Accuracy: 0.9635
Epoch: 3 Training Loss: 7320.2412 Training Accuracy: 0.9641 validation Accuracy: 0.9674
Epoch: 4 Training Loss: 6221.6970 Training Accuracy: 0.9685 validation Accuracy: 0.9731
Epoch: 5 Training Loss: 5450.9543 Training Accuracy: 0.9725 validation Accuracy: 0.9733
Epoch: 6 Training Loss: 4898.9741 Training Accuracy: 0.9754 validation Accuracy: 0.9767
Epoch: 7 Training Loss: 4505.7502 Training Accuracy: 0.9775 validation Accuracy: 0.9784
Epoch: 8 Training Loss: 4099.8783 Training Accuracy: 0.9797 validation Accuracy: 0.9791
Epoch: 9 Training Loss: 3795.2288 Training Accuracy: 0.9810 validation Accuracy: 0.9796
Epoch: 10 Training Loss: 3581.4254 Training Accuracy: 0.9823 validation Accuracy: 0.9773
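Because the trained weights are distributions, repeating the forward pass on the same input yields different predictions, and their spread can serve as a rough uncertainty estimate. The following is a minimal Monte Carlo sketch under that idea; the sample count n_samples=10 is an arbitrary choice and this helper is not part of the original tutorial:

import numpy as np
import mindspore.ops as ops
from mindspore import Tensor

def mc_predict(net, x, n_samples=10):
    # each forward pass samples a new set of weights from the posteriors
    probs = [ops.Softmax(axis=1)(net(x)).asnumpy() for _ in range(n_samples)]
    probs = np.stack(probs)           # (n_samples, batch, num_class)
    mean_prob = probs.mean(axis=0)    # averaged prediction
    std_prob = probs.std(axis=0)      # spread, usable as predictive uncertainty
    return mean_prob.argmax(axis=1), std_prob

# example: estimate uncertainty on one test batch
batch = next(test_set.create_dict_iterator())
images = Tensor(batch['image'].asnumpy().astype(np.float32))
preds, uncertainty = mc_predict(network, images)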