mindarmour.adv_robustness.detectors

This module includes detector methods for distinguishing adversarial examples from benign examples.

class mindarmour.adv_robustness.detectors.DivergenceBasedDetector(auto_encoder, model, option='jsd', t=1, bounds=(0.0, 1.0))

The divergence-based detector distinguishes normal and adversarial examples by the Jensen-Shannon divergence (JSD) between the target model's predictions on an input and on its auto-encoder reconstruction.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

Parameters
  • auto_encoder (Model) – Auto-encoder model.

  • model (Model) – Target model.

  • option (str) – Method used to calculate the divergence. Default: “jsd”.

  • t (int) – Temperature used to overcome numerical problems. Default: 1.

  • bounds (tuple) – Lower and upper bounds of the data, in the form (clip_min, clip_max). Default: (0.0, 1.0).

Examples

>>> import numpy as np
>>> import mindspore.ops.operations as P
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import DivergenceBasedDetector
>>> class PredNet(Cell):
...     def __init__(self):
...         super(PredNet, self).__init__()
...         self.shape = P.Shape()
...         self.reshape = P.Reshape()
...         self._softmax = P.Softmax()
...     def construct(self, inputs):
...         data = self.reshape(inputs, (self.shape(inputs)[0], -1))
...         return self._softmax(data)
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = P.Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4, 4).astype(np.float32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4, 4).astype(np.float32)
>>> encoder = Model(Net())
>>> model = Model(PredNet())
>>> detector = DivergenceBasedDetector(encoder, model)
>>> threshold = detector.fit(ori)
>>> detector.set_threshold(threshold)
>>> adv_ids = detector.detect(adv)
>>> adv_trans = detector.transform(adv)
detect_diff(inputs)

Compute the distance between the model's predictions on the original samples and on the reconstructed samples.

The distance is calculated by JSD.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

float, the distance.

Raises

NotImplementedError – If the parameter option is not supported.
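The JSD score used by detect_diff() can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not MindArmour's implementation: it assumes the target model's raw outputs (logits) for the original and the reconstructed samples are already available as arrays, and that t divides the logits before a softmax, in the spirit of the temperature parameter. The helper names are hypothetical.

```python
import numpy as np


def softmax(logits, t=1.0):
    """Row-wise softmax of logits / t; t plays the role of the temperature."""
    z = logits / t
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) computed independently for each row."""
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=1)


def jsd(p, q):
    """Jensen-Shannon divergence between corresponding rows of p and q."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def divergence_scores(logits_orig, logits_recon, t=1.0):
    """Hypothetical helper: per-sample JSD between two sets of predictions."""
    return jsd(softmax(logits_orig, t), softmax(logits_recon, t))


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
same = divergence_scores(logits, logits)                           # identical predictions
diff = divergence_scores(logits, rng.normal(size=(4, 10)), t=2.0)  # unrelated predictions
print(same, diff)
```

Identical predictions score zero, and with natural logarithms the score is bounded by ln 2; a detector built on this score would flag samples whose value exceeds a threshold fitted on benign data.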

class mindarmour.adv_robustness.detectors.EnsembleDetector(detectors, policy='vote')

The ensemble detector combines a list of detectors to detect adversarial examples among the input samples.

Parameters
  • detectors (Union[tuple, list]) – List of detector methods.

  • policy (str) – Decision policy, which can be ‘vote’, ‘all’ or ‘any’. Default: ‘vote’.

Examples

>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import ErrorBasedDetector
>>> from mindarmour.adv_robustness.detectors import RegionBasedDetector
>>> from mindarmour.adv_robustness.detectors import EnsembleDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> class AutoNet(Cell):
...     def __init__(self):
...         super(AutoNet, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4).astype(np.float32)
>>> model = Model(Net())
>>> auto_encoder = Model(AutoNet())
>>> random_label = np.random.randint(10, size=4)
>>> labels = np.eye(10)[random_label]
>>> magnet_detector = ErrorBasedDetector(auto_encoder)
>>> region_detector = RegionBasedDetector(model)
>>> region_detector.fit(adv, labels)
>>> detectors = [magnet_detector, region_detector]
>>> detector = EnsembleDetector(detectors)
>>> adv_ids = detector.detect(adv)
detect(inputs)

Detect adversarial examples from input samples.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

Raises

ValueError – If policy is not supported.
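The three decision policies can be illustrated with a small NumPy sketch that combines per-detector 0/1 decisions. It assumes each detector returns a 0/1 list as detect() does, and it reads ‘vote’ as a strict majority of detectors; that interpretation and the helper name combine_decisions are assumptions, not taken from the library.

```python
import numpy as np


def combine_decisions(decisions, policy="vote"):
    """Combine per-detector 0/1 decisions into one 0/1 decision per sample."""
    votes = np.asarray(decisions, dtype=int)  # shape: (num_detectors, num_samples)
    counts = votes.sum(axis=0)                # number of detectors flagging each sample
    num_detectors = votes.shape[0]
    if policy == "vote":
        flagged = counts > num_detectors / 2  # assumption: strict majority
    elif policy == "all":
        flagged = counts == num_detectors
    elif policy == "any":
        flagged = counts > 0
    else:
        raise ValueError("Policy {} is not supported.".format(policy))
    return flagged.astype(int).tolist()


decisions = [[1, 1, 1, 0],   # detector A
             [1, 1, 0, 0],   # detector B
             [1, 0, 0, 0]]   # detector C
for policy in ("vote", "all", "any"):
    print(policy, combine_decisions(decisions, policy))
```

With three detectors, ‘all’ is the most conservative policy and ‘any’ the most aggressive; ‘vote’ sits between them.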

detect_diff(inputs)

This method is not available in this class.

Parameters

inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.

Raises

NotImplementedError – This function is not available in ensemble.

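fit(inputs, labels=None)

Fit the detector like a machine learning model. This method is not available in this class.

Parameters
  • inputs (numpy.ndarray) – Input samples.

  • labels (numpy.ndarray) – Labels of input samples. Default: None.

Raises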

NotImplementedError – This function is not available in ensemble.

transform(inputs)

Filter adversarial noise from input samples. This method is not available in this class.

Parameters

inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.

Raises

NotImplementedError – This function is not available in ensemble.

class mindarmour.adv_robustness.detectors.ErrorBasedDetector(auto_encoder, false_positive_rate=0.01, bounds=(0.0, 1.0))

The detector reconstructs input samples, measures reconstruction errors and rejects samples with large reconstruction errors.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

Parameters
  • auto_encoder (Model) – A trained auto-encoder that represents the input by a reduced encoding.

  • false_positive_rate (float) – Detector’s false positive rate. Default: 0.01.

  • bounds (tuple) – Lower and upper bounds of the data, in the form (clip_min, clip_max). Default: (0.0, 1.0).

Examples

>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import ErrorBasedDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4, 4).astype(np.float32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4, 4).astype(np.float32)
>>> model = Model(Net())
>>> detector = ErrorBasedDetector(model)
>>> detector.fit(ori)
>>> adv_ids = detector.detect(adv)
>>> adv_trans = detector.transform(adv)
detect(inputs)

Detect if input samples are adversarial or not.

Parameters

inputs (numpy.ndarray) – Suspicious samples to be judged.

Returns

list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Compute the distance between the original samples and the reconstructed samples.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

float, the distance between reconstructed and original samples.

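fit(inputs, labels=None)

Find a threshold for a given dataset to distinguish adversarial examples.

Parameters
  • inputs (numpy.ndarray) – Input samples.

  • labels (numpy.ndarray) – Labels of input samples. Default: None.

Returns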

float, threshold to distinguish adversarial samples from benign ones.
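The reconstruct-measure-threshold procedure can be sketched with NumPy as follows. This is a hedged illustration, not the library code: it assumes a per-sample mean squared reconstruction error, picks the threshold so that at most false_positive_rate of the benign errors lie strictly above it, and replaces the auto-encoder with synthetic reconstructions. All helper names are hypothetical.

```python
import numpy as np


def reconstruction_errors(inputs, reconstructed):
    """Per-sample mean squared error between inputs and their reconstructions."""
    axes = tuple(range(1, inputs.ndim))  # average over all non-batch axes
    return np.mean((inputs - reconstructed) ** 2, axis=axes)


def fit_threshold(benign_errors, false_positive_rate=0.01):
    """Pick a threshold that at most false_positive_rate of benign errors exceed."""
    if not 0.0 <= false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in [0, 1).")
    errors = np.sort(np.asarray(benign_errors, dtype=float))
    allowed = int(np.floor(errors.size * false_positive_rate))  # tolerated false positives
    return float(errors[errors.size - 1 - allowed])


def detect(errors, threshold):
    """Flag samples whose reconstruction error is strictly above the threshold."""
    return (np.asarray(errors) > threshold).astype(int).tolist()


rng = np.random.default_rng(0)
benign = rng.random((100, 4, 4))
adversarial = rng.random((20, 4, 4))
# Stand-in for an auto-encoder: benign inputs are reconstructed almost exactly,
# adversarial ones much less so (an illustrative assumption, not real data).
benign_recon = benign + rng.normal(0.0, 0.01, benign.shape)
adv_recon = adversarial + rng.normal(0.0, 0.2, adversarial.shape)

threshold = fit_threshold(reconstruction_errors(benign, benign_recon), 0.01)
benign_flags = detect(reconstruction_errors(benign, benign_recon), threshold)
adv_flags = detect(reconstruction_errors(adversarial, adv_recon), threshold)
print(threshold, sum(benign_flags), sum(adv_flags))
```

Choosing the threshold from the sorted benign errors, rather than from an interpolated quantile, keeps the benign false positive count within the requested budget on the fitting data.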

set_threshold(threshold)

Set the parameter threshold.

Parameters

threshold (float) – Detection threshold.

transform(inputs)

Reconstruct input samples.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

numpy.ndarray, reconstructed images.

class mindarmour.adv_robustness.detectors.RegionBasedDetector(model, number_points=10, initial_radius=0.0, max_radius=1.0, search_step=0.01, degrade_limit=0.0, sparse=False)

The region-based detector exploits the fact that adversarial examples are close to the classification boundary, and aggregates predictions over a region around the given example to decide whether it is an adversarial example or not.

Reference: Mitigating evasion attacks to deep neural networks via region-based classification

Parameters
  • model (Model) – Target model.

  • number_points (int) – The number of samples generated from the hypercube around the original sample. Default: 10.

  • initial_radius (float) – Initial radius of the hypercube. Default: 0.0.

  • max_radius (float) – Maximum radius of the hypercube. Default: 1.0.

  • search_step (float) – Increment of the radius during the search. Default: 0.01.

  • degrade_limit (float) – Acceptable decrease of classification accuracy. Default: 0.0.

  • sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: False.

Examples

>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import RegionBasedDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4).astype(np.float32)
>>> labels = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0],
...                   [0, 1, 0, 0]]).astype(np.int32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4).astype(np.float32)
>>> model = Model(Net())
>>> detector = RegionBasedDetector(model)
>>> radius = detector.fit(ori, labels)
>>> detector.set_radius(radius)
>>> adv_ids = detector.detect(adv)
detect(inputs)

Tell whether input samples are adversarial or not.

Parameters

inputs (numpy.ndarray) – Suspicious samples to be judged.

Returns

list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Return raw prediction results and region-based prediction results.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

numpy.ndarray, raw prediction results and region-based prediction results of input samples.
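The idea behind comparing raw and region-based predictions can be sketched with NumPy: draw number_points points uniformly from a hypercube around each input, take the majority label in that region, and flag inputs whose raw label disagrees with the region label. The linear classifier is a hypothetical stand-in for the target model, and flagging on disagreement is an assumption about how the two prediction results are used.

```python
import numpy as np


def predict_labels(x, weights):
    """Hypothetical stand-in for the target model: a linear classifier."""
    return np.argmax(x @ weights, axis=1)


def hypercube_samples(x, radius, number_points, rng, bounds=(0.0, 1.0)):
    """Draw points uniformly from the hypercube of half-width radius around x."""
    noise = rng.uniform(-radius, radius, size=(number_points,) + x.shape)
    return np.clip(x + noise, bounds[0], bounds[1])


def region_labels(inputs, weights, radius, number_points, rng):
    """Majority label of the classifier over each sample's hypercube."""
    labels = []
    for x in inputs:
        preds = predict_labels(hypercube_samples(x, radius, number_points, rng), weights)
        labels.append(np.bincount(preds).argmax())
    return np.asarray(labels)


def region_detect(inputs, weights, radius, number_points=10, seed=0):
    """Return 1 where the raw label and the region-based label disagree."""
    rng = np.random.default_rng(seed)
    raw = predict_labels(inputs, weights)
    region = region_labels(inputs, weights, radius, number_points, rng)
    return (raw != region).astype(int).tolist()


weights = np.eye(2)  # class = index of the larger coordinate
inputs = np.array([[0.0, 1.0], [0.9, 0.1], [0.49, 0.51]])
print(region_detect(inputs, weights, radius=0.1))
```

Points far from the decision boundary keep their label throughout the hypercube and are never flagged; only points near the boundary, such as the third one, can be.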

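fit(inputs, labels=None)

Train the detector to decide the best radius.

Parameters
  • inputs (numpy.ndarray) – Benign samples.

  • labels (numpy.ndarray) – Ground-truth labels of the input samples. Default: None.

Returns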

float, the best radius.
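One plausible reading of the radius search, sketched with NumPy: starting from initial_radius, grow the radius by search_step up to max_radius and keep the largest radius whose region-based accuracy is at most degrade_limit below the raw accuracy. The stopping rule, the helper names, and the linear stand-in classifier are assumptions for illustration, not MindArmour's code.

```python
import numpy as np


def region_accuracy(inputs, labels, weights, radius, number_points, rng):
    """Accuracy of majority-vote predictions over hypercubes of the given radius."""
    correct = 0
    for x, y in zip(inputs, labels):
        noise = rng.uniform(-radius, radius, size=(number_points,) + x.shape)
        preds = np.argmax(np.clip(x + noise, 0.0, 1.0) @ weights, axis=1)
        correct += int(np.bincount(preds).argmax() == y)
    return correct / len(labels)


def search_radius(inputs, labels, weights, initial_radius=0.0, max_radius=1.0,
                  search_step=0.01, degrade_limit=0.0, number_points=10, seed=0):
    """Largest radius whose accuracy drop stays within degrade_limit."""
    rng = np.random.default_rng(seed)
    raw_acc = np.mean(np.argmax(inputs @ weights, axis=1) == labels)
    num_steps = int(np.floor((max_radius - initial_radius) / search_step + 1e-9))
    best = initial_radius
    for i in range(1, num_steps + 1):
        radius = initial_radius + i * search_step  # avoids accumulating float error
        acc = region_accuracy(inputs, labels, weights, radius, number_points, rng)
        if raw_acc - acc > degrade_limit:
            break  # too much accuracy lost; keep the previous radius
        best = radius
    return best


rng = np.random.default_rng(1)
weights = rng.normal(size=(4, 3))             # hypothetical 4-feature, 3-class model
inputs = rng.random((30, 4))
labels = np.argmax(inputs @ weights, axis=1)  # labels the model already gets right
strict = search_radius(inputs, labels, weights, degrade_limit=0.0)
relaxed = search_radius(inputs, labels, weights, degrade_limit=1.0)
print(strict, relaxed)
```

A larger degrade_limit trades clean accuracy for a wider region, which is why the relaxed search reaches max_radius.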

set_radius(radius)

Set radius.

Parameters

radius (float) – Radius of region.

transform(inputs)

Generate a hypercube for each input sample.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

numpy.ndarray, the hypercube corresponding to each sample.

class mindarmour.adv_robustness.detectors.SimilarityDetector(trans_model, max_k_neighbor=1000, chunk_size=1000, max_buffer_size=10000, tuning=False, fpr=0.001)

The detector measures similarity among adjacent queries and rejects queries which are remarkably similar to previous queries.

Reference: Stateful Detection of Black-Box Adversarial Attacks, by Steven Chen, Nicholas Carlini, and David Wagner, at arXiv 2019.

Parameters
  • trans_model (Model) – A MindSpore model that encodes input data into lower-dimensional vectors.

  • max_k_neighbor (int) – The maximum number of the nearest neighbors. Default: 1000.

  • chunk_size (int) – Buffer size. Default: 1000.

  • max_buffer_size (int) – Maximum buffer size. Default: 10000.

  • tuning (bool) – If True, the average distance to the nearest k neighbors is calculated only for k=K; if False, it is calculated for k=1,…,K, where K is max_k_neighbor. Default: False.

  • fpr (float) – False positive ratio on legitimate query sequences. Default: 0.001.

Examples

>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import SimilarityDetector
>>> class EncoderNet(Cell):
...     def __init__(self, encode_dim):
...         super(EncoderNet, self).__init__()
...         self._encode_dim = encode_dim
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
...     def get_encode_dim(self):
...         return self._encode_dim
>>> np.random.seed(5)
>>> x_train = np.random.rand(10, 32, 32, 3).astype(np.float32)
>>> perm = np.random.permutation(x_train.shape[0])
>>> benign_queries = x_train[perm[:10], :, :, :]
>>> suspicious_queries = x_train[perm[-1], :, :, :] + np.random.normal(0, 0.05, (10,) + x_train.shape[1:])
>>> suspicious_queries = suspicious_queries.astype(np.float32)
>>> encoder = Model(EncoderNet(encode_dim=256))
>>> detector = SimilarityDetector(max_k_neighbor=3, trans_model=encoder)
>>> num_nearest_neighbors, thresholds = detector.fit(inputs=x_train)
>>> detector.set_threshold(num_nearest_neighbors[-1], thresholds[-1])
>>> detector.detect(benign_queries)
>>> detections = detector.get_detection_interval()
>>> detected_queries = detector.get_detected_queries()
clear_buffer()

Clear the buffer memory.

detect(inputs)

Process queries to detect black-box attacks.

Parameters

inputs (numpy.ndarray) – Query sequence.

Raises

ValueError – If the parameters threshold or num_of_neighbors are not set.
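The stateful detection idea from the reference can be sketched as follows: keep a buffer of previously seen encoded queries, score each new query by its mean distance to its k nearest buffered queries, flag it when the score falls below the threshold, and clear the buffer after a detection. The class below is a hypothetical toy, not the SimilarityDetector API; it works on already-encoded vectors and assumes Euclidean distance.

```python
import numpy as np


class ToyStatefulDetector:
    """Hypothetical buffer-based detector over already-encoded query vectors."""

    def __init__(self, k, threshold, max_buffer_size=10000):
        self.k = k
        self.threshold = threshold
        self.max_buffer_size = max_buffer_size
        self.buffer = []
        self.num_queries = 0
        self.detected = []  # global indices of flagged queries

    def process(self, encoded_queries):
        for query in np.asarray(encoded_queries, dtype=float):
            if len(self.buffer) >= self.k:
                dists = np.linalg.norm(np.asarray(self.buffer) - query, axis=1)
                score = np.sort(dists)[:self.k].mean()  # mean k-NN distance
                if score < self.threshold:
                    self.detected.append(self.num_queries)
                    self.buffer = []  # start a fresh history after a detection
            self.buffer.append(query)
            if len(self.buffer) > self.max_buffer_size:
                self.buffer.pop(0)  # drop the oldest query
            self.num_queries += 1

    def detection_interval(self):
        """Number of queries between adjacent detections."""
        return np.diff(self.detected).tolist()


detector = ToyStatefulDetector(k=1, threshold=0.5)
detector.process([[0.0, 0.0], [10.0, 10.0], [0.1, 0.0], [20.0, 20.0], [0.1, 0.1]])
print(detector.detected, detector.detection_interval())
```

The near-duplicates of the first query are flagged, while the far-away queries are not; the detected indices and intervals play the role of get_detected_queries() and get_detection_interval().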

detect_diff(inputs)

Detect adversarial samples from input samples, like the predict_proba function in common machine learning models.

Parameters

inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.

Raises

NotImplementedError – This function is not available in class SimilarityDetector.

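fit(inputs, labels=None)

Process input training data to calculate the threshold. A proper threshold ensures that the false positive rate stays below a given value.

Parameters
  • inputs (numpy.ndarray) – Training data used to calculate the threshold.

  • labels (numpy.ndarray) – Labels of the training data. Default: None.

Returns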

  • list[int], number of the nearest neighbors.

  • list[float], calculated thresholds for different K.

Raises

ValueError – If the number of training samples is less than max_k_neighbor.
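The threshold calibration described for fit() can be sketched as: for each k from 1 to max_k_neighbor, compute every benign sample's mean distance to its k nearest other samples in the encoded space and take the fpr-quantile of those scores, so that roughly a fraction fpr of benign samples would score below the threshold. The helper names, the Euclidean distance, and the quantile rule are assumptions; the inputs are taken to be already encoded.

```python
import numpy as np


def knn_mean_distances(encoded, k):
    """Mean Euclidean distance from each row to its k nearest other rows."""
    diffs = encoded[:, None, :] - encoded[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)


def fit_thresholds(encoded, max_k_neighbor, fpr=0.001):
    """Return (list of k, list of thresholds), one threshold per k."""
    encoded = np.asarray(encoded, dtype=float)
    if encoded.shape[0] - 1 < max_k_neighbor:
        raise ValueError("The number of training samples is too small for max_k_neighbor.")
    ks, thresholds = [], []
    for k in range(1, max_k_neighbor + 1):
        scores = knn_mean_distances(encoded, k)
        ks.append(k)
        thresholds.append(float(np.quantile(scores, fpr)))
    return ks, thresholds


rng = np.random.default_rng(5)
encoded_train = rng.random((50, 8))  # stand-in for trans_model outputs
ks, thresholds = fit_thresholds(encoded_train, max_k_neighbor=3, fpr=0.1)
print(ks, thresholds)
```

Because averaging over more neighbors can only increase each sample's score, the thresholds grow with k, and a pair such as (ks[-1], thresholds[-1]) could then be passed to set_threshold().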

get_detected_queries()

Get the indices of detected queries.

Returns

list[int], sequence number of detected malicious queries.

get_detection_interval()

Get the interval between adjacent detections.

Returns

list[int], number of queries between adjacent detections.

set_threshold(num_of_neighbors, threshold)

Set the parameters num_of_neighbors and threshold.

Parameters
  • num_of_neighbors (int) – Number of the nearest neighbors.

  • threshold (float) – Detection threshold.

transform(inputs)

Filter adversarial noise from input samples.

Parameters

inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.

Raises

NotImplementedError – This function is not available in class SimilarityDetector.

class mindarmour.adv_robustness.detectors.SpatialSmoothing(model, ksize=3, is_local_smooth=True, metric='l1', false_positive_ratio=0.05)

Detection method based on spatial smoothing. Gaussian filtering, median filtering, and mean filtering are used to blur the original image. When the difference between the model's predictions before and after the sample is blurred exceeds the threshold, the sample is judged to be an adversarial example.

Parameters
  • model (Model) – Target model.

  • ksize (int) – Smooth window size. Default: 3.

  • is_local_smooth (bool) – If True, apply local smoothing. If False, do not apply local smoothing. Default: True.

  • metric (str) – Distance method. Default: ‘l1’.

  • false_positive_ratio (float) – False positive rate over benign samples. Default: 0.05.

Examples

>>> import numpy as np
>>> import mindspore.ops.operations as P
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import SpatialSmoothing
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self._softmax = P.Softmax()
...     def construct(self, inputs):
...         return self._softmax(inputs)
>>> input_shape = (50, 3)
>>> np.random.seed(1)
>>> input_np = np.random.randn(*input_shape).astype(np.float32)
>>> np.random.seed(2)
>>> adv_np = np.random.randn(*input_shape).astype(np.float32)
>>> model = Model(Net())
>>> detector = SpatialSmoothing(model)
>>> threshold = detector.fit(input_np)
>>> detector.set_threshold(threshold.item())
>>> detected_res = np.array(detector.detect(adv_np))
detect(inputs)

Detect if an input sample is an adversarial example.

Parameters

inputs (numpy.ndarray) – Suspicious samples to be judged.

Returns

list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Return the raw distance value (before applying the threshold) between the input sample and its smoothed counterpart.

Parameters

inputs (numpy.ndarray) – Suspicious samples to be judged.

Returns

float, distance.
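A minimal NumPy sketch of the smoothing-and-compare idea: median-filter each image with a ksize window, run the model on both versions, and take the L1 distance between the two prediction vectors. It assumes single-channel 2-D images and uses a hypothetical linear-softmax model built by make_model; MindArmour's own filtering and distance code may differ.

```python
import numpy as np


def median_smooth(image, ksize=3):
    """Median filter over a ksize x ksize window, with edge padding."""
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    height, width = image.shape
    for i in range(height):
        for j in range(width):
            out[i, j] = np.median(padded[i:i + ksize, j:j + ksize])
    return out


def make_model(weights):
    """Hypothetical model: softmax over a linear function of the flattened image."""
    def predict(images):
        logits = images.reshape(len(images), -1) @ weights
        logits = logits - logits.max(axis=1, keepdims=True)
        exp = np.exp(logits)
        return exp / exp.sum(axis=1, keepdims=True)
    return predict


def smoothing_distance(images, predict, ksize=3):
    """Per-sample L1 distance between predictions before and after smoothing."""
    smoothed = np.stack([median_smooth(img, ksize) for img in images])
    return np.abs(predict(images) - predict(smoothed)).sum(axis=1)


rng = np.random.default_rng(2)
predict = make_model(rng.normal(size=(16, 5)))
flat = np.full((3, 4, 4), 0.5)  # constant images: smoothing changes nothing
spiky = flat.copy()
spiky[:, 1, 2] = 1.0            # one isolated bright pixel per image
print(smoothing_distance(flat, predict), smoothing_distance(spiky, predict))
```

The median filter removes the isolated pixel, so only the spiky images get a non-zero distance; a threshold fitted on benign distances would then decide which of them are reported.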

fit(inputs, labels=None)

Train the detector to decide the threshold. A proper threshold ensures that the actual false positive rate over benign samples is below the given value.

Parameters
  • inputs (numpy.ndarray) – Benign samples.

  • labels (numpy.ndarray) – Labels of input samples. Default: None.

Returns

float, the threshold; a distance larger than this value is reported as positive, i.e. adversarial.

set_threshold(threshold)

Set the parameter threshold.

Parameters

threshold (float) – Detection threshold.