mindarmour.adv_robustness.detectors

This module includes detector methods for distinguishing adversarial examples from benign examples.

class mindarmour.adv_robustness.detectors.DivergenceBasedDetector(auto_encoder, model, option='jsd', t=1, bounds=(0.0, 1.0))

The divergence-based detector learns to distinguish normal and adversarial examples by their Jensen-Shannon divergence (JSD).

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

Parameters

auto_encoder (Model) – Encoder model.
model (Model) – Target model.
option (str) – Method used to calculate the divergence. Default: “jsd”.
t (int) – Temperature used to overcome numerical problems. Default: 1.
bounds (tuple) – Lower and upper bounds of the data, in the form (clip_min, clip_max). Default: (0.0, 1.0).

Examples

>>> import numpy as np
>>> import mindspore.ops.operations as P
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import DivergenceBasedDetector
>>> class PredNet(Cell):
...     def __init__(self):
...         super(PredNet, self).__init__()
...         self.shape = P.Shape()
...         self.reshape = P.Reshape()
...         self._softmax = P.Softmax()
...     def construct(self, inputs):
...         data = self.reshape(inputs, (self.shape(inputs)[0], -1))
...         return self._softmax(data)
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = P.Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4, 4).astype(np.float32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4, 4).astype(np.float32)
>>> encoder = Model(Net())
>>> model = Model(PredNet())
>>> detector = DivergenceBasedDetector(encoder, model)
>>> threshold = detector.fit(ori)
>>> detector.set_threshold(threshold)
>>> adv_ids = detector.detect(adv)
>>> adv_trans = detector.transform(adv)

detect_diff(inputs)

Compute the distance between the original samples and the reconstructed samples.

The distance is calculated by JSD.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: float, the distance.
Raises: NotImplementedError – If the parameter option is not supported.
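For intuition, the JSD distance can be sketched in NumPy as follows. This is a minimal sketch, not MindArmour's implementation: it assumes the detector compares temperature-scaled softmax outputs of the target model on an original sample and on its reconstruction, and the helper names `softmax_with_temperature` and `js_divergence` and the logits are hypothetical.

```python
import numpy as np


def softmax_with_temperature(logits, t=1.0):
    """Softmax over the last axis after dividing the logits by temperature t."""
    z = np.asarray(logits, dtype=np.float64) / t
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def js_divergence(p, q, eps=1e-12):
    """Row-wise Jensen-Shannon divergence (natural log) between distributions p and q."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    m = 0.5 * (p + q)  # mixture distribution
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)), axis=-1)
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)), axis=-1)
    return 0.5 * kl_pm + 0.5 * kl_qm


# Hypothetical logits of the target model on originals and on their reconstructions.
orig_logits = np.array([[2.0, 0.5, 0.1], [0.3, 0.2, 3.0]])
recon_logits = np.array([[1.9, 0.6, 0.1], [3.0, 0.2, 0.3]])
scores = js_divergence(softmax_with_temperature(orig_logits, t=1.0),
                       softmax_with_temperature(recon_logits, t=1.0))
```

In the second row the reconstruction flips the top class, so its score is much larger than the first row's; a detector of this kind would flag samples whose score exceeds the fitted threshold.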

class mindarmour.adv_robustness.detectors.EnsembleDetector(detectors, policy='vote')

The ensemble detector uses a list of detectors to detect adversarial examples among the input samples.

Parameters

detectors (Union[tuple, list]) – List of detector methods.
policy (str) – Decision policy, which can be ‘vote’, ‘all’ or ‘any’. Default: ‘vote’.
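The three policies can be read as different ways of combining the per-detector 0/1 results. The sketch below assumes ‘vote’ means a strict majority of detectors, ‘all’ means every detector, and ‘any’ means at least one detector must flag a sample; `combine_detections` is a hypothetical helper, and the library's exact tie-breaking may differ.

```python
import numpy as np


def combine_detections(results, policy="vote"):
    """Merge per-detector 0/1 lists (all of equal length) into one 0/1 list."""
    votes = np.asarray(results, dtype=np.int64)  # shape (num_detectors, num_samples)
    if policy == "vote":
        # Flag a sample when more than half of the detectors flag it.
        mask = votes.sum(axis=0) > votes.shape[0] / 2
    elif policy == "all":
        # Flag a sample only when every detector flags it.
        mask = votes.all(axis=0)
    elif policy == "any":
        # Flag a sample when at least one detector flags it.
        mask = votes.any(axis=0)
    else:
        raise ValueError("Policy {} is not supported.".format(policy))
    return mask.astype(np.int64).tolist()


# Hypothetical outputs of three detectors on three samples.
results = [[1, 0, 1],
           [1, 1, 0],
           [0, 0, 1]]
```

With these results, ‘vote’ gives [1, 0, 1], ‘all’ gives [0, 0, 0] and ‘any’ gives [1, 1, 1], which shows how ‘all’ trades recall for fewer false positives and ‘any’ does the opposite.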

Examples

>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import ErrorBasedDetector
>>> from mindarmour.adv_robustness.detectors import RegionBasedDetector
>>> from mindarmour.adv_robustness.detectors import EnsembleDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> class AutoNet(Cell):
...     def __init__(self):
...         super(AutoNet, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4).astype(np.float32)
>>> model = Model(Net())
>>> auto_encoder = Model(AutoNet())
>>> random_label = np.random.randint(10, size=4)
>>> labels = np.eye(10)[random_label]
>>> magnet_detector = ErrorBasedDetector(auto_encoder)
>>> region_detector = RegionBasedDetector(model)
>>> region_detector.fit(adv, labels)
>>> detectors = [magnet_detector, region_detector]
>>> detector = EnsembleDetector(detectors)
>>> adv_ids = detector.detect(adv)

detect(inputs)

Detect adversarial examples from input samples.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
Raises: ValueError – If policy is not supported.

detect_diff(inputs)

This method is not available in this class.

Parameters: inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
Raises: NotImplementedError – This function is not available in ensemble.

fit(inputs, labels=None)

Fit the detector like a machine learning model. This method is not available in this class.

Parameters

inputs (numpy.ndarray) – Data to calculate the threshold.
labels (numpy.ndarray) – Labels of data. Default: None.

Raises

NotImplementedError – This function is not available in ensemble.

transform(inputs)

Filter adversarial noises in input samples. This method is not available in this class.

Parameters: inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
Raises: NotImplementedError – This function is not available in ensemble.

class mindarmour.adv_robustness.detectors.ErrorBasedDetector(auto_encoder, false_positive_rate=0.01, bounds=(0.0, 1.0))

The detector reconstructs input samples, measures reconstruction errors and rejects samples with large reconstruction errors.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

Parameters

auto_encoder (Model) – A trained auto-encoder that represents the input by a reduced encoding.
false_positive_rate (float) – Detector’s false positive rate. Default: 0.01.
bounds (tuple) – Lower and upper bounds of the data, in the form (clip_min, clip_max). Default: (0.0, 1.0).

Examples

>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import ErrorBasedDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4, 4).astype(np.float32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4, 4).astype(np.float32)
>>> model = Model(Net())
>>> detector = ErrorBasedDetector(model)
>>> detector.fit(ori)
>>> adv_ids = detector.detect(adv)
>>> adv_trans = detector.transform(adv)

detect(inputs)

Detect whether input samples are adversarial.

Parameters: inputs (numpy.ndarray) – Suspicious samples to be judged.
Returns: list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Compute the distance between the original samples and the reconstructed samples.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: float, the distance between reconstructed and original samples.

fit(inputs, labels=None)

Find a threshold for a given dataset to distinguish adversarial examples.

Parameters

inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – Labels of input samples. Default: None.

Returns

float, threshold to distinguish adversarial samples from benign ones.
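A minimal sketch of the fit/detect logic, assuming the reconstruction error is the per-sample mean absolute difference and that the threshold is chosen so that about `false_positive_rate` of the benign samples exceed it. The helper names are hypothetical, and the auto-encoder is replaced by precomputed reconstructions and errors.

```python
import numpy as np


def reconstruction_error(inputs, reconstructed):
    """Per-sample mean absolute error, averaged over all non-batch axes."""
    diff = np.abs(np.asarray(inputs) - np.asarray(reconstructed))
    return diff.reshape(diff.shape[0], -1).mean(axis=1)


def fit_threshold(benign_errors, false_positive_rate=0.01):
    """Pick a threshold that roughly false_positive_rate of benign errors lie above."""
    errors = np.sort(np.asarray(benign_errors, dtype=np.float64))
    num_above = min(int(len(errors) * false_positive_rate), len(errors) - 1)
    if num_above == 0:
        # Too few samples to spare any: use the largest benign error.
        return float(errors[-1])
    # The num_above largest errors sit strictly above this value (if they are distinct).
    return float(errors[-num_above - 1])


def detect(errors, threshold):
    """Return 1 for samples whose error exceeds the threshold, else 0."""
    return (np.asarray(errors) > threshold).astype(int).tolist()


benign_errors = np.arange(100) / 100.0  # hypothetical benign errors 0.00 ... 0.99
threshold = fit_threshold(benign_errors, false_positive_rate=0.05)
flags = detect(benign_errors, threshold)
```

Here the threshold is 0.94, and exactly five of the hundred benign samples (5%) are flagged, matching the requested false positive rate.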

set_threshold(threshold)

Set the detection threshold.

Parameters: threshold (float) – Detection threshold.

transform(inputs)

Reconstruct input samples.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: numpy.ndarray, reconstructed images.

class mindarmour.adv_robustness.detectors.RegionBasedDetector(model, number_points=10, initial_radius=0.0, max_radius=1.0, search_step=0.01, degrade_limit=0.0, sparse=False)

The region-based detector relies on the observation that adversarial examples lie close to the classification boundary, and aggregates information from the region around a given example to predict whether it is adversarial.

Reference: Mitigating evasion attacks to deep neural networks via region-based classification

Parameters

model (Model) – Target model.
number_points (int) – The number of samples generated from the hypercube around the original sample. Default: 10.
initial_radius (float) – Initial radius of the hypercube. Default: 0.0.
max_radius (float) – Maximum radius of the hypercube. Default: 1.0.
search_step (float) – Increment used when searching for the radius. Default: 0.01.
degrade_limit (float) – Acceptable decrease of classification accuracy. Default: 0.0.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: False.

Examples

>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import RegionBasedDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4).astype(np.float32)
>>> labels = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0],
...                   [0, 1, 0, 0]]).astype(np.int32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4).astype(np.float32)
>>> model = Model(Net())
>>> detector = RegionBasedDetector(model)
>>> radius = detector.fit(ori, labels)
>>> detector.set_radius(radius)
>>> adv_ids = detector.detect(adv)

detect(inputs)

Tell whether input samples are adversarial.

Parameters: inputs (numpy.ndarray) – Suspicious samples to be judged.
Returns: list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Return raw prediction results and region-based prediction results.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: numpy.ndarray, raw prediction results and region-based prediction results of input samples.

fit(inputs, labels=None)

Train the detector to find the best radius.

Parameters

inputs (numpy.ndarray) – Benign samples.
labels (numpy.ndarray) – Ground truth labels of the input samples. Default: None.

Returns

float, the best radius.
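The radius search can be pictured as a linear scan: grow the radius from `initial_radius` in steps of `search_step` and keep the largest radius whose region-based accuracy has not dropped by more than `degrade_limit` below the raw accuracy. This is a sketch under that assumption; `accuracy_at` is a hypothetical callback standing in for evaluating the region-based classifier on the benign samples, and the library's exact stopping rule may differ.

```python
def search_radius(accuracy_at, raw_accuracy, initial_radius=0.0, max_radius=1.0,
                  search_step=0.01, degrade_limit=0.0):
    """Return the largest radius on the search grid whose accuracy drop is acceptable."""
    best = initial_radius
    num_steps = int(round((max_radius - initial_radius) / search_step))
    for step in range(1, num_steps + 1):
        # Recompute from the start point each time to avoid accumulating float error.
        radius = initial_radius + step * search_step
        if raw_accuracy - accuracy_at(radius) > degrade_limit:
            break  # accuracy degraded too much; stop growing the region
        best = radius
    return best


def toy_accuracy(radius):
    # Hypothetical accuracy curve: region-based accuracy collapses past radius 0.35.
    return 0.9 if radius < 0.35 else 0.4


best_radius = search_radius(toy_accuracy, raw_accuracy=0.9, search_step=0.1)
```

With this toy curve the scan accepts 0.1, 0.2 and 0.3 and stops at 0.4, so `best_radius` is (up to rounding) 0.3. A larger `degrade_limit` lets the search accept a bigger region at the cost of some benign accuracy.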

set_radius(radius)

Set radius.

Parameters: radius (float) – Radius of region.

transform(inputs)

Generate a hypercube of samples for each input sample.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: numpy.ndarray, the hypercube corresponding to every sample.
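Hypercube generation and the region-based vote can be sketched as follows, assuming samples are drawn uniformly from [x - radius, x + radius] in every coordinate and that a sample is flagged when the majority label over its hypercube differs from the raw prediction. `predict` is a hypothetical NumPy stand-in for the target model, and this sketch does not clip to data bounds.

```python
import numpy as np


def hypercube_samples(inputs, radius, number_points=10, seed=None):
    """Draw number_points uniform samples around each input; shape (N, number_points, ...)."""
    rng = np.random.default_rng(seed)
    inputs = np.asarray(inputs, dtype=np.float32)
    noise = rng.uniform(-radius, radius,
                        size=(inputs.shape[0], number_points) + inputs.shape[1:])
    return (inputs[:, None, ...] + noise).astype(np.float32)


def region_detect(predict, inputs, radius, number_points=10, seed=None):
    """Flag samples whose hypercube majority label differs from the raw label."""
    raw_labels = np.argmax(predict(inputs), axis=1)
    cubes = hypercube_samples(inputs, radius, number_points, seed)
    n, p = cubes.shape[:2]
    flat = cubes.reshape((n * p,) + cubes.shape[2:])
    cube_labels = np.argmax(predict(flat), axis=1).reshape(n, p)
    # Majority label among the points of each sample's hypercube.
    region_labels = np.array([np.bincount(row).argmax() for row in cube_labels])
    return (region_labels != raw_labels).astype(int).tolist()


def predict(x):
    # Hypothetical "model": treat each flattened input as its own logits.
    x = np.asarray(x)
    return x.reshape(x.shape[0], -1)


# Two inputs that are far from the decision boundary between classes 0 and 1.
samples = np.array([[0.9, 0.1], [0.2, 0.8]], dtype=np.float32)
flags = region_detect(predict, samples, radius=0.3, number_points=50, seed=0)
```

Both inputs are more than 0.6 away from the boundary in logit difference, so every point of their radius-0.3 hypercubes keeps the raw label and neither is flagged; an input sitting right on the boundary is the case where the vote can disagree with the raw prediction.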

class mindarmour.adv_robustness.detectors.SimilarityDetector(trans_model, max_k_neighbor=1000, chunk_size=1000, max_buffer_size=10000, tuning=False, fpr=0.001)

The detector measures similarity among adjacent queries and rejects queries which are remarkably similar to previous queries.

Reference: Stateful Detection of Black-Box Adversarial Attacks, by Steven Chen, Nicholas Carlini, and David Wagner, at arXiv 2019.

Parameters

trans_model (Model) – A MindSpore model that encodes input data into a lower-dimensional vector.
max_k_neighbor (int) – The maximum number of the nearest neighbors. Default: 1000.
chunk_size (int) – Buffer size. Default: 1000.
max_buffer_size (int) – Maximum buffer size. Default: 10000.
tuning (bool) – Calculate the average distance for the nearest k neighbours. If True, k=K. If False, k=1, …, K. Default: False.
fpr (float) – False positive ratio on legitimate query sequences. Default: 0.001.

Examples

>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import SimilarityDetector
>>> class EncoderNet(Cell):
...     def __init__(self, encode_dim):
...         super(EncoderNet, self).__init__()
...         self._encode_dim = encode_dim
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
...     def get_encode_dim(self):
...         return self._encode_dim
>>> np.random.seed(5)
>>> x_train = np.random.rand(10, 32, 32, 3).astype(np.float32)
>>> perm = np.random.permutation(x_train.shape[0])
>>> benign_queries = x_train[perm[:10], :, :, :]
>>> suspicious_queries = x_train[perm[-1], :, :, :] + np.random.normal(0, 0.05, (10,) + x_train.shape[1:])
>>> suspicious_queries = suspicious_queries.astype(np.float32)
>>> encoder = Model(EncoderNet(encode_dim=256))
>>> detector = SimilarityDetector(max_k_neighbor=3, trans_model=encoder)
>>> num_nearest_neighbors, thresholds = detector.fit(inputs=x_train)
>>> detector.set_threshold(num_nearest_neighbors[-1], thresholds[-1])
>>> detector.detect(benign_queries)
>>> detections = detector.get_detection_interval()
>>> detected_queries = detector.get_detected_queries()

clear_buffer()[source]: Clear the buffer memory.

detect(inputs)

Process queries to detect black-box attack.

Parameters: inputs (numpy.ndarray) – Query sequence.
Raises: ValueError – If the parameters threshold or num_of_neighbors are not available.
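The stateful idea behind this detector can be illustrated with a toy class: each new (already encoded) query is compared with a buffer of earlier queries, and it is flagged when the mean distance to its k nearest buffered neighbours falls below the threshold. `ToyStatefulDetector` is hypothetical and is not MindArmour's class; it assumes Euclidean distance and assumes the buffer is cleared after each detection.

```python
import numpy as np


class ToyStatefulDetector:
    """Hypothetical, simplified stand-in for the query-buffer logic of SimilarityDetector."""

    def __init__(self, k, threshold, max_buffer_size=10000):
        self.k = k
        self.threshold = threshold
        self.max_buffer_size = max_buffer_size
        self.buffer = []      # encoded past queries
        self.detected = []    # indexes of flagged queries
        self.num_queries = 0

    def process(self, encoded_queries):
        for query in np.asarray(encoded_queries, dtype=np.float64):
            if len(self.buffer) >= self.k:
                dists = np.linalg.norm(np.asarray(self.buffer) - query, axis=1)
                mean_knn = np.sort(dists)[:self.k].mean()  # mean distance to k nearest
                if mean_knn < self.threshold:
                    self.detected.append(self.num_queries)
                    self.buffer = []  # assumed: start a fresh buffer after a detection
            self.buffer.append(query)
            if len(self.buffer) > self.max_buffer_size:
                self.buffer.pop(0)  # drop the oldest query
            self.num_queries += 1
        return list(self.detected)


# Hypothetical encodings: two unrelated queries, then two near-duplicates of the first.
queries = [[0.0, 0.0], [10.0, 10.0], [0.0, 0.01], [0.0, 0.02]]
detector = ToyStatefulDetector(k=2, threshold=0.5)
detected = detector.process(queries)
```

Query 2 still has a distant neighbour among its two nearest, so only query 3, whose two nearest neighbours are both near-duplicates, is flagged (`detected == [3]`). This is the pattern produced by black-box attacks that issue many slightly perturbed copies of one input.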

detect_diff(inputs)

Detect adversarial samples from input samples, like the predict_proba function in common machine learning models.

Parameters: inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
Raises: NotImplementedError – This function is not available in class SimilarityDetector.

fit(inputs, labels=None)

Process input training data to calculate the threshold. A proper threshold should ensure that the false positive rate stays below a given value.

Parameters

inputs (numpy.ndarray) – Training data to calculate the threshold.
labels (numpy.ndarray) – Labels of training data.

Returns

list[int], number of the nearest neighbors.
list[float], calculated thresholds for different K.

Raises

ValueError – If the number of training samples is less than max_k_neighbor.
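One way to picture the threshold calibration, assuming Euclidean distance on the encoded training data: compute each sample's mean distance to its k nearest other samples for k = 1, …, K, and take the `fpr` quantile of those values as the threshold for that k, so that only about a fraction `fpr` of benign samples would fall below it. `knn_thresholds` is a hypothetical helper; it builds the full pairwise distance matrix, so it is only suitable for small inputs.

```python
import numpy as np


def knn_thresholds(encoded, max_k, fpr=0.001):
    """Return ([1, ..., max_k], thresholds) from mean k-NN distances on benign data."""
    x = np.asarray(encoded, dtype=np.float64)
    if x.shape[0] <= max_k:
        # Each sample needs at least max_k other samples as neighbours.
        raise ValueError("The number of training data is less than max_k_neighbor!")
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)        # a sample is not its own neighbour
    sorted_dists = np.sort(dists, axis=1)  # nearest neighbours first
    ks, thresholds = [], []
    for k in range(1, max_k + 1):
        mean_knn = sorted_dists[:, :k].mean(axis=1)
        ks.append(k)
        thresholds.append(float(np.quantile(mean_knn, fpr)))
    return ks, thresholds


# Ten hypothetical encodings evenly spaced on a line, one unit apart.
encoded = np.arange(10, dtype=np.float64).reshape(-1, 1)
ks, thresholds = knn_thresholds(encoded, max_k=2, fpr=0.001)
```

Every point has a nearest neighbour at distance 1, and for k=2 only the two end points average 1.5, so both thresholds come out as 1.0. The returned pair mirrors the documented return shape (neighbour counts and one threshold per K).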

get_detected_queries()

Get the indexes of detected queries.

Returns: list[int], sequence number of detected malicious queries.

get_detection_interval()

Get the interval between adjacent detections.

Returns: list[int], number of queries between adjacent detections.

set_threshold(num_of_neighbors, threshold)

Set the parameters num_of_neighbors and threshold.

Parameters

num_of_neighbors (int) – Number of the nearest neighbors.
threshold (float) – Detection threshold.

transform(inputs)

Filter adversarial noises in input samples.

Parameters: inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
Raises: NotImplementedError – This function is not available in class SimilarityDetector.

class mindarmour.adv_robustness.detectors.SpatialSmoothing(model, ksize=3, is_local_smooth=True, metric='l1', false_positive_ratio=0.05)

Detection method based on spatial smoothing. The original image is blurred using Gaussian, median, and mean filtering. If the model's predictions before and after blurring differ by more than the threshold, the sample is judged to be an adversarial example.

Parameters

model (Model) – Target model.
ksize (int) – Smooth window size. Default: 3.
is_local_smooth (bool) – If True, apply local smoothing. If False, do not apply local smoothing. Default: True.
metric (str) – Distance metric. Default: ‘l1’.
false_positive_ratio (float) – False positive rate over benign samples. Default: 0.05.

Examples

>>> import numpy as np
>>> import mindspore.ops.operations as P
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import SpatialSmoothing
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self._softmax = P.Softmax()
...     def construct(self, inputs):
...         return self._softmax(inputs)
>>> input_shape = (50, 3)
>>> np.random.seed(1)
>>> input_np = np.random.randn(*input_shape).astype(np.float32)
>>> np.random.seed(2)
>>> adv_np = np.random.randn(*input_shape).astype(np.float32)
>>> model = Model(Net())
>>> detector = SpatialSmoothing(model)
>>> threshold = detector.fit(input_np)
>>> detector.set_threshold(threshold.item())
>>> detected_res = np.array(detector.detect(adv_np))

detect(inputs)

Detect if an input sample is an adversarial example.

Parameters: inputs (numpy.ndarray) – Suspicious samples to be judged.
Returns: list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Return the raw distance value (before applying the threshold) between the input sample and its smoothed counterpart.

Parameters: inputs (numpy.ndarray) – Suspicious samples to be judged.
Returns: float, distance.

fit(inputs, labels=None)

Train the detector to decide the threshold. A proper threshold ensures that the actual false positive rate over benign samples is less than the given value.

Parameters

inputs (numpy.ndarray) – Benign samples.
labels (numpy.ndarray) – Labels of the input samples. Default: None.

Returns

float, the threshold; distances larger than this value are reported as positive, i.e. adversarial.
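The smoothing distance and threshold fitting can be sketched as below. This is a simplified sketch under stated assumptions, not the library's code: it uses median filtering only (one of the filters named in the class description), assumes single-channel images of shape (N, H, W) and an odd `ksize`, applies the L1 metric to model outputs, and ignores `is_local_smooth`. `flatten_predict` is a hypothetical stand-in for the target model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_smooth(images, ksize=3):
    """Median-filter a batch of single-channel images of shape (N, H, W); ksize should be odd."""
    pad = ksize // 2
    # Reflect-pad the spatial axes so the output keeps the input's H and W.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    windows = sliding_window_view(padded, (ksize, ksize), axis=(1, 2))  # (N, H, W, k, k)
    return np.median(windows, axis=(-2, -1))


def smoothing_distance(predict, images, ksize=3):
    """Per-sample L1 distance between predictions on raw and smoothed images."""
    raw = predict(images)
    smoothed = predict(median_smooth(images, ksize))
    return np.abs(raw - smoothed).sum(axis=1)


def fit_threshold(benign_distances, false_positive_ratio=0.05):
    """Threshold exceeded by roughly false_positive_ratio of the benign distances."""
    return float(np.quantile(benign_distances, 1.0 - false_positive_ratio))


def flatten_predict(x):
    # Hypothetical "model": its outputs are just the flattened pixels.
    return np.asarray(x).reshape(len(x), -1)


# A single bright pixel is removed by the median filter, so its L1 distance is 1.0.
spike = np.zeros((1, 5, 5))
spike[0, 2, 2] = 1.0
spike_distance = smoothing_distance(flatten_predict, spike)
```

Isolated high-frequency changes, which is what many adversarial perturbations look like, are exactly what the median filter erases, so they produce large distances; fitting the threshold at the (1 - false_positive_ratio) quantile of benign distances keeps about that fraction of benign samples below it.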

set_threshold(threshold)

Set the detection threshold.

Parameters: threshold (float) – Detection threshold.