mindarmour.adv_robustness.detectors
This module provides detector methods for distinguishing adversarial examples from benign examples.
- class mindarmour.adv_robustness.detectors.DivergenceBasedDetector(auto_encoder, model, option='jsd', t=1, bounds=(0.0, 1.0))[source]
This class implements a divergence-based detector.
- Parameters
auto_encoder (Model) – Encoder model.
model (Model) – Targeted model.
option (str) – Method used to calculate the divergence; “jsd” denotes the Jensen-Shannon divergence. Default: “jsd”.
t (int) – Temperature used to overcome numerical problems. Default: 1.
bounds (tuple) – Upper and lower bounds of data. In the form of (clip_min, clip_max). Default: (0.0, 1.0).
Examples
>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindspore import context
>>> from mindarmour.adv_robustness.detectors import DivergenceBasedDetector
>>> class PredNet(Cell):
...     def __init__(self):
...         super(PredNet, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4, 4).astype(np.float32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4, 4).astype(np.float32)
>>> encoder = Model(PredNet())
>>> model = Model(PredNet())
>>> detector = DivergenceBasedDetector(encoder, model)
>>> threshold = detector.fit(ori)
>>> detector.set_threshold(threshold)
>>> adv_ids = detector.detect(adv)
>>> adv_trans = detector.transform(adv)
- detect_diff(inputs)[source]
Detect the distance between original samples and reconstructed samples.
The distance is calculated with JSD. A short illustrative sketch follows this method.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
float, the distance.
- Raises
NotImplementedError – If the param option is not supported.
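The JSD distance used here can be pictured with a small numpy sketch: apply a temperature-scaled softmax to the model outputs for an original sample and its reconstruction, then compute the Jensen-Shannon divergence between the two distributions. This is only an illustration of the idea; the helper functions and the toy logits below are assumptions, not the MindArmour implementation.

import numpy as np

def softmax(logits, t=1):
    # Temperature-scaled softmax; a larger t flattens the distribution and
    # helps avoid numerical problems with near one-hot outputs.
    e = np.exp((logits - logits.max(axis=-1, keepdims=True)) / t)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # Kullback-Leibler divergence KL(p || q), one value per sample.
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)

def jensen_shannon(p, q):
    # JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy logits for the original samples and their auto-encoder reconstructions.
np.random.seed(0)
logits_ori = np.random.randn(4, 10).astype(np.float32)
logits_rec = logits_ori + 0.1 * np.random.randn(4, 10).astype(np.float32)

dist = jensen_shannon(softmax(logits_ori, t=1), softmax(logits_rec, t=1))
print(dist)  # one JSD score per sample; large scores point to adversarial inputs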
- class mindarmour.adv_robustness.detectors.EnsembleDetector(detectors, policy='vote')[source]
Ensemble detector.
- Parameters
detectors (Union[list, tuple]) – List of detector instances to be ensembled.
policy (str) – Decision policy used to combine the results of the individual detectors. Default: “vote”.
Examples
>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindspore import context
>>> from mindarmour.adv_robustness.detectors import ErrorBasedDetector
>>> from mindarmour.adv_robustness.detectors import RegionBasedDetector
>>> from mindarmour.adv_robustness.detectors import EnsembleDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> class AutoNet(Cell):
...     def __init__(self):
...         super(AutoNet, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4).astype(np.float32)
>>> model = Model(Net())
>>> auto_encoder = Model(AutoNet())
>>> random_label = np.random.randint(10, size=4)
>>> labels = np.eye(10)[random_label]
>>> magnet_detector = ErrorBasedDetector(auto_encoder)
>>> region_detector = RegionBasedDetector(model)
>>> region_detector.fit(adv, labels)
>>> detectors = [magnet_detector, region_detector]
>>> detector = EnsembleDetector(detectors)
>>> adv_ids = detector.detect(adv)
- detect(inputs)[source]
Detect adversarial examples from input samples. A short illustrative sketch follows this method.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
- Raises
ValueError – If policy is not supported.
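As a rough picture of what a “vote” policy can look like, the numpy sketch below combines the 0/1 outputs of several detectors by majority vote. The detector outputs and the decision rule are assumptions for illustration; they are not guaranteed to match EnsembleDetector's exact policy.

import numpy as np

# Hypothetical 0/1 outputs of three detectors on the same five samples.
votes = np.array([
    [1, 0, 1, 0, 1],   # detector 1
    [1, 0, 0, 0, 1],   # detector 2
    [0, 0, 1, 0, 1],   # detector 3
])

# Majority vote: flag a sample when more than half of the detectors flag it.
majority = (votes.sum(axis=0) > votes.shape[0] / 2).astype(int)
print(majority.tolist())  # [1, 0, 1, 0, 1]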
- detect_diff(inputs)[source]
This method is not available in this class.
- Parameters
inputs (Union[numpy.ndarray, list, tuple]) – Data to be used as references to create adversarial examples.
- Raises
NotImplementedError – This function is not available in ensemble.
- fit(inputs, labels=None)[source]
Fit the detector like a machine learning model. This method is not available in this class.
- Parameters
inputs (numpy.ndarray) – Data to calculate the threshold.
labels (numpy.ndarray) – Labels of data. Default: None.
- Raises
NotImplementedError – This function is not available in ensemble.
- transform(inputs)[source]
Filter adversarial noise from input samples. This method is not available in this class.
- Parameters
inputs (Union[numpy.ndarray, list, tuple]) – Data to be used as references to create adversarial examples.
- Raises
NotImplementedError – This function is not available in ensemble.
- class mindarmour.adv_robustness.detectors.ErrorBasedDetector(auto_encoder, false_positive_rate=0.01, bounds=(0.0, 1.0))[source]
The detector reconstructs input samples, measures reconstruction errors and rejects samples with large reconstruction errors.
- Parameters
auto_encoder (Model) – A (trained) auto-encoder that reconstructs its input from a reduced encoding.
false_positive_rate (float) – Detector’s false positive rate. Default: 0.01.
bounds (tuple) – Upper and lower bounds of data. In the form of (clip_min, clip_max). Default: (0.0, 1.0).
Examples
>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindspore import context
>>> from mindarmour.adv_robustness.detectors import ErrorBasedDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4, 4).astype(np.float32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4, 4).astype(np.float32)
>>> model = Model(Net())
>>> detector = ErrorBasedDetector(model)
>>> detector.fit(ori)
>>> adv_ids = detector.detect(adv)
>>> adv_trans = detector.transform(adv)
- detect(inputs)[source]
Detect if input samples are adversarial or not.
- Parameters
inputs (numpy.ndarray) – Suspicious samples to be judged.
- Returns
list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
- detect_diff(inputs)[source]
Detect the distance between the original samples and reconstructed samples.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
float, the distance between reconstructed and original samples.
- fit(inputs, labels=None)[source]
Find a threshold for a given dataset to distinguish adversarial examples. A short illustrative sketch follows this method.
- Parameters
inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – Labels of input samples. Default: None.
- Returns
float, threshold to distinguish adversarial samples from benign ones.
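A common way to choose such a threshold is to take a high percentile of the reconstruction errors measured on benign data, so that roughly a false_positive_rate fraction of benign samples would be flagged. The numpy sketch below illustrates this idea under that assumption; the error measure and the percentile rule are illustrative, not necessarily the exact rule used by ErrorBasedDetector.

import numpy as np

def reconstruction_error(inputs, reconstructed):
    # Mean absolute reconstruction error per sample (an L1-style distance).
    diff = np.abs(inputs - reconstructed).reshape(inputs.shape[0], -1)
    return diff.mean(axis=1)

np.random.seed(0)
benign = np.random.rand(100, 4, 4).astype(np.float32)
# Stand-in for the auto-encoder output; a real detector would run the encoder model.
benign_rec = benign + 0.01 * np.random.randn(*benign.shape).astype(np.float32)

errors = reconstruction_error(benign, benign_rec)
false_positive_rate = 0.01
threshold = np.percentile(errors, 100 * (1 - false_positive_rate))
print(threshold)  # samples with an error above this value would be flagged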
- set_threshold(threshold)[source]
Set the parameter threshold.
- Parameters
threshold (float) – Detection threshold.
- transform(inputs)[source]
Reconstruct input samples.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
numpy.ndarray, reconstructed images.
- class mindarmour.adv_robustness.detectors.RegionBasedDetector(model, number_points=10, initial_radius=0.0, max_radius=1.0, search_step=0.01, degrade_limit=0.0, sparse=False)[source]
This class implements a region-based detector.
Reference: Mitigating evasion attacks to deep neural networks via region-based classification
- Parameters
model (Model) – Target model.
number_points (int) – The number of samples generated from the hypercube around each original sample. Default: 10.
initial_radius (float) – Initial radius of the hypercube. Default: 0.0.
max_radius (float) – Maximum radius of the hypercube. Default: 1.0.
search_step (float) – Increment of the radius during the search. Default: 0.01.
degrade_limit (float) – Acceptable decrease of classification accuracy. Default: 0.0.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: False.
Examples
>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindspore import context
>>> from mindarmour.adv_robustness.detectors import RegionBasedDetector
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4).astype(np.float32)
>>> labels = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 1, 0, 0]]).astype(np.int32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4).astype(np.float32)
>>> model = Model(Net())
>>> detector = RegionBasedDetector(model)
>>> radius = detector.fit(ori, labels)
>>> detector.set_radius(radius)
>>> adv_ids = detector.detect(adv)
- detect(inputs)[source]
Tell whether input samples are adversarial or not.
- Parameters
inputs (numpy.ndarray) – Suspicious samples to be judged.
- Returns
list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
- detect_diff(inputs)[source]
Return raw prediction results and region-based prediction results.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
numpy.ndarray, raw prediction results and region-based prediction results of input samples.
- fit(inputs, labels=None)[source]
Train the detector to decide the best radius.
- Parameters
inputs (numpy.ndarray) – Benign samples.
labels (numpy.ndarray) – Ground truth labels of the input samples. Default: None.
- Returns
float, the best radius.
- transform(inputs)[source]
Generate hypercubes for input samples. A short illustrative sketch of the region-based idea follows this method.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
numpy.ndarray, the hypercube corresponding to each sample.
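The region-based idea behind this class can be sketched in plain numpy: draw points from a hypercube of a given radius around the input, classify all of them, and compare the majority label with the prediction on the raw input. The predict function and the decision shown below are illustrative assumptions, not the RegionBasedDetector code.

import numpy as np

def predict(x):
    # Hypothetical classifier: toy logits derived from the input itself.
    logits = np.stack([x.sum(axis=1), x.max(axis=1), x.min(axis=1)], axis=1)
    return logits.argmax(axis=1)

def region_predict(sample, radius=0.1, number_points=10, bounds=(0.0, 1.0)):
    # Sample points uniformly from the hypercube around the input (cf. transform),
    # classify all of them and return the majority label.
    noise = np.random.uniform(-radius, radius, size=(number_points,) + sample.shape)
    cube = np.clip(sample + noise, bounds[0], bounds[1]).astype(np.float32)
    return np.bincount(predict(cube)).argmax()

np.random.seed(5)
x = np.random.rand(4).astype(np.float32)
raw_label = predict(x[np.newaxis, :])[0]
region_label = region_predict(x)
# A mismatch between the raw label and the region-based label suggests the sample
# sits close to a decision boundary, i.e. it may be adversarial.
print(raw_label, region_label)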
- class mindarmour.adv_robustness.detectors.SimilarityDetector(trans_model, max_k_neighbor=1000, chunk_size=1000, max_buffer_size=10000, tuning=False, fpr=0.001)[source]
The detector measures similarity among adjacent queries and rejects queries which are remarkably similar to previous queries.
- Parameters
trans_model (Model) – A MindSpore model to encode input data into lower dimension vector.
max_k_neighbor (int) – The maximum number of the nearest neighbors. Default: 1000.
chunk_size (int) – Buffer size. Default: 1000.
max_buffer_size (int) – Maximum buffer size. Default: 10000.
tuning (bool) – Compute the average distance for the nearest k neighbours: if True, only k=K is used; if False, k=1,…,K. Default: False.
fpr (float) – False positive ratio on legitimate query sequences. Default: 0.001.
Examples
>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindspore import context
>>> from mindarmour.adv_robustness.detectors import SimilarityDetector
>>> class EncoderNet(Cell):
...     def __init__(self, encode_dim):
...         super(EncoderNet, self).__init__()
...         self._encode_dim = encode_dim
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
...     def get_encode_dim(self):
...         return self._encode_dim
>>> np.random.seed(5)
>>> x_train = np.random.rand(10, 32, 32, 3).astype(np.float32)
>>> perm = np.random.permutation(x_train.shape[0])
>>> benign_queries = x_train[perm[:10], :, :, :]
>>> suspicious_queries = x_train[perm[-1], :, :, :] + np.random.normal(0, 0.05, (10,) + x_train.shape[1:])
>>> suspicious_queries = suspicious_queries.astype(np.float32)
>>> encoder = Model(EncoderNet(encode_dim=256))
>>> detector = SimilarityDetector(max_k_neighbor=3, trans_model=encoder)
>>> num_nearest_neighbors, thresholds = detector.fit(inputs=x_train)
>>> detector.set_threshold(num_nearest_neighbors[-1], thresholds[-1])
>>> detector.detect(benign_queries)
>>> detections = detector.get_detection_interval()
- detect(inputs)[source]
Process queries to detect black-box attacks. A short illustrative sketch follows this method.
- Parameters
inputs (numpy.ndarray) – Query sequence.
- Raises
ValueError – If the parameter threshold or num_of_neighbors has not been set.
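The stateful detection idea can be sketched with numpy: encode each incoming query, compute its average distance to the k nearest previous queries, and flag it when that distance falls below the threshold, since near-duplicate queries are typical of black-box attacks. The encoder, the value of k and the threshold below are assumptions for illustration only, not the SimilarityDetector internals.

import numpy as np

def encode(x):
    # Stand-in for trans_model: flatten each query into a feature vector.
    return x.reshape(x.shape[0], -1)

def avg_knn_distance(query_vec, history, k):
    # Average Euclidean distance from the query to its k nearest past queries.
    dists = np.linalg.norm(history - query_vec, axis=1)
    return np.sort(dists)[:k].mean()

np.random.seed(0)
queries = np.random.rand(50, 8).astype(np.float32)
queries[30:] = queries[29] + 0.001 * np.random.randn(20, 8)  # near-duplicate attack queries

k, threshold = 3, 0.05   # assumed values; fit() would calibrate the threshold
features = encode(queries)
flagged = []
for i in range(k, len(features)):
    if avg_knn_distance(features[i], features[:i], k) < threshold:
        flagged.append(i)
print(flagged)  # indexes of queries that look like repeated, slightly perturbed probes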
- detect_diff(inputs)[source]
Detect adversarial samples from input samples, like the predict_proba function in common machine learning models.
- Parameters
inputs (Union[numpy.ndarray, list, tuple]) – Data to be used as references to create adversarial examples.
- Raises
NotImplementedError – This function is not available in class SimilarityDetector.
- fit(inputs, labels=None)[source]
Process input training data to calculate the threshold. A proper threshold should make sure the false positive rate is under a given value.
- Parameters
inputs (numpy.ndarray) – Training data to calculate the threshold.
labels (numpy.ndarray) – Labels of training data. Default: None.
- Returns
list[int], number of the nearest neighbors.
list[float], calculated thresholds for different K.
- Raises
ValueError – If the number of training samples is less than max_k_neighbor.
- get_detected_queries()[source]
Get the indexes of detected queries.
- Returns
list[int], sequence number of detected malicious queries.
- get_detection_interval()[source]
Get the interval between adjacent detections.
- Returns
list[int], number of queries between adjacent detections.
- set_threshold(num_of_neighbors, threshold)[source]
Set the parameters num_of_neighbors and threshold.
- Parameters
num_of_neighbors (int) – Number of the nearest neighbors.
threshold (float) – Detection threshold.
- transform(inputs)[source]
Filter adversarial noise from input samples.
- Parameters
inputs (Union[numpy.ndarray, list, tuple]) – Data to be used as references to create adversarial examples.
- Raises
NotImplementedError – This function is not available in class SimilarityDetector.
- class mindarmour.adv_robustness.detectors.SpatialSmoothing(model, ksize=3, is_local_smooth=True, metric='l1', false_positive_ratio=0.05)[source]
Detection method based on spatial smoothing.
- Parameters
model (Model) – Target model.
ksize (int) – Smooth window size. Default: 3.
is_local_smooth (bool) – If True, use local smoothing; if False, do not use local smoothing. Default: True.
metric (str) – Distance method. Default: ‘l1’.
false_positive_ratio (float) – False positive rate over benign samples. Default: 0.05.
Examples
>>> import numpy as np
>>> import mindspore.ops.operations as P
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindspore import context
>>> from mindarmour.adv_robustness.detectors import SpatialSmoothing
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self._softmax = P.Softmax()
...     def construct(self, inputs):
...         return self._softmax(inputs)
>>> input_shape = (50, 3)
>>> np.random.seed(1)
>>> input_np = np.random.randn(*input_shape).astype(np.float32)
>>> np.random.seed(2)
>>> adv_np = np.random.randn(*input_shape).astype(np.float32)
>>> model = Model(Net())
>>> detector = SpatialSmoothing(model)
>>> threshold = detector.fit(input_np)
>>> detector.set_threshold(threshold.item())
>>> detected_res = np.array(detector.detect(adv_np))
- detect(inputs)[source]
Detect if an input sample is an adversarial example.
- Parameters
inputs (numpy.ndarray) – Suspicious samples to be judged.
- Returns
list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
- detect_diff(inputs)[source]
Return the raw distance value (before applying the threshold) between the input sample and its smoothed counterpart. A short illustrative sketch follows this method.
- Parameters
inputs (numpy.ndarray) – Suspicious samples to be judged.
- Returns
float, distance.
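Below is a minimal sketch of this distance, assuming median filtering (here via scipy.ndimage.median_filter) as the smoothing operation, an L1 distance between softmax outputs, and a hypothetical predict function; it illustrates the idea rather than reproducing SpatialSmoothing's internals.

import numpy as np
from scipy.ndimage import median_filter  # assumed available for the smoothing step

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict(x):
    # Hypothetical model head: treat each flattened sample as logits.
    return softmax(x.reshape(x.shape[0], -1))

np.random.seed(1)
inputs = np.random.randn(5, 8, 8).astype(np.float32)

# Smooth each sample with a ksize x ksize median window, then compare predictions.
ksize = 3
smoothed = np.stack([median_filter(img, size=ksize) for img in inputs])

raw_pred = predict(inputs)
smooth_pred = predict(smoothed)
l1_distance = np.abs(raw_pred - smooth_pred).sum(axis=1)
print(l1_distance)  # large values suggest the prediction is unstable, i.e. possibly adversarial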
- fit(inputs, labels=None)[source]
Train the detector to decide the threshold. A proper threshold ensures that the actual false positive rate over benign samples is less than the given value.
- Parameters
inputs (numpy.ndarray) – Benign samples.
labels (numpy.ndarray) – Labels of input samples. Default: None.
- Returns
float, the threshold; any distance larger than this value is reported as positive, i.e. adversarial.