mindarmour.adv_robustness.detectors
This module provides detector methods for distinguishing adversarial examples from benign examples.
- class mindarmour.adv_robustness.detectors.DivergenceBasedDetector(auto_encoder, model, option='jsd', t=1, bounds=(0.0, 1.0))[source]
The divergence-based detector learns to distinguish normal and adversarial examples by their Jensen-Shannon (JS) divergence.
- Parameters
auto_encoder (Model) – Encoder model.
model (Model) – Targeted model.
option (str) – Method used to calculate the divergence. Default: “jsd”.
t (int) – Temperature used to overcome numerical problems. Default: 1.
bounds (tuple) – Upper and lower bounds of data. In form of (clip_min, clip_max). Default: (0.0, 1.0).
Examples
>>> import numpy as np
>>> import mindspore.ops.operations as P
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import DivergenceBasedDetector
>>> class PredNet(Cell):
...     def __init__(self):
...         super(PredNet, self).__init__()
...         self.shape = P.Shape()
...         self.reshape = P.Reshape()
...         self._softmax = P.Softmax()
...     def construct(self, inputs):
...         data = self.reshape(inputs, (self.shape(inputs)[0], -1))
...         return self._softmax(data)
>>> class Net(Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = P.Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4, 4).astype(np.float32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4, 4).astype(np.float32)
>>> encoder = Model(Net())
>>> model = Model(PredNet())
>>> detector = DivergenceBasedDetector(encoder, model)
>>> threshold = detector.fit(ori)
>>> detector.set_threshold(threshold)
>>> adv_ids = detector.detect(adv)
>>> adv_trans = detector.transform(adv)
- detect_diff(inputs)[source]
Calculate the distance between the original samples and their reconstructed samples.
The distance is measured by JSD (a sketch follows this method entry).
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
float, the distance.
- Raises
NotImplementedError – If the parameter option is not supported.
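The following is a minimal NumPy sketch of a temperature-scaled Jensen-Shannon divergence, intended only to illustrate the distance this detector relies on; the helper names (_softmax_with_temperature, _jsd) are hypothetical and not part of the MindArmour API.

import numpy as np

def _softmax_with_temperature(logits, t=1):
    # Temperature-scaled softmax; a larger t smooths the distribution.
    z = (logits - logits.max()) / t
    e = np.exp(z)
    return e / e.sum()

def _jsd(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two probability vectors.
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)))
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)))
    return 0.5 * (kl_pm + kl_qm)

# Distance between the predictions for an original sample and for its
# auto-encoder reconstruction.
p = _softmax_with_temperature(np.array([2.0, 0.5, 0.1]), t=1)
q = _softmax_with_temperature(np.array([1.0, 1.2, 0.3]), t=1)
print(_jsd(p, q))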
- class mindarmour.adv_robustness.detectors.EnsembleDetector(detectors, policy='vote')[source]
The ensemble detector uses a list of detectors to detect adversarial examples from the input samples (a voting sketch follows at the end of this class entry).
- Parameters
detectors (Union[list, tuple]) – List of detector methods.
policy (str) – Decision policy. Default: 'vote'.
Examples
>>> import numpy as np
>>> from mindspore import nn
>>> from mindspore.nn import Cell
>>> from mindspore.ops.operations import Add
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import ErrorBasedDetector
>>> from mindarmour.adv_robustness.detectors import RegionBasedDetector
>>> from mindarmour.adv_robustness.detectors import EnsembleDetector
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> class AutoNet(Cell):
...     def __init__(self):
...         super(AutoNet, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4).astype(np.float32)
>>> model = Model(Net())
>>> auto_encoder = Model(AutoNet())
>>> random_label = np.random.randint(10, size=4)
>>> labels = np.eye(10)[random_label]
>>> magnet_detector = ErrorBasedDetector(auto_encoder)
>>> region_detector = RegionBasedDetector(model)
>>> region_detector.fit(adv, labels)
>>> detectors = [magnet_detector, region_detector]
>>> detector = EnsembleDetector(detectors)
>>> adv_ids = detector.detect(adv)
- detect(inputs)[source]
Detect adversarial examples from input samples.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
list[int], whether each sample is adversarial. If res[i] = 1, the input sample with index i is adversarial.
- Raises
ValueError – If policy is not supported.
- detect_diff(inputs)[source]
This method is not available in this class.
- Parameters
inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
- Raises
NotImplementedError – This function is not available in ensemble.
- fit(inputs, labels=None)[source]
Fit detector like a machine learning model. This method is not available in this class.
- Parameters
inputs (numpy.ndarray) – Data to calculate the threshold.
labels (numpy.ndarray) – Labels of data. Default: None.
- Raises
NotImplementedError – This function is not available in ensemble.
- transform(inputs)[source]
Filter adversarial noises in input samples. This method is not available in this class.
- Parameters
inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
- Raises
NotImplementedError – This function is not available in ensemble.
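To illustrate how a voting policy could combine the 0/1 decisions of several detectors, here is a rough, hypothetical sketch; vote_ensemble is not the actual EnsembleDetector implementation, only an example of the majority-vote idea.

import numpy as np

def vote_ensemble(detections):
    # detections: list of equal-length 0/1 lists, one per detector.
    votes = np.array(detections)             # shape: (n_detectors, n_samples)
    # Flag a sample if more than half of the detectors flag it.
    flagged = votes.sum(axis=0) > votes.shape[0] / 2
    return flagged.astype(int).tolist()

# Two detectors voting on three samples.
print(vote_ensemble([[1, 0, 1], [1, 1, 0]]))  # [1, 0, 0]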
- class mindarmour.adv_robustness.detectors.ErrorBasedDetector(auto_encoder, false_positive_rate=0.01, bounds=(0.0, 1.0))[source]
The detector reconstructs input samples, measures reconstruction errors and rejects samples with large reconstruction errors.
- Parameters
auto_encoder (Model) – A (trained) auto encoder which reconstructs the input samples.
false_positive_rate (float) – Detector's false positive rate. Default: 0.01.
bounds (tuple) – Upper and lower bounds of data. In form of (clip_min, clip_max). Default: (0.0, 1.0).
Examples
>>> import numpy as np
>>> from mindspore import nn
>>> from mindspore.ops.operations import Add
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import ErrorBasedDetector
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4, 4).astype(np.float32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4, 4).astype(np.float32)
>>> model = Model(Net())
>>> detector = ErrorBasedDetector(model)
>>> detector.fit(ori)
>>> adv_ids = detector.detect(adv)
>>> adv_trans = detector.transform(adv)
- detect(inputs)[source]
Detect if input samples are adversarial or not.
- Parameters
inputs (numpy.ndarray) – Suspicious samples to be judged.
- Returns
list[int], whether each sample is adversarial. If res[i] = 1, the input sample with index i is adversarial.
- detect_diff(inputs)[source]
Calculate the distance between the original samples and their reconstructed samples.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
float, the distance between reconstructed and original samples.
- fit(inputs, labels=None)[source]
Find a threshold for a given dataset to distinguish adversarial examples (a sketch of the idea follows this method entry).
- Parameters
inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – Labels of input samples. Default: None.
- Returns
float, threshold to distinguish adversarial samples from benign ones.
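As a sketch of the idea (not the MindArmour implementation), the threshold can be taken as the (1 - false_positive_rate) quantile of the reconstruction errors measured on benign data; fit_threshold below is a hypothetical helper.

import numpy as np

def fit_threshold(inputs, reconstructions, false_positive_rate=0.01):
    # Per-sample mean absolute reconstruction error.
    errors = np.abs(inputs - reconstructions).reshape(len(inputs), -1).mean(axis=1)
    # At most false_positive_rate of benign samples exceed this value.
    return float(np.percentile(errors, 100 * (1 - false_positive_rate)))

np.random.seed(0)
ori = np.random.rand(32, 4, 4).astype(np.float32)
rec = ori + 0.01 * np.random.randn(32, 4, 4).astype(np.float32)
print(fit_threshold(ori, rec))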
- set_threshold(threshold)[source]
Set the parameter threshold.
- Parameters
threshold (float) – Detection threshold.
- transform(inputs)[source]
Reconstruct input samples.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
numpy.ndarray, reconstructed images.
- class mindarmour.adv_robustness.detectors.RegionBasedDetector(model, number_points=10, initial_radius=0.0, max_radius=1.0, search_step=0.01, degrade_limit=0.0, sparse=False)[source]
The region-based detector uses the fact that adversarial examples are close to the classification boundary: it aggregates predictions from the region around a given example to decide whether the example is adversarial or not (a sampling sketch follows at the end of this class entry).
Reference: Mitigating evasion attacks to deep neural networks via region-based classification
- Parameters
model (Model) – Target model.
number_points (int) – The number of samples generated from the hypercube around the original sample. Default: 10.
initial_radius (float) – Initial radius of the hypercube. Default: 0.0.
max_radius (float) – Maximum radius of the hypercube. Default: 1.0.
search_step (float) – Increment of the radius during the search. Default: 0.01.
degrade_limit (float) – Acceptable decrease of classification accuracy. Default: 0.0.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: False.
Examples
>>> import numpy as np
>>> from mindspore import nn
>>> from mindspore.ops.operations import Add
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import RegionBasedDetector
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
>>> np.random.seed(5)
>>> ori = np.random.rand(4, 4).astype(np.float32)
>>> labels = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0],
...                    [0, 1, 0, 0]]).astype(np.int32)
>>> np.random.seed(6)
>>> adv = np.random.rand(4, 4).astype(np.float32)
>>> model = Model(Net())
>>> detector = RegionBasedDetector(model)
>>> radius = detector.fit(ori, labels)
>>> detector.set_radius(radius)
>>> adv_ids = detector.detect(adv)
- detect(inputs)[source]
Tell whether input samples are adversarial or not.
- Parameters
inputs (numpy.ndarray) – Suspicious samples to be judged.
- Returns
list[int], whether each sample is adversarial. If res[i] = 1, the input sample with index i is adversarial.
- detect_diff(inputs)[source]
Return raw prediction results and region-based prediction results.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
numpy.ndarray, raw prediction results and region-based prediction results of input samples.
- fit(inputs, labels=None)[source]
Train the detector to decide the best radius.
- Parameters
inputs (numpy.ndarray) – Benign samples.
labels (numpy.ndarray) – Ground truth labels of the input samples. Default: None.
- Returns
float, the best radius.
- transform(inputs)[source]
Generate hypercubes for input samples.
- Parameters
inputs (numpy.ndarray) – Input samples.
- Returns
numpy.ndarray, the hypercube corresponding to each sample.
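The sketch below shows, in plain NumPy, how a prediction can be ensembled over a hypercube around a sample; it is only an illustration of the region-based idea, and predict_fn here is a hypothetical stand-in for the target model.

import numpy as np

def region_predict(predict_fn, sample, radius=0.1, number_points=10,
                   bounds=(0.0, 1.0)):
    # Sample points uniformly from the hypercube of the given radius.
    noise = np.random.uniform(-radius, radius, (number_points,) + sample.shape)
    points = np.clip(sample + noise, bounds[0], bounds[1]).astype(np.float32)
    # Majority vote over the predicted labels of the sampled points.
    labels = predict_fn(points).argmax(axis=1)
    return int(np.bincount(labels).argmax())

def predict_fn(batch):
    # Hypothetical classifier: softmax over the raw features.
    e = np.exp(batch - batch.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

np.random.seed(0)
sample = np.random.rand(4).astype(np.float32)
print(region_predict(predict_fn, sample, radius=0.05))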
- class mindarmour.adv_robustness.detectors.SimilarityDetector(trans_model, max_k_neighbor=1000, chunk_size=1000, max_buffer_size=10000, tuning=False, fpr=0.001)[source]
The detector measures similarity among adjacent queries and rejects queries which are remarkably similar to previous queries.
- Parameters
trans_model (Model) – A MindSpore model to encode input data into lower dimension vector.
max_k_neighbor (int) – The maximum number of the nearest neighbors. Default: 1000.
chunk_size (int) – Buffer size. Default: 1000.
max_buffer_size (int) – Maximum buffer size. Default: 10000.
tuning (bool) – Compute the average distance of the k nearest neighbours. If tuning is True, k=K; if False, k=1,…,K. Default: False.
fpr (float) – False positive ratio on legitimate query sequences. Default: 0.001.
Examples
>>> import numpy as np
>>> from mindspore.ops.operations import Add
>>> from mindspore.nn import Cell
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import SimilarityDetector
>>> class EncoderNet(Cell):
...     def __init__(self, encode_dim):
...         super(EncoderNet, self).__init__()
...         self._encode_dim = encode_dim
...         self.add = Add()
...     def construct(self, inputs):
...         return self.add(inputs, inputs)
...     def get_encode_dim(self):
...         return self._encode_dim
>>> np.random.seed(5)
>>> x_train = np.random.rand(10, 32, 32, 3).astype(np.float32)
>>> perm = np.random.permutation(x_train.shape[0])
>>> benign_queries = x_train[perm[:10], :, :, :]
>>> suspicious_queries = x_train[perm[-1], :, :, :] + np.random.normal(0, 0.05, (10,) + x_train.shape[1:])
>>> suspicious_queries = suspicious_queries.astype(np.float32)
>>> encoder = Model(EncoderNet(encode_dim=256))
>>> detector = SimilarityDetector(max_k_neighbor=3, trans_model=encoder)
>>> num_nearest_neighbors, thresholds = detector.fit(inputs=x_train)
>>> detector.set_threshold(num_nearest_neighbors[-1], thresholds[-1])
>>> detector.detect(benign_queries)
>>> detections = detector.get_detection_interval()
>>> detected_queries = detector.get_detected_queries()
- detect(inputs)[source]
Process queries to detect black-box attacks.
- Parameters
inputs (numpy.ndarray) – Query sequence.
- Raises
ValueError – If the parameter threshold or num_of_neighbors is not set.
- detect_diff(inputs)[source]
Detect adversarial samples from input samples, like the predict_proba function in common machine learning models.
- Parameters
inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
- Raises
NotImplementedError – This function is not available in class SimilarityDetector.
- fit(inputs, labels=None)[source]
Process input training data to calculate the threshold. A proper threshold should keep the false positive rate under the given value (a simplified sketch follows this method entry).
- Parameters
inputs (numpy.ndarray) – Training data to calculate the threshold.
labels (numpy.ndarray) – Labels of training data.
- Returns
list[int], number of the nearest neighbors.
list[float], calculated thresholds for different K.
- Raises
ValueError – If the number of training samples is less than max_k_neighbor.
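For intuition only, here is a simplified sketch of how such a threshold could be derived, assuming encoded queries are compared with Euclidean distance and the threshold is the fpr-quantile of the average distance to the k nearest neighbours inside each chunk; this is not the exact MindArmour procedure, and fit_threshold is a hypothetical helper.

import numpy as np

def fit_threshold(encoded_queries, k=3, chunk_size=10, fpr=0.001):
    avg_dists = []
    for start in range(0, len(encoded_queries) - chunk_size + 1, chunk_size):
        chunk = encoded_queries[start:start + chunk_size]
        # Pairwise Euclidean distances within the chunk.
        diff = chunk[:, None, :] - chunk[None, :, :]
        dists = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dists, np.inf)      # ignore self-distance
        # Average distance to the k nearest neighbours of each query.
        knn = np.sort(dists, axis=1)[:, :k]
        avg_dists.extend(knn.mean(axis=1))
    # Benign queries rarely fall below this distance, so anything closer
    # than the threshold looks suspiciously similar to earlier queries.
    return float(np.percentile(avg_dists, 100 * fpr))

np.random.seed(0)
queries = np.random.rand(50, 8).astype(np.float32)
print(fit_threshold(queries, k=3, chunk_size=10, fpr=0.05))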
- get_detected_queries()[source]
Get the indexes of detected queries.
- Returns
list[int], sequence number of detected malicious queries.
- get_detection_interval()[source]
Get the interval between adjacent detections.
- Returns
list[int], number of queries between adjacent detections.
- set_threshold(num_of_neighbors, threshold)[source]
Set the parameters num_of_neighbors and threshold.
- Parameters
num_of_neighbors (int) – Number of the nearest neighbors.
threshold (float) – Detection threshold.
- transform(inputs)[source]
Filter adversarial noises in input samples.
- Parameters
inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
- Raises
NotImplementedError – This function is not available in class SimilarityDetector.
- class mindarmour.adv_robustness.detectors.SpatialSmoothing(model, ksize=3, is_local_smooth=True, metric='l1', false_positive_ratio=0.05)[source]
Detection method based on spatial smoothing. It blurs the original input with Gaussian, median, or mean filtering. If the distance between the model's predictions before and after blurring exceeds a threshold, the sample is judged to be adversarial (a simplified sketch follows at the end of this class entry).
- Parameters
model (Model) – Target model.
ksize (int) – Smooth window size. Default: 3.
is_local_smooth (bool) – If True, trigger local smoothing. If False, do not use local smoothing. Default: True.
metric (str) – Distance method. Default: ‘l1’.
false_positive_ratio (float) – False positive rate over benign samples. Default: 0.05.
Examples
>>> import numpy as np
>>> import mindspore.ops.operations as P
>>> from mindspore import nn
>>> from mindspore import Model
>>> from mindarmour.adv_robustness.detectors import SpatialSmoothing
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self._softmax = P.Softmax()
...     def construct(self, inputs):
...         return self._softmax(inputs)
>>> input_shape = (50, 3)
>>> np.random.seed(1)
>>> input_np = np.random.randn(*input_shape).astype(np.float32)
>>> np.random.seed(2)
>>> adv_np = np.random.randn(*input_shape).astype(np.float32)
>>> model = Model(Net())
>>> detector = SpatialSmoothing(model)
>>> threshold = detector.fit(input_np)
>>> detector.set_threshold(threshold.item())
>>> detected_res = np.array(detector.detect(adv_np))
- detect(inputs)[source]
Detect if an input sample is an adversarial example.
- Parameters
inputs (numpy.ndarray) – Suspicious samples to be judged.
- Returns
list[int], whether each sample is adversarial. If res[i] = 1, the input sample with index i is adversarial.
- detect_diff(inputs)[source]
Return the raw distance value (before applying the threshold) between the input sample and its smoothed counterpart.
- Parameters
inputs (numpy.ndarray) – Suspicious samples to be judged.
- Returns
float, distance.
- fit(inputs, labels=None)[source]
Train the detector to decide the threshold. A proper threshold ensures that the actual false positive rate over benign samples is less than the given value.
- Parameters
inputs (numpy.ndarray) – Benign samples.
labels (numpy.ndarray) – Labels of data. Default: None.
- Returns
float, the threshold; a distance larger than this value is reported as positive, i.e. adversarial.
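A minimal sketch of the underlying idea, assuming a simple 1-D median filter and an L1 distance between the predictions before and after smoothing; detect_by_smoothing, predict_fn, and the threshold value are hypothetical illustrations, not the MindArmour implementation.

import numpy as np

def detect_by_smoothing(predict_fn, inputs, ksize=3, threshold=0.1):
    # Median-filter each sample along its feature axis (a 1-D stand-in
    # for blurring an image).
    pad = ksize // 2
    padded = np.pad(inputs, ((0, 0), (pad, pad)), mode='edge')
    windows = np.stack([padded[:, i:i + inputs.shape[1]] for i in range(ksize)], axis=0)
    smoothed = np.median(windows, axis=0)
    # L1 distance between predictions before and after smoothing.
    dist = np.abs(predict_fn(inputs) - predict_fn(smoothed)).sum(axis=1)
    return (dist > threshold).astype(int).tolist()

def predict_fn(batch):
    # Hypothetical classifier: softmax over the raw features.
    e = np.exp(batch - batch.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

np.random.seed(0)
x = np.random.randn(5, 8).astype(np.float32)
print(detect_by_smoothing(predict_fn, x, ksize=3, threshold=0.05))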