Horizontal FL-Local Differential Privacy Perturbation Training

During federated learning, user data is used only for training on the local device and does not need to be uploaded to the central server, which prevents direct leakage of personal data. However, in the conventional federated learning framework, models are uploaded to the cloud in plaintext, so user privacy can still be disclosed indirectly. After obtaining the plaintext model uploaded by a user, an attacker can recover the user's personal training data through attacks such as reconstruction and model inversion, thereby disclosing user privacy.

As a federated learning framework, MindSpore Federated provides secure aggregation based on local differential privacy (LDP): noise is added to each local model before it is uploaded to the cloud. This mitigates privacy leakage in horizontal federated learning while keeping the model usable.

Principles

Differential privacy is a mechanism for protecting user data privacy. Differential privacy is defined as follows:

$$\Pr[K(D) \in S] \leq e^{\epsilon} \Pr[K(D^{'}) \in S] + \delta$$

For any two datasets $D$ and $D^{'}$ that differ in only one record, and for any subset $S$ of possible outputs, the randomized algorithm $K$ satisfies the preceding inequality. $\epsilon$ is the differential privacy budget, and $\delta$ is the perturbation (relaxation) term, the small probability with which the $e^{\epsilon}$ bound may be exceeded. The smaller $\epsilon$ and $\delta$ are, the closer the output distributions of $K$ on $D$ and $D^{'}$, and the stronger the privacy protection.
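
To make the inequality concrete, the following sketch checks it numerically for binary randomized response, a textbook $\epsilon$-LDP mechanism chosen here only for illustration (it is not the mechanism MindSpore Federated applies to model weights). The function names `randomized_response` and `check_dp` are hypothetical. The single record is one bit, so $D$ and $D^{'}$ are the two possible bit values, and $S$ ranges over every subset of the output space.

```python
import math

def randomized_response(true_bit: int, eps: float) -> dict:
    """Output distribution of binary randomized response for a one-bit record.

    The true bit is reported with probability e^eps / (1 + e^eps) and flipped
    otherwise (illustrative mechanism, not MindSpore Federated's).
    """
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {true_bit: p_keep, 1 - true_bit: 1.0 - p_keep}

def check_dp(mech_eps: float, claimed_eps: float, delta: float = 0.0) -> bool:
    """Test Pr[K(D) in S] <= e^eps * Pr[K(D') in S] + delta for every S.

    D and D' are the two possible values of the single record (0 and 1);
    S ranges over every subset of the output space {0, 1}.
    """
    subsets = [set(), {0}, {1}, {0, 1}]
    for d, d_prime in [(0, 1), (1, 0)]:
        p = randomized_response(d, mech_eps)
        q = randomized_response(d_prime, mech_eps)
        for s in subsets:
            lhs = sum(p[o] for o in s)
            rhs = math.exp(claimed_eps) * sum(q[o] for o in s) + delta
            if lhs > rhs + 1e-12:  # small tolerance for floating-point error
                return False
    return True

print(check_dp(mech_eps=1.0, claimed_eps=1.0))  # True: the mechanism's own budget holds
print(check_dp(mech_eps=1.0, claimed_eps=0.5))  # False: a tighter budget is violated
```

A nonzero $\delta$ relaxes the bound: for example, `check_dp(1.0, 0.5, delta=0.3)` passes even though the pure $\epsilon = 0.5$ check fails.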

In horizontal federated learning, if the model weight matrix after local training on the client is $W$, an attacker can use $W$ to reconstruct the user's training dataset [1], because the model “memorizes” features of the training set during training.

MindSpore Federated provides an LDP-based secure aggregation algorithm to prevent privacy leakage when local models are uploaded to the cloud.

The MindSpore Federated client generates a differential noise matrix $G$ that has the same dimension as the local model $W$ , and then adds the two to obtain a weight $W_{p}$ that meets the differential privacy definition:

$$W_{p} = W + G$$
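
The noise-addition step $W_{p} = W + G$ can be sketched as follows. This is a minimal illustration, not MindSpore Federated's client code: it assumes every entry of $G$ is drawn from a zero-mean Gaussian distribution with a caller-supplied standard deviation `sigma` (the text states neither the noise distribution nor how its scale is derived from dp_eps, dp_delta, and dp_norm_clip), and it treats $W$ as a flattened list of floats. The function name `add_gaussian_noise` is hypothetical.

```python
import random
from typing import List, Optional

def add_gaussian_noise(weights: List[float], sigma: float,
                       seed: Optional[int] = None) -> List[float]:
    """Return W_p = W + G for a flattened weight vector W.

    Assumption: each entry of G is drawn independently from N(0, sigma^2),
    and G has exactly one entry per entry of W (same dimension).
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in weights]  # noise matrix G
    return [w + g for w, g in zip(weights, noise)]    # element-wise W + G

# Usage: perturb a toy 4-parameter model before uploading it.
local_weights = [0.25, -1.5, 0.8, 3.0]
perturbed = add_gaussian_noise(local_weights, sigma=0.1, seed=42)
print(perturbed)
```

Passing a fixed `seed` makes the perturbation reproducible for testing; a real client would draw fresh noise for every upload.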

The MindSpore Federated client uploads the noise-added model $W_{p}$ to the cloud server for federated aggregation. Adding the noise matrix $G$ is equivalent to masking the original model: it reduces the risk of sensitive data leaking from the model, but it also affects the convergence of model training. How to achieve a better balance between model privacy and usability remains an open research question. Experiments show that when the number of participants $n$ is large enough (generally more than 1000), most of the noise cancels out during aggregation, and the LDP mechanism has no obvious impact on the accuracy and convergence of the aggregated model.
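
The cancellation effect can be illustrated with a short simulation. It assumes that each client's noise is zero-mean, independent across clients, and Gaussian with the same `sigma`, and that the server takes a plain (unweighted) average of the uploaded models; the actual aggregation rule may differ. Under these assumptions the averaged noise has standard deviation $\sigma/\sqrt{n}$, while the component shared by all clients' weights is unchanged. The function name `aggregated_noise_std` is hypothetical.

```python
import math
import random
import statistics

def aggregated_noise_std(num_clients: int, dim: int, sigma: float,
                         seed: int = 0) -> float:
    """Average num_clients independent noise vectors G_i ~ N(0, sigma^2 I)
    and return the empirical standard deviation of the averaged noise."""
    rng = random.Random(seed)
    avg = [0.0] * dim
    for _ in range(num_clients):
        for j in range(dim):
            avg[j] += rng.gauss(0.0, sigma) / num_clients
    return statistics.pstdev(avg)

# Compare the simulated residual noise with the theoretical sigma / sqrt(n).
for n in (1, 10, 100, 1000):
    simulated = aggregated_noise_std(n, dim=200, sigma=1.0)
    print(f"n={n:5d}  simulated={simulated:.4f}  theory={1.0 / math.sqrt(n):.4f}")
```

In theory, at $n = 1000$ the residual noise is about 3% of the per-client noise level ($1/\sqrt{1000} \approx 0.032$).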

Usage

Local differential privacy training currently supports only cross-device scenarios. Enabling it is simple: set the encrypt_train_type field to DP_ENCRYPT in the YAML configuration file when starting the cloud-side service.

In addition, three parameters control the strength of privacy protection: dp_eps, dp_delta, and dp_norm_clip. They are also set in the YAML file.

dp_eps and dp_norm_clip must be greater than 0, and dp_delta must satisfy 0 < dp_delta < 1. In general, the smaller dp_eps and dp_delta are, the stronger the privacy protection, but the greater the impact on model convergence. It is recommended to set dp_delta to the reciprocal of the number of clients and dp_eps to a value greater than 50.
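
The constraints and recommendations above can be captured in a small validation helper. This is a hypothetical sketch: `DPConfig` and `recommended_config` are not part of MindSpore Federated, only the three field names come from the text, and the defaults (dp_eps = 100, dp_norm_clip = 1.0) are illustrative picks within the recommended ranges. The exact key layout of the YAML file is not covered here.

```python
import warnings
from dataclasses import dataclass

@dataclass
class DPConfig:
    """Hypothetical container for the three LDP parameters."""
    dp_eps: float
    dp_delta: float
    dp_norm_clip: float

    def validate(self) -> None:
        # Hard constraints stated in the documentation.
        if not self.dp_eps > 0:
            raise ValueError(f"dp_eps must be > 0, got {self.dp_eps}")
        if not 0 < self.dp_delta < 1:
            raise ValueError(f"dp_delta must satisfy 0 < dp_delta < 1, got {self.dp_delta}")
        if not self.dp_norm_clip > 0:
            raise ValueError(f"dp_norm_clip must be > 0, got {self.dp_norm_clip}")
        # Soft recommendations: warn but do not reject.
        if self.dp_eps <= 50:
            warnings.warn("dp_eps > 50 is recommended; smaller values may hurt convergence")
        if not 0.5 <= self.dp_norm_clip <= 2:
            warnings.warn("dp_norm_clip in [0.5, 2] is recommended")

def recommended_config(num_clients: int, dp_eps: float = 100.0,
                       dp_norm_clip: float = 1.0) -> DPConfig:
    """Build a config with dp_delta = 1 / num_clients, as recommended."""
    if num_clients < 2:
        raise ValueError("need at least 2 clients so that dp_delta = 1/n < 1")
    cfg = DPConfig(dp_eps=dp_eps, dp_delta=1.0 / num_clients, dp_norm_clip=dp_norm_clip)
    cfg.validate()
    return cfg

print(recommended_config(1000))
```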

dp_norm_clip is the adjustment coefficient applied to the model weights before the LDP mechanism adds noise. It affects model convergence; the recommended range is 0.5 to 2.
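
The text does not define the adjustment precisely. A common choice in DP training, and the assumption behind the sketch below (suggested only by the parameter name), is L2-norm clipping: the weights are rescaled so that their norm is at most dp_norm_clip, which bounds each client's contribution before noise is added. Under this assumption a small value distorts the weights more, while a large value requires more noise for the same privacy level. The function name `clip_by_l2_norm` is hypothetical.

```python
import math
from typing import List

def clip_by_l2_norm(weights: List[float], dp_norm_clip: float) -> List[float]:
    """Rescale weights so their L2 norm is at most dp_norm_clip.

    Assumed semantics of dp_norm_clip; the direction of the vector is kept.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= dp_norm_clip:
        return list(weights)  # already within the bound, leave unchanged
    scale = dp_norm_clip / norm
    return [w * scale for w in weights]

# Usage: a vector with L2 norm 5 clipped to norm 1.
print(clip_by_l2_norm([3.0, 4.0], dp_norm_clip=1.0))
```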

References

[1] Ligeng Zhu, Zhijian Liu, and Song Han. Deep Leakage from Gradients. NeurIPS, 2019.