Horizontal FL-Pairwise encryption training

During federated learning, user data is used only for local on-device training and does not need to be uploaded to the central server, which prevents direct leakage of personal data. However, in the conventional federated learning framework, models are uploaded to the cloud in plaintext, so there is still a risk of indirect privacy disclosure: after obtaining the plaintext model uploaded by a user, an attacker can recover the user's personal training data through attacks such as reconstruction and model inversion.

As a federated learning framework, MindSpore Federated provides secure aggregation algorithms based on secure multi-party computation (MPC). Secret perturbations are added to local models before they are uploaded to the cloud, which solves the problems of privacy leakage and model theft in horizontal federated learning while preserving model availability.

Principles

Although local differential privacy (LDP) can protect user data privacy reasonably well, model accuracy is greatly affected when the number of participating clients is relatively small or the Gaussian noise amplitude is relatively large. To meet both model protection and model convergence requirements, we provide the MPC-based secure aggregation solution.

In this training mode, let the set of participating clients be $U$. Each pair of clients $u$ and $v$ negotiates a pair of random perturbations $p_{uv}$ and $p_{vu}$ that satisfy the following condition:

$$
p_{uv}=\begin{cases} -p_{vu}, & u \neq v \\ 0, & u=v \end{cases}
$$

Therefore, each client $u$ adds the perturbations negotiated with the other clients to its original model weight $x_u$ before uploading the model to the server:

$$
x_{encrypt}=x_u+\sum_{v\in U}p_{uv}
$$

Therefore, the server aggregation result $\overline{x}$ is as follows:

$$
\overline{x}=\sum_{u\in U}\left(x_u+\sum_{v\in U}p_{uv}\right)=\sum_{u\in U}x_u+\sum_{u\in U}\sum_{v\in U}p_{uv}=\sum_{u\in U}x_u
$$

The preceding process describes only the main idea of the aggregation algorithm. Because the pairwise perturbations cancel out during aggregation, the MPC-based solution is accuracy-lossless, but it increases the number of communication rounds. If you are interested in the specific steps of the algorithm, refer to the paper [1].
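To make the cancellation concrete, here is a small worked example with two clients; the weights and perturbation values are made up purely for illustration:

$$
\begin{aligned}
U &= \{1,2\}, \quad x_1 = 0.4, \quad x_2 = 0.6, \quad p_{12} = 5.3, \quad p_{21} = -5.3 \\
x_{encrypt,1} &= x_1 + p_{12} = 5.7, \qquad x_{encrypt,2} = x_2 + p_{21} = -4.7 \\
\overline{x} &= 5.7 + (-4.7) = 1.0 = x_1 + x_2
\end{aligned}
$$

Each individual upload looks like random noise to the server, yet the aggregate equals the exact sum of the original weights.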

Usage

Cross-device scenario

Enabling pairwise encryption training is simple: just set the encrypt_type field to PW_ENCRYPT in the yaml file when starting the cloud-side service.

In addition, because most of the workers participating in the training are unstable edge computing nodes such as mobile phones, client dropout and secret reconstruction have to be considered. The related parameters are share_secrets_ratio, reconstruct_secrets_threshold, and cipher_time_window; a configuration sketch follows the parameter descriptions below.

share_secrets_ratio indicates the ratio by which the client threshold is decreased for the public key broadcast, secret sharing, and secret reconstruction rounds. The value must be less than or equal to 1.

reconstruct_secrets_threshold indicates the number of secret shares required to reconstruct a secret. The value must be less than the number of clients that participate in updateModel (start_fl_job_threshold*update_model_ratio).

To ensure system security, the value of reconstruct_secrets_threshold must be greater than half of the number of federated learning clients when the server and clients do not collude, and greater than two thirds of the number of clients when server-client collusion is possible (for example, with 120 clients the value should exceed 60 in the first case and 80 in the second).

cipher_time_window indicates the duration limit of each communication round for secure aggregation. It is used to ensure that the server can start a new round of iteration when some clients are offline.
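The following is a minimal sketch of the relevant cloud-side yaml fields. The field names are the ones discussed on this page, but the flat layout, the unit assumption, and all numeric values are illustrative assumptions; adapt them to the yaml template shipped with your MindSpore Federated release.

```yaml
# Hypothetical flat layout with illustrative values only.
encrypt_type: PW_ENCRYPT           # enables pairwise encryption training
share_secrets_ratio: 0.9           # must be <= 1
reconstruct_secrets_threshold: 100 # must be < start_fl_job_threshold * update_model_ratio
cipher_time_window: 3000           # per-round time limit for secure aggregation (unit assumed to be ms)
start_fl_job_threshold: 120        # together with update_model_ratio, determines the number of
update_model_ratio: 1.0            # updateModel clients referenced above (120 * 1.0 = 120)
```

With these illustrative numbers, 100 is below the 120 updateModel clients and above the two-thirds bound of 80, so the constraints described above are satisfied.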

Cross-silo scenario

In the cross-silo scenario, you also only need to set the encrypt_type field to PW_ENCRYPT in the yaml file used by the cloud-side startup script.

Different from the cross-device scenario, all of the workers in the cross-silo scenario are stable computing nodes, so among the encryption-related parameters you only need to set cipher_time_window.
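Under the same layout assumptions as the sketch above, the cross-silo configuration reduces to:

```yaml
# Hypothetical layout with an illustrative value; the dropout-related fields
# (share_secrets_ratio, reconstruct_secrets_threshold) are not needed here.
encrypt_type: PW_ENCRYPT
cipher_time_window: 3000
```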

References

[1] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017.