Horizontal FL: Local Differential Privacy Perturbation Training
During federated learning, user data is used only for on-device local training and never needs to be uploaded to the central server, which prevents direct leakage of personal data. However, in the conventional federated learning framework, the model is uploaded to the cloud in plaintext, so there is still a risk of indirect privacy disclosure: after obtaining a user's plaintext model, an attacker can restore the user's personal training data through attacks such as reconstruction and model inversion.
As a federated learning framework, MindSpore Federated provides a secure aggregation algorithm based on local differential privacy (LDP): noise is added to a local model before it is uploaded to the cloud. This mitigates the privacy-leakage problem in horizontal federated learning while keeping the model usable.
Principles
Differential privacy is a mechanism for protecting user data privacy. A randomized algorithm $\mathcal{K}$ satisfies $(\epsilon, \delta)$-differential privacy if, for any two datasets $D$ and $D'$ that differ in only one record, and for any set of outputs $S$, the following inequality holds:

$$\Pr[\mathcal{K}(D) \in S] \leq e^{\epsilon} \cdot \Pr[\mathcal{K}(D') \in S] + \delta$$

Here, $\epsilon$ is the differential privacy budget and $\delta$ is the perturbation term. The smaller $\epsilon$ and $\delta$ are, the closer the output distributions of $\mathcal{K}$ on $D$ and $D'$, and therefore the stronger the privacy protection.

In horizontal federated learning, let $W$ be the model weight matrix obtained after local training on a client. Because the model "remembers" features of its training set, an attacker who obtains the plaintext $W$ can use it to restore the user's personal training data [1].
MindSpore Federated provides an LDP-based secure aggregation algorithm to prevent private data from leaking when local models are uploaded to the cloud.
The MindSpore Federated client generates a differential noise matrix $G$ with the same dimensions as the local model $W$ and adds the two to obtain the noise-added model $W_p$:

$$W_p = W + G$$

The MindSpore Federated client uploads the noise-added model $W_p$ to the cloud server for federated aggregation. The noise matrix $G$ acts as a mask on the original model: it reduces the risk of sensitive personal information being recovered from the model, but it also disturbs training, so balancing privacy protection against model convergence is the key issue. Because each client's noise is independent and zero-mean, the noise largely cancels out when the server aggregates models from a sufficiently large number of clients, and the impact on the accuracy and convergence of the aggregated model stays small.
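The client-side step can be pictured with a short sketch. This is a minimal illustration, not MindSpore Federated's actual implementation: the function ldp_perturb and the Gaussian-mechanism calibration of the noise scale are assumptions chosen to show how the clipping bound and the privacy parameters (described under Usage below) interact.

```python
import numpy as np

def ldp_perturb(weights, eps, delta, norm_clip, rng=None):
    """Clip the model's global L2 norm to norm_clip, then add Gaussian
    noise G so that the uploaded model is W_p = W + G.

    The noise scale uses the classical Gaussian-mechanism calibration
    sigma = norm_clip * sqrt(2 * ln(1.25 / delta)) / eps; this is an
    illustrative choice, not necessarily the framework's exact formula.
    """
    rng = rng or np.random.default_rng()

    # Scale all weights so that their joint L2 norm is at most norm_clip.
    total_norm = np.sqrt(sum(np.sum(w ** 2) for w in weights))
    scale = min(1.0, norm_clip / (total_norm + 1e-12))

    sigma = norm_clip * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return [w * scale + rng.normal(0.0, sigma, size=w.shape) for w in weights]

# Example: perturb a toy two-layer model before upload.
model = [np.ones((4, 3)), np.zeros(3)]
uploaded = ldp_perturb(model, eps=100.0, delta=1e-3, norm_clip=1.0)
```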
Usage
Local differential privacy training currently supports only cross-device scenarios. Enabling it is simple: set the encrypt_type field to DP_ENCRYPT via yaml when starting the cloud-side service. In addition, three parameters are provided to control the strength of privacy protection: dp_eps, dp_delta, and dp_norm_clip. They are also set through the yaml file.
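A hypothetical yaml fragment is sketched below. The field names (encrypt_type, dp_eps, dp_delta, dp_norm_clip) come from this document, but the surrounding structure is an assumption and may differ between MindSpore Federated versions; consult the server yaml template that ships with your version.

```yaml
# Hypothetical fragment of the cloud-side server yaml; exact nesting may differ.
encrypt:
  encrypt_type: DP_ENCRYPT   # enable LDP perturbation training
  dp_eps: 100.0              # privacy budget, must be > 0 (recommended > 50)
  dp_delta: 0.001            # perturbation, must satisfy 0 < dp_delta < 1
  dp_norm_clip: 1.0          # weight clipping coefficient, must be > 0
```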
The valid value range of dp_eps and dp_norm_clip is greater than 0, and the legal range of dp_delta is 0 < dp_delta < 1. In general, the smaller dp_eps and dp_delta are, the better the privacy protection, but the greater the impact on the convergence of the model. It is recommended that dp_delta be set to the reciprocal of the number of clients and dp_eps be greater than 50.
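For example, with a hypothetical federation of 2000 clients:

```python
# Hypothetical parameter choice following the recommendations above.
num_clients = 2000
dp_delta = 1.0 / num_clients  # reciprocal of the number of clients -> 0.0005
dp_eps = 100.0                # any value greater than 50
```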
dp_norm_clip is the adjustment coefficient applied to the model weights before the LDP mechanism adds noise to them; it also affects the convergence of the model. The recommended value ranges from 0.5 to 2.
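The impact of the noise on the aggregated model also shrinks as more clients participate: since each client's noise is independent and zero-mean, averaging on the server reduces the residual noise roughly as $1/\sqrt{n}$. The following sketch (assuming iid Gaussian noise, an assumption of this illustration rather than a statement about the framework's noise distribution) demonstrates the effect:

```python
import numpy as np

# Residual noise left after the server averages n clients' noise matrices:
# the std of the mean of n iid N(0, sigma^2) draws is sigma / sqrt(n).
rng = np.random.default_rng(0)
sigma = 0.1  # per-client noise scale, determined by dp_eps/dp_delta/dp_norm_clip
for n in (10, 100, 1000):
    noise = rng.normal(0.0, sigma, size=(n, 10_000))
    residual = noise.mean(axis=0).std()
    print(f"n={n:4d}: residual std {residual:.4f}  (sigma/sqrt(n) = {sigma / np.sqrt(n):.4f})")
```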
References
[1] Ligeng Zhu, Zhijian Liu, and Song Han. Deep Leakage from Gradients. NeurIPS, 2019.