Differential Privacy
@differentialprivacy.org
470 followers 0 following 850 posts
🤖 new arXiv preprints mentioning "differential privacy" or "differentially private" in the title/abstract/metadata + updates from https://differentialprivacy.org [Under construction.]
differentialprivacy.org
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)

Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma

http://arxiv.org/abs/2510.06719

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by
grounding them in external knowledge. However, its application in sensitive
domains is limited by privacy risks. Existing private RAG methods typically
rely on query-time differential privacy (DP), which requires repeated noise
injection and leads to accumulated privacy loss. To address this issue, we
propose DP-SynRAG, a framework that uses LLMs to generate differentially
private synthetic RAG databases. Unlike prior methods, the synthetic text can
be reused once created, thereby avoiding repeated noise injection and
additional privacy costs. To preserve essential information for downstream RAG
tasks, DP-SynRAG extends private prediction, which instructs LLMs to generate
text that mimics subsampled database records in a DP manner. Experiments show
that DP-SynRAG achieves superior performance to state-of-the-art private
RAG systems while maintaining a fixed privacy budget, offering a scalable
solution for privacy-preserving RAG.
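
The budget argument at the core of this abstract can be illustrated with basic sequential composition. A minimal sketch follows; the epsilon values and the accounting are illustrative assumptions, not the paper's accountant.

```python
# Illustrative budget comparison, not the paper's accountant: under basic
# sequential composition, query-time DP spends privacy budget on every RAG
# query, whereas a DP synthetic database is paid for once and every later
# retrieval is post-processing (no additional privacy cost).

def query_time_total_eps(eps_per_query: float, num_queries: int) -> float:
    """Total epsilon when noise is injected at every query."""
    return eps_per_query * num_queries

def synthetic_db_total_eps(eps_generation: float, num_queries: int) -> float:
    """Total epsilon when a DP synthetic database is generated once."""
    return eps_generation  # post-processing is free

for q in (10, 100, 1000):
    print(q, query_time_total_eps(0.5, q), synthetic_db_total_eps(4.0, q))
```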
differentialprivacy.org
Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency

Mohamed Seif, Antti Koskela, H. Vincent Poor, Andrea J. Goldsmith

http://arxiv.org/abs/2510.07136

We study the problem of spectral graph clustering under edge differential
privacy (DP). Specifically, we develop three mechanisms: (i) graph perturbation
via randomized edge flipping combined with adjacency matrix shuffling, which
enforces edge privacy while preserving key spectral properties of the graph.
Importantly, shuffling considerably amplifies the guarantees: whereas flipping
edges with a fixed probability alone provides only a constant epsilon edge DP
guarantee as the number of nodes grows, the shuffled mechanism achieves
(epsilon, delta) edge DP with parameters that tend to zero as the number of
nodes increases; (ii) private graph projection with additive Gaussian noise in a
lower-dimensional space to reduce dimensionality and computational complexity;
and (iii) a noisy power iteration method that distributes Gaussian noise across
iterations to ensure edge DP while maintaining convergence. Our analysis
provides rigorous privacy guarantees and a precise characterization of the
misclassification error rate. Experiments on synthetic and real-world networks
validate our theoretical analysis and illustrate the practical privacy-utility
trade-offs.
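
A minimal sketch of the randomized-response edge-flipping step, i.e. mechanism (i) without the shuffling amplification. The flip probability is the standard randomized-response calibration; everything else here is an illustrative assumption.

```python
import numpy as np

def flip_edges(adj: np.ndarray, eps: float, rng=None) -> np.ndarray:
    """Randomized response on the upper triangle of a symmetric 0/1
    adjacency matrix: each edge indicator is kept with probability
    exp(eps)/(1+exp(eps)) and flipped otherwise, which gives eps edge-DP
    for the flipping step alone (the paper's shuffling amplification is
    not reproduced here)."""
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    p_flip = 1.0 / (1.0 + np.exp(eps))
    upper = np.triu_indices(n, k=1)
    flips = rng.random(len(upper[0])) < p_flip
    noisy = adj.copy()
    noisy[upper] = np.where(flips, 1 - adj[upper], adj[upper])
    noisy[(upper[1], upper[0])] = noisy[upper]  # keep the matrix symmetric
    return noisy

if __name__ == "__main__":
    a = np.zeros((5, 5), dtype=int); a[0, 1] = a[1, 0] = 1
    print(flip_edges(a, eps=1.0))
```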
differentialprivacy.org
Cocoon: A System Architecture for Differentially Private Training with Correlated Noises

Donghwan Kim, Xin Gu, Jinho Baek, Timothy Lo, Younghoon Min, Kwangsik Shin, Jongryool Kim, Jongse Park, Kiwan Maeng

http://arxiv.org/abs/2510.07304

Machine learning (ML) models memorize and leak training data, causing serious
privacy issues to data owners. Training algorithms with differential privacy
(DP), such as DP-SGD, have been gaining attention as a solution. However,
DP-SGD adds noise at each training iteration, which degrades the accuracy of
the trained model. To improve accuracy, a new family of approaches adds
carefully designed correlated noises, so that the noises cancel each other out
across iterations. We perform, to the best of our knowledge, the first
extensive characterization study of these new mechanisms and show that they
incur non-negligible overheads when the model is large or uses large embedding
tables. Motivated by the analysis, we propose Cocoon, a hardware-software
co-designed framework for efficient training with correlated noises. Cocoon
accelerates models with embedding tables through pre-computing and storing
correlated noises in a coalesced format (Cocoon-Emb), and supports large models
through a custom near-memory processing device (Cocoon-NMP). On a real system
with an FPGA-based NMP device prototype, Cocoon improves the performance by
2.33-10.82x (Cocoon-Emb) and 1.55-3.06x (Cocoon-NMP).
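
The cancellation effect that motivates correlated-noise ("DP-MF style") training can be seen in a toy variance experiment. This sketch only illustrates why anti-correlated noise shrinks the noise in prefix sums of updates; it is not a DP mechanism or the paper's system.

```python
import numpy as np

# Toy demonstration: training mostly consumes prefix sums of noisy gradients,
# and anti-correlated noise z_t - a * z_{t-1} largely cancels inside those
# prefix sums, while i.i.d. noise accumulates as sqrt(T).
rng = np.random.default_rng(0)
T, sigma, a, trials = 1000, 1.0, 0.9, 2000
iid_final, corr_final = np.empty(trials), np.empty(trials)
for i in range(trials):
    z = rng.normal(0.0, sigma, size=T)
    corr = z - a * np.concatenate(([0.0], z[:-1]))
    iid_final[i], corr_final[i] = z.sum(), corr.sum()
print("std of final prefix sum, i.i.d. noise     :", iid_final.std())   # ~ sigma * sqrt(T)
print("std of final prefix sum, correlated noise :", corr_final.std())  # much smaller
```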
differentialprivacy.org
DP-Adam-AC: Privacy-preserving Fine-Tuning of Localizable Language Models Using Adam Optimization with Adaptive Clipping

Ruoxing Yang

http://arxiv.org/abs/2510.05288

Large language models (LLMs) such as ChatGPT have evolved into powerful and
ubiquitous tools. Fine-tuning on small datasets allows LLMs to acquire
specialized skills for specific tasks efficiently. Although LLMs provide great
utility in both general and task-specific use cases, they are limited by two
security-related concerns. First, traditional LLM hardware requirements make
them infeasible to run locally on consumer-grade devices. A remote network
connection with the LLM provider's server is usually required, making the
system vulnerable to network attacks. Second, fine-tuning an LLM for a
sensitive task may involve sensitive data. Non-private fine-tuning algorithms
produce models vulnerable to training data reproduction attacks. Our work
addresses these security concerns by enhancing differentially private
optimization algorithms and applying them to fine-tune localizable language
models. We introduce adaptable gradient clipping along with other engineering
enhancements to the standard DP-Adam optimizer to create DP-Adam-AC. We use our
optimizer to fine-tune examples of two localizable LLM designs: a small
language model (Qwen2.5-0.5B) and a 1.58-bit quantized model (Bitnet-b1.58-2B). We
demonstrate promising improvements in loss through experimentation with two
synthetic datasets.
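
A compact numpy sketch of the general recipe: per-example clipping, Gaussian noise, an Adam update, and a quantile-tracking clip adjustment. The adaptive rule and all hyperparameters here are common heuristics assumed for illustration, not necessarily the DP-Adam-AC algorithm of the paper; in a fully private implementation the clipped fraction would itself need to be privatized.

```python
import numpy as np

def dp_adam_step(per_example_grads, theta, m, v, t, clip,
                 sigma=1.0, lr=1e-3, beta1=0.9, beta2=0.999, eps_adam=1e-8,
                 target_quantile=0.5, eta_clip=0.2, rng=None):
    """One DP-Adam step: clip each per-example gradient to norm `clip`,
    add Gaussian noise (std sigma * clip) to the sum, and feed the noisy
    mean to a standard Adam update; then adjust `clip` toward a target
    quantile of the gradient norms (quantile-tracking heuristic)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(per_example_grads)
    norms = np.linalg.norm(per_example_grads, axis=1)
    factors = np.minimum(1.0, clip / (norms + 1e-12))
    clipped_sum = (per_example_grads * factors[:, None]).sum(axis=0)
    g = (clipped_sum + rng.normal(0.0, sigma * clip, size=theta.shape)) / n
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps_adam)
    clipped_frac = float(np.mean(norms > clip))  # would be privatized in a full DP pipeline
    clip = clip * np.exp(eta_clip * (clipped_frac - target_quantile))
    return theta, m, v, clip
```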
differentialprivacy.org
Correlating Cross-Iteration Noise for DP-SGD using Model Curvature

Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng

http://arxiv.org/abs/2510.05416

Differentially private stochastic gradient descent (DP-SGD) offers the
promise of training deep learning models while mitigating many privacy risks.
However, there is currently a large accuracy gap between DP-SGD and normal SGD
training. This has resulted in different lines of research investigating
orthogonal ways of improving privacy-preserving training. One such line of
work, known as DP-MF, correlates the privacy noise across different iterations
of stochastic gradient descent -- allowing later iterations to cancel out some
of the noise added to earlier iterations. In this paper, we study how to
improve this noise correlation. We propose a technique called NoiseCurve that
uses model curvature, estimated from public unlabeled data, to improve the
quality of this cross-iteration noise correlation. Our experiments on various
datasets, models, and privacy parameters show that the noise correlations
computed by NoiseCurve offer consistent and significant improvements in
accuracy over the correlation scheme used by DP-MF.
differentialprivacy.org
Power Mechanism: Private Tabular Representation Release for Model Agnostic Consumption

Praneeth Vepakomma, Kaustubh Ponkshe

http://arxiv.org/abs/2510.05581

Traditional collaborative learning approaches are based on sharing of model
weights between clients and a server. However, there are advantages to resource
efficiency through schemes based on sharing of embeddings (activations) created
from the data. Several differentially private methods have been developed for
sharing weights, but no such mechanisms exist so far for sharing embeddings. We
propose a framework that learns a privacy encoding network in conjunction
with a small utility generation network such that the final embeddings
generated from it are equipped with formal differential privacy guarantees.
These privatized embeddings are then shared with a more powerful server, that
learns a post-processing that results in a higher accuracy for machine learning
tasks. We show that our co-design of collaborative and private learning
requires only one round of privatized communication and less compute on the
client than traditional methods. The privatized embeddings that we share
from the client are agnostic to the type of model (deep learning, random
forests or XGBoost) used on the server in order to process these activations to
complete a task.
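
The basic "clip then add Gaussian noise" release of per-record embeddings, after which any server-side model is free post-processing, can be sketched as below. The sensitivity bookkeeping and the noise formula are standard Gaussian-mechanism choices assumed for illustration, not the paper's Power Mechanism or its trained encoder.

```python
import numpy as np

def privatize_embeddings(emb, clip, eps, delta, rng=None):
    """Clip each row (one record's embedding) to L2 norm `clip`, then add
    Gaussian noise calibrated with the classical (eps, delta) formula for
    L2 sensitivity 2*clip (one record can change only its own clipped row).
    The released matrix can be post-processed by any downstream model
    (deep net, random forest, XGBoost) with no further privacy cost."""
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    clipped = emb * np.minimum(1.0, clip / (norms + 1e-12))
    sens = 2.0 * clip
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return clipped + rng.normal(0.0, sigma, size=emb.shape)
```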
differentialprivacy.org
DP-SNP-TIHMM: Differentially Private, Time-Inhomogeneous Hidden Markov Models for Synthesizing Genome-Wide Association Datasets

Shadi Rahimian, Mario Fritz

http://arxiv.org/abs/2510.05777

Single nucleotide polymorphism (SNP) datasets are fundamental to genetic
studies but pose significant privacy risks when shared. The correlation of SNPs
with each other makes strong adversarial attacks possible, such as masked-value
reconstruction, kin inference, and membership inference attacks. Existing
privacy-preserving approaches either apply differential privacy to statistical
summaries of these datasets or offer complex methods that require
post-processing and the usage of a publicly available dataset to suppress or
selectively share SNPs.
  In this study, we introduce an innovative framework for generating synthetic
SNP sequence datasets using samples derived from time-inhomogeneous hidden
Markov models (TIHMMs). To preserve the privacy of the training data, we ensure
that each SNP sequence contributes only a bounded influence during training,
enabling strong differential privacy guarantees. Crucially, by operating on
full SNP sequences and bounding their gradient contributions, our method
directly addresses the privacy risks introduced by their inherent correlations.
  Through experiments conducted on the real-world 1000 Genomes dataset, we
demonstrate the efficacy of our method using privacy budgets of $\varepsilon
\in [1, 10]$ at $\delta=10^{-4}$. Notably, by allowing the transition models of
the HMM to be dependent on the location in the sequence, we significantly
enhance performance, enabling the synthetic datasets to closely replicate the
statistical properties of non-private datasets. This framework facilitates the
private sharing of genomic data while offering researchers exceptional
flexibility and utility.
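
The synthesis step, sampling sequences from a fitted time-inhomogeneous HMM, is pure post-processing of the DP-trained parameters and can be sketched as follows. The parameter layout (`init`, per-position `trans` and `emit` matrices) is an assumed representation for illustration.

```python
import numpy as np

def sample_snp_sequences(init, trans, emit, num_seqs, rng=None):
    """Draw synthetic sequences from a time-inhomogeneous HMM.
    init: (S,) initial state distribution; trans: list of L-1 (S, S)
    position-specific transition matrices; emit: list of L (S, V)
    position-specific emission matrices. Sampling from DP-trained
    parameters is post-processing, so the synthetic sequences inherit
    the model's differential privacy guarantee."""
    rng = np.random.default_rng() if rng is None else rng
    L, S = len(emit), len(init)
    seqs = np.empty((num_seqs, L), dtype=int)
    for i in range(num_seqs):
        state = rng.choice(S, p=init)
        for t in range(L):
            seqs[i, t] = rng.choice(emit[t].shape[1], p=emit[t][state])
            if t + 1 < L:
                state = rng.choice(S, p=trans[t][state])
    return seqs
```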
differentialprivacy.org
The Five Safes as a Privacy Context

James Bailie, Ruobin Gong

http://arxiv.org/abs/2510.05803

The Five Safes is a framework used by national statistical offices (NSOs) for
assessing and managing the disclosure risk of data sharing. This paper makes
two points: Firstly, the Five Safes can be understood as a specialization of a
broader concept, contextual integrity, to the
situation of statistical dissemination by an NSO. We demonstrate this by
mapping the five parameters of contextual integrity onto the five dimensions of
the Five Safes. Secondly, the Five Safes contextualizes narrow, technical
notions of privacy within a holistic risk assessment. We demonstrate this with
the example of differential privacy (DP). This contextualization allows NSOs to
place DP within their Five Safes toolkit while also guiding the design of DP
implementations within the broader privacy context, as delineated by both their
regulation and the relevant social norms.
differentialprivacy.org
Distributed Platoon Control Under Quantization: Stability Analysis and Privacy Preservation

Kaixiang Zhang, Zhaojian Li, Wei Lin

http://arxiv.org/abs/2510.05959

Distributed control of connected and automated vehicles has attracted
considerable interest for its potential to improve traffic efficiency and
safety. However, such control schemes require sharing privacy-sensitive vehicle
data, which introduces risks of information leakage and potential malicious
activities. This paper investigates the stability and privacy-preserving
properties of distributed platoon control under two types of quantizers:
deterministic and probabilistic. For deterministic quantization, we show that
the resulting control strategy ensures the system errors remain uniformly
ultimately bounded. Moreover, in the absence of auxiliary information, an
eavesdropper cannot uniquely infer sensitive vehicle states. In contrast, the
use of probabilistic quantization enables asymptotic convergence of the vehicle
platoon in expectation with bounded variance. Importantly, probabilistic
quantizers can satisfy differential privacy guarantees, thereby preserving
privacy even when the eavesdropper possesses arbitrary auxiliary information.
We further analyze the trade-off between control performance and privacy by
formulating an optimization problem that characterizes the impact of the
quantization step on both metrics. Numerical simulations are provided to
illustrate the performance differences between the two quantization strategies.
differentialprivacy.org
Privacy Enhancement in Over-the-Air Federated Learning via Adaptive Receive Scaling

Faeze Moradi Kalarde, Ben Liang, Min Dong, Yahia A. Eldemerdash Ahmed, Ho Ting Cheng

http://arxiv.org/abs/2510.03860

In Federated Learning (FL) with over-the-air aggregation, the quality of the
signal received at the server critically depends on the receive scaling
factors. While a larger scaling factor can reduce the effective noise power and
improve training performance, it also compromises the privacy of devices by
reducing uncertainty. In this work, we aim to adaptively design the receive
scaling factors across training rounds to balance the trade-off between
training convergence and privacy in an FL system under dynamic channel
conditions. We formulate a stochastic optimization problem that minimizes the
overall R\'enyi differential privacy (RDP) leakage over the entire training
process, subject to a long-term constraint that ensures convergence of the
global loss function. Our problem depends on unknown future information, and we
observe that standard Lyapunov optimization is not applicable. Thus, we develop
a new online algorithm, termed AdaScale, based on a sequence of novel per-round
problems that can be solved efficiently. We further derive upper bounds on the
dynamic regret and constraint violation of AdaScale, establishing that it
achieves diminishing dynamic regret in terms of time-averaged RDP leakage while
ensuring convergence of FL training to a stationary point. Numerical
experiments on canonical classification tasks show that our approach
effectively reduces RDP and DP leakages compared with state-of-the-art
benchmarks without compromising learning performance.
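
The way per-round receive scaling maps to cumulative Rényi-DP leakage can be illustrated with the standard Gaussian-mechanism RDP bound and additive composition. The mapping from the scaling factor to effective noise used below is a simplifying assumption, not the paper's channel model or its online algorithm.

```python
import numpy as np

def total_rdp_leakage(alpha, sensitivity, channel_noise_std, scalings):
    """Cumulative RDP over rounds: round t has effective noise std
    channel_noise_std / scaling_t (a larger receive scaling suppresses the
    effective noise, improving accuracy but leaking more), and the Gaussian
    mechanism satisfies (alpha, alpha * sens^2 / (2 sigma^2))-RDP, which
    composes by summation across rounds."""
    sigmas = channel_noise_std / np.asarray(scalings, dtype=float)
    return float(np.sum(alpha * sensitivity**2 / (2.0 * sigmas**2)))

print(total_rdp_leakage(alpha=8, sensitivity=1.0, channel_noise_std=4.0,
                        scalings=[1.0, 1.5, 2.0, 2.0]))
```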
differentialprivacy.org
Multi-Class Support Vector Machine with Differential Privacy

Jinseong Park, Yujin Choi, Jaewook Lee

http://arxiv.org/abs/2510.04027

With the increasing need to safeguard data privacy in machine learning
models, differential privacy (DP) is one of the major frameworks to build
privacy-preserving models. Support Vector Machines (SVMs) are widely used
traditional machine learning models due to their robust margin guarantees and
strong empirical performance in binary classification. However, applying DP to
multi-class SVMs is inadequate, as the standard one-versus-rest (OvR) and
one-versus-one (OvO) approaches repeatedly query each data sample when building
multiple binary classifiers, thus consuming the privacy budget proportionally
to the number of classes. To overcome this limitation, we explore all-in-one
SVM approaches for DP, which access each data sample only once to construct
multi-class SVM boundaries with margin maximization properties. We propose a
novel differentially Private Multi-class SVM (PMSVM) with weight and gradient
perturbation methods, providing rigorous sensitivity and convergence analyses
to ensure DP in all-in-one SVMs. Empirical results demonstrate that our
approach surpasses existing DP-SVM methods in multi-class scenarios.
differentialprivacy.org
DP-HYPE: Distributed Differentially Private Hyperparameter Search

Johannes Liebenow, Thorsten Peinemann, Esfandiar Mohammadi

http://arxiv.org/abs/2510.04902

The tuning of hyperparameters in distributed machine learning can
substantially impact model performance. When the hyperparameters are tuned on
sensitive data, privacy becomes an important challenge; to this end,
differential privacy has emerged as the de facto standard for provable privacy.
A standard setting when performing distributed learning tasks is that clients
agree on a shared setup, i.e., find a compromise from a set of hyperparameters,
like the learning rate of the model to be trained. Yet, prior work on
differentially private hyperparameter tuning either uses computationally
expensive cryptographic protocols, determines hyperparameters separately for
each client, or applies differential privacy locally, which can lead to
undesirable utility-privacy trade-offs.
  In this work, we present our algorithm DP-HYPE, which performs a distributed
and privacy-preserving hyperparameter search by conducting a distributed voting
based on local hyperparameter evaluations of clients. In this way, DP-HYPE
selects hyperparameters that lead to a compromise supported by the majority of
clients, while maintaining scalability and independence from specific learning
tasks. We prove that DP-HYPE preserves the strong notion of differential
privacy called client-level differential privacy and, importantly, show that
its privacy guarantees do not depend on the number of hyperparameters. We also
provide bounds on its utility guarantees, that is, the probability of reaching
a compromise, and implement DP-HYPE as a submodule in the popular Flower
framework for distributed machine learning. In addition, we evaluate
performance on multiple benchmark data sets in iid as well as multiple non-iid
settings and demonstrate high utility of DP-HYPE even under small privacy
budgets.
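
A minimal sketch of the noisy-voting idea: each client votes for its locally best candidate and the vote histogram is privatized before taking the argmax. The Laplace-on-histogram mechanism and the sensitivity bound are generic assumptions, not the exact DP-HYPE protocol or its client-level guarantee analysis.

```python
import numpy as np

def dp_hyperparameter_vote(local_scores, eps, rng=None):
    """local_scores[c, h] is client c's utility estimate for candidate h.
    Each client casts one vote for its best candidate; changing one client
    moves at most one unit between two histogram counts (L1 sensitivity 2),
    so Laplace(2/eps) noise on the counts gives eps-DP, and the noisy
    argmax is released."""
    rng = np.random.default_rng() if rng is None else rng
    votes = np.argmax(local_scores, axis=1)
    hist = np.bincount(votes, minlength=local_scores.shape[1]).astype(float)
    noisy_hist = hist + rng.laplace(0.0, 2.0 / eps, size=hist.shape)
    return int(np.argmax(noisy_hist))
```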
differentialprivacy.org
Federated Computation of ROC and PR Curves

Xuefeng Xu, Graham Cormode

http://arxiv.org/abs/2510.04979

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are
fundamental tools for evaluating machine learning classifiers, offering
detailed insights into the trade-off between true positive rate and false
positive rate (ROC) or between precision and recall (PR). However, in Federated
Learning (FL) scenarios, where data is distributed across multiple clients,
computing these curves is challenging due to privacy and communication
constraints. Specifically, the server cannot access raw prediction scores and
class labels, which are used to compute the ROC and PR curves in a centralized
setting. In this paper, we propose a novel method for approximating ROC and PR
curves in a federated setting by estimating quantiles of the prediction score
distribution under distributed differential privacy. We provide theoretical
bounds on the Area Error (AE) between the true and estimated curves,
demonstrating the trade-offs between approximation accuracy, privacy, and
communication cost. Empirical results on real-world datasets demonstrate that
our method achieves high approximation accuracy with minimal communication and
strong privacy guarantees, making it practical for privacy-preserving model
evaluation in federated systems.
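
One simple way to get DP ROC points from score data is via noisy per-class score histograms, a cruder cousin of the quantile-based estimator in the paper. The binning, the Laplace noise, and the sensitivity bound below are illustrative assumptions, and the federated aggregation of the counts is omitted.

```python
import numpy as np

def dp_roc_points(scores, labels, eps, bins=50, rng=None):
    """Approximate ROC points from noisy per-class score histograms.
    One example contributes a single count to one of the two histograms,
    so Laplace(2/eps) noise on all counts suffices for eps-DP; TPR/FPR at
    each threshold are tail sums of the noisy histograms (post-processing)."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.linspace(0.0, 1.0, bins + 1)
    pos, _ = np.histogram(scores[labels == 1], bins=edges)
    neg, _ = np.histogram(scores[labels == 0], bins=edges)
    pos = np.maximum(pos + rng.laplace(0.0, 2.0 / eps, bins), 0.0)
    neg = np.maximum(neg + rng.laplace(0.0, 2.0 / eps, bins), 0.0)
    tpr = np.cumsum(pos[::-1]) / max(pos.sum(), 1e-12)  # sweep thresholds high -> low
    fpr = np.cumsum(neg[::-1]) / max(neg.sum(), 1e-12)
    return fpr, tpr
```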
differentialprivacy.org
Differentially Private Wasserstein Barycenters

Anming Gu, Sasidhar Kunapuli, Mark Bun, Edward Chien, Kristjan Greenewald

http://arxiv.org/abs/2510.03021

The Wasserstein barycenter is defined as the mean of a set of probability
measures under the optimal transport metric, and has numerous applications
spanning machine learning, statistics, and computer graphics. In practice these
input measures are empirical distributions built from sensitive datasets,
motivating a differentially private (DP) treatment. We present, to our
knowledge, the first algorithms for computing Wasserstein barycenters under
differential privacy. Empirically, on synthetic data, MNIST, and large-scale
U.S. population datasets, our methods produce high-quality private barycenters
with strong accuracy-privacy tradeoffs.
differentialprivacy.org
Private Learning of Littlestone Classes, Revisited

Xin Lyu

http://arxiv.org/abs/2510.00076

We consider online and PAC learning of Littlestone classes subject to the
constraint of approximate differential privacy. Our main result is a private
learner that online-learns a Littlestone class with a mistake bound of
$\tilde{O}(d^{9.5}\cdot \log(T))$ in the realizable case, where $d$ denotes the
Littlestone dimension and $T$ the time horizon. This is a doubly-exponential
improvement over the state-of-the-art [GL'21] and comes polynomially close to
the lower bound for this task.
  The advancement is made possible by a couple of ingredients. The first is a
clean and refined interpretation of the "irreducibility" technique from the
state-of-the-art private PAC-learner for Littlestone classes [GGKM'21]. Our new
perspective also allows us to improve the PAC-learner of [GGKM'21] and give a
sample complexity upper bound of $\widetilde{O}(\frac{d^5
\log(1/\delta\beta)}{\varepsilon \alpha})$ where $\alpha$ and $\beta$ denote
the accuracy and confidence of the PAC learner, respectively. This improves
over [GGKM'21] by factors of $\frac{d}{\alpha}$ and attains an optimal
dependence on $\alpha$.
Our algorithm uses a private sparse selection algorithm to sample from
a pool of strongly input-dependent candidates. However, unlike most previous
uses of sparse selection algorithms, where one only cares about the utility of
output, our algorithm requires understanding and manipulating the actual
distribution from which an output is drawn. In the proof, we use a sparse
version of the Exponential Mechanism from [GKM'21] which behaves nicely under
our framework and is amenable to a very easy utility proof.
differentialprivacy.org
Privately Estimating Black-Box Statistics

Günter F. Steinke, Thomas Steinke

http://arxiv.org/abs/2510.00322

Standard techniques for differentially private estimation, such as Laplace or
Gaussian noise addition, require guaranteed bounds on the sensitivity of the
estimator in question. But such sensitivity bounds are often large or simply
unknown. Thus we seek differentially private methods that can be applied to
arbitrary black-box functions. A handful of such techniques exist, but all are
either inefficient in their use of data or require evaluating the function on
exponentially many inputs. In this work we present a scheme that trades off
between statistical efficiency (i.e., how much data is needed) and oracle
efficiency (i.e., the number of evaluations). We also present lower bounds
showing the near-optimality of our scheme.
differentialprivacy.org
Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings

Bo Li, Wei Wang, Peng Ye

http://arxiv.org/abs/2510.00574

We revisit the problem of private online learning, in which a learner
receives a sequence of $T$ data points and has to respond with a hypothesis at
each time-step. It is required that the entire stream of output hypotheses
satisfy differential privacy. Prior work of Golowich and Livni [2021]
established that every concept class $\mathcal{H}$ with finite Littlestone
dimension $d$ is privately online learnable in the realizable setting. In
particular, they proposed an algorithm that achieves an $O_{d}(\log T)$ mistake
bound against an oblivious adversary. However, their approach yields a
suboptimal $\tilde{O}_{d}(\sqrt{T})$ bound against an adaptive adversary. In
this work, we present a new algorithm with a mistake bound of $O_{d}(\log T)$
against an adaptive adversary, closing this gap. We further investigate the
problem in the agnostic setting, which is more general than the realizable
setting as it does not impose any assumptions on the data. We give an algorithm
that obtains a sublinear regret of $\tilde{O}_d(\sqrt{T})$ for generic
Littlestone classes, demonstrating that they are also privately online
learnable in the agnostic setting.
differentialprivacy.org
Differentially Private Learning of Exponential Distributions: Adaptive Algorithms and Tight Bounds

Bar Mahpud, Or Sheffet

http://arxiv.org/abs/2510.00790

We study the problem of learning exponential distributions under differential
privacy. Given $n$ i.i.d. samples from $\mathrm{Exp}(\lambda)$, the goal is to
privately estimate $\lambda$ so that the learned distribution is close in total
variation distance to the truth. We present two complementary pure DP
algorithms: one adapts the classical maximum likelihood estimator via clipping
and Laplace noise, while the other leverages the fact that the
$(1-1/e)$-quantile equals $1/\lambda$. Each method excels in a different
regime, and we combine them into an adaptive best-of-both algorithm achieving
near-optimal sample complexity for all $\lambda$. We further extend our
approach to Pareto distributions via a logarithmic reduction, prove nearly
matching lower bounds using packing and group privacy [Karwa and Vadhan, 2017],
and show how approximate $(\epsilon,\delta)$-DP removes the need for externally
supplied bounds. Together, these results give the first tight characterization
of exponential distribution learning under DP and illustrate the power of
adaptive strategies for heavy-tailed laws.
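
The clipped-mean flavor of the estimator is easy to sketch: clip, add Laplace noise scaled to the mean's sensitivity, and invert. The clipping bound is assumed to be supplied externally, which is exactly the assumption the paper's approximate-DP variant removes; the quantile-based estimator is only mentioned in a comment.

```python
import numpy as np

def dp_exponential_rate(samples, eps, clip, rng=None):
    """Pure eps-DP estimate of lambda for Exp(lambda): clip samples to
    [0, clip], release the clipped mean with Laplace noise (changing one
    of the n samples moves the mean by at most clip/n), and invert.
    The paper's second estimator instead uses the fact that the
    (1 - 1/e)-quantile of Exp(lambda) equals 1/lambda."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(samples)
    noisy_mean = np.clip(samples, 0.0, clip).mean() + rng.laplace(0.0, clip / (n * eps))
    return 1.0 / max(noisy_mean, 1e-12)
```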
differentialprivacy.org
Adaptive Federated Few-Shot Rare-Disease Diagnosis with Energy-Aware Secure Aggregation

Aueaphum Aueawatthanaphisut

http://arxiv.org/abs/2510.00976

Rare-disease diagnosis remains one of the most pressing challenges in digital
health, hindered by extreme data scarcity, privacy concerns, and the limited
resources of edge devices. This paper proposes the Adaptive Federated Few-Shot
Rare-Disease Diagnosis (AFFR) framework, which integrates three pillars: (i)
few-shot federated optimization with meta-learning to generalize from limited
patient samples, (ii) energy-aware client scheduling to mitigate device
dropouts and ensure balanced participation, and (iii) secure aggregation with
calibrated differential privacy to safeguard sensitive model updates. Unlike
prior work that addresses these aspects in isolation, AFFR unifies them into a
modular pipeline deployable on real-world clinical networks. Experimental
evaluation on simulated rare-disease detection datasets demonstrates up to 10%
improvement in accuracy compared with baseline FL, while reducing client
dropouts by over 50% without degrading convergence. Furthermore,
privacy-utility trade-offs remain within clinically acceptable bounds. These
findings highlight AFFR as a practical pathway for equitable and trustworthy
federated diagnosis of rare conditions.
differentialprivacy.org
Privacy Preserved Federated Learning with Attention-Based Aggregation for Biometric Recognition

Kassahun Azezew, Minyechil Alehegn, Tsega Asresa, Bitew Mekuria, Tizazu Bayh, Ayenew Kassie, Amsalu Tesema, Animut Embiyale

http://arxiv.org/abs/2510.01113

Biometric recognition is essential for contemporary applications, but because
biometric data is sensitive, centralized training poses a privacy risk.
Federated learning (FL), which permits decentralized training,
provides a privacy-preserving substitute. Conventional FL, however, has trouble
with interpretability and heterogeneous data (non-IID). In order to handle
non-IID biometric data, the proposed A3-FL framework adds an attention mechanism at the
central server that weights local model updates according to their
significance. Differential privacy and secure update protocols safeguard data
while preserving accuracy. The A3-FL framework is evaluated in this study using
FVC2004 fingerprint data, with each client's features extracted using a Siamese
Convolutional Neural Network (Siamese-CNN). By dynamically modifying client
contributions, the attention mechanism increases the accuracy of the global
model. The accuracy, convergence speed, and robustness of the A3-FL framework
are superior to those of standard FL (FedAvg) and static baselines, according
to experimental evaluations using fingerprint data (FVC2004). The accuracy of
the attention-based approach was 0.8413, while FedAvg, Local-only, and
Centralized approaches were 0.8164, 0.7664, and 0.7997, respectively. Accuracy
stayed high at 0.8330 even with differential privacy. A scalable and
privacy-sensitive biometric system for secure and effective recognition in
dispersed environments is presented in this work.
differentialprivacy.org
Sample-Efficient Differentially Private Fine-Tuning via Gradient Matrix Denoising

Ali Dadsetan, Frank Rudzicz

http://arxiv.org/abs/2510.01137

We address the challenge of sample efficiency in differentially private
fine-tuning of large language models (LLMs) using DP-SGD. While DP-SGD provides
strong privacy guarantees, the added noise significantly increases the entropy
of gradient matrices, disrupting their low-rank structure and slowing
optimization. We propose a post-processing algorithm that leverages random
matrix theory to denoise gradients, restore low-rank structure, and improve
alignment with the original signal. Applied to DP-SGD fine-tuning of RoBERTa on
GLUE tasks, our method improves sample efficiency compared to state-of-the-art
approaches, substantially reducing training time when optimal performance is
not required. This work demonstrates that matrix recovery techniques can
enhance the utility of private language model training without compromising
privacy guarantees.
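
A generic sketch of random-matrix denoising by singular-value thresholding at the Marchenko-Pastur bulk edge, which is one standard way to realize the "restore low-rank structure" step. The thresholding rule here is a textbook choice assumed for illustration, not necessarily the paper's algorithm.

```python
import numpy as np

def denoise_gradient(G, noise_std):
    """Hard-threshold the singular values of a noisy (p x q) gradient
    matrix at the Marchenko-Pastur bulk edge noise_std*(sqrt(p)+sqrt(q)):
    components below the edge are statistically indistinguishable from
    pure noise and are dropped. This is post-processing of an already
    privatized quantity, so it costs no extra privacy budget."""
    p, q = G.shape
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    edge = noise_std * (np.sqrt(p) + np.sqrt(q))
    return (U * np.where(s > edge, s, 0.0)) @ Vt
```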
differentialprivacy.org
SoftAdaClip: A Smooth Clipping Strategy for Fair and Private Model Training

Dorsa Soleymani, Ali Dadsetan, Frank Rudzicz

http://arxiv.org/abs/2510.01447

Differential privacy (DP) provides strong protection for sensitive data, but
often reduces model performance and fairness, especially for underrepresented
groups. One major reason is gradient clipping in DP-SGD, which can
disproportionately suppress learning signals for minority subpopulations.
Although adaptive clipping can enhance utility, it still relies on uniform hard
clipping, which may restrict fairness. To address this, we introduce
SoftAdaClip, a differentially private training method that replaces hard
clipping with a smooth, tanh-based transformation to preserve relative gradient
magnitudes while bounding sensitivity. We evaluate SoftAdaClip on various
datasets, including MIMIC-III (clinical text), GOSSIS-eICU (structured
healthcare), and Adult Income (tabular data). Our results show that SoftAdaClip
reduces subgroup disparities by up to 87% compared to DP-SGD and up to 48%
compared to Adaptive-DPSGD, and these reductions in subgroup disparities are
statistically significant. These findings underscore the importance of
integrating smooth transformations with adaptive mechanisms to achieve fair and
private model training.
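
The core idea, replacing the hard rescaling min(1, C/||g||) with a smooth tanh-based map that still bounds every per-example gradient norm by C, can be sketched as follows. The exact transformation used by SoftAdaClip may differ, so treat this as an illustrative smooth clip shown next to the standard hard clip.

```python
import numpy as np

def soft_clip(per_example_grads, clip):
    """Smooth tanh-based rescaling: g -> clip * tanh(||g||/clip) * g/||g||.
    The output norm is clip * tanh(||g||/clip) <= clip, so sensitivity stays
    bounded, while small and moderate gradients keep their relative
    magnitudes more faithfully than under hard clipping."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factor = clip * np.tanh(norms / clip) / (norms + 1e-12)
    return per_example_grads * factor

def hard_clip(per_example_grads, clip):
    """Standard DP-SGD clipping, min(1, clip/||g||), for comparison."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    return per_example_grads * np.minimum(1.0, clip / (norms + 1e-12))
```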
differentialprivacy.org
Private and Fair Machine Learning: Revisiting the Disparate Impact of Differentially Private SGD

Lea Demelius, Dominik Kowald, Simone Kopeinik, Roman Kern, Andreas Trügler

http://arxiv.org/abs/2510.01744

Differential privacy (DP) is a prominent method for protecting information
about individuals during data analysis. Training neural networks with
differentially private stochastic gradient descent (DPSGD) influences the
model's learning dynamics and, consequently, its output. This can affect the
model's performance and fairness. While the majority of studies on the topic
report a negative impact on fairness, it has recently been suggested that
fairness levels comparable to non-private models can be achieved by optimizing
hyperparameters for performance directly on differentially private models
(rather than re-using hyperparameters from non-private models, as is common
practice). In this work, we analyze the generalizability of this claim by 1)
comparing the disparate impact of DPSGD on different performance metrics, and
2) analyzing it over a wide range of hyperparameter settings. We highlight that
a disparate impact on one metric does not necessarily imply a disparate impact
on another. Most importantly, we show that while optimizing hyperparameters
directly on differentially private models does not mitigate the disparate
impact of DPSGD reliably, it can still lead to improved utility-fairness
trade-offs compared to re-using hyperparameters from non-private models. We
stress, however, that any form of hyperparameter tuning entails additional
privacy leakage, calling for careful considerations of how to balance privacy,
utility and fairness. Finally, we extend our analyses to DPSGD-Global-Adapt, a
variant of DPSGD designed to mitigate the disparate impact on accuracy, and
conclude that this alternative may not be a robust solution with respect to
hyperparameter choice.
differentialprivacy.org
Secure Multi-Modal Data Fusion in Federated Digital Health Systems via MCP

Aueaphum Aueawatthanaphisut

http://arxiv.org/abs/2510.01780

Secure and interoperable integration of heterogeneous medical data remains a
grand challenge in digital health. Current federated learning (FL) frameworks
offer privacy-preserving model training but lack standardized mechanisms to
orchestrate multi-modal data fusion across distributed and resource-constrained
environments. This study introduces a novel framework that leverages the Model
Context Protocol (MCP) as an interoperability layer for secure, cross-agent
communication in multi-modal federated healthcare systems. The proposed
architecture unifies three pillars: (i) multi-modal feature alignment for
clinical imaging, electronic medical records, and wearable IoT data; (ii)
secure aggregation with differential privacy to protect patient-sensitive
updates; and (iii) energy-aware scheduling to mitigate dropouts in mobile
clients. By employing MCP as a schema-driven interface, the framework enables
adaptive orchestration of AI agents and toolchains while ensuring compliance
with privacy regulations. Experimental evaluation on benchmark datasets and
pilot clinical cohorts demonstrates up to 9.8% improvement in diagnostic
accuracy compared with baseline FL, a 54% reduction in client dropout rates,
and clinically acceptable privacy-utility trade-offs. These results highlight
MCP-enabled multi-modal fusion as a scalable and trustworthy pathway toward
equitable, next-generation federated health infrastructures.
differentialprivacy.org
Private Federated Multiclass Post-hoc Calibration

Samuel Maddock, Graham Cormode, Carsten Maple

http://arxiv.org/abs/2510.01987

Calibrating machine learning models so that predicted probabilities better
reflect the true outcome frequencies is crucial for reliable decision-making
across many applications. In Federated Learning (FL), the goal is to train a
global model on data which is distributed across multiple clients and cannot be
centralized due to privacy concerns. FL is applied in key areas such as
healthcare and finance where calibration is strongly required, yet federated
private calibration has been largely overlooked. This work introduces the
integration of post-hoc model calibration techniques within FL. Specifically,
we transfer traditional centralized calibration methods such as histogram
binning and temperature scaling into federated environments and define new
methods to operate them under strong client heterogeneity. We study (1) a
federated setting and (2) a user-level Differential Privacy (DP) setting and
demonstrate how both federation and DP impact calibration accuracy. We propose
strategies to mitigate the degradation commonly observed under heterogeneity,
and our findings highlight that our federated temperature scaling works best
for DP-FL, whereas our weighted binning approach is best when DP is not
required.
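
As a concrete example of the binning side of this design space, a noisy-count version of histogram binning looks like the sketch below. The binning scheme, the Laplace noise, and the add/remove sensitivity bound are generic assumptions rather than the paper's weighted-binning or temperature-scaling methods, and the secure federated aggregation of the counts is omitted.

```python
import numpy as np

def dp_histogram_binning(confidences, correct, eps, bins=10, rng=None):
    """Calibration map from DP counts: for each confidence bin, release a
    noisy total count and a noisy correct count (one example adds 1 to one
    total and at most 1 to one correct count, so the joint L1 sensitivity
    is 2 under add/remove), then the calibrated probability per bin is
    noisy_correct / noisy_total, which is post-processing."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(confidences, edges) - 1, 0, bins - 1)
    total = np.bincount(idx, minlength=bins).astype(float)
    hits = np.bincount(idx, weights=correct.astype(float), minlength=bins)
    total += rng.laplace(0.0, 2.0 / eps, bins)
    hits += rng.laplace(0.0, 2.0 / eps, bins)
    return np.clip(hits / np.maximum(total, 1.0), 0.0, 1.0)
```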