Differential Privacy
@differentialprivacy.org
470 followers 0 following 850 posts
🤖 new arXiv preprints mentioning "differential privacy" or "differentially private" in the title/abstract/metadata + updates from https://differentialprivacy.org [Under construction.]
differentialprivacy.org
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)

Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma

http://arxiv.org/abs/2510.06719

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by
grounding them in external knowledge. However, its application in sensitive
domains is limited by privacy risks. Existing private RAG methods typically
rely on query-time differential privacy (DP), which requires repeated noise
injection and leads to accumulated privacy loss. To address this issue, we
propose DP-SynRAG, a framework that uses LLMs to generate differentially
private synthetic RAG databases. Unlike prior methods, the synthetic text can
be reused once created, thereby avoiding repeated noise injection and
additional privacy costs. To preserve essential information for downstream RAG
tasks, DP-SynRAG extends private prediction, which instructs LLMs to generate
text that mimics subsampled database records in a DP manner. Experiments show
that DP-SynRAG achieves superior performance to state-of-the-art private
RAG systems while maintaining a fixed privacy budget, offering a scalable
solution for privacy-preserving RAG.
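
The budget argument at the core of this abstract can be illustrated with basic sequential composition. A minimal sketch follows; the epsilon values and the accounting are illustrative assumptions, not the paper's accountant.

```python
# Illustrative budget comparison, not the paper's accountant: under basic
# sequential composition, query-time DP spends privacy budget on every RAG
# query, whereas a DP synthetic database is paid for once and every later
# retrieval is post-processing (no additional privacy cost).

def query_time_total_eps(eps_per_query: float, num_queries: int) -> float:
    """Total epsilon when noise is injected at every query."""
    return eps_per_query * num_queries

def synthetic_db_total_eps(eps_generation: float, num_queries: int) -> float:
    """Total epsilon when a DP synthetic database is generated once."""
    return eps_generation  # post-processing is free

for q in (10, 100, 1000):
    print(q, query_time_total_eps(0.5, q), synthetic_db_total_eps(4.0, q))
```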
differentialprivacy.org
Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency

Mohamed Seif, Antti Koskela, H. Vincent Poor, Andrea J. Goldsmith

http://arxiv.org/abs/2510.07136

We study the problem of spectral graph clustering under edge differential
privacy (DP). Specifically, we develop three mechanisms: (i) graph perturbation
via randomized edge flipping combined with adjacency matrix shuffling, which
enforces edge privacy while preserving key spectral properties of the graph.
Importantly, shuffling considerably amplifies the guarantees: whereas flipping
edges with a fixed probability alone provides only a constant epsilon edge DP
guarantee as the number of nodes grows, the shuffled mechanism achieves
(epsilon, delta) edge DP with parameters that tend to zero as the number of
nodes increases; (ii) private graph projection with additive Gaussian noise in a
lower-dimensional space to reduce dimensionality and computational complexity;
and (iii) a noisy power iteration method that distributes Gaussian noise across
iterations to ensure edge DP while maintaining convergence. Our analysis
provides rigorous privacy guarantees and a precise characterization of the
misclassification error rate. Experiments on synthetic and real-world networks
validate our theoretical analysis and illustrate the practical privacy-utility
trade-offs.
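
A minimal sketch of the randomized-response edge-flipping step, i.e. mechanism (i) without the shuffling amplification. The flip probability is the standard randomized-response calibration; everything else here is an illustrative assumption.

```python
import numpy as np

def flip_edges(adj: np.ndarray, eps: float, rng=None) -> np.ndarray:
    """Randomized response on the upper triangle of a symmetric 0/1
    adjacency matrix: each edge indicator is kept with probability
    exp(eps)/(1+exp(eps)) and flipped otherwise, which gives eps edge-DP
    for the flipping step alone (the paper's shuffling amplification is
    not reproduced here)."""
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    p_flip = 1.0 / (1.0 + np.exp(eps))
    upper = np.triu_indices(n, k=1)
    flips = rng.random(len(upper[0])) < p_flip
    noisy = adj.copy()
    noisy[upper] = np.where(flips, 1 - adj[upper], adj[upper])
    noisy[(upper[1], upper[0])] = noisy[upper]  # keep the matrix symmetric
    return noisy

if __name__ == "__main__":
    a = np.zeros((5, 5), dtype=int); a[0, 1] = a[1, 0] = 1
    print(flip_edges(a, eps=1.0))
```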
differentialprivacy.org
Cocoon: A System Architecture for Differentially Private Training with Correlated Noises

Donghwan Kim, Xin Gu, Jinho Baek, Timothy Lo, Younghoon Min, Kwangsik Shin, Jongryool Kim, Jongse Park, Kiwan Maeng

http://arxiv.org/abs/2510.07304

Machine learning (ML) models memorize and leak training data, causing serious
privacy issues to data owners. Training algorithms with differential privacy
(DP), such as DP-SGD, have been gaining attention as a solution. However,
DP-SGD adds noise at each training iteration, which degrades the accuracy of
the trained model. To improve accuracy, a new family of approaches adds
carefully designed correlated noises, so that the noises cancel each other out
across iterations. We perform, to the best of our knowledge, the first
extensive characterization study of these new mechanisms and show that they
incur non-negligible overheads when the model is large or uses large embedding
tables. Motivated by the analysis, we propose Cocoon, a hardware-software
co-designed framework for efficient training with correlated noises. Cocoon
accelerates models with embedding tables through pre-computing and storing
correlated noises in a coalesced format (Cocoon-Emb), and supports large models
through a custom near-memory processing device (Cocoon-NMP). On a real system
with an FPGA-based NMP device prototype, Cocoon improves the performance by
2.33-10.82x (Cocoon-Emb) and 1.55-3.06x (Cocoon-NMP).
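
The cancellation effect that motivates correlated-noise ("DP-MF style") training can be seen in a toy variance experiment. This sketch only illustrates why anti-correlated noise shrinks the noise in prefix sums of updates; it is not a DP mechanism or the paper's system.

```python
import numpy as np

# Toy demonstration: training mostly consumes prefix sums of noisy gradients,
# and anti-correlated noise z_t - a * z_{t-1} largely cancels inside those
# prefix sums, while i.i.d. noise accumulates as sqrt(T).
rng = np.random.default_rng(0)
T, sigma, a, trials = 1000, 1.0, 0.9, 2000
iid_final, corr_final = np.empty(trials), np.empty(trials)
for i in range(trials):
    z = rng.normal(0.0, sigma, size=T)
    corr = z - a * np.concatenate(([0.0], z[:-1]))
    iid_final[i], corr_final[i] = z.sum(), corr.sum()
print("std of final prefix sum, i.i.d. noise     :", iid_final.std())   # ~ sigma * sqrt(T)
print("std of final prefix sum, correlated noise :", corr_final.std())  # much smaller
```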
differentialprivacy.org
DP-Adam-AC: Privacy-preserving Fine-Tuning of Localizable Language Models Using Adam Optimization with Adaptive Clipping

Ruoxing Yang

http://arxiv.org/abs/2510.05288

Large language models (LLMs) such as ChatGPT have evolved into powerful and
ubiquitous tools. Fine-tuning on small datasets allows LLMs to acquire
specialized skills for specific tasks efficiently. Although LLMs provide great
utility in both general and task-specific use cases, they are limited by two
security-related concerns. First, traditional LLM hardware requirements make
them infeasible to run locally on consumer-grade devices. A remote network
connection with the LLM provider's server is usually required, making the
system vulnerable to network attacks. Second, fine-tuning an LLM for a
sensitive task may involve sensitive data. Non-private fine-tuning algorithms
produce models vulnerable to training data reproduction attacks. Our work
addresses these security concerns by enhancing differentially private
optimization algorithms and applying them to fine-tune localizable language
models. We introduce adaptable gradient clipping along with other engineering
enhancements to the standard DP-Adam optimizer to create DP-Adam-AC. We use our
optimizer to fine-tune examples of two localizable LLM designs: a small
language model (Qwen2.5-0.5B) and a 1.58-bit quantized model (Bitnet-b1.58-2B). We
demonstrate promising improvements in loss through experimentation with two
synthetic datasets.
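
A compact numpy sketch of the general recipe: per-example clipping, Gaussian noise, an Adam update, and a quantile-tracking clip adjustment. The adaptive rule and all hyperparameters here are common heuristics assumed for illustration, not necessarily the DP-Adam-AC algorithm of the paper; in a fully private implementation the clipped fraction would itself need to be privatized.

```python
import numpy as np

def dp_adam_step(per_example_grads, theta, m, v, t, clip,
                 sigma=1.0, lr=1e-3, beta1=0.9, beta2=0.999, eps_adam=1e-8,
                 target_quantile=0.5, eta_clip=0.2, rng=None):
    """One DP-Adam step: clip each per-example gradient to norm `clip`,
    add Gaussian noise (std sigma * clip) to the sum, and feed the noisy
    mean to a standard Adam update; then adjust `clip` toward a target
    quantile of the gradient norms (quantile-tracking heuristic)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(per_example_grads)
    norms = np.linalg.norm(per_example_grads, axis=1)
    factors = np.minimum(1.0, clip / (norms + 1e-12))
    clipped_sum = (per_example_grads * factors[:, None]).sum(axis=0)
    g = (clipped_sum + rng.normal(0.0, sigma * clip, size=theta.shape)) / n
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps_adam)
    clipped_frac = float(np.mean(norms > clip))  # would be privatized in a full DP pipeline
    clip = clip * np.exp(eta_clip * (clipped_frac - target_quantile))
    return theta, m, v, clip
```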
differentialprivacy.org
Correlating Cross-Iteration Noise for DP-SGD using Model Curvature

Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng

http://arxiv.org/abs/2510.05416

Differentially private stochastic gradient descent (DP-SGD) offers the
promise of training deep learning models while mitigating many privacy risks.
However, there is currently a large accuracy gap between DP-SGD and normal SGD
training. This has resulted in different lines of research investigating
orthogonal ways of improving privacy-preserving training. One such line of
work, known as DP-MF, correlates the privacy noise across different iterations
of stochastic gradient descent -- allowing later iterations to cancel out some
of the noise added to earlier iterations. In this paper, we study how to
improve this noise correlation. We propose a technique called NoiseCurve that
uses model curvature, estimated from public unlabeled data, to improve the
quality of this cross-iteration noise correlation. Our experiments on various
datasets, models, and privacy parameters show that the noise correlations
computed by NoiseCurve offer consistent and significant improvements in
accuracy over the correlation scheme used by DP-MF.
differentialprivacy.org
Power Mechanism: Private Tabular Representation Release for Model Agnostic Consumption

Praneeth Vepakomma, Kaustubh Ponkshe

http://arxiv.org/abs/2510.05581

Traditional collaborative learning approaches are based on sharing of model
weights between clients and a server. However, there are advantages to resource
efficiency through schemes based on sharing of embeddings (activations) created
from the data. Several differentially private methods have been developed for
sharing weights, but no such mechanisms exist so far for sharing embeddings. We
propose a framework that learns a privacy encoding network in conjunction
with a small utility generation network such that the final embeddings
generated from it are equipped with formal differential privacy guarantees.
These privatized embeddings are then shared with a more powerful server, that
learns a post-processing that results in a higher accuracy for machine learning
tasks. We show that our co-design of collaborative and private learning
requires only one round of privatized communication and less compute on the
client than traditional methods. The privatized embeddings that we share
from the client are agnostic to the type of model (deep learning, random
forests or XGBoost) used on the server in order to process these activations to
complete a task.
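
The basic "clip then add Gaussian noise" release of per-record embeddings, after which any server-side model is free post-processing, can be sketched as below. The sensitivity bookkeeping and the noise formula are standard Gaussian-mechanism choices assumed for illustration, not the paper's Power Mechanism or its trained encoder.

```python
import numpy as np

def privatize_embeddings(emb, clip, eps, delta, rng=None):
    """Clip each row (one record's embedding) to L2 norm `clip`, then add
    Gaussian noise calibrated with the classical (eps, delta) formula for
    L2 sensitivity 2*clip (one record can change only its own clipped row).
    The released matrix can be post-processed by any downstream model
    (deep net, random forest, XGBoost) with no further privacy cost."""
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    clipped = emb * np.minimum(1.0, clip / (norms + 1e-12))
    sens = 2.0 * clip
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return clipped + rng.normal(0.0, sigma, size=emb.shape)
```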
differentialprivacy.org
DP-SNP-TIHMM: Differentially Private, Time-Inhomogeneous Hidden Markov Models for Synthesizing Genome-Wide Association Datasets

Shadi Rahimian, Mario Fritz

http://arxiv.org/abs/2510.05777

Single nucleotide polymorphism (SNP) datasets are fundamental to genetic
studies but pose significant privacy risks when shared. The correlation of SNPs
with each other makes strong adversarial attacks possible, such as masked-value
reconstruction, kin inference, and membership inference attacks. Existing
privacy-preserving approaches either apply differential privacy to statistical
summaries of these datasets or offer complex methods that require
post-processing and the usage of a publicly available dataset to suppress or
selectively share SNPs.
  In this study, we introduce an innovative framework for generating synthetic
SNP sequence datasets using samples derived from time-inhomogeneous hidden
Markov models (TIHMMs). To preserve the privacy of the training data, we ensure
that each SNP sequence contributes only a bounded influence during training,
enabling strong differential privacy guarantees. Crucially, by operating on
full SNP sequences and bounding their gradient contributions, our method
directly addresses the privacy risks introduced by their inherent correlations.
  Through experiments conducted on the real-world 1000 Genomes dataset, we
demonstrate the efficacy of our method using privacy budgets of $\varepsilon
\in [1, 10]$ at $\delta=10^{-4}$. Notably, by allowing the transition models of
the HMM to be dependent on the location in the sequence, we significantly
enhance performance, enabling the synthetic datasets to closely replicate the
statistical properties of non-private datasets. This framework facilitates the
private sharing of genomic data while offering researchers exceptional
flexibility and utility.
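
The synthesis step, sampling sequences from a fitted time-inhomogeneous HMM, is pure post-processing of the DP-trained parameters and can be sketched as follows. The parameter layout (`init`, per-position `trans` and `emit` matrices) is an assumed representation for illustration.

```python
import numpy as np

def sample_snp_sequences(init, trans, emit, num_seqs, rng=None):
    """Draw synthetic sequences from a time-inhomogeneous HMM.
    init: (S,) initial state distribution; trans: list of L-1 (S, S)
    position-specific transition matrices; emit: list of L (S, V)
    position-specific emission matrices. Sampling from DP-trained
    parameters is post-processing, so the synthetic sequences inherit
    the model's differential privacy guarantee."""
    rng = np.random.default_rng() if rng is None else rng
    L, S = len(emit), len(init)
    seqs = np.empty((num_seqs, L), dtype=int)
    for i in range(num_seqs):
        state = rng.choice(S, p=init)
        for t in range(L):
            seqs[i, t] = rng.choice(emit[t].shape[1], p=emit[t][state])
            if t + 1 < L:
                state = rng.choice(S, p=trans[t][state])
    return seqs
```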
differentialprivacy.org
The Five Safes as a Privacy Context

James Bailie, Ruobin Gong

http://arxiv.org/abs/2510.05803

The Five Safes is a framework used by national statistical offices (NSOs) for
assessing and managing the disclosure risk of data sharing. This paper makes
two points: Firstly, the Five Safes can be understood as a specialization of a
broader concept, contextual integrity, to the
situation of statistical dissemination by an NSO. We demonstrate this by
mapping the five parameters of contextual integrity onto the five dimensions of
the Five Safes. Secondly, the Five Safes contextualizes narrow, technical
notions of privacy within a holistic risk assessment. We demonstrate this with
the example of differential privacy (DP). This contextualization allows NSOs to
place DP within their Five Safes toolkit while also guiding the design of DP
implementations within the broader privacy context, as delineated by both their
regulation and the relevant social norms.
differentialprivacy.org
Distributed Platoon Control Under Quantization: Stability Analysis and Privacy Preservation

Kaixiang Zhang, Zhaojian Li, Wei Lin

http://arxiv.org/abs/2510.05959

Distributed control of connected and automated vehicles has attracted
considerable interest for its potential to improve traffic efficiency and
safety. However, such control schemes require sharing privacy-sensitive vehicle
data, which introduces risks of information leakage and potential malicious
activities. This paper investigates the stability and privacy-preserving
properties of distributed platoon control under two types of quantizers:
deterministic and probabilistic. For deterministic quantization, we show that
the resulting control strategy ensures the system errors remain uniformly
ultimately bounded. Moreover, in the absence of auxiliary information, an
eavesdropper cannot uniquely infer sensitive vehicle states. In contrast, the
use of probabilistic quantization enables asymptotic convergence of the vehicle
platoon in expectation with bounded variance. Importantly, probabilistic
quantizers can satisfy differential privacy guarantees, thereby preserving
privacy even when the eavesdropper possesses arbitrary auxiliary information.
We further analyze the trade-off between control performance and privacy by
formulating an optimization problem that characterizes the impact of the
quantization step on both metrics. Numerical simulations are provided to
illustrate the performance differences between the two quantization strategies.
differentialprivacy.org
Privacy Enhancement in Over-the-Air Federated Learning via Adaptive Receive Scaling

Faeze Moradi Kalarde, Ben Liang, Min Dong, Yahia A. Eldemerdash Ahmed, Ho Ting Cheng

http://arxiv.org/abs/2510.03860

In Federated Learning (FL) with over-the-air aggregation, the quality of the
signal received at the server critically depends on the receive scaling
factors. While a larger scaling factor can reduce the effective noise power and
improve training performance, it also compromises the privacy of devices by
reducing uncertainty. In this work, we aim to adaptively design the receive
scaling factors across training rounds to balance the trade-off between
training convergence and privacy in an FL system under dynamic channel
conditions. We formulate a stochastic optimization problem that minimizes the
overall R\'enyi differential privacy (RDP) leakage over the entire training
process, subject to a long-term constraint that ensures convergence of the
global loss function. Our problem depends on unknown future information, and we
observe that standard Lyapunov optimization is not applicable. Thus, we develop
a new online algorithm, termed AdaScale, based on a sequence of novel per-round
problems that can be solved efficiently. We further derive upper bounds on the
dynamic regret and constraint violation of AdaScale, establishing that it
achieves diminishing dynamic regret in terms of time-averaged RDP leakage while
ensuring convergence of FL training to a stationary point. Numerical
experiments on canonical classification tasks show that our approach
effectively reduces RDP and DP leakages compared with state-of-the-art
benchmarks without compromising learning performance.
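
The way per-round receive scaling maps to cumulative Rényi-DP leakage can be illustrated with the standard Gaussian-mechanism RDP bound and additive composition. The mapping from the scaling factor to effective noise used below is a simplifying assumption, not the paper's channel model or its online algorithm.

```python
import numpy as np

def total_rdp_leakage(alpha, sensitivity, channel_noise_std, scalings):
    """Cumulative RDP over rounds: round t has effective noise std
    channel_noise_std / scaling_t (a larger receive scaling suppresses the
    effective noise, improving accuracy but leaking more), and the Gaussian
    mechanism satisfies (alpha, alpha * sens^2 / (2 sigma^2))-RDP, which
    composes by summation across rounds."""
    sigmas = channel_noise_std / np.asarray(scalings, dtype=float)
    return float(np.sum(alpha * sensitivity**2 / (2.0 * sigmas**2)))

print(total_rdp_leakage(alpha=8, sensitivity=1.0, channel_noise_std=4.0,
                        scalings=[1.0, 1.5, 2.0, 2.0]))
```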
differentialprivacy.org
Multi-Class Support Vector Machine with Differential Privacy

Jinseong Park, Yujin Choi, Jaewook Lee

http://arxiv.org/abs/2510.04027

With the increasing need to safeguard data privacy in machine learning
models, differential privacy (DP) is one of the major frameworks to build
privacy-preserving models. Support Vector Machines (SVMs) are widely used
traditional machine learning models due to their robust margin guarantees and
strong empirical performance in binary classification. However, applying DP to
multi-class SVMs is inadequate, as the standard one-versus-rest (OvR) and
one-versus-one (OvO) approaches repeatedly query each data sample when building
multiple binary classifiers, thus consuming the privacy budget proportionally
to the number of classes. To overcome this limitation, we explore all-in-one
SVM approaches for DP, which access each data sample only once to construct
multi-class SVM boundaries with margin maximization properties. We propose a
novel differentially Private Multi-class SVM (PMSVM) with weight and gradient
perturbation methods, providing rigorous sensitivity and convergence analyses
to ensure DP in all-in-one SVMs. Empirical results demonstrate that our
approach surpasses existing DP-SVM methods in multi-class scenarios.
differentialprivacy.org
DP-HYPE: Distributed Differentially Private Hyperparameter Search

Johannes Liebenow, Thorsten Peinemann, Esfandiar Mohammadi

http://arxiv.org/abs/2510.04902

The tuning of hyperparameters in distributed machine learning can
substantially impact model performance. When the hyperparameters are tuned on
sensitive data, privacy becomes an important challenge; to this end,
differential privacy has emerged as the de facto standard for provable privacy.
A standard setting when performing distributed learning tasks is that clients
agree on a shared setup, i.e., find a compromise from a set of hyperparameters,
like the learning rate of the model to be trained. Yet, prior work on
differentially private hyperparameter tuning either uses computationally
expensive cryptographic protocols, determines hyperparameters separately for
each client, or applies differential privacy locally, which can lead to
undesirable utility-privacy trade-offs.
  In this work, we present our algorithm DP-HYPE, which performs a distributed
and privacy-preserving hyperparameter search by conducting a distributed voting
based on local hyperparameter evaluations of clients. In this way, DP-HYPE
selects hyperparameters that lead to a compromise supported by the majority of
clients, while maintaining scalability and independence from specific learning
tasks. We prove that DP-HYPE preserves the strong notion of differential
privacy called client-level differential privacy and, importantly, show that
its privacy guarantees do not depend on the number of hyperparameters. We also
provide bounds on its utility guarantees, that is, the probability of reaching
a compromise, and implement DP-HYPE as a submodule in the popular Flower
framework for distributed machine learning. In addition, we evaluate
performance on multiple benchmark data sets in iid as well as multiple non-iid
settings and demonstrate high utility of DP-HYPE even under small privacy
budgets.
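
A minimal sketch of the noisy-voting idea: each client votes for its locally best candidate and the vote histogram is privatized before taking the argmax. The Laplace-on-histogram mechanism and the sensitivity bound are generic assumptions, not the exact DP-HYPE protocol or its client-level guarantee analysis.

```python
import numpy as np

def dp_hyperparameter_vote(local_scores, eps, rng=None):
    """local_scores[c, h] is client c's utility estimate for candidate h.
    Each client casts one vote for its best candidate; changing one client
    moves at most one unit between two histogram counts (L1 sensitivity 2),
    so Laplace(2/eps) noise on the counts gives eps-DP, and the noisy
    argmax is released."""
    rng = np.random.default_rng() if rng is None else rng
    votes = np.argmax(local_scores, axis=1)
    hist = np.bincount(votes, minlength=local_scores.shape[1]).astype(float)
    noisy_hist = hist + rng.laplace(0.0, 2.0 / eps, size=hist.shape)
    return int(np.argmax(noisy_hist))
```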
differentialprivacy.org
Federated Computation of ROC and PR Curves

Xuefeng Xu, Graham Cormode

http://arxiv.org/abs/2510.04979

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are
fundamental tools for evaluating machine learning classifiers, offering
detailed insights into the trade-off between true positive rate and false
positive rate (ROC) or between precision and recall (PR). However, in Federated
Learning (FL) scenarios, where data is distributed across multiple clients,
computing these curves is challenging due to privacy and communication
constraints. Specifically, the server cannot access raw prediction scores and
class labels, which are used to compute the ROC and PR curves in a centralized
setting. In this paper, we propose a novel method for approximating ROC and PR
curves in a federated setting by estimating quantiles of the prediction score
distribution under distributed differential privacy. We provide theoretical
bounds on the Area Error (AE) between the true and estimated curves,
demonstrating the trade-offs between approximation accuracy, privacy, and
communication cost. Empirical results on real-world datasets demonstrate that
our method achieves high approximation accuracy with minimal communication and
strong privacy guarantees, making it practical for privacy-preserving model
evaluation in federated systems.
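
One simple way to get DP ROC points from score data is via noisy per-class score histograms, a cruder cousin of the quantile-based estimator in the paper. The binning, the Laplace noise, and the sensitivity bound below are illustrative assumptions, and the federated aggregation of the counts is omitted.

```python
import numpy as np

def dp_roc_points(scores, labels, eps, bins=50, rng=None):
    """Approximate ROC points from noisy per-class score histograms.
    One example contributes a single count to one of the two histograms,
    so Laplace(2/eps) noise on all counts suffices for eps-DP; TPR/FPR at
    each threshold are tail sums of the noisy histograms (post-processing)."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.linspace(0.0, 1.0, bins + 1)
    pos, _ = np.histogram(scores[labels == 1], bins=edges)
    neg, _ = np.histogram(scores[labels == 0], bins=edges)
    pos = np.maximum(pos + rng.laplace(0.0, 2.0 / eps, bins), 0.0)
    neg = np.maximum(neg + rng.laplace(0.0, 2.0 / eps, bins), 0.0)
    tpr = np.cumsum(pos[::-1]) / max(pos.sum(), 1e-12)  # sweep thresholds high -> low
    fpr = np.cumsum(neg[::-1]) / max(neg.sum(), 1e-12)
    return fpr, tpr
```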
differentialprivacy.org
Differentially Private Wasserstein Barycenters

Anming Gu, Sasidhar Kunapuli, Mark Bun, Edward Chien, Kristjan Greenewald

http://arxiv.org/abs/2510.03021

The Wasserstein barycenter is defined as the mean of a set of probability
measures under the optimal transport metric, and has numerous applications
spanning machine learning, statistics, and computer graphics. In practice these
input measures are empirical distributions built from sensitive datasets,
motivating a differentially private (DP) treatment. We present, to our
knowledge, the first algorithms for computing Wasserstein barycenters under
differential privacy. Empirically, on synthetic data, MNIST, and large-scale
U.S. population datasets, our methods produce high-quality private barycenters
with strong accuracy-privacy tradeoffs.
differentialprivacy.org
Private Learning of Littlestone Classes, Revisited

Xin Lyu

http://arxiv.org/abs/2510.00076

We consider online and PAC learning of Littlestone classes subject to the
constraint of approximate differential privacy. Our main result is a private
learner that online-learns a Littlestone class with a mistake bound of
$\tilde{O}(d^{9.5}\cdot \log(T))$ in the realizable case, where $d$ denotes the
Littlestone dimension and $T$ the time horizon. This is a doubly-exponential
improvement over the state-of-the-art [GL'21] and comes polynomially close to
the lower bound for this task.
  The advancement is made possible by a couple of ingredients. The first is a
clean and refined interpretation of the "irreducibility" technique from the
state-of-the-art private PAC-learner for Littlestone classes [GGKM'21]. Our new
perspective also allows us to improve the PAC-learner of [GGKM'21] and give a
sample complexity upper bound of $\widetilde{O}(\frac{d^5
\log(1/\delta\beta)}{\varepsilon \alpha})$ where $\alpha$ and $\beta$ denote
the accuracy and confidence of the PAC learner, respectively. This improves
over [GGKM'21] by factors of $\frac{d}{\alpha}$ and attains an optimal
dependence on $\alpha$.
Our algorithm uses a private sparse selection algorithm to sample from
a pool of strongly input-dependent candidates. However, unlike most previous
uses of sparse selection algorithms, where one only cares about the utility of
output, our algorithm requires understanding and manipulating the actual
distribution from which an output is drawn. In the proof, we use a sparse
version of the Exponential Mechanism from [GKM'21] which behaves nicely under
our framework and is amenable to a very easy utility proof.
differentialprivacy.org
Privately Estimating Black-Box Statistics

Günter F. Steinke, Thomas Steinke

http://arxiv.org/abs/2510.00322

Standard techniques for differentially private estimation, such as Laplace or
Gaussian noise addition, require guaranteed bounds on the sensitivity of the
estimator in question. But such sensitivity bounds are often large or simply
unknown. Thus we seek differentially private methods that can be applied to
arbitrary black-box functions. A handful of such techniques exist, but all are
either inefficient in their use of data or require evaluating the function on
exponentially many inputs. In this work we present a scheme that trades off
between statistical efficiency (i.e., how much data is needed) and oracle
efficiency (i.e., the number of evaluations). We also present lower bounds
showing the near-optimality of our scheme.
differentialprivacy.org
Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings

Bo Li, Wei Wang, Peng Ye

http://arxiv.org/abs/2510.00574

We revisit the problem of private online learning, in which a learner
receives a sequence of $T$ data points and has to respond with a hypothesis at
each time-step. It is required that the entire stream of output hypotheses
satisfy differential privacy. Prior work of Golowich and Livni [2021]
established that every concept class $\mathcal{H}$ with finite Littlestone
dimension $d$ is privately online learnable in the realizable setting. In
particular, they proposed an algorithm that achieves an $O_{d}(\log T)$ mistake
bound against an oblivious adversary. However, their approach yields a
suboptimal $\tilde{O}_{d}(\sqrt{T})$ bound against an adaptive adversary. In
this work, we present a new algorithm with a mistake bound of $O_{d}(\log T)$
against an adaptive adversary, closing this gap. We further investigate the
problem in the agnostic setting, which is more general than the realizable
setting as it does not impose any assumptions on the data. We give an algorithm
that obtains a sublinear regret of $\tilde{O}_d(\sqrt{T})$ for generic
Littlestone classes, demonstrating that they are also privately online
learnable in the agnostic setting.
differentialprivacy.org
Differentially Private Learning of Exponential Distributions: Adaptive Algorithms and Tight Bounds

Bar Mahpud, Or Sheffet

http://arxiv.org/abs/2510.00790

We study the problem of learning exponential distributions under differential
privacy. Given $n$ i.i.d. samples from $\mathrm{Exp}(\lambda)$, the goal is to
privately estimate $\lambda$ so that the learned distribution is close in total
variation distance to the truth. We present two complementary pure DP
algorithms: one adapts the classical maximum likelihood estimator via clipping
and Laplace noise, while the other leverages the fact that the
$(1-1/e)$-quantile equals $1/\lambda$. Each method excels in a different
regime, and we combine them into an adaptive best-of-both algorithm achieving
near-optimal sample complexity for all $\lambda$. We further extend our
approach to Pareto distributions via a logarithmic reduction, prove nearly
matching lower bounds using packing and group privacy [Karwa and Vadhan, 2017],
and show how approximate $(\epsilon,\delta)$-DP removes the need for externally
supplied bounds. Together, these results give the first tight characterization
of exponential distribution learning under DP and illustrate the power of
adaptive strategies for heavy-tailed laws.
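
The clipped-mean flavor of the estimator is easy to sketch: clip, add Laplace noise scaled to the mean's sensitivity, and invert. The clipping bound is assumed to be supplied externally, which is exactly the assumption the paper's approximate-DP variant removes; the quantile-based estimator is only mentioned in a comment.

```python
import numpy as np

def dp_exponential_rate(samples, eps, clip, rng=None):
    """Pure eps-DP estimate of lambda for Exp(lambda): clip samples to
    [0, clip], release the clipped mean with Laplace noise (changing one
    of the n samples moves the mean by at most clip/n), and invert.
    The paper's second estimator instead uses the fact that the
    (1 - 1/e)-quantile of Exp(lambda) equals 1/lambda."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(samples)
    noisy_mean = np.clip(samples, 0.0, clip).mean() + rng.laplace(0.0, clip / (n * eps))
    return 1.0 / max(noisy_mean, 1e-12)
```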
differentialprivacy.org
Adaptive Federated Few-Shot Rare-Disease Diagnosis with Energy-Aware Secure Aggregation

Aueaphum Aueawatthanaphisut

http://arxiv.org/abs/2510.00976

Rare-disease diagnosis remains one of the most pressing challenges in digital
health, hindered by extreme data scarcity, privacy concerns, and the limited
resources of edge devices. This paper proposes the Adaptive Federated Few-Shot
Rare-Disease Diagnosis (AFFR) framework, which integrates three pillars: (i)
few-shot federated optimization with meta-learning to generalize from limited
patient samples, (ii) energy-aware client scheduling to mitigate device
dropouts and ensure balanced participation, and (iii) secure aggregation with
calibrated differential privacy to safeguard sensitive model updates. Unlike
prior work that addresses these aspects in isolation, AFFR unifies them into a
modular pipeline deployable on real-world clinical networks. Experimental
evaluation on simulated rare-disease detection datasets demonstrates up to 10%
improvement in accuracy compared with baseline FL, while reducing client
dropouts by over 50% without degrading convergence. Furthermore,
privacy-utility trade-offs remain within clinically acceptable bounds. These
findings highlight AFFR as a practical pathway for equitable and trustworthy
federated diagnosis of rare conditions.
differentialprivacy.org
Privacy Preserved Federated Learning with Attention-Based Aggregation for Biometric Recognition

Kassahun Azezew, Minyechil Alehegn, Tsega Asresa, Bitew Mekuria, Tizazu Bayh, Ayenew Kassie, Amsalu Tesema, Animut Embiyale

http://arxiv.org/abs/2510.01113

Biometric recognition is essential for contemporary applications, but because
biometric data is sensitive, centralized training poses a privacy risk.
Federated learning (FL), which permits decentralized training,
provides a privacy-preserving substitute. Conventional FL, however, has trouble
with interpretability and heterogeneous data (non-IID). In order to handle
non-IID biometric data, the proposed A3-FL framework adds an attention mechanism at the
central server that weights local model updates according to their
significance. Differential privacy and secure update protocols safeguard data
while preserving accuracy. The A3-FL framework is evaluated in this study using
FVC2004 fingerprint data, with each client's features extracted using a Siamese
Convolutional Neural Network (Siamese-CNN). By dynamically modifying client
contributions, the attention mechanism increases the accuracy of the global
model. The accuracy, convergence speed, and robustness of the A3-FL framework
are superior to those of standard FL (FedAvg) and static baselines, according
to experimental evaluations using fingerprint data (FVC2004). The accuracy of
the attention-based approach was 0.8413, while FedAvg, Local-only, and
Centralized approaches were 0.8164, 0.7664, and 0.7997, respectively. Accuracy
stayed high at 0.8330 even with differential privacy. A scalable and
privacy-sensitive biometric system for secure and effective recognition in
dispersed environments is presented in this work.
differentialprivacy.org
Sample-Efficient Differentially Private Fine-Tuning via Gradient Matrix Denoising

Ali Dadsetan, Frank Rudzicz

http://arxiv.org/abs/2510.01137

We address the challenge of sample efficiency in differentially private
fine-tuning of large language models (LLMs) using DP-SGD. While DP-SGD provides
strong privacy guarantees, the added noise significantly increases the entropy
of gradient matrices, disrupting their low-rank structure and slowing
optimization. We propose a post-processing algorithm that leverages random
matrix theory to denoise gradients, restore low-rank structure, and improve
alignment with the original signal. Applied to DP-SGD fine-tuning of RoBERTa on
GLUE tasks, our method improves sample efficiency compared to state-of-the-art
approaches, substantially reducing training time when optimal performance is
not required. This work demonstrates that matrix recovery techniques can
enhance the utility of private language model training without compromising
privacy guarantees.
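
A generic sketch of random-matrix denoising by singular-value thresholding at the Marchenko-Pastur bulk edge, which is one standard way to realize the "restore low-rank structure" step. The thresholding rule here is a textbook choice assumed for illustration, not necessarily the paper's algorithm.

```python
import numpy as np

def denoise_gradient(G, noise_std):
    """Hard-threshold the singular values of a noisy (p x q) gradient
    matrix at the Marchenko-Pastur bulk edge noise_std*(sqrt(p)+sqrt(q)):
    components below the edge are statistically indistinguishable from
    pure noise and are dropped. This is post-processing of an already
    privatized quantity, so it costs no extra privacy budget."""
    p, q = G.shape
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    edge = noise_std * (np.sqrt(p) + np.sqrt(q))
    return (U * np.where(s > edge, s, 0.0)) @ Vt
```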
differentialprivacy.org
SoftAdaClip: A Smooth Clipping Strategy for Fair and Private Model Training

Dorsa Soleymani, Ali Dadsetan, Frank Rudzicz

http://arxiv.org/abs/2510.01447

Differential privacy (DP) provides strong protection for sensitive data, but
often reduces model performance and fairness, especially for underrepresented
groups. One major reason is gradient clipping in DP-SGD, which can
disproportionately suppress learning signals for minority subpopulations.
Although adaptive clipping can enhance utility, it still relies on uniform hard
clipping, which may restrict fairness. To address this, we introduce
SoftAdaClip, a differentially private training method that replaces hard
clipping with a smooth, tanh-based transformation to preserve relative gradient
magnitudes while bounding sensitivity. We evaluate SoftAdaClip on various
datasets, including MIMIC-III (clinical text), GOSSIS-eICU (structured
healthcare), and Adult Income (tabular data). Our results show that SoftAdaClip
reduces subgroup disparities by up to 87% compared to DP-SGD and up to 48%
compared to Adaptive-DPSGD, and these reductions in subgroup disparities are
statistically significant. These findings underscore the importance of
integrating smooth transformations with adaptive mechanisms to achieve fair and
private model training.
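
The core idea, replacing the hard rescaling min(1, C/||g||) with a smooth tanh-based map that still bounds every per-example gradient norm by C, can be sketched as follows. The exact transformation used by SoftAdaClip may differ, so treat this as an illustrative smooth clip shown next to the standard hard clip.

```python
import numpy as np

def soft_clip(per_example_grads, clip):
    """Smooth tanh-based rescaling: g -> clip * tanh(||g||/clip) * g/||g||.
    The output norm is clip * tanh(||g||/clip) <= clip, so sensitivity stays
    bounded, while small and moderate gradients keep their relative
    magnitudes more faithfully than under hard clipping."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factor = clip * np.tanh(norms / clip) / (norms + 1e-12)
    return per_example_grads * factor

def hard_clip(per_example_grads, clip):
    """Standard DP-SGD clipping, min(1, clip/||g||), for comparison."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    return per_example_grads * np.minimum(1.0, clip / (norms + 1e-12))
```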
differentialprivacy.org
Private and Fair Machine Learning: Revisiting the Disparate Impact of Differentially Private SGD

Lea Demelius, Dominik Kowald, Simone Kopeinik, Roman Kern, Andreas Trügler

http://arxiv.org/abs/2510.01744

Differential privacy (DP) is a prominent method for protecting information
about individuals during data analysis. Training neural networks with
differentially private stochastic gradient descent (DPSGD) influences the
model's learning dynamics and, consequently, its output. This can affect the
model's performance and fairness. While the majority of studies on the topic
report a negative impact on fairness, it has recently been suggested that
fairness levels comparable to non-private models can be achieved by optimizing
hyperparameters for performance directly on differentially private models
(rather than re-using hyperparameters from non-private models, as is common
practice). In this work, we analyze the generalizability of this claim by 1)
comparing the disparate impact of DPSGD on different performance metrics, and
2) analyzing it over a wide range of hyperparameter settings. We highlight that
a disparate impact on one metric does not necessarily imply a disparate impact
on another. Most importantly, we show that while optimizing hyperparameters
directly on differentially private models does not mitigate the disparate
impact of DPSGD reliably, it can still lead to improved utility-fairness
trade-offs compared to re-using hyperparameters from non-private models. We
stress, however, that any form of hyperparameter tuning entails additional
privacy leakage, calling for careful considerations of how to balance privacy,
utility and fairness. Finally, we extend our analyses to DPSGD-Global-Adapt, a
variant of DPSGD designed to mitigate the disparate impact on accuracy, and
conclude that this alternative may not be a robust solution with respect to
hyperparameter choice.
differentialprivacy.org
Secure Multi-Modal Data Fusion in Federated Digital Health Systems via MCP

Aueaphum Aueawatthanaphisut

http://arxiv.org/abs/2510.01780

Secure and interoperable integration of heterogeneous medical data remains a
grand challenge in digital health. Current federated learning (FL) frameworks
offer privacy-preserving model training but lack standardized mechanisms to
orchestrate multi-modal data fusion across distributed and resource-constrained
environments. This study introduces a novel framework that leverages the Model
Context Protocol (MCP) as an interoperability layer for secure, cross-agent
communication in multi-modal federated healthcare systems. The proposed
architecture unifies three pillars: (i) multi-modal feature alignment for
clinical imaging, electronic medical records, and wearable IoT data; (ii)
secure aggregation with differential privacy to protect patient-sensitive
updates; and (iii) energy-aware scheduling to mitigate dropouts in mobile
clients. By employing MCP as a schema-driven interface, the framework enables
adaptive orchestration of AI agents and toolchains while ensuring compliance
with privacy regulations. Experimental evaluation on benchmark datasets and
pilot clinical cohorts demonstrates up to 9.8% improvement in diagnostic
accuracy compared with baseline FL, a 54% reduction in client dropout rates,
and clinically acceptable privacy-utility trade-offs. These results highlight
MCP-enabled multi-modal fusion as a scalable and trustworthy pathway toward
equitable, next-generation federated health infrastructures.
differentialprivacy.org
Private Federated Multiclass Post-hoc Calibration

Samuel Maddock, Graham Cormode, Carsten Maple

http://arxiv.org/abs/2510.01987

Calibrating machine learning models so that predicted probabilities better
reflect the true outcome frequencies is crucial for reliable decision-making
across many applications. In Federated Learning (FL), the goal is to train a
global model on data which is distributed across multiple clients and cannot be
centralized due to privacy concerns. FL is applied in key areas such as
healthcare and finance where calibration is strongly required, yet federated
private calibration has been largely overlooked. This work introduces the
integration of post-hoc model calibration techniques within FL. Specifically,
we transfer traditional centralized calibration methods such as histogram
binning and temperature scaling into federated environments and define new
methods to operate them under strong client heterogeneity. We study (1) a
federated setting and (2) a user-level Differential Privacy (DP) setting and
demonstrate how both federation and DP impact calibration accuracy. We propose
strategies to mitigate the degradation commonly observed under heterogeneity,
and our findings highlight that our federated temperature scaling works best
for DP-FL, whereas our weighted binning approach is best when DP is not
required.
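
As a concrete example of the binning side of this design space, a noisy-count version of histogram binning looks like the sketch below. The binning scheme, the Laplace noise, and the add/remove sensitivity bound are generic assumptions rather than the paper's weighted-binning or temperature-scaling methods, and the secure federated aggregation of the counts is omitted.

```python
import numpy as np

def dp_histogram_binning(confidences, correct, eps, bins=10, rng=None):
    """Calibration map from DP counts: for each confidence bin, release a
    noisy total count and a noisy correct count (one example adds 1 to one
    total and at most 1 to one correct count, so the joint L1 sensitivity
    is 2 under add/remove), then the calibrated probability per bin is
    noisy_correct / noisy_total, which is post-processing."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(confidences, edges) - 1, 0, bins - 1)
    total = np.bincount(idx, minlength=bins).astype(float)
    hits = np.bincount(idx, weights=correct.astype(float), minlength=bins)
    total += rng.laplace(0.0, 2.0 / eps, bins)
    hits += rng.laplace(0.0, 2.0 / eps, bins)
    return np.clip(hits / np.maximum(total, 1.0), 0.0, 1.0)
```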