paraacha
talhaparacha.com
Postdoc at Ruhr University Bochum. Research in network measurement and security. Experiences at Northeastern, TU Wien, Meta, Cloudflare, EPFL, Rutgers, NUST-SEECS
We also experiment with the temperature parameter to control the sampling strategy. We find that conservative (low-temperature) sampling makes more instances valid, at the expense of feature diversity, while the other extreme adds too much randomness, which also hurts testing. 9/n
December 16, 2025 at 7:42 PM
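A minimal sketch of what temperature does during sampling (the function name and logits are illustrative, not the paper's actual pipeline): logits are divided by the temperature before the softmax, so low temperatures sharpen the distribution (conservative, more valid outputs) and high temperatures flatten it (more diversity, more noise).

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample a token index from raw logits after temperature scaling.

    Low temperature sharpens the distribution (conservative sampling);
    high temperature flattens it (more randomness).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the softmax probabilities.
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# At very low temperature, sampling collapses to the argmax token.
random.seed(0)
print(sample_with_temperature([0.0, 5.0, 1.0], temperature=0.01))  # 1
```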
An example of a useful TLS certificate generated by our pipeline is shown here, where the date is set to June 31, 2037 (June has only 30 days!). This certificate is rejected by all TLS libraries except one (another, similar discrepancy was due to the use of leap seconds). 8/n
December 16, 2025 at 7:42 PM
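To illustrate why a strict validator should reject that certificate (this is not the paper's code, just a sketch using Python's standard library and an assumed GeneralizedTime-style date string): parsing the date against the calendar immediately exposes June 31 as nonexistent.

```python
from datetime import datetime

def is_valid_generalized_time(s):
    """Check whether a GeneralizedTime-style string (YYYYMMDDHHMMSSZ)
    denotes a real calendar date and time."""
    try:
        datetime.strptime(s, "%Y%m%d%H%M%SZ")
        return True
    except ValueError:
        return False

# June 31, 2037 does not exist, so a strict parser rejects it...
print(is_valid_generalized_time("20370631000000Z"))  # False
# ...while June 30 is fine.
print(is_valid_generalized_time("20370630000000Z"))  # True
```

A library that skips this calendar check (or normalizes the date silently) will accept what its peers reject, which is exactly the kind of disagreement differential testing surfaces.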
and (c) LLMs do not necessarily outperform RNNs in our experiments. We find the last part particularly interesting, given that RNNs have been available for over two decades and require substantially fewer resources. 7/n
December 16, 2025 at 7:42 PM
(b) several models outperform Transcert, the current state-of-the-art (with the main model used in our paper generating 30% more distinct discrepancies), 6/n
December 16, 2025 at 7:42 PM
We find that (a) our language models trigger a significant number of unique discrepancies (26 out of a maximum possible of 30) -- a discrepancy occurs when one TLS library accepts a certificate while others reject it, indicating a potential bug, 5/n
December 16, 2025 at 7:42 PM
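The notion of a unique discrepancy can be sketched as counting the distinct accept/reject patterns across libraries (the data layout and function name here are assumptions for illustration, not the paper's implementation):

```python
def find_discrepancies(verdicts):
    """Return the distinct disagreement patterns observed.

    verdicts: dict mapping certificate id -> tuple of booleans,
    one per TLS library (True = accepted).  A certificate exhibits
    a discrepancy when the tuple mixes True and False, i.e. the
    libraries disagree.
    """
    patterns = set()
    for cert_id, vs in verdicts.items():
        if any(vs) and not all(vs):  # at least one accepts, at least one rejects
            patterns.add(tuple(vs))
    return patterns

# Hypothetical verdicts from three libraries on three certificates.
example = {
    "cert-a": (True, True, True),    # unanimous accept: no discrepancy
    "cert-b": (True, False, False),  # only library 0 accepts: discrepancy
    "cert-c": (False, False, True),  # only library 2 accepts: discrepancy
}
print(len(find_discrepancies(example)))  # 2
```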
We train RNNs (small and medium-sized) and GPTs (fine-tuned and trained from scratch), since it is unclear which approach is better for testing, in contrast to just learning (a point also highlighted by Godefroid et al. in Learn&Fuzz; see the snippet below for a short discussion). 4/n
December 16, 2025 at 7:42 PM
Our insight is that language models learn a representation of the textual data they are trained on, and that the learned representation is probabilistic and often imperfect, meaning a sampled instance can considerably break expectations (and may thus help in testing). 3/n
December 16, 2025 at 7:42 PM
We train language models on datasets of real-world TLS certificates to generate synthetic instances for use in differential testing. 2/n
December 16, 2025 at 7:42 PM
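The differential-testing loop can be sketched as follows (a minimal illustration, assuming each validator is a callable from certificate bytes to an accept/reject boolean; the real harness wraps actual TLS library APIs):

```python
def differential_test(certificates, validators):
    """Feed every generated certificate to every validator and flag
    those on which the validators disagree.

    certificates: iterable of certificate blobs (e.g. DER bytes).
    validators: list of callables blob -> bool (True = accepted).
    Returns a list of (certificate, verdicts) pairs with disagreement.
    """
    flagged = []
    for cert in certificates:
        verdicts = [validate(cert) for validate in validators]
        if any(verdicts) and not all(verdicts):  # libraries disagree
            flagged.append((cert, verdicts))
    return flagged

# Toy validators standing in for real TLS libraries.
accept_all = lambda cert: True
strict = lambda cert: cert != b"malformed"

for cert, verdicts in differential_test([b"ok", b"malformed"], [accept_all, strict]):
    print(cert, verdicts)  # b'malformed' [True, False]
```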
Super excited to share that I'll present our latest research at ICSE 2026. Our work (co-authors Kyle Posluns, @kevin.borgolte.me, @lindorfer.in, and @proffnes.discuss.systems.ap.brid.gy) explores the use of language models for software testing in TLS certificate validation logic. 1/n
December 16, 2025 at 7:42 PM