Underfox
@underfox3.bsky.social
760 followers 17 following 670 posts
Physicist, Telecom Engineering lover, HPC Enthusiast. Prog Rock/Metal fan. --- Independent tech analyst focused on semiconductors, patent analysis and emerging technologies.
Pinned
underfox3.bsky.social
I would like to invite all my followers to subscribe to my Substack. Everyone who loves technology and is hungry for new discoveries is welcome. Also, feel free to contact me!

Profile Link: underfox3.substack.com
underfox3.bsky.social
Each query dynamically selects a few informational blocks, as well as mandatory anchors, with causal routing that avoids loop closures. The model is able to allocate computation to relevant histories, preserving identities, actions, and scenes across minutes of content.
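To make the routing idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes each chunk is summarized by the mean of its key vectors, that relevance is a plain dot product, and that the mandatory anchors are chunk 0 plus the query's own chunk. The names `route_query`, `k` and `anchors` are hypothetical, chosen only for illustration.

```python
def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_query(query, chunk_keys, query_chunk, k=2, anchors=(0,)):
    """Return the sorted chunk indices a query is allowed to attend to.

    Assumption: chunks are indexed in temporal order, so "causal" means
    a query may only be routed to chunks at or before its own index.
    """
    # Mandatory links: the anchors (if not in the future) and the query's own chunk.
    selected = {a for a in anchors if a <= query_chunk} | {query_chunk}
    # Causal candidates: strictly earlier chunks that are not already mandatory.
    candidates = [c for c in range(query_chunk) if c not in selected]
    # Score each candidate by similarity between the query and the chunk summary.
    scores = {c: dot(query, mean_vector(chunk_keys[c])) for c in candidates}
    # Keep only the k most relevant earlier chunks.
    top = sorted(candidates, key=lambda c: scores[c], reverse=True)[:k]
    return sorted(selected | set(top))


# Five toy chunks of 2-D key vectors (made-up numbers).
chunk_keys = [
    [[0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.5, 0.5]],
    [[-1.0, 0.0]],
    [[0.0, 0.0]],
]
print(route_query([1.0, 0.0], chunk_keys, query_chunk=4, k=1))
```

Because candidates are restricted to earlier chunks, no chunk can ever be routed to one that comes after it, which is one simple way to rule out routing loops.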
underfox3.bsky.social
Researchers have proposed Mixture of Contexts, a long video generation framework that learns to route each query to the most relevant segments of the video sequence, instead of relying on uniform or static sparse attention or a fixed selection strategy.

arxiv.org/pdf/2508.21058
underfox3.bsky.social
This work could pave the way not only for automatic optimizations for ML and science kernels but also for the development of LLM-optimized AMD GPU drivers. Congrats to the authors for this excellent work.
underfox3.bsky.social
SwizzlePerf is the first work to add rich output from a suite of profilers to the LLM's context, so that cache-locality improvements are directly reflected and LLM-driven optimization improves.
underfox3.bsky.social
This isn't the first time AMD researchers have ventured into AI-powered GPU optimization. The biggest and most important difference is that this work takes hardware-awareness into account.
underfox3.bsky.social
By grouping cooperative blocks into a single XCD, the proposed workflow reduces off-chip traffic and stabilizes residency in the disaggregated caches, reducing the average energy per instruction even in kernels whose execution time is dominated by arithmetic throughput.
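As a rough illustration of what such a remapping can look like, here is a small Python sketch. It is not taken from the paper. It assumes the hardware hands out workgroups to XCDs round-robin (workgroup `pid` lands on XCD `pid % num_xcds`), that there are 8 XCDs, and that the block count is a multiple of the XCD count. The function names are hypothetical.

```python
def swizzle(pid, num_blocks, num_xcds=8):
    """Map a hardware workgroup id to a logical block id.

    Assumption: workgroup `pid` is dispatched to XCD `pid % num_xcds`.
    The remap gives each XCD one contiguous range of logical blocks,
    so blocks that share data end up behind the same L2 cache.
    """
    if num_blocks % num_xcds != 0:
        raise ValueError("sketch assumes num_blocks is a multiple of num_xcds")
    blocks_per_xcd = num_blocks // num_xcds
    xcd = pid % num_xcds    # which XCD this workgroup runs on
    slot = pid // num_xcds  # how many workgroups that XCD received before it
    return xcd * blocks_per_xcd + slot


def logical_blocks_per_xcd(num_blocks, num_xcds=8):
    """List, for every XCD, the logical blocks it executes after the remap."""
    groups = {x: [] for x in range(num_xcds)}
    for pid in range(num_blocks):
        groups[pid % num_xcds].append(swizzle(pid, num_blocks, num_xcds))
    return groups


groups = logical_blocks_per_xcd(16)
print(groups[0], groups[1])  # XCD 0 and XCD 1 each get neighbouring blocks
```

Without the remap, XCD 0 would run logical blocks 0 and 8; with it, XCD 0 runs blocks 0 and 1, which is the kind of grouping that keeps neighbouring tiles in one L2.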
underfox3.bsky.social
While the primary focus of the presented work was performance, it is clear that the same remapping will also have pronounced benefits in terms of energy efficiency.
underfox3.bsky.social
The results show that, across a wide range of ML and scientific GPU kernels, SwizzlePerf can achieve up to a 2.1x speedup and a 70% L2 hit rate improvement.
underfox3.bsky.social
This paper presents SwizzlePerf, an LLM workflow that automatically generates spatial optimizations for GPU kernels on disaggregated architectures by giving LLMs explicit hardware-awareness.

arxiv.org/pdf/2508.20258
underfox3.bsky.social
This work will be presented at the 58th IEEE/ACM International Symposium on Microarchitecture (MICRO 25), which will be held October 18-22, 2025, in Seoul, Korea.
underfox3.bsky.social
OmniSim is able to successfully simulate 11 designs previously unsupported by any HLS tool, achieving up to 35.9x speedup over traditional C/RTL co-simulation, and up to 6.61x speedup over the state-of-the-art yet less capable simulator, LightningSim, on its own benchmark suite.
underfox3.bsky.social
OmniSim carefully orchestrates functionality and performance simulation threads to accurately model hardware-level behavior under arbitrary OS scheduling, achieving near-C simulation speed with near-RTL accuracy for both functionality and performance.
underfox3.bsky.social
This paper presents OmniSim, a framework that extends the C-level simulation capability of HLS tools by enabling both functionality and performance simulation for complex dataflow designs that are currently unsupported or considered infeasible.

arxiv.org/pdf/2508.19299
underfox3.bsky.social
The implemented proof of concept is capable of demonstrating softmax computation and invertible logic without the need to create a network of probabilistic devices, offering major scalability advantages.
underfox3.bsky.social
For the first time, researchers reported the realization of multi-value probabilistic computing by leveraging the thermally activated diffusion of magnetic skyrmions through an effectively non-flat energy landscape defined by a discrete number of sites.

arxiv.org/pdf/2508.19623
underfox3.bsky.social
Excerpt from: Y. Wong, G. Zocchi, Spontaneous spiral patterns etched on Germanium, arXiv, 2025

Link: arxiv.org/pdf/2508.16764
underfox3.bsky.social
A thin metallic film on germanium, in the presence of water, results in a remarkable pattern-forming system, such as this beautiful spiral spontaneously etched on the surface with a total structure diameter of 680 μm.
underfox3.bsky.social
These findings represent a major step toward lower-power and faster spintronic devices for memory and logic applications, creating new possibilities for electrical modulation of spin dynamics and ultrafast spin injection into two-dimensional quantum materials.
underfox3.bsky.social
The experimental results, obtained with direct contacts as well as contacts involving tunnel barriers, show efficient gate control, with over 100% enhancement in the demagnetization rate compared to bare cobalt by modulating the junction resistance.
underfox3.bsky.social
Researchers have demonstrated graphene spin-field-effect junctions where the electric field can control ultrafast spin currents and spin dynamics in thin-film ferromagnets.

PRL link: journals.aps.org/prl/pdf/10.1...
underfox3.bsky.social
It is important to note that the experiments in this work also revealed that the topology-aware losses can help improve the geometry of the interpolated data.
underfox3.bsky.social
Given an input sequence of persistence diagrams and a sparse temporal sampling of the corresponding data, the proposed approach inverts the non-keyframe diagrams to produce plausible estimations of the missing data.
underfox3.bsky.social
In this paper, researchers have developed a neural approach for the topology-aware interpolation of scalar fields, with losses based on persistence diagrams that constrain the topology and geometry of the output interpolations.

arxiv.org/pdf/2508.17995
Reposted by Underfox
underfox3.bsky.social
This paper presents an ab-initio transistor simulation of unprecedented scale, including electron-electron interactions within the self-consistent GW approximation, carefully optimized to take advantage of the Alps and Frontier supercomputers. #HPC

arxiv.org/pdf/2508.19138