Lightnews — Scholar-powered news

Underfox

@underfox3.bsky.social

760 followers 17 following 670 posts

Physicist, Telecom Engineering lover, HPC Enthusiast. Prog Rock/Metal fan. --- Independent tech analyst focused on semiconductors, patent analysis and emerging technologies.

Pinned

Underfox @underfox3.bsky.social · Nov 28

I would like to invite all my followers to subscribe to my Substack. Everyone who loves technology and is hungry for new discoveries is welcome. Also, feel free to contact me!

Profile Link: underfox3.substack.com

Underfox @underfox3.bsky.social · Aug 29

Github:

primecai.github.io/moc/

Underfox @underfox3.bsky.social · Aug 29

Each query dynamically selects a few informational blocks, as well as mandatory anchors, with causal routing that avoids loop closures. The model is able to allocate computation to relevant histories, preserving identities, actions, and scenes across minutes of content.
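
The routing described here can be illustrated with a small sketch. It is not the authors' implementation: the function name `route_query`, the dot-product scoring against one summary key per block, and the choice to always include the query's own block are assumptions made for illustration. The causal rule is modeled by only considering blocks that precede the query's block, so no block can route to a later one.

```python
def route_query(query, block_keys, query_block, anchors, k):
    """Pick the blocks a query attends to (illustrative sketch only).

    query       -- query vector (list of floats)
    block_keys  -- one summary key vector per block (assumed, e.g. a mean key)
    query_block -- index of the block the query itself belongs to
    anchors     -- block indices that are always attended (mandatory anchors)
    k           -- number of extra blocks chosen by relevance
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Causal routing: only blocks strictly before the query's block are candidates.
    candidates = [i for i in range(query_block) if i not in anchors]

    # Rank candidates by similarity to the query and keep the top k.
    ranked = sorted(candidates, key=lambda i: dot(query, block_keys[i]), reverse=True)

    # Mandatory anchors and the query's own block are always included (assumption).
    selected = set(ranked[:k]) | set(anchors) | {query_block}
    return sorted(selected)


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
    print(route_query([1.0, 0.0], keys, query_block=4, anchors=[0], k=1))
```

With `k` much smaller than the number of blocks, each query touches only a few blocks instead of the whole history, which is where the compute savings in this kind of sparse routing would come from.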

Underfox @underfox3.bsky.social · Aug 29

Researchers have proposed Mixture of Contexts, a long video generation framework that learns to route each query to the most relevant segments of the video sequence, instead of relying on uniform or static sparse attention or a fixed selection strategy.

arxiv.org/pdf/2508.21058

Underfox @underfox3.bsky.social · Aug 29

This work could pave the way not only for automatic optimizations for ML and science kernels but also for the development of LLM-optimized AMD GPU drivers. Congrats to the authors for this excellent work.

Underfox @underfox3.bsky.social · Aug 29

SwizzlePerf is the first work that feeds rich output from a suite of profilers into the LLM's context to directly reflect cache-locality improvements and improve LLM optimization.

Underfox @underfox3.bsky.social · Aug 29

This isn't the first time AMD researchers have ventured into AI-powered GPU optimization. The biggest and most important difference is that this work takes hardware-awareness into account.

Underfox @underfox3.bsky.social · Aug 29

By grouping cooperative blocks into a single XCD, the proposed workflow reduces off-chip traffic and stabilizes residency in the disaggregated caches, reducing the average energy per instruction even in kernels whose execution time is dominated by arithmetic throughput.
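
To make the grouping idea concrete, here is a minimal sketch of one possible remapping. It assumes that the hardware hands out workgroup IDs to XCDs in round-robin order and that the block count divides evenly across XCDs; `swizzle_block_id` and its parameters are hypothetical names, and the swizzles SwizzlePerf generates may look different.

```python
def swizzle_block_id(pid, num_blocks, num_xcds=8):
    """Map a hardware workgroup ID to the logical tile it should process.

    Assumption: workgroup `pid` is scheduled on XCD `pid % num_xcds`
    (round-robin dispatch). The remap gives each XCD one contiguous range
    of logical tiles, so blocks that share data also share that XCD's cache.
    """
    if num_blocks % num_xcds != 0:
        raise ValueError("sketch assumes num_blocks is a multiple of num_xcds")

    blocks_per_xcd = num_blocks // num_xcds
    xcd = pid % num_xcds       # XCD this workgroup lands on (assumed)
    local = pid // num_xcds    # position of this workgroup within its XCD
    return xcd * blocks_per_xcd + local


if __name__ == "__main__":
    # With 16 blocks on 4 XCDs, the workgroups on XCD 0 are pids 0, 4, 8, 12.
    print([swizzle_block_id(p, 16, 4) for p in (0, 4, 8, 12)])
```

Under these assumptions the mapping is a permutation of the block IDs, so every tile is still processed exactly once; only the placement of neighboring tiles changes.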

Underfox @underfox3.bsky.social · Aug 29

While the primary focus of the presented work was performance, it is clear that the same remapping will also have pronounced benefits in terms of energy efficiency.

Underfox @underfox3.bsky.social · Aug 29

The results show that SwizzlePerf achieves up to a 2.1x speedup and a 70% L2 hit rate improvement across a wide range of ML and scientific GPU kernels.

Underfox @underfox3.bsky.social · Aug 29

This paper presents SwizzlePerf, an LLM workflow that automatically generates spatial optimizations for GPU kernels on disaggregated architectures by giving LLMs explicit hardware-awareness.

arxiv.org/pdf/2508.20258

Underfox @underfox3.bsky.social · Aug 29

This work will be presented at the 58th IEEE/ACM International Symposium on Microarchitecture (MICRO 25), which will be held October 18-22, 2025 in Seoul, Korea.

Underfox @underfox3.bsky.social · Aug 29

OmniSim is able to successfully simulate 11 designs previously unsupported by any HLS tool, achieving up to 35.9x speedup over traditional C/RTL co-simulation, and up to 6.61x speedup over the state-of-the-art yet less capable simulator, LightningSim, on its own benchmark suite.

Underfox @underfox3.bsky.social · Aug 29

OmniSim carefully orchestrates functionality and performance simulation threads to accurately model hardware-level behavior under arbitrary OS scheduling, achieving near-C simulation speed with near-RTL accuracy for both functionality and performance.

Underfox @underfox3.bsky.social · Aug 29

This paper presents OmniSim, a framework that extends the C-level simulation capability of HLS tools by enabling both functionality and performance simulations for complex dataflow designs that are currently unsupported or considered infeasible.

arxiv.org/pdf/2508.19299

Underfox @underfox3.bsky.social · Aug 29

The implemented proof of concept is capable of demonstrating softmax computation and invertible logic without the need to create a network of probabilistic devices, offering major scalability advantages.

Underfox @underfox3.bsky.social · Aug 29

For the first time, researchers reported the realization of multi-value probabilistic computing by leveraging the thermally activated diffusion of magnetic skyrmions through an effectively non-flat energy landscape defined by a discrete number of sites.

arxiv.org/pdf/2508.19623

Underfox @underfox3.bsky.social · Aug 29

Excerpt from: Y. Wong, G. Zocchi, Spontaneous spiral patterns etched on Germanium, arXiv, 2025

Link: arxiv.org/pdf/2508.16764

Underfox @underfox3.bsky.social · Aug 29

A thin metallic film on germanium, in the presence of water, results in a remarkable pattern-forming system, such as this beautiful spiral spontaneously etched on the surface with a total structure diameter of 680 μm.

Underfox @underfox3.bsky.social · Aug 28

These findings represent a major step toward lower-power and faster spintronic devices for memory logic applications, creating new possibilities for electrical modulation of spin dynamics and ultrafast spin injection into two-dimensional quantum materials.

Underfox @underfox3.bsky.social · Aug 28

The experimental results, employing direct contacts as well as contacts involving tunnel barriers, show efficient gate control, with over 100% enhancement in the demagnetization rate compared to bare cobalt by modulating the junction resistance.

Underfox @underfox3.bsky.social · Aug 28

Researchers have demonstrated graphene spin-field-effect junctions where the electric field can control ultrafast spin currents and spin dynamics in thin-film ferromagnets.

PRL link: journals.aps.org/prl/pdf/10.1...

Underfox @underfox3.bsky.social · Aug 28

It is important to note that the experiment proposed in this work also revealed that the topology-aware losses could contribute to improving the geometry of the interpolated data.

Underfox @underfox3.bsky.social · Aug 28

Given an input sequence of persistence diagrams and a sparse temporal sampling of the corresponding data, the proposed approach inverts the non-keyframe diagrams to produce plausible estimations of the missing data.

Underfox @underfox3.bsky.social · Aug 28

In this paper, researchers have developed a neural approach for the topology-aware interpolation of scalar fields, with losses based on persistence diagrams for constraining the topology and geometry of the output interpolations.

arxiv.org/pdf/2508.17995

Reposted by Underfox

Underfox @underfox3.bsky.social · Aug 28

This paper presents an ab initio transistor simulation of unprecedented scale, including electron-electron interactions within the self-consistent GW approximation, carefully optimized to take advantage of the Alps and Frontier supercomputers. #HPC

arxiv.org/pdf/2508.19138
