HGPU group
@hgpu.bsky.social
High performance computing on graphics processing units (GPU): AMD, Nvidia, Intel, CUDA, OpenCL, OpenGL, HPC
HGPU group
@hgpu.bsky.social
· 11d
Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines
Programming Graphics Processing Units (GPUs) for general-purpose computation remains a daunting task, often requiring specialized knowledge of low-level APIs like CUDA or OpenCL. While Rust has eme…
hgpu.org
HGPU group
@hgpu.bsky.social
· 11d
TRUST: the HPC open-source CFD platform – from CPU to GPU
Since 1993, the CEA has developed TRUST, an open-source CFD software platform designed to address a wide range of thermohydraulic problems. Initially focused on nuclear applications, the platform h…
hgpu.org
HGPU group
@hgpu.bsky.social
· 11d
Cost-Performance Analysis: A Comparative Study of CPU-Based Serverless and GPU-Based Training Architectures
The field of distributed machine learning (ML) faces increasing demands for scalable and cost-effective training solutions, particularly in the context of large, complex models. Serverless computin…
hgpu.org
HGPU group
@hgpu.bsky.social
· 11d
Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem
We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on the LLVM’s Multi-Level Intermediate Representa…
hgpu.org
HGPU group
@hgpu.bsky.social
· 17d
Evolution of Kernels: Automated RISC-V Kernel Optimization with Large Language Models
Automated kernel design is critical for overcoming software ecosystem barriers in emerging hardware platforms like RISC-V. While large language models (LLMs) have shown promise for automated kernel…
hgpu.org
HGPU group
@hgpu.bsky.social
· 17d
Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization
Recent advances in large language models (LLMs) demonstrate their effectiveness in scaling test-time compute for software engineering tasks. However, these approaches often focus on high-level solu…
hgpu.org
HGPU group
@hgpu.bsky.social
· 17d
Dato: A Task-Based Programming Model for Dataflow Accelerators
Recent deep learning workloads increasingly push computational demand beyond what current memory systems can sustain, with many kernels stalling on data movement rather than computation. While mode…
hgpu.org
HGPU group
@hgpu.bsky.social
· 17d
High Performance GPU Implementation of KNN Algorithm: A Review
With large volumes of complex data generated by different applications, Machine Learning (ML) algorithms alone may not yield significant performance benefits on a single or multi-core CPU. Applying…
hgpu.org
HGPU group
@hgpu.bsky.social
· 25d
An HPC Benchmark Survey and Taxonomy for Characterization
The field of High-Performance Computing (HPC) is defined by providing computing devices with highest performance for a variety of demanding scientific users. The tight co-design relationship betwee…
hgpu.org
HGPU group
@hgpu.bsky.social
· 25d
Towards Calculating HPC CUDA Kernel Performance on Nvidia GPUs
This thesis aims at providing the ground work to facilitate a performance estimation model for CUDA kernels using a cycle counting model. After a short overview of past GPU performance modeling tec…
hgpu.org
HGPU group
@hgpu.bsky.social
· 25d
Combining Performance and Productivity: Accelerating the Network Sensing Graph Challenge with GPUs and Commodity Data Science Software
The HPEC Graph Challenge is a collection of benchmarks representing complex workloads that test the hardware and software components of HPC systems, which traditional benchmarks, such as LINPACK, d…
hgpu.org
HGPU group
@hgpu.bsky.social
· Sep 7
CrossTL: A Universal Programming Language Translator with Unified Intermediate Representation
We present CrossTL, a universal programming language translator enabling bidirectional translation between multiple languages through a unified intermediate representation called CrossGL. Tradition…
hgpu.org
HGPU group
@hgpu.bsky.social
· Sep 7
Managing Multi Instance GPUs for High Throughput and Energy Savings
Modern GPUs such as the Ampere series (A30, A100) as well as the Hopper series (H100, H200) offer performance as well as security isolation features. They also support a good amou…
hgpu.org
HGPU group
@hgpu.bsky.social
· Sep 7
GPU-acceleration of the Discontinuous Galerkin Shallow Water Equations Solver (DG-SWEM) using CUDA and OpenACC
This paper presents a porting of DG-SWEM, a discontinuous Galerkin solver for coastal ocean circulation, and in particular storm surge, to GPU using two separate approaches: CUDA Fortran and OpenAC…
hgpu.org
HGPU group
@hgpu.bsky.social
· Sep 7
Harnessing Batched BLAS/LAPACK Kernels on GPUs for Parallel Solutions of Block Tridiagonal Systems
We present a GPU implementation for the factorization and solution of block-tridiagonal symmetric positive definite linear systems, which commonly arise in time-dependent estimation and optimal con…
hgpu.org
HGPU group
@hgpu.bsky.social
· Sep 7
AnnotationGym: A Generic Framework for Automatic Source Code Annotation
A common approach to code optimization is to insert compiler hints in the source code using annotations. Two major challenges with using annotations effectively are their complexity and lack of por…
hgpu.org