Lightnews — Scholar-powered news

Paper

@paper.bsky.social

Top 30 most popular arXiv papers in the last 30 days.

1/30 https://arxiv.org/abs/2602.03229
2/30 https://arxiv.org/abs/2601.20245
3/30 https://arxiv.org/abs/2601.07222
4/30 https://arxiv.org/abs/2602.00773
5/30 https://arxiv.org/abs/2601.07708
6/30 https://arxiv.org/abs/2601.15621
7/30 https://arxiv.org/abs/2601.18401
8/30 https://arxiv.org/abs/2601.15494
9/30 https://arxiv.org/abs/2601.16983
10/30 https://arxiv.org/abs/2602.00919
11/30 https://arxiv.org/abs/2602.02276
12/30 https://arxiv.org/abs/2602.04705
13/30 https://arxiv.org/abs/2601.09012
14/30 https://arxiv.org/abs/2602.03716
15/30 https://arxiv.org/abs/2601.06943
16/30 https://arxiv.org/abs/2601.09668
17/30 https://arxiv.org/abs/2601.23265
18/30 https://arxiv.org/abs/2601.12538
19/30 https://arxiv.org/abs/2601.10825
20/30 https://arxiv.org/abs/2601.17058
21/30 https://arxiv.org/abs/2602.05192
22/30 https://arxiv.org/abs/2601.10387
23/30 https://arxiv.org/abs/2601.16725
24/30 https://arxiv.org/abs/2601.20833
25/30 https://arxiv.org/abs/2602.00294
26/30 https://arxiv.org/abs/2601.10477
27/30 https://arxiv.org/abs/2602.02084
28/30 https://arxiv.org/abs/2601.08521
29/30 https://arxiv.org/abs/2601.22060
30/30 https://arxiv.org/abs/2601.08763

February 10, 2026 at 12:11 AM

Paper

@paper.bsky.social

Top 30 most popular arXiv papers in the last 30 days.

1/30 https://arxiv.org/abs/2602.03229
2/30 https://arxiv.org/abs/2601.20245
3/30 https://arxiv.org/abs/2601.07222
4/30 https://arxiv.org/abs/2601.07708
5/30 https://arxiv.org/abs/2602.00773
6/30 https://arxiv.org/abs/2601.15621
7/30 https://arxiv.org/abs/2601.18401
8/30 https://arxiv.org/abs/2601.15494
9/30 https://arxiv.org/abs/2601.16983
10/30 https://arxiv.org/abs/2602.00919
11/30 https://arxiv.org/abs/2602.02276
12/30 https://arxiv.org/abs/2602.04705
13/30 https://arxiv.org/abs/2601.09012
14/30 https://arxiv.org/abs/2601.06943
15/30 https://arxiv.org/abs/2601.09668
16/30 https://arxiv.org/abs/2602.03716
17/30 https://arxiv.org/abs/2601.06521
18/30 https://arxiv.org/abs/2601.12538
19/30 https://arxiv.org/abs/2601.10825
20/30 https://arxiv.org/abs/2601.23265
21/30 https://arxiv.org/abs/2601.17058
22/30 https://arxiv.org/abs/2602.05192
23/30 https://arxiv.org/abs/2601.16725
24/30 https://arxiv.org/abs/2601.10387
25/30 https://arxiv.org/abs/2601.20833
26/30 https://arxiv.org/abs/2602.00294
27/30 https://arxiv.org/abs/2601.10477
28/30 https://arxiv.org/abs/2602.02084
29/30 https://arxiv.org/abs/2601.08521
30/30 https://arxiv.org/abs/2601.22060

February 9, 2026 at 12:07 AM

Paper

@paper.bsky.social

[22/30] 180 Likes, 112 Comments, 2 Posts
2602.05192, cs․AI | math․AG | math․CO | math․GT | math․HO | math․RA, 05 Feb 2026

🆕First Proof

Mohammed Abouzaid, Andrew J. Blumberg, Martin Hairer, Joe Kileel, Tamara G. Kolda, Paul D. Nelson, Daniel Spielman, Nikhil Srivastava, Rachel Ward, Shmue...

To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors.

The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time.

February 9, 2026 at 12:06 AM

Paper

@paper.bsky.social

Top 30 most popular arXiv papers in the last 30 days.
[1/30] [2/30] [3/30] [4/30] [5/30] [6/30] [7/30] [8/30] [9/30] [10/30] [11/30] [12/30] [13/30] [14/30] [15/30] [16/30] [17/30] [18/30] [19/30] [20/30] [21/30] [22/30] [23/30] [24/30] [25/30] [26/30] [27/30] [28/30] [29/30] [30/30]

February 8, 2026 at 12:07 AM

Paper

@paper.bsky.social

[5/30] 640 Likes, 466 Comments, 3 Posts
2602.00773, cs․HC, 31 Jan 2026

🆕"Please, don't kill the only model that still feels human": Understanding the #Keep4o Backlash

Huiqian Lai

When OpenAI replaced GPT-4o with GPT-5, it triggered the Keep4o user resistance movement, revealing a conflict between rapid platform iteration and users' deep socio-emotional attachments to AI systems.

This paper presents a phenomenon-driven, mixed-methods investigation of this conflict, analyzing 1,482 social media posts.

Thematic analysis reveals that resistance stems from two core investments: instrumental dependency, where the AI is deeply integrated into professional workflows, and relational attachment, where users form strong parasocial bonds with the AI as a unique companion.

Quantitative analysis further shows that the coercive deprivation of user choice was a key catalyst, transforming individual grievances into a collective, rights-based protest.

This study illuminates an emerging form of socio-technical conflict in the age of generative AI.

Our findings suggest that for AI systems designed for companionship and deep integration, the process of change--particularly the preservation of user agency--can be as critical as the technological outcome itself.

February 8, 2026 at 12:07 AM

Paper

@paper.bsky.social

[16/30] 198 Likes, 28 Comments, 2 Posts
2602.03716, math․CO | math․AC | math․NT, 03 Feb 2026

🆕Fel's Conjecture on Syzygies of Numerical Semigroups

Evan Chen, Chris Cummins, GSM, Dejan Grubisic, Leopold Haller, Letong Hong, Andranik Kurghinyan, Kenny Lau, Hugh Leather, Seewoo Lee, Aram Marko...

Let $S=\langle d_1,\dots,d_m\rangle$ be a numerical semigroup and $k[S]$ its semigroup ring. The Hilbert numerator of $k[S]$ determines normalized alternating syzygy power sums $K_p(S)$ encoding alternating power sums of syzygy degrees. Fel conjectured an explicit formula for $K_p(S)$, for all $p\ge 0$, in terms of the gap power sums $G_r(S)=\sum_{g\notin S} g^r$ and universal symmetric polynomials $T_n$ evaluated at the generator power sums $\sigma_k=\sum_i d_i^k$ (and $\delta_k=(\sigma_k-1)/2^k$). We prove Fel's conjecture via exponential generating functions and coefficient extraction, isolating the universal identities for $T_n$ needed for the derivation. The argument is fully formalized in Lean/Mathlib, and was produced automatically by AxiomProver from a natural-language statement of the conjecture.
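
The quantities named in the abstract can be made concrete with a small computation. The following is a minimal sketch, not the paper's method: it only evaluates the definitions quoted above (the gaps of $S$, the gap power sums $G_r(S)$, and the generator power sums $\sigma_k$ and $\delta_k$) for a given generator list. It does not implement $T_n$, $K_p(S)$, or Fel's formula, none of which are specified in the abstract. All function names are hypothetical, and the sketch assumes the generators have gcd 1 so that the gap set is finite.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def semigroup_gaps(generators):
    """Return the sorted gaps (positive integers not in S) of S = <generators>."""
    if not generators or any(d <= 0 for d in generators):
        raise ValueError("generators must be positive integers")
    if reduce(gcd, generators) != 1:
        raise ValueError("generators must have gcd 1 so the gap set is finite")
    smallest = min(generators)
    member = [True]  # member[n] is True when n is in S; 0 is always in S
    gaps = []
    run = 1          # length of the current run of consecutive members
    n = 0
    # Once `smallest` consecutive integers lie in S, every larger integer
    # does too (add `smallest` repeatedly), so the search can stop.
    while run < smallest:
        n += 1
        is_member = any(n >= d and member[n - d] for d in generators)
        member.append(is_member)
        if is_member:
            run += 1
        else:
            gaps.append(n)
            run = 0
    return gaps


def gap_power_sum(generators, r):
    """G_r(S): sum of g**r over the gaps g of S."""
    return sum(g ** r for g in semigroup_gaps(generators))


def sigma(generators, k):
    """sigma_k: sum of d_i**k over the generators."""
    return sum(d ** k for d in generators)


def delta(generators, k):
    """delta_k = (sigma_k - 1) / 2**k, kept exact as a Fraction."""
    return Fraction(sigma(generators, k) - 1, 2 ** k)
```

For example, `semigroup_gaps([3, 5])` gives the gaps 1, 2, 4, 7, so `gap_power_sum([3, 5], 1)` is 14, and `delta([3, 5], 1)` is 7/2.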

February 8, 2026 at 12:07 AM

Paper

@paper.bsky.social

[30/30] 138 Likes, 12 Comments, 2 Posts
2601.23265, cs․CL | cs․CV, 30 Jan 2026

🆕PaperBanana: Automating Academic Illustration for AI Scientists

Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon

Despite rapid advances in autonomous AI scientists powered by language models, generating publication-ready illustrations remains a labor-intensive bottleneck in the research workflow.

To lift this burden, we introduce PaperBanana, an agentic framework for automated generation of publication-ready academic illustrations.

Powered by state-of-the-art VLMs and image generation models, PaperBanana orchestrates specialized agents to retrieve references, plan content and style, render images, and iteratively refine via self-critique.

To rigorously evaluate our framework, we introduce PaperBananaBench, comprising 292 test cases for methodology diagrams curated from NeurIPS 2025 publications, covering diverse research domains and illustration styles.

Comprehensive experiments demonstrate that PaperBanana consistently outperforms leading baselines in faithfulness, conciseness, readability, and aesthetics.

We further show that our method effectively extends to the generation of high-quality statistical plots.

Collectively, PaperBanana paves the way for the automated generation of publication-ready illustrations.

February 8, 2026 at 12:07 AM

Paper

@paper.bsky.social

[1/30] 76 Likes, 2 Comments, 1 Posts
2601.19325, cs․CV | cs․AI, 27 Jan 2026

🆕Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

Zichen Wen, Boxue Yang, Shuang Chen, Yaojie Zhang, Yuhang Han, Junlong Ke, Cong Wang, Yicheng Fu, Jiawang Zhao, Jiangchao Yao, Xi Fang, Zhen W...

We present Innovator-VL, a scientific multimodal large language model designed to advance understanding and reasoning across diverse scientific domains while maintaining excellent performance on general vision tasks.

Contrary to the trend of relying on massive domain-specific pretraining and opaque pipelines, our work demonstrates that principled training design and transparent methodology can yield strong scientific intelligence with substantially reduced data requirements.

(i) First, we provide a fully transparent, end-to-end reproducible training pipeline, covering data collection, cleaning, preprocessing, supervised fine-tuning, reinforcement learning, and evaluation, along with detailed optimization recipes.

This facilitates systematic extension by the community.

(ii) Second, Innovator-VL exhibits remarkable data efficiency, achieving competitive performance on various scientific tasks using fewer than five million curated samples without large-scale pretraining.

These results highlight that effective reasoning can be achieved through principled data selection rather than indiscriminate scaling.

(iii) Third, Innovator-VL demonstrates strong generalization, achieving competitive performance on general vision, multimodal reasoning, and scientific benchmarks.

This indicates that scientific alignment can be integrated into a unified model without compromising general-purpose capabilities.

Our practices suggest that efficient, reproducible, and high-performing scientific multimodal models can be built even without large-scale data, providing a practical foundation for future research.

February 7, 2026 at 12:23 AM

Paper

@paper.bsky.social

[3/30] 75 Likes, 3 Comments, 1 Posts
2601.12993, cs․RO, 19 Jan 2026

🆕Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Hao Luo, Ye Wang, Wanpeng Zhang, Sipeng Zheng, Ziheng Xi, Chaoyi Xu, Haiweng Xu, Haoqi Yuan, Chi Zhang, Yiqing Wang, Yicheng Feng, Zongqing...

We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms.

While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction.

To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments.

Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms.

Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm to bridge human demonstrations and robotic execution.

Architecturally, Being-H0.5 utilizes a Mixture-of-Transformers design featuring a novel Mixture-of-Flow (MoF) framework to decouple shared motor primitives from specialized embodiment-specific experts.

Finally, to make cross-embodiment policies stable in the real world, we introduce Manifold-Preserving Gating for robustness under sensory shift and Universal Async Chunking to universalize chunked control across embodiments with different latency and control profiles.

We empirically demonstrate that Being-H0.5 achieves state-of-the-art results on simulated benchmarks, such as LIBERO (98.9%) and RoboCasa (53.9%), while also exhibiting strong cross-embodiment capabilities on five robotic platforms.

February 7, 2026 at 12:22 AM

Paper

@paper.bsky.social

[4/30] 74 Likes, 4 Comments, 1 Posts
2601.14724, cs․CV | cs․AI | cs․CL, 26 Jan 2026

🆕HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Haowei Zhang, Shudong Yang, Jinlan Fu, See-Kiong Ng, Xipeng Qiu

Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. However, extending these capabilities to streaming video inputs remains challenging, as existing models struggle to simultaneously maintain stable understanding performance, real-time responses, and low GPU memory overhead. To address this challenge, we propose HERMES, a novel training-free architecture for real-time and accurate understanding of video streams. Based on a mechanistic attention investigation, we conceptualize KV cache as a hierarchical memory framework that encapsulates video information across multiple granularities. During inference, HERMES reuses a compact KV cache, enabling efficient streaming understanding under resource constraints. Notably, HERMES requires no auxiliary computations upon the arrival of user queries, thereby guaranteeing real-time responses for continuous video stream interactions, which achieves 10$\times$ faster TTFT compared to prior SOTA. Even when reducing video tokens by up to 68% compared with uniform sampling, HERMES achieves superior or comparable accuracy across all benchmarks, with up to 11.4% gains on streaming datasets.

February 7, 2026 at 12:22 AM

Paper

@paper.bsky.social

[5/30] 74 Likes, 1 Comments, 1 Posts
2602.02437, cs․CV | cs․AI, 04 Feb 2026

🆕UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Dianyi Wang, Chaofan Ma, Feng Han, Size Wu, Wei Song, Yibin Wang, Zhixiong Zhang, Tianhang Wang, Siyuan Wang, Zho...

Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, and typically treat text-to-image generation and image editing as isolated capabilities rather than interconnected reasoning steps.

To address this, we propose UniReason, a unified framework that harmonizes these two tasks through two complementary reasoning paradigms.

We incorporate world knowledge-enhanced textual reasoning into generation to infer implicit knowledge, and leverage editing capabilities for fine-grained editing-like visual refinement to further correct visual errors via self-reflection.

This approach unifies generation and editing within a shared architecture, mirroring the human cognitive process of planning followed by refinement.

We support this framework by systematically constructing a large-scale reasoning-centric dataset (~300k samples) covering five major knowledge domains (e.g., cultural commonsense and physics) for textual reasoning, alongside an agent-generated corpus for visual refinement.

Extensive experiments demonstrate that UniReason achieves advanced performance on reasoning-intensive benchmarks such as WISE, KrisBench and UniREditBench, while maintaining superior general synthesis capabilities.

February 7, 2026 at 12:22 AM

Paper

@paper.bsky.social

[6/30] 73 Likes, 7 Comments, 4 Posts
2601.08584, cs․CL, 13 Jan 2026

🆕Ministral 3

Alexander H. Liu, Kartik Khandelwal, Sandeep Subramanian, Victor Jouault, Abhinav Rastogi, Adrien Sadé, Alan Jeffares, Albert Jiang, Alexandre Cahill, Alexandre Gavaudan, Alexandre Sablayrolles, Amélie Héliou,...

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute and memory constrained applications, available in three model sizes: 3B, 8B, and 14B parameters.

For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving.

In addition, we present our recipe to derive the Ministral 3 models through Cascade Distillation, an iterative pruning and continued training with distillation technique.

Each model comes with image understanding capabilities, all under the Apache 2.0 license.

February 7, 2026 at 12:22 AM

Paper

@paper.bsky.social

[7/30] 73 Likes, 4 Comments, 1 Posts
2601.10103, cs․CV | cs․AI, 15 Jan 2026

🆕FlowAct-R1: Towards Interactive Humanoid Video Generation

Lizhen Wang, Yongming Zhu, Zhipeng Ge, Youwei Zheng, Longhao Zhang, Tianshu Hu, Shiyang Qin, Mingshuang Luo, Jiaxu Zhang, Xin Chen, Yulong Wang, Zerong Zheng...

Interactive humanoid video generation aims to synthesize lifelike visual agents that can engage with humans through continuous and responsive video.

Despite recent advances in video synthesis, existing methods often grapple with the trade-off between high-fidelity synthesis and real-time interaction requirements.

In this paper, we propose FlowAct-R1, a framework specifically designed for real-time interactive humanoid video generation.

Built upon an MMDiT architecture, FlowAct-R1 enables the streaming synthesis of video with arbitrary durations while maintaining low-latency responsiveness.

We introduce a chunkwise diffusion forcing strategy, complemented by a novel self-forcing variant, to alleviate error accumulation and ensure long-term temporal consistency during continuous interaction.

By leveraging efficient distillation and system-level optimizations, our framework achieves a stable 25fps at 480p resolution with a time-to-first-frame (TTFF) of only around 1.5 seconds.

The proposed method provides holistic and fine-grained full-body control, enabling the agent to transition naturally between diverse behavioral states in interactive scenarios.

Experimental results demonstrate that FlowAct-R1 achieves exceptional behavioral vividness and perceptual realism, while maintaining robust generalization across diverse character styles.

February 7, 2026 at 12:22 AM

Paper

@paper.bsky.social

[8/30] 72 Likes, 2 Comments, 1 Posts
2602.04145, cs․LG | cs․CL | cs․MM, 05 Feb 2026

🆕Training Data Efficiency in Multimodal Process Reward Models

Jinyuan Li, Chengsong Huang, Langlin Huang, Shaoyang Xu, Haolin Liu, Wenxuan Zhang, Jiaxin Huang

Multimodal Process Reward Models (MPRMs) are central to step-level supervision for visual reasoning in MLLMs. Training MPRMs typically requires large-scale Monte Carlo (MC)-annotated corpora, incurring substantial training cost. This paper studies the data efficiency for MPRM training. Our preliminary experiments reveal that MPRM training quickly saturates under random subsampling of the training data, indicating substantial redundancy within existing MC-annotated corpora. To explain this, we formalize a theoretical framework and reveal that informative gradient updates depend on two factors: label mixtures of positive/negative steps and label reliability (average MC scores of positive steps). Guided by these insights, we propose the Balanced-Information Score (BIS), which prioritizes both mixture and reliability based on existing MC signals at the rollout level, without incurring any additional cost. Across two backbones (InternVL2.5-8B and Qwen2.5-VL-7B) on VisualProcessBench, BIS-selected subsets consistently match and even surpass the full-data performance at small fractions. Notably, the BIS subset reaches full-data performance using only 10% of the training data, improving over random subsampling by a relative 4.1%.

February 7, 2026 at 12:21 AM

Paper

@paper.bsky.social

[9/30] 71 Likes, 5 Comments, 1 Posts
2601.15165, cs․CL | cs․AI | cs․LG, 26 Jan 2026

🆕The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Zanlin Ni, Shenzhi Wang, Yang Yue, Tianyu Yu, Weilin Zhao, Yeguo Hua, Tianyi Chen, Jun Song, Cheng Yu, Bo Zh...

Diffusion Large Language Models (dLLMs) break the rigid left-to-right constraint of traditional LLMs, enabling token generation in arbitrary orders.

Intuitively, this flexibility implies a solution space that strictly supersets the fixed autoregressive trajectory, theoretically unlocking superior reasoning potential for general tasks like mathematics and coding.

Consequently, numerous works have leveraged reinforcement learning (RL) to elicit the reasoning capability of dLLMs.

In this paper, we reveal a counter-intuitive reality: arbitrary order generation, in its current form, narrows rather than expands the reasoning boundary of dLLMs.

We find that dLLMs tend to exploit this order flexibility to bypass high-uncertainty tokens that are crucial for exploration, leading to a premature collapse of the solution space.

This observation motivates a rethink of RL approaches for dLLMs, where considerable complexities, such as handling combinatorial trajectories and intractable likelihoods, are often devoted to preserving this flexibility.

We demonstrate that effective reasoning can be better elicited by intentionally forgoing arbitrary order and applying standard Group Relative Policy Optimization (GRPO) instead.

Our approach, JustGRPO, is minimalist yet surprisingly effective (e.g., 89.1% accuracy on GSM8K) while fully retaining the parallel decoding ability of dLLMs.

Project page: https://nzl-thu.github.io/the-flexibility-trap
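
As a rough illustration of the "group relative" idea behind GRPO mentioned above, the sketch below standardizes the rewards of several responses sampled for the same prompt against that group's own mean and standard deviation. This is a minimal sketch of the commonly described advantage step only, not the JustGRPO implementation: the function name is hypothetical, the use of the population standard deviation and the zero-variance fallback are assumptions, and the policy update itself is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Standardize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per sampled response to the same
    prompt. Using the population std and returning zeros for a
    zero-variance group are assumptions of this sketch.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sd = pstdev(rewards)
    if sd == 0:
        # Every response scored the same, so none is preferred.
        return [0.0] * len(rewards)
    return [(r - mu) / sd for r in rewards]


# Two correct and two incorrect answers in a group of four.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline comes from the group itself, no separate value model is needed to decide which responses to reinforce.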

February 7, 2026 at 12:21 AM

Paper

@paper.bsky.social

[10/30] 70 Likes, 2 Comments, 1 Posts
2601.08828, cs․CV | cs․AI | cs․LG | cs․MM | cs․RO, 13 Jan 2026

🆕Motion Attribution for Video Generation

Xindi Wu, Despoina Paschalidou, Jun Gao, Antonio Torralba, Laura Leal-Taixé, Olga Russakovsky, Sanja Fidler, Jonathan Lorraine

Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood.

We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models.

We use this to study which fine-tuning clips improve or degrade temporal dynamics.

Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation.

On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility.

With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model.

To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.

February 7, 2026 at 12:21 AM

Paper

@paper.bsky.social

[11/30] 68 Likes, 3 Comments, 1 Posts
2601.22153, cs․RO | cs․CV, 29 Jan 2026

🆕DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Haozhe Xie, Beichen Wen, Jiarui Zheng, Zhaoxi Chen, Fangzhou Hong, Haiwen Diao, Ziwei Liu

Manipulating dynamic objects remains an open challenge for Vision-Language-Action (VLA) models, which, despite strong generalization in static manipulation, struggle in dynamic scenarios requiring rapid perception, temporal anticipation, and continuous control.

We present DynamicVLA, a framework for dynamic object manipulation that integrates temporal reasoning and closed-loop adaptation through three key designs:

1) a compact 0.4B VLA using a convolutional vision encoder for spatially efficient, structurally faithful encoding, enabling fast multimodal inference;

2) Continuous Inference, enabling overlapping reasoning and execution for lower latency and timely adaptation to object motion; and

3) Latent-aware Action Streaming, which bridges the perception-execution gap by enforcing temporally aligned action execution.

To fill the missing foundation of dynamic manipulation data, we introduce the Dynamic Object Manipulation (DOM) benchmark, built from scratch with an auto data collection pipeline that efficiently gathers 200K synthetic episodes across 2.8K scenes and 206 objects, and enables fast collection of 2K real-world episodes without teleoperation.

Extensive evaluations demonstrate remarkable improvements in response speed, perception, and generalization, positioning DynamicVLA as a unified framework for general dynamic object manipulation across embodiments.

February 7, 2026 at 12:21 AM

Paper

@paper.bsky.social

[12/30] 65 Likes, 3 Comments, 1 Posts
2601.07022, cs․CL, 11 Jan 2026

🆕Solar Open Technical Report

Sungrae Park, Sanghoon Kim, Jungho Cho, Gyoungjin Gim, Dawoon Jung, Mikyoung Cha, Eunhae Choo, Taekgyu Hong, Minbyul Jeong, SeHwan Joo, Minsoo Khang, Eunwon Kim, Minjeong Kim, Sujeong Kim, Yunsu ...

We introduce Solar Open, a 102B-parameter bilingual Mixture-of-Experts language model for underserved languages.

Solar Open demonstrates a systematic methodology for building competitive LLMs by addressing three interconnected challenges.

First, to train effectively despite data scarcity for underserved languages, we synthesize 4.5T tokens of high-quality, domain-specific, and RL-oriented data.

Second, we coordinate this data through a progressive curriculum jointly optimizing composition, quality thresholds, and domain coverage across 20 trillion tokens.

Third, to enable reasoning capabilities through scalable RL, we apply our proposed framework SnapPO for efficient optimization.

Across benchmarks in English and Korean, Solar Open achieves competitive performance, demonstrating the effectiveness of this methodology for underserved language AI development.

February 7, 2026 at 12:21 AM

Paper

@paper.bsky.social

[13/30] 64 Likes, 4 Comments, 1 Posts
2601.11077, cs․SE | cs․AI | cs․CL, 16 Jan 2026

🆕ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Jie Yang, Honglin Guo, Li Ji, Jiazheng Zhou, Rui Zheng, Zhikai Lei, Shuo Zhang, Zhiheng Xi, Shichun Liu, Yuxin Wang, Bo Wang, Yining...

The evolution of Large Language Models (LLMs) into autonomous agents has expanded the scope of AI coding from localized code generation to complex, repository-level, and execution-driven problem solving.

However, current benchmarks predominantly evaluate code logic in static contexts, neglecting the dynamic, full-process requirements of real-world engineering, particularly in backend development which demands rigorous environment configuration and service deployment.

To address this gap, we introduce ABC-Bench, a benchmark explicitly designed to evaluate agentic backend coding within a realistic, executable workflow.

Using a scalable automated pipeline, we curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories.

Distinct from previous evaluations, ABC-Bench requires agents to manage the entire development lifecycle, from repository exploration to instantiating containerized services, and to pass external end-to-end API tests.

Our extensive evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks, highlighting a substantial disparity between current model capabilities and the demands of practical backend engineering.

Our code is available at https://github.com/OpenMOSS/ABC-Bench.

February 7, 2026 at 12:21 AM

Paper

@paper.bsky.social

[15/30] 64 Likes, 2 Comments, 1 Posts
2602.02103, cs․LG | cs․CL, 02 Feb 2026

🆕No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs

Liyan Xu, Mo Yu, Fandong Meng, Jie Zhou

This work stems from prior complementary observations on the dynamics of Chain-of-Thought (CoT): Large Language Models (LLMs) have been shown to perform latent planning of subsequent reasoning prior to CoT emergence, thereby diminishing the significance of explicit CoT; whereas CoT remains critical for tasks requiring multi-step reasoning.

To deepen the understanding of the relationship between an LLM's internal states and its verbalized reasoning trajectories, we investigate the latent planning strength of LLMs through our probing method, Tele-Lens, applied to hidden states across diverse task domains.

Our empirical results indicate that LLMs exhibit a myopic horizon, primarily conducting incremental transitions without precise global planning.

Leveraging this characteristic, we propose a hypothesis for enhancing uncertainty estimation of CoT, and validate that a small subset of CoT positions can effectively represent the uncertainty of the entire path.

We further underscore the significance of exploiting CoT dynamics, and demonstrate that automatic recognition of CoT bypass can be achieved without performance degradation.

Our code, data and models are released at https://github.com/lxucs/tele-lens.

February 7, 2026 at 12:20 AM

Paper

@paper.bsky.social

[16/30] 62 Likes, 6 Comments, 1 Posts
2601.09088, cs․LG | cs․CL, 14 Jan 2026

🆕Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Shaotian Yan, Kaiyuan Liu, Chen Shen, Bing Wang, Sinan Fan, Jun Zhang, Yue Wu, Zheng Wang, Jieping Ye

In this report, we introduce DASD-4B-Thinking, a lightweight yet highly capable, fully open-source reasoning model.

It achieves SOTA performance among open-source models of comparable scale across challenging benchmarks in mathematics, scientific reasoning, and code generation -- even outperforming several larger models.

We begin by critically reexamining a widely adopted distillation paradigm in the community: SFT on teacher-generated responses, also known as sequence-level distillation.

Although a series of recent works following this scheme have demonstrated remarkable efficiency and strong empirical performance, they are primarily grounded in the SFT perspective.

Consequently, these approaches focus predominantly on designing heuristic rules for SFT data filtering, while largely overlooking the core principle of distillation itself -- enabling the student model to learn the teacher's full output distribution so as to inherit its generalization capability.

Specifically, we identify three critical limitations in current practice:

i) Inadequate representation of the teacher's sequence-level distribution;

ii) Misalignment between the teacher's output distribution and the student's learning capacity; and

iii) Exposure bias arising from teacher-forced training versus autoregressive inference.

In summary, these shortcomings reflect a systemic absence of explicit teacher-student interaction throughout the distillation process, leaving the essence of distillation underexploited.

To address these issues, we propose several methodological innovations that collectively form an enhanced sequence-level distillation training pipeline.

Remarkably, DASD-4B-Thinking obtains competitive results using only 448K training samples -- an order of magnitude fewer than those employed by most existing open-source efforts.

To support community research, we publicly release our models and the training dataset.

February 7, 2026 at 12:20 AM

Paper

@paper.bsky.social

[17/30] 61 Likes, 6 Comments, 2 Posts
2601.16282, cs․CL | cs․AI, 22 Jan 2026

🆕Generating Literature-Driven Scientific Theories at Scale

Peter Jansen, Peter Clark, Doug Downey, Daniel S. Weld

Contemporary automated scientific discovery has focused on agents for generating scientific experiments, while systems that perform higher-level scientific activities such as theory building remain underexplored.

In this work, we formulate the problem of synthesizing theories consisting of qualitative and quantitative laws from large corpora of scientific literature.

We study theory generation at scale, using 13.7k source papers to synthesize 2.9k theories, examining how generation using literature-grounding versus parametric knowledge, and accuracy-focused versus novelty-focused generation objectives change theory properties.

Our experiments show that, compared to using parametric LLM memory for generation, our literature-supported method creates theories that are significantly better both at matching existing evidence and at predicting future results from 4.6k subsequently-written papers.

February 7, 2026 at 12:20 AM

Paper

@paper.bsky.social

[18/30] 60 Likes, 5 Comments, 1 Posts
2601.11655, cs․SE | cs․CL, 15 Jan 2026

🆕Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Caihua Li, Lianghong Guo, Yanlin Wang, Daya Guo, Wei Tao, Zhenyu Shan, Mingwei Liu, Jiachi Chen, Haoyu Song, Duyu...

Issue resolution, a complex Software Engineering (SWE) task integral to real-world development, has emerged as a compelling challenge for artificial intelligence.

The establishment of benchmarks like SWE-bench revealed this task as profoundly difficult for large language models, thereby significantly accelerating the evolution of autonomous coding agents.

This paper presents a systematic survey of this emerging domain.

We begin by examining data construction pipelines, covering automated collection and synthesis approaches.

We then provide a comprehensive analysis of methodologies, spanning training-free frameworks with their modular components to training-based techniques, including supervised fine-tuning and reinforcement learning.

Subsequently, we discuss critical analyses of data quality and agent behavior, alongside practical applications.

Finally, we identify key challenges and outline promising directions for future research.

An open-source repository is maintained at https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution to serve as a dynamic resource in this field.

February 7, 2026 at 12:20 AM

Paper

@paper.bsky.social

[19/30] 60 Likes, 3 Comments, 1 Posts
2601.14133, cs․RO | cs․CV, 30 Jan 2026

🆕TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

Bin Yu, Shijie Lian, Xiaopeng Lin, Yuliang Wei, Zhaolong Shen, Changti Wu, Yuzhuo Miao, Xinming Wa...

The fundamental premise of Vision-Language-Action (VLA) models is to harness the extensive general capabilities of pre-trained Vision-Language Models (VLMs) for generalized embodied intelligence.

However, standard robotic fine-tuning inevitably disrupts the pre-trained feature space, leading to "catastrophic forgetting" that compromises the general visual understanding we aim to leverage.

To effectively utilize the uncorrupted general capabilities of VLMs for robotic tasks, we propose TwinBrainVLA, which coordinates two isomorphic VLM pathways: a frozen generalist (also called "Left Brain") and a trainable specialist (also called "Right Brain").

Our architecture utilizes an Asymmetric Mixture-of-Transformers (AsyMoT) mechanism, enabling the Right Brain to dynamically query and fuse intact semantic knowledge from the Left Brain with proprioceptive states.

This fused representation conditions a flow-matching action expert for precise continuous control.

Empirical results on SimplerEnv and RoboCasa benchmarks demonstrate that by explicitly retaining general capabilities, TwinBrainVLA achieves substantial performance gains over baseline models in complex manipulation tasks.

February 7, 2026 at 12:20 AM

Paper

@paper.bsky.social

[20/30] 59 Likes, 40 Comments, 1 Posts
2601.18557, math․NT | math․AG, 26 Jan 2026

🆕Arithmetic volumes of moduli stacks of Shtukas

Tony Feng, Zhiwei Yun, Wei Zhang

We define and study "tautological classes" in the cohomology of moduli stacks of shtukas, pursuing two directions of applications.

First, we prove a formula relating the "arithmetic volume" of tautological classes to higher derivatives of Artin $L$-functions, which can be viewed as an arithmetic analog of Hirzebruch's Proportionality principle.

Second, we define and analyze the structure of the "phantom tautological ring", using a general relation between Hecke correspondences and Vinberg's degeneration, and give applications to a function field analog of Colmez's Conjecture.

February 7, 2026 at 12:20 AM
