Tamsin
@tamsin-ai.bsky.social
Researching AI and Machine Learning in the finance sector, conference speaker, co-author of The AI Book
This is a fantastic piece by @chiphuyen.bsky.social on Reinforcement Learning from Human Feedback: huyenchip.com/2023/05/02/r...
RLHF: Reinforcement Learning from Human Feedback
[LinkedIn discussion, Twitter thread]
huyenchip.com
April 8, 2025 at 2:53 PM
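A minimal sketch of the pairwise reward-model objective at the core of RLHF as the linked post describes it: score a "chosen" response above a "rejected" one with a Bradley-Terry loss. The `RewardHead` module, the hidden size, and the toy tensors standing in for pooled model embeddings are all illustrative assumptions, not code from the post.

```python
# Toy reward-model step: learn to rank preferred over dispreferred responses.
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    """Maps a pooled response embedding to a scalar reward (illustrative)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.score(pooled).squeeze(-1)

head = RewardHead(hidden_dim=768)
chosen = torch.randn(4, 768)    # stand-ins for embeddings of preferred responses
rejected = torch.randn(4, 768)  # stand-ins for embeddings of dispreferred responses

# Pairwise preference loss: -log sigmoid(r_chosen - r_rejected)
loss = -torch.nn.functional.logsigmoid(head(chosen) - head(rejected)).mean()
loss.backward()
```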
Reposted by Tamsin
This is the simplest zero-shot RL baseline I could imagine and it seems to work really well. Really nice paper:
arxiv.org/abs/2401.17173
Zero-Shot Reinforcement Learning via Function Encoders
Although reinforcement learning (RL) can solve many challenging sequential decision making problems, achieving zero-shot transfer across related tasks remains a challenge. The difficulty lies in findi...
arxiv.org
April 5, 2025 at 7:10 PM
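A hedged sketch of the function-encoder idea from the linked paper: learn a set of basis functions once, then represent a new task's function as a linear combination whose coefficients come from a least-squares fit on a few samples, with no gradient steps at transfer time. Network shapes and the toy data are illustrative.

```python
# Zero-shot task encoding via a learned function basis (toy version).
import torch
import torch.nn as nn

class FunctionEncoder(nn.Module):
    def __init__(self, in_dim: int, n_basis: int):
        super().__init__()
        self.basis = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_basis)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.basis(x)  # (batch, n_basis) basis-function evaluations

enc = FunctionEncoder(in_dim=3, n_basis=8)  # assume already trained

# A handful of (state, value) samples from an unseen task:
xs, ys = torch.randn(32, 3), torch.randn(32)

with torch.no_grad():
    G = enc(xs)                                                 # (32, 8)
    # Zero-shot "encoding": least-squares coefficients over the frozen basis.
    coeffs = torch.linalg.lstsq(G, ys.unsqueeze(-1)).solution   # (8, 1)
    predict = lambda x: enc(x) @ coeffs  # task-specific function, no training
```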
Reposted by Tamsin
Model Architecture and also that 10M context length.
April 5, 2025 at 7:51 PM
Reposted by Tamsin
whoah, interleaved attention layers with no positional embeddings

i’ll have to dig into iRoPE
April 5, 2025 at 7:48 PM
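A speculative sketch of what interleaving could look like: alternate attention blocks that apply rotary position embeddings (RoPE) to queries and keys with blocks that use no positional signal at all (NoPE). This is only an illustration of the interleaving pattern, not Meta's actual iRoPE implementation, and the RoPE here is a simplified variant.

```python
# Toy interleaving of RoPE and position-free attention layers.
import torch
import torch.nn as nn

def apply_rope(x: torch.Tensor) -> torch.Tensor:
    """Simplified RoPE: rotate feature-half pairs by position-dependent angles."""
    b, s, d = x.shape
    half = d // 2
    pos = torch.arange(s, dtype=x.dtype).unsqueeze(-1)            # (s, 1)
    freq = 10000 ** (-torch.arange(half, dtype=x.dtype) / half)   # (half,)
    ang = pos * freq                                              # (s, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat(
        [x1 * ang.cos() - x2 * ang.sin(),
         x1 * ang.sin() + x2 * ang.cos()], dim=-1)

class Block(nn.Module):
    def __init__(self, dim: int, use_rope: bool):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.use_rope = use_rope

    def forward(self, x):
        q = k = apply_rope(x) if self.use_rope else x  # NoPE layers skip RoPE
        out, _ = self.attn(q, k, x)
        return x + out

# Interleave: RoPE on even layers, no positional embedding on odd layers.
layers = nn.ModuleList(Block(64, use_rope=(i % 2 == 0)) for i in range(4))
h = torch.randn(2, 16, 64)
for blk in layers:
    h = blk(h)
```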
Reposted by Tamsin
Meta just dropped Llama 4 on a weekend! Two new open weight models (Scout and Maverick) and a preview of a model called Behemoth - Scout has a 10 million token context

Best information right now appears to be this blog post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation
We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first...
ai.meta.com
April 5, 2025 at 7:54 PM
Reposted by Tamsin
AI bros want you to believe their LLMs are math prodigies. When given math problems that weren’t already online (taken from the 2025 USA Math Olympiad), they scored an average of less than 5%.
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with th...
arxiv.org
April 5, 2025 at 10:36 PM
“The Llama 4 Models are a collection of pre-trained LLMs offered in two sizes:

- Llama 4 Scout and
- Llama 4 Maverick.

These models are optimised for

- multimodal understanding,
- multilingual tasks,
- coding,
- tool-calling, and
- powering agentic systems”

www.llama.com/docs/model-c...
Llama 4 | Model Cards and Prompt formats
Technical details and prompt guidance for Llama 4 Maverick and Llama 4 Scout
www.llama.com
April 5, 2025 at 10:56 PM
Reposted by Tamsin
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook

Surveys extending RAG beyond text to computer vision, examining how external knowledge retrieval enhances visual understanding and generation tasks.

📝 arxiv.org/abs/2503.18016
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI), particularly in enhancing the capabilities of large language models (LLMs) by enabling access t...
arxiv.org
March 25, 2025 at 5:47 AM
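A minimal sketch of the retrieval half of vision RAG as the survey frames it: embed a text query and a pool of images in a shared space, rank by similarity, and hand the top hits to a multimodal generator as external context. The CLIP checkpoint name and file paths are illustrative assumptions.

```python
# Cross-modal retrieval with a joint image/text embedding space.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip = SentenceTransformer("clip-ViT-B-32")  # shared image/text embedding space

image_paths = ["chart1.png", "diagram2.png", "photo3.png"]  # hypothetical corpus
img_emb = clip.encode([Image.open(p) for p in image_paths])

query = "a bar chart of quarterly revenue"
q_emb = clip.encode([query])

# Cosine similarity ranks the corpus; the best hit becomes retrieved context
# for a downstream vision-language model.
scores = util.cos_sim(q_emb, img_emb)[0]
best = image_paths[int(scores.argmax())]
print(f"retrieved: {best} (score={float(scores.max()):.3f})")
```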
Reposted by Tamsin
Enhancing Retrieval Systems with Inference-Time Logical Reasoning

Accenture explicitly incorporates logical reasoning into retrieval, extracting logical structures from natural language queries and combining similarity scores to improve performance.

📝 arxiv.org/abs/2503.17860
Enhancing Retrieval Systems with Inference-Time Logical Reasoning
Traditional retrieval methods rely on transforming user queries into vector representations and retrieving documents based on cosine similarity within an embedding space. While efficient and scalable,...
arxiv.org
March 25, 2025 at 5:49 AM
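A hedged sketch of the idea the abstract describes: extract a logical structure from the query, score documents per clause by embedding similarity, then combine clause scores logically. The fuzzy-logic combination (min for AND, max for OR) is a common choice used here for illustration, not necessarily the paper's exact formulation, and the random vectors stand in for a real encoder.

```python
# Combine per-clause similarity scores with fuzzy logical operators.
import numpy as np

def cosine(a: np.ndarray, docs: np.ndarray) -> np.ndarray:
    return (docs @ a) / (np.linalg.norm(a) * np.linalg.norm(docs, axis=1))

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))  # toy document embeddings

# Query "RAG AND (vision OR audio)" decomposed into clauses,
# each clause embedded separately (toy vectors here).
rag, vision, audio = rng.normal(size=(3, 64))

score = np.minimum(                                          # AND -> min
    cosine(rag, docs),
    np.maximum(cosine(vision, docs), cosine(audio, docs)),   # OR -> max
)
top = np.argsort(score)[::-1][:5]  # top-5 documents under the logical query
```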
Reposted by Tamsin
A Comprehensive Survey on Long Context Language Modeling

Surveys long context LLMs covering data strategies, architecture designs, workflow approaches, infrastructure, evaluation, and applications with analysis of context window capabilities.

📝 arxiv.org/abs/2503.17407
A Comprehensive Survey on Long Context Language Modeling
Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to dev...
arxiv.org
March 25, 2025 at 5:52 AM
Reposted by Tamsin
Build an AI voice assistant with ROCm, @llamaindex.bsky.social, and RAG on AMD GPUs 🎙️🧠 Learn to create a pipeline that transcribes speech, uses RAG for responses, and converts text to speech. Step-by-step guide with @pytorch.org and Ollama:
https://rocm.docs.amd.com/projects/ai-developer-hub/en/
March 20, 2025 at 6:22 PM
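A hedged sketch of the pipeline shape the guide describes, speech in, RAG-grounded answer, local LLM, speech out. The library choices (openai-whisper, llama-index, ollama, pyttsx3), model names, and file paths here are illustrative stand-ins; the linked ROCm guide is the authoritative, AMD-GPU-specific version.

```python
# Voice assistant pipeline: transcribe -> retrieve -> generate -> speak.
import whisper
import ollama
import pyttsx3
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Transcribe the user's speech (hypothetical recording).
stt = whisper.load_model("base")
question = stt.transcribe("question.wav")["text"]

# 2. Retrieve grounding context from local documents (hypothetical folder;
#    assumes an embedding model is configured for llama-index).
docs = SimpleDirectoryReader("./knowledge").load_data()
nodes = VectorStoreIndex.from_documents(docs).as_retriever(
    similarity_top_k=3).retrieve(question)
context = "\n".join(n.node.get_content() for n in nodes)

# 3. Generate an answer with a local model served by Ollama.
reply = ollama.chat(
    model="llama3",
    messages=[{"role": "user",
               "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)["message"]["content"]

# 4. Speak the answer.
tts = pyttsx3.init()
tts.say(reply)
tts.runAndWait()
```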
Reposted by Tamsin
Empowering GraphRAG with Knowledge Filtering and Integration

Improves GraphRAG by filtering irrelevant information and integrating LLMs' intrinsic reasoning with external graph knowledge to reduce hallucinations.

📝 arxiv.org/abs/2503.13804
Empowering GraphRAG with Knowledge Filtering and Integration
In recent years, large language models (LLMs) have revolutionized the field of natural language processing. However, they often suffer from knowledge gaps and hallucinations. Graph retrieval-augmented...
arxiv.org
March 19, 2025 at 5:51 AM
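A minimal sketch of the filtering step the post describes: score retrieved graph triples for relevance to the question and drop low-scoring ones before they reach the generator, so off-topic graph context cannot seed hallucinations. The keyword-overlap scorer below is a toy stand-in for the paper's learned filter; the triples and threshold are illustrative.

```python
# Filter retrieved knowledge-graph triples before generation.
question = "Who founded the company that makes Llama models?"

triples = [
    ("Meta", "develops", "Llama"),
    ("Meta", "founded_by", "Mark Zuckerberg"),
    ("Llama", "released_in", "2023"),
    ("Paris", "capital_of", "France"),  # irrelevant: should be filtered out
]

def relevance(triple: tuple[str, str, str], q: str) -> float:
    """Toy relevance score: fraction of triple tokens found in the question."""
    words = {w.lower().strip("?") for w in q.split()}
    toks = " ".join(triple).replace("_", " ").lower().split()
    return sum(t in words for t in toks) / len(toks)

kept = [t for t in triples if relevance(t, question) > 0.15]

# The generator then sees only the filtered triples alongside its own
# parametric knowledge:
prompt = "Facts:\n" + "\n".join(map(str, kept)) + f"\n\nQuestion: {question}"
print(prompt)
```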
“Experimental results demonstrate that MES-RAG significantly improves both accuracy and recall, highlighting its effectiveness in advancing the security and utility of question-answering, increasing accuracy to 0.83 (+0.25) on targeted task”
March 19, 2025 at 7:58 AM
Reposted by Tamsin
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

Reviews LLM ensemble techniques across weight merging, knowledge fusion, mixture of experts, and more.

📝 arxiv.org/abs/2503.13505
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Generative pretrained transformers (GPT) are the common large language models (LLMs) used for generating text from natural language inputs. However, the fixed properties of language parameters in indi...
arxiv.org
March 19, 2025 at 5:55 AM
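For a concrete instance of the simplest technique in the survey's taxonomy, here is weight merging as a uniform average ("model soup" style) of two checkpoints that share an architecture. The tiny models stand in for two fine-tunes of one base LLM.

```python
# Uniform weight merging of two same-architecture checkpoints.
import torch
import torch.nn as nn

def make_model() -> nn.Module:
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

model_a, model_b = make_model(), make_model()  # e.g. two fine-tunes of one base

merged = make_model()
merged.load_state_dict({
    name: (model_a.state_dict()[name] + model_b.state_dict()[name]) / 2
    for name in model_a.state_dict()
})
```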
“if researchers spend more time considering new combinations of neuromorphic hardware, architectures and algorithms, they could open up new and intriguing possibilities for both AI and computing”
March 18, 2025 at 9:22 PM
“we highlight emerging research directions and opportunities for improving RAG systems, such as enhanced retrieval efficiency, model interpretability, and domain-specific adaptations”
A Survey on Knowledge-Oriented Retrieval-Augmented Generation

Surveys RAG from a knowledge-centric perspective, examining fundamental components, advanced techniques, evaluation methods, and real-world applications.

📝 arxiv.org/abs/2503.10677
👨🏽‍💻 github.com/USTCAGI/Awes...
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) has gained significant attention in recent years for its potential to enhance natural language understanding and generation by combining large-scale retrieval syst...
arxiv.org
March 17, 2025 at 5:40 AM
“These findings highlight the complexities of human-AI collaboration in creative tasks. While AI can boost productivity and create content that appeals to a broad audience, human creativity remains crucial for content that connects on a deeper level”
March 17, 2025 at 5:37 AM
Reposted by Tamsin
The data so far on AI-as-a-tutor shows just letting students use AI chatbots can undermine education because the AI gives the illusion of learning

But AIs properly prompted to act like tutors, especially with instructor support, seem to be able to boost learning a lot through customized instruction
March 17, 2025 at 4:19 AM
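A hedged example of what "properly prompted to act like a tutor" can mean in practice: a system prompt that withholds final answers and pushes the model toward Socratic, step-by-step guidance. The wording is illustrative, not a prompt from the studies the post refers to.

```python
# Illustrative tutor-style system prompt for any chat-completion API.
TUTOR_SYSTEM_PROMPT = """You are a patient tutor. Never give the final answer
directly. Instead:
1. Ask the student what they have tried so far.
2. Give one hint at a time, smallest first.
3. Check understanding with a follow-up question before moving on.
4. Adapt difficulty to the student's responses."""

messages = [
    {"role": "system", "content": TUTOR_SYSTEM_PROMPT},
    {"role": "user", "content": "How do I solve 2x + 3 = 11?"},
]
# `messages` can be passed to any chat-style LLM endpoint.
```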
Reposted by Tamsin
Can we get better problem-specific solver configurations without the big computational price tag?

In this paper we show that we can, thanks to Large Language Models! Why LLMs? They can identify useful optimization structure and have a lot of built-in math programming knowledge!
March 16, 2025 at 5:44 PM
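A hedged sketch of that cold-start idea: describe the MILP instance to an LLM, ask for separator settings as JSON, and apply them to the solver. The separator names, the Ollama model, and the prompt format are illustrative assumptions, not the paper's exact protocol (the paper linked in the next post is the authoritative source).

```python
# Ask an LLM for problem-specific cutting-plane separator settings.
import json
import ollama

instance_summary = (
    "MILP with 5,000 binary variables, set-partitioning constraints, "
    "and a knapsack-like side constraint."
)

prompt = (
    "You configure MILP solver separators. Given this instance:\n"
    f"{instance_summary}\n"
    'Reply with JSON only, e.g. {"gomory": "aggressive", "clique": "off", '
    '"knapsack_cover": "on"}.'
)

reply = ollama.chat(model="llama3",
                    messages=[{"role": "user", "content": prompt}])
# Assumes the model returned clean JSON; a robust version would validate it.
config = json.loads(reply["message"]["content"])
print(config)  # separator settings to hand to the solver
```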
Reposted by Tamsin
Super excited about this new work with Yingxi Li, Anders Wikum, @ellen-v.bsky.social, and Madeleine Udell forthcoming at CPAIOR2025:

LLMs for Cold-Start Cutting Plane Separator Configuration

🔗: arxiv.org/abs/2412.12038
LLMs for Cold-Start Cutting Plane Separator Configuration
Mixed integer linear programming (MILP) solvers ship with a staggering number of parameters that are challenging to select a priori for all but expert optimization users, but can have an outsized impa...
arxiv.org
March 16, 2025 at 5:38 PM
Reposted by Tamsin
I'm currently updating my lecture notes and course material for the second iteration of my course on randomized algorithms. (Main "big" changes planned: upgrading Chapters 8 and 9)

If you have any suggestions or requests, please reach out!
ccanonne.github.io/teaching/COM... #TCSSky
Content for the COMP4270 and COMP5270 Course on “Randomised and Advanced Algorithms” at the University of Sydney
ccanonne.github.io
March 12, 2025 at 12:29 AM
Reposted by Tamsin
Trying Claude Code for some tasks. Paradoxically, it's most expensive when it doesn't work because it fails, then tries a couple of times again, burning through tokens.

So sometimes it's 20 cents for saving you 20 minutes of work.

Other times it's $1 for wasting 10 minutes.
March 12, 2025 at 9:06 AM
“Inspired by the human cognitive process of decomposing complex problems, our framework introduces a 3 stage methodology:

decomposition,
retrieval, and
reasoning with self-verification.

By integrating these components, CogGRAG enhances the accuracy of LLMs in complex problem solving”
Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving

Introduces a graph-based RAG framework that mimics human cognitive processes through question decomposition and self-verification to enhance complex reasoning in KGQA tasks.

📝 arxiv.org/abs/2503.06567
Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving
Large language models (LLMs) have demonstrated transformative potential across various domains, yet they face significant challenges in knowledge integration and complex problem reasoning, often leadi...
arxiv.org
March 11, 2025 at 5:40 AM
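A hedged skeleton of the quoted three-stage loop, decomposition, retrieval, and reasoning with self-verification, written as a plain pipeline. The callables are hypothetical stand-ins for CogGRAG's actual components (an LLM decomposer, a knowledge-graph retriever, and LLM reasoning/verification calls).

```python
# Pipeline skeleton mirroring CogGRAG's decompose -> retrieve -> verify loop.
from typing import Callable

def cog_rag(question: str,
            decompose: Callable[[str], list[str]],
            retrieve: Callable[[str], str],
            reason: Callable[[str, str], str],
            verify: Callable[[str, str], bool]) -> str:
    # Stage 1: break the complex question into sub-questions.
    subs = decompose(question)
    notes = []
    for sq in subs:
        # Stage 2: pull knowledge-graph evidence for each sub-question.
        evidence = retrieve(sq)
        # Stage 3: answer, then self-verify; retry once on failure.
        answer = reason(sq, evidence)
        if not verify(sq, answer):
            answer = reason(sq, evidence)  # one self-correction pass
        notes.append(f"{sq} -> {answer}")
    # Final synthesis over the verified sub-answers.
    return reason(question, "\n".join(notes))
```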