Shikhar Murty
@shikharmurty.bsky.social
470 followers 120 following 24 posts
Final year PhD Student in Computer Science @Stanford Work on: - Compositionality, syntax (language structure) - Web Agents: Synthetic data, tree search, exploration (language interpretation)
shikharmurty.bsky.social
“casual interception” as defined in \citep{}…
Reposted by Shikhar Murty
zhuhao.me
Ever dreamed of AI agents learning through unsupervised interaction with the open world? Our latest preprint introduces NNetNav-Live, which collects training data through exploration on real websites and hindsight labeling, and produces a SOTA OSS agent.
shikharmurty.bsky.social
controlling a browser / computer!
but requires a bit more tooling to set it up.
shikharmurty.bsky.social
Please check out our paper for more details: arxiv.org/pdf/2410.02907

And our code, if you want an NNetNav-ed model for your own domain:
github.com/MurtyShikhar...

Done with collaborators: @zhuhao.me, Dzmitry Bahdanau and @chrmanning.bsky.social
shikharmurty.bsky.social
We find that cross-website robustness is limited, and performance almost always goes up when in-domain NNetNav data is incorporated. This makes it even more important to work on unsupervised learning for agents: how are you going to collect human data for *any* website? [6/n]
shikharmurty.bsky.social
We use this data for SFT-ing Llama-3.1-8B. Our best models outperform zero-shot GPT-4 on both WebArena and WebVoyager, and reach SoTA performance among unsupervised methods on both datasets [5/n]
shikharmurty.bsky.social
Main ideas behind NNetNav exploration:
1. Complex goals have intermediate subgoals, so complex trajectories must have meaningful sub-trajectories.
2. Use an LM instruction relabeler + judge to test if the trajectory-so-far is meaningful. If yes, continue exploring; otherwise, prune. [3/n]
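The two ideas above can be sketched as a short exploration loop. This is only an illustrative sketch, not the NNetNav implementation: the function names (`explore`, `propose_action`, `relabel`, `judge`), the fixed check interval, and the toy stand-ins for the LM components are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, List[str]]  # (hindsight instruction, actions that achieve it)


def explore(
    propose_action: Callable[[List[str]], str],
    relabel: Callable[[List[str]], str],
    judge: Callable[[List[str], str], bool],
    max_steps: int = 12,
    check_every: int = 4,  # hypothetical interval between pruning checks
) -> List[Demo]:
    """Run one exploration episode, pruning when the prefix is not meaningful."""
    actions: List[str] = []
    demos: List[Demo] = []
    for step in range(1, max_steps + 1):
        actions.append(propose_action(actions))
        if step % check_every != 0:
            continue
        # Retroactively describe the trajectory-so-far as an instruction.
        instruction = relabel(actions)
        if not judge(actions, instruction):
            break  # prune: stop exploring this trajectory
        # A meaningful prefix is itself a usable (instruction, trajectory) pair.
        demos.append((instruction, list(actions)))
    return demos


# Toy stand-ins for the LM-based components (assumptions, for illustration only).
def toy_propose(actions: List[str]) -> str:
    return f"click_{len(actions)}"


def toy_relabel(actions: List[str]) -> str:
    return f"perform {len(actions)} clicks"


def toy_judge(actions: List[str], instruction: str) -> bool:
    # Pretend trajectories stop being meaningful after 8 actions.
    return len(actions) <= 8


demos = explore(toy_propose, toy_relabel, toy_judge)
for instruction, trajectory in demos:
    print(instruction, len(trajectory))
```

In a real system, `propose_action` would drive a browser and the relabeler and judge would be LM calls; the toy versions here only show the control flow, where every meaningful prefix yields a training example and the first non-meaningful prefix ends the episode.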
shikharmurty.bsky.social
NNetNav uses a structured exploration method to efficiently search and collect traces on live websites, which are retroactively labeled into instructions, finding a strikingly diverse set of workflows for any website (e.g., this plot) [2/n]
shikharmurty.bsky.social
Want to make a browser agent for *any* domain like banking or healthcare?
We propose methods for training LLMs with open-ended, unsupervised interaction on live websites:
✅ OSS SoTA on WebVoyager
✅ world's smallest high-performing web-agent
Try it here: nnetnav.dev
shikharmurty.bsky.social
going to stay off twitter for my own mental health. something has gone horribly wrong with that platform.
shikharmurty.bsky.social
Couldn't make it to NeurIPS due to work, but do check out our workshop happening in West Ballroom B. Lots of cool things to come, including a very fun panel!
nouhadziri.bsky.social
Super excited today for the System 2 Reasoning at Scale workshop, come join us to discover how to equip AI systems with reasoning that's optimized for renewable energy and not fossil fuel 🔥🚀

⏰When? today, 9am-5:30pm
📍West Ballroom B

s2r-at-scale-workshop.github.io
#NeurIPS2024
Reposted by Shikhar Murty
robertcsordas.bsky.social
Come visit our poster "MoEUT: Mixture-of-Experts Universal Transformers" on Friday at 4:30 in East Exhibit Hall A-C #1907 on #NeurIPS2024. With Kazuki Irie, Jürgen Schmidhuber, Christopher Potts and @chrmanning.bsky.social.
Reposted by Shikhar Murty
stanfordnlp.bsky.social
The extraordinary recent takeover of ML/AI by #NLP is well-known but insufficiently reflected on.

Look at the @neuripsconf.bsky.social tutorials in 2024!

neurips.cc/virtual/2024...

14 tutorials; 6 have "LLM" in the title; 4 more cover foundation models, with large NLP coverage. That's > 70% 😲
Reposted by Shikhar Murty
paulsoulos.bsky.social
🚨 Thrilled to share that Compositional Generalization Across Distributional Shifts with Sparse Tree Operations received a spotlight award at #NeurIPS2024! 🌟 I'll present a poster on Tuesday and give an invited lightning talk at the System 2 Reasoning Workshop on Sunday. 🧵👇
Reposted by Shikhar Murty
alex-lacoste.bsky.social
🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.
AgentLab diagram. The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights:

Core Agent Features: Dynamic Prompting and a Unified LLM API for interacting with large language models.
BrowserGym Platform: A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others.
Key Features: Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces.

Blue elements represent AgentLab components.
shikharmurty.bsky.social
Folks, I'm not going to be at NeurIPS this year, but we have an *awesome* workshop that I'm super proud of.

Go attend, and use the link below to ask all of your burning questions about LLM reasoning, agents and compositionality!
nouhadziri.bsky.social
🎊Excited for #neurips2024 and our "System 2 Reasoning at Scale" workshop. We have an exciting lineup of speakers who will answer your most burning questions about AI and reasoning 🚀

🔥Got spicy questions? Submit & vote here:
app.sli.do/event/dJNU63...
shikharmurty.bsky.social
I also wear the AI agents researcher hat. Can't say i'm similarly impressed by reviewers in that community...
shikharmurty.bsky.social
ACL syntax track reviewers >> almost any other conference.

These folks care about their sub-field and i learn something new every time!
shikharmurty.bsky.social
Now, reviewers are upset if we only finetune sub 10B parameter models!
shikharmurty.bsky.social
for more context: we are training the probe on sentences from PTB / BLiMP
shikharmurty.bsky.social
thx for sharing, though semantic parsing almost certainly benefits from modeling syntax :)
shikharmurty.bsky.social
SRL probe still rewards hidden states that model dependency relations, no? would like a probe that's agnostic to how well the underlying network models syntax
shikharmurty.bsky.social
could i get added? thx for making this!!