📍{NYC, SFO, YYZ}
🔗 https://beirami.github.io/
Current LLM agents lack reliability, creating a gap between demos and production. We solve this by automating the complex workflow of debugging, evaluation, and iteration required to make agents robust. 👇
- Thousands are reported dead in 72 hours.
- We are past the point of solidarity. Empty words do not stop bullets. Action does.
- The world must intervene now.
This must end. The Iranian people must prevail!
When you’re choosing an internship or a job, what you work on and who you work with matter way more than the logo. Don’t optimize for brands. Become the brand!
–building reliable software on top of unreliable LLM primitives
–statistical evaluation of real-world deployments of LLM-based systems
I’m speaking about this on two NeurIPS workshop panels:
🗓️Saturday – Reliable ML Workshop
🗓️Sunday – LLM Evaluation Workshop
- Wow, I won a NeurIPS award?!
- …runner-up, but I’ll take it.
- Wait, I didn’t submit a paper.
- Ah, I’m chairing the session and I’m supposed to give the award.
Huge congratulations to the actual winners and runners-up!
If you are excited about AI engineering (orchestration, evals, and optimizing scaffolds), we are hiring!
On Saturday I’ll be on panels at the Reliable ML & UniReps workshops.
Layoffs can be emotionally challenging for everyone, whether you are directly affected or not.
Focus on what you have accomplished and what you are excited about doing next, not just where you did it!
When a paper has a senior mentor and a junior mentee, the senior author must make sure the claims are correct and well supported. They must check every claim and gate the submission until it meets that bar.
Make enough assumptions and narrow down the claim, then prove a narrow result with caveats. Present it as broad, hide the caveats, and declare “XYZ is provable!”
We should not judge commitment by hours, especially in research. We should look for thoughtful work and steady progress.
Start with ONE example to validate the hypothesis, verify context, debug the design, then scale.