Claude 4 Sonnet solved 26% of Kotlin-bench tasks, outperforming OpenAI's o3.
Claude 4 Sonnet & Opus are available in Firebender today for all users of JetBrains IDEs. Try them out and let us know what you think!
Agent benchmarks coming soon for Kotlin-bench.
TL;DR: Grok 3 is a very capable coding model for Android & Kotlin development. GPT-4.1 shows improvement but still trails behind other major competitors.
See the full leaderboard here:
firebender.com/leaderboard
Gemini 2.5 topped the leaderboard, solving 14% of issues, with Claude 3.7 (thinking) in 2nd place at 12%.
Code, datasets, and results here: firebender.com/blog/kotlin-...
1. Absolutely love how it fixes its own errors. 😂
2. Autocomplete feels much faster than Copilot.
3. Very eager to make changes outside the scope of the file I'm working on. This might be user error, and it might be fixable with rules.