26 million parameters. That’s the number for Needle, a new model from PlatPhorm News that hit the scene on May 9, 2026. This isn’t another general-purpose chatbot or a model designed to write your next novel. Needle is hyper-focused on one thing: function calling, or what some call “tool use.”
The Niche of Needle
PlatPhorm News open-sourced Needle, a model explicitly built to replicate Gemini’s tool-calling capabilities. The headline here isn’t its ability to hold a deep conversation; it’s the fact that it does one thing, and aims to do it well, in a very small package. This 26M-parameter model runs at impressive speeds for its size: 6,000 tokens per second for prefill and 1,200 tokens per second for decoding. Those are numbers that get attention, especially given its purpose.
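To put those throughput figures in perspective, here is a rough back-of-the-envelope sketch. The request sizes (600 prompt tokens, 60 output tokens) are assumptions chosen for illustration, not numbers from PlatPhorm News, and the estimate ignores network and scheduling overhead.

```python
# Throughput figures quoted for Needle in the article.
PREFILL_TOKENS_PER_SEC = 6000
DECODE_TOKENS_PER_SEC = 1200


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency: time to prefill the prompt plus time to decode the output."""
    prefill_time = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_time = output_tokens / DECODE_TOKENS_PER_SEC
    return prefill_time + decode_time


# Hypothetical request: 600 prompt tokens (tool schemas plus the user message)
# and a 60-token structured function call as output.
latency = estimate_latency_seconds(prompt_tokens=600, output_tokens=60)
print(f"Estimated latency: {latency * 1000:.0f} ms")  # prints "Estimated latency: 150 ms"
```

Under those assumptions, prefill takes about 100 ms and decoding about 50 ms, so a single tool call would finish in well under a second.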
For context, Gemini Scribe 4.8.0 has been out for a while. We’ve seen various iterations of tool-calling from larger models. The new wrinkle here is the “distillation technique” PlatPhorm News used to get this specific functionality into such a compact model. The goal? Cheaper replication of that Gemini technology.
Not a Replacement, But a Partner
Let’s be clear about what Needle is not. It’s not here to replace Kimi 2.7, Claude Haiku, or Gemini Flash 3.1 lite. Those are conversational LLMs. If you’re looking for a model to write marketing copy or brainstorm ideas, Needle isn’t it. Its creators are upfront: this is for situations where an application is mostly tool-calling. Think orchestrating actions, interacting with APIs, or making decisions based on specific external functions.
This specialization makes a lot of sense. The AI space is getting crowded with generalist models, each trying to outdo the other in raw intelligence or creative output. But many real-world AI applications don’t need a poet; they need a reliable, fast, and efficient task executor. That’s where a model like Needle steps in.
Cost and Efficiency
The “cheaper replication” aspect is the real draw. Running massive LLMs for every single function call can become expensive, in both compute and dollars. By distilling the tool-calling intelligence into a 26M-parameter model, PlatPhorm News is offering a way to offload that specific task to a smaller, more efficient engine. This could mean significant savings for developers and businesses that rely heavily on AI agents interacting with external systems.
Imagine an AI agent managing your calendar, booking flights, or fetching specific data from a database. These tasks primarily involve calling specific functions with precise arguments. Using a massive, multi-billion-parameter model for each of these relatively simple, structured interactions is overkill. Needle aims to be the focused workhorse for these scenarios, leaving the heavy lifting of complex reasoning or creative generation to larger, more general models when truly necessary.
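To make “calling specific functions with precise arguments” concrete, here is a minimal sketch of the application side of tool use. Everything in it is an assumption for illustration: the JSON shape with `name` and `arguments`, and the `get_calendar_events` and `book_flight` tools, are hypothetical and not taken from Needle’s documentation.

```python
import json


def get_calendar_events(date: str) -> list:
    """Hypothetical tool: a real one would query a calendar backend."""
    return [{"date": date, "title": "Team sync"}]


def book_flight(origin: str, destination: str) -> dict:
    """Hypothetical tool: a real one would call a booking API."""
    return {"status": "booked", "origin": origin, "destination": destination}


# Registry mapping tool names to the Python functions that implement them.
TOOLS = {
    "get_calendar_events": get_calendar_events,
    "book_flight": book_flight,
}


def dispatch(model_output: str):
    """Parse a structured tool call emitted by a model and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Assumed output format: a JSON object naming the tool and its arguments.
example_output = '{"name": "book_flight", "arguments": {"origin": "SFO", "destination": "JFK"}}'
result = dispatch(example_output)
print(result)
```

The model’s only job in this loop is to produce a short, well-formed call; the application does the rest, which is why a small specialized model can plausibly cover it.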
The Future of Specialized AI
Needle’s arrival signals a continued trend toward specialized AI models. As the technology matures, we’re seeing less of a “one model fits all” approach and more of an ecosystem where different models excel at different tasks. This can lead to more efficient, cost-effective, and ultimately more capable AI systems. Instead of trying to force a large language model to do everything, developers can pick and choose the right tool for the job.
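One way to picture this “right tool for the job” setup is a small router that sends function-driven requests to a compact model and everything else to a generalist. The sketch below is purely illustrative: both model functions are stand-in stubs, and the keyword heuristic is a hypothetical placeholder for whatever classification a real system would use.

```python
from typing import Callable


def small_tool_model(request: str) -> str:
    """Stand-in stub for a compact tool-calling model."""
    return f"[tool-call model] {request}"


def large_general_model(request: str) -> str:
    """Stand-in stub for a large general-purpose model."""
    return f"[general model] {request}"


# Hypothetical heuristic: phrases that suggest the request maps to a function call.
TOOL_KEYWORDS = ("book", "schedule", "fetch", "look up")


def route(request: str) -> Callable[[str], str]:
    """Pick the model that should handle the request."""
    lowered = request.lower()
    if any(keyword in lowered for keyword in TOOL_KEYWORDS):
        return small_tool_model
    return large_general_model


def handle(request: str) -> str:
    """Route the request and return the chosen model's response."""
    return route(request)(request)


print(handle("Book a flight to JFK"))
print(handle("Write a poem about autumn"))
```

A production router would need something sturdier than keyword matching, but the division of labor is the point: cheap, fast handling for structured calls, with the larger model reserved for open-ended work.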
The open-source nature of Needle is also noteworthy. It opens the door for broader adoption and community contributions, potentially accelerating its development and integration into various projects. For developers already working with tool-calling in Gemini, Needle offers an alternative that could streamline operations and reduce overhead. It’s not about replacing your current LLM; it’s about optimizing your architecture for specific, function-driven tasks. And in the world of AI, efficiency often translates directly to utility and adoption.