Lambda, a 12-year-old company specializing in on-demand GPU services for AI model development, has launched a new Inference API. This API, positioned as the lowest-cost inference service in the industry, allows businesses to deploy AI models into production without managing the underlying compute infrastructure.
The API supports a wide range of cutting-edge large language models (LLMs), including Meta's Llama 3.3 and 3.1, Nous's Hermes-3, and Alibaba's Qwen 2.5, making it a highly accessible option for developers. Pricing is competitive, starting at $0.02 per million tokens for smaller models and scaling up to $0.90 for larger models, and Lambda's pay-as-you-go model eliminates subscriptions and rate limits.
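Under that pay-as-you-go model, cost scales linearly with token volume. The sketch below illustrates the arithmetic using the two price points quoted above; the model names and which tier each falls into are illustrative assumptions, not Lambda's published price list.

```python
# Hypothetical cost estimate under a pay-as-you-go, per-token pricing model.
# The $0.02 and $0.90 per-million-token figures come from the article;
# the model names below are placeholders, not Lambda's actual catalog.

PRICE_PER_MILLION_USD = {
    "small-model": 0.02,  # article's low end
    "large-model": 0.90,  # article's high end
}

def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated USD cost for processing `tokens` tokens."""
    return PRICE_PER_MILLION_USD[model] * tokens / 1_000_000

# Example: a workload of 500 million tokens per month.
print(round(estimate_cost("small-model", 500_000_000), 2))  # 10.0
print(round(estimate_cost("large-model", 500_000_000), 2))  # 450.0
```

Even at the top tier, half a billion tokens a month comes to a few hundred dollars, which is the cost argument the article is making.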
Lambda's strength lies in its extensive GPU infrastructure, built over a decade of providing GPU clusters for training and fine-tuning. This resource base allows the company to offer cost-effective inference by leveraging both older and newer Nvidia GPUs. The new Inference API completes Lambda's full-stack AI development offering, simplifying deployment for developers who have already used the platform for training and fine-tuning.
This comprehensive approach, combined with a pay-as-you-go model and the ability to scale to trillions of tokens monthly, positions Lambda as a compelling alternative to cloud giants. The API's open and flexible architecture is a key selling point. It supports a range of open-source and proprietary models, providing developers with unrestricted access to high-performance inference.
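For a sense of what "open and flexible" means in practice for a hosted inference service, the sketch below builds a request for an OpenAI-style chat-completions endpoint. The URL, model identifier, and API key are illustrative assumptions; consult Lambda's documentation for the actual endpoint and model names.

```python
import json

# Minimal sketch of preparing a call to an OpenAI-style chat-completions
# endpoint. The URL, model name, and key below are placeholders, not
# Lambda's documented values.

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical

def build_chat_request(api_key: str, model: str, prompt: str):
    """Return (headers, JSON body) for a single-turn chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return headers, body

headers, body = build_chat_request(
    "sk-demo", "llama-3.3-70b", "Summarize this article in one sentence."
)
print(json.loads(body)["model"])  # llama-3.3-70b
```

Because the request shape is the de facto industry standard, the same payload can be sent with any HTTP client (e.g. `requests.post(API_URL, headers=headers, data=body)`), which is what lets developers swap providers without rewriting application code.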
Lambda also emphasizes privacy and security, stating that it does not retain or share user data. That guarantee matters for businesses in industries like media, entertainment, and software development, which are increasingly adopting AI for applications such as text summarization and generative content creation.
Ultimately, Lambda's Inference API aims to democratize access to AI by removing cost and infrastructure barriers. By offering a cost-effective, secure, and flexible solution, Lambda is poised to attract a wide range of users, from startups to large enterprises, helping them integrate AI models into their applications more easily.
The company's focus on open-source models and multimodal applications further strengthens its position as a key player in the rapidly evolving AI landscape.