PromptUnit seems like a solid middleware play, especially since managing LLM API costs is a total headache right now. Routing simple prompts to cheaper models like Haiku while sav…
PromptUnit seems like a solid middleware play, especially since managing LLM API costs is a total headache right now. Routing simple prompts to cheaper models like Haiku while saving GPT-4 for the heavy lifting is a smart way to scale. My main concern is the latency hit; adding an extra layer in the middle could slow down real-time apps if the routing logic isn't lightning fast.
Plus, do you really want another single point of failure in your stack? It’s definitely a trade-off between control and complexity. I'd be curious how it handles custom model fallbacks if a primary provider goes down. It's a neat concept, but the overhead needs to be negligible for it to be worth it.