Show HN: I Built an Open Source API with Insanely Fast Whisper and Fly GPUs
Hi HN! Since the launch of JigsawStack.com, we've been trying to dive deeper into fully managed AI APIs built and fine tuned for specific use cases. Audio/video transcription was one of the more basic things and we wanted the best open source model at this point it is OpenAI's whisper large v3 model based on the number of languages it supports and its accuracy. The thing is, the model is huge and requires tons of GPU power for it to run efficiently at scale. Even OpenAI doesn't provide an API for their best transcription model while only providing whisper v2 at a pretty high price. I tried running the whisper large v3 model on multiple cloud providers from Modal.com, Replicate, and Hugging faces dedicated interface and it takes a long time to transcribe any content about ~30mins long for 150mins of audio and this doesn't include the machine startup time for on-demand GPUs. Keeping in mind at JigsawStack we aim to return any heavy computation under 25s or 2mins
DeepCamp AI