Buzz – An Amazing AI Tool to Transcribe Audio to Text with OpenAI's Whisper


How are you all? I hope you are well.
As usual, I'm back with a new topic.

What is Buzz?

Buzz is a desktop app based on OpenAI's Whisper. It automatically transcribes audio to text. It offers several models for transcription: you just speak, and it converts your speech to text in real time. It is an open-source tool that runs on Windows, macOS, and Linux.

The machine learning model it uses is quite powerful. Through the microphone, you can transcribe speech, songs, or the audio of videos: once launched, the app listens to the microphone input and starts transcribing.
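Real-time transcription generally means collecting microphone audio into short windows before handing each one to the model. The sketch below illustrates only that buffering step; it is not Buzz's actual code. The 16 kHz sample rate matches what Whisper models expect, while the 5-second window and the name `chunk_stream` are hypothetical choices for this example.

```python
from collections.abc import Iterable, Iterator

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 5     # hypothetical window length chosen for this sketch


def chunk_stream(blocks: Iterable[list[float]],
                 sample_rate: int = SAMPLE_RATE,
                 chunk_seconds: float = CHUNK_SECONDS) -> Iterator[list[float]]:
    """Group incoming microphone blocks into fixed-length windows.

    Each full window would be passed to the speech model; any leftover
    samples shorter than one window are flushed when the stream ends.
    """
    window = int(sample_rate * chunk_seconds)
    buffer: list[float] = []
    for block in blocks:
        buffer.extend(block)
        # Emit as many complete windows as the buffer currently holds.
        while len(buffer) >= window:
            yield buffer[:window]
            buffer = buffer[window:]
    if buffer:
        yield buffer  # partial final window


# Simulate 12 seconds of silence arriving in 1-second blocks.
blocks = [[0.0] * SAMPLE_RATE for _ in range(12)]
print([len(c) for c in chunk_stream(blocks)])
```

A real app would read blocks from an audio library instead of a list, but the windowing logic stays the same.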

In addition to transcription, it can also translate. Select the language and it will do the rest. At the moment, however, translation only works into English: Whisper can translate speech from other languages into English, but not the other way around.
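Buzz exposes this choice in its GUI. For comparison, Whisper's own Python package (`openai-whisper`) controls it with a `task` option passed to `model.transcribe()`. The following is a minimal sketch assuming that package and ffmpeg are installed; `whisper_options` and `run` are hypothetical helper names, and this is not how Buzz itself is implemented.

```python
from typing import Optional


def whisper_options(translate: bool, language: Optional[str] = None) -> dict:
    """Build keyword arguments for Whisper's model.transcribe().

    task="translate" asks Whisper for English output; task="transcribe"
    keeps the spoken language. Omitting language lets Whisper detect it.
    """
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        options["language"] = language  # e.g. "de" for German speech
    return options


def run(path: str, translate: bool = False,
        language: Optional[str] = None, model_name: str = "base") -> str:
    # Imported lazily so whisper_options() works without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **whisper_options(translate, language))
    return result["text"]
```

For example, `run("interview.mp3", translate=True, language="de")` would return an English translation of German speech; the file name is a placeholder.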

OpenAI launched Whisper a few days ago. It is an open-source neural network that approaches human-level robustness and accuracy on English speech recognition.

Buzz

GitHub link: Buzz

How to use Buzz?

Buzz is written in Python and is available on GitHub. You can either run it directly from source or use the developer's binary release.

To run it from source, you need Python and Poetry (a Python dependency manager) installed on your system. Once they are, run the following command in the project directory to install the required dependencies:

poetry install

But if you don't want to go through all this trouble, you can download the binary release and run the app directly. Currently there are Mac, Windows, and Linux versions available.

I will install the Windows version. I should say in advance that this is heavy software, so install it only if you have a reasonably powerful machine.

First, select your microphone. By default, the app starts in transcription mode.

On the first run, it downloads the models in the background, so the first run may be a bit slow. When everything is ready, a record button appears and you can start talking; the transcribed text will appear in the text editor. How fast and how accurately it transcribes depends on factors such as the model you choose, your hardware, and the audio quality.

The accuracy of this software is good enough, but it is not as smooth as Windows 10/11 voice typing or the Speechnotes website.

Last word

Whisper is an excellent neural network. As a programmer or developer, you can use it to build apps with speech-to-text functionality. Its accuracy is OK, but the experience is not yet very smooth. This seems to be a limitation of the model rather than of the GUI. Hopefully future updates will fix it.
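As a starting point for such an app, the sketch below uses the `openai-whisper` package directly and prints each recognized segment with its start and end time. It assumes the package and ffmpeg are installed; `format_segments` and `transcribe_with_timestamps` are hypothetical names, and the audio path is a placeholder. Whisper's result dictionary contains a `segments` list whose entries carry `start` and `end` (in seconds) and `text`.

```python
def format_segments(segments: list[dict]) -> str:
    """Render Whisper-style segments as '[start - end] text' lines."""
    lines = [
        f"[{seg['start']:.2f} - {seg['end']:.2f}] {seg['text'].strip()}"
        for seg in segments
    ]
    return "\n".join(lines)


def transcribe_with_timestamps(path: str, model_name: str = "base") -> str:
    # Model names include "tiny", "base", "small", "medium" and "large";
    # larger models are usually more accurate but need more memory and time.
    import whisper  # lazy import: pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return format_segments(result["segments"])
```

Keeping the formatting separate from the model call makes it easy to swap the output for subtitles, a text box, or a file.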

That's all for today. See you soon with a new topic. Until then, stay well.
