whisper.cpy

Python wrapper for whisper.cpp

Highlights

  1. Lightweight: uses ctypes.CDLL to call functions from the libwhisper shared library (see the sketch after this list).

  2. Ports the whisper-stream functionality to handle the live-streaming case with asynchronous processing.
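
For a flavor of the first point, here is a minimal ctypes sketch (the library path is illustrative; whisper_print_system_info is one of whisper.cpp's exported C functions):

import ctypes

# Load the shared library built from whisper.cpp (path is illustrative;
# use .so on Linux and .dll on Windows, and adjust to your build directory)
lib = ctypes.CDLL("whisper.cpp/build/src/libwhisper.dylib")

# Declare the return type before calling; whisper_print_system_info()
# returns a C string describing the build's capabilities
lib.whisper_print_system_info.restype = ctypes.c_char_p
print(lib.whisper_print_system_info().decode())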

Preparation

1. Prepare the whisper.cpp library

Clone whisper.cpp, then build it

# clone whisper.cpp
git clone https://github.com/ggml-org/whisper.cpp

# navigate into the folder
cd whisper.cpp/

# check out the stable version (currently supported)
git checkout v1.8.0

# build whisper.cpp
cmake -B build
cmake --build build --config Release

Download the ggml models

# ASR model
sh ./models/download-ggml-model.sh [tiny|base|small|large]

# VAD model
sh ./models/download-vad-model.sh silero-v5.1.2

2. Install whisper.cpy

Install from source:

pip install git+https://github.com/fann1993814/whisper.cpy
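
As a quick sanity check that the install succeeded, the import should work without any model files:

# Sanity check: importing the package should succeed after installation
from whispercpy import WhisperCPP, WhisperStream
print(WhisperCPP, WhisperStream)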

Usage

Basic Audio Transcription and Voice Activity Detection

Follow the steps below, and trace transcribe.py

1. Set the shared library, model, and test audio paths

# WHISPER_CPP_PATH is the whisper.cpp project location
WHISPER_CPP_PATH = "/path/to/whisper.cpp"  # adjust to your local clone

audio_wav = f"{WHISPER_CPP_PATH}/samples/jfk.wav"
asr_model_path = f"{WHISPER_CPP_PATH}/models/ggml-tiny.bin"
vad_model_path = f"{WHISPER_CPP_PATH}/models/ggml-silero-v5.1.2.bin"
library_path = f"{WHISPER_CPP_PATH}/build/src/libwhisper.dylib" # Mac: dylib, Linux: so, Win: dll
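
If the script needs to run on more than one OS, the library suffix can be derived from the platform (a small sketch; note that the build output directory may also differ on Windows):

import sys

# Pick the shared-library suffix for the current platform
ext = {"darwin": "dylib", "win32": "dll"}.get(sys.platform, "so")
library_path = f"{WHISPER_CPP_PATH}/build/src/libwhisper.{ext}"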

2. Read the test audio bundled with whisper.cpp

import soundfile as sf

data, sr = sf.read(audio_wav, dtype='float32')
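
whisper.cpp works on 16 kHz mono float32 audio, and the bundled sample is already in that format. For your own files, a quick guard can help (a numpy-only sketch; resampling itself is left to a tool like ffmpeg or librosa):

import numpy as np

# Downmix to mono and verify the sample rate before transcribing
if data.ndim > 1:
    data = np.mean(data, axis=1, dtype='float32')
assert sr == 16000, "resample to 16 kHz before passing audio to the model"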

3. Load the library and models with whisper.cpy, transcribe, and read the results

from whispercpy import WhisperCPP
from whispercpy.utils import to_timestamp


model = WhisperCPP(library_path, asr_model_path,
                   vad_model_path, use_gpu=True, verbose=False)

print('--------- Lib Version ---------')
print(model.get_version())

# --------- Lib Version ---------
# Ver: 1.8.0

print('--------- VAD Result ----------')

for segment in model.vad(data):
    print(f'[{to_timestamp(segment.t0, False)} --> {to_timestamp(segment.t1, False)}]')

# --------- VAD Result ----------
# [00:00:00.290 --> 00:00:02.210]
# [00:00:03.300 --> 00:00:03.770]
# [00:00:04.000 --> 00:00:04.350]
# [00:00:05.380 --> 00:00:07.650]
# [00:00:08.160 --> 00:00:10.590]

print('--------- ASR Result ----------')

for segment in model.transcribe(data, language='en', beam_size=5, token_timestamps=True):
    print(f'[{to_timestamp(segment.t0, False)} --> {to_timestamp(segment.t1, False)}] {segment.text}')
    print('--------- Token Info ----------')
    print('\n'.join(f'[{to_timestamp(token.t0, False)} --> {to_timestamp(token.t1, False)}] {token.text}'
                    for token in segment.tokens))
    print('-------------------------------')

# --------- ASR Result ----------
# [00:00:00.000 --> 00:00:10.400]  And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
# --------- Token Info ----------
# [00:00:00.000 --> 00:00:00.000] [_BEG_]
# [00:00:00.320 --> 00:00:00.320]  And
# [00:00:00.330 --> 00:00:00.530]  so
# [00:00:00.680 --> 00:00:00.740] ,
# [00:00:00.740 --> 00:00:00.950]  my
# [00:00:00.950 --> 00:00:01.590]  fellow
# [00:00:01.590 --> 00:00:02.100]  Americans
# [00:00:02.550 --> 00:00:03.000] ,
# [00:00:03.290 --> 00:00:03.650]  ask
# [00:00:04.010 --> 00:00:04.280]  not
# [00:00:04.650 --> 00:00:05.200]  what
# [00:00:05.410 --> 00:00:05.560]  your
# [00:00:05.650 --> 00:00:06.410]  country
# [00:00:06.410 --> 00:00:06.750]  can
# [00:00:06.750 --> 00:00:06.920]  do
# [00:00:07.010 --> 00:00:07.490]  for
# [00:00:07.490 --> 00:00:07.970]  you
# [00:00:08.170 --> 00:00:08.170] ,
# [00:00:08.190 --> 00:00:08.430]  ask
# [00:00:08.430 --> 00:00:08.750]  what
# [00:00:08.910 --> 00:00:09.040]  you
# [00:00:09.040 --> 00:00:09.350]  can
# [00:00:09.350 --> 00:00:09.500]  do
# [00:00:09.500 --> 00:00:09.710]  for
# [00:00:09.720 --> 00:00:09.980]  your
# [00:00:09.990 --> 00:00:10.350]  country
# [00:00:10.470 --> 00:00:10.500] .
# [00:00:10.500 --> 00:00:10.500] [_TT_525]
# -------------------------------
  • to_timestamp converts whisper.cpp time units into a formatted timestamp string (a rough equivalent is sketched below)
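
For reference, upstream whisper.cpp counts time in 10 ms ticks; a rough Python equivalent of the helper might look like this (a sketch; the second argument is assumed to toggle a comma separator, matching the calls above):

def to_timestamp_sketch(t, comma=False):
    # t is in 10 ms units, as in whisper.cpp's timestamps
    ms = int(t) * 10
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d}{',' if comma else '.'}{ms:03d}"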

Live Streaming

Follow the steps below, and trace live.py

Note: for real-time inference, use

  • tiny/base/small on CPU
  • medium/large/large-v2/large-v3 on GPU

1. Load the core engine and streaming decoder with the library and model

from whispercpy import WhisperCPP, WhisperStream

# lib_path / model_path: the shared library and ggml model paths from the section above
core = WhisperCPP(lib_path, model_path, use_gpu=False)
asr = WhisperStream(core, language='en', return_token=True)

2. Set up the callback

count = 0

def callback(indata, frames, time, status):
    """sounddevice callback: push each audio block into the stream."""
    global count

    chunk = indata.copy().tobytes()
    asr.pipe(chunk)                      # hand audio to the worker thread
    transcript = asr.get_transcript()    # in-progress segment
    transcripts = asr.get_transcripts()  # finalized segments

    if len(transcripts) > count:         # a new segment was finalized
        print("\r" + transcripts[-1].text)
        print('--')
        count += 1
    else:                                # update the live partial line
        print(f"\r{transcript.text}", end="", flush=True)
  • asr.pipe: hands audio to a background thread so transcription runs asynchronously and continuously (see the sketch after this list)
  • asr.get_transcript: get the current, in-progress transcription
  • asr.get_transcripts: get all finalized transcriptions
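
A rough mental model of that API (an illustrative producer/consumer skeleton, not whisper.cpy's actual implementation):

import queue
import threading

class StreamSketch:
    """Illustrative skeleton of the pipe()/get_transcript() pattern."""

    def __init__(self):
        self._chunks = queue.Queue()   # audio from the recording callback
        self._current = None           # in-progress segment
        self._finalized = []           # completed segments
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def pipe(self, chunk):
        self._chunks.put(chunk)        # non-blocking, safe inside callbacks
        return self._worker            # lets callers .join() on shutdown

    def get_transcript(self):
        return self._current

    def get_transcripts(self):
        return list(self._finalized)

    def _run(self):
        while True:
            chunk = self._chunks.get()
            if chunk is None:          # end-of-stream signal
                break
            # ... buffer audio, run VAD + ASR, update self._current,
            # and append finished segments to self._finalized ...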

3. Set up microphone recording

import sounddevice as sd
from whispercpy.constant import STREAMING_ENDING

samplerate = 16000
block_duration = 0.25
block_size = int(samplerate * block_duration)
channels = 1

# Recording
try:
    with sd.InputStream(
        samplerate=samplerate,
        channels=channels,
        callback=callback, blocksize=block_size, dtype='float32'):
        print("🎤 Recording for ASR... Press Ctrl+C to stop.")
        while True:
            sd.sleep(1000)
except KeyboardInterrupt:
    print("⏹️ Recording stopped.")
    # send end signal
    asr.pipe(STREAMING_ENDING).join()

# Result
#
# 🎤 Recording for ASR... Press Ctrl+C to stop.
# This is my voice test.
# --
# Can you hear me?
# --
# ^C⏹️ Recording stopped.
# [00:00:00.800 --> 00:00:11.000]  This is my voice test.
# -------------------------------
# [00:00:00.800 --> 00:00:00.800] [_BEG_]
# [00:00:00.800 --> 00:00:01.490]  This
# [00:00:01.850 --> 00:00:01.850]  is
# [00:00:01.890 --> 00:00:02.200]  my
# [00:00:02.200 --> 00:00:03.080]  voice
# [00:00:03.080 --> 00:00:03.710]  test
# [00:00:03.710 --> 00:00:08.290] .
# [00:00:08.300 --> 00:00:11.000] [_TT_150]
# -------------------------------
# [00:00:11.300 --> 00:00:21.500]  Can you hear me?
# -------------------------------
# [00:00:11.300 --> 00:00:11.300] [_BEG_]
# [00:00:11.300 --> 00:00:11.790]  Can
# [00:00:12.050 --> 00:00:12.280]  you
# [00:00:12.280 --> 00:00:12.940]  hear
# [00:00:12.940 --> 00:00:13.070]  me
# [00:00:13.070 --> 00:00:17.960] ?
# [00:00:17.960 --> 00:00:21.500] [_TT_100]
# -------------------------------
  • STREAMING_ENDING: a signal that stops transcription; call join() to wait for the last worker thread to complete (a file-driven variant is sketched below)
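
The same pipeline can also be driven from a file instead of a microphone, which is convenient for testing (a sketch reusing core/asr from step 1; the WAV path is illustrative and must be 16 kHz mono):

import soundfile as sf
from whispercpy.constant import STREAMING_ENDING

# Stream a WAV file through the decoder in 0.25 s chunks
data, sr = sf.read("samples/jfk.wav", dtype='float32')
block_size = int(sr * 0.25)

for i in range(0, len(data), block_size):
    asr.pipe(data[i:i + block_size].tobytes())

asr.pipe(STREAMING_ENDING).join()  # send the end signal and wait

for segment in asr.get_transcripts():
    print(segment.text)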

License

This project follows whisper.cpp and is licensed under MIT.