User Guide
This guide walks through common tasks in the SaLED UI and with the API.
Upload and transcribe audio
- Open the app at http://localhost:3000.
- Use the transcription panel to select an audio file (MP3/WAV/M4A).
- Choose a Whisper model (start with
tiny.en
for speed, larger models for quality). - Enable/disable speaker diarization as needed.
- Start. Watch progress indicators; segments appear incrementally.
Tips:
- For long files, processing is chunked; progress may advance in bursts.
- If diarization is off, SaLED applies a rule-based fallback to label speakers.
Inspect and navigate transcripts
- Use the waveform and minimap to jump to timestamps.
- The transcript panel shows segments with speaker, start/end, and text.
- Use zoom and playback controls to verify accuracy.
Fix tricky regions with Retranslation
If a region is inaccurate:
- Note the
start_time
andend_time
of the problematic span. - Reprocess just that portion with
/api/retranslate/segment
by uploading a clipped audio file; or - If the original full audio was uploaded for the same job, call
/api/retranslate/segment-from-full?job_id=<job_id>
with the time range and model. - Replace or merge the corrected text in your transcript.
When to escalate model size:
- Background noise, overlapping speakers, or domain-specific jargon.
- Try
small.en
→medium.en
→large-v3(-turbo)
as needed.
Summarize a transcript
- After transcription, send segments to
/api/summary/
with an optionalfull_text
. - Pick
format
:paragraph
,bullet_points
, orkey_points
. - Poll
/api/summary/status/{job_id}
and fetch/api/summary/{job_id}
for the final summary and topics.
Good prompts for summaries:
- Paragraph: broad narrative overview.
- Bullet points: meeting notes or action items.
- Key points: highlights for quick scan.
Chat about the transcript
- Compose a
messages
array with chat history and send alongtranscript
/full_text
to/api/chat/
. - Adjust
temperature
(0.0–1.0) for creativity vs. determinism. - Respect token limits with
max_tokens
for concise answers.
Best practices:
- Ask grounded questions (“What did Speaker 2 request after …?”).
- Provide
full_text
if possible; it reduces reconstruction overhead.
Managing Whisper models locally (CLI)
From backend/saled
:
bash
./cli.py model list
./cli.py model download base.en
./cli.py model delete base.en
./cli.py model update
Keyboard/UX tips
- Use the minimap to scroll large transcripts quickly.
- Toggle diarization for simpler single-speaker content.
- Save versions of transcripts before heavy edits.