Audio & Video Transcription Software
by Eng. Francesco Barbato

A professional desktop application enabling automatic transcription of audio and video content using OpenAI Whisper. Built entirely in Python with a modern GUI powered by CustomTkinter, and integrated with FFmpeg for media processing. Features a real-time progress bar, splash screen, and dark/light theme with smooth transitions.

Note: This is the first base version of the software, designed to work efficiently with small and medium-sized video or audio files. The current limitation is due to CPU-only processing without GPU acceleration. We are already developing a new cloud-based version to handle larger files and improve overall performance.

🧩 Technologies & Architecture

Python — Core programming language ensuring efficiency and portability.
CustomTkinter — Modern GUI framework with dark/light mode.
FFmpeg — Backend for multimedia decoding and audio extraction.
OpenAI Whisper — AI model for accurate, multilingual transcription.
PyInstaller — Packaging scripts into standalone executables.
Inno Setup — Builds the final Windows installer with branding and icons.
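As a rough illustration of the FFmpeg audio-extraction step, the sketch below builds an FFmpeg command that strips the video stream and writes mono 16 kHz PCM audio, which is the sample rate Whisper works with internally. The function names and the exact flag set are assumptions made for this example, not the application's actual code.

```python
import subprocess
from pathlib import Path


def build_extract_command(source: str, target: str, ffmpeg: str = "ffmpeg") -> list[str]:
    """Return an FFmpeg argument list that extracts mono 16 kHz PCM audio."""
    return [
        ffmpeg,
        "-y",                 # overwrite the target if it already exists
        "-i", source,         # input audio or video file
        "-vn",                # drop any video stream
        "-ac", "1",           # downmix to a single channel
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit PCM, the usual WAV payload
        target,
    ]


def extract_audio(source: str, target: str) -> Path:
    """Run FFmpeg and return the path of the extracted WAV file."""
    # check=True raises CalledProcessError if FFmpeg exits with an error.
    subprocess.run(build_extract_command(source, target), check=True, capture_output=True)
    return Path(target)
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.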

⚙️ Key Functionalities

Real-time transcription for audio and video files in multiple languages.
Support for MP3, MP4, WAV, AVI, and other formats.
Elegant GUI with progress indicators and clean UX flow.
Branded splash screen with smooth animations.
Dynamic dark/light mode support.
Automatic error handling and optimized file management.
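The format support and error handling listed above could be backed by a small validation step like the following. This is a minimal sketch: the extension set only covers the four formats named here, and `UnsupportedMediaError` and `validate_media_path` are hypothetical names, not part of the shipped application.

```python
from pathlib import Path

# Assumed set: only the formats named in this document. FFmpeg decodes many more.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".avi"}


class UnsupportedMediaError(ValueError):
    """Raised when a file's extension is not in SUPPORTED_EXTENSIONS."""


def validate_media_path(path: str) -> Path:
    """Return the path as a Path object, or raise a descriptive error."""
    media = Path(path)
    if not media.is_file():
        raise FileNotFoundError(f"No such file: {media}")
    # Compare case-insensitively so "CLIP.MP4" is accepted too.
    if media.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise UnsupportedMediaError(f"Unsupported format: {media.suffix or '(none)'}")
    return media
```

Failing early with a specific exception lets the GUI show a clear message before any decoding starts.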

🚀 Deployment & Distribution

Distributed as a standalone Windows package built using Inno Setup. Users can install and run the app instantly without extra setup. The latest stable release is available for download below!

Download Software

💡 Development Process

A structured workflow combining design thinking and software engineering best practices.

Design

Initial design phase focusing on UI/UX with Figma and CustomTkinter layout mapping. Established color palette, splash screen, and overall application flow.

Model Integration

Integrated OpenAI Whisper for high-accuracy transcription of multilingual audio and video files. Optimized model usage for efficient performance on desktop.
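A minimal sketch of what CPU-only Whisper integration can look like, assuming the `openai-whisper` package and the `base` model size (the document does not say which model the application ships with). The helpers `format_timestamp` and `format_segments` are illustrative names introduced here.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: list[dict]) -> str:
    """Render Whisper-style segments as one timestamped line each."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments
    )


def transcribe(path: str, model_name: str = "base") -> str:
    """Transcribe a media file on the CPU and return timestamped text."""
    import whisper  # imported lazily; provided by the openai-whisper package

    model = whisper.load_model(model_name, device="cpu")
    # fp16=False selects full precision, which is what CPU inference uses.
    result = model.transcribe(path, fp16=False)
    return format_segments(result["segments"])
```

Smaller models such as `tiny` or `base` trade some accuracy for speed, which matters when no GPU is available.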

GUI Development

Implemented a fully responsive interface using CustomTkinter, with a progress bar synchronized to the transcription job, dark/light theme switching, and graceful error handling.
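One common way to keep a Tk-based progress bar in sync with a long-running job is to run the job on a worker thread and poll a queue from the main loop. The sketch below shows that pattern under stated assumptions: `start_worker`, `latest_progress`, and `run_gui` are hypothetical names, and the `task` callable is assumed to report its own progress as a fraction between 0 and 1.

```python
import queue
import threading


def start_worker(task, updates: queue.Queue) -> threading.Thread:
    """Run task(report) on a daemon thread; report(fraction) enqueues progress."""
    thread = threading.Thread(target=task, args=(updates.put,), daemon=True)
    thread.start()
    return thread


def latest_progress(updates: queue.Queue, current: float) -> float:
    """Drain the queue without blocking and return the newest value seen."""
    while True:
        try:
            current = updates.get_nowait()
        except queue.Empty:
            return current


def run_gui(task) -> None:
    """Wire the worker to a CustomTkinter progress bar (needs a display)."""
    import customtkinter as ctk  # lazy import keeps the helpers above GUI-free

    app = ctk.CTk()
    bar = ctk.CTkProgressBar(app)
    bar.pack(padx=20, pady=20)
    bar.set(0)
    updates: queue.Queue = queue.Queue()
    state = {"value": 0.0}
    start_worker(task, updates)

    def poll() -> None:
        # Widgets are only touched here, on the Tk main thread.
        state["value"] = latest_progress(updates, state["value"])
        bar.set(state["value"])
        app.after(100, poll)

    poll()
    app.mainloop()
```

Routing updates through a queue keeps all widget calls on the main thread, so the window stays responsive while the job runs.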

Packaging

Packaged the project into an executable using PyInstaller, embedding FFmpeg and resource assets for standalone deployment.
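When FFmpeg and other assets are embedded in a PyInstaller build, the code needs a way to find them at run time. A widely used approach is to check `sys._MEIPASS`, which PyInstaller's bootloader sets to the unpacked bundle directory. The helper name, the `bin` subfolder, and the build command in the comment are assumptions for illustration only.

```python
import os
import sys

# Hypothetical build command (Windows path separator in --add-binary):
#   pyinstaller --onefile --windowed --add-binary "ffmpeg.exe;bin" app.py


def resource_path(relative: str) -> str:
    """Resolve a bundled resource from source or from a PyInstaller build."""
    # In a frozen app, sys._MEIPASS points at the extracted bundle folder;
    # otherwise fall back to the current working directory.
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative)


def bundled_ffmpeg() -> str:
    """Return the assumed location of the embedded FFmpeg executable."""
    return resource_path(os.path.join("bin", "ffmpeg.exe"))
```

The returned path can then be passed to `subprocess.run` in place of a bare `ffmpeg`, so the app does not depend on FFmpeg being on the user's PATH.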

Testing

Tested and debugged the application across multiple file formats (MP3, WAV, MP4, AVI), checking for stability and consistent performance across Windows versions.

Deployment

Created the final installer using Inno Setup, including versioning, custom icons, and auto-cleanup. Deployed as a ready-to-use installer for end users.