Overview
This guide shows how to integrate the Whisper MCP server (mcp-server-whisper) with Claude Code to enable direct audio transcription. With this integration, you can:
- Drop voice recordings → Claude transcribes → Get text files back
- Create content faster by speaking instead of typing
- Work with AI more efficiently using natural voice input
Real-world example: Tennis coaching – transcribe 40-minute match recordings for analysis
Why Customize This MCP Server?
The original Whisper MCP server uses OpenAI’s paid API for transcription. While this works well for small files, it has several limitations that make it expensive and restrictive for high-volume use:
Limitations of Default Setup
Cost and Privacy Concerns: OpenAI charges per minute of audio transcription, which adds up quickly when processing multiple long recordings. Additionally, all your audio data is sent to OpenAI’s servers, raising privacy concerns for sensitive content like business meetings or personal recordings.
File Size Restrictions: The standard setup has a 25MB file size limit, requiring you to manually compress or split larger files before transcription. This becomes tedious when working with longer recordings like 40-minute tennis matches or podcast episodes.
Limited Control: You’re completely dependent on OpenAI’s service availability and pricing changes. If their API goes down or prices increase, your workflow stops.
Benefits of Custom Endpoint
Drop-and-Process Simplicity: Just drop audio files into the ./audio_file directory and ask Claude to transcribe. The server handles everything automatically – chunking, compression, transcription, and combining results into a single text file.
Full Control & Customization: Since you own the infrastructure, you can modify the transcription process, add custom features, and never worry about third-party service availability or pricing changes.
This makes it ideal for high-volume use cases like transcribing multiple 40-minute tennis matches per week, podcast production, or any scenario where cost, privacy, and file size flexibility matter.
Prerequisites
Before starting, ensure you have:
- ✅ Node.js installed
- ✅ MCP CLI installed
- ✅ Python & uv installed
- ✅ Git (to clone repository)
Quick verification:
node --version
mcp --version
uv --version
git --version
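The same check can be scripted. This is a minimal sketch, assuming the four tools are on PATH under the names used above; `missing_tools` is a hypothetical helper and not part of mcp-server-whisper.

```python
import shutil

# Executable names taken from the prerequisite list above.
REQUIRED_TOOLS = ("node", "mcp", "uv", "git")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.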
Installation Steps
Step 1: Clone Repository
git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper
Step 2: Install Python Dependencies
# Install dependencies with uv
uv sync
# Verify installation
uv run pytest # Optional: run tests
Step 3: Create Audio Directory
# Create directory for audio files
mkdir -p ./audio_file
# Verify directory was created
ls -la ./audio_file
Configuration Files
The Whisper MCP integration requires two configuration files:
1. ~/.claude.json (MCP Server Configuration)
This is the main Claude Code configuration file that defines the Whisper MCP server.
{
  "mcpServers": {
    "whisper": {
      "command": "mcp",
      "args": ["dev", "/absolute/path/to/mcp-server-whisper/src/mcp_server_whisper/server.py"],
      "env": {
        "USE_CUSTOM_WHISPER": "true",
        "CUSTOM_WHISPER_ENDPOINT": "https://whisper.adventuretube.net/whisper",
        "AUDIO_FILES_PATH": "./audio_file"
      }
    }
  }
}
Key Points:
- command: Uses MCP CLI to run the Python server
- args: Path to server.py (⚠️ must be absolute path for user-scoped servers)
- env.USE_CUSTOM_WHISPER: Set to true to use AdventureTube endpoint
- env.CUSTOM_WHISPER_ENDPOINT: Your custom Whisper API endpoint
- env.AUDIO_FILES_PATH: Path to audio files directory
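A quick way to catch the most common mistakes in this file is to check it programmatically. The sketch below is illustrative only: `check_server_config` is a hypothetical helper, and it assumes the environment variables are set in ~/.claude.json rather than in a .env file.

```python
from pathlib import Path


def check_server_config(config: dict, server_name: str = "whisper") -> list[str]:
    """Return a list of problems found in one mcpServers entry."""
    server = config.get("mcpServers", {}).get(server_name)
    if server is None:
        return [f"no '{server_name}' entry under mcpServers"]

    problems = []
    # Treat any argument ending in .py as the server script path.
    scripts = [arg for arg in server.get("args", []) if arg.endswith(".py")]
    if not scripts:
        problems.append("args does not contain a path to server.py")
    for script in scripts:
        if not Path(script).is_absolute():
            problems.append(f"script path is not absolute: {script}")

    env = server.get("env", {})
    if env.get("USE_CUSTOM_WHISPER", "false").lower() != "true":
        problems.append("USE_CUSTOM_WHISPER is not set to 'true'")
    if not env.get("CUSTOM_WHISPER_ENDPOINT"):
        problems.append("CUSTOM_WHISPER_ENDPOINT is missing")
    return problems
```

To run it against the real file, load the JSON first, for example `check_server_config(json.loads((Path.home() / ".claude.json").read_text()))`. An empty list means none of the checked problems were found.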
2. .env File (Environment Variables)
Create a .env file in the project root:
AUDIO_FILES_PATH=./audio_file
USE_CUSTOM_WHISPER=true
CUSTOM_WHISPER_ENDPOINT=https://whisper.adventuretube.net/whisper
Note: Environment variables can be set in either .env file OR ~/.claude.json. Choose based on your preference.
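If both places end up defining the same variable, it helps to know which value wins. The sketch below parses simple KEY=VALUE lines and lets already-set process variables take priority. That precedence is an assumption that mirrors the default behavior of common dotenv loaders; it has not been confirmed for this server. Both function names are hypothetical.

```python
import os


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def effective_settings(env_text: str, process_env=None) -> dict[str, str]:
    """Merge .env values with process variables; process variables win."""
    process_env = os.environ if process_env is None else process_env
    settings = parse_env_text(env_text)
    for key in list(settings):
        if key in process_env:
            settings[key] = process_env[key]
    return settings
```

Keeping each variable in only one of the two files avoids the question entirely.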
Code Modifications Made
To use the AdventureTube Whisper endpoint instead of OpenAI, the server code was changed in four places:
1. Added httpx for HTTP Requests
import httpx
2. Added Configuration Variables
CUSTOM_WHISPER_ENDPOINT = os.getenv("CUSTOM_WHISPER_ENDPOINT", "https://whisper.adventuretube.net/whisper")
USE_CUSTOM_WHISPER = os.getenv("USE_CUSTOM_WHISPER", "false").lower() == "true"
3. Created Custom Whisper Function
async def transcribe_with_custom_whisper(file_path: Path) -> dict[str, Any]:
    """Transcribe audio using AdventureTube Whisper endpoint."""
    # Handles 10-minute chunking automatically
    # Sends to AdventureTube endpoint via HTTP POST
    # Returns combined transcript
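The guide does not show the function body, so the following is only a sketch of what the HTTP part might look like, not the author's implementation. Two details are assumptions to check against your endpoint: the multipart form field is named "file", and the JSON reply carries the transcript under a "text" key. `post_chunk` and `transcribe_chunks` are hypothetical names.

```python
import asyncio
from pathlib import Path
from typing import Any, Awaitable, Callable

# Hypothetical signature: takes a chunk path, returns that chunk's transcript.
ChunkSender = Callable[[Path], Awaitable[str]]


async def post_chunk(chunk_path: Path, endpoint: str) -> str:
    """POST one audio chunk as multipart form data and return its text.

    Assumptions: form field "file", JSON response with a "text" key.
    """
    import httpx  # imported lazily so the rest of the sketch runs without it

    async with httpx.AsyncClient(timeout=300.0) as client:
        response = await client.post(
            endpoint,
            files={"file": (chunk_path.name, chunk_path.read_bytes(), "audio/mpeg")},
        )
        response.raise_for_status()
        return response.json()["text"]


async def transcribe_chunks(chunks: list[Path], send: ChunkSender) -> dict[str, Any]:
    """Send chunks in order and join their transcripts."""
    texts = []
    for chunk in chunks:
        texts.append(await send(chunk))
    return {"text": "\n".join(texts), "chunks": len(texts)}
```

Passing the sender in as a parameter keeps the combining logic testable without a network connection. In real use, `send` could be `functools.partial(post_chunk, endpoint=CUSTOM_WHISPER_ENDPOINT)`.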
4. Added httpx Dependency
mcp = FastMCP("whisper", dependencies=["openai", "pydub", "aiofiles", "httpx"])
Architecture: How It Works
Component Flow
User → Claude Code → MCP CLI → Whisper MCP Server → AdventureTube Whisper API → Transcript
Key Components
1. Claude Code (MCP Client)
- User interface where commands are issued
- Calls Whisper tools using MCP protocol
- Displays results to user
2. MCP CLI (Bridge)
- Launches Whisper MCP server as stdio process
- Manages server lifecycle
- Routes tool calls between Claude Code and server
3. Whisper MCP Server (Translation Layer)
- Speaks MCP protocol with Claude Code (stdio transport)
- Processes audio files (chunking, compression)
- Speaks HTTP with AdventureTube Whisper API
- Returns formatted transcripts
4. AdventureTube Whisper API
- Custom Whisper endpoint
- Performs actual transcription
- Returns JSON with transcript text
How a Request Flows
- Startup: MCP CLI launches Whisper server via Python
- Tool Call: User asks Claude to transcribe audio
- MCP Protocol: Claude Code sends MCP tool request to server
- File Processing: Server chunks large files into 10-minute segments
- Compression: Converts chunks to MP3 format (9.2MB per chunk)
- HTTP Request: Server sends POST to AdventureTube endpoint
- Transcription: AdventureTube processes audio and returns text
- Combine Results: Server combines all chunk transcripts
- Save Output: Individual chunk .txt files + combined transcript
- MCP Format: Server formats response for Claude Code
- Display: User sees result in Claude Code interface
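The file-processing step is easiest to picture as a list of time ranges. The sketch below computes 10-minute boundaries in milliseconds; `plan_chunks` is a hypothetical helper, and the server's real chunking code may differ.

```python
CHUNK_MS = 10 * 60 * 1000  # 10 minutes in milliseconds


def plan_chunks(duration_ms: int, chunk_ms: int = CHUNK_MS) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) pairs covering the whole recording."""
    if duration_ms < 0 or chunk_ms <= 0:
        raise ValueError("duration must be >= 0 and chunk length > 0")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]
```

A 40-minute recording yields four pairs, and a 25-minute one yields two full chunks plus a 5-minute remainder. Milliseconds are used because pydub, which the server lists as a dependency, slices `AudioSegment` objects by millisecond, so each pair could be applied as `audio[start:end].export(path, format="mp3")`.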
Using Whisper in Claude Code
Once installed and configured, you can use natural language to transcribe audio:
Basic Transcription
> Claude, transcribe the audio file in ./audio_file
Claude will:
- Find the audio file
- Send it to Whisper MCP server
- Return the transcript as text
Transcribe Latest File
> Transcribe my latest recording
Claude will automatically find and transcribe the most recent audio file.
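The guide does not say how the latest file is chosen. One plausible approach, sketched here under that caveat, is to pick the audio file with the newest modification time. The extension set and the `latest_audio_file` name are assumptions.

```python
from pathlib import Path
from typing import Optional

# Assumed set of extensions; extend it to match your recorder.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def latest_audio_file(directory: Path) -> Optional[Path]:
    """Return the most recently modified audio file, or None if there is none."""
    candidates = [
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    ]
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)
```

Comparing suffixes in lowercase matters here because recorders such as the one in the tennis example write an uppercase .WAV extension.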
Transcribe and Analyze
> Transcribe match_recording.WAV and analyze the key points
Claude will:
- Transcribe the audio
- Analyze the content
- Provide insights and summary
Batch Processing
> Find all my recordings from this week and transcribe them
Claude will:
- List matching files
- Batch transcribe all files
- Save multiple text files
Features Added by Customization
These features are NOT in the original Whisper MCP server:
1. Automatic 10-Minute Chunking
- Prevents timeout errors on long files
- Splits audio into manageable segments
- Processes each chunk independently
2. MP3 Compression
- Reduces upload size (about 9.2MB per 10-minute MP3 chunk; the original 40-minute WAV is 230MB)
- Faster uploads to AdventureTube
- Saves bandwidth
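The size figures can be sanity-checked with a little arithmetic. This sketch uses the numbers from the tennis example in this guide (a 230MB WAV for 40 minutes, 9.2MB per 10-minute MP3 chunk) and assumes 1 MB means 1,000,000 bytes.

```python
def average_bitrate_kbps(size_mb: float, duration_min: float) -> float:
    """Average bitrate in kbit/s, taking 1 MB as 1,000,000 bytes."""
    bits = size_mb * 1_000_000 * 8
    return bits / (duration_min * 60) / 1000


wav_kbps = average_bitrate_kbps(230, 40)  # original 40-minute WAV
mp3_kbps = average_bitrate_kbps(9.2, 10)  # one 10-minute MP3 chunk
wav_chunk_mb = 230 / 4                    # the same 10 minutes kept as WAV

print(f"WAV: ~{wav_kbps:.0f} kbit/s, MP3: ~{mp3_kbps:.0f} kbit/s")
print(f"Per 10-minute chunk: ~{wav_chunk_mb:.1f} MB as WAV vs 9.2 MB as MP3")
```

The WAV works out to roughly 767 kbit/s and the MP3 to roughly 123 kbit/s, so each 10-minute chunk shrinks from about 57.5MB to 9.2MB. Those rates would be consistent with 16-bit mono audio at 48 kHz and an MP3 encode near 128 kbit/s, but the guide does not state the actual recording or encoder settings.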
3. Individual Chunk Transcripts
- Each chunk gets its own .txt file
- Useful for debugging failed chunks
- Allows partial transcription recovery
4. Combined Transcript with Segment Markers
- Merges all chunks into single file
- Adds [Segment N] markers
- Saved in text_file/ directory
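Merging with markers can be sketched in a few lines. The exact layout is an assumption: here each marker sits on its own line with a blank line between segments, and `combine_transcripts` is a hypothetical name.

```python
def combine_transcripts(chunk_texts: list[str]) -> str:
    """Join chunk transcripts, each preceded by a [Segment N] marker."""
    sections = [
        f"[Segment {number}]\n{text.strip()}"
        for number, text in enumerate(chunk_texts, start=1)
    ]
    return "\n\n".join(sections) + "\n" if sections else ""
```

Numbering from 1 keeps the markers aligned with the `_chunk_01`, `_chunk_02` file names used for the individual chunk transcripts.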
5. Graceful Failure Handling
- Continues processing even if some chunks fail
- Reports which chunks succeeded/failed
- Saves partial results
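One way to get this behavior is to catch errors per chunk and collect them instead of letting the first failure abort the run. The sketch below shows the pattern with a pluggable `transcribe` coroutine; the names and the report shape are illustrative and not taken from the server's code.

```python
import asyncio
from typing import Awaitable, Callable


async def transcribe_all(
    chunk_names: list[str], transcribe: Callable[[str], Awaitable[str]]
) -> dict:
    """Transcribe every chunk; record failures instead of aborting."""
    succeeded: dict[str, str] = {}
    failed: dict[str, str] = {}
    for name in chunk_names:
        try:
            succeeded[name] = await transcribe(name)
        except Exception as error:  # keep going so partial results survive
            failed[name] = f"{type(error).__name__}: {error}"
    return {"succeeded": succeeded, "failed": failed}
```

The `failed` mapping tells you which chunks to retry, while everything in `succeeded` can already be written to disk.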
6. Progress Tracking
- Real-time updates for each chunk
- Shows processing status
- Provides transparency
Real-World Example: Tennis Coaching
The Challenge
- Record 40-minute tennis match commentary
- Need text transcription for analysis
- Want to track scores, shots, and player observations
- Manual transcription takes hours
The Solution
- Record Match: Use phone/recorder to capture live commentary
- Drop Audio File: Save .WAV file to ./audio_file/ directory
- Ask Claude: “Transcribe match_recording.WAV”
- Get Results: Automated transcription in minutes
File Structure Created
audio_file/
├── DJI_32_20251027_190850.WAV (original 230MB file)
└── chunks_DJI_32_20251027_190850/
    ├── DJI_32_20251027_190850_chunk_01.mp3
    ├── DJI_32_20251027_190850_chunk_01.txt
    ├── DJI_32_20251027_190850_chunk_02.mp3
    ├── DJI_32_20251027_190850_chunk_02.txt
    └── ...
text_file/
└── DJI_32_20251027_190850.txt (combined transcript)
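The naming scheme in this tree can be reproduced from the source file name alone. This is a sketch with hypothetical helper names, inferred from the layout shown above and not copied from the server.

```python
from pathlib import Path


def chunk_paths(audio_path: Path, chunk_count: int) -> list[Path]:
    """MP3 chunk paths next to the source file, numbered from 01."""
    chunk_dir = audio_path.parent / f"chunks_{audio_path.stem}"
    return [
        chunk_dir / f"{audio_path.stem}_chunk_{index:02d}.mp3"
        for index in range(1, chunk_count + 1)
    ]


def combined_transcript_path(audio_path: Path, text_dir: Path) -> Path:
    """Combined transcript: same stem as the recording, .txt suffix."""
    return text_dir / f"{audio_path.stem}.txt"
```

Each chunk's transcript sits beside its MP3, so `chunk.with_suffix(".txt")` gives the matching .txt path.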
Tennis Data Successfully Captured
The transcription system captures:
- Score tracking: “15-0”, “love-15”, “40-30”, “deuce”
- Shot analysis: “backhand down the line”, “forehand winner”, “double fault”
- Player observations: “UTR 9.8”, “consistent serve”, “weak backhand”
- Match progression: Game-by-game commentary
- Rally sequences: Shot-by-shot descriptions
- Tactical notes: “Player A targeting opponent’s backhand”
Results
- Processing time: ~15 minutes for 40-minute audio
- Chunks processed: 4 chunks (4 × 10 min)
- Success rate: 100% (all chunks transcribed)
- Output quality: Accurate tennis terminology recognition
Troubleshooting
Issue #1: Server Won’t Connect
Symptoms:
/mcp
✘ failed · Failed to reconnect to whisper
Causes:
- Relative paths used instead of absolute paths
- MCP CLI not found in PATH
- Python dependencies not installed
Solutions:
# 1. Use absolute paths in ~/.claude.json
"args": ["dev", "/Users/chrislee/mcp-server-whisper/src/mcp_server_whisper/server.py"]
# 2. Verify MCP CLI
which mcp
# 3. Reinstall dependencies
uv sync
Issue #2: Custom Endpoint Not Working
Symptoms:
- Transcription fails
- Error: “Connection to AdventureTube failed”
Causes:
- USE_CUSTOM_WHISPER not set to true
- Incorrect endpoint URL
- Network connectivity issues
Solutions:
# 1. Check environment variables
echo $USE_CUSTOM_WHISPER
echo $CUSTOM_WHISPER_ENDPOINT
# 2. Verify endpoint is accessible
curl https://whisper.adventuretube.net/whisper
# 3. Test with small audio file first
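A plain GET against a transcription endpoint may return an HTTP error status (for example if it only accepts POST) even though the host is up. The sketch below treats any HTTP answer as "reachable" and only connection-level failures as "unreachable". `endpoint_reachable` is a hypothetical helper, and how this particular endpoint responds to GET is not documented here.

```python
import urllib.error
import urllib.request


def endpoint_reachable(url: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> bool:
    """True if a server answers at url, even with an HTTP error status."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, so the host is reachable even if the
        # status was an error such as 404 or 405.
        return True
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False
```

`HTTPError` has to be caught before `OSError` because it is a subclass of `URLError`, which is itself an `OSError`. The `opener` parameter is there so the function can be exercised without network access.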
Issue #3: Large Files Failing
Symptoms:
- Files >25MB fail to process
- “File too large” errors
Solution:
This should NOT happen with the custom endpoint setup. If it does:
- Verify USE_CUSTOM_WHISPER=true is set
- Check that transcribe_with_custom_whisper function is being called
- Look for errors in chunk processing logs
Summary: Working Whisper MCP Setup
Final Architecture
- ✅ Custom stdio MCP server (modified mcp-server-whisper)
- ✅ User-scoped installation (available across all projects)
- ✅ Absolute paths for all file references
- ✅ Direct AdventureTube Whisper API communication
- ✅ Automatic 10-minute chunking for large files
Required Files
- /absolute/path/to/mcp-server-whisper/ – Cloned repository
- ~/.claude.json – MCP configuration with absolute paths
- .env or environment variables – Custom endpoint configuration
Configuration Best Practices
| Scope | Path Type | Reason |
|---|---|---|
| User Scope | Absolute paths | Working directory varies by project |
| Project Scope | Relative paths OK | Working directory is predictable (project root) |
Verification Commands
# 1. Check MCP server status
/mcp
# Expected: ✔ whisper - connected
# 2. Test transcription
> Claude, transcribe the latest audio file
# 3. Verify output files
ls -la ./text_file/
ls -la ./audio_file/chunks_*/


