Whisper MCP Integration: Custom Audio Transcription Setup

Overview

This guide shows how to integrate the Whisper MCP server (mcp-server-whisper) with Claude Code to enable direct audio transcription. With this integration, you can:

Drop voice recordings → Claude transcribes → Get text files back
Create content faster by speaking instead of typing
Work with AI more efficiently using natural voice input

Real-world example: Tennis coaching – transcribe 40-minute match recordings for analysis

Why Customize This MCP Server?

The original Whisper MCP server uses OpenAI’s paid API for transcription. While this works well for small files, it has several limitations that make it expensive and restrictive for high-volume use:

Limitations of Default Setup

Cost and Privacy Concerns: OpenAI charges per minute of audio transcription, which adds up quickly when processing multiple long recordings. Additionally, all your audio data is sent to OpenAI’s servers, raising privacy concerns for sensitive content like business meetings or personal recordings.

File Size Restrictions: The standard setup has a 25MB file size limit, requiring you to manually compress or split larger files before transcription. This becomes tedious when working with longer recordings like 40-minute tennis matches or podcast episodes.

Limited Control: You’re completely dependent on OpenAI’s service availability and pricing changes. If their API goes down or prices increase, your workflow stops.

Benefits of Custom Endpoint

Drop-and-Process Simplicity: Just drop audio files into the ./audio_file directory and ask Claude to transcribe. The server handles everything automatically – chunking, compression, transcription, and combining results into a single text file.

Full Control & Customization: Since you own the infrastructure, you can modify the transcription process, add custom features, and never worry about third-party service availability or pricing changes.

This makes it ideal for high-volume use cases like transcribing multiple 40-minute tennis matches per week, podcast production, or any scenario where cost, privacy, and file size flexibility matter.

Prerequisites

Before starting, ensure you have:

✅ Node.js installed
✅ MCP CLI installed
✅ Python & uv installed
✅ Git (to clone repository)

Quick verification:

node --version
mcp --version
uv --version
git --version
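The same check can be scripted. This is a minimal sketch using only the Python standard library; the function name find_tools is hypothetical, and it only reports whether each executable is on PATH, not which version is installed.

```python
import shutil
from typing import Optional

REQUIRED_TOOLS = ["node", "mcp", "uv", "git"]


def find_tools(names: list[str]) -> dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, location in find_tools(REQUIRED_TOOLS).items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```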

Installation Steps

Step 1: Clone Repository

git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper

Step 2: Install Python Dependencies

# Install dependencies with uv
uv sync

# Verify installation
uv run pytest  # Optional: run tests

Step 3: Create Audio Directory

# Create directory for audio files
mkdir -p ./audio_file

# Verify directory was created
ls -la ./audio_file

Configuration Files

The Whisper MCP integration is configured through two files (the environment variables only need to be set in one of them):

1. ~/.claude.json (MCP Server Configuration)

This is the main Claude Code configuration file that defines the Whisper MCP server.

{
  "mcpServers": {
    "whisper": {
      "command": "mcp",
      "args": ["dev", "/absolute/path/to/mcp-server-whisper/src/mcp_server_whisper/server.py"],
      "env": {
        "USE_CUSTOM_WHISPER": "true",
        "CUSTOM_WHISPER_ENDPOINT": "https://whisper.adventuretube.net/whisper",
        "AUDIO_FILES_PATH": "./audio_file"
      }
    }
  }
}

Key Points:

command: Uses MCP CLI to run the Python server
args: Path to server.py (⚠️ must be absolute path for user-scoped servers)
env.USE_CUSTOM_WHISPER: Set to true to use AdventureTube endpoint
env.CUSTOM_WHISPER_ENDPOINT: Your custom Whisper API endpoint
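env.AUDIO_FILES_PATH: Path to the audio files directory (a relative path such as ./audio_file is typically resolved against the server's working directory)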
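Because the server path must be absolute, it can help to generate the entry rather than type it by hand. The sketch below is one way to do that, not part of the original setup: build_whisper_entry is a hypothetical helper, and it also makes AUDIO_FILES_PATH absolute, which differs from the ./audio_file value shown above.

```python
import json
from pathlib import Path


def build_whisper_entry(repo_dir: str, endpoint: str) -> dict:
    """Build the "whisper" entry with an absolute server.py path."""
    repo = Path(repo_dir).expanduser().resolve()
    server = repo / "src" / "mcp_server_whisper" / "server.py"
    return {
        "command": "mcp",
        "args": ["dev", str(server)],
        "env": {
            "USE_CUSTOM_WHISPER": "true",
            "CUSTOM_WHISPER_ENDPOINT": endpoint,
            # ASSUMPTION: absolute here too, so it does not depend on the
            # working directory the server happens to be started from.
            "AUDIO_FILES_PATH": str(repo / "audio_file"),
        },
    }


if __name__ == "__main__":
    entry = build_whisper_entry(
        "~/mcp-server-whisper", "https://whisper.adventuretube.net/whisper"
    )
    # Print the fragment to paste into ~/.claude.json.
    print(json.dumps({"mcpServers": {"whisper": entry}}, indent=2))
```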

2. .env File (Environment Variables)

Create a .env file in the project root:

AUDIO_FILES_PATH=./audio_file
USE_CUSTOM_WHISPER=true
CUSTOM_WHISPER_ENDPOINT=https://whisper.adventuretube.net/whisper

Note: Environment variables can be set in either the .env file or ~/.claude.json. Choose based on your preference.

Code Modifications Made

To use the AdventureTube Whisper endpoint instead of OpenAI, the server code was changed as follows:

1. Added httpx for HTTP Requests

import httpx

2. Added Configuration Variables

CUSTOM_WHISPER_ENDPOINT = os.getenv("CUSTOM_WHISPER_ENDPOINT", "https://whisper.adventuretube.net/whisper")
USE_CUSTOM_WHISPER = os.getenv("USE_CUSTOM_WHISPER", "false").lower() == "true"
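One consequence of the comparison above is worth spelling out: only the string "true" (in any letter case) turns the custom endpoint on. Values such as "1" or "yes" leave it off. The sketch below mirrors that check in a small function so the behaviour is easy to see; custom_whisper_enabled is a hypothetical name, not a function in the server.

```python
import os


def custom_whisper_enabled(environ: dict[str, str]) -> bool:
    """Mirror the check shown above: only "true" (any case) enables the endpoint."""
    return environ.get("USE_CUSTOM_WHISPER", "false").lower() == "true"


if __name__ == "__main__":
    print(custom_whisper_enabled(dict(os.environ)))
```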

3. Created Custom Whisper Function

async def transcribe_with_custom_whisper(file_path: Path) -> dict[str, Any]:
    """Transcribe audio using AdventureTube Whisper endpoint."""
    # Handles 10-minute chunking automatically
    # Sends to AdventureTube endpoint via HTTP POST
    # Returns combined transcript
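The function body is only outlined above. A minimal sketch of the HTTP step might look like the following. It is not the author's implementation: the multipart field name "file", the "text" key in the JSON response, the audio/mpeg content type, and the 600-second timeout are all assumptions about how the endpoint behaves.

```python
from pathlib import Path
from typing import Any


async def post_audio(file_path: Path, endpoint: str) -> dict[str, Any]:
    """POST one audio file as multipart/form-data and return the decoded JSON."""
    import httpx  # third-party; imported lazily so extract_text works without it

    # ASSUMPTION: the endpoint expects a multipart field named "file".
    files = {"file": (file_path.name, file_path.read_bytes(), "audio/mpeg")}
    async with httpx.AsyncClient(timeout=600.0) as client:
        response = await client.post(endpoint, files=files)
    response.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return response.json()


def extract_text(payload: dict[str, Any]) -> str:
    """Pull the transcript out of the response; the "text" key is an assumption."""
    return str(payload.get("text", "")).strip()
```

A call would look like asyncio.run(post_audio(Path("audio_file/sample.mp3"), CUSTOM_WHISPER_ENDPOINT)), followed by extract_text on the result.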

4. Added httpx Dependency

mcp = FastMCP("whisper", dependencies=["openai", "pydub", "aiofiles", "httpx"])

Architecture: How It Works

Component Flow

User → Claude Code → MCP CLI → Whisper MCP Server → AdventureTube Whisper API → Transcript

Key Components

1. Claude Code (MCP Client)

User interface where commands are issued
Calls Whisper tools using MCP protocol
Displays results to user

2. MCP CLI (Bridge)

Launches Whisper MCP server as stdio process
Manages server lifecycle
Routes tool calls between Claude Code and server

3. Whisper MCP Server (Translation Layer)

Speaks MCP protocol with Claude Code (stdio transport)
Processes audio files (chunking, compression)
Speaks HTTP with AdventureTube Whisper API
Returns formatted transcripts

4. AdventureTube Whisper API

Custom Whisper endpoint
Performs actual transcription
Returns JSON with transcript text

How a Request Flows

Startup: MCP CLI launches Whisper server via Python
Tool Call: User asks Claude to transcribe audio
MCP Protocol: Claude Code sends MCP tool request to server
File Processing: Server chunks large files into 10-minute segments
Compression: Converts chunks to MP3 format (9.2MB per chunk)
HTTP Request: Server sends POST to AdventureTube endpoint
Transcription: AdventureTube processes audio and returns text
Combine Results: Server combines all chunk transcripts
Save Output: Individual chunk .txt files + combined transcript
MCP Format: Server formats response for Claude Code
Display: User sees result in Claude Code interface

Using Whisper in Claude Code

Once installed and configured, you can use natural language to transcribe audio:

Basic Transcription

> Claude, transcribe the audio file in ./audio_file

Claude will:

Find the audio file
Send it to Whisper MCP server
Return the transcript as text

Transcribe Latest File

> Transcribe my latest recording

Claude will automatically find and transcribe the most recent audio file.
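"Most recent" here most plausibly means the newest modification time in the audio directory. The sketch below shows that selection under stated assumptions: the extension list is a guess at what counts as audio, and latest_audio_file is a hypothetical helper rather than a tool exposed by the server.

```python
from pathlib import Path
from typing import Optional

# ASSUMPTION: which extensions count as audio; adjust to what the server accepts.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def latest_audio_file(directory: Path) -> Optional[Path]:
    """Return the most recently modified audio file in directory, or None."""
    candidates = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```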

Transcribe and Analyze

> Transcribe match_recording.WAV and analyze the key points

Claude will:

Transcribe the audio
Analyze the content
Provide insights and summary

Batch Processing

> Find all my recordings from this week and transcribe them

Claude will:

List matching files
Batch transcribe all files
Save multiple text files

Features Added by Customization

These features are NOT in the original Whisper MCP server:

1. Automatic 10-Minute Chunking

Prevents timeout errors on long files
Splits audio into manageable segments
Processes each chunk independently
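The chunk boundaries themselves are simple arithmetic. As a sketch (the function name chunk_ranges is hypothetical), a recording can be divided into 10-minute windows measured in milliseconds, with a shorter final window when the length is not an exact multiple:

```python
CHUNK_MS = 10 * 60 * 1000  # 10 minutes, as described in this guide


def chunk_ranges(duration_ms: int, chunk_ms: int = CHUNK_MS) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) pairs covering the recording; the last may be shorter."""
    if duration_ms <= 0:
        return []
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]
```

A 40-minute recording yields four ranges. With pydub, which the server already lists as a dependency, each pair could be used to slice an AudioSegment as audio[start:end], since pydub indexes segments in milliseconds.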

2. MP3 Compression

Reduces upload size (about 9.2MB per MP3 chunk, compared with 230MB for the original WAV file)
Faster uploads to AdventureTube
Saves bandwidth
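The sizes quoted in this guide can be sanity-checked with a little arithmetic. The sketch below assumes a constant 128 kbps MP3 bitrate and a 48 kHz, mono, 16-bit source WAV; neither setting is stated in this guide, but both reproduce the reported figures closely.

```python
def mp3_bytes(seconds: float, kbps: int) -> int:
    """Approximate size of a constant-bitrate MP3 (audio data only, no tags)."""
    return int(seconds * kbps * 1000 / 8)


def wav_bytes(seconds: float, sample_rate: int, channels: int, bits: int) -> int:
    """Approximate size of uncompressed PCM audio (ignores the small WAV header)."""
    return int(seconds * sample_rate * channels * bits / 8)


if __name__ == "__main__":
    # ASSUMPTION: 128 kbps MP3 chunks of 10 minutes each.
    print(mp3_bytes(10 * 60, 128))            # bytes per 10-minute chunk
    # ASSUMPTION: 48 kHz, mono, 16-bit source recording of 40 minutes.
    print(wav_bytes(40 * 60, 48_000, 1, 16))  # bytes for the whole WAV
```

Under those assumptions a 10-minute chunk is 9,600,000 bytes (roughly 9.2 MiB) and the 40-minute WAV is 230,400,000 bytes, in line with the 9.2MB and 230MB figures mentioned in this guide.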

3. Individual Chunk Transcripts

Each chunk gets its own .txt file
Useful for debugging failed chunks
Allows partial transcription recovery

4. Combined Transcript with Segment Markers

Merges all chunks into single file
Adds [Segment N] markers
Saved in text_file/ directory
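Merging is straightforward once each chunk has text. The sketch below is one plausible way to produce the markers; the exact layout (marker on its own line, blank line between segments) is an assumption, and combine_transcripts is a hypothetical name.

```python
def combine_transcripts(chunk_texts: list[str]) -> str:
    """Join chunk transcripts, prefixing each with a [Segment N] marker (1-based)."""
    sections = [
        f"[Segment {index}]\n{text.strip()}"
        for index, text in enumerate(chunk_texts, start=1)
    ]
    return "\n\n".join(sections)
```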

5. Graceful Failure Handling

Continues processing even if some chunks fail
Reports which chunks succeeded/failed
Saves partial results
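The pattern behind this is to catch errors per chunk instead of letting one failure abort the whole run. A minimal sketch, assuming a transcribe coroutine is supplied by the caller (transcribe_all and its return shape are hypothetical, not the server's actual API):

```python
from typing import Awaitable, Callable


async def transcribe_all(
    chunks: list[str],
    transcribe: Callable[[str], Awaitable[str]],
) -> tuple[dict[str, str], dict[str, str]]:
    """Transcribe chunks one by one; return (transcripts, errors) keyed by chunk name."""
    transcripts: dict[str, str] = {}
    errors: dict[str, str] = {}
    for chunk in chunks:
        try:
            transcripts[chunk] = await transcribe(chunk)
        except Exception as exc:  # keep going: one bad chunk should not lose the rest
            errors[chunk] = f"{type(exc).__name__}: {exc}"
    return transcripts, errors
```

The two dictionaries make it easy to report which chunks succeeded or failed and to save whatever text was recovered.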

6. Progress Tracking

Real-time updates for each chunk
Shows processing status
Provides transparency

Real-World Example: Tennis Coaching

The Challenge

Record 40-minute tennis match commentary
Need text transcription for analysis
Want to track scores, shots, and player observations
Manual transcription takes hours

The Solution

Record Match: Use phone/recorder to capture live commentary
Drop Audio File: Save .WAV file to ./audio_file/ directory
Ask Claude: “Transcribe match_recording.WAV”
Get Results: Automated transcription in minutes

File Structure Created

audio_file/
├── DJI_32_20251027_190850.WAV          (original 230MB file)
└── chunks_DJI_32_20251027_190850/
    ├── DJI_32_20251027_190850_chunk_01.mp3
    ├── DJI_32_20251027_190850_chunk_01.txt
    ├── DJI_32_20251027_190850_chunk_02.mp3
    ├── DJI_32_20251027_190850_chunk_02.txt
    └── ...

text_file/
└── DJI_32_20251027_190850.txt          (combined transcript)
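The naming pattern in this tree can be expressed with pathlib. The helpers below are a sketch that reproduces the layout shown above; their names are hypothetical, and the location of text_file/ is passed in explicitly because this guide does not say where that directory lives relative to audio_file/.

```python
from pathlib import Path


def chunk_dir(audio_path: Path) -> Path:
    """Directory holding the chunks for one recording: chunks_<stem>/."""
    return audio_path.parent / f"chunks_{audio_path.stem}"


def chunk_paths(audio_path: Path, index: int) -> tuple[Path, Path]:
    """Return the (.mp3, .txt) paths for chunk number index (1-based, zero-padded)."""
    base = f"{audio_path.stem}_chunk_{index:02d}"
    folder = chunk_dir(audio_path)
    return folder / f"{base}.mp3", folder / f"{base}.txt"


def combined_path(audio_path: Path, text_dir: Path) -> Path:
    """Path of the combined transcript inside text_dir."""
    return text_dir / f"{audio_path.stem}.txt"
```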

Tennis Data Successfully Captured

The transcription system captures:

Score tracking: “15-0”, “love-15”, “40-30”, “deuce”
Shot analysis: “backhand down the line”, “forehand winner”, “double fault”
Player observations: “UTR 9.8”, “consistent serve”, “weak backhand”
Match progression: Game-by-game commentary
Rally sequences: Shot-by-shot descriptions
Tactical notes: “Player A targeting opponent’s backhand”

Results

Processing time: ~15 minutes for 40-minute audio
Chunks processed: 4 chunks (4 × 10 min)
Success rate: 100% (all chunks transcribed)
Output quality: Accurate tennis terminology recognition

Troubleshooting

Issue #1: Server Won’t Connect

Symptoms:

/mcp
✘ failed · Failed to reconnect to whisper

Causes:

Relative paths used instead of absolute paths
MCP CLI not found in PATH
Python dependencies not installed

Solutions:

# 1. Use absolute paths in ~/.claude.json (keep "dev" as the first argument, as in the configuration above)
"args": ["dev", "/Users/chrislee/mcp-server-whisper/src/mcp_server_whisper/server.py"]

# 2. Verify MCP CLI
which mcp

# 3. Reinstall dependencies
uv sync

Issue #2: Custom Endpoint Not Working

Symptoms:

Transcription fails
Error: “Connection to AdventureTube failed”

Causes:

USE_CUSTOM_WHISPER not set to true
Incorrect endpoint URL
Network connectivity issues

Solutions:

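# 1. Check environment variables
#    (this only shows values exported in your current shell; values set only
#    in ~/.claude.json or .env will not appear here, so check those files too)
echo $USE_CUSTOM_WHISPER
echo $CUSTOM_WHISPER_ENDPOINT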

# 2. Verify endpoint is accessible
curl https://whisper.adventuretube.net/whisper

# 3. Test with small audio file first
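If no short recording is at hand, a tiny test file can be generated with Python's standard wave module. This is only a sketch: write_silent_wav is a hypothetical helper, and because the file contains silence it is useful for checking that the upload round trip works, not for judging transcription quality.

```python
import wave
from pathlib import Path


def write_silent_wav(path: Path, seconds: float = 1.0, sample_rate: int = 16_000) -> Path:
    """Write a mono 16-bit PCM WAV containing only silence."""
    frame_count = int(seconds * sample_rate)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frame_count)
    return path


if __name__ == "__main__":
    print(write_silent_wav(Path("silence_test.wav")))
```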

Issue #3: Large Files Failing

Symptoms:

Files >25MB fail to process
“File too large” errors

Solution:
This should NOT happen with the custom endpoint setup. If it does:

Verify USE_CUSTOM_WHISPER=true is set
Check that transcribe_with_custom_whisper function is being called
Look for errors in chunk processing logs

Summary: Working Whisper MCP Setup

Final Architecture

✅ Custom stdio MCP server (modified mcp-server-whisper)
✅ User-scoped installation (available across all projects)
✅ Absolute paths for all file references
✅ Direct AdventureTube Whisper API communication
✅ Automatic 10-minute chunking for large files

Required Files

/absolute/path/to/mcp-server-whisper/ – Cloned repository
~/.claude.json – MCP configuration with absolute paths
.env or environment variables – Custom endpoint configuration

Configuration Best Practices

Scope          Path Type          Reason
User Scope     Absolute paths     Working directory varies by project
Project Scope  Relative paths OK  Working directory is predictable (project root)
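The reason behind this table is that a relative path means something different depending on where the process was started. The sketch below illustrates the effect with a hypothetical helper, resolve_setting, and made-up project directories:

```python
from pathlib import Path


def resolve_setting(raw_path: str, working_dir: Path) -> Path:
    """Show where a configured path ends up for a given working directory."""
    path = Path(raw_path)
    return path if path.is_absolute() else working_dir / path


if __name__ == "__main__":
    # Hypothetical project directories, for illustration only.
    for cwd in (Path("/projects/tennis"), Path("/projects/podcast")):
        print(cwd, "->", resolve_setting("./audio_file", cwd))
```

The same relative setting lands in a different directory for each project, while an absolute path stays put.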

Verification Commands

# 1. Check MCP server status (type this inside Claude Code)
/mcp
# Expected: ✔ whisper - connected

# 2. Test transcription (a prompt inside Claude Code)
> Claude, transcribe the latest audio file

# 3. Verify output files (run in a terminal)
ls -la ./text_file/
ls -la ./audio_file/chunks_*/
