Whisper MCP Integration: Custom Audio Transcription Setup

Overview

This guide shows how to integrate the Whisper MCP server (mcp-server-whisper) with Claude Code to enable direct audio transcription. With this integration, you can:

  • Drop voice recordings → Claude transcribes → Get text files back
  • Create content faster by speaking instead of typing
  • Work with AI more efficiently using natural voice input

Real-world example: Tennis coaching – transcribe 40-minute match recordings for analysis


Why Customize This MCP Server?

The original Whisper MCP server uses OpenAI’s paid API for transcription. While this works well for small files, it has several limitations that make it expensive and restrictive for high-volume use:

Limitations of Default Setup

Cost and Privacy Concerns: OpenAI charges per minute of audio transcription, which adds up quickly when processing multiple long recordings. Additionally, all your audio data is sent to OpenAI’s servers, raising privacy concerns for sensitive content like business meetings or personal recordings.

File Size Restrictions: The standard setup has a 25MB file size limit, requiring you to manually compress or split larger files before transcription. This becomes tedious when working with longer recordings like 40-minute tennis matches or podcast episodes.

Limited Control: You’re completely dependent on OpenAI’s service availability and pricing. If their API goes down, your workflow stops; if prices rise, so do your costs.

Benefits of Custom Endpoint

Drop-and-Process Simplicity: Just drop audio files into the ./audio_file directory and ask Claude to transcribe. The server handles everything automatically – chunking, compression, transcription, and combining results into a single text file.

Full Control & Customization: Since you own the infrastructure, you can modify the transcription process, add custom features, and never worry about third-party service availability or pricing changes.

This makes it ideal for high-volume use cases like transcribing multiple 40-minute tennis matches per week, podcast production, or any scenario where cost, privacy, and file size flexibility matter.


Prerequisites

Before starting, ensure you have:

  • Node.js installed
  • MCP CLI installed
  • Python & uv installed
  • Git (to clone repository)

Quick verification:

node --version
python3 --version
mcp version
uv --version
git --version

Installation Steps

Step 1: Clone Repository

git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper

Step 2: Install Python Dependencies

# Install dependencies with uv
uv sync

# Verify installation
uv run pytest  # Optional: run tests

Step 3: Create Audio Directory

# Create directory for audio files
mkdir -p ./audio_file

# Verify directory was created
ls -la ./audio_file

Configuration Files

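The Whisper MCP integration is configured in up to two files: ~/.claude.json, which registers the server, and an optional .env file for the environment variables.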

1. ~/.claude.json (MCP Server Configuration)

This is the main Claude Code configuration file that defines the Whisper MCP server.

{
  "mcpServers": {
    "whisper": {
      "command": "mcp",
      "args": ["dev", "/absolute/path/to/mcp-server-whisper/src/mcp_server_whisper/server.py"],
      "env": {
        "USE_CUSTOM_WHISPER": "true",
        "CUSTOM_WHISPER_ENDPOINT": "https://whisper.adventuretube.net/whisper",
        "AUDIO_FILES_PATH": "/absolute/path/to/mcp-server-whisper/audio_file"
      }
    }
  }
}

Key Points:

  • command: Uses MCP CLI to run the Python server
  • args: Path to server.py (⚠️ must be an absolute path for user-scoped servers)
  • env.USE_CUSTOM_WHISPER: Set to "true" to use the custom AdventureTube endpoint instead of OpenAI
  • env.CUSTOM_WHISPER_ENDPOINT: Your custom Whisper API endpoint
  • env.AUDIO_FILES_PATH: Path to audio files directory
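The absolute-path requirement is easy to get wrong, so it can help to check the config before restarting Claude Code. Here is a minimal sketch, assuming the JSON layout shown above and POSIX-style paths. The helper name relative_script_args is made up for this example and is not part of any MCP tooling.

```python
import json
from pathlib import PurePosixPath


def relative_script_args(config: dict, server: str = "whisper") -> list[str]:
    """Return any .py entries in the server's args that are not absolute paths."""
    entry = config["mcpServers"][server]
    return [
        arg
        for arg in entry.get("args", [])
        if arg.endswith(".py") and not PurePosixPath(arg).is_absolute()
    ]


# Sample config with a relative script path (the mistake this check looks for).
sample = json.loads("""
{
  "mcpServers": {
    "whisper": {
      "command": "mcp",
      "args": ["dev", "src/mcp_server_whisper/server.py"]
    }
  }
}
""")

print(relative_script_args(sample))
```

An empty list means every script path in args is absolute. To check your real file, load ~/.claude.json with json.loads and pass the result to the same function.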

2. .env File (Environment Variables)

Create a .env file in the project root:

AUDIO_FILES_PATH=./audio_file
USE_CUSTOM_WHISPER=true
CUSTOM_WHISPER_ENDPOINT=https://whisper.adventuretube.net/whisper

Note: Environment variables can be set in either .env file OR ~/.claude.json. Choose based on your preference.


Code Modifications Made

The following changes route transcription to the AdventureTube Whisper endpoint instead of OpenAI:

1. Added httpx for HTTP Requests

import httpx

2. Added Configuration Variables

CUSTOM_WHISPER_ENDPOINT = os.getenv("CUSTOM_WHISPER_ENDPOINT", "https://whisper.adventuretube.net/whisper")
USE_CUSTOM_WHISPER = os.getenv("USE_CUSTOM_WHISPER", "false").lower() == "true"
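One detail of the flag parsing above is worth spelling out: only the string "true" (in any letter case) turns the custom endpoint on. Values like "1" or "yes" leave it off. The sketch below wraps the same expression in a function (the name custom_whisper_enabled is hypothetical) so the behavior can be tried without touching the real environment.

```python
import os
from typing import Mapping


def custom_whisper_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Same test as the config line above, with the environment passed in."""
    return environ.get("USE_CUSTOM_WHISPER", "false").lower() == "true"


for value in ("true", "TRUE", "1", "yes", " true"):
    print(repr(value), custom_whisper_enabled({"USE_CUSTOM_WHISPER": value}))
```

If the server keeps falling back to OpenAI, a stray space or a value such as "1" in .env or ~/.claude.json is a likely culprit.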

3. Created Custom Whisper Function

async def transcribe_with_custom_whisper(file_path: Path) -> dict[str, Any]:
    """Transcribe audio using AdventureTube Whisper endpoint."""
    # Handles 10-minute chunking automatically
    # Sends to AdventureTube endpoint via HTTP POST
    # Returns combined transcript
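The stub above only outlines the function, so here is a rough sketch of how the upload-and-combine part could look. It is not the actual implementation. The multipart field name "file", the JSON key "text", the 300-second timeout, and the helper names post_chunk and transcribe_chunks are all assumptions for illustration. The sender is passed in as a parameter so the loop can be tried without a network connection.

```python
import asyncio
from pathlib import Path
from typing import Any, Awaitable, Callable

ChunkSender = Callable[[Path], Awaitable[str]]


async def post_chunk(chunk: Path, endpoint: str) -> str:
    """Upload one chunk via HTTP POST and return its transcript text.

    Assumed, not confirmed: field name "file" and response key "text".
    """
    import httpx  # imported here so the rest of the sketch runs without httpx

    async with httpx.AsyncClient(timeout=300.0) as client:
        files = {"file": (chunk.name, chunk.read_bytes(), "audio/mpeg")}
        response = await client.post(endpoint, files=files)
        response.raise_for_status()
        return response.json()["text"]


async def transcribe_chunks(chunks: list[Path], send: ChunkSender) -> dict[str, Any]:
    """Send chunks in order and join their transcripts into one result."""
    texts = [await send(chunk) for chunk in chunks]
    return {"text": "\n".join(texts), "chunks": len(texts)}


async def fake_send(chunk: Path) -> str:
    """Offline stand-in for post_chunk, used only for this demonstration."""
    return f"text of {chunk.name}"


result = asyncio.run(transcribe_chunks([Path("c1.mp3"), Path("c2.mp3")], fake_send))
print(result)
```

For real use, functools.partial(post_chunk, endpoint=CUSTOM_WHISPER_ENDPOINT) would take the place of fake_send.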

4. Added httpx Dependency

mcp = FastMCP("whisper", dependencies=["openai", "pydub", "aiofiles", "httpx"])

Architecture: How It Works

Component Flow

User → Claude Code → MCP CLI → Whisper MCP Server → AdventureTube Whisper API → Transcript

Key Components

1. Claude Code (MCP Client)

  • User interface where commands are issued
  • Calls Whisper tools using MCP protocol
  • Displays results to user

2. MCP CLI (Bridge)

  • Launches Whisper MCP server as stdio process
  • Manages server lifecycle
  • Routes tool calls between Claude Code and server

3. Whisper MCP Server (Translation Layer)

  • Speaks MCP protocol with Claude Code (stdio transport)
  • Processes audio files (chunking, compression)
  • Speaks HTTP with AdventureTube Whisper API
  • Returns formatted transcripts

4. AdventureTube Whisper API

  • Custom Whisper endpoint
  • Performs actual transcription
  • Returns JSON with transcript text

How a Request Flows

  1. Startup: MCP CLI launches Whisper server via Python
  2. Tool Call: User asks Claude to transcribe audio
  3. MCP Protocol: Claude Code sends MCP tool request to server
  4. File Processing: Server chunks large files into 10-minute segments
  5. Compression: Converts chunks to MP3 format (9.2MB per chunk)
  6. HTTP Request: Server sends POST to AdventureTube endpoint
  7. Transcription: AdventureTube processes audio and returns text
  8. Combine Results: Server combines all chunk transcripts
  9. Save Output: Individual chunk .txt files + combined transcript
  10. MCP Format: Server formats response for Claude Code
  11. Display: User sees result in Claude Code interface

Using Whisper in Claude Code

Once installed and configured, you can use natural language to transcribe audio:

Basic Transcription

> Claude, transcribe the audio file in ./audio_file

Claude will:

  1. Find the audio file
  2. Send it to Whisper MCP server
  3. Return the transcript as text

Transcribe Latest File

> Transcribe my latest recording

Claude will automatically find and transcribe the most recent audio file.
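One plausible way to define "most recent" is by file modification time. The sketch below assumes that definition and an illustrative set of audio extensions. How the server actually picks the file may differ, and the name latest_audio_file is made up for this example.

```python
from pathlib import Path
from typing import Optional

# Assumed extension list for illustration; adjust to the formats you record in.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a"}


def latest_audio_file(directory: Path) -> Optional[Path]:
    """Return the most recently modified audio file, or None if there is none."""
    candidates = [
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    ]
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)
```

Calling latest_audio_file(Path("./audio_file")) returns a Path you can name explicitly in your prompt if Claude picks the wrong recording.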

Transcribe and Analyze

> Transcribe match_recording.WAV and analyze the key points

Claude will:

  1. Transcribe the audio
  2. Analyze the content
  3. Provide insights and summary

Batch Processing

> Find all my recordings from this week and transcribe them

Claude will:

  1. List matching files
  2. Batch transcribe all files
  3. Save multiple text files

Features Added by Customization

These features are NOT in the original Whisper MCP server:

1. Automatic 10-Minute Chunking

  • Prevents timeout errors on long files
  • Splits audio into manageable segments
  • Processes each chunk independently
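To make the chunking concrete, here is a small sketch that works out the 10-minute segment boundaries for a recording of a given length. Boundaries are in milliseconds because that is how pydub's AudioSegment is sliced. The function name plan_chunks is hypothetical, and the real server may compute boundaries differently.

```python
TEN_MINUTES_MS = 10 * 60 * 1000


def plan_chunks(duration_ms: int, chunk_ms: int = TEN_MINUTES_MS) -> list[tuple[int, int]]:
    """Return (start, end) offsets in milliseconds; the last chunk may be shorter."""
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


for minutes in (40, 45):
    chunks = plan_chunks(minutes * 60 * 1000)
    print(f"{minutes} min -> {len(chunks)} chunks, last ends at {chunks[-1][1]} ms")
```

A 40-minute file splits into four full chunks. A 45-minute file gets a fifth chunk that is only five minutes long.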

2. MP3 Compression

  • Reduces file size (about 9.2MB per 10-minute MP3 chunk, versus 230MB for the original 40-minute WAV)
  • Faster uploads to AdventureTube
  • Saves bandwidth
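A quick back-of-the-envelope check shows what those sizes imply. It uses only numbers from this guide (9.2MB per 10-minute MP3 chunk, and the 230MB, 40-minute WAV from the tennis example) and assumes 1 MB = 1,000,000 bytes. The bitrate is derived from those figures, not a documented encoder setting.

```python
CHUNK_SECONDS = 10 * 60    # chunk length used in this guide
MP3_CHUNK_MB = 9.2         # per-chunk MP3 size reported in this guide
WAV_TOTAL_MB = 230.0       # original WAV size from the tennis example
WAV_TOTAL_MINUTES = 40     # original recording length


def bitrate_kbps(size_mb: float, seconds: float) -> float:
    """Average bitrate implied by a file size, taking 1 MB as 1,000,000 bytes."""
    return size_mb * 1_000_000 * 8 / seconds / 1000


wav_chunk_mb = WAV_TOTAL_MB * 10 / WAV_TOTAL_MINUTES   # WAV data per 10-minute chunk
mp3_kbps = bitrate_kbps(MP3_CHUNK_MB, CHUNK_SECONDS)
reduction = wav_chunk_mb / MP3_CHUNK_MB

print(f"{wav_chunk_mb:.1f} MB WAV per chunk -> {MP3_CHUNK_MB} MB MP3")
print(f"about {reduction:.2f}x smaller, roughly {mp3_kbps:.0f} kbps")
```

So each 10-minute slice shrinks from about 57.5MB of WAV data to 9.2MB, a bit more than a sixfold reduction per upload.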

3. Individual Chunk Transcripts

  • Each chunk gets its own .txt file
  • Useful for debugging failed chunks
  • Allows partial transcription recovery

4. Combined Transcript with Segment Markers

  • Merges all chunks into single file
  • Adds [Segment N] markers
  • Saved in text_file/ directory
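The merge step can be pictured with a few lines of Python. This is an illustrative sketch. The [Segment N] label comes from this guide, but the exact spacing between segments and the function name combine_transcripts are assumptions.

```python
def combine_transcripts(chunk_texts: list[str]) -> str:
    """Join chunk transcripts in order, labelling each with a [Segment N] marker."""
    parts = [
        f"[Segment {number}]\n{text.strip()}"
        for number, text in enumerate(chunk_texts, start=1)
    ]
    return "\n\n".join(parts) + "\n"


print(combine_transcripts(["First ten minutes. ", "Second ten minutes."]))
```

Because the markers are numbered in chunk order, a phrase in the combined file can be traced back to the matching chunk .txt and .mp3.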

5. Graceful Failure Handling

  • Continues processing even if some chunks fail
  • Reports which chunks succeeded/failed
  • Saves partial results
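The idea behind this kind of failure handling is to catch errors per chunk rather than around the whole job. The sketch below shows the pattern with a stand-in transcriber that fails on one chunk. All names are hypothetical, and the real server's reporting format may differ.

```python
import asyncio
from typing import Awaitable, Callable


async def transcribe_tolerantly(
    chunk_names: list[str],
    send: Callable[[str], Awaitable[str]],
) -> tuple[dict[str, str], dict[str, str]]:
    """Transcribe chunks one by one, collecting failures instead of aborting."""
    succeeded: dict[str, str] = {}
    failed: dict[str, str] = {}
    for name in chunk_names:
        try:
            succeeded[name] = await send(name)
        except Exception as exc:  # keep going; record why this chunk failed
            failed[name] = str(exc)
    return succeeded, failed


async def flaky_send(name: str) -> str:
    """Stand-in transcriber that fails on one chunk to exercise the error path."""
    if name == "chunk_02.mp3":
        raise RuntimeError("simulated timeout")
    return f"text of {name}"


ok, bad = asyncio.run(
    transcribe_tolerantly(["chunk_01.mp3", "chunk_02.mp3", "chunk_03.mp3"], flaky_send)
)
print(f"succeeded: {sorted(ok)}; failed: {bad}")
```

The two dictionaries map directly onto the behavior described above: the first holds the partial results worth saving, the second says which chunks to retry.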

6. Progress Tracking

  • Real-time updates for each chunk
  • Shows processing status
  • Provides transparency

Real-World Example: Tennis Coaching

The Challenge

  • Record 40-minute tennis match commentary
  • Need text transcription for analysis
  • Want to track scores, shots, and player observations
  • Manual transcription takes hours

The Solution

  1. Record Match: Use phone/recorder to capture live commentary
  2. Drop Audio File: Save .WAV file to ./audio_file/ directory
  3. Ask Claude: “Transcribe match_recording.WAV”
  4. Get Results: Automated transcription in minutes

File Structure Created

audio_file/
├── DJI_32_20251027_190850.WAV          (original 230MB file)
└── chunks_DJI_32_20251027_190850/
    ├── DJI_32_20251027_190850_chunk_01.mp3
    ├── DJI_32_20251027_190850_chunk_01.txt
    ├── DJI_32_20251027_190850_chunk_02.mp3
    ├── DJI_32_20251027_190850_chunk_02.txt
    └── ...

text_file/
└── DJI_32_20251027_190850.txt          (combined transcript)
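The naming pattern in that tree can be written down as a small helper, which is handy if you want to script against the output files. It is a sketch inferred from the listing above. The function and class names are made up, and the two-digit chunk numbering is assumed from the chunk_01 and chunk_02 examples.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class OutputLayout:
    chunk_audio: list[Path]   # compressed MP3 chunks
    chunk_text: list[Path]    # per-chunk transcripts
    combined: Path            # merged transcript


def expected_outputs(audio: Path, n_chunks: int, text_dir: Path) -> OutputLayout:
    """Derive output paths from the source file name, following the tree above."""
    chunk_dir = audio.parent / f"chunks_{audio.stem}"
    chunk_audio = [
        chunk_dir / f"{audio.stem}_chunk_{number:02d}.mp3"
        for number in range(1, n_chunks + 1)
    ]
    return OutputLayout(
        chunk_audio=chunk_audio,
        chunk_text=[path.with_suffix(".txt") for path in chunk_audio],
        combined=text_dir / f"{audio.stem}.txt",
    )


layout = expected_outputs(Path("audio_file/DJI_32_20251027_190850.WAV"), 4, Path("text_file"))
print(layout.chunk_audio[0])
print(layout.combined)
```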

Tennis Data Successfully Captured

The transcription system captures:

  • Score tracking: “15-0”, “love-15”, “40-30”, “deuce”
  • Shot analysis: “backhand down the line”, “forehand winner”, “double fault”
  • Player observations: “UTR 9.8”, “consistent serve”, “weak backhand”
  • Match progression: Game-by-game commentary
  • Rally sequences: Shot-by-shot descriptions
  • Tactical notes: “Player A targeting opponent’s backhand”

Results

  • Processing time: ~15 minutes for 40-minute audio
  • Chunks processed: 4 chunks (4 × 10 min)
  • Success rate: 100% (all chunks transcribed)
  • Output quality: Accurate tennis terminology recognition

Troubleshooting

Issue #1: Server Won’t Connect

Symptoms:

/mcp
✘ failed · Failed to reconnect to whisper

Causes:

  1. Relative paths used instead of absolute paths
  2. MCP CLI not found in PATH
  3. Python dependencies not installed

Solutions:

# 1. Use absolute paths in ~/.claude.json
"args": ["dev", "/Users/chrislee/mcp-server-whisper/src/mcp_server_whisper/server.py"]

# 2. Verify MCP CLI
which mcp

# 3. Reinstall dependencies
uv sync

Issue #2: Custom Endpoint Not Working

Symptoms:

  • Transcription fails
  • Error: “Connection to AdventureTube failed”

Causes:

  1. USE_CUSTOM_WHISPER not set to true
  2. Incorrect endpoint URL
  3. Network connectivity issues

Solutions:

# 1. Check environment variables (they are set in the config files, not in your shell)
grep -A 4 '"env"' ~/.claude.json
cat .env

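# 2. Verify endpoint is reachable (-i prints the status line; a plain GET may return an error status, but any HTTP response shows the host is up)
curl -i https://whisper.adventuretube.net/whisper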

# 3. Test with small audio file first

Issue #3: Large Files Failing

Symptoms:

  • Files >25MB fail to process
  • “File too large” errors

Solution:
This should NOT happen with the custom endpoint setup. If it does:

  1. Verify USE_CUSTOM_WHISPER=true is set
  2. Check that transcribe_with_custom_whisper function is being called
  3. Look for errors in chunk processing logs

Summary: Working Whisper MCP Setup

Final Architecture

  • ✅ Custom stdio MCP server (modified mcp-server-whisper)
  • ✅ User-scoped installation (available across all projects)
  • ✅ Absolute paths for all file references
  • ✅ Direct AdventureTube Whisper API communication
  • ✅ Automatic 10-minute chunking for large files

Required Files

  1. /absolute/path/to/mcp-server-whisper/ – Cloned repository
  2. ~/.claude.json – MCP configuration with absolute paths
  3. .env or environment variables – Custom endpoint configuration

Configuration Best Practices

Scope         | Path Type         | Reason
User Scope    | Absolute paths    | Working directory varies by project
Project Scope | Relative paths OK | Working directory is predictable (project root)
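The reasoning behind this rule is easy to see in code. A relative path is resolved against whatever directory the server process starts in, so the same setting points to different places in different projects. The sketch below uses POSIX-style paths and made-up directory names purely for illustration.

```python
from pathlib import PurePosixPath


def effective_path(working_dir: str, configured: str) -> str:
    """Resolve a configured path the way a process started in working_dir would."""
    # Joining onto an absolute path discards the left side, so absolute
    # values come through unchanged while relative ones follow working_dir.
    return str(PurePosixPath(working_dir) / configured)


for cwd in ("/home/me/project-a", "/home/me/project-b"):
    print(cwd, "->", effective_path(cwd, "./audio_file"))
    print(cwd, "->", effective_path(cwd, "/home/me/mcp-server-whisper/audio_file"))
```

With the relative setting, each project gets its own audio_file directory. With the absolute one, both resolve to the same place.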

Verification Commands

# 1. Check MCP server status
/mcp
# Expected: ✔ whisper - connected

# 2. Test transcription
> Claude, transcribe the latest audio file

# 3. Verify output files
ls -la ./text_file/
ls -la ./audio_file/chunks_*/
