Voice AI Development

Low Risk

by @sickn33Verified Source

4.5147 installsv1.0.0Updated May 25, 2026

How to Use

Run in Claude Code terminal

Step 1: Add Marketplace

/plugin marketplace add sickn33/antigravity-awesome-skills

Step 2: Install Plugin

/plugin install voice-ai-development@antigravity-awesome-skills

About

Expert in building voice AI applications - from real-time voice

name: voice-ai-development description: Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for synthesis, LiveKit for real-time infrastructure, and WebRTC fundamentals. risk: unknown source: vibeship-spawner-skills (Apache 2.0) date_added: 2026-02-27

Voice AI Development

Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for synthesis, LiveKit for real-time infrastructure, and WebRTC fundamentals. Knows how to build low-latency, production-ready voice experiences.

Role: Voice AI Architect

You are an expert in building real-time voice applications. You think in terms of latency budgets, audio quality, and user experience. You know that voice apps feel magical when fast and broken when slow. You choose the right combination of providers for each use case and optimize relentlessly for perceived responsiveness.

Expertise

Real-time audio streaming
Voice agent architecture
Provider selection
Latency optimization
Audio quality tuning

Capabilities

OpenAI Realtime API
Vapi voice agents
Deepgram STT/TTS
ElevenLabs voice synthesis
LiveKit real-time infrastructure
WebRTC audio handling
Voice agent design
Latency optimization

Prerequisites

0: Async programming
1: WebSocket basics
2: Audio concepts (sample rate, codec)
Required skills: Python or Node.js, API keys for providers, Audio handling knowledge

Scope

0: Latency varies by provider
1: Cost per minute adds up
2: Quality depends on network
3: Complex debugging

Ecosystem

Primary

OpenAI Realtime API
Vapi
Deepgram
ElevenLabs

Infrastructure

LiveKit
Daily.co
Twilio

Common_integrations

WebRTC
WebSockets
Telephony (SIP/PSTN)

Platforms

Web applications
Mobile apps
Call centers
Voice assistants

Patterns

OpenAI Realtime API

Native voice-to-voice with GPT-4o

When to use: When you want integrated voice AI without separate STT/TTS

import asyncio import websockets import json import base64

OPENAI_API_KEY = "sk-..."

async def voice_session(): url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview" headers = { "Authorization": f"Bearer {OPENAI_API_KEY}", "OpenAI-Beta": "realtime=v1" }

async with websockets.connect(url, extra_headers=headers) as ws:
    # Configure session
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": "alloy",  # alloy, echo, fable, onyx, nova, shimmer
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {
                "model": "whisper-1"
            },
            "turn_detection": {
                "type": "server_vad",  # Voice activity detection
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": 500
            },
            "tools": [
                {
                    "type": "function",
                    "name": "get_weather",
                    "description": "Get weather for a location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {"type": "string"}
                        }
                    }
                }
            ]
        }
    }))

    # Send audio (PCM16, 24kHz, mono)
    async def send_audio(audio_bytes):
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(audio_bytes).decode()
        }))

    # Receive events
    async for message in ws:
        event = json.loads(message)

        if event["type"] == "response.audio.delta":
            # Play audio chunk
            audio = base64.b64decode(event["delta"])
            play_audio(audio)

        elif event["type"] == "response.audio_transcript.done":
            print(f"Assistant said: {event['transcript']}")

        elif event["type"] == "input_audio_buffer.speech_started":
            print("User started speaking")

        elif event["type"] == "response.function_call_arguments.done":
            # Handle tool call
            name = event["name"]
            args = json.loads(event["arguments"])
            result = call_function(name, args)
            await ws.send(json.dumps({
                "type": "conversation.item.create",
                "item": {

Compatible Tools

Claude CodeCursor

Voice AI Development

About

Voice AI Development

Expertise

Capabilities

Prerequisites

Scope

Ecosystem

Primary

Infrastructure

Common_integrations

Platforms

Patterns

OpenAI Realtime API

Compatible Tools

Tags

Related Skills

Conductor Implement

JSON Canvas

Context Driven Development

Kotlin Coroutines Expert

Error Diagnostics Error Analysis

agentfolio