Rufa - Audio Processing Agent

Overview

Rufa is the audio processing agent of the TKM AI Agency Platform. It validates and preprocesses audio files, transcribes them with the Whisper model served through the Groq API, and stores the resulting transcriptions and processed recordings.

Directory Structure

Rufa/
├── data/                 # Directory for transcription storage
├── data_recordings/      # Directory for processed audio files
├── rufa.py               # Main agent implementation
├── api_rufa.py           # FastAPI endpoints
├── tools.py              # Audio processing utilities
├── tools_schema.py       # Data models and schemas
├── tools_definitions.py  # Constants and definitions
└── data_validation.py    # Audio validation utilities

Main Components

RufaAgent Class

RufaAgent is the main class for audio processing and transcription. It is responsible for:

  • Groq API integration
  • Audio file management
  • Transcription processing
  • Data path handling

Audio Processing Pipeline

  1. Audio file validation
  2. Audio preprocessing
  3. Transcription generation
  4. Result storage and management
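The four stages above can be pictured as a simple sequential function. The following is a minimal sketch, not Rufa's actual implementation: the name `run_pipeline` and the four stage callables are hypothetical, and each stage is passed in so the ordering is the only thing the sketch commits to.

```python
from typing import Any, Callable, Dict


def run_pipeline(
    audio_path: str,
    validate: Callable[[str], None],
    preprocess: Callable[[str], str],
    transcribe: Callable[[str], str],
    store: Callable[[str, str], str],
) -> Dict[str, Any]:
    """Run the four pipeline stages in order (hypothetical helper)."""
    validate(audio_path)                        # 1. raises if the file is not acceptable
    processed_path = preprocess(audio_path)     # 2. returns the path of the processed file
    text = transcribe(processed_path)           # 3. returns the transcription text
    stored_path = store(processed_path, text)   # 4. returns where the result was saved
    return {"success": True, "text": text, "file_path": stored_path}
```

Because a failing stage raises, later stages never run on a file that did not pass the earlier ones.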

Key Features

Audio Processing

  • Audio file validation
  • Format compatibility checks
  • Duration calculation
  • File preprocessing
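As an illustration of validation, a format check, and duration calculation, here is a small sketch that uses only the Python standard library. It assumes WAV input; the formats Rufa accepts and the contents of `data_validation.py` are not documented here, so `SUPPORTED_EXTENSIONS`, `validate_audio`, and `wav_duration_seconds` are hypothetical names.

```python
import wave
from pathlib import Path

# Hypothetical allow-list; the real set of supported formats is an assumption.
SUPPORTED_EXTENSIONS = {".wav"}


def validate_audio(path: str) -> Path:
    """Check that the file exists, has a supported extension, and is not empty."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {p.suffix or '(none)'}")
    if p.stat().st_size == 0:
        raise ValueError("Audio file is empty")
    return p


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file: number of frames divided by the frame rate."""
    with wave.open(str(validate_audio(path)), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())
```

Compressed formats such as MP3 would need a decoding library to compute duration; the standard `wave` module only reads WAV.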

Transcription Service

  • Integration with Groq's Whisper model
  • JSON response format
  • Error handling
  • Quality validation
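A transcription call with a JSON response and a basic quality check might look like the sketch below. The function `transcribe_file` is hypothetical, and the model ID `whisper-large-v3` is an assumption, since this page does not state which Whisper variant Rufa is configured with. The client is passed in rather than created here, so the sketch only assumes an object shaped like the Groq Python client, with an `audio.transcriptions.create` method.

```python
from pathlib import Path
from typing import Any

# Assumed model ID; the variant Rufa actually uses is not stated in this document.
DEFAULT_MODEL = "whisper-large-v3"


def transcribe_file(client: Any, audio_path: str, model: str = DEFAULT_MODEL) -> str:
    """Send one audio file for transcription and return the stripped text."""
    path = Path(audio_path)
    with path.open("rb") as f:
        response = client.audio.transcriptions.create(
            file=(path.name, f.read()),  # (filename, raw bytes) tuple
            model=model,
            response_format="json",
        )
    text = (response.text or "").strip()
    # Minimal quality check: treat an empty transcription as a failure.
    if not text:
        raise ValueError("Transcription service returned empty text")
    return text
```

With the `groq` package installed, a client can be created with `Groq()`, which reads the API key from the `GROQ_API_KEY` environment variable, and passed as the first argument.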

Data Management

  • Structured file storage
  • Metadata tracking
  • Organization-level isolation
  • User-specific data handling
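One common way to get organization-level isolation and user-specific handling is to encode both identifiers in the storage path. The sketch below assumes a `<base>/<organization_id>/<user_id>/<filename>` layout; that layout, the identifier character set, and the name `recording_path` are illustrative assumptions, not Rufa's documented behavior.

```python
import re
from pathlib import Path

# Assumed identifier format: letters, digits, underscore, and hyphen only.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def recording_path(base_dir: str, organization_id: str, user_id: str, filename: str) -> Path:
    """Build <base_dir>/<organization_id>/<user_id>/<filename> (hypothetical layout)."""
    for label, value in (("organization_id", organization_id), ("user_id", user_id)):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"Invalid {label}: {value!r}")
    # Keep only the final path component so a caller cannot escape the user directory.
    safe_name = Path(filename).name
    if safe_name in ("", ".", ".."):
        raise ValueError(f"Invalid filename: {filename!r}")
    return Path(base_dir) / organization_id / user_id / safe_name
```

Rejecting identifiers that contain path separators or dots keeps one organization's files from being addressed through another organization's directory.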

API Operations

Audio Transcription

# Transcription Request
{
    "audio_path": str,
    "user_id": str,
    "conversation_id": str,
    "organization_id": str
}
 
# Transcription Response
{
    "success": bool,
    "text": str,
    "file_path": str,
    "duration": float,
    "message": {
        "type": "audio",
        "content": str,
        "timestamp": str,
        "source_agent": "rufa"
    }
}
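To make the response shape concrete, the following sketch assembles a successful transcription response as a plain dictionary. The helper name `build_transcription_response` is hypothetical. Two details are assumptions: that `message.content` carries the transcribed text, and that `timestamp` is an ISO 8601 string in UTC.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_transcription_response(text: str, file_path: str, duration: float) -> Dict[str, Any]:
    """Assemble a successful response in the shape shown above (hypothetical helper)."""
    return {
        "success": True,
        "text": text,
        "file_path": file_path,
        "duration": duration,
        "message": {
            "type": "audio",
            "content": text,  # assumption: the message carries the transcribed text
            "timestamp": datetime.now(timezone.utc).isoformat(),  # assumption: ISO 8601, UTC
            "source_agent": "rufa",
        },
    }
```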

Integration

Agent Communication

  • Atta: Conversation verification and message storage
  • Bala: Optional text embedding generation
  • Niger: Data persistence (when available)

Data Flow

  1. Audio file reception
  2. Validation and preprocessing
  3. Transcription generation
  4. Result storage
  5. Message delivery to conversation

Error Handling

  • Audio file validation errors
  • Processing failures
  • Transcription service errors
  • Storage issues
  • Detailed error logging
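The error categories above could be mapped to a uniform failure result roughly as follows. This is a sketch under assumptions: the exception classes, the logger name `rufa`, and the wrapper `safe_process` are hypothetical, and storage problems are assumed to surface as `OSError`.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("rufa")  # hypothetical logger name


class AudioValidationError(Exception):
    """Raised when an audio file fails validation (hypothetical)."""


class TranscriptionServiceError(Exception):
    """Raised when the transcription service call fails (hypothetical)."""


def _failure(error: str) -> Dict[str, Any]:
    """Failure result with every payload field set to None."""
    return {"success": False, "transcription": None, "file_path": None,
            "duration": None, "error": error}


def safe_process(process: Callable[[str], Dict[str, Any]], audio_path: str) -> Dict[str, Any]:
    """Run process(audio_path) and convert known failures into a result dict."""
    try:
        return process(audio_path)
    except AudioValidationError as exc:
        logger.warning("Validation failed for %s: %s", audio_path, exc)
        return _failure(f"validation: {exc}")
    except TranscriptionServiceError as exc:
        logger.error("Transcription failed for %s: %s", audio_path, exc)
        return _failure(f"transcription: {exc}")
    except OSError as exc:
        logger.error("Storage error for %s: %s", audio_path, exc)
        return _failure(f"storage: {exc}")
    except Exception as exc:
        # logger.exception records the full traceback for unexpected failures.
        logger.exception("Unexpected processing failure for %s", audio_path)
        return _failure(f"processing: {exc}")
```

Callers then always receive a dictionary with `success` and `error` fields and do not need their own `try` blocks.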

Performance Features

Audio Processing

  • Efficient file handling
  • Format optimization
  • Resource management
  • Processing queue handling

Storage Management

  • Organized file structure
  • Metadata tracking
  • Space optimization
  • Cleanup routines
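A cleanup routine could, for example, delete recordings older than a retention window. The sketch below is illustrative only: the function name `remove_old_recordings` and the age-based policy are assumptions, since this page does not describe how Rufa decides what to remove.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_old_recordings(directory: str, max_age_days: float,
                          now: Optional[float] = None) -> List[Path]:
    """Delete files under directory last modified more than max_age_days ago."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed: List[Path] = []
    # sorted() materializes the listing first, so deleting does not disturb iteration.
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

The optional `now` parameter exists so the cutoff can be fixed in tests; in normal use it defaults to the current time.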

Data Models

Audio Transcription Result

{
    "success": bool,
    "transcription": Optional[str],
    "file_path": Optional[str],
    "duration": Optional[float],
    "error": Optional[str]
}
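The result model above translates directly into a typed Python class. The sketch below uses a standard-library dataclass; `tools_schema.py` may define it differently (for example with Pydantic), and the two consistency checks in `__post_init__` are assumptions about how the fields relate, not documented rules.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioTranscriptionResult:
    success: bool
    transcription: Optional[str] = None
    file_path: Optional[str] = None
    duration: Optional[float] = None
    error: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed invariants: a success carries text, a failure carries an error.
        if self.success and self.transcription is None:
            raise ValueError("A successful result needs a transcription")
        if not self.success and not self.error:
            raise ValueError("A failed result needs an error message")


ok = AudioTranscriptionResult(success=True, transcription="hello",
                              file_path="data_recordings/a.wav", duration=1.5)
failed = AudioTranscriptionResult(success=False, error="Unsupported audio format")
payload = asdict(ok)  # plain dict, ready for JSON serialization
```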

Audio Metadata

{
    "original_filename": str,
    "processed_filename": str,
    "duration": float,
    "created_at": str,
    "organization_id": str,
    "user_id": str
}
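Metadata tracking can be sketched as a dataclass written to a JSON file next to the processed audio. Storing metadata in a sidecar file, and the helper names `write_metadata` and `read_metadata`, are assumptions for illustration; the document does not say where Rufa keeps this record.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class AudioMetadata:
    original_filename: str
    processed_filename: str
    duration: float
    created_at: str
    organization_id: str
    user_id: str


def write_metadata(meta: AudioMetadata, audio_file: str) -> Path:
    """Write metadata to a .json file beside the audio file (hypothetical layout)."""
    sidecar = Path(audio_file).with_suffix(".json")
    sidecar.write_text(json.dumps(asdict(meta), indent=2), encoding="utf-8")
    return sidecar


def read_metadata(sidecar: Path) -> AudioMetadata:
    """Load metadata previously written by write_metadata."""
    return AudioMetadata(**json.loads(sidecar.read_text(encoding="utf-8")))
```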