Scalaris - Document Management Agent
Overview
Scalaris is the agent responsible for document processing and management in the TKM AI Agency Platform. It handles document analysis, text extraction, and summarization, and it coordinates with other agents (Atta, Bala, Orion, and Niger) to complete document processing.
Directory Structure
Scalaris/
├── data/ # Directory for processed documents
├── scalaris.py # Main agent implementation
├── api_scalaris.py # FastAPI endpoints
├── tools.py # Document processing utilities
├── tools_schema.py # Data models and schemas
├── tools_definitions.py # Constants and definitions
└── data_validation.py # Document validation utilities
Main Components
ScalarisAgent Class
ScalarisAgent is the main class. It handles document processing and coordination:
- Document processing pipeline management
- Integration with other agents
- Event handling
- Data path management
Document Processing Pipeline
- Document validation
- Text extraction
- Summary generation
- Metadata extraction
- Classification and organization
Key Features
Document Processing
- Text extraction from various document formats
- Document summarization
- Metadata extraction
- Format validation
Integration Features
- Embedding generation through Bala
- Message storage with Atta
- Classification with Orion
- Data persistence with Niger
Data Management
- Structured document storage
- Metadata organization
- Organization-level isolation
- User-specific data handling
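One way to picture organization-level isolation and user-specific data handling is a storage path that nests each user under its organization. The sketch below is only an illustration: the `document_dir` and `_safe_segment` helpers and the `data/<organization_id>/<user_id>` layout are assumptions, not the layout Scalaris is confirmed to use.

```python
from pathlib import Path

# Assumed to correspond to the data/ directory in the tree above.
DATA_ROOT = Path("data")


def _safe_segment(value: str) -> str:
    """Reject identifiers that could escape the intended directory."""
    if not value or value in {".", ".."} or "/" in value or "\\" in value:
        raise ValueError(f"unsafe path segment: {value!r}")
    return value


def document_dir(organization_id: str, user_id: str, root: Path = DATA_ROOT) -> Path:
    """Return a hypothetical per-user directory nested under its organization."""
    return root / _safe_segment(organization_id) / _safe_segment(user_id)
```

Validating each identifier before joining keeps one organization's request from resolving to another organization's directory.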
API Operations
Document Processing
# Processing Request
{
    "file_path": str,
    "user_id": str,
    "session_id": str,
    "conversation_id": str,
    "organization_id": str
}
# Processing Response
{
    "success": bool,
    "data": {
        "summary": str,
        "document_info": dict,
        "text_status": str,
        "tokens_info": dict,
        "folder_structure": dict,
        "embedding_id": str
    }
}
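The request and response shapes above could be modeled roughly as follows. This is a minimal sketch using standard-library dataclasses; the class names, the default values, and the `parse_request` helper are hypothetical, and the actual models in `tools_schema.py` may be defined differently (for example, as FastAPI/Pydantic models).

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ProcessingRequest:
    file_path: str
    user_id: str
    session_id: str
    conversation_id: str
    organization_id: str


@dataclass
class ProcessingData:
    summary: str
    document_info: dict = field(default_factory=dict)
    text_status: str = ""  # empty defaults are an assumption of this sketch
    tokens_info: dict = field(default_factory=dict)
    folder_structure: dict = field(default_factory=dict)
    embedding_id: str = ""


@dataclass
class ProcessingResponse:
    success: bool
    data: ProcessingData


def parse_request(payload: dict) -> ProcessingRequest:
    """Build a request from a decoded JSON body, rejecting missing or non-string fields."""
    names = list(ProcessingRequest.__dataclass_fields__)
    missing = [n for n in names if n not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    wrong_type = [n for n in names if not isinstance(payload[n], str)]
    if wrong_type:
        raise ValueError(f"fields must be strings: {wrong_type}")
    return ProcessingRequest(**{n: payload[n] for n in names})
```

Calling `asdict(ProcessingResponse(...))` yields a nested dictionary with the same keys as the Processing Response shape, ready to be serialized as JSON.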
Integration
Agent Communication
- Atta: Message storage and conversation management
- Bala: Text embedding generation
- Orion: Document classification and organization
- Niger: Data persistence
Data Flow
- Document reception and validation
- Text extraction and processing
- Summary generation
- Embedding generation
- Classification
- Storage and organization
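The data flow above can be sketched as an ordered list of steps that each read a shared context and return updates to it. Everything here is illustrative: `run_pipeline`, the step functions, and the placeholder values are hypothetical stand-ins, and the real calls to Bala, Orion, and Niger are not shown in this document.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(context: dict, steps: list[tuple[str, Step]]) -> dict:
    """Run each step in order, merging its updates into a new context."""
    for name, step in steps:
        updates = step(context)
        completed = context.get("completed", []) + [name]
        context = {**context, **updates, "completed": completed}
    return context


def validate(ctx: dict) -> dict:
    if not ctx.get("file_path"):
        raise ValueError("file_path is required")
    return {}


def extract_text(ctx: dict) -> dict:
    # Placeholder: a real extractor would read ctx["file_path"].
    return {"extracted_text": "placeholder text"}


def summarize(ctx: dict) -> dict:
    # Placeholder: truncation stands in for real summarization.
    return {"summary": ctx["extracted_text"][:20]}


def embed(ctx: dict) -> dict:
    # Stands in for embedding generation through Bala.
    return {"embedding_id": "emb-placeholder"}


def classify(ctx: dict) -> dict:
    # Stands in for classification with Orion.
    return {"folder_structure": {"category": "uncategorized"}}


def store(ctx: dict) -> dict:
    # Stands in for persistence with Niger.
    return {"stored": True}


DATA_FLOW = [
    ("validate", validate),
    ("extract_text", extract_text),
    ("summarize", summarize),
    ("embed", embed),
    ("classify", classify),
    ("store", store),
]
```

Running `run_pipeline({"file_path": "report.pdf"}, DATA_FLOW)` returns a context whose `completed` list records the steps in the order they ran, which makes it easy to see where a failed run stopped.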
Error Handling
- Document format validation failures
- Processing errors
- Integration failures with other agents
- Storage issues
- Comprehensive logging of all errors
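The error categories above could map onto a small exception hierarchy with a wrapper that logs every failure and returns a structured result. This is a sketch under assumptions: the exception class names, the `safe_process` helper, and the `"error"` key in the failure payload are hypothetical and do not appear in the documented response shape.

```python
import logging

logger = logging.getLogger("scalaris")


class ScalarisError(Exception):
    """Base class for the hypothetical error categories below."""


class DocumentFormatError(ScalarisError):
    pass


class ProcessingError(ScalarisError):
    pass


class IntegrationError(ScalarisError):
    pass


class StorageError(ScalarisError):
    pass


def safe_process(process, request: dict) -> dict:
    """Call process(request), logging failures and returning a success flag."""
    try:
        return {"success": True, "data": process(request)}
    except ScalarisError as exc:
        # Known categories: log with context and report the category name.
        logger.error(
            "%s while processing %s: %s",
            type(exc).__name__, request.get("file_path"), exc,
        )
        return {
            "success": False,
            "error": {"type": type(exc).__name__, "message": str(exc)},
        }
    except Exception:
        # Anything else: log the traceback but do not leak details to the caller.
        logger.exception("unexpected failure while processing %s", request.get("file_path"))
        return {
            "success": False,
            "error": {"type": "UnexpectedError", "message": "internal error"},
        }
```

Separating known categories from unexpected exceptions keeps client-facing messages specific for anticipated failures while still recording full tracebacks for everything else.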
Performance Features
Document Processing
- Efficient text extraction
- Optimized summarization
- Resource management
- Processing queue handling
Storage Management
- Organized file structure
- Metadata tracking
- Space optimization
- Cleanup routines
Data Models
Document Metadata
{
    "file_path": str,
    "metadata": dict,
    "embedding_id": str,
    "tokens_info": {
        "prompt_tokens": int,
        "completion_tokens": int,
        "total_tokens": int
    },
    "folder_structure": {
        "folder_name": str,
        "category": str,
        "subcategory": str,
        "file_name": str,
        "state": str,
        "confidence": float
    }
}
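A structural check against the Document Metadata shape might look like the sketch below. The schema dictionary mirrors the model above; the `shape_errors` helper is hypothetical, and accepting an integer where a float is expected is an assumption made because JSON decoders return `1` as an `int`.

```python
DOCUMENT_METADATA_SCHEMA = {
    "file_path": str,
    "metadata": dict,
    "embedding_id": str,
    "tokens_info": {
        "prompt_tokens": int,
        "completion_tokens": int,
        "total_tokens": int,
    },
    "folder_structure": {
        "folder_name": str,
        "category": str,
        "subcategory": str,
        "file_name": str,
        "state": str,
        "confidence": float,
    },
}


def shape_errors(value: dict, schema: dict, path: str = "") -> list[str]:
    """Return a list of human-readable mismatches between value and schema."""
    errors = []
    for key, expected in schema.items():
        where = f"{path}.{key}" if path else key
        if key not in value:
            errors.append(f"{where}: missing")
        elif isinstance(expected, dict):
            # Nested schema: recurse only if the value is itself a dict.
            if isinstance(value[key], dict):
                errors.extend(shape_errors(value[key], expected, where))
            else:
                errors.append(f"{where}: expected dict")
        else:
            # Assumption: an int is acceptable where a float is expected.
            accepted = (int, float) if expected is float else expected
            if not isinstance(value[key], accepted):
                errors.append(f"{where}: expected {expected.__name__}")
    return errors
```

An empty list means the document matches the shape; otherwise each entry names the offending field by its dotted path, such as `folder_structure.confidence`.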
Processing Result
{
    "success": bool,
    "extracted_text": str,
    "summary": dict,
    "file_path": str,
    "metadata": dict,
    "document_info": dict,
    "tokens_info": dict
}