Kosmos: Integrate OpenAI-Compatible LLM Providers

Alex Johnson

This article details a proposed enhancement to the Kosmos platform: support for multiple large language model (LLM) providers compatible with the OpenAI API. Based on community interest expressed in issue #1, this improvement will greatly increase Kosmos's versatility, accessibility, and user choice.

Background and Motivation

Currently, Kosmos is tightly integrated with Anthropic's Claude API, leading to several limitations. By supporting OpenAI-compatible providers, Kosmos aims to offer users greater cost flexibility, enhanced privacy options, provider independence, increased redundancy, access to specialized models, and the ability to create multi-model workflows.

Current State of Kosmos

Kosmos is currently tightly coupled to Anthropic's Claude API: direct anthropic SDK dependencies are scattered across 28+ files, Claude-specific prompt caching lives in kosmos/core/claude_cache.py, imports such as from kosmos.core.llm import ClaudeClient are hard-coded, and configuration settings exist exclusively for Anthropic (ANTHROPIC_API_KEY, CLAUDE_MODEL).

Files Requiring Refactoring:

  • Core (4 files): llm.py, async_llm.py, claude_cache.py, config.py
  • Agents (4 files): hypothesis_generator.py, research_director.py, data_analyst.py, literature_analyzer.py
  • Hypothesis tools (3 files): refiner.py, prioritizer.py, testability.py
  • Utilities (3 files): summarizer.py, code_generator.py, concept_extractor.py
  • Plus 14+ additional modules (28 total)

Key Benefits of Multi-Provider Support

Supporting OpenAI-compatible providers will unlock significant advantages:

  1. Cost Flexibility: Mix expensive and inexpensive models depending on the task, or even use free local models to optimize costs.
  2. Privacy Options: Run models locally for sensitive research, ensuring data privacy and compliance.
  3. Provider Independence: Easily switch between providers based on availability, pricing, or performance, avoiding vendor lock-in.
  4. Redundancy: Mitigate rate limits and service disruptions by having multiple providers available.
  5. Access to Specialized Models: Utilize code-specific, domain-specific, or fine-tuned models that better suit specific needs.
  6. Multi-Model Workflows: Compare results across different providers to improve accuracy and reliability.

Proposed Solution: A Phased Approach

To achieve these benefits, we propose implementing a provider-agnostic abstraction layer using a phased approach. This will minimize disruption and ensure a smooth transition.

Phase 1: Core Abstraction Layer

The initial phase focuses on creating a unified LLMProvider interface that abstracts the underlying provider implementation. This involves defining a base class with methods for generating text, handling asynchronous requests, and producing structured JSON output. This abstraction will be the foundation for supporting multiple LLM providers in Kosmos: all interactions with LLMs go through this layer, regardless of the specific provider being used.

# kosmos/core/providers/base.py
class LLMProvider:
    """Unified interface for all LLM providers."""

    def generate(self, messages: list, model: str, temperature: float,
                 max_tokens: int, **kwargs) -> str:
        """Synchronous completion."""
        raise NotImplementedError

    async def generate_async(self, messages: list, model: str, temperature: float,
                             max_tokens: int, **kwargs) -> str:
        """Async completion."""
        raise NotImplementedError

    def generate_structured(self, messages: list, schema: dict, **kwargs) -> dict:
        """Generate structured JSON output conforming to the given schema."""
        raise NotImplementedError

    def get_cost_estimate(self) -> dict:
        """Return usage and cost metrics accumulated by the provider."""
        raise NotImplementedError

Implementation Steps:

  1. Create LLMProvider base interface to define the standard methods for interacting with LLMs.
  2. Refactor existing ClaudeClient into AnthropicProvider to maintain backward compatibility and isolate Anthropic-specific logic.
  3. Implement OpenAIProvider using the openai Python package to enable integration with OpenAI-compatible models (a sketch follows this list).
  4. Update all 28 files to import from the provider abstraction, ensuring all LLM interactions use the new interface.
  5. Add provider selection logic in config.py to allow users to choose their preferred provider.
  6. Update documentation and .env.example to guide users on configuring different providers.
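
As an illustration of step 3, here is a minimal sketch of what an OpenAIProvider might look like. The module path, config attribute names, and default values are assumptions for illustration; only the openai package calls follow its published v1 interface, and the async, structured-output, and cost-tracking methods are omitted for brevity.

# kosmos/core/providers/openai_provider.py (illustrative module path)
from openai import OpenAI

from kosmos.core.providers.base import LLMProvider


class OpenAIProvider(LLMProvider):
    """LLMProvider backed by any OpenAI-compatible endpoint."""

    def __init__(self, config):
        # base_url is what lets this single provider also talk to Ollama,
        # OpenRouter, LM Studio, vLLM, and other compatible servers.
        self.client = OpenAI(api_key=config.openai_api_key,
                             base_url=config.openai_base_url)
        self.default_model = config.openai_model

    def generate(self, messages: list, model: str | None = None,
                 temperature: float = 0.7, max_tokens: int = 4096, **kwargs) -> str:
        """Synchronous completion via the chat completions endpoint."""
        response = self.client.chat.completions.create(
            model=model or self.default_model,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
            **kwargs,
        )
        return response.choices[0].message.content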

Phase 2: Configuration & Provider Registry

Phase 2 introduces a factory pattern for provider instantiation, driven by configuration settings, so users can switch providers without modifying code. The factory reads the configured provider type and returns the corresponding implementation, which is then used throughout the Kosmos platform. This keeps provider selection dynamic and easily configurable.

Example Configurations:

# Option 1: Use Claude (existing behavior - DEFAULT)
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
CLAUDE_MODEL=claude-3-5-sonnet-20241022

# Option 2: Use OpenAI
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4-turbo
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional

# Option 3: Use local Ollama
LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_MODEL=llama3.1:70b
OPENAI_API_KEY=ollama  # Dummy key for compatibility

# Option 4: Use OpenRouter (100+ models)
LLM_PROVIDER=openai
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_API_KEY=sk-or-...
OPENAI_MODEL=anthropic/claude-3.5-sonnet

# Option 5: Use LM Studio
LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:1234/v1
OPENAI_MODEL=local-model
OPENAI_API_KEY=lm-studio
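
Under this scheme, config.py only needs to read the LLM_PROVIDER switch plus the provider-specific variables shown above. The sketch below is illustrative and assumes a plain dataclass; the attribute names (llm_provider, openai_api_key, and so on) are not an existing Kosmos API.

# kosmos/core/config.py (illustrative excerpt)
import os
from dataclasses import dataclass, field


@dataclass
class Config:
    # Which provider the factory should instantiate; defaults to existing Claude behavior.
    llm_provider: str = field(default_factory=lambda: os.getenv("LLM_PROVIDER", "anthropic"))
    # Anthropic settings (existing behavior).
    anthropic_api_key: str = field(default_factory=lambda: os.getenv("ANTHROPIC_API_KEY", ""))
    claude_model: str = field(default_factory=lambda: os.getenv("CLAUDE_MODEL", "claude-3-5-sonnet-20241022"))
    # OpenAI-compatible settings (also cover Ollama, OpenRouter, LM Studio, ...).
    openai_api_key: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
    openai_base_url: str = field(default_factory=lambda: os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"))
    openai_model: str = field(default_factory=lambda: os.getenv("OPENAI_MODEL", "gpt-4-turbo"))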

Provider Factory Implementation:

# kosmos/core/providers/factory.py
from kosmos.core.config import Config
from kosmos.core.providers.base import LLMProvider
# Concrete provider module names below are illustrative.
from kosmos.core.providers.anthropic_provider import AnthropicProvider
from kosmos.core.providers.openai_provider import OpenAIProvider


def get_provider(config: Config) -> LLMProvider:
    """Instantiate the LLM provider selected via LLM_PROVIDER."""
    provider_type = config.llm_provider.lower()

    if provider_type == "anthropic":
        return AnthropicProvider(config)
    elif provider_type == "openai":
        return OpenAIProvider(config)
    else:
        raise ValueError(f"Unknown provider: {provider_type}")
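
With the factory in place, call sites resolve a provider once and use only the unified interface. The snippet below is a hypothetical call site, not existing Kosmos code.

# Hypothetical call site: any agent or tool obtains its provider from configuration.
from kosmos.core.config import Config
from kosmos.core.providers.factory import get_provider

provider = get_provider(Config())
summary = provider.generate(
    messages=[{"role": "user", "content": "Summarize these experiment results."}],
    model=None,        # fall back to the configured default model
    temperature=0.2,
    max_tokens=512,
)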

Cache Migration:

  • Refactor claude_cache.py to llm_cache.py for provider-agnostic caching, ensuring compatibility with various LLM providers.
  • Maintain cache key compatibility for Claude to preserve the existing cache, minimizing data loss during the transition (one possible approach is sketched below).
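
One possible approach to the second point is to include the provider in the cache key only for non-Anthropic requests, so existing Claude entries keep their keys. The function and module names below are assumptions, not the existing claude_cache.py API.

# kosmos/core/llm_cache.py (illustrative sketch)
import hashlib
import json


def make_cache_key(provider: str, model: str, messages: list, temperature: float) -> str:
    """Derive a deterministic cache key from the request parameters."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    # Assumption: the legacy Claude key is derived without a provider field,
    # so omitting it for Anthropic requests keeps old cache entries valid.
    if provider != "anthropic":
        payload["provider"] = provider
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()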

Supported OpenAI-Compatible Providers

The implementation will support a wide range of OpenAI-compatible providers, including:

  • Official: OpenAI, Azure OpenAI
  • Aggregators: OpenRouter (100+ models), Together AI, Anyscale
  • Local: Ollama, LM Studio, LocalAI, Jan, GPT4All
  • Self-hosted: vLLM, Text Generation Inference (TGI), FastChat
  • Specialized: Groq (ultra-fast), DeepInfra, Fireworks AI

Success Criteria

To ensure the successful implementation of multi-provider support, the following criteria must be met:

  • [ ] All existing Claude functionality preserved (100% backward compatible).
  • [ ] OpenAI provider passes all existing test suites, ensuring compatibility and reliability.
  • [ ] At least one local model provider (Ollama) verified working to confirm local model support.
  • [ ] Configuration-driven provider switching (zero code changes) for ease of use.
  • [ ] Documentation updated with provider setup guides to assist users in configuring different providers.
  • [ ] Cost tracking works across all providers, enabling accurate cost management.
  • [ ] Provider-agnostic caching implemented for efficient resource utilization.

Non-Goals (Out of Scope)

The following features are valuable but will be addressed in separate issues once the core abstraction layer is stable and well-tested:

  • Provider-specific advanced features (Claude artifacts, OpenAI function calling).
  • Automatic task-based model routing (cheap models for summarization, expensive for reasoning).
  • Fallback chains (auto-retry with alternative provider on failure).
  • Automatic model capability detection.
  • Provider performance benchmarking.
  • Multi-provider ensemble/voting.

Community Questions

Before proceeding with the implementation, we would like to gather feedback from the community on the following questions:

  1. Priority Providers: Which OpenAI-compatible providers are most valuable to you?

    • [ ] OpenAI official
    • [ ] OpenRouter (aggregates 100+ models)
    • [ ] Local models (Ollama, LM Studio)
    • [ ] Azure OpenAI
    • [ ] Together AI / Anyscale
    • [ ] Other:
  2. Migration Strategy: Should we maintain both old (ClaudeClient) and new APIs temporarily for smoother migration?

  3. Default Behavior: Should Claude remain the default provider (backward compatibility) or require explicit configuration?

Your input will help us prioritize and refine the implementation to best meet the needs of the Kosmos community.

Estimated Effort

The estimated effort for this enhancement is Medium-Large: approximately 2-3 weeks for comprehensive implementation.

  • Phase 1: 1-1.5 weeks
  • Phase 2: 0.5-1 week
  • Testing & documentation: 0.5 week

References

For more information on large language model APIs, see the OpenAI API documentation.
