Complex Intent Recognition Pipeline Challenge - OpenAI API Integration

I’m building an intent classification system for a conversational AI platform at Rizz AI lovers, using the WordPress REST API and vanilla JavaScript, and I’m running into several technical bottlenecks with the OpenAI API integration.

Core Issues:

Multi-Intent Classification: With GPT-3.5-turbo, complex queries like “Hey, how do you do?” return inconsistently structured JSON, and about 15% are misclassified. Should I switch from structured prompts to function calling?
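For anyone comparing the two options, here is a minimal sketch of what the function-calling variant might look like. It only builds the Chat Completions request body and parses the reply, so it runs without a network call. The intent labels and the `report_intents` function name are assumptions for illustration, not part of the current system.

```javascript
// Hypothetical intent labels; replace with the platform's real taxonomy.
const INTENTS = ["greeting", "question", "complaint", "other"];

// Build a Chat Completions request body that forces the model to answer
// through one function whose arguments must follow a JSON schema.
function buildIntentRequest(userText, model = "gpt-3.5-turbo") {
  return {
    model,
    temperature: 0,
    messages: [
      { role: "system", content: "Classify every intent present in the user message." },
      { role: "user", content: userText },
    ],
    tools: [{
      type: "function",
      function: {
        name: "report_intents", // hypothetical function name
        description: "Report all intents detected in the message.",
        parameters: {
          type: "object",
          properties: {
            intents: { type: "array", items: { type: "string", enum: INTENTS } },
          },
          required: ["intents"],
        },
      },
    }],
    tool_choice: { type: "function", function: { name: "report_intents" } },
  };
}

// Extract and validate intents from a Chat Completions response object.
// Falls back to ["other"] instead of throwing when the payload is malformed.
function parseIntentResponse(response) {
  try {
    const call = response.choices[0].message.tool_calls[0];
    const args = JSON.parse(call.function.arguments);
    const valid = args.intents.filter((intent) => INTENTS.includes(intent));
    return valid.length > 0 ? valid : ["other"];
  } catch (err) {
    return ["other"];
  }
}
```

The validation step matters either way: the schema constrains the shape of the answer, but the parser should still treat the arguments string as untrusted input.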

Context Management: Conversations of 8+ turns hit the 4k token limit. My current sliding window (last 3 exchanges) drops critical context, which hurts accuracy. Are vector embeddings for context retrieval worth exploring?
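A rough sketch of the embedding-retrieval idea, to make the question concrete: keep the most recent turns as before, and add back the few older turns whose embeddings are closest to the new message. It assumes each stored turn already carries an `embedding` array (for example from an embeddings endpoint); the `selectContext` name and its options are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// turns: [{ text, embedding }] in chronological order.
// Keeps the last `recent` turns and adds the `topK` older turns most similar
// to the query embedding. The result stays in chronological order.
function selectContext(turns, queryEmbedding, { recent = 3, topK = 2 } = {}) {
  const split = Math.max(0, turns.length - recent);
  const scored = turns.slice(0, split).map((turn, index) => ({
    index,
    score: cosine(turn.embedding, queryEmbedding),
  }));
  const pickedIndexes = scored
    .sort((x, y) => y.score - x.score) // most similar first
    .slice(0, topK)
    .map((entry) => entry.index)
    .sort((x, y) => x - y); // restore chronological order
  return [...pickedIndexes.map((i) => turns[i]), ...turns.slice(split)];
}
```

With `recent: 3` this behaves like the existing sliding window plus up to `topK` retrieved turns, so the token budget grows by a bounded amount.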

Latency Problem: The OpenAI API averages 800ms per call, but I need <200ms for real-time chat. Local models (DistilBERT) are fast enough, but accuracy drops from 94% to 78%.
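One way the local + OpenAI trade-off is often framed is confidence-based routing: answer from the local model when it is confident, and pay the remote latency only for the uncertain cases. The sketch below assumes two caller-supplied functions, `localClassify` and `remoteClassify`, both hypothetical, and a threshold that would need tuning against real traffic.

```javascript
// localClassify(text)  -> { intent, confidence } (sync or async), e.g. a small local model.
// remoteClassify(text) -> Promise<{ intent, confidence }>, e.g. a wrapper around the OpenAI call.
// Both are hypothetical callbacks supplied by the caller.
function makeHybridClassifier(localClassify, remoteClassify, threshold = 0.9) {
  return async function classify(text) {
    const local = await localClassify(text);
    if (local.confidence >= threshold) {
      return { ...local, source: "local" }; // fast path, no network call
    }
    try {
      const remote = await remoteClassify(text);
      return { ...remote, source: "remote" };
    } catch (err) {
      // Remote call failed: a low-confidence local answer beats no answer.
      return { ...local, source: "local-fallback" };
    }
  };
}
```

Logging the `source` field per request would show what share of traffic actually needs the slow path, which is the number that decides whether the hybrid setup meets the latency target.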

Cost Scaling: At 50k tokens/day, I’m projecting $150+ per month just for intent classification. Redis caching only achieves a 40% hit rate.
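One possible contributor to a low hit rate is keying the cache on the raw message, so that "Hey, how do you do?" and "hey how do you do" count as different entries. A minimal sketch of normalizing the key first follows. The `Map` stands in for Redis here, and `makeIntentCache` and `classifyFn` are hypothetical names; whether this helps depends on how the current keys are built.

```javascript
// Reduce a message to a canonical cache key: lowercase, no punctuation,
// single spaces. Note this also splits contractions ("don't" -> "don t").
function normalizeQuery(text) {
  return text
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ") // punctuation and symbols become spaces
    .replace(/\s+/g, " ")
    .trim();
}

// classifyFn(text) -> Promise<result> is the expensive call being cached.
// `store` is a Map here; a Redis client would need async get/set instead.
function makeIntentCache(classifyFn, store = new Map()) {
  const stats = { hits: 0, misses: 0 };
  async function classify(text) {
    const key = normalizeQuery(text);
    if (store.has(key)) {
      stats.hits++;
      return store.get(key);
    }
    stats.misses++;
    const result = await classifyFn(text);
    store.set(key, result);
    return result;
  }
  return { classify, stats };
}
```

The `stats` counters make it easy to compare hit rates before and after normalization on the same traffic sample.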

Technical Constraints: WordPress plugin async processing limitations, JavaScript promise chain complexity, session persistence across page reloads.
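For the session persistence part, a small sketch of keeping the conversation history in browser storage so it survives a page reload. The storage object is injected so the same code works with `window.sessionStorage`, `window.localStorage`, or a test double. The key name and the turn cap are assumptions.

```javascript
// `storage` must implement getItem/setItem/removeItem, like window.sessionStorage.
function makeSessionStore(storage, key = "intent_chat_history", maxTurns = 20) {
  // Read the saved history, or an empty list if nothing usable is stored.
  function load() {
    try {
      const raw = storage.getItem(key);
      const parsed = raw ? JSON.parse(raw) : [];
      return Array.isArray(parsed) ? parsed : [];
    } catch (err) {
      return []; // corrupted or unavailable storage: start fresh
    }
  }

  // Add one turn, keep only the newest `maxTurns`, and persist the result.
  function append(turn) {
    const turns = [...load(), turn].slice(-maxTurns);
    try {
      storage.setItem(key, JSON.stringify(turns));
    } catch (err) {
      // Quota exceeded or storage disabled: keep working in memory only.
    }
    return turns;
  }

  function clear() {
    storage.removeItem(key);
  }

  return { load, append, clear };
}
```

In the browser this would be created once with `makeSessionStore(window.sessionStorage)` and `load()` called on page init to restore the conversation.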

What I’ve Tried: Temperature tuning (0.1-0.3), system message optimization, prompt chaining, fine-tuning on 500+ examples.

Seeking: Hybrid architecture patterns (local + OpenAI), production benchmarks of OpenAI vs local models, cost-effective scaling strategies, and real-world error handling approaches.
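As a baseline for the error handling point, a minimal sketch of retrying a failed call with exponential backoff using async/await rather than chained promises. The `withRetry` helper and its defaults are illustrative assumptions. Note that retries add latency, so under a strict real-time budget the retry count would have to stay small.

```javascript
// Retry an async function with exponential backoff.
// fn(attempt) is called with the zero-based attempt number.
// `sleep` is injectable so the delays can be skipped in tests.
async function withRetry(fn, {
  retries = 3,
  baseDelayMs = 200,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of attempts
      await sleep(baseDelayMs * 2 ** attempt); // 200ms, 400ms, 800ms, ...
    }
  }
  throw lastError;
}
```

A production version would usually retry only on transient failures (timeouts, rate limits, 5xx responses) and fail fast on everything else.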

Has anyone solved similar challenges in production conversational AI? I’m particularly interested in WordPress-based implementations and latency optimization techniques.
