We’re building a semantic search system to surface past support tickets that match the meaning—not just the keywords—of new incoming requests. Our current approach involves cleaning and chunking historical data, embedding it with models from OpenAI or Hugging Face, and storing the embeddings in a vector database for real-time retrieval. We’re also exploring RAG-style workflows and would love to hear from others who’ve tackled semantic retrieval at scale.

Problem we’re solving: how do you automatically find similar past tickets when a new support request is logged, and route the request to the appropriate CS agents?

Not keyword matching. Not basic classification. We’re aiming for semantic relevance — surfacing tickets that reflect the same underlying issue, even if worded differently.

Here’s the flow we’re currently following:

Data Sources: We’re working with historical support data from knowledge bases (PDFs), ticket system exports, agent notes, tags, resolutions, and support portals. We’re evaluating both off-the-shelf ETL tools and custom Python pipelines for cleaning, deduplication, normalization, and ongoing ingestion.
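To make the cleaning step concrete, here’s a minimal sketch of the kind of normalization/dedup pass we mean (the `body` field and the cleaning rules are placeholders for whatever your ticket export actually contains):

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def dedupe(tickets: list[dict]) -> list[dict]:
    """Drop exact duplicates by hashing each ticket's normalized body."""
    seen, unique = set(), []
    for ticket in tickets:
        key = hashlib.sha256(normalize(ticket["body"]).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(ticket)
    return unique
```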

Chunking: We break content into semantically coherent chunks using techniques like fixed-length token splits and model-aware or metadata-aware segmentation. The goal is to keep each embedding focused and meaningful. We’re evaluating LangChain, Unstructured.io, and Sentence-BERT variants.
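As a baseline, the simplest of those (fixed-length token splits with overlap) fits in a few lines. This sketch assumes tiktoken for tokenization; the chunk size and overlap are illustrative, not tuned:

```python
import tiktoken  # pip install tiktoken

def chunk_text(text: str, max_tokens: int = 256, overlap: int = 32) -> list[str]:
    """Split text into overlapping fixed-length token windows."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start : start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # last window already covers the tail
    return chunks
```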

Embedding: To capture intent—not just surface-level text—we’re evaluating embedding models such as OpenAI’s text-embedding-3-small, Hugging Face’s all-MiniLM-L6-v2, and Cohere’s embed-english-v3.0.
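For reference, embedding chunks with all-MiniLM-L6-v2 via sentence-transformers takes just a few lines (the sample texts are made up; normalizing the vectors lets cosine similarity reduce to a dot product):

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim embeddings

chunks = [
    "Customer cannot reset password via the email link.",
    "Password reset email never arrives in the inbox.",
]
vectors = model.encode(chunks, normalize_embeddings=True)
print(vectors.shape)  # (2, 384)
```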

Indexing and Storage: Once the corpus is chunked and embedded, we store it in a vector database. We’re evaluating options like Pinecone, Qdrant, pgvector, and Astra DB for efficient similarity search.
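Here’s what that looks like with Qdrant as one example, reusing `vectors` and `chunks` from the embedding sketch above (the in-memory instance and collection name are just for illustration; the 384-dim config matches all-MiniLM-L6-v2):

```python
from qdrant_client import QdrantClient  # pip install qdrant-client
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-memory mode for local experiments

client.create_collection(
    collection_name="tickets",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

client.upsert(
    collection_name="tickets",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload={"text": chunk})
        for i, (vec, chunk) in enumerate(zip(vectors, chunks))
    ],
)
```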

Retrieval: For each new ticket, we embed it in real time and retrieve the top-k closest matches from the vector database using semantic similarity.
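Continuing the sketch, retrieval for a new ticket is a single search call against the same collection (the query text and k are illustrative; `model` and `client` come from the snippets above):

```python
query = "User says the password reset link in the email does nothing."
query_vec = model.encode(query, normalize_embeddings=True)

hits = client.search(
    collection_name="tickets",
    query_vector=query_vec.tolist(),
    limit=5,  # top-k
)
for hit in hits:
    print(f"{hit.score:.3f}  {hit.payload['text']}")
```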

RAG-style Augmentation (optional): We’re experimenting with workflows where retrieved tickets are passed to an LLM to summarize similar cases, suggest next steps, or draft replies for support agents.
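A minimal version of that step, assuming the OpenAI Python SDK and the `query`/`hits` from the retrieval sketch (the model choice and prompt are placeholders, not recommendations):

```python
from openai import OpenAI  # pip install openai

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

context = "\n\n".join(hit.payload["text"] for hit in hits)
response = llm.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-completion model works
    messages=[
        {"role": "system",
         "content": "You summarize similar past support tickets and suggest next steps for the agent."},
        {"role": "user",
         "content": f"New ticket: {query}\n\nSimilar past tickets:\n{context}"},
    ],
)
print(response.choices[0].message.content)
```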

If you’ve built semantic retrieval systems:

What trade-offs did you run into?
Any embedding model or chunking strategy that worked best?
Any real-world lessons that changed your approach?

Would love to hear what’s worked for you — and what hasn’t. Feel free to DM me; happy to do a demo and a deep dive.

From a positioning standpoint: we see this as a conversation between Human Agents and AI Agents — starting in chatbot mode and evolving into copilots. We’re calling this framework 𝗖𝗵𝗮𝘁𝗣𝗶𝗹𝗼𝘁 (Chatbot + Copilot).