Now, back to my boss’s request—he’s heading into a board meeting and wants a summary of unauthorized data breaches from 2018 and 2021. I just joined the company and don’t have the historical context; it’s all locked in people’s heads or archives.
So, I ingested the archives Cohesity had already backed up, indexed them, and asked: “Can you summarize the differences between the unauthorized data breaches in 2018 and 2021?” For this demo, I’m anonymizing company names because it’s real data.
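Conceptually, the ingest-and-index step looks something like the sketch below. This is a minimal illustration, not the product’s actual pipeline: the pypdf extraction, the chunk size, the embedding model, and the archives/ path are all assumptions of mine.

```python
# Minimal ingest-and-index sketch, assuming the backed-up archives have
# already been restored as PDFs under archives/. Model choice, chunk size,
# and pypdf extraction are illustrative, not the vendor's actual stack.
from pathlib import Path
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 800):
    """Split extracted text into fixed-size chunks for embedding."""
    return [text[i:i + size] for i in range(0, len(text), size)]

snippets = []  # each entry keeps the chunk text plus its source file
for pdf in Path("archives/").glob("*.pdf"):
    text = " ".join(page.extract_text() or "" for page in PdfReader(pdf).pages)
    snippets += [{"text": c, "source": pdf.name} for c in chunk(text)]

# Embed every chunk once; normalized vectors make cosine similarity a dot product.
vectors = embedder.encode([s["text"] for s in snippets], normalize_embeddings=True)
```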
The system takes my text question, semantically compares it against 10,000 PDFs, extracts the relevant snippets, packages them into a prompt, and sends that prompt to a large language model. The LLM uses that context to generate an answer, with good detail on both events as well as general observations.
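Continuing the sketch above, that retrieve-and-generate loop could look like this. The OpenAI client and model name stand in for whichever LLM backend the platform actually calls; the top_k helper and the prompt wording are my own illustrative choices.

```python
# Semantic search over the indexed chunks, prompt assembly, and an LLM call.
# The OpenAI client here is one possible backend, chosen for illustration.
import numpy as np
from openai import OpenAI

client = OpenAI()

def top_k(question: str, k: int = 5):
    """Embed the question and rank indexed chunks by cosine similarity."""
    q = embedder.encode(question, normalize_embeddings=True)
    order = np.argsort(vectors @ q)[::-1][:k]
    return [snippets[i] for i in order]

def answer(question: str) -> str:
    hits = top_k(question)
    # Each chunk carries its source file, so the model can cite documents.
    context = "\n\n".join(f"[{h['source']}]\n{h['text']}" for h in hits)
    prompt = (
        "Answer from the context below only, citing sources in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(answer("Summarize the differences between the unauthorized "
             "data breaches in 2018 and 2021."))
```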
Because it’s using internal data, I also get resource links and citations, so I can download the source and add it to the board meeting materials.

Keith: That’s great for explainability. So the AI isn’t just guessing.

Greg: Exactly. You get a much more robust and trustworthy result because you’re using your internal data as the source of truth. This isn’t something you’d want to do with a public ChatGPT-type tool; this is proprietary data.

Keith: Right.

Greg: What we’re doing here sits between full model fine-tuning and simple querying.
Instead of training a model on all your data, we pull relevant content at the time of inference—like handing an artificial researcher a stack of topic-specific papers and asking a question.
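To make that contrast concrete, here is the same question posed with and without retrieved context, reusing the hypothetical top_k helper from the sketch above. The model weights never change; only the prompt does.

```python
# Simple querying vs. retrieval at inference time: same model, two prompts.
question = "Summarize the differences between the 2018 and 2021 breaches."

# Simple querying: the model never sees the archives.
bare_prompt = question

# Retrieval-augmented: the relevant snippets travel inside the prompt,
# so nothing is baked into the model's weights by a training run.
grounded_prompt = (
    question
    + "\n\nExcerpts:\n"
    + "\n\n".join(h["text"] for h in top_k(question))
)
```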