The Chat Model is Too Constraining (continued)

ORCFLO began as a passion project in 2025, consuming our nights and weekends, and became a real company in 2026. All of this happened in an unplanned 'stealth mode'. This series tells the backstory of our journey.

Our first pass at understanding the chat model's limitations focused on the hands-on experience - the lack of transparency and the tedium of manual interaction. But as we dug deeper, we found more.

The Control Problem

What happens when the AI makes a mistake? In a chat interface, you often don't know something went wrong until you read the final answer. Maybe the AI misunderstood your question. Maybe it made an assumption you'd never make. Maybe it went down a rabbit hole that seemed reasonable to it but was completely off-base for your purposes.

By the time you realize there's a problem, you're looking at a finished response built on a flawed foundation. Your options? Retype the prompt and hope for better results. Or try to surgically correct the error with follow-up questions, which sometimes works and sometimes just makes things worse. Hope, as we've learned over the years, is not a strategy, so onward to other options.

What we wanted was the ability to intervene. To see each step in the process as it happened, check the intermediate results, and make corrections right there - without starting over from scratch. If we're on step seven of ten and something goes sideways, we want to fix step seven and continue. Not go back to step one.
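To make "fix step seven and continue" concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how ORCFLO works: the `run_steps` function, its `saved` and `start` parameters, and the toy steps are all hypothetical names invented for this example. The idea is simply that each step's output is kept, so a corrected intermediate result can be fed forward without rerunning the earlier steps.

```python
from typing import Callable, List, Optional

# A step takes the previous step's output and returns its own output.
# (Hypothetical type alias, used only in this sketch.)
Step = Callable[[str], str]


def run_steps(
    steps: List[Step],
    initial: str,
    saved: Optional[List[str]] = None,
    start: int = 0,
) -> List[str]:
    """Run steps[start:] in order and return every step's output.

    `saved` holds outputs from an earlier run. Outputs of steps before
    `start` are reused as-is, so a hand-corrected result is respected.
    """
    if start > 0 and (saved is None or len(saved) < start):
        raise ValueError("need saved outputs for every step before `start`")

    results = list(saved[:start]) if saved else []
    current = results[-1] if results else initial
    for step in steps[start:]:
        current = step(current)
        results.append(current)
    return results


# Toy steps standing in for real AI calls.
steps = [str.strip, str.upper, lambda text: text + "!"]

first_run = run_steps(steps, "  hello ")
# first_run == ["hello", "HELLO", "HELLO!"]

# Suppose step two went sideways. Correct its output by hand,
# then resume from step three instead of starting over.
corrected = list(first_run)
corrected[1] = "HOWDY"
resumed = run_steps(steps, "  hello ", saved=corrected, start=2)
# resumed == ["hello", "HOWDY", "HOWDY!"]
```

The design choice worth noticing is that intermediate results are first-class: they can be inspected, edited, and reused, which is exactly what a single chat response does not allow.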

The Repeatability Problem

Most of us solve similar problems over and over. Analysts analyze. Executives review reports. Job seekers evaluate postings and tailor resumes. We develop approaches that work, little recipes we've refined through trial and error.

In Excel, when we build a model that works, we save it. We reuse it. We tweak it for new situations. Our spreadsheets become tools we return to again and again.

Chat doesn't work that way. Every conversation starts fresh. If we craft a perfect sequence of prompts that produces exactly the analysis we need, there's no natural way to save that sequence. We end up copying prompts into documents, storing them in folders, trying to remember which version was the good one. Suddenly we're managing version control for a stack of text files. That's not a solution - it's a new administrative problem.

What we wanted was a workbench. Somewhere we could build our approaches, save them, refine them, and reuse them. Somewhere we could make small tweaks and see if they improved our results. A place for experimentation and iteration, not just one-off conversations.

The Comparison Problem

Like most people, we started with ChatGPT when it launched in late 2022. Since then, we've tried Claude, Gemini, and others. They're all impressive - and they're all different. Some are better at creative tasks. Some handle technical analysis more reliably. Some are faster. Some are cheaper.

So how do we know which model to use for a given task? Should we pay for the most powerful option, or would a simpler model work just as well? Why buy a Lexus when a Toyota gets you there just fine?

The only way to answer these questions is to run the same task through multiple models and compare the results. But in a chat interface, that means manually retyping prompts, switching between tabs, trying to hold the outputs in our heads long enough to evaluate them. It's possible, but barely.

What we wanted was a way to run identical workflows through different models and see the results side by side - quality and cost, compared directly. Then we could make informed decisions about which model fit which task.
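A rough sketch of that side-by-side idea, again in Python and again purely illustrative: the `compare` function, the stand-in "models" (plain string functions rather than real AI services), and the per-call costs in cents are all made-up placeholders, not real prices or a real API.

```python
from typing import Callable, Dict, List

# A "model" here is any function from prompt to response.
# Real model calls would go where these stand-ins are.
Model = Callable[[str], str]


def compare(
    models: Dict[str, Model],
    cents_per_call: Dict[str, int],
    prompts: List[str],
) -> List[dict]:
    """Run the same prompts through every model and collect
    the outputs and total cost for each, one row per model."""
    rows = []
    for name, model in models.items():
        outputs = [model(prompt) for prompt in prompts]
        rows.append(
            {
                "model": name,
                "outputs": outputs,
                "cost_cents": cents_per_call[name] * len(prompts),
            }
        )
    return rows


# Placeholder models and placeholder prices (assumptions for the example).
models = {"large": str.upper, "small": str.title}
cents_per_call = {"large": 3, "small": 1}
prompts = ["summarize q3 sales", "draft a cover letter"]

rows = compare(models, cents_per_call, prompts)
for row in rows:
    print(row["model"], row["cost_cents"], row["outputs"])
```

Because every model sees identical prompts, the rows can be read against each other directly, with quality judged from the outputs and cost from the totals.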

We now had a clear picture of what was wrong. Transparency, tedium, control, repeatability, comparison. Five limitations that kept us from doing what we actually wanted to do.

The next question was: what would the solution even look like?

NEXT UP:
Step 4: What is an Agent and where can I get one?
