I spent the last 90 days running a deliberate experiment: identify every repeatable operational task in my company and try to hand it off to an AI agent.
This post is what I actually found — not the polished version I’d present to an investor, but the honest, somewhat chaotic reality.
What worked better than expected
Research and competitive intelligence. I set up an agent to monitor news, job postings, and LinkedIn activity from a list of competitors. It produces a weekly briefing that would have taken my team several hours. Quality is 80% of what a smart analyst would produce. That 80% is more than enough.
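The pipeline behind that briefing is simple in shape. Here's a rough sketch of the structure, with everything hypothetical: `fetch_updates` and `summarize` are stand-ins for real data-source scrapers and an agent call, and the company names are made up.

```python
# Minimal sketch of a weekly competitive-briefing pipeline.
# fetch_updates and summarize are placeholders, not real APIs.

COMPETITORS = ["Acme Corp", "Globex"]

def fetch_updates(company: str) -> list[str]:
    # Stand-in for real news / job-posting / LinkedIn monitors.
    return [f"{company}: posted 3 new engineering roles this week."]

def summarize(items: list[str]) -> str:
    # Stand-in for an agent/LLM summarization call.
    return " ".join(items)

def weekly_briefing(companies: list[str]) -> str:
    sections = []
    for company in companies:
        updates = fetch_updates(company)
        sections.append(f"## {company}\n{summarize(updates)}")
    return "# Weekly Competitive Briefing\n\n" + "\n\n".join(sections)

print(weekly_briefing(COMPETITORS))
```

The point isn't the plumbing; it's that once the sources and the output format are fixed, the agent only has to do the summarization step well.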
First drafts of everything. Proposals, emails, meeting agendas, post-mortems. Once I built good templates, agents could produce decent first drafts with minimal prompting. I still edit everything, but I’m editing instead of writing from scratch. Faster.
Data analysis. I described what I wanted to understand in plain language, and agents wrote and ran the code. I had to debug occasionally, but the barrier to getting insights from data dropped dramatically.
What failed
Anything requiring judgment about people. Hiring decisions, performance conversations, partnership negotiations. The outputs were technically coherent but missed crucial context. I quickly scrapped every attempt in this area.
Unstructured workflows. Agents perform best with clear inputs, clear success criteria, and limited scope. Anything open-ended or exploratory — product strategy, creative direction — wasn’t a good fit.
Customer-facing work. We tested one agent handling initial support inquiries. Customers could tell, and they didn’t like it. We pulled it within a week.
The biggest surprise
The bottleneck wasn’t the AI. It was me.
Writing good instructions for agents turned out to be a skill — one I was much worse at than I expected. Vague inputs produce vague outputs. To get agents working reliably, I had to think more clearly about what I actually wanted than I usually do.
That turned out to be valuable in its own right.
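To make the point concrete, here's the difference between a vague request and one an agent can actually execute. The field names and file names below are my own illustration, not anything from a framework:

```python
# Hypothetical contrast: a vague request vs. one with the clear inputs,
# success criteria, and limited scope that agents need.

vague_task = "Look into our churn numbers."

specific_task = {
    "goal": "Explain why monthly churn rose in Q3.",
    "inputs": ["cancellations_q3.csv", "support_tickets_q3.csv"],
    "success_criteria": (
        "A one-page summary naming the top three cancellation "
        "reasons, each backed by counts from the data."
    ),
    "out_of_scope": "Pricing recommendations.",
}
```

Writing the second version forces you to decide what "look into" actually means before you delegate it.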
Where I’m going next
Deeper investment in agent infrastructure — better evals, better logging, better ability to spot when something has gone wrong before it causes a problem. The agents are capable enough. The management layer isn’t mature yet.
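A rough sketch of what that management layer might look like, under heavy assumptions: `run_agent` is a stand-in for a real agent call, and the length check is a deliberately cheap example of a sanity check that runs before anyone relies on the output.

```python
# Minimal sketch: log every agent call as structured JSON and flag
# outputs that fail a cheap sanity check. run_agent is a placeholder.
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-ops")

def run_agent(task: str) -> str:
    # Placeholder for a real agent call.
    return f"Draft response for: {task}"

def checked_call(task: str, min_length: int = 10) -> str:
    start = time.time()
    output = run_agent(task)
    record = {
        "task": task,
        "output": output,
        "seconds": round(time.time() - start, 3),
        "passed_check": len(output) >= min_length,
    }
    # One structured log line per call makes failures searchable later.
    log.info(json.dumps(record))
    if not record["passed_check"]:
        raise ValueError(f"Agent output failed sanity check: {output!r}")
    return output
```

Real evals would be task-specific, but even this shape — log everything, gate on a check — catches a surprising amount before it reaches a customer or a decision.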
That’s my job to build.