Chat Is Not the Final Interface
- Andrew Simmons
Since the launch of ChatGPT, we’ve embraced a new conversational way of interacting with computers. These systems offer insight into the world—and even into our personal lives—in ways we never expected from machines.
This raises an obvious question: does the traditional point-and-click interface still make sense? Do we really need tactile, visual interaction with software, or should text and voice be enough—a smarter, more natural way to compute?
“In some sense, I feel like I’ve come full circle… I started with a command prompt and I’m ending, in some sense, with a prompt.” - Satya Nadella
I don’t believe the command prompt is where computing ends. But I do believe chat will become an integral layer of nearly every app and website.
Already, I often paste screenshots of apps into Gemini just to ask where to click. That alone signals something profound: conversation is becoming a navigation layer for software.
The rise of AI assistants that promise to handle nearly any task—just say what you want and it’s done—has revived an old dream in computing: the best interface is no interface at all.
It’s a compelling vision. But in the real world, pure chat breaks down surprisingly quickly.
Not because AI isn’t intelligent enough. Because language itself is low-fidelity and linear.
Many important tasks require:
- rapid iteration
- visual comparison
- persistent context
- emergent decision-making
These are not edge cases. They are how humans naturally explore complex choices.
Even a perfectly intelligent assistant—human or AI—responding only through text or voice would hit the same limits.
The problem isn’t intelligence. It’s the feedback loop.
Why Chat Alone Isn’t Enough
When tasks are simple and low-decision—asking for a fact, summarizing a document, triggering a single action—chat works beautifully.
But when users must explore, compare, adjust, and discover what they actually want, chat-only systems fail.
Often we begin with one idea and end somewhere better. That emergent decision-making is central to tools like Expedia, Airbnb, Photoshop, or Figma.
The outcome improves when the human stays inside the loop from start to finish.
Travel planning makes this obvious.
Booking flights once meant calling a travel agent and describing preferences out loud, hoping they were translated correctly. The internet didn’t improve this by being smarter—it improved it by providing dynamic visual interaction.
You scan prices across days. Compare routes side by side. Inspect seat maps. Notice that first class costs only slightly more. Instantly change your mind.
Preferences evolve after seeing the option space.
In chat, every shift must be verbalized, interpreted, regenerated, and re-evaluated. What takes seconds visually becomes slow, brittle, and cognitively expensive.
The Illusion of the AI Personal Assistant
Executives and high-net-worth individuals rely on human assistants to book travel and make purchases. So why wouldn’t AI agents work the same way?
Because the real cost was never just the assistant.
People who outsource decisions often overspend to avoid deciding:
- booking first class regardless of price
- buying the highest-end product to guarantee features
- changing plans at the last minute without regard for cost
For most people, the ideal outcome requires careful comparison and iterative adjustment. That process depends on seeing options, not describing them.
Discovery Can’t Be One-Shot
This pattern appears everywhere.
Flights
Users rarely know exactly what they want upfront. They discover better options visually: a longer route with empty premium seats, a better arrival time for a small price increase, a seat location that suddenly matters more than legroom.
These are not mistakes in intent. They are emergent preferences.
Chat forces users to guess their desires in advance—then struggle to revise them.
Food ordering (Uber Eats)
On Uber Eats, decisions happen while browsing:
- scrolling photos
- comparing prices and delivery times
- switching cuisines mid-flow
- abandoning one restaurant after spotting a better dish
People often don’t know what they want until they see it. Remove visual browsing, and you remove craving, serendipity, and fast comparison—the real decision engine.
Creative work (Photoshop, Figma)
Photoshop reveals the deepest limit of language interfaces.
Creative editing depends on continuous visual feedback:
adjust → glance → undo → refine → repeat.
Subtle spatial intent—edge softness, color warmth, layer blending—cannot be efficiently serialized into text. Even perfect AI understanding would still force a slow loop of:
describe → generate → review → re-describe.
Chat can assist creativity. But replacing the canvas removes the feedback that makes creativity possible.
Complex state (Jira, Linear)
Tools like Jira expose a different failure: situational awareness.
Backlogs, priorities, dependencies, assignees, and timelines all matter simultaneously. A visual board provides instant understanding.
In chat, every action becomes opaque:
- move ticket
- list blockers
- query status
Without a persistent visual model, users lose their mental map of the system.
Chat works for commands. But replacing the board destroys coordination and confidence.
Linear Language vs. Iterative Thinking
Across all examples, the same truth emerges:
Chat is excellent for intent. Terrible for exploration.
Language is sequential. Visual interfaces are parallel.
Humans are extremely good at scanning and comparing visually, and extremely bad at holding large option spaces in working memory through text.
Ironically, chat-only systems reintroduce the very problems GUIs solved decades ago:
- high cognitive load
- poor discoverability
- fragile state
- slow iteration
It isn’t progress. In many cases, it’s regression with better autocomplete.
The Future Is Hybrid
None of this means chat is a mistake.
Chat is powerful. It’s the best interface we’ve ever had for:
- expressing intent
- asking questions
- initiating workflows
But replacing visual interfaces misunderstands why GUIs exist.
The future is hybrid: what we might call chatile (chat-tactile) interfaces.
Start with conversation. Then surface calendars, maps, boards, previews, and selectors inline.
Let users talk and see. Explore without narrating every thought.
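One way to picture such a hybrid reply, offered as a minimal sketch rather than a definitive design: the assistant returns a list of parts, some plain text and some interactive widgets. Every name below (TextPart, WidgetPart, WIDGET_FOR_INTENT, respond, select) is hypothetical and chosen only for illustration; a real product would also need rendering, which is out of scope here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class TextPart:
    """Plain conversational text."""
    text: str


@dataclass
class WidgetPart:
    """An inline visual control surfaced inside the conversation."""
    kind: str                                   # e.g. "calendar", "selector", "board"
    options: List[str] = field(default_factory=list)
    selected: Optional[str] = None              # set by direct manipulation


Part = Union[TextPart, WidgetPart]

# Assumed mapping from a recognized intent to the widget that suits it.
WIDGET_FOR_INTENT = {
    "book_flight": "calendar",
    "order_food": "selector",
    "plan_sprint": "board",
}


def respond(intent: str, options: List[str]) -> List[Part]:
    """Build a reply: always some text, plus a widget when the intent calls for one."""
    parts: List[Part] = [TextPart(f"Here are some options for {intent}.")]
    kind = WIDGET_FOR_INTENT.get(intent)
    if kind is not None:
        parts.append(WidgetPart(kind=kind, options=options))
    return parts


def select(widget: WidgetPart, choice: str) -> None:
    """Record a choice made by clicking or tapping, with no new chat message needed."""
    if choice not in widget.options:
        raise ValueError(f"{choice!r} is not one of the offered options")
    widget.selected = choice


if __name__ == "__main__":
    reply = respond("book_flight", ["Mon", "Tue", "Wed"])
    calendar = reply[1]
    select(calendar, "Tue")   # the user changes their mind by tapping, not retyping
    print(calendar)
```

The point of the sketch is the select step: a change of mind updates the widget state directly, so the user does not have to describe the change in words and wait for a regenerated answer.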
Chat isn’t the enemy of good UI. Treating it as a replacement for visual feedback is.
Navigating complex decisions through language alone is like exploring a city using only verbal directions: possible in theory, exhausting in practice, and far worse than opening a map.
The winning interfaces won’t be chat-only or GUI-only. They will be the ones that keep humans firmly in the loop—combining language and touch to move decisions forward together.


