The AEO Podcast EP 16: Measuring AI Search Without Lying to Yourself
AI search measurement is messy. This episode lays out a grounded way to measure AI visibility, referrals, prompted checks, and business impact without pretending attribution is cleaner than it is.
Check your own store's AI visibility with our free diagnostic tool -- it takes about 75 seconds and requires no signup.
Full Transcript
HOST: Welcome back to the AEO Podcast. Jessica here with Matt. Today we're talking about measurement, which means half the audience is excited and half is already tired.
MATT: The tired half should stay. Bad measurement is how merchants talk themselves out of good channels.
HOST: The episode title is "Measuring AI Search Without Lying to Yourself." What are the lies?
MATT: Three big ones.
HOST: Give me the first lie.
MATT: "We got no AI traffic because Google Analytics says ChatGPT sent almost nothing."
HOST: Second lie?
MATT: "AI search is replacing organic search."
HOST: And the third?
MATT: "If I can't see every impression, I can't measure anything."
HOST: Let's unpack those.
MATT: Start with the first. AI traffic attribution is messy. ChatGPT, Perplexity, Gemini, Copilot, Claude, Grok, AI Mode, AI Overviews: they do not all show up the same way in analytics. Some produce referral traffic. Some open in in-app browsers. Some route through search. Some don't pass clean referrers. Google AI Overviews are generally counted inside Search traffic, not a separate "AI Overview" channel in Search Console.
HOST: So the report is undercounting.
MATT: Often, yes. Shopify's own 2026 analysis notes that some AI-assisted discovery pathways, like Google AI Overviews, are classified as organic search in standard analytics. So if you only look at referral source equals ChatGPT, you're missing the bigger picture.
HOST: What's the second lie, that AI search is replacing organic?
MATT: Shopify's analysis is useful here because they see a lot of storefront behavior. They reported AI chatbot referral sessions growing more than eight times year over year as of Q1 2026. That's huge. But they also said organic search still refers more sessions to Shopify merchants than all tracked AI platforms combined, and same-store organic sessions were up roughly five percent.
HOST: So AI is growing fast, but organic is still bigger.
MATT: Exactly. The right conclusion is not "abandon SEO." The right conclusion is "SEO and AEO are now coupled." Many AI systems retrieve from search indexes. Google AI features are inside Search. Product feeds and structured data reinforce both.
HOST: Third lie: if I can't see impressions, I can't measure anything.
MATT: This one comes from paid marketing brain. People want a dashboard that says, "ChatGPT showed you 812 times, cited you 43 times, sent 17 clicks." That would be nice. It mostly doesn't exist yet for open AI search. But lack of perfect impression data does not mean you have no signal.
HOST: What should merchants measure instead?
MATT: Use a five-layer measurement stack.
Layer one: referral traffic. In Shopify Analytics and Google Analytics, watch referrals from chatgpt.com, perplexity.ai, copilot.microsoft.com, gemini.google.com where they appear. For Shopify specifically, check Analytics, then Reports, then Total sales by referrer, and look for ChatGPT as a referrer to Online Store orders. Don't expect completeness. Use it as directional signal.
Layer two: landing-page mix. AI-referred traffic often lands deeper. Shopify reported more than half of AI-referred sessions starting on product detail pages, versus about twenty percent for organic search. So measure whether more sessions are starting on product pages after your AEO work.
Layer three: conversion quality. In its analysis, Shopify reported AI-referred product-detail-page sessions converting at nearly fifty percent higher rates, and AI-search orders carrying fourteen percent higher average order value than organic search. Your store may differ by category, price point, attribution setup, and product quality, but this is the right question: not "how many clicks?" but "are these clicks more qualified?"
Layer four: Search Console. Google AI Overviews and AI Mode traffic are included in overall Search Console web data, not broken out as a clean AI channel. So look for changes in long-tail query impressions, product-page clicks, and average position across question-like queries.
Layer five: controlled prompt testing. Once a week, run a fixed set of prompts in ChatGPT, Perplexity, Gemini, and Copilot. Track whether your brand appears, which competitors appear, which source pages are cited, and whether facts are accurate.
HOST: That last one sounds manual.
MATT: It can be. Manual is fine at first. Ten prompts, once a week, in a spreadsheet. The goal is not scientific perfection. The goal is trend visibility.
HOST: What should the prompt set include?
MATT: Three categories.
Category one: branded prompts. "What does [brand] sell?" "Is [brand] good for [use case]?" "Where can I buy [brand]?"
Category two: category prompts. "Best [product category] for [specific buyer]." "Where can I buy [attribute] [product] under [price]?" These are the money prompts.
Category three: comparison prompts. "[Brand] vs [competitor]." "Alternatives to [competitor] for [use case]." These reveal whether AI understands your positioning.
HOST: How do you avoid fooling yourself with prompt testing?
MATT: Be consistent. Same prompts. Same region if the tool lets you control it. Logged-out or separate testing account if personalization is contaminating results. Record date, tool, prompt, answer summary, sources, whether your brand appeared, and what facts were wrong.
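The weekly record-keeping Matt describes can start as a plain CSV. A minimal sketch in Python (the file path, helper name, and field names are illustrative, not from any tool mentioned in the episode):

```python
import csv
from datetime import date

# Illustrative column names matching the fields Matt lists.
FIELDS = ["date", "tool", "prompt", "answer_summary", "sources",
          "brand_appeared", "facts_wrong"]

def log_prompt_result(path, tool, prompt, answer_summary, sources,
                      brand_appeared, facts_wrong):
    """Append one prompt-test result to a running CSV log."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:  # brand-new file: write the header row once
            writer.writeheader()
        writer.writerow({
            "date": date.today().isoformat(),
            "tool": tool,
            "prompt": prompt,
            "answer_summary": answer_summary,
            "sources": "; ".join(sources),
            "brand_appeared": brand_appeared,
            "facts_wrong": facts_wrong,
        })
```

Appending one row per tool-and-prompt pair each Friday is enough to chart trends after a month.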
HOST: What about personalization? OpenAI says ChatGPT can consider Memory or Custom Instructions in shopping results.
MATT: Yes, which means two users may see different products. That's reality. For testing, you want a consistent baseline, not a claim that every shopper sees the same thing. Then supplement with actual analytics data when clicks arrive.
HOST: Where do UTMs fit?
MATT: UTMs are useful only when you control the link. If you publish your own ChatGPT app, newsletter, social post, or partner link, use UTMs. But you generally can't force ChatGPT's organic product result to use your UTM. So don't build the whole measurement plan around UTMs.
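For links you do control, the tagging itself is mechanical. A small sketch that appends UTM parameters to a URL using the standard library (the tag values shown are hypothetical examples):

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

def add_utm(url, source, medium, campaign):
    """Append UTM parameters to a link you control, keeping any existing query."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical newsletter link:
# add_utm("https://example.com/products/socks", "newsletter", "email", "spring-launch")
```

This only helps where you publish the link yourself; as Matt notes, you can't force an AI platform's organic result to carry your tags.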
HOST: Server logs?
MATT: Very useful if you have them. Logs can show crawler visits from OAI-SearchBot, PerplexityBot, Bingbot, Googlebot, and others. That doesn't prove recommendation, but it proves discovery and access. If a page is never crawled, it is unlikely to be cited.
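Checking access logs for those crawlers can be a few lines of scripting. A rough sketch that tallies hits by user-agent substring (the bot names are the ones Matt lists; log formats vary by host, so treat this as a starting point):

```python
from collections import Counter

# Crawler user-agent substrings mentioned in the episode.
AI_BOTS = ["OAI-SearchBot", "PerplexityBot", "Bingbot", "Googlebot"]

def count_bot_hits(log_lines):
    """Tally crawler visits per bot from raw access-log lines."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits
```

A weekly run over the log file shows whether discovery is happening at all, which is the precondition for being cited.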
HOST: What's the most practical dashboard for a Shopify merchant?
MATT: Weekly, one page. AI referral sessions by source where visible. Product-detail-page entry sessions and conversion rate. Top landing products from AI and organic. Search Console long-tail query changes for key products. And a prompt test score: appeared, cited, accurate, competitor mentioned.
HOST: Prompt test score?
MATT: Simple scoring. Zero means absent. One means mentioned but not recommended. Two means recommended without citation. Three means recommended with a correct citation or link. Then track accuracy separately. If the AI recommends you with wrong facts, that's not a win.
HOST: How should merchants decide if AEO work is paying off?
MATT: Look for clusters, not one metric. More product-page entry sessions. More long-tail impressions. Higher conversion rate on AI referrals. More prompt appearances. Better factual accuracy. If three of those move together, something is working.
HOST: And if nothing moves?
MATT: Diagnose in order. Can crawlers access the site? Is the product data clear? Are pages indexed? Do you have enough authority or reviews? Are you testing prompts that real buyers would use? Did you give it enough time? AEO is not instant, especially for Google and Perplexity.
HOST: How should a SEOMelon user think about this?
MATT: SEOMelon can help with the content and diagnostic side: visibility checks, product clarity gaps, FAQ generation, schema suggestions, and Shopify workflow. But analytics still belongs to the merchant. I don't want anyone outsourcing their understanding of the channel to a vendor dashboard.
HOST: That's unusually anti-vendor.
MATT: It's pro-operator. A vendor dashboard can help. It should not be your only source of truth.
HOST: Give me the operating principle.
MATT: AI search measurement is messy, but not impossible. Stop looking for one perfect number. Build a small measurement stack: referrals, landing pages, conversion quality, Search Console, and prompt tests. That is enough to make better decisions than most competitors.
HOST: What's the first measurement habit?
MATT: Create a ten-prompt AI visibility panel for your store. Run it every Friday for four weeks. If you don't see your brand, don't panic. Use the missing answers as your content roadmap.
HOST: That gives us the full loop: product data, crawler access, and measurement. Now we can stop guessing and start improving the right pages.
Check your store's AI visibility
Free, 75 seconds, no signup. See how your store scores when real buyers ask AI for recommendations.
Run the free diagnostic