Concept testing — real signal from real conversations

Concept testing that gets to real signal — not survey averages

Hundreds of real conversations with real customers, in their language, at the depth of a senior moderator. Test any concept — sentence, slide, image, or video — in days, not weeks.

[ the problem ]

Why most concept tests don't actually de-risk launches

Most concept tests ask respondents to rate something they haven't fully understood. The stimulus is often a static PDF designed for a product manager, not for a Tier-2 shopper encountering the category for the first time. The comprehension gap is invisible to a 1–5 scale.

A score of 3.8 on a concept nobody understood is noise dressed as research. Launch decisions made on that number aren't de-risked — they're randomised.

Survey averages also hide the variance that matters. A mean purchase-intent of 4.1 could be a tight cluster or a bimodal split between enthusiasts and rejecters. The only way to see it is to have the conversation.
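To make the point about hidden variance concrete, here is a small Python sketch using two made-up samples of 200 ratings on a 1–5 scale. The numbers are illustrative assumptions, not data from any study: both samples average 4.1, but one is a tight cluster and the other is a split between enthusiasts and rejecters.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical purchase-intent ratings (1-5 scale), 200 respondents each.
# Both samples are invented for illustration only.
tight_cluster = [4] * 180 + [5] * 20   # almost everyone says 4
bimodal_split = [5] * 155 + [1] * 45   # enthusiasts vs. rejecters


def summarise(ratings):
    """Return the mean, the spread, and the full distribution of ratings."""
    return {
        "mean": round(mean(ratings), 2),
        "std_dev": round(pstdev(ratings), 2),
        "distribution": dict(sorted(Counter(ratings).items())),
    }


for name, ratings in [("tight", tight_cluster), ("bimodal", bimodal_split)]:
    print(name, summarise(ratings))
```

The two means are identical, so a scorecard that reports only the average cannot tell these samples apart. The standard deviation and the distribution show the difference, but neither explains why 45 people rejected the concept.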

[ how alchemic solves it ]

Three things that work together.

Test with stimuli they actually understand

The comprehension gap is the most common failure mode in concept testing, and the least measured.

Alchemic turns your brief into an interactive concept card: structured, expandable, tappable. Key claims surface individually. The AI tracks engagement before any question is asked.

Text, slide, image, or video — the format is a parameter, not a constraint.

Probe in real conversation. Get the why behind every rating.

The AI listens for the hesitation, the partial agreement, the vague 'it's okay' hiding a real objection. When someone rates appeal at 3, it asks why 3 and not 4.

Adaptive in real time. Consistent across every respondent. In 57 languages with native probing — not a translation layer applying English research logic to other languages.

Every finding is traceable: each theme links to its respondents, each respondent to their verbatims, and each verbatim to the exact moment in the voice note.

Run it end-to-end — every stage, every channel, every language, with live insights

Concept testing isn't a single event. Early screening kills six concepts cheaply so that only two go into development. Mid-stage testing fixes the messaging. Late validation confirms the bar before the production commit. Post-launch research tells you why reality didn't match the prediction.

WhatsApp, voice call, web link — 57 languages natively moderated. No app install, no redirect friction. The channel meets the respondent where they are.

The report is live before fieldwork closes. Themes build as interviews come in. Theme reels surface the best evidence in a format a CMO can watch in three minutes.

[ vs. ]

How Alchemic compares

Survey-only concept tests give you scores without understanding. Focus groups give depth but not scale. Alchemic gives both — at the speed and sample size a real launch decision needs. If you need normative benchmarks from a syndicated database, the established survey vendors serve that well. If you need to understand why your concept scores what it scores, and what you need to change, that is where Alchemic works best.

Dimension	Survey-only concept test	Focus group	Alchemic
Interviews	Hundreds (fixed questions)	6–12 respondents	200+ adaptive interviews
Stimulus formats	PDF or image	Printed or screened	Text, slide, image, video, prototype
Comprehension check	Rarely included	Moderator-dependent	Built in — AI checks before rating
Turnaround	1–2 weeks	3–5 weeks	5 days
Languages	Available, translated post-hoc	One per session	57 natively
The why behind scores	Open-end field only	Partial — moderator-led	AI probes every rating
Sample reach	Panel-dependent	Metro and accessible only	Tier 1–3, WhatsApp, voice
Report format	Scorecard + open-end dump	Topline + notes	Live dashboard + theme reels + verbatim

Voice-first study or compliance-heavy audience? See AI Phone Research.

WhatsApp-native respondents in low-bandwidth zones? See WhatsApp Interviews.

[ use cases ]

Where concept testing with Alchemic works

FMCG packaging and claims

Test packaging designs, front-of-pack claims, and new product positioning across Tier 1–3 markets. Understand which claim resonates, which creates confusion, and which triggers competitive comparison before committing to print runs.

Tech feature prototypes

Show a Figma prototype inline during the interview. The AI probes what's intuitive, what creates hesitation, and whether the value proposition lands with the actual user. Qualitative signal before an engineering sprint.

DTC brand positioning

Test two or three positioning options with your exact customer profile. Which creates emotional connection? Which sounds like every other brand in the space? AI-moderated interviews surface the gap with verbatim evidence.

Retail and assortment decisions

Which SKU to launch first? Which pack size or format? Test with real shoppers in the relevant channel — kiranas, modern trade, QSR, or D2C — before making range decisions that are expensive to reverse.

Service propositions

Test a new insurance plan, fintech feature, or healthcare service package before building operations around it. Catch the objections before a call centre hears them at scale.

B2B and SaaS concepts

Test product positioning and feature names with decision-makers and end users separately — they almost never agree. The CIO and the analyst care about entirely different things. Alchemic runs both conversations, in parallel.

Frequently asked

About this product

What is concept testing?

Concept testing evaluates a new product idea, feature, positioning, or brand concept with target consumers before launch. It identifies whether the concept is understood, desirable, and likely to drive purchase — at a stage when changes are cheap. Traditional concept tests use surveys; Alchemic runs AI-moderated interviews that get the why behind every rating, not just the numbers.

How does AI concept testing work?

Alchemic's AI turns your brief into an interactive concept card respondents can explore. An AI moderator then interviews each respondent — probing comprehension, appeal, purchase intent, and the reasoning behind every answer. Hundreds of conversations run simultaneously, in the respondent's language, with real-time theme coding.

Monadic vs sequential monadic — which is better?

Monadic design (each respondent sees one concept) avoids order effects and is the gold standard for clean purchase-intent scores. Sequential monadic (each respondent sees all concepts in random order) is efficient for ranking but risks carry-over bias. Use monadic when a score's absolute value matters; sequential monadic when you need to rank concepts head-to-head. Alchemic supports both designs.
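As a rough illustration of the two designs, the Python sketch below builds an exposure plan for each. The function names, respondent IDs, and concept labels are hypothetical and only show the general technique; this is not a description of how Alchemic assigns respondents.

```python
import random


def assign_monadic(respondent_ids, concepts, seed=0):
    """Monadic: each respondent is assigned exactly one concept.

    Respondents are shuffled, then dealt round-robin so cells stay balanced.
    """
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    return {rid: [concepts[i % len(concepts)]] for i, rid in enumerate(shuffled)}


def assign_sequential_monadic(respondent_ids, concepts, seed=0):
    """Sequential monadic: each respondent sees every concept, one at a time,
    in an independently randomised order to spread order effects."""
    rng = random.Random(seed)
    plan = {}
    for rid in respondent_ids:
        order = list(concepts)
        rng.shuffle(order)
        plan[rid] = order
    return plan


respondents = [f"r{i:03d}" for i in range(12)]   # hypothetical respondent IDs
concepts = ["A", "B", "C"]                       # hypothetical concept labels

monadic_plan = assign_monadic(respondents, concepts)
sequential_plan = assign_sequential_monadic(respondents, concepts)
```

In the monadic plan each of the three cells receives four respondents, so scores are never influenced by a previously seen concept. In the sequential plan all twelve respondents rate all three concepts, which gives more ratings per concept from the same sample but means randomised order only spreads carry-over effects rather than removing them.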

How many respondents do I need for concept testing?

For a single-concept monadic test with a broad target, 100–200 interviews typically produce stable themes and reliable quant scores. Multi-concept studies need 200–400 total across cells. Studies requiring robust sub-group cuts need at least 50–80 respondents per sub-group. Alchemic fields 200+ interviews in 5 days as standard.
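For the quant side of these sample sizes, a common textbook rule of thumb is the margin of error for a proportion, such as the share of respondents giving a top-two-box rating. The Python sketch below uses the normal approximation and assumes simple random sampling, which real panels only approximate. It is a general statistical guide, not Alchemic's own sizing method, and it says nothing about when qualitative themes stabilise.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation).

    p=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    Assumes simple random sampling, which real panels only approximate.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (50, 80, 100, 200, 400):
    print(f"n={n:>3}: +/- {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions the worst-case margin is roughly 7 percentage points at 200 interviews and roughly 14 points at 50, which is why sub-group cuts need their own minimum sample rather than borrowing precision from the total.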

How fast can concept testing be done?

Brief on Day 1, builder live within 48 hours, full theme-coded report within 5 days — for a standard 200-interview single-market study. Multi-market or large-sample studies typically add 2–3 days. Fieldwork is never the bottleneck — 200 conversations happen in parallel, not in a queue.

What is the difference between concept testing and product testing?

Concept testing evaluates an idea before a product is built or launched. Product testing, such as an in-home usage test (IHUT), puts a physical product in consumers' hands to evaluate the actual experience. Concept testing is earlier, cheaper, and faster; product testing validates the product once it has been made. Alchemic covers both.

How do you test a concept before launch?

Define your target audience and the decision you need to make. Prepare your stimulus — sentence, deck slide, mock packaging, image, or video. Field AI-moderated interviews: comprehension first, then appeal and intent, then open probing on the why. Analyse themes across 200+ conversations. Make the go/no-go call with evidence.

How do you test concepts for FMCG, SaaS, or DTC brands?

FMCG: test packaging and claims with household decision-makers in Tier 1–3 cities, in regional languages. SaaS: test feature names and pricing frames with decision-makers; Alchemic supports Figma prototype stimulus. DTC: test proposition and creative messaging with your exact customer profile. Same platform, different briefing and stimulus.

About concept testing

What is concept testing?

Concept testing is a research method used to evaluate how consumers respond to a new product, service, ad, or brand idea before it is built or launched. It captures both whether the idea resonates and why — surfacing the language, objections, and emotional triggers that determine whether a concept will actually work in market. It is typically done early, when the cost of changing direction is still low.

When should concept testing happen in the product or marketing cycle?

Concept testing should happen as early as possible — ideally when the idea is a sketch, claim, or rough mock, not a finished product. Testing early lets you kill weak concepts cheaply, sharpen strong ones, and avoid the much higher cost of changing a fully built product or campaign. Many teams also retest at the pre-launch stage to lock the final positioning, pricing message, or pack design.

What is the difference between concept testing and prototype testing?

Concept testing evaluates an idea; prototype testing evaluates an experience. Concept testing usually presents a written or visual stimulus and asks whether the proposition is appealing, believable, and differentiated. Prototype testing puts a working version — an app screen, a sample product, a service walkthrough — in front of users and observes how they actually interact with it. Most teams concept-test first, then prototype-test the survivors.

How many concepts can I test at once?

Most concept tests run three to six concepts in a single study, which is enough to generate meaningful comparisons without overwhelming respondents. Beyond six, fatigue sets in and feedback gets shallower. If you have more concepts than that, screen them down first with an internal review or a quick monadic round, then take the shortlist into a deeper qualitative test where you can explore the why behind reactions.

What makes a good concept-test stimulus?

A good concept stimulus is sharp, single-minded, and written in plain consumer language — not internal marketing speak. It should clearly state the core benefit, who it is for, and what makes it different, ideally in under 60 words plus one visual. Avoid bundling multiple ideas into one stimulus; if you cannot tell which element drove the reaction, the result is not actionable.

Should I test concepts qualitatively or quantitatively?

Test qualitatively when you need to understand why a concept works or fails, and quantitatively when you need to rank or size demand. Quant gives you a score; qualitative research gives you the diagnostic — the objections, the language consumers use, the tweaks that would unlock appeal. For most early-stage decisions, qualitative is more useful because it tells you what to change, not just what to pick.

How do I know if a concept is a winner?

A winning concept is one that consumers can play back in their own words, connect to a real need or occasion, and describe as different from what they already use. Strong purchase intent alone is a weak signal — people overclaim. Look instead for unprompted specifics: who they would buy it for, when they would use it, and what they would pay. Vague enthusiasm usually does not convert.

Can concept testing be done remotely or on WhatsApp?

Yes — remote and WhatsApp-based concept testing has become standard, especially for reaching consumers outside metros or across multiple markets at once. Respondents view stimuli on their phones and respond in their own words, often with voice notes, which preserves nuance better than typed surveys. AI-moderated platforms like Alchemic run these conversations at scale and synthesise themes automatically, making remote qualitative testing faster than traditional focus groups.