A complete methodology for running large-scale qualitative interview studies — from participant simulation through final client report — with full methodological grounding and quality standards at every step.
Pipeline Overview
The pipeline is hybrid by design: qualitative rigor at the codebook-building stage, quantitative discipline at the analysis and reporting stage. That combination makes it possible to run 150-200 participant studies and produce defensible, statistically grounded claims — while maintaining the insight depth clients expect from qualitative research.
- Phase 0 — Study setup & simulation: Design the interview guide, configure the study, set up the roster, simulate participants for pipeline testing, and process real Maze transcripts when real interviews are complete.
- Phase 1 — Discovery: Six sub-steps convert raw participant responses into a finalized codebook instrument — the most methodologically intensive phase of the pipeline.
- Phase 2 — Application: Two independent agents apply the finalized codebook to all participants. Inter-rater reliability is calculated, disagreements are resolved, and quality flags are issued.
- Phase 3 — Master dataset: Coded data is assembled into a flat participant-by-code matrix — the single source of truth for all reporting and analysis downstream.
- Phase 4 — Segmentation: Defining dimensions are identified and validated, PCA reduces the variable space, and k-means clustering discovers natural participant groups.
- Phase 5 — Reporting: Frequency and cross-tab analysis surfaces the key findings. The client report is built as an interactive HTML document, deployed to Cloudflare Pages.
Methodological Foundation
The pipeline does not use Braun and Clarke's reflexive thematic analysis — that method was designed for interpretive, meaning-centered research and its authors explicitly argue that counting theme frequencies does not add analytic value. Our goals require a different foundation: methods built from the ground up to support systematic coding that produces comparable, quantifiable data across participants.
The pipeline instead draws on three established methods plus one recent LLM-specific architecture:

- Hsieh & Shannon (2005) — Codes are built inductively on the first study, then treated as a fixed measurement instrument for all subsequent studies in the same domain. This is what makes cross-study comparison valid.
- Ritchie & Spencer (1994), framework analysis — Originally developed for large-scale applied policy research. Produces a participant × code matrix enabling systematic cross-case comparison — the "how does this theme vary by segment" question.
- Mayring (2000), qualitative content analysis — Structured, rule-based approach that explicitly bridges qualitative interpretation and quantitative analysis. Each code requires a definition, decision rules, and inclusion/exclusion criteria.
- CollabCoder (Gao et al., CHI 2024) — Multi-perspective extraction and clustering feed a single high-capability codebook architect that runs internal parsimony and distinction passes, producing codebooks of comparable quality to expert human-coded ones.

Before any analysis can run, the study must be designed to generate the right data. Every downstream analysis decision — what to code, what to segment on, what to report — flows from what questions were asked. A poorly designed guide cannot be rescued by better analysis.
The interview guide is the instrument. Every downstream analysis decision flows from what questions were asked. There are three question modules, and the distinction between them is the most important design decision in the study.
When writing evaluation trigger questions, use a Jobs-to-be-Done framing: what was the participant trying to accomplish, what context made them start looking, what would need to be true for them to take action? People do not naturally articulate evaluation criteria — they tell stories about situations. The stories contain the criteria.
Before running real interviews, the full pipeline is tested with simulated participants. Simulation allows you to catch bugs, validate script paths, and build an initial codebook before spending budget on real transcripts.
Simulation is not one-size-fits-all. Before launching, explicitly decide the roster parameters — the demographic and role mix, the firmographic variation, and the verboseness distribution — and document them in the study's roster design notes. The simulation is only as realistic as the roster it draws from.
Each participant is simulated with one of three verboseness levels. These defaults are empirically calibrated from a real Maze study.
| Level | Share | Observed avg (words) | Observed range (words) |
|---|---|---|---|
| Not Verbose | ~34% | 808 | 500–1,200 |
| Somewhat Verbose | ~43% | ~1,500 | 1,200–2,100 |
| Very Verbose | ~23% | 2,100+ | 1,900–3,000 |
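A minimal sketch of how those shares could drive per-participant assignment — `LEVELS` and `assign_verboseness` are illustrative names, not simulate.py's actual API:

```python
import random

# Calibrated shares and target word ranges from the table above.
LEVELS = [
    ("not_verbose",      0.34, (500, 1200)),
    ("somewhat_verbose", 0.43, (1200, 2100)),
    ("very_verbose",     0.23, (1900, 3000)),
]

def assign_verboseness(rng: random.Random) -> tuple[str, tuple[int, int]]:
    labels = [label for label, _, _ in LEVELS]
    shares = [share for _, share, _ in LEVELS]
    label = rng.choices(labels, weights=shares, k=1)[0]  # weighted draw
    target = next(wr for lbl, _, wr in LEVELS if lbl == label)
    return label, target
```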
Simulated participants are assigned realistic demographic variation from the roster parameters above. The simulation agent answers as a realistic professional in this domain — with the vocabulary, concerns, and communication patterns that role and domain entail.
Script: research/00 How to simulate participants/simulate.py

Why Sonnet: Participant simulation is synthetic text generation, not expert reasoning. Opus's reasoning advantage delivers effectively zero value on this task while costing ~5× Sonnet (~$29/study vs. ~$6/study for 180 participants). Haiku 4.5 would cut the cost from ~$6 to ~$2 per study but carries real risk on rule adherence: the prompt contains 11 numbered simulation rules plus the full interview guide, and smaller models tend to drop rules late in long prompts. Sonnet reliably holds all rules across the batch and produces distinct-sounding participants. The extra ~$4/study over Haiku is a cheap insurance policy on the input data every downstream pipeline step depends on.
After each batch returns, simulate.py runs a word-count check against the verboseness spec, overwrites the reported total_words with the ground-truth count, and flags any participant whose word count falls outside their target range or who answered fewer questions than the interview guide contains. Warnings print per batch but do not fail the run — the roster designer reviews warnings and decides whether to re-run individual batches.
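A minimal sketch of that post-batch check — the field names (`transcript`, `target_word_range`, `questions_answered`) are assumptions, not necessarily simulate.py's actual schema:

```python
def check_batch(participants: list[dict], guide_question_count: int) -> list[str]:
    warnings = []
    for p in participants:
        true_words = len(p["transcript"].split())
        p["total_words"] = true_words  # overwrite reported count with ground truth
        lo, hi = p["target_word_range"]
        if not lo <= true_words <= hi:
            warnings.append(f"{p['id']}: {true_words} words outside target {lo}-{hi}")
        if p["questions_answered"] < guide_question_count:
            warnings.append(
                f"{p['id']}: answered {p['questions_answered']} of "
                f"{guide_question_count} questions"
            )
    return warnings  # printed per batch; the run itself never fails
```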
Real interviews conducted via Maze are exported as a CSV where each column is one participant's transcript. The Maze export has a known inconsistency: some participants have their full transcript in a single cell; others have it split across multiple rows.
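A hedged sketch of normalizing that export. The column-per-participant layout is per the description above; the function name and join strategy are illustrative:

```python
import pandas as pd

def load_maze_transcripts(path: str) -> dict[str, str]:
    df = pd.read_csv(path)
    transcripts = {}
    for col in df.columns:  # one column per participant
        cells = df[col].dropna().astype(str)
        # Joining handles both export shapes: one full-transcript cell,
        # or a transcript split across multiple rows.
        transcripts[col] = "\n".join(cells)
    return transcripts
```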
The discovery phase converts raw participant responses into a structured codebook — the instrument that defines exactly what themes exist, how they are bounded, and what counts as an inclusion. Every downstream analysis depends on getting this right.
Hybrid execution model. Discovery runs across two modes. Steps 1.1, 1.2a, 1.2b, and 1.6 run in the Claude Code conversation window using the Claude Code subscription (no per-call API charges). Steps 1.3 (extraction) and 1.4 (clustering) run via run_extract_cluster.py on the Anthropic API because of their high parallelism. Step 1.5 (enhanced six-pass codebook construction) runs via run_codebook.py on the API because its ~60–75K token structured JSON output exceeds what can be produced reliably in a single conversation session.
Before any other coding step runs, Claude Code (running in the conversation window using the Claude Pro subscription) reads a random sample of 50 participant responses and produces a structured four-section study context document. This step used to run as a Python+API subprocess; moving it into conversation mode eliminates its API charge entirely.
Agent: context_generator

Scope: descriptive, not prescriptive. It tells downstream agents what the study population is like — not which codes to create. Analytical conclusions come from the data.
Classification runs in two sequential sub-steps, both now executed in the Claude Code conversation window using the Claude Pro subscription. Part 1a determines question type cheaply. Part 1b defines the structural codebook for non-thematic questions with full data coverage. Output is written directly to questions-registry.csv in the study data folder.
Agent: question_classifier

Classification is a high-leverage, low-volume decision. The pipeline makes one call per question, typically 15-25 calls per study, so the cost delta between Sonnet and Opus is trivial (roughly $1-2 extra per study). The failure mode, however, is catastrophic and silent. A thematic question misclassified as categorical collapses all inductive themes into a handful of named buckets and cannot be recovered downstream. A categorical question misclassified as thematic fills the codebook with noisy open-ended clusters that should have been clean counts. A rank-order question misclassified as thematic loses the ordinal structure entirely.
Because the classification drives every downstream branch of the pipeline, and because the error is invisible until a human reviewer catches it in the finished codebook, the strongest available reasoning model is warranted even though the task itself is usually straightforward. Opus is also better at catching the subtle case where the question text looks open-ended but the responses themselves fall cleanly into a small set of categories, or where a nominally closed question receives rich open-ended explanations that deserve thematic coding. Sonnet tends to classify from the question text alone; Opus weighs both the text and how participants actually answered.
The 5-participant sample is retained because it is sufficient for the reasoning model to spot the pattern — going larger would raise cost without materially improving accuracy on a decision this structural.
Why not dual-agent classification: Classification is a structured decision with a finite, well-defined outcome space — the four types are exhaustive and mutually exclusive. It is not an interpretive judgment the way codebook construction is. Two agents classifying the same question would almost always agree; the rare disagreement would be on edge cases better resolved by reading more responses, not by running a second agent. The complexity and cost of dual-agent classification with reconciliation is not justified by the improvement in output quality.
One Sonnet agent processes all participant responses to thematic questions. Before reading any responses, it receives the study context document from Step 1.1 — who was interviewed, how they communicate, and what topics they discuss. This primes the agent with the professional vocabulary and communication style of the participants so it can make better interpretation decisions.
Agent: extractor_1

Dunivin (2024), Scalable qualitative coding with LLMs, establishes the central principle: LLMs require more precise codebook descriptions than human coders do because they lack the contextual understanding human coders develop through training and discussion. Every guard rule below is a precision instruction the model would otherwise miss.
Reliability target grounding. Dunivin (2024) reports that GPT-4 with chain-of-thought prompting achieved Cohen's κ ≥ 0.79 (excellent agreement) on 3 of 9 codes and κ ≥ 0.6 (substantial) on 8 of 9 codes against human coders. Our HR Leaders BambooHR study achieved an overall weighted κ of 0.909 — above the strongest results in the published literature. This is why we are confident the single-extractor design is sufficient.
Chain-of-thought grounding. Dunivin (2024) finds that requiring the model to reason about each code before assigning it improves coding fidelity. This is the basis for the landscape analysis requirement in Steps 1.4 and 1.5 — clustering and codebook construction agents must write what they observe before proposing structure.
The HR Leaders BambooHR study produced an overall weighted Kappa of 0.909. The four codes that fell below threshold were definition problems, not extraction failures — a second extractor would not have fixed them. Dual extraction was adding methodological complexity without improving downstream reliability. The complexity budget is better spent at the codebook construction step, where boundary-drawing actually matters.
Extraction is the highest-volume call in the entire pipeline. For a typical 180-participant × 13-thematic-question study, that is ~2,340 extraction calls — roughly 10× the number of Opus calls everywhere else in discovery combined. At ~800 output tokens each, that is ~1.9M output tokens just for extraction. Opus is ~5× the cost of Sonnet, so upgrading extraction is the single biggest cost delta you could make to the pipeline.
The HR Leaders BambooHR study hit weighted κ = 0.909 with Sonnet extraction — above the strongest published results (Dunivin 2024 reports GPT-4 at κ ≥ 0.79 on 3 of 9 codes). In that study's debrief, the four codes that fell below threshold were definition problems at construction time, not extraction misses. Opus at extraction would not have fixed them.
Extraction is a more mechanical task than construction. It is "find spans that express one idea, write a short label." It is not extended reasoning. Opus's reasoning advantage (GPQA Diamond, etc.) is smallest on pattern-recognition tasks like this. The reasoning-heavy step is codebook construction — which is exactly where we already spend the Opus budget.
The documented failure modes for LLM qualitative extraction are:
1. Missing sarcasm or irony, reading a negative statement as positive (tone)
2. Mishandling negation, e.g. coding "I don't struggle with X" as if X were expressed (structural)
3. Taking hedged or implied statements at face value (implicature)
4. Mis-segmenting compound sentences that express multiple distinct ideas (structural)
On #1 and #3 (tone and implicature), Opus is genuinely better. On #2 and #4 (structural), the gap is small — Sonnet handles these well with explicit prompt instructions, which the current extractor has. The current extractor prompt includes guard rules for sarcasm, hedging, negation, and compound sentences; if a future study surfaces a concentration of errors in the tone categories specifically, extraction can be upgraded to Opus as a targeted fix rather than a blanket cost increase.
Agent: clusterer

Clustering per question first reduces cognitive load on the global construction agent. Instead of receiving thousands of raw descriptive labels in one block, it receives pre-organized clusters per question — making the landscape analysis step more tractable.
Batching: One call per question. The clusterer runs once per thematic question, processing all meaning units for that question in a single Opus call. Questions are processed in parallel via thread pool, but within a question there is no internal batching. Input is capped at 500 meaning unit descriptions per question (evenly sampled if more) — comfortably within Opus's context window.
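The even-sampling cap reduces to a stride over the ordered descriptions. A sketch — `evenly_sample` is an illustrative helper, not the script's API:

```python
def evenly_sample(descriptions: list[str], cap: int = 500) -> list[str]:
    if len(descriptions) <= cap:
        return descriptions
    step = len(descriptions) / cap
    # Take every step-th description so the sample spans the full list.
    return [descriptions[int(i * step)] for i in range(cap)]
```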
No minimum or maximum on the number of clusters. The prompt explicitly instructs the agent to let the data decide how many clusters to create. If a question's responses contain 7 distinct ideas, the agent should make 7 clusters; if they contain 35, the agent should make 35. Under-clustering is irreversible (distinctions collapsed here cannot be recovered); over-clustering is recoverable downstream. The agent is told to err toward finer-grained clusters.
Receives the study context from Step 1.1. Before reading any meaning unit descriptions, the clusterer is primed with the same who/how/what context document the extractor sees. This helps it recognize when descriptions that look superficially different are referring to the same underlying concept in the participants' shared vocabulary.
Terminology note: At this stage we deliberately use the phrase "meaning unit description" rather than "code." The word "code" is reserved for the final codebook entries produced in Step 1.5. The clusterer is grouping descriptive labels — it is not creating codes.
Why minimums of 5: The clusterer must include at least 5 representative meaning unit descriptions and at least 5 participant IDs (with quotes attached downstream) per cluster. The richer the cluster summary, the better the downstream architect can decide what to merge, split, and define. Opus is configured with a 32,000-token max output here, so the richer cluster summaries fit comfortably without crowding the budget.
Agent: codebook_architect

Why this step stays on the Anthropic API (not conversation mode): The enhanced output is roughly 60–75K tokens of structured JSON — themes with definitions, inclusion/exclusion criteria, adjacency tests, positive examples, and boundary examples, plus the decisions log and landscape analysis. This exceeds what can be produced reliably in a single sustained conversation session. Running it via run_codebook.py on the API produces it in one validated call; conversation-mode attempts risk dropped fields or broken schemas in the single most-critical file in the entire pipeline.
Why a single Opus agent (vs. the prior dual + reconciler architecture): The previous design used two Sonnet agents (parsimony + distinction) running in parallel and an Opus reconciler making the final calls. That architecture cost three API calls for one decision and added an integration step that could itself introduce errors. A single Opus call with internal parsimony pass + distinction critique chain-of-thought captures the bulk of the divergence-then-reconcile benefit at a fraction of the cost and complexity. Opus is strong enough to hold both lenses simultaneously inside one extended reasoning pass.
After run_codebook.py commits the global codebook, Step 1.6 runs in the Claude Code conversation window (not as a Python+API subprocess). Claude Code performs the per-question validator pass and the dimension architect pass sequentially, assembles the final codebook.json, and pauses for human review. Moving this step into conversation mode eliminates its API cost entirely.
The two reasoning passes Claude Code executes in conversation:
- validator — performs the per-question validation pass
- dimension_architect — builds the codebook's dimensions section (the per-variable classifications that segmentation later reads)

Human review focus: codebook-audit-trail.csv — these are the merge and split decisions the architect flagged as non-obvious, exactly where the instrument is weakest.

Pilot calibration is a low-cost dry run on 20 stratified participants before committing to the full Phase 2 coding run. It catches soft definitions before they propagate across the full dataset and become expensive arbiter calls.
Pilot calibration runs the full dual-coder application pipeline against a stratified sample of 20 participants, reviews the disagreements, and refines any code definitions that proved soft in practice — all before committing to the full ~180-participant Phase 2 run where definition problems become expensive.
Even with the enhanced Step 1.5 prompt (adjacency tests, boundary examples, banned "and" in names), some definitions only reveal their softness when two independent coders apply them to real responses. Catching those in a 20-participant pilot costs roughly 1/9th of the full run. Catching them only after a full run means re-running the arbiter across hundreds of disagreements, or worse, shipping a weaker final dataset.
Pilot procedure:
- Run run_coding.py on the pilot subset. The application pipeline runs dual-coder extraction + Kappa + arbiter against only the 20 pilot participants. Cost is roughly $5–8 instead of ~$50 for the full run.
- Review every code that drew a definition_ambiguous=true flag from the arbiter. These are the codes whose definitions need tightening.

The application phase applies the finalized codebook to all participant transcripts using two independent coding agents, calculates inter-rater reliability, and resolves disagreements. Script: run_coding.py
Two independent agents each receive the finalized codebook and all participant responses. Each agent independently processes every participant's responses to every thematic question and produces a participant-level output: for each participant, which codes apply.
Independent application by two agents replicates the intercoder reliability design from qualitative research (O'Connor & Joffe, 2020). Disagreements between the two agents flag cases where the codebook definition is ambiguous enough to produce different readings — exactly the cases that need a resolver and may warrant codebook refinement.
- agent_1 — criteria order inclusion_first: the "INCLUDE when" criteria are presented before the "EXCLUDE when" criteria in the formatted codebook, subtly biasing the reading toward applying the code when the evidence partially matches.
- agent_2 — criteria order exclusion_first: the "EXCLUDE when" criteria are presented before the "INCLUDE when" criteria, subtly biasing the reading toward rejecting the code unless the evidence clearly meets the definition.

Application coding is the highest-volume phase in the pipeline. A typical study is ~180 participants × ~15 thematic questions × 2 agents = ~5,400 coding calls, plus the arbiter calls on disagreements. Running all of that on Opus would cost roughly 5× more and slow the wall-clock significantly. That cost would be hard to justify for a task where Sonnet is already strong.
Application coding is fundamentally a pattern-matching problem, not a novel reasoning problem. The codebook already exists. The inclusion and exclusion criteria, the definition, the adjacency tests, and the three positive and three boundary examples — all the hard thinking has been done upstream by the Opus codebook architect in Step 1.5. The coder's job is to read a participant's words and decide whether they match the criteria. Sonnet's accuracy on that task is close to Opus's, and the HR Leaders BambooHR simulated run hit a weighted κ = 0.909 with this exact setup, well above the Landis & Koch "almost perfect" threshold of 0.81 and well above the client-deliverable threshold set at κ ≥ 0.65. The four codes that fell below threshold in that run were definition problems in the codebook, not coder errors — Opus coders would not have rescued them.
The productive signal in dual coding is disagreement — the arbiter cannot do its job if both coders agree on everything. That disagreement is generated by prompting Agent 1 as inclusive (temperature 0) and Agent 2 as conservative (temperature 0.3), and by flipping the order in which the codebook inclusion and exclusion criteria are presented. Switching one or both agents to Opus would not create more useful disagreement; it would just make both agents more confident. The goal is two coherent but different coding philosophies, not two different raw IQs.
The one failure mode this architecture does not catch is when both Sonnet coders confidently agree on the wrong answer. No disagreement means no arbiter trigger. That risk is a codebook-quality problem, not a model-choice problem — if the definition and the positive and negative examples are sharp enough, two independent Sonnet agents with different temperatures and different emphasis orders will almost always diverge on ambiguous cases. The fix for that failure mode is better definitions in Step 1.5, which is why the codebook architect is required to provide adjacency tests plus exactly 3 positive and 3 boundary examples per code.
Concurrency: Both coding agents run in parallel using a semaphore-controlled thread pool (18 concurrent workers). The API output token rate limit is the binding constraint, not compute.
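Taken together, the two coder personas reduce to a small configuration delta. A sketch — the dict shape is illustrative; the temperatures and criteria orders are the ones described above:

```python
# Illustrative configuration for the two independent coders.
CODER_CONFIGS = [
    {"agent": "agent_1", "temperature": 0.0,  # leans inclusive
     "criteria_order": "inclusion_first"},
    {"agent": "agent_2", "temperature": 0.3,  # leans conservative
     "criteria_order": "exclusion_first"},
]
```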
After both agents have coded all participants, Cohen's Kappa is calculated per code and as an overall weighted average at the participant × code level.
Why participant × code, not meaning unit × code: Our downstream analysis asks "what percentage of participants expressed theme X" — a participant-level question. Kappa at the meaning unit level would be influenced by segmentation variability between agents. Participant-level Kappa measures what actually matters.
Source: Landis & Koch, 1977
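A sketch of the per-code computation using scikit-learn. The prevalence-based weighting for the overall average is an illustrative assumption; the pipeline's actual weighting scheme may differ:

```python
from sklearn.metrics import cohen_kappa_score

def kappa_report(agent1: dict[str, list[int]], agent2: dict[str, list[int]]):
    # agent1/agent2: {code_name: [0/1 per participant, aligned order]}
    per_code = {c: cohen_kappa_score(agent1[c], agent2[c]) for c in agent1}
    # Weight each code by how often either agent assigned it (min 1).
    weights = {c: (sum(agent1[c]) + sum(agent2[c])) or 1 for c in agent1}
    overall = (sum(per_code[c] * weights[c] for c in per_code)
               / sum(weights.values()))
    return per_code, overall
```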
For any participant × code combination where the two agents disagreed, a third resolver agent reviews both agents' reasoning alongside the participant's actual transcript and the codebook definition, and makes a final determination.
The resolver records its reasoning in the output file alongside the final code assignment. This creates an audit trail for every contested coding decision.
- agent_3 — criteria order balanced: inclusion and exclusion criteria shown in neutral order so the arbiter reads the rulebook straight rather than with either coder's bias.

Disagreements are where the hard calls cluster. By definition, the arbiter only sees cases where two reasonable coders looked at the same evidence and came to different conclusions — the edge cases involving sarcasm, hedged language, compound statements, and borderline definition fits. This is exactly the shape of task where Opus's reasoning advantage is largest, because the decision requires weighing competing considerations rather than pattern-matching to a clear example.
The arbiter is also low-volume and high-leverage. If the codebook is clean, dual-coder agreement on most items runs 80–90%, meaning the arbiter fires on only the remaining 10–20%. On a 5,400-call study, that is roughly 540–1,080 Opus calls — a rounding error in cost — yet each of those calls directly determines a final code assignment that enters the participant database. Spending more per call on the highest-stakes decisions is exactly the right place to put the token budget.
This mirrors the architecture used on the discovery side. The Step 1.5 codebook architect is Opus because its decisions propagate everywhere downstream. The arbiter is the mirror image on the application side: a small number of decisions that disproportionately determine final output quality. Running the arbiter on Sonnet would save almost nothing and would risk propagating coder-level confusion into the final dataset on exactly the cases that matter most.
The arbiter also carries a second responsibility beyond code assignment: it flags whether the underlying definition was ambiguous. Those flags feed back into codebook refinement and are the primary signal for whether a code needs rewording before the next study. That meta-judgment — "is this disagreement about the evidence, or about the rulebook?" — requires reasoning about the codebook itself, not just applying it.
Coded data is assembled into a flat participant-by-code matrix — the single source of truth for all analysis and reporting downstream. Scripts: build-master-dataset.py and build-frequency-report.py
Takes final_codes.json, codebook.json, and roster.json and assembles a flat CSV with one row per participant.
Column naming: {qid}_{code_name_snake_case}, valued 0 or 1. Keeping Q5_reporting_analytics_gap and Q8_reporting_analytics_gap as separate columns preserves the question-level distinction; collapsing to a single column per code loses it permanently. With 140 codes across 5 thematic questions, this produces approximately 700 binary columns. This is correct — do not collapse them.
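A minimal sketch of the naming convention — `column_name` is an illustrative helper with a simplified snake_case conversion:

```python
def column_name(qid: str, code_name: str) -> str:
    # Simplified snake_case conversion for illustration.
    return f"{qid}_{code_name.lower().replace(' ', '_')}"

# column_name("Q5", "Reporting Analytics Gap") -> "Q5_reporting_analytics_gap"
```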
Reads master-participants.csv and codebook.json and produces code frequency tables and cross-tabs by firmographic variable.
Defining dimensions are identified and validated, PCA reduces the variable space, and k-means clustering discovers natural participant groups. Scripts: segment-prep.py and run-segmentation.py
The codebook's dimensions section classifies each coded variable. The segment-prep script reads these classifications and applies variance filters to produce a clean input for clustering.
Binary variables outside the 20-80% prevalence range are excluded from clustering. A variable where 95% of participants scored 1 carries almost no discriminating power.
N/10 rule: Maximum defining dimensions = sample_size / 10. With 180 participants, maximum 18 defining dimensions. More than this produces unstable clustering at our sample sizes.
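Both rules compose into a short filter. A sketch assuming a DataFrame of 0/1 columns — `select_defining_dimensions` is an illustrative name, and the tie-breaking when candidates exceed the N/10 cap is left unspecified here:

```python
import pandas as pd

def select_defining_dimensions(df: pd.DataFrame, candidates: list[str]) -> list[str]:
    # Prevalence filter: drop binaries outside the 20-80% range.
    kept = [c for c in candidates if 0.20 <= df[c].mean() <= 0.80]
    # N/10 cap: at most sample_size / 10 defining dimensions.
    return kept[: len(df) // 10]
```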
The run-segmentation script standardizes the defining dimensions, reduces with PCA, and runs k-means clustering with automatic k selection.
Why PCA before clustering: Clustering directly on many binary dimensions suffers from the curse of dimensionality — distance metrics become less meaningful as dimensions increase. PCA reduces the space to a manageable number of orthogonal factors while preserving most of the variance.
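A sketch of the standardize → PCA → k-means-with-automatic-k flow. The 90% variance target and the k range of 2–8 are illustrative assumptions, not run-segmentation.py's actual settings:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def segment(X: np.ndarray) -> np.ndarray:
    Z = StandardScaler().fit_transform(X)
    # Keep enough orthogonal factors to explain ~90% of the variance.
    F = PCA(n_components=0.90, svd_solver="full").fit_transform(Z)
    best_k, best_score = 2, -1.0
    for k in range(2, 9):  # candidate cluster counts
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(F)
        score = silhouette_score(F, labels)
        if score > best_score:
            best_k, best_score = k, score
    return KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(F)
```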
After clustering, each segment is cross-tabbed against outcome variables and profiling variables to build segment descriptions and validate the solution.
For each segment, identify 2-3 observable proxies a sales rep can assess without a full research interview: company size (LinkedIn), seniority and job title, industry/company type, tech stack (G2, job postings), buying signals (recent funding, tool migration postings).
Before writing any section of the client report, consult the frequency report: it reveals which themes are prevalent, which are rare, and where the most interesting cross-tab differences appear.
When rejection reason data exists and at least 3 competitors have rejection sample sizes of n ≥ 10, build a competitive vulnerability summary showing each competitor's top 3 rejection reasons with positioning angles for the client's sales team.
Include when rejection reasons show differentiated patterns across competitors and the client's strengths (pricing, implementation speed, ease of use) map to competitors' top rejection reasons.
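A sketch of that inclusion rule, assuming hypothetical `competitor` and `reason` columns in a rejection-reasons DataFrame:

```python
import pandas as pd

def competitive_vulnerability(rejections: pd.DataFrame):
    sizes = rejections.groupby("competitor").size()
    eligible = sizes[sizes >= 10].index          # rejection sample n >= 10
    if len(eligible) < 3:
        return None                              # threshold not met; omit section
    subset = rejections[rejections["competitor"].isin(eligible)]
    # Top 3 rejection reasons per eligible competitor.
    return (subset.groupby("competitor")["reason"]
                  .apply(lambda s: s.value_counts().head(3)))
```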
The client report is an HTML document built with Astro and deployed to Cloudflare Pages. It is the primary deliverable of the study — interactive, printable, and structured to serve marketing, product, and sales simultaneously.
Do not set break-inside: avoid on tables — large tables push entirely to the next page, creating huge white gaps. Let tables break between rows; protect individual rows with break-inside: avoid on tr.

Use this CSS pattern to keep headings with their charts:
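A sketch with illustrative class names — the report's actual selectors may differ:

```css
/* Keep-with-next chain: each element refuses to be separated from
   whatever follows it (class names are illustrative). */
.chart-heading,
.chart-description,
.chart-legend {
  break-inside: avoid;
  break-after: avoid;   /* stay with the next element */
}
.chart-container {
  break-inside: avoid;  /* never split the chart itself */
}
```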
This creates a chain: heading stays with description, description stays with legend/chart.
Landscape cross-tab print checklist:
- Wrap each cross-tab in <div class="crosstab-section"> and apply page: landscape to the wrapper, NOT to .crosstab alone (otherwise the title stays on the portrait page).
- table-layout: fixed; width: 100%, with font-size 9px for data and 8px for headers.
- white-space: normal.
- @page { margin: 16mm; }
- !important on all background colors, plus -webkit-print-color-adjust: exact !important on *.
- details.accordion { page-break-before: always; } with first-of-type excluded.
- display: block !important on details and body.

Deployment: run wrangler pages project create [name] --production-branch master before the deploy command.

Pipeline Outputs
All output files go into A6 Data Files - Simulated/ or B6 Data Files - Real/ for new studies. Segmentation outputs go into the A5/B5 folder. (The HR Leaders BambooHR study uses A3/A4/A5 — a pre-convention study that is not being reorganized.)
| File | Created by | Contents |
|---|---|---|
| study-context.json | Discovery 1.1 | Who was interviewed, communication patterns, dominant topics |
| questions-registry.csv | Discovery 1.2 | One row per question with coding type and code count |
| meaning-units-log.csv | Discovery 1.3 | All meaning units with exact quotes and descriptive labels |
| codebook-audit-trail.csv | Discovery 1.5 | Architect's non-obvious merge and split decisions with reasoning |
| codebook-landscape-analysis.txt | Discovery 1.5 | Architect's chain-of-thought landscape analysis |
| codebook.json | After human review | Final approved codebook with all code definitions, criteria, and examples |
| agent-registry.json | Discovery + Application | Full record of every agent used: model, temperature, persona hash, role, run date |
| agent-1-codes.json | Application 2.1 | Raw coding output from Agent 1 |
| agent-2-codes.json | Application 2.1 | Raw coding output from Agent 2 |
| application-coding-detail.json | Application 2.3 | Full per-agent detail with resolver notes for every contested decision |
| reliability.txt | Application 2.2 | Human-readable Kappa report |
| reliability-summary.json | Application 2.2 | Machine-readable Kappa data per code |
| flagged-items.json | Application 2.2 | Codes with Kappa below 0.65 and ambiguous definitions |
| coding-summary.md | Application end | Run summary with participant counts and quality metrics |
| master-participants.csv | Dataset 3.1 (grows) | One row per participant, all coded variables + roster, grows with factor scores and segment assignments |
| frequency-report.html | Dataset 3.2 | Sortable tables, bar charts, collapsible cross-tab sections by firmographic variable |
| frequency-report.csv | Dataset 3.2 | Machine-readable frequencies |
| segmentation-ready.csv | Segmentation 4.1 | Defining dimensions after variance filter (A5 folder) |
| segmentation-validation-report.txt | Segmentation 4.1 | Variance checks on all coded variables (A5 folder) |
| segmentation-assignments.csv | Segmentation 4.2 | Segment assignments with factor scores (A5 folder) |
| segmentation-pca-report.txt | Segmentation 4.2 | PCA statistics, silhouette scores, cluster profiles (A5 folder) |
Research Citations
Every major design decision in this pipeline traces to published research. These are the sources cited when explaining the methodology to clients or peer reviewers.