Thematic Analysis Assistant Tool
This skill is based on Braun & Clarke's (2006, 2019) Reflexive Thematic Analysis framework, supporting the complete analysis process from raw interview text to candidate theme structure.
Important Positioning: Theme naming is an analytical act that reflects the researcher's theoretical judgment, and the final naming must be determined by the researcher.
This skill only offers alternatives during the naming stage; it does not make the decision.
Methodological Premise: Braun & Clarke's Reflexive TA requires researchers to independently complete initial coding for all interviews one by one,
then aggregate all codes into a unified coding pool before entering the theme searching stage.
Initiation: Confirm Input Type
After the skill is triggered, the first step is to confirm the input type:
"You are about to start thematic analysis. What do you have:
A. Raw interview text (not yet coded)
B. Completed initial coding (one or multiple sets)
Which scenario applies?"
Scenario A: Provide raw interview text → Enter initial coding stage
Scenario B: Provide existing initial coding → Ask if it has been aggregated, then proceed to coding pool processing
Coding Style Confirmation (Exclusive to Scenario A, Optional)
After confirming Scenario A and before coding begins, ask the researcher whether they want to provide a coding demonstration:
"Before officially starting coding, you can choose:
A. Researcher Demonstration: You first conduct demonstration coding on any small segment (3–5 sentences) of the text, and the AI will identify your coding style, then complete the subsequent coding according to your style
B. AI Direct Coding: Skip the demonstration, and the AI will start directly according to standard principles
Which option? (Choosing B, or giving no response, starts coding directly.)"
If the researcher chooses A (Researcher Demonstration):
- Ask the researcher to provide the demonstration segment and its corresponding coding (format: "Original text" → [coding tags])
- The AI analyzes the stylistic characteristics of the demonstration coding, clearly explaining:
- Granularity: Fine-grained (sentence-by-sentence) or coarse-grained (paragraph-by-paragraph)?
- Wording: Dominated by in-vivo (respondents' original words) or researchers' summarized language?
- Length: How many words are usually in coding tags?
- Descriptive orientation: Tend to describe behaviors, or emotions/attitudes?
- Output style confirmation:
"I understand your coding style is: [description]. I will complete the subsequent coding according to this style. Please correct me at any time if there are deviations."
- Continue to execute steps 1–2.5 according to the researcher's style
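The style analysis above could be partly approximated in code. The following is a minimal sketch, assuming each demonstration line follows the "Original text" → [coding tags] format; the names `DEMO_LINE`, `parse_demo`, and `describe_style` are hypothetical, and treating a code as in-vivo only when it appears verbatim in the original text is a simplifying assumption. Granularity and descriptive orientation still need human or AI judgment.

```python
import re
from statistics import mean

# Hypothetical pattern for one demonstration line: "Original text" → [coding tag]
DEMO_LINE = re.compile(r'^\s*"(?P<text>.+)"\s*→\s*\[(?P<code>.+)\]\s*$')

def parse_demo(lines):
    """Return (original_text, code) pairs for lines that match the format."""
    pairs = []
    for line in lines:
        m = DEMO_LINE.match(line)
        if m:
            pairs.append((m.group("text"), m.group("code")))
    return pairs

def describe_style(pairs):
    """Summarize two style features: code length and share of in-vivo codes."""
    lengths = [len(code.split()) for _, code in pairs]
    # Assumption: a code is in-vivo if it appears verbatim in the original text.
    in_vivo = [code.lower() in text.lower() for text, code in pairs]
    return {
        "codes": len(pairs),
        "mean_words_per_code": mean(lengths) if lengths else 0,
        "in_vivo_share": sum(in_vivo) / len(pairs) if pairs else 0.0,
    }

# Illustrative demonstration lines (invented for this sketch).
demo = [
    '"I just felt invisible at work" → [felt invisible]',
    '"Nobody told us about the new rules" → [not informed of rule changes]',
]
style = describe_style(parse_demo(demo))
```

The resulting dictionary can feed the "[description]" slot of the style confirmation message, which the researcher then corrects if needed.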
If the researcher chooses B or does not respond:
Directly enter step 1 and code according to standard principles.
Initial Coding Stage (Scenario A)
There are essential differences between TA's initial coding and grounded theory's open coding:
| | TA Initial Coding | GT Open Coding |
|---|---|---|
| Goal | Capture meaning units, close to data language | Prepare for category construction, requires conceptual abstraction |
| Granularity | Phrase-level, use respondents' original words as much as possible | Can be more highly generalized |
| Follow-up | Search for themes after aggregation (parallel structure) | Merge categories, analyze attribute dimensions (hierarchical structure) |
Single Interview Coding Operation
After the researcher provides the raw interview text, execute the following steps:
Step 1: Read the full text, identify meaning units
Scan the full text sentence by sentence. Only the following two types can be skipped, and must be marked as "Skipped" in the coding results:
- Pure single-word responses that form independent sentences (e.g., only "Hmm", "Yes", "Okay")
- The interviewer's question sentences themselves (not the interviewee's statements)
All other sentences, regardless of how much information they seem to contain, are treated as meaning units and coded. Whether a sentence is important is the researcher's judgment, not the AI's. When uncertain, provide a descriptive code (e.g., "Repeat previous view", "Express uncertainty") instead of skipping.
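The two skip rules are narrow enough to express as a small check. This is a sketch only: it assumes the transcript has already been split into (speaker, sentence) pairs with hypothetical speaker labels "interviewer" and "respondent", and it treats every interviewer turn as skippable, which is slightly broader than "question sentences".

```python
import re

def is_skippable(speaker, sentence):
    """Apply the two skip rules; every other sentence must be coded."""
    # Rule 2: interviewer turns are not the interviewee's statements.
    if speaker == "interviewer":
        return True
    # Rule 1: a sentence consisting of exactly one word ("Hmm", "Yes", "Okay").
    return len(re.findall(r"\w+", sentence)) == 1

def mark_units(turns):
    """Label each sentence as 'Skipped' or 'Code' without judging importance."""
    return [
        (sentence, "Skipped" if is_skippable(speaker, sentence) else "Code")
        for speaker, sentence in turns
    ]

# Illustrative turns (invented for this sketch).
turns = [
    ("interviewer", "How did the schedule change affect you?"),
    ("respondent", "Hmm."),
    ("respondent", "I guess it was fine, I don't know."),
]
marked = mark_units(turns)
```

Note that the third sentence is labeled for coding even though it seems to carry little information, in line with the rule that importance is the researcher's judgment.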
Step 2: Generate initial codes unit by unit
Coding principles:
- Close to data: Coding vocabulary should come from the respondents' language as much as possible, rather than the researcher's theoretical terms
- Descriptive: Codes describe "what happened" or "what the respondent expressed", not explaining "why"
- Fine-grained: Assign the single most accurate code to each meaning unit; do not merge units
- In-vivo priority: If a respondent's expression is particularly precise, directly use the original words as the code
- Avoid uniform structure: The number of words and syntactic structure of coding tags should be determined by the content of the data, not the output form. In-vivo codes can be long or short, and descriptive codes vary depending on the complexity of the meaning unit. If a review finds that most codes have similar word counts and identical structures (e.g., all are "Y of X" or "Z of X"), it means you are optimizing form rather than being faithful to the data. You must actively break this uniformity, allowing the length and form of codes to reflect the diversity of the data itself.
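The review for uniform structure can be given a rough numeric signal. The sketch below only looks at word counts, not syntactic patterns such as "Y of X"; the function name `uniformity_report` and the 0.6 threshold are assumptions chosen for illustration, not values from the method.

```python
from collections import Counter

def uniformity_report(codes, threshold=0.6):
    """Report how dominant the most common code length (in words) is."""
    if not codes:
        raise ValueError("no codes to review")
    counts = Counter(len(code.split()) for code in codes)
    modal_len, modal_n = counts.most_common(1)[0]
    share = modal_n / len(codes)
    return {
        "modal_word_count": modal_len,
        "share": share,
        # Assumption: more than `threshold` of codes sharing one length is suspicious.
        "too_uniform": share > threshold,
    }

# Illustrative codes (invented for this sketch).
codes = ["fear of failure", "loss of control", "lack of support", "felt invisible"]
report = uniformity_report(codes)
```

A flagged result is a prompt to reread the codes against the data, not proof that the coding is wrong.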
Output format:
【Interview N】Initial Coding
(Respondent abbreviation / ID)
Original text segment → Code
"......" → [coding tag]
"......" → [coding tag]
...
Total codes in this document: N
Step 2.5: Automatically save coding results to file (Mandatory, cannot be skipped)
After completing the coding output, the AI must immediately call the Write tool to write the codes to a file. Outputting the codes only in the dialogue, without writing them to a file, is not acceptable.
File naming rule:
coding_[respondent ID or abbreviation].md
(e.g.,
,
)
Save path: Current working directory (i.e., root directory of the project directory)
File content format:
# 【Interview X】Initial Coding
Respondent: [ID/abbreviation]
Coding date: [date]
## Coding List
"Original text segment" → [coding tag]
"Original text segment" → [coding tag]
...
## Statistics
Total codes in this document: N
After completing the Write tool call, inform the researcher:
"This coding has been saved to coding_[respondent ID].md via the Write tool."
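As an illustration of the file naming rule and content format, the sketch below builds the file name and Markdown body as strings. It does not call the Write tool; the function names and the ISO date format are assumptions made for this example.

```python
from datetime import date

def coding_filename(respondent_id):
    """File naming rule: coding_[respondent ID or abbreviation].md"""
    return f"coding_{respondent_id}.md"

def coding_file_content(interview_no, respondent_id, pairs, coding_date):
    """Build the Markdown body from (original_text, code) pairs."""
    lines = [
        f"# 【Interview {interview_no}】Initial Coding",
        f"Respondent: {respondent_id}",
        # Assumption: dates are written in ISO format (YYYY-MM-DD).
        f"Coding date: {coding_date.isoformat()}",
        "## Coding List",
    ]
    lines += [f'"{text}" → [{code}]' for text, code in pairs]
    lines += ["## Statistics", f"Total codes in this document: {len(pairs)}"]
    return "\n".join(lines) + "\n"

# Illustrative data (invented for this sketch).
pairs = [
    ("I just felt invisible at work", "felt invisible"),
    ("Nobody told us about the new rules", "not informed of rule changes"),
]
name = coding_filename("P1")
content = coding_file_content(1, "P1", pairs, date(2024, 5, 1))
```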
Step 3: Ask whether to proceed to the next interview
"【Interview N】Initial coding completed, total of N codes.
Do you have the next interview text? If yes, please provide it;
If this is the last one, we will proceed to code aggregation."
After completing coding for all interviews, automatically enter the assistant aggregation process.
Coding Pool Processing (Scenario B)
When the researcher has existing initial codes, ask about the aggregation status:
"How many interviews do your existing initial codes come from? Have they been aggregated into a single list?"
Already aggregated → Directly enter subsequent information collection
Not aggregated → Execute assistant aggregation
The researcher provides the codes for each interview in sequence:
【Interview N】(Respondent abbreviation or ID)
Code 1
Code 2
...
Assistant Aggregation
Whether the codes come from Scenario A or Scenario B, after aggregating all of them, output:
Aggregated Coding Pool
Total number of codes: N
Source interviews: P1, P2, ... PN
Codes repeated across interviews (appeared ≥2 times):
- [Code name]: Appeared in P1, P3
- ...
Codes unique to a single interview:
- [Code name] [P2]
- ...
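The aggregation summary above can be sketched as a simple grouping step. This example assumes codes are compared by exact string match and reads "appeared ≥2 times" as "appeared in at least two interviews"; the function name `aggregate` and the sample codes are hypothetical.

```python
from collections import defaultdict

def aggregate(codes_by_interview):
    """Split the pooled codes into cross-interview repeats and single-interview codes."""
    sources = defaultdict(list)
    for interview, codes in codes_by_interview.items():
        for code in codes:
            if interview not in sources[code]:
                sources[code].append(interview)
    total = sum(len(codes) for codes in codes_by_interview.values())
    repeated = {c: s for c, s in sources.items() if len(s) >= 2}
    unique = {c: s[0] for c, s in sources.items() if len(s) == 1}
    return total, repeated, unique

# Illustrative pool (invented for this sketch).
pool = {
    "P1": ["felt invisible", "long hours"],
    "P2": ["no say in schedule"],
    "P3": ["felt invisible"],
}
total, repeated, unique = aggregate(pool)
```

Exact matching will miss near-duplicates with different wording, so the repeated list is a lower bound; per the clustering principles, frequency here is context for the researcher and does not decide themes.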
After completing aggregation, collect the following necessary information:
Required: Research question. State in one sentence what the research is asking; this is the basic reference for judging theme relevance.
Optional: Theoretical perspective. If provided, the theme review stage will point out how the themes relate to that theory.
Optional: Current confusion. If there are already doubts about where certain codes belong, mark them in advance so they are handled first.
Execution Process
After confirming the coding pool is complete, execute the following four stages automatically and continuously, without waiting for user confirmation at each step.
Stage 1: Coding Overview Scan
Before entering clustering, first conduct an overall scan of all codes, outputting:
- Total number of codes
- General distribution characteristics of codes (which conceptual domains appear frequently)
- Initially identified "code clusters" (unnamed prototype clusters)
- Marked "isolated codes" (lacking association with other codes)
The purpose of this step is to let the researcher see the overall landscape of the codes before entering formal clustering.
Stage 2: Candidate Theme Clustering
Assign codes to candidate themes and output 5–8 candidate themes (fewer if the coding pool is very small).
Clustering principles:
- Semantic relevance first: Codes within the same theme should point to the same type of experience or meaning
- Do not force merging: It is better to leave "codes with ambiguous boundaries" than to force classification
- Frequency does not determine themes: High-frequency codes do not equal independent themes, low-frequency codes may be important themes
- Allow hierarchical structure: If a theme can clearly be divided into two sub-directions internally, propose theme + sub-theme
Output format for each candidate theme:
Candidate Theme [Number]: [Tentative name (descriptive, not final)]
Core meaning: Explain in 1-2 sentences what experience or meaning this theme captures
Included codes:
- [Code 1]
- [Code 2]
- ...
Codes with ambiguous boundaries (attribution uncertain, need researcher judgment):
- [Code X]: Reason for ambiguity
- [Code Y]: Reason for ambiguity
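One way to hold a candidate theme in this output format is a small record type. The sketch below is an assumption about representation, not part of the method: `CandidateTheme` and `unassigned` are hypothetical names, and `unassigned` simply lists pool codes that were neither included in a theme nor flagged as ambiguous, so nothing is silently dropped or forced into a theme.

```python
from dataclasses import dataclass, field

@dataclass
class CandidateTheme:
    number: int
    tentative_name: str   # descriptive, not final
    core_meaning: str
    codes: list = field(default_factory=list)
    ambiguous: dict = field(default_factory=dict)  # code -> reason for ambiguity

    def render(self):
        """Render the theme in the output format described above."""
        out = [
            f"Candidate Theme {self.number}: {self.tentative_name}",
            f"Core meaning: {self.core_meaning}",
            "Included codes:",
        ]
        out += [f"- {code}" for code in self.codes]
        if self.ambiguous:
            out.append("Codes with ambiguous boundaries (attribution uncertain, need researcher judgment):")
            out += [f"- {code}: {reason}" for code, reason in self.ambiguous.items()]
        return "\n".join(out)

def unassigned(pool, themes):
    """Codes in the pool that no theme includes or flags as ambiguous."""
    placed = set()
    for theme in themes:
        placed.update(theme.codes)
        placed.update(theme.ambiguous)
    return [code for code in pool if code not in placed]

# Illustrative theme (invented for this sketch).
theme = CandidateTheme(
    1, "Being overlooked", "Experiences of not being seen or informed at work.",
    codes=["felt invisible", "not informed of rule changes"],
    ambiguous={"long hours": "may belong to a workload theme instead"},
)
left_over = unassigned(
    ["felt invisible", "long hours", "no say in schedule"], [theme]
)
```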
Stage 3: Theme Review
Review each candidate theme from the following three dimensions, outputting review opinions:
Internal Consistency
Do all codes within this theme describe the same type of experience or meaning?
Or are some codes actually describing different phenomena that were forced together?
External Distinctiveness
Is the boundary between this theme and other candidate themes clear?
If two themes are highly overlapping, point out their differences and whether they should be merged or split.
Relevance to Research Question
Does this theme truly respond to the research question?
Or is it just a high-frequency phenomenon in the materials but has a weak relationship with the research question?
In the review results, give a clear judgment for each candidate theme:
- Retain: Internally consistent, externally distinct, relevant to the research question
- Suggest merging: Highly overlapping with a certain theme, explain the basis for merging
- Suggest splitting: Contains two different directions internally, explain the basis for splitting
- Suggest downgrading: Can be a sub-theme of a certain theme rather than an independent theme
- Pending judgment: Internal consistency or relevance is still uncertain, needs researcher decision
Memo Prompt (Optional): During the theme review process, if a theme triggers a theoretical association—such as thinking "this theme is very similar to a certain theoretical concept" or "the tension between these two themes explains something"—you can switch to
now and write this idea as an analytic memo. TA memos do not need to follow grounded theory structures; just state your thoughts directly.
Stage 4: Naming Suggestions and Handover
This is the key node where the skill hands over to the researcher.
For each theme that passes the review, provide:
Naming Suggestions (2–3 alternatives)
Each alternative name is accompanied by explanations:
- What does this name capture?
- What does it omit?
- What theoretical stance does it imply?
Format:
Alternative Name A: [Name]
- Advantages: ...
- Limitations: ...
- Theoretical implication: ...
Alternative Name B: [Name]
- Advantages: ...
- Limitations: ...
- Theoretical implication: ...
Theoretical Position Inquiry
After presenting the alternative names for all themes, output the following inquiry:
It is worth pausing to consider the theoretical stance behind naming.
Different alternative names often imply different theoretical dialogue directions—
For themes that make you hesitate, you can ask yourself:
- Is this name describing a phenomenon, or explaining a mechanism?
- Which existing theoretical concept is it closer to? Is this proximity what you want?
- If you choose a different name, your research will join a different theoretical conversation—where do you want to go?
Handover Statement
After all naming suggestions, output the following fixed text:
Naming judgment is handed over to the researcher.
Theme naming reflects the researcher's theoretical stance, and the above alternative names are for reference only.
Once you have made your own judgments on the following questions, let me know so we can proceed to subsequent work:
- Which theme names have been confirmed?
- Which theme names need to be modified?
- Do you need to merge, split or delete certain themes?
Mandatory: Automatic Saving of Theme Summary Table
After the researcher confirms the theme naming, the AI must immediately execute the following saving steps (cannot be skipped):
Call the Write tool to save the final theme structure to the project directory:
File naming:
themes_[research topic].md
(e.g., themes_platform_workers.md)
Save path: Current working directory (project root directory)
File content:
- Research question (one sentence)
- Theme structure table (including theme number, name, core meaning, number of included codes, representative sentences)
- Processing records of codes with ambiguous boundaries
After completing the Write operation, inform the researcher:
"
themes_[research topic].md
has been saved to the project directory, for automatic reading by subsequent skills such as
and
."
Optional Follow-up Operations
After the researcher confirms the theme naming, they can also request any of the following operations:
Operation A: Generate Theme Structure Summary
Output a complete theme structure table in the following format:
| Theme Number | Theme Name | Core Meaning | Number of Included Codes | Representative Sentences (if original text is available) |
|---|---|---|---|---|
Operation B: Inquiry on Specific Themes
Conduct a more in-depth analysis of a theme specified by the researcher:
- How is this theme reflected in the materials?
- Does it have internal tensions or contradictions?
- Which theoretical concept is it closest to? What is the distance?
Operation C: Identify Relationships Between Themes
Analyze whether there are the following relationships between candidate themes:
- Causal or conditional relationships
- Opposing or tension relationships
- Hierarchical or inclusive relationships
- Time sequence relationships
Output a text-based theme relationship map.
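A text-based map can be as simple as one line per relationship. The sketch below is one possible rendering under assumptions: the arrow notation, the `render_map` function, and the four short relation labels (standing in for the four relationship types listed above) are all hypothetical choices.

```python
# Short labels standing in for the four relationship types (assumed names).
RELATION_TYPES = {"causal", "tension", "hierarchical", "temporal"}

def render_map(edges):
    """Render (source_theme, relation, target_theme) triples as text lines."""
    lines = []
    for source, relation, target in edges:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        lines.append(f"[{source}] --{relation}--> [{target}]")
    return "\n".join(lines)

# Illustrative relationships (invented for this sketch).
theme_map = render_map([
    ("Being overlooked", "causal", "Withdrawing effort"),
    ("Valuing flexibility", "tension", "Wanting stable income"),
])
```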
Operation D: Save Theme Structure Locally
Save the final confirmed theme structure as a Markdown file.
File name format:
YYYY-MM-DD_themes_<research topic keywords>.md
Default save path:
~/Documents/research-memos/themes/
File content includes: Research question, theme structure table, explanation of each theme's meaning, processing records of codes with ambiguous boundaries.
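The file name format and default path for Operation D can be assembled as follows. This is a sketch: the function name `themes_path` and the rule for turning topic keywords into a file-safe slug (lowercase, non-word characters replaced by underscores) are assumptions, since the format above does not specify how keywords are normalized.

```python
import re
from datetime import date
from pathlib import Path

def themes_path(topic, today, base="~/Documents/research-memos/themes/"):
    """Build <base>/YYYY-MM-DD_themes_<research topic keywords>.md"""
    # Assumption: keywords are lowercased and joined with underscores.
    slug = re.sub(r"[^\w]+", "_", topic.strip().lower()).strip("_")
    return Path(base).expanduser() / f"{today.isoformat()}_themes_{slug}.md"

# Illustrative call with a fixed date so the result is reproducible.
target = themes_path("Platform workers", date(2024, 5, 1))
```

Passing the date in explicitly, rather than calling `date.today()` inside the function, keeps the naming logic easy to check.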
Relationship with Other Skills
| Skill | Positioning | When to Use |
|---|---|---|
| | Initial coding + theme identification and structuring | Complete TA process from raw interview text to candidate themes |
| | Open coding and category construction (exclusive to GT) | Requires systematic coding and constant comparison for procedural grounded theory |
| | Identification of counterexamples and boundary conditions | After themes are determined, challenge the universality of themes |
| | Analytic memo (AI-written) | Generate analytical intuitions during theme review, need to record quickly |
Recommended process:
thematic-analysis (initial coding per document → aggregate coding pool → candidate themes)
↓
negative-case-finder (challenge theme boundaries)
↓
analytic-memo (deepen theoretical thinking on core themes)
References
- Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.
- Braun, V., & Clarke, V. (2019). Reflecting on reflexive thematic analysis. Qualitative Research in Sport, Exercise and Health, 11(4), 589–597.
- Clarke, V., & Braun, V. (2017). Thematic analysis. Journal of Positive Psychology, 12(3), 297–298.
Notes:
- The 2006 paper is the core methodological source for thematic analysis
- The 2019 paper is an important revision by Braun & Clarke on "Reflexive Thematic Analysis", explicitly opposing mechanical six-step execution
- The design of this skill is guided by the reflexive orientation, emphasizing researcher-led judgment and avoiding turning thematic analysis into process execution
Language
- Default language: Chinese
- If the user writes in English, respond in English; frontmatter field names remain in English