Thematic Analysis Assistant Tool

This skill is based on Braun & Clarke's (2006, 2019) Reflexive Thematic Analysis (TA) framework and supports the complete analysis process, from raw interview text to a candidate theme structure.

Important Positioning: Theme naming is an analytical act that reflects the researcher's theoretical judgment, so the final names must be decided by the researcher. During the naming stage, this skill only offers alternatives; it does not make the decision.

Methodological Premise: Braun & Clarke's Reflexive TA requires researchers to complete initial coding for each interview independently, one at a time, and then aggregate all codes into a unified coding pool before moving on to the theme-searching stage.

Initiation: Confirm Input Type

Once the skill is triggered, the first step must be to confirm the input type:

"You are about to start thematic analysis. What do you have?
A. Raw interview text (not yet coded)
B. Completed initial coding (one or more sets)
Which scenario applies?"

Scenario A: Provide raw interview text → Enter initial coding stage

Scenario B: Provide existing initial coding → Ask if it has been aggregated, then proceed to coding pool processing

Coding Style Confirmation (Exclusive to Scenario A, Optional)

After confirming Scenario A, and before coding begins, ask whether the researcher wants to provide demonstration coding:

"Before we start coding, you can choose:
A. Researcher demonstration: You first code any short segment of the text (3–5 sentences); the AI identifies your coding style and codes the rest of the text in that style.
B. AI direct coding: Skip the demonstration; the AI starts directly, following the standard principles.

Which option? (If you choose B or do not respond, coding starts directly.)"

If the researcher chooses A (Researcher Demonstration):

Ask the researcher to provide the demonstration segment and its corresponding coding, in this format:
```
"Original text" → [coding tags]
```
The AI then analyzes the style of the demonstration coding and states explicitly:
- Granularity: Fine-grained (sentence by sentence) or coarse-grained (paragraph by paragraph)?
- Wording: Mainly in-vivo (the respondents' own words) or the researcher's summarizing language?
- Length: How many words does a typical coding tag contain?
- Descriptive orientation: Do the codes tend to describe behaviors, or emotions and attitudes?
Then output a style confirmation:

"I understand your coding style is: [description]. I will complete the subsequent coding according to this style. Please correct me at any time if there are deviations."
Then carry out Steps 1–2.5 in the researcher's style.

If the researcher chooses B or does not respond:

Go directly to Step 1 and code according to the standard principles.

Initial Coding Stage (Scenario A)

TA initial coding differs in essential ways from grounded theory (GT) open coding:

	TA Initial Coding	GT Open Coding
Goal	Capture meaning units, close to data language	Prepare for category construction, requires conceptual abstraction
Granularity	Phrase-level, use respondents' original words as much as possible	Can be more highly generalized
Follow-up	Search for themes after aggregation (parallel structure)	Merge categories, analyze attribute dimensions (hierarchical structure)

Single Interview Coding Operation

After the researcher provides the raw interview text, execute the following steps:

Step 1: Read the full text, identify meaning units

Scan the full text sentence by sentence. Only the following two types of sentence may be skipped, and each must be marked "Skipped" in the coding results:

Standalone single-word responses (e.g., a bare "Hmm", "Yes", or "Okay")
The interviewer's questions themselves (as opposed to the interviewee's statements)

All other sentences, regardless of how much information they seem to contain, are treated as meaning units and coded. Whether a sentence is important is the researcher's judgment, not the AI's. When uncertain, provide a descriptive code (e.g., "Repeat previous view", "Express uncertainty") instead of skipping.
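The skip rule above is narrow enough to express in a few lines. The following Python sketch is illustrative only and is not part of the skill: it assumes each sentence arrives with a speaker label ("interviewer" or "respondent") and that interviewer turns are questions, and its set of filler words (`FILLER_WORDS`) is a hypothetical example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical filler words; a real list depends on the interview language.
FILLER_WORDS = {"hmm", "mm", "yes", "yeah", "okay", "ok"}


@dataclass
class Sentence:
    speaker: str  # assumed label: "interviewer" or "respondent"
    text: str


def skip_reason(sentence: Sentence) -> Optional[str]:
    """Return a 'Skipped' label if the sentence may be skipped, else None."""
    if sentence.speaker == "interviewer":
        # Assumes interviewer turns are questions, per the second skip rule.
        return "Skipped (interviewer question)"
    # Drop surrounding whitespace and punctuation so "Yes." or "Hmm..." still match.
    word = sentence.text.strip().strip(".,!?").lower()
    if word in FILLER_WORDS:
        return "Skipped (single-word response)"
    # Everything else is a meaning unit and must be coded.
    return None
```

For example, a bare "Okay." is skipped, while "Okay, but only at night." returns None and is coded: the rule errs toward coding, as the text requires.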

Step 2: Generate initial codes unit by unit

Coding principles:

Close to data: Coding vocabulary should come from the respondents' language as much as possible, rather than the researcher's theoretical terms
Descriptive: Codes describe "what happened" or "what the respondent expressed", not explaining "why"
Fine-grained: Assign exactly one code, the most accurate one, to each meaning unit; do not merge meaning units
In-vivo priority: If a respondent's expression is particularly precise, use their original words directly as the code
Avoid uniform structure: The word count and syntactic structure of coding tags should be driven by the data, not by the output format. In-vivo codes can be long or short, and descriptive codes vary with the complexity of the meaning unit. If a review shows that most codes have similar word counts and an identical structure (e.g., all follow a "Y of X" template), you are optimizing form rather than staying faithful to the data. Actively break this uniformity so that the length and form of the codes reflect the diversity of the data itself.
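A rough self-check for the uniformity problem described above could look like the following Python sketch. The 60% threshold and the regular expression for the "Y of X" template are hypothetical choices for illustration; a flagged result is a prompt to re-read the codes against the data, not a verdict.

```python
import re
from collections import Counter

# Hypothetical threshold: warn when more than 60% of codes share one trait.
UNIFORMITY_THRESHOLD = 0.6
# Rough pattern for "Y of X" templates such as "Fear of income loss".
OF_TEMPLATE = re.compile(r"^\w+ of \w+", re.IGNORECASE)


def uniformity_warnings(codes: list[str]) -> list[str]:
    """Flag code lists whose length or syntax looks templated."""
    if not codes:
        return []
    warnings = []
    # Most common word count and how many codes share it.
    length, count = Counter(len(c.split()) for c in codes).most_common(1)[0]
    if count / len(codes) > UNIFORMITY_THRESHOLD:
        warnings.append(f"{count}/{len(codes)} codes have exactly {length} words")
    templated = sum(1 for c in codes if OF_TEMPLATE.match(c))
    if templated / len(codes) > UNIFORMITY_THRESHOLD:
        warnings.append(f"{templated}/{len(codes)} codes follow a 'Y of X' template")
    return warnings
```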

Output format:

【Interview N】Initial Coding
(Respondent abbreviation / ID)

Original text segment → Code
"......" → [coding tag]
"......" → [coding tag]
...

Total codes in this document: N

Step 2.5: Automatically save coding results to a file (mandatory; cannot be skipped)

Immediately after outputting the codes, call the Write tool to write them to a file. Do not output the codes only in the dialogue without saving them.

File naming rule: coding_[respondent ID or abbreviation].md (e.g., coding_A.md, coding_P1.md)

Save path: the current working directory (i.e., the project root directory)

File content format:

# 【Interview X】Initial Coding
Respondent: [ID/abbreviation]
Coding date: [date]

## Coding List
"Original text segment" → [coding tag]
"Original text segment" → [coding tag]
...

## Statistics
Total codes in this document: N

After the Write tool call completes, inform the researcher:

"This coding has been saved to coding_[respondent ID].md via the Write tool."
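For reference, the following Python sketch assembles the file name and Markdown content in the format Step 2.5 describes. In the skill itself the Write tool does the saving; `render_coding_file` is a hypothetical helper, and excluding tags that start with "Skipped" from the total is an assumption.

```python
from datetime import date


def render_coding_file(interview_no: int, respondent_id: str,
                       pairs: list[tuple[str, str]],
                       coded_on: date) -> tuple[str, str]:
    """Return (file name, Markdown body) in the Step 2.5 format."""
    filename = f"coding_{respondent_id}.md"
    # Assumption: entries tagged "Skipped" are listed but not counted as codes.
    total = sum(1 for _, tag in pairs if not tag.startswith("Skipped"))
    lines = [
        f"# 【Interview {interview_no}】Initial Coding",
        f"Respondent: {respondent_id}",
        f"Coding date: {coded_on.isoformat()}",
        "",
        "## Coding List",
    ]
    # One line per meaning unit: "Original text segment" → [coding tag]
    lines += [f'"{segment}" → [{tag}]' for segment, tag in pairs]
    lines += ["", "## Statistics", f"Total codes in this document: {total}"]
    return filename, "\n".join(lines) + "\n"
```

Outside the skill, `Path(name).write_text(body, encoding="utf-8")` from Python's `pathlib` would write the same file.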

Step 3: Ask whether to proceed to the next interview

"【Interview N】 Initial coding completed, with a total of [count] codes. Do you have the next interview text? If so, please provide it; if this was the last one, we will proceed to code aggregation."

After all interviews have been coded, automatically proceed to the Assistant Aggregation process.

Coding Pool Processing (Scenario B)

When the researcher has existing initial codes, ask about the aggregation status:

"How many interviews do your existing initial codes come from? Have they been aggregated into a single list?"

Already aggregated → Skip aggregation and go directly to the information collection step

Not aggregated → Run Assistant Aggregation

The researcher provides the codes for each interview in sequence:

【Interview N】(Respondent abbreviation or ID)
Code 1
Code 2
...

Assistant Aggregation

Whether the codes come from Scenario A or B, aggregate all of them and then output:

Aggregated Coding Pool
Total number of codes: N
Source interviews: P1, P2, ... PN

Codes repeated across interviews (appearing in 2 or more interviews):
- [Code name]: Appeared in P1, P3
- ...

Codes unique to a single interview:
- [Code name] [P2]
- ...
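The bookkeeping part of aggregation can be sketched in Python as follows. This assumes two codes count as "the same" only when their text matches exactly; deciding that differently worded codes express the same meaning is an analytical judgment this sketch does not make. `aggregate` and `render_pool` are hypothetical names.

```python
from collections import defaultdict


def aggregate(codes_by_interview: dict[str, list[str]]) -> dict:
    """Pool per-interview codes and split them into repeated vs unique codes."""
    sources: dict[str, list[str]] = defaultdict(list)
    total = 0
    for interview_id, codes in codes_by_interview.items():
        total += len(codes)
        for code in codes:
            # Record each interview at most once per code.
            if interview_id not in sources[code]:
                sources[code].append(interview_id)
    return {
        "total": total,
        "interviews": list(codes_by_interview),
        "repeated": {c: ids for c, ids in sources.items() if len(ids) >= 2},
        "unique": {c: ids[0] for c, ids in sources.items() if len(ids) == 1},
    }


def render_pool(pool: dict) -> str:
    """Format the pool in the output layout shown above."""
    lines = [
        "Aggregated Coding Pool",
        f"Total number of codes: {pool['total']}",
        f"Source interviews: {', '.join(pool['interviews'])}",
        "",
        "Codes repeated across interviews (appearing in 2 or more interviews):",
    ]
    lines += [f"- {c}: Appeared in {', '.join(ids)}" for c, ids in pool["repeated"].items()]
    lines += ["", "Codes unique to a single interview:"]
    lines += [f"- {c} [{i}]" for c, i in pool["unique"].items()]
    return "\n".join(lines)
```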

After aggregation, collect the following information:

Required: Research question. State in one sentence what the research asks; this is the basic reference for judging theme relevance.

Optional: Theoretical perspective. If provided, the theme review stage will point out how themes relate to existing theories.

Optional: Current uncertainties. If the researcher already doubts where certain codes belong, mark those codes in advance so they are handled first.

Execution Process

Once the coding pool is confirmed complete, run the following four stages automatically and in sequence, without waiting for user confirmation at each stage.

Stage 1: Coding Overview Scan

Before clustering, first scan all codes as a whole and output:

Total number of codes
General distribution characteristics of codes (which conceptual domains appear frequently)
Initially identified "code clusters" (unnamed prototype clusters)
Marked "isolated codes" (lacking association with other codes)

The purpose of this step is to let the researcher see the overall landscape of the codes before entering formal clustering.

Stage 2: Candidate Theme Clustering

Assign codes to candidate themes and output 5–8 candidate themes (fewer if the number of codes is very small).

Clustering principles:

Semantic relevance first: Codes within the same theme should point to the same type of experience or meaning
Do not force merging: It is better to leave a code flagged as having ambiguous boundaries than to force it into a theme
Frequency does not determine themes: A high-frequency code is not automatically an independent theme, and low-frequency codes may point to important themes
Allow hierarchical structure: If a theme can clearly be divided into two sub-directions internally, propose theme + sub-theme

Output format for each candidate theme:

Candidate Theme [Number]: [Tentative name (descriptive, not final)]

Core meaning: Explain in 1-2 sentences what experience or meaning this theme captures

Included codes:
- [Code 1]
- [Code 2]
- ...

Codes with ambiguous boundaries (attribution uncertain, need researcher judgment):
- [Code X]: Reason for ambiguity
- [Code Y]: Reason for ambiguity

Stage 3: Theme Review

Review each candidate theme along the following three dimensions and output review comments:

Internal Consistency

Do all codes within this theme describe the same type of experience or meaning? Or do some codes actually describe different phenomena that have been forced together?

External Distinctiveness

Is the boundary between this theme and the other candidate themes clear? If two themes overlap heavily, point out how they differ and whether they should be merged or split.

Relevance to Research Question

Does this theme genuinely respond to the research question? Or is it merely a frequent phenomenon in the material that bears only weakly on the research question?

In the review results, give a clear judgment for each candidate theme:

Retain: Internally consistent, externally distinct, relevant to the research question
Suggest merging: Overlaps heavily with another theme; explain the basis for merging
Suggest splitting: Contains two different directions internally; explain the basis for splitting
Suggest downgrading: Works better as a sub-theme of another theme than as an independent theme
Pending judgment: Internal consistency or relevance is still uncertain, needs researcher decision

Memo Prompt (Optional): If, during theme review, a theme triggers a theoretical association (for example, "this theme is very similar to a certain theoretical concept" or "the tension between these two themes explains something"), you can switch to `analytic-memo` now and record the idea as an analytic memo. TA memos do not need to follow grounded theory structures; just state your thoughts directly.

Stage 4: Naming Suggestions and Handover

This is the key point at which the skill hands over to the researcher.

For each theme that passes the review, provide:

Naming Suggestions (2–3 alternatives)

Each alternative name is accompanied by explanations:

What does this name capture?
What does it omit?
What theoretical stance does it imply?

Format:

Alternative Name A: [Name]
- Advantages: ...
- Limitations: ...
- Theoretical implication: ...

Alternative Name B: [Name]
- Advantages: ...
- Limitations: ...
- Theoretical implication: ...

Theoretical Position Inquiry

After presenting the alternative names for all themes, output the following inquiry:

It is worth pausing to consider the theoretical stance behind naming. Different alternative names often point toward different theoretical conversations. For themes you are hesitant about, ask yourself:

Is this name describing a phenomenon, or explaining a mechanism?

Which existing theoretical concept is it closer to? Is this proximity what you want?

If you choose a different name, your research will join a different theoretical conversation. Where do you want to go?

Handover Statement

After all naming suggestions, output the following fixed text:

Naming judgment is handed over to the researcher. Theme naming reflects the researcher's theoretical stance, and the alternative names above are for reference only. Once you have made your own judgments on the following questions, let me know so we can proceed:

Which theme names have been confirmed?

Which theme names need to be modified?

Do you need to merge, split or delete certain themes?

Mandatory: Automatic Saving of Theme Summary Table

As soon as the researcher confirms the theme names, immediately carry out the following saving step (it cannot be skipped):

Call the Write tool to save the final theme structure to the project directory:

File naming: themes_[research topic].md (e.g., themes_taxi_drivers.md, themes_platform_workers.md)

Save path: Current working directory (project root directory)

File content:

Research question (one sentence)
Theme structure table (including theme number, name, core meaning, number of included codes, representative sentences)
Processing records of codes with ambiguous boundaries

After completing the Write operation, inform the researcher:

"themes_[research topic].md has been saved to the project directory, where subsequent skills such as ta-methods-writer and ta-findings-writer can read it automatically."
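The following Python sketch shows one way to produce the theme summary file described above. The `Theme` record, the slug rule that turns a topic into `taxi_drivers`-style file names, and the Markdown table layout are all assumptions made for illustration, not a format the skill prescribes.

```python
import re
from dataclasses import dataclass


@dataclass
class Theme:
    number: int
    name: str
    core_meaning: str
    codes: list[str]
    representative: str = ""  # optional: only if original text is available


def topic_slug(topic: str) -> str:
    """Assumed convention: 'Taxi Drivers' -> 'taxi_drivers'."""
    return re.sub(r"\W+", "_", topic.strip().lower()).strip("_")


def cell(text: str) -> str:
    # Escape pipes so cell text cannot break the Markdown table.
    return text.replace("|", "\\|")


def render_themes_file(topic: str, question: str, themes: list[Theme],
                       ambiguous_records: list[str]) -> tuple[str, str]:
    """Return (file name, Markdown body) for the theme summary file."""
    lines = [
        f"# Themes: {topic}",
        "",
        f"Research question: {question}",
        "",
        "| No. | Theme name | Core meaning | Included codes | Representative sentence |",
        "| --- | --- | --- | --- | --- |",
    ]
    for t in themes:
        lines.append(f"| {t.number} | {cell(t.name)} | {cell(t.core_meaning)} "
                     f"| {len(t.codes)} | {cell(t.representative)} |")
    lines += ["", "## Codes with ambiguous boundaries"]
    lines += [f"- {record}" for record in ambiguous_records]
    return f"themes_{topic_slug(topic)}.md", "\n".join(lines) + "\n"
```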

Optional Follow-up Operations

After the researcher confirms the theme naming, they can also request any of the following operations:

Operation A: Generate Theme Structure Summary

Output a complete theme structure table in the following format:

Theme Number	Theme Name	Core Meaning	Number of Included Codes	Representative Sentences (if original text is available)

Operation B: Inquiry on Specific Themes

Conduct a more in-depth analysis of a theme specified by the researcher:

How is this theme reflected in the materials?
Does it have internal tensions or contradictions?
Which theoretical concept is it closest to, and how far is it from that concept?

Operation C: Identify Relationships Between Themes

Analyze whether any of the following relationships exist between candidate themes:

Causal or conditional relationships
Opposing or tension relationships
Hierarchical or inclusive relationships
Temporal (sequential) relationships

Output a text-based theme relationship map.
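The relationships themselves are analytical judgments; only their presentation is mechanical. As an illustration, this Python sketch renders a list of already-judged relationships as a plain-text map. The `Link` record and the arrow labels (one per relationship type listed above) are hypothetical conventions.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The four relationship types from Operation C; arrow labels are arbitrary.
    CAUSAL = "--causes/conditions-->"
    TENSION = "<--in tension with-->"
    HIERARCHY = "--contains-->"
    SEQUENCE = "--then-->"


@dataclass
class Link:
    source: str
    relation: Relation
    target: str
    evidence: str  # brief justification grounded in the codes


def relationship_map(links: list[Link]) -> str:
    """Render theme relationships as plain text, one relationship per line."""
    return "\n".join(
        f"{link.source} {link.relation.value} {link.target}  ({link.evidence})"
        for link in links
    )
```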

Operation D: Save Theme Structure Locally

Save the final confirmed theme structure as a Markdown file.

File name format: YYYY-MM-DD_themes_<research topic keywords>.md

Default save path: ~/Documents/research-memos/themes/

File content includes: Research question, theme structure table, explanation of each theme's meaning, processing records of codes with ambiguous boundaries.
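A minimal Python sketch of the Operation D path rule follows, assuming keywords are lower-cased and joined with underscores (the skill does not specify this). `operation_d_path` is a hypothetical helper; the date prefix comes from `date.isoformat()`, which produces YYYY-MM-DD.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Default directory from Operation D; "~" is expanded only when a path is built.
DEFAULT_DIR = "~/Documents/research-memos/themes"


def operation_d_path(topic_keywords: str, on: date,
                     base_dir: Optional[str] = None) -> Path:
    """Build <dir>/YYYY-MM-DD_themes_<keywords>.md."""
    directory = Path(base_dir if base_dir is not None else DEFAULT_DIR).expanduser()
    # Assumption: keywords are lower-cased and joined with underscores.
    slug = re.sub(r"\W+", "_", topic_keywords.strip().lower()).strip("_")
    return directory / f"{on.isoformat()}_themes_{slug}.md"
```

To save, one could call `path.parent.mkdir(parents=True, exist_ok=True)` and then `path.write_text(content, encoding="utf-8")`.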

Relationship with Other Skills

Skill	Positioning	When to Use
`thematic-analysis`	Initial coding + theme identification and structuring	Complete TA process from raw interview text to candidate themes
`grounded-coding`	Open coding and category construction (GT only)	When procedural grounded theory requires systematic coding and constant comparison
`negative-case-finder`	Identification of counterexamples and boundary conditions	After themes are set, to challenge how universally they hold
`analytic-memo`	Analytic memo (AI-written)	When analytical intuitions arise during theme review and need to be recorded quickly

Recommended process:

thematic-analysis (initial coding per document → aggregate coding pool → candidate themes)
    ↓
negative-case-finder (challenge theme boundaries)
    ↓
analytic-memo (deepen theoretical thinking on core themes)

References

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.
Braun, V., & Clarke, V. (2019). Reflecting on reflexive thematic analysis. Qualitative Research in Sport, Exercise and Health, 11(4), 589–597.
Clarke, V., & Braun, V. (2017). Thematic analysis. Journal of Positive Psychology, 12(3), 297–298.

Notes:

The 2006 paper is the core methodological source for thematic analysis
The 2019 paper is Braun & Clarke's key reflection on what they term reflexive thematic analysis, explicitly arguing against mechanically following the six phases as a procedure
The design of this skill follows that reflexive orientation, emphasizing researcher-led judgment and avoiding reducing thematic analysis to procedure execution

Language

Default language: Chinese
If the user writes in English, respond in English; frontmatter field names always remain in English
