Use this skill whenever the user wants to work with survey data using the `survy` Python library. Triggers include: loading or reading survey CSV/Excel/JSON/SPSS files, handling multiselect (multi-choice) questions, computing frequency tables or crosstabs, exporting survey data to SPSS (.sav) or other formats, updating variable labels or value indices, transforming survey data between wide/compact formats, filtering respondents, replacing values, adding/dropping/sorting variables, or any task involving survy's API (read_csv, read_excel, read_json, read_polars, read_spss, crosstab, survey["Q1"], to_spss, to_csv, to_excel, to_json, etc.). Also trigger when the user says things like "analyze my survey", "process questionnaire data", "build a survey analysis script", or "help me with survy". Always read this skill before writing any survy code — it contains the correct API, patterns, and gotchas.
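npx skill4agent add hoanghaoha/survy survey-analysis

pip install --upgrade survy

Each `read_*` function returns a survey; index it by variable id, e.g. `survey["Q1"]`, to get a variable with the following attributes:

| Attribute | Type | Description |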
|---|---|---|
| `id` | | Column name (read/write via property) |
| `label` | | Human-readable label (read/write); defaults to |
| `vtype` | | One of |
| `value_indices` | | Answer code → numeric index mapping; always empty |
| `base` | | Count of non-null/non-empty responses |
| | | Total row count including nulls |
| | | Underlying Polars dtype |
| `frequencies` | | Frequency table (value, count, proportion) |
| | | SPSS syntax for this variable |
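
To make the relationship between the response count, the total row count, and the frequency table concrete, here is a minimal pure-Python sketch. It does not use survy and is not its implementation. The `summarize` helper is hypothetical, and computing the proportion over the non-empty responses (rather than over all rows) is an assumption.

```python
from collections import Counter


def summarize(responses):
    """Return (base, length, frequencies) for a list of single-choice answers.

    Hypothetical helper: proportions are computed over the base,
    which is an assumption, not documented survy behaviour.
    """
    length = len(responses)                        # all rows, including nulls
    answered = [r for r in responses if r not in (None, "")]
    base = len(answered)                           # non-null, non-empty rows
    counts = Counter(answered)
    frequencies = [
        (value, count, count / base if base else 0.0)
        for value, count in sorted(counts.items())
    ]
    return base, length, frequencies


base, length, frequencies = summarize(["Male", "Female", "Male", None, ""])
for value, count, proportion in frequencies:
    print(value, count, round(proportion, 3))
```

With the five rows above, two are empty, so the base is 3 while the total row count is 5.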
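`Survey`

Compact format: all selected choices of a multiselect question share one cell, joined by a separator (`;`):

id, gender, hobby
1, Male, Sport;Book
2, Female, Sport;Movie
3, Male, Movie

Here `hobby` holds cells such as `"Sport;Book"`.

Wide format: the question is spread over several columns with suffixes such as `_1`, `_2`:

id, gender, hobby_1, hobby_2, hobby_3
1, Male, Book, , Sport
2, Female, , Movie, Sport
3, Male, , Movie,

`hobby_1`, `hobby_2`, `hobby_3` are merged into a single `hobby` variable; `hobby` is `MULTISELECT`:

hobby: [["Book", "Sport"], ["Movie", "Sport"], ["Movie"]]

`name_pattern` tokens: `id`, `multi`. Separators: `_`, `.`, `:`.

`"id(_multi)?"`: `hobby_1` gives `id="hobby"`, `multi="1"`; `hobby_2` gives `id="hobby"`. Related: `hobby_1`, `gender`.
`"id.multi"`: `Q1.1`, `Q1.2`
`"id:multi"`: `Q1:a`, `Q1:b`

`my.var_1`, `parse_id`, `my.var_1`, `myvar_1`, `my@var_1`

`compact_ids`, `auto_detect=True`, `compact_separator`, `auto_detect=True`, `compact_ids`

`.sav`: `compact_ids`, `auto_detect`, `hobby_1`, `hobby_2`, `name_pattern`

`.sav`: `"Male"`, `"Female"`, `pyreadstat`

# Wide multiselect detected automatically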
survey = survy.read_spss("data.sav")
# Custom suffix convention (Q1.1, Q1.2, ...)
survey = survy.read_spss("data.sav", name_pattern="id.multi")

`compact_ids`, `auto_detect`, `read_spss`, `read_csv`, `read_excel`, `read_polars`, `read_json`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `compact_ids` | | | Column IDs to treat as compact multiselect |
| `compact_separator` | | | Separator used to split compact cells |
| `auto_detect` | | | Auto-detect compact columns by scanning for separator |
| `name_pattern` | | | Format template for wide column names. Tokens: `id`, `multi` |
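
The two layouts described above end up as the same per-respondent lists. The following pure-Python sketch illustrates that idea only; it is not survy's parser. The regular expression is a hypothetical stand-in for a pattern like `"id(_multi)?"`, and `parse_name`, `split_compact`, and `merge_wide` are illustrative names that do not exist in survy.

```python
import re

# Hypothetical stand-in for a wide-name pattern such as "id(_multi)?":
# an id, optionally followed by "_" and a suffix.
WIDE_NAME = re.compile(r"^(?P<id>.+?)(?:_(?P<multi>[^_]+))?$")


def parse_name(column):
    """Split a column name into (id, multi); multi is None without a suffix."""
    match = WIDE_NAME.match(column)
    return match.group("id"), match.group("multi")


def split_compact(cell, separator=";"):
    """Turn a compact cell like "Sport;Book" into a sorted list of choices."""
    return sorted(part.strip() for part in cell.split(separator) if part.strip())


def merge_wide(rows, columns):
    """Group suffixed columns by id and collect non-empty cells per row."""
    grouped = {}
    for column in columns:
        base_id, multi = parse_name(column)
        if multi is not None:
            grouped.setdefault(base_id, []).append(column)
    return {
        base_id: [sorted(row[c] for c in cols if row[c]) for row in rows]
        for base_id, cols in grouped.items()
    }


compact_result = [split_compact(c) for c in ["Sport;Book", "Sport;Movie", "Movie"]]

wide_rows = [
    {"gender": "Male", "hobby_1": "Book", "hobby_2": "", "hobby_3": "Sport"},
    {"gender": "Female", "hobby_1": "", "hobby_2": "Movie", "hobby_3": "Sport"},
    {"gender": "Male", "hobby_1": "", "hobby_2": "Movie", "hobby_3": ""},
]
wide_result = merge_wide(wide_rows, ["gender", "hobby_1", "hobby_2", "hobby_3"])

print(compact_result)
print(wide_result["hobby"])
```

Both inputs are the sample data from this document, and both produce the same list of lists for `hobby`. An id that itself contains the separator character would be ambiguous for a pattern like this one.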
import survy
# --- Compact format data ---
# Option A: you know which columns are compact
survey = survy.read_csv("data_compact.csv", compact_ids=["hobby"], compact_separator=";")
# Option B: let survy scan for the separator automatically
survey = survy.read_csv("data_compact.csv", auto_detect=True, compact_separator=";")
# --- Wide format data ---
# Wide detection is automatic via name_pattern (default works for Q1_1, Q1_2, ...)
survey = survy.read_csv("data_wide.csv")
# Custom name_pattern if your columns use a different suffix convention
survey = survy.read_csv("data_wide.csv", name_pattern="id(_multi)?")
# --- Mixed: some columns are wide, some are compact ---
survey = survy.read_csv("data_mixed.csv", name_pattern="id(_multi)?", auto_detect=True)
# Excel — identical API to read_csv
survey = survy.read_excel("data.xlsx", auto_detect=True, compact_separator=";")

{
"variables": [
{
"id": "gender",
"data": ["Male", "Female", "Male"],
"label": "Gender of respondent",
"value_indices": {"Female": 1, "Male": 2}
},
{
"id": "yob",
"data": [2000, 1999, 1998],
"label": "",
"value_indices": {}
},
{
"id": "hobby",
"data": [["Book", "Sport"], ["Movie", "Sport"], ["Movie"]],
"label": "Hobbies",
"value_indices": {"Book": 1, "Movie": 2, "Sport": 3}
}
]
}

Keys: `"variables"`, `"id"`, `"data"`, `"label"`, `"value_indices"`.

`"data"`, `"data"`, `"value_indices"`, `{}`, `"data"`, `"value_indices"`

`to_json()`, `"vtype"`: `"select"`, `"multi_select"`, `"number"`. `read_json()`, `"vtype"`

survey = survy.read_json("data.json")

`exclude_null`, `True`

import polars
import survy
df = polars.DataFrame({
"gender": ["Male", "Female", "Male"],
"yob": [2000, 1999, 1998],
"hobby": ["Sport;Book", "Sport;Movie", "Movie"],
"animal_1": ["Cat", "", "Cat"],
"animal_2": ["Dog", "Dog", ""],
})
survey = survy.read_polars(df, auto_detect=True)

survey.update([
{"id": "Q1", "label": "Satisfaction", "value_indices": {"good": 1, "bad": 2}},
{"id": "Q2", "label": "Channels used"},
])

`value_indices`

survey.add(some_variable)                         # Variable object
survey.add(polars.Series("new", [1, 2, 3]))      # auto-wrapped into Variable

`"Q1#1"`

survey.drop("Q3")   # silently ignored if not found

survey.sort()                                        # alphabetical by id (default)
survey.sort(key=lambda v: v.base, reverse=True)      # by response count desc

survey["gender"].replace({"Male": "M", "Female": "F"})

`value_indices`

v = survey["Q1"]
v.id = "satisfaction"
v.label = "Overall satisfaction"
v.value_indices = {"very_satisfied": 1, "satisfied": 2, "neutral": 3}

`DataStructureError`

filtered = survey.filter("hobby", ["Sport", "Book"])
filtered = survey.filter("gender", "Male")   # single value also works

df = survey.get_df(
select_dtype="text", # "text" | "number"
multiselect_dtype="compact", # "compact" | "text" | "number"
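)

`select_dtype`: `"text"` or `"number"` (`"number"` uses `value_indices`).

`multiselect_dtype`: `"compact"` (`List[str]`), `"text"` (wide columns `Q_1`, `Q_2`, with `null`), or `"number"` (wide columns with `1` / `0`).

`"text"`, `"number"`, `"compact"`, `"numeric"`, `"string"`

survey["Q1"].frequencies
# → Polars DataFrame: columns [variable_id, "count", "proportion"]

result = survy.crosstab(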
column=survey["gender"], # grouping variable (columns)
row=survey["hobby"], # analyzed variable (rows)
filter=None, # optional: segment by another variable
aggfunc="count", # "count" | "percent" | "mean" | "median" | "sum"
alpha=0.05, # significance level for stat tests
)
# Returns dict[str, polars.DataFrame]
# Key is "Total" when no filter, or each filter-value when filter is provided

`aggfunc`: `"count"`, `"percent"`, `"mean"`, `"median"`, `"sum"`

`name`, `compact`. Output files:
`{name}_data.csv`
`{name}_variables_info.csv`: `id`, `vtype`, `label`
`{name}_values_info.csv`: `id`, `text`, `index`

`compact`: `False` or `True`. `True` gives cells such as `"Book;Sport"`; `False` gives `hobby_1`, `hobby_2`, `hobby_3`.

# Default (compact=False): multiselect expanded to wide columns
survey.to_csv("output/", name="results")
# Compact mode — multiselect joined into single cells
survey.to_csv("output/", name="results", compact=True, compact_separator=";")
# Excel — identical API and output structure (.xlsx files instead of .csv)
survey.to_excel("output/", name="results")
survey.to_excel("output/", name="results", compact=True)

`{name}.sav`, `{name}.sps`, `pyreadstat`

survey.to_spss("output/", name="results")

`{name}.json`, `read_json`, `"vtype"`, `read_json`, `ensure_ascii=False`

survey.to_json("output/", name="results")

`"output/results.csv"`, `name=`

print(survey.sps)   # full syntax: VARIABLE LABELS, VALUE LABELS, MRSETS, CTABLES

`auto_detect`, `compact_ids`
`value_indices`, `DataStructureError`
`value_indices`, `update()`
`get_df()`, `filter()`, `None`
`read_csv`, `FileTypeError`, `.csv`, `read_excel`, `.xlsx`, `read_spss`, `.sav`
`to_csv`, `to_excel`, `compact=False`, `compact=True`
`_`, `.`, `:`, `my.var_1`, `parse_id`, `myvar_1`

| Task | Code |
|---|---|
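| Load CSV auto-detect | `survy.read_csv("data_compact.csv", auto_detect=True, compact_separator=";")` |
| Load CSV explicit compact | `survy.read_csv("data_compact.csv", compact_ids=["hobby"], compact_separator=";")` |
| Load CSV wide format | `survy.read_csv("data_wide.csv", name_pattern="id(_multi)?")` |
| Load SPSS | `survy.read_spss("data.sav")` |
| Load JSON | `survy.read_json("data.json")` |
| Load from Polars DF | `survy.read_polars(df, auto_detect=True)` |
| Inspect variable | |
| Frequencies | `survey["Q1"].frequencies` |
| Crosstab count | `survy.crosstab(column=survey["gender"], row=survey["hobby"], aggfunc="count")` |
| Crosstab percent | `survy.crosstab(column=survey["gender"], row=survey["hobby"], aggfunc="percent")` |
| Crosstab with filter | |
| Crosstab mean | `survy.crosstab(column=survey["gender"], row=survey["yob"], aggfunc="mean")` |
| Filter respondents | `survey.filter("gender", "Male")` |
| Replace values | `survey["gender"].replace({"Male": "M", "Female": "F"})` |
| Add variable | `survey.add(polars.Series("new", [1, 2, 3]))` |
| Drop variable | `survey.drop("Q3")` |
| Sort variables | `survey.sort()` |
| Batch update labels | `survey.update([{"id": "Q2", "label": "Channels used"}])` |
| Get compact DF | `survey.get_df(select_dtype="text", multiselect_dtype="compact")` |
| Get wide binary DF | `survey.get_df(select_dtype="number", multiselect_dtype="number")` |
| Export CSV | `survey.to_csv("output/", name="results")` |
| Export SPSS | `survey.to_spss("output/", name="results")` |
| Export JSON | `survey.to_json("output/", name="results")` |
| SPSS syntax string | `survey.sps` |
| Serialize variable | |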
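
For readers unsure what the crosstab count and percent rows compute when the row variable is multiselect, here is a rough pure-Python sketch of the idea. It is not `survy.crosstab`: it returns plain dicts rather than Polars DataFrames, runs no significance tests, and the helper names are hypothetical. Using the number of respondents in each column group as the percent denominator is an assumption.

```python
from collections import Counter, defaultdict


def crosstab_count(column, row):
    """Count, per column group, how many respondents picked each choice.

    column: one single-choice answer per respondent.
    row: one list of selected choices per respondent (multiselect).
    """
    table = defaultdict(lambda: defaultdict(int))
    for group, choices in zip(column, row):
        for choice in choices:
            table[choice][group] += 1
    return {choice: dict(cells) for choice, cells in table.items()}


def crosstab_percent(column, row):
    """Column percentages; the denominator is respondents per group (assumed)."""
    counts = crosstab_count(column, row)
    bases = Counter(column)
    return {
        choice: {group: 100 * n / bases[group] for group, n in cells.items()}
        for choice, cells in counts.items()
    }


gender = ["Male", "Female", "Male"]
hobby = [["Book", "Sport"], ["Movie", "Sport"], ["Movie"]]

counts = crosstab_count(gender, hobby)
percents = crosstab_percent(gender, hobby)
print(counts)
print(percents)
```

Because a respondent can select several choices, the percentages within one group can sum to more than 100.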
references/api_reference.md
scripts/validate_survey.py
scripts/batch_export.py
assets/sample_data.csv
assets/sample_data_compact.csv