data-analyst
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseData Analyst
数据分析师
Purpose
目标
Provides business intelligence and data analysis expertise specializing in SQL, dashboard design, and metric-driven insights. Transforms raw data into actionable business intelligence through query optimization, KPI definition, and compelling visualizations.
提供专注于SQL、仪表盘设计和指标驱动型洞察的商业智能与数据分析专业能力。通过查询优化、KPI定义和有说服力的可视化,将原始数据转化为可落地的商业智能。
When to Use
适用场景
- Creating or optimizing dashboards (Tableau, Power BI, Looker, Superset)
- Writing complex SQL queries for data extraction and analysis
- Defining and standardizing business KPIs (Churn, ARR, MAU, Conversion)
- Performing ad-hoc analysis to answer specific business questions
- Analyzing user behavior (Cohorts, Funnels, Retention)
- Automating reporting workflows
- 创建或优化仪表盘(Tableau、Power BI、Looker、Superset)
- 编写用于数据提取和分析的复杂SQL查询
- 定义并标准化业务KPI(Churn、ARR、MAU、Conversion)
- 执行临时分析以解答特定业务问题
- 分析用户行为(用户群组Cohorts、转化漏斗Funnels、用户留存Retention)
- 自动化报告工作流
Core Capabilities
核心能力
Business Intelligence
商业智能
- Designing and building interactive dashboards in BI tools
- Creating automated reporting pipelines and data refresh schedules
- Implementing self-service analytics capabilities for business users
- Developing KPI frameworks and metric definitions
- 在BI工具中设计并构建交互式仪表盘
- 创建自动化报告流程和数据刷新计划
- 为业务用户实现自助分析能力
- 开发KPI框架与指标定义
SQL and Data Extraction
SQL与数据提取
- Writing complex queries with window functions, CTEs, and advanced joins
- Optimizing query performance for large datasets
- Creating reusable views and materialized tables
- Implementing data extraction from multiple data sources
- 编写包含窗口函数、CTE和高级关联的复杂查询
- 针对大型数据集优化查询性能
- 创建可复用的视图和物化表
- 实现多数据源的数据提取
Data Visualization
数据可视化
- Selecting appropriate chart types for different data stories
- Designing clear, intuitive dashboard layouts
- Implementing color schemes and visual hierarchies
- Creating interactive visualizations for exploration
- 为不同的数据叙事选择合适的图表类型
- 设计清晰、直观的仪表盘布局
- 实现配色方案与视觉层级
- 创建用于探索的交互式可视化
Business Insights
业务洞察
- Translating data findings into actionable business recommendations
- Conducting cohort analysis, funnel analysis, and retention analysis
- Performing trend analysis and forecasting
- Communicating findings to non-technical stakeholders
- 将数据发现转化为可落地的业务建议
- 执行群组分析、漏斗分析和留存分析
- 执行趋势分析与预测
- 向非技术利益相关者传达分析结果
3. Core Workflows
3. 核心工作流
Workflow 1: Dashboard Design & Implementation
工作流1:仪表盘设计与实现
Goal: Create a "Sales Performance" dashboard for the executive team.
Steps:
-
Requirements Gathering
- Audience: VP of Sales, Regional Managers.
- Questions to Answer: "Are we hitting target?", "Which region is lagging?", "Who are top reps?"
- Key Metrics: Total Revenue, % to Quota, YoY Growth, Pipeline Coverage.
-
Data Preparation (SQL)sql
WITH sales_data AS ( SELECT r.region_name, s.sales_rep_name, DATE_TRUNC('month', o.order_date) as sales_month, SUM(o.amount) as revenue, COUNT(DISTINCT o.order_id) as deal_count FROM orders o JOIN sales_reps s ON o.rep_id = s.id JOIN regions r ON s.region_id = r.id WHERE o.status = 'closed_won' AND o.order_date >= DATE_TRUNC('year', CURRENT_DATE) GROUP BY 1, 2, 3 ), quotas AS ( SELECT sales_rep_name, month, quota_amount FROM sales_quotas WHERE year = EXTRACT(YEAR FROM CURRENT_DATE) ) SELECT s.*, q.quota_amount, (s.revenue / NULLIF(q.quota_amount, 0)) as attainment_pct FROM sales_data s LEFT JOIN quotas q ON s.sales_rep_name = q.sales_rep_name AND s.sales_month = q.month; -
Visualization Design (Conceptual)
- Top Level (KPI Cards): Total Revenue vs Target, YoY Growth %.
- Trend (Line Chart): Monthly Revenue vs Quota trend line.
- Breakdown (Bar Chart): Attainment % by Region (Sorted desc).
- Detail (Table): Top 10 Sales Reps (Revenue, Deal Count, Win Rate).
-
Implementation & Interactivity
- Add "Region" and "Date Range" filters.
- Set up drill-through from Region bar chart to Rep detail list.
- Add tooltips showing MoM change.
-
Quality Check
- Validate numbers against source system (CRM).
- Check performance (load time < 5s).
- Verify filter interactions.
目标: 为高管团队创建一个“销售业绩”仪表盘。
步骤:
-
需求收集
- 受众: 销售副总裁、区域经理。
- 待解答问题: “我们是否达成目标?”、“哪个区域表现落后?”、“顶级销售代表是谁?”
- 核心指标: 总营收、完成配额占比、同比增长率、销售管道覆盖率。
-
数据准备(SQL)sql
WITH sales_data AS ( SELECT r.region_name, s.sales_rep_name, DATE_TRUNC('month', o.order_date) as sales_month, SUM(o.amount) as revenue, COUNT(DISTINCT o.order_id) as deal_count FROM orders o JOIN sales_reps s ON o.rep_id = s.id JOIN regions r ON s.region_id = r.id WHERE o.status = 'closed_won' AND o.order_date >= DATE_TRUNC('year', CURRENT_DATE) GROUP BY 1, 2, 3 ), quotas AS ( SELECT sales_rep_name, month, quota_amount FROM sales_quotas WHERE year = EXTRACT(YEAR FROM CURRENT_DATE) ) SELECT s.*, q.quota_amount, (s.revenue / NULLIF(q.quota_amount, 0)) as attainment_pct FROM sales_data s LEFT JOIN quotas q ON s.sales_rep_name = q.sales_rep_name AND s.sales_month = q.month; -
可视化设计(概念性)
- 顶层(KPI卡片): 总营收vs目标、同比增长率。
- 趋势(折线图): 月度营收vs配额趋势线。
- 细分(柱状图): 各区域完成配额占比(降序排列)。
- 详情(表格): 前10名销售代表(营收、成交数、赢单率)。
-
实现与交互性
- 添加“区域”和“日期范围”筛选器。
- 设置从区域柱状图钻取到销售代表详情列表的功能。
- 添加显示月环比变化的提示框。
-
质量检查
- 对照源系统(CRM)验证数据准确性。
- 检查性能(加载时间<5秒)。
- 验证筛选器交互逻辑。
Workflow 3: Funnel Analysis (Conversion)
工作流3:漏斗分析(转化)
Goal: Identify bottlenecks in the signup flow.
Steps:
-
Define Steps
- Landing Page View
- Signup Button Click
- Form Submit
- Email Confirmation
-
SQL Analysissql
SELECT COUNT(DISTINCT CASE WHEN step = 'landing_view' THEN user_session_id END) as step_1_landing, COUNT(DISTINCT CASE WHEN step = 'signup_click' THEN user_session_id END) as step_2_click, COUNT(DISTINCT CASE WHEN step = 'form_submit' THEN user_session_id END) as step_3_submit, COUNT(DISTINCT CASE WHEN step = 'email_confirm' THEN user_session_id END) as step_4_confirm FROM web_events WHERE event_date >= DATEADD('day', -30, CURRENT_DATE); -
Calculate Conversion Rates
- Step 1 to 2: (Step 2 / Step 1) * 100
- Step 2 to 3: (Step 3 / Step 2) * 100
- Step 3 to 4: (Step 4 / Step 3) * 100
- Overall: (Step 4 / Step 1) * 100
-
Insight Generation
- "Drop-off from Click to Submit is 60%. This is high. Potential form friction or validation errors."
- Recommendation: "Simplify form fields or add social login."
目标: 识别注册流程中的瓶颈。
步骤:
-
定义步骤
- 落地页浏览
- 注册按钮点击
- 表单提交
- 邮件确认
-
SQL分析sql
SELECT COUNT(DISTINCT CASE WHEN step = 'landing_view' THEN user_session_id END) as step_1_landing, COUNT(DISTINCT CASE WHEN step = 'signup_click' THEN user_session_id END) as step_2_click, COUNT(DISTINCT CASE WHEN step = 'form_submit' THEN user_session_id END) as step_3_submit, COUNT(DISTINCT CASE WHEN step = 'email_confirm' THEN user_session_id END) as step_4_confirm FROM web_events WHERE event_date >= DATEADD('day', -30, CURRENT_DATE); -
计算转化率
- 步骤1到2:(步骤2 / 步骤1) * 100
- 步骤2到3:(步骤3 / 步骤2) * 100
- 步骤3到4:(步骤4 / 步骤3) * 100
- 整体转化率:(步骤4 / 步骤1) * 100
-
洞察生成
- “从点击到提交的流失率为60%,占比很高。可能存在表单填写阻力或验证错误。”
- 建议: “简化表单字段或添加社交登录选项。”
Workflow 5: Embedded Analytics (Product Integration)
工作流5:嵌入式分析(产品集成)
Goal: Embed a "Customer Usage" dashboard inside your SaaS product for users to see.
Steps:
-
Dashboard Creation (Parameterized)
- Create dashboard in BI tool (e.g., Looker/Superset).
- Add a global parameter .
customer_id - Filter all charts: .
WHERE organization_id = {{ customer_id }}
-
Security (Row Level Security)
- Ensure cannot be changed by the client.
customer_id - Use Signed URLs (JWT) generated by backend.
- Ensure
-
Frontend Integration (React)javascript
import { EmbedDashboard } from '@superset-ui/embedded-sdk'; useEffect(() => { EmbedDashboard({ id: "dashboard_uuid", supersetDomain: "https://superset.mycompany.com", mountPoint: document.getElementById("dashboard-container"), fetchGuestToken: () => fetchGuestTokenFromBackend(), dashboardUiConfig: { hideTitle: true, hideTab: true } }); }, []); -
Performance Tuning
- Enable caching on the BI server (5-15 min TTL).
- Use pre-aggregated tables for the underlying data.
目标: 在你的SaaS产品中嵌入“客户使用情况”仪表盘,供用户查看。
步骤:
-
创建参数化仪表盘
- 在BI工具(如Looker/Superset)中创建仪表盘。
- 添加全局参数。
customer_id - 筛选所有图表:。
WHERE organization_id = {{ customer_id }}
-
安全(行级安全)
- 确保客户端无法修改。
customer_id - 使用后端生成的签名URL(JWT)。
- 确保客户端无法修改
-
前端集成(React)javascript
import { EmbedDashboard } from '@superset-ui/embedded-sdk'; useEffect(() => { EmbedDashboard({ id: "dashboard_uuid", supersetDomain: "https://superset.mycompany.com", mountPoint: document.getElementById("dashboard-container"), fetchGuestToken: () => fetchGuestTokenFromBackend(), dashboardUiConfig: { hideTitle: true, hideTab: true } }); }, []); -
性能调优
- 在BI服务器上启用缓存(5-15分钟TTL)。
- 对底层数据使用预聚合表。
5. Anti-Patterns & Gotchas
5. 反模式与注意事项
❌ Anti-Pattern 1: Pie Chart Overuse
❌ 反模式1:过度使用饼图
What it looks like:
- Using a pie chart for 15 different categories.
- Using a pie chart to compare similar values (e.g., 49% vs 51%).
Why it fails:
- Human brain struggles to compare angles/areas accurately.
- Small slices become unreadable.
- Impossible to see trends.
Correct approach:
- Use Bar Charts for comparison.
- Limit Pie/Donut charts to 2-4 distinct categories (e.g., Mobile vs Desktop) where "Part-to-Whole" is the only message.
表现:
- 为15个不同类别使用饼图。
- 使用饼图对比相近数值(如49% vs 51%)。
问题所在:
- 人类大脑难以准确比较角度/面积。
- 小切片难以辨认。
- 无法展示趋势。
正确做法:
- 使用柱状图进行对比。
- 饼图/环形图仅用于2-4个不同类别(如移动端vs桌面端),且仅当“部分与整体”是唯一要传达的信息时使用。
❌ Anti-Pattern 2: Complex Logic in BI Tool
❌ 反模式2:在BI工具中处理复杂逻辑
What it looks like:
- Creating 50+ calculated fields in Tableau/Power BI with complex and string manipulation logic.
IF/ELSE - Doing joins and aggregations inside the BI tool layer instead of SQL.
Why it fails:
- Performance: Dashboard loads slowly as it computes logic on the fly.
- Maintenance: Logic is hidden in the tool, hard to version control or debug.
- Reusability: Other tools/analysts can't reuse the logic.
Correct approach:
- Push logic upstream to the database/SQL layer.
- Create a clean View or Table () that has all calculated fields pre-computed.
mart_sales - BI tool should just visualize the data, not transform it.
表现:
- 在Tableau/Power BI中创建50+个包含复杂和字符串处理逻辑的计算字段。
IF/ELSE - 在BI工具层进行关联和聚合,而非在SQL中处理。
问题所在:
- 性能: 仪表盘加载缓慢,因为需要实时计算逻辑。
- 维护: 逻辑隐藏在工具中,难以版本控制或调试。
- 复用性: 其他工具/分析师无法复用该逻辑。
正确做法:
- 将逻辑上移到数据库/SQL层。
- 创建包含所有预计算字段的清晰视图或表()。
mart_sales - BI工具应仅负责可视化数据,而非转换数据。
❌ Anti-Pattern 3: Inconsistent Metric Definitions
❌ 反模式3:指标定义不一致
What it looks like:
- Marketing defines "Lead" as "Email capture".
- Sales defines "Lead" as "Phone call qualification".
- Dashboard shows conflicting numbers.
Why it fails:
- Loss of trust in data.
- Time wasted reconciling numbers.
Correct approach:
- Data Dictionary: Document definitions explicitly.
- Certified Datasets: Use a governed layer (e.g., Looker Explores, dbt Models) where the metric is defined once in code.
表现:
- 市场部将“线索”定义为“邮箱捕获”。
- 销售部将“线索”定义为“电话沟通合格”。
- 仪表盘显示冲突的数据。
问题所在:
- 失去对数据的信任。
- 浪费时间调和数据差异。
正确做法:
- 数据字典: 明确记录指标定义。
- 认证数据集: 使用受管控的层(如Looker Explores、dbt Models),在代码中统一定义指标。
7. Quality Checklist
7. 质量检查清单
Visual Design:
- Title & Description: Every chart has a clear title and subtitle explaining what it shows.
- Context: Numbers include context (e.g., "% growth vs last month", "vs Target").
- Color: Color is used intentionally (e.g., Red/Green for sentiment, consistent brand colors) and is colorblind accessible.
- Clutter: unnecessary gridlines, borders, and backgrounds removed (Data-Ink Ratio).
Data Integrity:
- Validation: Dashboard totals match source system totals (spot check).
- Null Handling: values handled explicitly (filtered or labeled "Unknown").
NULL - Filters: Date filters work correctly across all charts.
- Duplicates: Join logic checked for fan-outs (duplicates).
Performance:
- Load Time: Dashboard loads in < 5 seconds.
- Query Cost: SQL queries are optimized (partitions used, select * avoided).
- Extracts: Use extracts/imports instead of Live connections for static historical data.
Usability:
- Tooltips: Hover tooltips provide useful additional info.
- Mobile: Dashboard is readable on mobile/tablet if required.
- Action: The dashboard answers "So What?" (leads to action).
视觉设计:
- 标题与描述: 每个图表都有清晰的标题和副标题,说明其展示内容。
- 上下文: 数值包含上下文(如“较上月增长%”、“vs目标”)。
- 配色: 配色有目的性(如用红/绿表示正负倾向、符合品牌配色),且对色盲友好。
- 简洁性: 移除不必要的网格线、边框和背景(Data-Ink Ratio)。
数据完整性:
- 验证: 仪表盘总计与源系统总计匹配(抽样检查)。
- 空值处理: 显式处理值(过滤或标记为“未知”)。
NULL - 筛选器: 日期筛选器在所有图表中正常工作。
- 重复数据: 检查关联逻辑是否存在数据膨胀(重复)。
性能:
- 加载时间: 仪表盘加载时间<5秒。
- 查询成本: SQL查询已优化(使用分区、避免select *)。
- 抽取: 对静态历史数据使用抽取/导入,而非实时连接。
易用性:
- 提示框: 悬停提示框提供有用的额外信息。
- 移动端适配: 若需要,仪表盘在手机/平板上可读。
- 行动导向: 仪表盘能够回答“那又如何?”(引导行动)。