<objective>
Build revenue analytics infrastructure on HubSpot API + SQL data warehouse. Covers ICP validation, ML lead scoring, competitive intelligence, activity analysis, and pipeline forecasting — bridging CRM data into actionable intelligence products.
</objective>
<quick_start>
- Create a HubSpot Private App with required CRM scopes (contacts, companies, deals, owners, timeline)
- Confirm SQL replica access and schema prefix for your data warehouse
- Run ICP validation query (UC1) to segment conversion rates
- Build pipeline forecast (UC5) using stage-specific historical win rates
</quick_start>
<success_criteria>
- HubSpot Private App authenticated with all required scopes
- SQL warehouse connected and data freshness validated (sync lag < 24h)
- At least one use case (ICP, scoring, competitive, activity, forecast) producing results
- Lead scoring model trained on 200+ historical closed deals with measurable AUC
- Enrichment pipeline writing scores back to HubSpot without duplicates
</success_criteria>
HubSpot RevOps Analytics
Revenue analytics infrastructure on HubSpot API + SQL data warehouse.
Bridges CRM data → analytics → intelligence products → revenue impact.
Scope: HubSpot-specific analytics stack. For basic CRM CRUD, use
. For generic dashboards, use
.
Setup Checklist
1. HubSpot Private App
Create at Settings → Integrations → Private Apps:
| Scope | Permission | Why |
|---|
crm.objects.contacts.read/write
| Read/Write | Contact enrichment |
crm.objects.companies.read
| Read | Company data |
crm.objects.deals.read/write
| Read/Write | Pipeline analytics |
| Read | Custom objects |
| Read | Rep attribution |
| Read | Activity data |
2. SQL Replica Access
Discovery questions for your data warehouse:
| Question | Options |
|---|
| Where is HubSpot data replicated? | Snowflake / BigQuery / Postgres / Redshift |
| What ETL tool syncs it? | Fivetran / Airbyte / Stitch / HubSpot Data Sync |
| Sync frequency? | Real-time / Hourly / Daily |
| Schema prefix? | / / custom |
3. Python Environment
bash
pip install hubspot-api-client pandas scikit-learn requests
python
# SDK initialization
from hubspot import HubSpot
client = HubSpot(access_token="pat-na1-xxxxx")
# Or raw requests
import requests
HEADERS = {"Authorization": "Bearer pat-na1-xxxxx", "Content-Type": "application/json"}
BASE = "https://api.hubapi.com"
Core Use Cases
| # | Use Case | Input | Output | Tools |
|---|
| 1 | ICP Validation | Contact + company data | Segment conversion rates | SQL + Clay |
| 2 | Lead Scoring | Historical deals | Win probability per lead | SQL + ML + API |
| 3 | Competitive Intel | Deal close reasons | Win/loss by competitor | SQL + webhook |
| 4 | Activity Analysis | Engagement data | Activity→outcome correlation | SQL |
| 5 | Pipeline Forecast | Open deals + stage history | Weighted revenue forecast | SQL |
Use Case Details
UC1 — ICP Validation: Join contacts + companies + deals in SQL, segment by industry/size/geo, compute conversion rates per segment. Feed results to Clay for enrichment writeback.
UC2 — Lead Scoring: Train GradientBoostingClassifier on historical won/lost deals. Features: company size, industry, engagement score, days in pipeline. Deploy scores back to HubSpot as custom property.
UC3 — Competitive Intel: Extract competitor mentions from deal
. Build win/loss matrix by competitor. Trigger webhook alerts on competitive displacement patterns.
UC4 — Activity Analysis: Correlate email opens, meetings booked, calls logged with deal outcomes. Identify which activities actually move deals forward.
UC5 — Pipeline Forecast: Calculate weighted forecast using stage-specific win rates from historical data. Factor in deal age, velocity, and rep performance.
Reference: See
reference/sql-analytics.md
for complete SQL templates per use case.
Quick Reference: HubSpot API Endpoints
| Object | Endpoint | Key Operations |
|---|
| Contacts | | Search, create, update, batch |
| Companies | /crm/v3/objects/companies
| Search, associate to contacts |
| Deals | | Pipeline, stage history |
| Engagements | /crm/v3/objects/engagements
| Emails, calls, meetings |
| Properties | /crm/v3/properties/{object}
| Custom property CRUD |
| Associations | /crm/v4/associations/{from}/{to}
| Object linking |
| Search | /crm/v3/objects/{object}/search
| Filter + sort (max 10k) |
Reference: See
for auth, SDK patterns, batch operations.
Quick Reference: SQL Object Model
| HubSpot Object | SQL Table (typical) | Key Columns | Join Key |
|---|
| Contacts | | email, lifecycle_stage, lead_score | contact_id |
| Companies | | domain, industry, employee_count | company_id |
| Deals | | amount, stage, close_date, pipeline | deal_id |
| Deal Stages | hubspot.deal_stage_history
| stage, timestamp, duration | deal_id |
| Engagements | | type, created_at, contact_id | engagement_id |
| Owners | | email, first_name, team | owner_id |
Join pattern: contacts → associations → companies/deals (via association tables)
Integration Points
| Skill | Relationship |
|---|
| Base CRUD patterns, auth setup |
| Visualization, Streamlit dashboards |
| Pipeline metrics, MEDDIC context, forecasting |
| Market/competitive research methodology |
| Track API calls + Clay enrichment spend |
Common Mistakes
| Mistake | Fix |
|---|
| Exceeding 100 requests/10s rate limit | Use batch endpoints, add exponential backoff |
| Using Search API for >10k results | Switch to SQL warehouse for bulk analytics |
| Hardcoded property internal names | Fetch property definitions first: GET /crm/v3/properties/{object}
|
| Missing association API for object links | Use v4 associations: POST /crm/v4/associations/{from}/{to}/batch/read
|
| SQL in Postgres | Use or — see dialect notes |
| Not handling HubSpot's | Always include in property requests |
| Clay enrichment without dedup | Check existing property values before writeback |
| Scoring model trained on small dataset | Need 200+ closed deals minimum for reliable ML scores |
Workflow Phases
Phase 1: Foundation
- Set up Private App with required scopes
- Confirm SQL replica access and schema
- Run schema discovery queries
- Validate data freshness (sync lag)
Phase 2: Analytics
- Build ICP validation queries (UC1)
- Create pipeline velocity dashboard (UC2, UC5)
- Set up competitive intelligence tracking (UC3)
Phase 3: Intelligence
- Train lead scoring model on historical deals
- Deploy scores to HubSpot via API
- Build enrichment pipelines (Clay → HubSpot)
- Set up automated alerts and webhooks
Reference: See
reference/enrichment-pipelines.md
for ML scoring and Clay integration.
Reference: See
reference/architecture.md
for deployment patterns and cost estimates.