stock-correlation
Stock Correlation Analysis Skill
Finds and analyzes correlated stocks using historical price data from Yahoo Finance via yfinance. Routes to specialized sub-skills based on user intent.
Important: This is for research and educational purposes only. Not financial advice. yfinance is not affiliated with Yahoo, Inc.
Step 1: Ensure Dependencies Are Available
Before running any code, install required packages if needed:

```python
import subprocess, sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "yfinance", "pandas", "numpy"])
```

Always include this at the top of your script.
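The unconditional install above invokes pip on every run. A minimal sketch of an install-only-if-missing variant follows. `missing_packages` and `ensure_packages` are hypothetical helper names, not part of the skill, and the check assumes each package's import name matches its pip name (which holds for the three packages listed).

```python
import importlib.util
import subprocess
import sys

# Packages named in Step 1; import name and pip name are assumed identical.
REQUIRED = ["yfinance", "pandas", "numpy"]

def missing_packages(names):
    """Return the subset of names that cannot currently be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def ensure_packages(names=REQUIRED):
    """Install only the packages that are missing; return what was installed."""
    missing = missing_packages(names)
    if missing:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-q", *missing]
        )
    return missing
```

Calling `ensure_packages()` before the other imports would then install only what is absent.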
Step 2: Route to the Correct Sub-Skill
Classify the user's request and jump to the matching sub-skill section below.
| User Request | Route To | Examples |
|---|---|---|
| Single ticker, wants to find related stocks | Sub-Skill A: Co-movement Discovery | "what correlates with NVDA", "find stocks related to AMD", "sympathy plays for TSLA" |
| Two or more specific tickers, wants relationship details | Sub-Skill B: Return Correlation | "correlation between AMD and NVDA", "how do LITE and COHR move together", "compare AAPL vs MSFT" |
| Group of tickers, wants structure/grouping | Sub-Skill C: Sector Clustering | "correlation matrix for FAANG", "cluster these semiconductor stocks", "sector peers for AMD" |
| Wants time-varying or conditional correlation | Sub-Skill D: Realized Correlation | "rolling correlation AMD NVDA", "when NVDA drops what else drops", "how has correlation changed" |
If ambiguous, default to Sub-Skill A (Co-movement Discovery) for single tickers, or Sub-Skill B (Return Correlation) for two tickers.
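The routing table could be approximated by a small keyword-based function. This is only a sketch: `route_request`, the two hint tuples, and the choice to send three or more tickers to Sub-Skill C are assumptions made for illustration, and real requests would need a richer classifier.

```python
# Hypothetical keyword hints drawn from the example phrases in the table.
TIME_VARYING_HINTS = ("rolling", "over time", "changed", "drops", "regime")
STRUCTURE_HINTS = ("matrix", "cluster", "peers")

def route_request(tickers, text=""):
    """Return the sub-skill letter (A-D) for a request."""
    lowered = text.lower()
    # Time-varying or conditional wording wins regardless of ticker count.
    if any(hint in lowered for hint in TIME_VARYING_HINTS):
        return "D"
    # Structure/grouping wording routes to clustering.
    if any(hint in lowered for hint in STRUCTURE_HINTS):
        return "C"
    # Ambiguous requests: single ticker -> A, two tickers -> B.
    if len(tickers) <= 1:
        return "A"
    if len(tickers) == 2:
        return "B"
    # Assumption: three or more tickers with no other hint are treated as a group.
    return "C"
```

For example, `route_request(["NVDA"], "what correlates with NVDA")` falls through to the single-ticker default and returns `"A"`.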
Defaults for all sub-skills
| Parameter | Default |
|---|---|
| Lookback period | `1y` (the Sub-Skill D example uses `2y`) |
| Data interval | `1d` |
| Correlation method | Pearson |
| Minimum correlation threshold | 0.60 |
| Number of results | Top 10 |
| Return type | Daily log returns |
| Rolling window | 60 trading days |
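As an illustration of how the threshold and result-count defaults might be applied, the sketch below filters a ticker-to-correlation mapping. `filter_correlations` is a hypothetical helper, and applying the 0.60 threshold to the absolute value (so strongly negative correlations survive) is an assumption; the source does not say which is intended.

```python
def filter_correlations(corr_by_ticker, min_corr=0.60, top_n=10):
    """Keep tickers whose |correlation| meets the threshold, strongest first."""
    kept = [
        (ticker, corr)
        for ticker, corr in corr_by_ticker.items()
        if abs(corr) >= min_corr  # assumption: threshold applies to |corr|
    ]
    kept.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return kept[:top_n]

sample = {"AMD": 0.82, "AVGO": 0.78, "KO": 0.12, "XYZ": -0.65}
shortlist = filter_correlations(sample)
```

Here `KO` is dropped for falling below the threshold, while the negatively correlated `XYZ` is kept.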
Sub-Skill A: Co-movement Discovery
Goal: Given a single ticker, find stocks that move with it.
A1: Build the peer universe
You need 15-30 candidates. Do not use hardcoded ticker lists; build the universe dynamically at runtime. See `references/sector_universes.md` for the full implementation. The approach:
- Screen same-industry stocks using `yf.screen()` + `yf.EquityQuery` to find stocks in the same industry as the target
- Broaden to sector if the industry screen returns fewer than 10 peers
- Add thematic/adjacent industries: read the target's `longBusinessSummary` and screen 1-2 related industries (e.g., a semiconductor company → also screen semiconductor equipment)
- Combine, deduplicate, remove target ticker
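The last two steps of the approach above (the sector fallback and the combine/deduplicate/remove-target pass) can be sketched without touching the screener at all. `build_universe` and its parameters are hypothetical; the three input lists are assumed to come from the screens described in `references/sector_universes.md`.

```python
def build_universe(target, industry_peers, sector_peers=(), thematic_peers=(),
                   min_industry=10, max_size=30):
    """Merge screened peer lists into one deduplicated candidate universe."""
    candidates = list(industry_peers)
    # Broaden to the sector only when the industry screen is too thin.
    if len(industry_peers) < min_industry:
        candidates += list(sector_peers)
    candidates += list(thematic_peers)

    target = target.upper()
    seen = set()
    universe = []
    for ticker in candidates:
        ticker = ticker.upper()
        # Skip the target itself and anything already collected.
        if ticker != target and ticker not in seen:
            seen.add(ticker)
            universe.append(ticker)
    return universe[:max_size]

universe = build_universe(
    "NVDA",
    industry_peers=["AMD", "NVDA", "INTC"],
    sector_peers=["AVGO", "AMD"],
    thematic_peers=["ASML"],
)
```

Insertion order is preserved, so same-industry names stay ahead of sector and thematic additions if the list is truncated at `max_size`.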
A2: Compute correlations
```python
import yfinance as yf
import pandas as pd
import numpy as np

def discover_comovement(target_ticker, peer_tickers, period="1y"):
    all_tickers = [target_ticker] + [t for t in peer_tickers if t != target_ticker]
    data = yf.download(all_tickers, period=period, auto_adjust=True, progress=False)

    # Extract close prices: yf.download returns MultiIndex (Price, Ticker) columns.
    # Columns with fewer than max(60, half the rows) valid prices are dropped.
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))
    if target_ticker not in closes.columns:
        raise ValueError(f"Insufficient price data for {target_ticker}")

    # Log returns
    returns = np.log(closes / closes.shift(1)).dropna()

    corr_series = returns.corr()[target_ticker].drop(target_ticker, errors="ignore")

    # Rank by absolute correlation
    ranked = corr_series.abs().sort_values(ascending=False)

    result = pd.DataFrame({
        "Ticker": ranked.index,
        "Correlation": [round(corr_series[t], 4) for t in ranked.index],
    })
    return result, returns
```

A3: Present results
Show a ranked table with company names and sectors (fetch via `yf.Ticker(t).info.get("shortName")`):
| Rank | Ticker | Company | Correlation | Why linked |
|---|---|---|---|---|
| 1 | AMD | Advanced Micro Devices | 0.82 | Same industry — GPU/CPU |
| 2 | AVGO | Broadcom | 0.78 | AI infrastructure peer |
Include:
- Top 10 positively correlated stocks
- Any notable negatively correlated stocks (potential hedges)
- Brief explanation of why each might be linked (sector, supply chain, customer overlap)
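The split between top positive correlations and notable negative ones can be tried offline on synthetic prices. In the sketch below, `split_for_presentation`, the `-0.30` cutoff for "notable" negatives, and the made-up tickers `TGT`, `PEER`, and `HEDGE` are all assumptions for illustration; no market data is downloaded.

```python
import numpy as np
import pandas as pd

def split_for_presentation(corr, top_n=10, negative_cutoff=-0.30):
    """Split a correlation Series into top positives and notable negatives."""
    positives = corr[corr > 0].sort_values(ascending=False).head(top_n)
    negatives = corr[corr <= negative_cutoff].sort_values()
    return positives, negatives

# Synthetic daily log returns: PEER tracks TGT, HEDGE moves against it.
rng = np.random.default_rng(0)
base = rng.normal(0, 0.02, 250)
log_rets = pd.DataFrame({
    "TGT": base,
    "PEER": base + 0.25 * rng.normal(0, 0.02, 250),
    "HEDGE": -base + 0.25 * rng.normal(0, 0.02, 250),
})

# Rebuild prices, then apply the same log-return and correlation steps as A2.
closes = 100 * np.exp(log_rets.cumsum())
returns = np.log(closes / closes.shift(1)).dropna()
corr = returns.corr()["TGT"].drop("TGT")

positives, negatives = split_for_presentation(corr)
```

With real data, `corr` would be the target's column of the correlation matrix computed in A2.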
Sub-Skill B: Return Correlation
Goal: Deep-dive into the relationship between two (or a few) specific tickers.
B1: Download and compute
```python
import yfinance as yf
import pandas as pd
import numpy as np

def return_correlation(ticker_a, ticker_b, period="1y"):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()
    returns = np.log(closes / closes.shift(1)).dropna()

    corr = returns[ticker_a].corr(returns[ticker_b])

    # Beta: how much does B move per unit move of A
    cov_matrix = returns.cov()
    beta = cov_matrix.loc[ticker_b, ticker_a] / cov_matrix.loc[ticker_a, ticker_a]

    # R-squared
    r_squared = corr ** 2

    # Rolling 60-day correlation for stability
    rolling_corr = returns[ticker_a].rolling(60).corr(returns[ticker_b])

    # Spread (log price ratio) for mean-reversion
    spread = np.log(closes[ticker_a] / closes[ticker_b])
    spread_z = (spread - spread.mean()) / spread.std()

    return {
        "correlation": round(corr, 4),
        "beta": round(beta, 4),
        "r_squared": round(r_squared, 4),
        "rolling_corr_mean": round(rolling_corr.mean(), 4),
        "rolling_corr_std": round(rolling_corr.std(), 4),
        "rolling_corr_min": round(rolling_corr.min(), 4),
        "rolling_corr_max": round(rolling_corr.max(), 4),
        "spread_z_current": round(spread_z.iloc[-1], 4),
        "observations": len(returns),
    }
```

B2: Present results
Show a summary card:
| Metric | Value |
|---|---|
| Pearson Correlation | 0.82 |
| Beta (B vs A) | 1.15 |
| R-squared | 0.67 |
| Rolling Corr (60d avg) | 0.80 |
| Rolling Corr Range | [0.55, 0.94] |
| Rolling Corr Std Dev | 0.08 |
| Spread Z-Score (current) | +1.2 |
| Observations | 250 |
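For reference, the metrics in the card correspond to the following definitions, written out from the B1 code. The symbols are notation introduced here for illustration: $P_{X,t}$ is the adjusted close of ticker $X$ on day $t$, and $r_{X,t}$ is its daily log return.

```latex
r_{X,t} = \ln\frac{P_{X,t}}{P_{X,t-1}}, \qquad
\rho = \frac{\operatorname{Cov}(r_A, r_B)}{\sigma_{r_A}\,\sigma_{r_B}}, \qquad
\beta_{B \mid A} = \frac{\operatorname{Cov}(r_A, r_B)}{\operatorname{Var}(r_A)}, \qquad
R^2 = \rho^2

s_t = \ln\frac{P_{A,t}}{P_{B,t}}, \qquad
z_t = \frac{s_t - \bar{s}}{\sigma_s}
```

As a check against the sample card: $\rho = 0.82$ gives $R^2 = 0.82^2 = 0.6724 \approx 0.67$. Since $\beta_{B \mid A} = \rho\,\sigma_{r_B}/\sigma_{r_A}$, a beta of 1.15 at that correlation implies B's return volatility is about $1.15 / 0.82 \approx 1.40$ times A's.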
Interpretation guide:
- Correlation > 0.80: Strong co-movement — these stocks are tightly linked
- Correlation 0.50–0.80: Moderate — shared sector drivers but independent factors too
- Correlation < 0.50: Weak — limited co-movement despite possible sector overlap
- High rolling std: Unstable relationship — correlation varies significantly over time
- |Spread Z| > 2: Unusual divergence from historical relationship
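The correlation and spread thresholds in the guide can be expressed as a small labeling function. `interpret` is a hypothetical helper; the guide gives no numeric cutoff for a "high" rolling standard deviation, so that rule is deliberately left out rather than guessed.

```python
def interpret(correlation, spread_z):
    """Label a pair using the correlation and spread thresholds from the guide."""
    if correlation > 0.80:
        strength = "strong"
    elif correlation >= 0.50:
        strength = "moderate"
    else:
        strength = "weak"
    return {
        "strength": strength,
        # The divergence rule is symmetric: it compares the absolute Z-score to 2.
        "unusual_divergence": abs(spread_z) > 2,
    }

card = interpret(correlation=0.82, spread_z=1.2)
```

Applied to the sample card, this yields a "strong" label with no unusual divergence flagged.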
Sub-Skill C: Sector Clustering
Goal: Given a group of tickers, show the full correlation structure and identify clusters.
C1: Build the correlation matrix
```python
import yfinance as yf
import pandas as pd
import numpy as np

def sector_clustering(tickers, period="1y"):
    data = yf.download(tickers, period=period, auto_adjust=True, progress=False)

    # yf.download returns MultiIndex (Price, Ticker) columns
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))
    returns = np.log(closes / closes.shift(1)).dropna()
    corr_matrix = returns.corr()

    # Hierarchical clustering order
    from scipy.cluster.hierarchy import linkage, leaves_list
    from scipy.spatial.distance import squareform

    # Use a writable NumPy copy: DataFrame.values can be read-only under
    # pandas copy-on-write, which would make fill_diagonal fail.
    dist_matrix = (1 - corr_matrix.abs()).to_numpy(copy=True)
    np.fill_diagonal(dist_matrix, 0)
    condensed = squareform(dist_matrix)
    # Ward linkage formally assumes Euclidean distances; "average" is a
    # common alternative for correlation-based distances.
    linkage_matrix = linkage(condensed, method="ward")
    order = leaves_list(linkage_matrix)
    ordered_tickers = [corr_matrix.columns[i] for i in order]

    # Reorder matrix
    clustered = corr_matrix.loc[ordered_tickers, ordered_tickers]
    return clustered, returns
```

Note: if `scipy` is not available, fall back to sorting by average correlation instead of hierarchical clustering.

C2: Present results
- Full correlation matrix: formatted as a table. For more than 8 tickers, show as a heatmap description or highlight only the strongest/weakest pairs.
- Identified clusters: group tickers that have high intra-group correlation:
  - Cluster 1: [NVDA, AMD, AVGO], avg intra-correlation 0.82
  - Cluster 2: [AAPL, MSFT], avg intra-correlation 0.75
- Outliers: tickers with low average correlation to the group (potential diversifiers).
- Strongest pairs: top 5 highest-correlation pairs in the matrix.
- Weakest pairs: top 5 lowest/negative-correlation pairs (hedging candidates).
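The strongest pairs, weakest pairs, and outliers listed above can all be read off the correlation matrix. The sketch below uses a small hand-written matrix with illustrative values; `pair_rankings` and `average_correlation` are hypothetical helper names. Ordering tickers by `average_correlation` is also one possible reading of the average-correlation fallback mentioned for when `scipy` is unavailable.

```python
from itertools import combinations

import pandas as pd

def pair_rankings(corr_matrix, top_n=5):
    """Return (strongest, weakest) pairs as Series indexed by ticker pair."""
    pairs = pd.Series({
        (a, b): float(corr_matrix.loc[a, b])
        for a, b in combinations(corr_matrix.columns, 2)  # each pair once
    })
    strongest = pairs.sort_values(ascending=False).head(top_n)
    weakest = pairs.sort_values().head(top_n)
    return strongest, weakest

def average_correlation(corr_matrix):
    """Mean correlation of each ticker with the others, lowest first."""
    n = len(corr_matrix)
    # Subtract the diagonal (self-correlation of 1) before averaging.
    return ((corr_matrix.sum(axis=1) - 1.0) / (n - 1)).sort_values()

tickers = ["NVDA", "AMD", "KO"]
corr_matrix = pd.DataFrame(
    [[1.00, 0.85, 0.30],
     [0.85, 1.00, 0.20],
     [0.30, 0.20, 1.00]],
    index=tickers, columns=tickers,
)
strongest, weakest = pair_rankings(corr_matrix)
avg = average_correlation(corr_matrix)
```

The first entry of `avg` is the ticker least correlated with the rest of the group, which is the natural outlier candidate.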
Sub-Skill D: Realized Correlation
Goal: Show how correlation changes over time and under different market conditions.
D1: Rolling correlation
```python
import yfinance as yf
import pandas as pd
import numpy as np

# windows is a tuple to avoid a mutable default argument
def realized_correlation(ticker_a, ticker_b, period="2y", windows=(20, 60, 120)):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()
    returns = np.log(closes / closes.shift(1)).dropna()

    # One rolling-correlation Series per window length, keyed like "20d"
    rolling = {}
    for w in windows:
        rolling[f"{w}d"] = returns[ticker_a].rolling(w).corr(returns[ticker_b])

    return rolling, returns
```

D2: Regime-conditional correlation
```python
import pandas as pd

def regime_correlation(returns, ticker_a, ticker_b, condition_ticker=None):
    """Compare correlation across up/down/volatile regimes."""
    if condition_ticker is None:
        condition_ticker = ticker_a

    ret = returns[condition_ticker]
    # Boolean masks over the trading days, keyed by regime name
    regimes = {
        "All Days": pd.Series(True, index=returns.index),
        "Up Days (target > 0)": ret > 0,
        "Down Days (target < 0)": ret < 0,
        "High Vol (top 25%)": ret.abs() > ret.abs().quantile(0.75),
        "Low Vol (bottom 25%)": ret.abs() < ret.abs().quantile(0.25),
        "Large Drawdown (< -2%)": ret < -0.02,
    }

    results = {}
    for name, mask in regimes.items():
        subset = returns[mask]
        # Skip regimes with fewer than 20 days; the estimate would be too noisy
        if len(subset) >= 20:
            results[name] = {
                "correlation": round(subset[ticker_a].corr(subset[ticker_b]), 4),
                "days": int(mask.sum()),
            }
    return results
```

D3: Present results
- Rolling correlation summary table:
| Window | Current | Mean | Min | Max | Std |
|---|---|---|---|---|---|
| 20-day | 0.88 | 0.76 | 0.32 | 0.95 | 0.12 |
| 60-day | 0.82 | 0.78 | 0.55 | 0.92 | 0.08 |
| 120-day | 0.80 | 0.79 | 0.68 | 0.88 | 0.05 |
- Regime correlation table:
| Regime | Correlation | Days |
|---|---|---|
| All Days | 0.82 | 250 |
| Up Days | 0.75 | 132 |
| Down Days | 0.87 | 118 |
| High Vol (top 25%) | 0.90 | 63 |
| Large Drawdown (< -2%) | 0.93 | 28 |
- Key insight: Highlight whether correlation increases during sell-offs (very common: "correlations go to 1 in a crisis"). This is critical for risk management.
- Trend: Is correlation trending higher or lower recently vs. its historical average?
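One way the rolling summary table and the sell-off comparison might be assembled is sketched below on synthetic returns. `summarize_rolling` and `selloff_uplift` are hypothetical helpers; the regime keys match those produced by `regime_correlation` in D2, and the regime figures are the sample values from the table above rather than computed results.

```python
import numpy as np
import pandas as pd

COLUMNS = ["Current", "Mean", "Min", "Max", "Std"]

def summarize_rolling(rolling):
    """Turn {label: rolling-correlation Series} into the D3 summary table."""
    rows = []
    for series in rolling.values():
        s = series.dropna()  # the first (window - 1) values are NaN
        rows.append({
            "Current": s.iloc[-1], "Mean": s.mean(),
            "Min": s.min(), "Max": s.max(), "Std": s.std(),
        })
    return pd.DataFrame(rows, index=list(rolling), columns=COLUMNS).round(2)

def selloff_uplift(regimes):
    """Down-day correlation minus up-day correlation (positive = rises in sell-offs)."""
    down = regimes["Down Days (target < 0)"]["correlation"]
    up = regimes["Up Days (target > 0)"]["correlation"]
    return round(down - up, 4)

# Synthetic returns sharing a common factor, standing in for two real tickers.
rng = np.random.default_rng(1)
common = rng.normal(0, 0.01, 500)
returns = pd.DataFrame({
    "A": common + rng.normal(0, 0.01, 500),
    "B": common + rng.normal(0, 0.01, 500),
})
rolling = {f"{w}d": returns["A"].rolling(w).corr(returns["B"]) for w in (20, 60, 120)}
summary = summarize_rolling(rolling)

uplift = selloff_uplift({
    "Up Days (target > 0)": {"correlation": 0.75, "days": 132},
    "Down Days (target < 0)": {"correlation": 0.87, "days": 118},
})
```

Comparing the `Current` and `Mean` columns of `summary` gives a simple read on the trend question, and a positive `uplift` indicates correlation was higher on down days.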
Step 3: Respond to the User
After running the appropriate sub-skill, present results clearly:
Always include
- The lookback period and data interval used
- The number of observations (trading days)
- Any tickers dropped due to insufficient data
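The three items above can be collected at the same point where the sub-skills filter columns. The sketch below reuses the `dropna(axis=1, thresh=...)` rule from the sub-skill code on a synthetic price table; `run_metadata`, its parameters, and the tickers `AAA` and `NEW` are hypothetical.

```python
import numpy as np
import pandas as pd

def run_metadata(requested, raw_closes, period="1y", interval="1d"):
    """Collect the lookback, observation count, and dropped tickers for a run."""
    # Same rule as the sub-skills: keep columns with enough valid prices.
    closes = raw_closes.dropna(axis=1, thresh=max(60, len(raw_closes) // 2))
    returns = np.log(closes / closes.shift(1)).dropna()
    return {
        "period": period,
        "interval": interval,
        "observations": len(returns),
        "dropped": [t for t in requested if t not in closes.columns],
    }

# Synthetic closes: AAA has full history, NEW only has the last 30 days.
raw_closes = pd.DataFrame({
    "AAA": np.linspace(100, 110, 100),
    "NEW": [np.nan] * 70 + list(np.linspace(20, 25, 30)),
})
meta = run_metadata(["AAA", "NEW"], raw_closes)
```

`NEW` fails the 60-price minimum and is reported as dropped, and the observation count is one less than the number of price rows because the first return is undefined.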
Always caveat
- Correlation is not causation — co-movement does not imply a causal link
- Past correlation does not guarantee future correlation — regimes shift
- Short lookback windows produce noisy estimates; longer windows smooth but may miss regime changes
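The window-length caveat can be illustrated with simulated data where the true correlation is constant at 0.5, so any movement in the rolling estimate is pure sampling noise. This is a sketch under stated assumptions: `rolling_corr_std`, the seed, the series length, and the 20-day and 250-day windows are choices made for the example only.

```python
import numpy as np
import pandas as pd

def rolling_corr_std(returns, a, b, window):
    """Standard deviation over time of the rolling correlation estimate."""
    return float(returns[a].rolling(window).corr(returns[b]).std())

# Two series sharing one common factor: true correlation is 0.5 throughout.
rng = np.random.default_rng(42)
n = 1500
common = rng.normal(size=n)
returns = pd.DataFrame({
    "A": common + rng.normal(size=n),
    "B": common + rng.normal(size=n),
})

std_short = rolling_corr_std(returns, "A", "B", window=20)
std_long = rolling_corr_std(returns, "A", "B", window=250)
```

Even though nothing about the relationship changes in this simulation, the short-window estimate swings far more than the long-window one, which is why apparent shifts in a 20-day correlation should be read cautiously.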
Practical applications (mention when relevant)
- Sympathy plays: Stocks likely to follow a peer's earnings/news move
- Pair trading: High-correlation pairs where the spread has diverged from its mean
- Portfolio diversification: Finding low-correlation assets to reduce risk
- Hedging: Identifying inversely correlated instruments
- Sector rotation: Understanding which sectors move together
- Risk management: Correlation spikes during stress — diversification may fail when needed most
Important: Never recommend specific trades. Present data and let the user draw conclusions.
Reference Files
- `references/sector_universes.md`: Dynamic peer universe construction using the yfinance Screener API

Read the reference file when you need to build a peer universe for a given ticker.