stock-correlation

Stock Correlation Analysis Skill

Finds and analyzes correlated stocks using historical price data from Yahoo Finance via yfinance. Routes to specialized sub-skills based on user intent.
Important: This is for research and educational purposes only. Not financial advice. yfinance is not affiliated with Yahoo, Inc.

Step 1: Ensure Dependencies Are Available

Before running any code, install required packages if needed:
python
import subprocess, sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "yfinance", "pandas", "numpy"])
Always include this at the top of your script.
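If reinstalling on every run is too slow, one possible variation is to install only what is missing. The sketch below is an assumption, not part of the original skill: the helper names `missing_packages` and `ensure_packages` are hypothetical, and it relies on the pip name matching the import name, which holds for yfinance, pandas, and numpy.

```python
import importlib.util
import subprocess
import sys

def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

def ensure_packages(names):
    """Install only the packages that are not importable yet (hypothetical helper)."""
    missing = missing_packages(names)
    if missing:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *missing])
    return missing

# Example call for this skill's dependencies:
# ensure_packages(["yfinance", "pandas", "numpy"])
```

The function returns the list of packages it had to install, which can be reported to the user.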

Step 2: Route to the Correct Sub-Skill

Classify the user's request and jump to the matching sub-skill section below.
| User Request | Route To | Examples |
|---|---|---|
| Single ticker, wants to find related stocks | Sub-Skill A: Co-movement Discovery | "what correlates with NVDA", "find stocks related to AMD", "sympathy plays for TSLA" |
| Two or more specific tickers, wants relationship details | Sub-Skill B: Return Correlation | "correlation between AMD and NVDA", "how do LITE and COHR move together", "compare AAPL vs MSFT" |
| Group of tickers, wants structure/grouping | Sub-Skill C: Sector Clustering | "correlation matrix for FAANG", "cluster these semiconductor stocks", "sector peers for AMD" |
| Wants time-varying or conditional correlation | Sub-Skill D: Realized Correlation | "rolling correlation AMD NVDA", "when NVDA drops what else drops", "how has correlation changed" |
If ambiguous, default to Sub-Skill A (Co-movement Discovery) for single tickers, or Sub-Skill B (Return Correlation) for two tickers.
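The routing rules above could be approximated with a small keyword check. This is only a sketch: `route_request` and both keyword lists are hypothetical, and a real classifier would need to read intent more carefully than substring matching does.

```python
def route_request(text, tickers):
    """Pick a sub-skill letter ("A" to "D") from the request text and tickers."""
    lowered = text.lower()
    # Hypothetical keyword lists, derived from the example phrases in the table.
    time_varying = ("rolling", "changed", "over time", "when ", "regime")
    grouping = ("matrix", "cluster", "peers", "sector")
    if any(k in lowered for k in time_varying):
        return "D"
    if any(k in lowered for k in grouping):
        return "C"
    # Ambiguous requests: two or more tickers go to B, a single ticker to A.
    if len(tickers) >= 2:
        return "B"
    return "A"

print(route_request("rolling correlation AMD NVDA", ["AMD", "NVDA"]))
```

Checking time-varying keywords first means a request such as "when NVDA drops what else drops" reaches Sub-Skill D even though it names a single ticker.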

Defaults for all sub-skills

| Parameter | Default |
|---|---|
| Lookback period | 1y (1 year) |
| Data interval | 1d (daily) |
| Correlation method | Pearson |
| Minimum correlation threshold | 0.60 |
| Number of results | Top 10 |
| Return type | Daily log returns |
| Rolling window | 60 trading days |
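One way to keep these defaults in a single place is a frozen dataclass. The class name `CorrelationDefaults` and its field names are hypothetical; only the values come from the table above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CorrelationDefaults:
    period: str = "1y"          # lookback period
    interval: str = "1d"        # data interval
    method: str = "pearson"     # correlation method
    min_abs_corr: float = 0.60  # minimum correlation threshold
    top_n: int = 10             # number of results
    return_type: str = "log"    # daily log returns
    rolling_window: int = 60    # trading days

DEFAULTS = CorrelationDefaults()

# Override one setting for a request without mutating the shared defaults.
two_year = replace(DEFAULTS, period="2y")
print(two_year)
```

Freezing the dataclass means a per-request override has to create a new object, so a change made for one request cannot leak into the next.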

Sub-Skill A: Co-movement Discovery

Goal: Given a single ticker, find stocks that move with it.

A1: Build the peer universe

You need 15-30 candidates. Do not use hardcoded ticker lists; build the universe dynamically at runtime. See references/sector_universes.md for the full implementation. The approach:
  1. Screen same-industry stocks using yf.screen() + yf.EquityQuery to find stocks in the same industry as the target
  2. Broaden to sector if the industry screen returns fewer than 10 peers
  3. Add thematic/adjacent industries: read the target's longBusinessSummary and screen 1-2 related industries (e.g., a semiconductor company → also screen semiconductor equipment)
  4. Combine, deduplicate, remove target ticker
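The screening calls themselves are covered in the reference file; the combining logic in steps 2 and 4 might look like the sketch below. `build_universe` and its parameters are hypothetical, and the three input lists are assumed to come from separate screens run beforehand.

```python
def build_universe(target, industry, sector=(), thematic=(), min_industry=10, max_size=30):
    """Merge screen results into one deduplicated candidate list.

    `industry`, `sector`, and `thematic` are ticker lists assumed to come
    from separate screens; this function does not call yfinance itself.
    """
    pools = [industry]
    # Broaden to the sector only when the industry screen is thin.
    if len(set(industry) - {target}) < min_industry:
        pools.append(sector)
    pools.append(thematic)

    seen, universe = {target}, []  # seeding `seen` with the target removes it
    for pool in pools:
        for ticker in pool:
            if ticker not in seen:
                seen.add(ticker)
                universe.append(ticker)
    return universe[:max_size]

print(build_universe("NVDA", ["NVDA", "AMD", "INTC"], sector=["AVGO", "AMD"], thematic=["ASML"]))
```

Earlier pools take priority when the list is truncated to `max_size`, so a very large industry screen can crowd out thematic names; that ordering is a design choice, not something the original text specifies.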

A2: Compute correlations

python
import yfinance as yf
import pandas as pd
import numpy as np

def discover_comovement(target_ticker, peer_tickers, period="1y"):
    """Rank peers by the correlation of their daily log returns with the target."""
    all_tickers = [target_ticker] + [t for t in peer_tickers if t != target_ticker]
    data = yf.download(all_tickers, period=period, auto_adjust=True, progress=False)

    # Extract close prices; yf.download returns MultiIndex (Price, Ticker) columns.
    # Tickers with fewer than max(60, half the rows) valid prices are dropped.
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))
    if target_ticker not in closes.columns:
        raise ValueError(f"Not enough price history for {target_ticker}")

    # Log returns
    returns = np.log(closes / closes.shift(1)).dropna()
    corr_series = returns.corr()[target_ticker].drop(target_ticker)

    # Rank by absolute correlation
    ranked = corr_series.abs().sort_values(ascending=False)

    result = pd.DataFrame({
        "Ticker": ranked.index,
        "Correlation": [round(corr_series[t], 4) for t in ranked.index],
    })
    return result, returns
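The function above returns every peer, so the default threshold (0.60) and result count (top 10) still have to be applied before presenting. A minimal sketch of that step follows; `filter_candidates` is a hypothetical helper, and the KO and XOM values in the example frame are made up for illustration.

```python
import pandas as pd

def filter_candidates(result, min_abs_corr=0.60, top_n=10):
    """Keep rows whose |Correlation| meets the threshold, strongest first."""
    strength = result["Correlation"].abs()
    kept = result[strength >= min_abs_corr]
    order = kept["Correlation"].abs().sort_values(ascending=False).index
    return kept.loc[order].head(top_n).reset_index(drop=True)

# Illustrative values only, not real market data.
example = pd.DataFrame({
    "Ticker": ["AMD", "AVGO", "KO", "XOM"],
    "Correlation": [0.82, 0.78, 0.12, -0.65],
})
print(filter_candidates(example))
```

Filtering on the absolute value keeps strongly negative peers in the list, which matters for the hedging candidates mentioned in the presentation step.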

A3: Present results

Show a ranked table with company names and sectors (fetch via yf.Ticker(t).info.get("shortName")):

| Rank | Ticker | Company | Correlation | Why linked |
|---|---|---|---|---|
| 1 | AMD | Advanced Micro Devices | 0.82 | Same industry (GPU/CPU) |
| 2 | AVGO | Broadcom | 0.78 | AI infrastructure peer |
Include:
  • Top 10 positively correlated stocks
  • Any notable negatively correlated stocks (potential hedges)
  • Brief explanation of why each might be linked (sector, supply chain, customer overlap)

Sub-Skill B: Return Correlation

Goal: Deep-dive into the relationship between two (or a few) specific tickers.

B1: Download and compute

python
import yfinance as yf
import pandas as pd
import numpy as np

def return_correlation(ticker_a, ticker_b, period="1y"):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()

    returns = np.log(closes / closes.shift(1)).dropna()
    corr = returns[ticker_a].corr(returns[ticker_b])

    # Beta: how much does B move per unit move of A
    cov_matrix = returns.cov()
    beta = cov_matrix.loc[ticker_b, ticker_a] / cov_matrix.loc[ticker_a, ticker_a]

    # R-squared
    r_squared = corr ** 2

    # Rolling 60-day correlation for stability
    rolling_corr = returns[ticker_a].rolling(60).corr(returns[ticker_b])

    # Spread (log price ratio) for mean-reversion
    spread = np.log(closes[ticker_a] / closes[ticker_b])
    spread_z = (spread - spread.mean()) / spread.std()

    return {
        "correlation": round(corr, 4),
        "beta": round(beta, 4),
        "r_squared": round(r_squared, 4),
        "rolling_corr_mean": round(rolling_corr.mean(), 4),
        "rolling_corr_std": round(rolling_corr.std(), 4),
        "rolling_corr_min": round(rolling_corr.min(), 4),
        "rolling_corr_max": round(rolling_corr.max(), 4),
        "spread_z_current": round(spread_z.iloc[-1], 4),
        "observations": len(returns),
    }
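To see how the beta, correlation, and R-squared figures relate, the sketch below repeats the same calculations on synthetic returns instead of downloaded prices. The data-generating numbers (a slope of 1.2 and the two noise levels) are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(0, 0.02, 500)            # synthetic daily log returns for A
b = 1.2 * a + rng.normal(0, 0.01, 500)  # B = 1.2 * A plus independent noise
returns = pd.DataFrame({"A": a, "B": b})

corr = returns["A"].corr(returns["B"])
cov = returns.cov()
beta = cov.loc["B", "A"] / cov.loc["A", "A"]

# Identity: beta = corr * (std_B / std_A), so beta rescales the correlation
# by the ratio of the two volatilities.
beta_from_corr = corr * returns["B"].std() / returns["A"].std()
r_squared = corr ** 2

print(round(beta, 4), round(beta_from_corr, 4), round(r_squared, 4))
```

The two beta values agree, which shows that a pair can have a high correlation and still a beta far from 1 when one stock is much more volatile than the other.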

B2: Present results

Show a summary card:
| Metric | Value |
|---|---|
| Pearson Correlation | 0.82 |
| Beta (B vs A) | 1.15 |
| R-squared | 0.67 |
| Rolling Corr (60d avg) | 0.80 |
| Rolling Corr Range | [0.55, 0.94] |
| Rolling Corr Std Dev | 0.08 |
| Spread Z-Score (current) | +1.2 |
| Observations | 250 |
Interpretation guide:
  • Correlation > 0.80: Strong co-movement — these stocks are tightly linked
  • Correlation 0.50–0.80: Moderate — shared sector drivers but independent factors too
  • Correlation < 0.50: Weak — limited co-movement despite possible sector overlap
  • High rolling std: Unstable relationship — correlation varies significantly over time
  • |Spread Z| > 2: Unusual divergence from historical relationship
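The guide can be turned into a small labelling helper, sketched below. `interpret` is a hypothetical name, and the `unstable_std` cut-off of 0.15 is an assumption, since the guide only says "high" without giving a number.

```python
def interpret(corr, rolling_std=None, spread_z=None, unstable_std=0.15):
    """Map summary metrics to the labels in the interpretation guide."""
    notes = []
    if corr > 0.80:
        notes.append("strong co-movement")
    elif corr >= 0.50:
        notes.append("moderate co-movement")
    else:
        notes.append("weak co-movement")
    # `unstable_std` is an assumed cut-off; the guide does not define "high".
    if rolling_std is not None and rolling_std > unstable_std:
        notes.append("unstable relationship")
    if spread_z is not None and abs(spread_z) > 2:
        notes.append("unusual spread divergence")
    return notes

print(interpret(0.82, rolling_std=0.08, spread_z=1.2))
```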

Sub-Skill C: Sector Clustering

Goal: Given a group of tickers, show the full correlation structure and identify clusters.

C1: Build the correlation matrix

python
import yfinance as yf
import pandas as pd
import numpy as np

def sector_clustering(tickers, period="1y"):
    data = yf.download(tickers, period=period, auto_adjust=True, progress=False)

    # yf.download returns MultiIndex (Price, Ticker) columns
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))
    returns = np.log(closes / closes.shift(1)).dropna()
    corr_matrix = returns.corr()

    # Hierarchical clustering order
    from scipy.cluster.hierarchy import linkage, leaves_list
    from scipy.spatial.distance import squareform

    # Work on a NumPy array: DataFrame.values can be read-only (for example
    # under pandas copy-on-write), which makes np.fill_diagonal on it fail.
    dist_matrix = 1 - corr_matrix.abs().to_numpy()
    np.fill_diagonal(dist_matrix, 0)
    condensed = squareform(dist_matrix)
    # SciPy documents "ward" as correctly defined only for Euclidean distances;
    # method="average" is an alternative for this correlation-based distance.
    linkage_matrix = linkage(condensed, method="ward")
    order = leaves_list(linkage_matrix)
    ordered_tickers = [corr_matrix.columns[i] for i in order]

    # Reorder matrix
    clustered = corr_matrix.loc[ordered_tickers, ordered_tickers]

    return clustered, returns
Note: if scipy is not available, fall back to sorting by average correlation instead of hierarchical clustering.
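A minimal sketch of that fallback, assuming the input is the square correlation matrix produced by returns.corr(). `order_by_average_correlation` is a hypothetical helper name, and the three-ticker matrix uses made-up values.

```python
import pandas as pd

def order_by_average_correlation(corr_matrix):
    """Reorder a correlation matrix by each ticker's mean correlation to the others."""
    n = len(corr_matrix)
    # Subtract the diagonal (always 1.0) so self-correlation does not inflate the mean.
    avg = (corr_matrix.sum(axis=1) - 1.0) / (n - 1)
    order = avg.sort_values(ascending=False).index
    return corr_matrix.loc[order, order]

# Illustrative matrix only, not real market data.
tickers = ["A", "B", "C"]
example = pd.DataFrame(
    [[1.0, 0.9, 0.2], [0.9, 1.0, 0.4], [0.2, 0.4, 1.0]],
    index=tickers, columns=tickers,
)
print(order_by_average_correlation(example))
```

This ordering puts the most broadly connected tickers first, but unlike hierarchical clustering it does not place members of the same cluster next to each other.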

C2: Present results

  1. Full correlation matrix — formatted as a table. For more than 8 tickers, show as a heatmap description or highlight only the strongest/weakest pairs.
  2. Identified clusters — group tickers that have high intra-group correlation:
    • Cluster 1: [NVDA, AMD, AVGO] — avg intra-correlation 0.82
    • Cluster 2: [AAPL, MSFT] — avg intra-correlation 0.75
  3. Outliers — tickers with low average correlation to the group (potential diversifiers).
  4. Strongest pairs — top 5 highest-correlation pairs in the matrix.
  5. Weakest pairs — top 5 lowest/negative-correlation pairs (hedging candidates).
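Items 4 and 5 can be read straight off the matrix by listing each unordered pair once. The sketch below assumes a square correlation matrix with tickers as both index and columns; `rank_pairs` is a hypothetical helper and the example values are made up.

```python
from itertools import combinations
import pandas as pd

def rank_pairs(corr_matrix, k=5):
    """Return (strongest, weakest) lists of (ticker_a, ticker_b, corr) tuples."""
    pairs = [
        (a, b, float(corr_matrix.loc[a, b]))
        for a, b in combinations(corr_matrix.columns, 2)
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    strongest = pairs[:k]
    weakest = pairs[-k:][::-1]  # lowest (or most negative) first
    return strongest, weakest

# Illustrative matrix only, not real market data.
tickers = ["A", "B", "C"]
example = pd.DataFrame(
    [[1.0, 0.9, 0.2], [0.9, 1.0, 0.4], [0.2, 0.4, 1.0]],
    index=tickers, columns=tickers,
)
strongest, weakest = rank_pairs(example, k=2)
print(strongest, weakest)
```

With fewer than 2k pairs in total, the two lists overlap, so small groups are better served by showing the full matrix.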

Sub-Skill D: Realized Correlation

Goal: Show how correlation changes over time and under different market conditions.

D1: Rolling correlation

python
import yfinance as yf
import pandas as pd
import numpy as np

# A tuple default avoids sharing one mutable list across calls.
def realized_correlation(ticker_a, ticker_b, period="2y", windows=(20, 60, 120)):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()

    returns = np.log(closes / closes.shift(1)).dropna()

    # One rolling-correlation series per window length, keyed "20d", "60d", ...
    rolling = {}
    for w in windows:
        rolling[f"{w}d"] = returns[ticker_a].rolling(w).corr(returns[ticker_b])

    return rolling, returns
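The dictionary of rolling series still has to be condensed into one row per window for presentation. A possible sketch follows, run on synthetic returns rather than downloaded data; `summarize_rolling` and the column list are assumptions chosen to match the summary table in D3.

```python
import numpy as np
import pandas as pd

COLUMNS = ["Current", "Mean", "Min", "Max", "Std"]

def summarize_rolling(rolling):
    """Build one row per window label with Current, Mean, Min, Max, and Std."""
    rows = []
    for series in rolling.values():
        valid = series.dropna()  # the first (window - 1) values are NaN
        rows.append([valid.iloc[-1], valid.mean(), valid.min(), valid.max(), valid.std()])
    return pd.DataFrame(rows, index=list(rolling), columns=COLUMNS).round(4)

# Synthetic returns stand in for downloaded data.
rng = np.random.default_rng(1)
a = pd.Series(rng.normal(0, 0.02, 300))
b = 0.8 * a + pd.Series(rng.normal(0, 0.01, 300))
rolling = {f"{w}d": a.rolling(w).corr(b) for w in (20, 60, 120)}

summary = summarize_rolling(rolling)
print(summary)
```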

D2: Regime-conditional correlation

python
import pandas as pd

def regime_correlation(returns, ticker_a, ticker_b, condition_ticker=None):
    """Compare correlation across up/down/volatile regimes."""
    if condition_ticker is None:
        condition_ticker = ticker_a

    ret = returns[condition_ticker]

    # Each regime is a boolean mask over the trading days in `returns`.
    regimes = {
        "All Days": pd.Series(True, index=returns.index),
        "Up Days (target > 0)": ret > 0,
        "Down Days (target < 0)": ret < 0,
        "High Vol (top 25%)": ret.abs() > ret.abs().quantile(0.75),
        "Low Vol (bottom 25%)": ret.abs() < ret.abs().quantile(0.25),
        "Large Drawdown (< -2%)": ret < -0.02,
    }

    results = {}
    for name, mask in regimes.items():
        subset = returns[mask]
        # Skip regimes with fewer than 20 days; the estimate would be too noisy.
        if len(subset) >= 20:
            results[name] = {
                "correlation": round(subset[ticker_a].corr(subset[ticker_b]), 4),
                "days": int(mask.sum()),
            }

    return results

D3: Present results

  1. Rolling correlation summary table:

| Window | Current | Mean | Min | Max | Std |
|---|---|---|---|---|---|
| 20-day | 0.88 | 0.76 | 0.32 | 0.95 | 0.12 |
| 60-day | 0.82 | 0.78 | 0.55 | 0.92 | 0.08 |
| 120-day | 0.80 | 0.79 | 0.68 | 0.88 | 0.05 |

  2. Regime correlation table:

| Regime | Correlation | Days |
|---|---|---|
| All Days | 0.82 | 250 |
| Up Days | 0.75 | 132 |
| Down Days | 0.87 | 118 |
| High Vol (top 25%) | 0.90 | 63 |
| Large Drawdown (< -2%) | 0.93 | 28 |

  3. Key insight: Highlight whether correlation increases during sell-offs (very common: "correlations go to 1 in a crisis"). This is critical for risk management.
  4. Trend: Is correlation trending higher or lower recently vs. its historical average?
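The last two points can be reduced to two small checks, sketched below. Both helper names are hypothetical, the `recent` and `tolerance` values are assumptions, and `selloff_uplift` expects the dictionary shape and regime labels returned by regime_correlation in D2.

```python
import pandas as pd

def correlation_trend(rolling_corr, recent=20, tolerance=0.05):
    """Compare the recent average of a rolling correlation with its full-history mean."""
    valid = rolling_corr.dropna()
    gap = valid.tail(recent).mean() - valid.mean()
    if gap > tolerance:
        return "rising"
    if gap < -tolerance:
        return "falling"
    return "stable"

def selloff_uplift(regime_results):
    """Down-day correlation minus up-day correlation; positive means tighter in sell-offs."""
    down = regime_results["Down Days (target < 0)"]["correlation"]
    up = regime_results["Up Days (target > 0)"]["correlation"]
    return round(down - up, 4)

# Made-up series: correlation steps up from 0.5 to 0.9 in the last 20 observations.
print(correlation_trend(pd.Series([0.5] * 80 + [0.9] * 20)))
```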

Step 3: Respond to the User

After running the appropriate sub-skill, present results clearly:

Always include

  • The lookback period and data interval used
  • The number of observations (trading days)
  • Any tickers dropped due to insufficient data

Always caveat

  • Correlation is not causation — co-movement does not imply a causal link
  • Past correlation does not guarantee future correlation — regimes shift
  • Short lookback windows produce noisy estimates; longer windows smooth but may miss regime changes
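To make the noise caveat concrete, an approximate confidence interval can be attached to any correlation estimate. The sketch below uses the Fisher z-transform, a standard textbook approximation that is not part of the original skill; it assumes roughly bivariate-normal, independent observations, which daily returns only loosely satisfy. `fisher_interval` is a hypothetical helper name.

```python
import math

def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transform: atanh(r) is treated as normal with
    standard error 1 / sqrt(n - 3), then mapped back with tanh.
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Same point estimate, three sample sizes: the interval narrows as n grows.
for n in (20, 60, 250):
    low, high = fisher_interval(0.80, n)
    print(n, round(low, 2), round(high, 2))
```

Reporting the interval alongside a 20-day rolling figure makes it clear how little a single short-window reading pins down.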

Practical applications (mention when relevant)

  • Sympathy plays: Stocks likely to follow a peer's earnings/news move
  • Pair trading: High-correlation pairs where the spread has diverged from its mean
  • Portfolio diversification: Finding low-correlation assets to reduce risk
  • Hedging: Identifying inversely correlated instruments
  • Sector rotation: Understanding which sectors move together
  • Risk management: Correlation spikes during stress — diversification may fail when needed most
Important: Never recommend specific trades. Present data and let the user draw conclusions.

Reference Files

  • references/sector_universes.md: Dynamic peer universe construction using yfinance Screener API
Read the reference file when you need to build a peer universe for a given ticker.