pygraphistry-ai

PyGraphistry AI

Doc routing (local + canonical)

  • First route with `../pygraphistry/references/pygraphistry-readthedocs-toc.md`.
  • Use `../pygraphistry/references/pygraphistry-readthedocs-top-level.tsv` for section-level shortcuts.
  • Only scan `../pygraphistry/references/pygraphistry-readthedocs-sitemap.xml` when a needed page is missing.
  • Use one batched discovery read before deep-page reads; avoid `cat *` and serial micro-reads.
  • In user-facing answers, prefer canonical `https://pygraphistry.readthedocs.io/en/latest/...` links.
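
The routing order above can be sketched as a small lookup helper. This is only an illustration, not part of PyGraphistry: the names `route_doc` and `load_sources` are hypothetical, and the sketch assumes each reference file can be searched as plain text, line by line.

```python
from pathlib import Path

# Reference files named in the routing rules, in lookup order.
REFERENCE_DIR = Path("../pygraphistry/references")
ROUTING_ORDER = [
    "pygraphistry-readthedocs-toc.md",         # first stop
    "pygraphistry-readthedocs-top-level.tsv",  # section-level shortcuts
    "pygraphistry-readthedocs-sitemap.xml",    # only when a page is missing
]


def load_sources(reference_dir=REFERENCE_DIR):
    """Read every reference file that exists in one batched pass."""
    return {
        name: (reference_dir / name).read_text(encoding="utf-8")
        for name in ROUTING_ORDER
        if (reference_dir / name).exists()
    }


def route_doc(keyword, sources):
    """Return (source_name, matching_lines) for the first source mentioning keyword.

    `sources` maps a file name to its full text, so each file is read once
    up front instead of being re-opened for every query.
    """
    needle = keyword.lower()
    for name in ROUTING_ORDER:
        text = sources.get(name, "")
        hits = [line.strip() for line in text.splitlines() if needle in line.lower()]
        if hits:
            return name, hits
    return None, []
```

A call such as `route_doc("umap", load_sources())` consults the table of contents first and falls through to the sitemap only when the earlier files have no match.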

Typical workflow

  1. Build a graph from nodes/edges.
  2. Run a feature/embedding method (`umap`, `embed`, optionally `dbscan`).
  3. Inspect the derived columns/features and visualize.
  4. Iterate on feature columns and sampling strategy.
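
The four steps above might look like this end to end. This is a minimal sketch under stated assumptions: the column names `id`, `src`, `dst`, `f1`, `f2`, `f3` and the helper `embed_and_cluster` are hypothetical, PyGraphistry's AI dependencies are assumed to be installed, and `plot()` is assumed to follow a prior `graphistry.register(...)` call.

```python
def embed_and_cluster(nodes_df, edges_df, features):
    """Steps 1-3 of the workflow; returns the graph with derived columns."""
    import graphistry  # imported here so the helper can be defined without it installed

    # 1. Build the graph from nodes and edges.
    g = graphistry.edges(edges_df, "src", "dst").nodes(nodes_df, "id")

    # 2. Run the embedding, then the optional clustering step.
    g2 = g.umap(X=features).dbscan()

    # 3. Inspect the derived node columns before visualizing.
    print(g2._nodes.columns.tolist())
    return g2


# 4. Iterate: rerun with different explicit feature lists on the same data.
CANDIDATE_FEATURES = [["f1", "f2"], ["f1", "f2", "f3"]]
```

Typical use would be `g2 = embed_and_cluster(nodes_df, edges_df, CANDIDATE_FEATURES[0])` followed by `g2.plot()`, repeated per candidate list while comparing the resulting layouts.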

Baseline examples

Similarity embedding / projection

```python
import graphistry

g2 = graphistry.nodes(df, 'id').umap(X=['f1', 'f2', 'f3'])
g2.plot()
```

Fit/transform flow for consistent projection on new batches

```python
g_train = graphistry.nodes(df_train, 'id').umap(X=['f1', 'f2'])
g_batch = g_train.transform_umap(df_batch, return_graph=True)
g_batch.plot()
```

Semantic search over embedded features

```python
g2 = graphistry.nodes(df, 'id').umap(X=['text_col'])
results_df, query_vector = g2.search('suspicious login pattern')
```

Text-first workflow: featurize then search/cluster

```python
g2 = graphistry.nodes(df, 'id').featurize(kind='nodes', X=['title', 'body']).umap(kind='nodes').dbscan()
hits, qv = g2.search('credential stuffing campaign')
```
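
After `dbscan()`, a quick summary of the cluster labels helps judge whether the chosen text features separate the data. The sketch below is a hypothetical helper, not a PyGraphistry API. It assumes the labels are stored in a node column named `_dbscan` (worth verifying against the installed version) and that `-1` marks noise, following the scikit-learn DBSCAN convention.

```python
from collections import Counter


def summarize_clusters(labels, noise_label=-1):
    """Count members per cluster and report the share of points marked as noise."""
    counts = Counter(labels)
    total = sum(counts.values())
    noise = counts.pop(noise_label, 0)
    return {
        "clusters": dict(counts),
        "noise_fraction": noise / total if total else 0.0,
    }
```

Under those assumptions it could be called as `summarize_clusters(g2._nodes['_dbscan'].tolist())`. A high noise fraction suggests revisiting the feature columns or clustering parameters.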

Precomputed embedding columns

```python
embedding_cols = [c for c in df.columns if c.startswith('emb_')]
g2 = graphistry.nodes(df, 'id').umap(X=embedding_cols)
g_new = g2.transform_umap(df_new, return_graph=True)
```
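
When projecting a new batch onto a fitted model, it can help to confirm that the batch carries the same embedding columns first. The helper below is a hypothetical sketch, not part of PyGraphistry; it only compares column names and does not check dtypes or value ranges.

```python
def check_batch_columns(fitted_cols, batch_cols):
    """Raise if the new batch lacks any column the projection was fitted on."""
    available = set(batch_cols)
    missing = [c for c in fitted_cols if c not in available]
    if missing:
        raise ValueError(f"new batch is missing embedding columns: {missing}")
    return list(fitted_cols)
```

For example, `check_batch_columns(embedding_cols, df_new.columns)` could run just before `transform_umap`, so a schema mismatch fails early with a readable message.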

Practical guardrails

  • Start with small, representative samples before full runs.
  • Keep explicit feature lists (`X=...`) for reproducibility.
  • Track the engine/dataframe type, since CPU and GPU behavior can differ.
  • For anomaly workflows, document thresholds and false-positive assumptions.
  • For graph ML tasks, route deeper model workflows to the RGCN/link-prediction references.
  • For text workflows, prefer `featurize(...).umap(...).search(...)` when queries are natural language.
  • If users already have embeddings, reuse them via explicit embedding column lists (`X=[...]`) before recomputing.
  • When a user asks for a concise workflow snippet, prefer one short code block and avoid long narrative wrappers.
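
Several of these guardrails can be captured in a few lines of bookkeeping. The sketch below is illustrative only: `sample_ids`, `frame_backend`, and `AnomalyRunNotes` are hypothetical names, and a seeded random sample is just one simple way to get a small trial set, not a guarantee that it is representative.

```python
import random
from dataclasses import dataclass


def sample_ids(ids, n, seed=0):
    """Pick a reproducible random subset of node ids for a trial run."""
    ids = list(ids)
    if n >= len(ids):
        return ids
    return random.Random(seed).sample(ids, n)


def frame_backend(df):
    """Name the top-level package of a dataframe, e.g. 'pandas' or 'cudf'."""
    return type(df).__module__.split(".")[0]


@dataclass(frozen=True)
class AnomalyRunNotes:
    """Record of the choices an anomaly run depended on."""
    features: tuple                  # explicit X=... list used for the run
    threshold: float                 # cutoff applied to the anomaly score
    false_positive_assumption: str   # what was assumed about false positives
    backend: str                     # CPU or GPU dataframe package in use
```

Storing one `AnomalyRunNotes` per run, with `backend=frame_backend(df)`, keeps the feature list, threshold, and engine together so a result can be reproduced or questioned later.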

Canonical docs
