Streamlit Data Application Skill

Build beautiful, interactive data applications with pure Python. Transform data scripts into shareable web apps in minutes with widgets, charts, and layouts.

When to Use This Skill

USE Streamlit when:

Rapid prototyping - Need to build a data app quickly
Internal tools - Creating tools for your team
Data exploration - Interactive exploration of datasets
Demo applications - Showcasing data science projects
ML model demos - Building interfaces for model inference
Simple dashboards - Quick insights without complex setup
Python-only development - No JavaScript/frontend knowledge required

DON'T USE Streamlit when:

Complex interactivity - Need fine-grained callback control (use Dash)
Enterprise deployment - Require advanced authentication/scaling (use Dash Enterprise)
Custom components - Heavy custom JavaScript requirements
High-traffic production - Thousands of concurrent users
Real-time streaming - Sub-second update requirements

Prerequisites

```bash
# Basic installation
pip install streamlit

# With common extras
pip install streamlit plotly pandas polars

# Using uv (recommended)
uv pip install streamlit plotly pandas polars altair

# Verify installation
streamlit hello
```
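Some features mentioned in this skill depend on the installed release (for example, `st.scatter_chart` is noted as Streamlit 1.26+), so it can help to check the version from Python as well. The following is a minimal sketch using only the standard library; the helper name `installed_version` is an illustrative assumption and not part of Streamlit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("streamlit")
    print(f"streamlit: {found or 'not installed'}")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to print a friendly install hint.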

Core Capabilities

1. Basic Application Structure

Minimal App (app.py):

```python
import streamlit as st
import pandas as pd

# Page configuration (must be the first Streamlit command)
st.set_page_config(
    page_title="My Data App",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Title and headers
st.title("My Data Application")
st.header("Welcome to the Dashboard")
st.subheader("Data Analysis Section")

# Text elements
st.text("This is plain text")
st.markdown("**Bold** and *italic* text with links")
st.caption("This is a caption for additional context")
st.code("print('Hello, Streamlit!')", language="python")

# Display data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["NYC", "LA", "Chicago"]
})

st.dataframe(df)  # Interactive table
st.table(df)      # Static table
st.json({"key": "value", "list": [1, 2, 3]})

# Metrics
col1, col2, col3 = st.columns(3)
col1.metric("Revenue", "$1.2M", "+12%")
col2.metric("Users", "10,234", "-2%")
col3.metric("Conversion", "3.2%", "+0.5%")
```

**Run the app:**
```bash
streamlit run app.py
```
2. Widgets and User Input

Input Widgets:

```python
import streamlit as st
import pandas as pd
from datetime import date

# Text inputs
name = st.text_input("Enter your name", value="User")
bio = st.text_area("Tell us about yourself", height=100)
password = st.text_input("Password", type="password")

# Numeric inputs
age = st.number_input("Age", min_value=0, max_value=120, value=25, step=1)
price = st.slider("Price Range", 0.0, 100.0, (25.0, 75.0))  # Range slider
rating = st.slider("Rating", 1, 5, 3)

# Selection widgets
option = st.selectbox("Choose an option", ["Option A", "Option B", "Option C"])
options = st.multiselect("Select multiple", ["Red", "Green", "Blue"], default=["Red"])
radio_choice = st.radio("Pick one", ["Small", "Medium", "Large"], horizontal=True)

# Boolean inputs
agree = st.checkbox("I agree to the terms")
toggle = st.toggle("Enable feature")

# Date and time
selected_date = st.date_input("Select a date", value=date.today())
date_range = st.date_input(
    "Date range",
    value=(date(2025, 1, 1), date.today()),
    format="YYYY-MM-DD"
)
selected_time = st.time_input("Select a time")

# File upload
uploaded_file = st.file_uploader("Upload a CSV or Excel file", type=["csv", "xlsx"])
if uploaded_file is not None:
    # Pick the reader that matches the uploaded file's extension
    if uploaded_file.name.lower().endswith(".xlsx"):
        df = pd.read_excel(uploaded_file)
    else:
        df = pd.read_csv(uploaded_file)
    st.write(f"Loaded {len(df)} rows")

# Color picker
color = st.color_picker("Pick a color", "#00FF00")

# Buttons
if st.button("Click me"):
    st.write("Button clicked!")

# Download button
@st.cache_data
def get_data():
    return pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

csv = get_data().to_csv(index=False)
st.download_button(
    label="Download CSV",
    data=csv,
    file_name="data.csv",
    mime="text/csv"
)
```
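When `st.date_input` is given a tuple as its default, the value it returns can hold fewer than two dates while the user is still picking the second end of the range, so indexing `date_range[1]` directly may fail. The sketch below shows one way to guard against that; `normalize_date_range` is a hypothetical helper introduced here for illustration, not a Streamlit function.

```python
from datetime import date
from typing import Optional, Sequence, Tuple, Union

DateValue = Union[date, Sequence[date], None]


def normalize_date_range(value: DateValue) -> Optional[Tuple[date, date]]:
    """Return (start, end) when both ends are known, otherwise None."""
    if value is None:
        return None
    if isinstance(value, date):
        # A single date is treated as a one-day range
        return (value, value)
    if len(value) != 2:
        # Empty or half-finished selection
        return None
    start, end = value
    # Order the pair so callers can rely on start <= end
    return (start, end) if start <= end else (end, start)
```

In an app, the result of `st.date_input(...)` can be passed through this helper, with `st.stop()` called when it returns `None` so the rest of the script waits for a complete selection.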

3. Layout and Organization

Columns:

```python
import streamlit as st

# Equal columns
col1, col2, col3 = st.columns(3)

with col1:
    st.header("Column 1")
    st.write("Content for column 1")

with col2:
    st.header("Column 2")
    st.metric("Metric", "100")

with col3:
    st.header("Column 3")
    st.button("Action")

# Unequal columns
left, right = st.columns([2, 1])  # 2:1 ratio

with left:
    st.write("Wider column")

with right:
    st.write("Narrower column")
```

**Sidebar:**
```python
import streamlit as st

# Sidebar content
st.sidebar.title("Navigation")
st.sidebar.header("Filters")

# Sidebar widgets
category = st.sidebar.selectbox("Category", ["All", "A", "B", "C"])
min_value = st.sidebar.slider("Minimum Value", 0, 100, 25)
show_raw = st.sidebar.checkbox("Show raw data")

# Using 'with' syntax
with st.sidebar:
    st.header("Settings")
    theme = st.radio("Theme", ["Light", "Dark"])
    st.divider()
    st.caption("App v1.0.0")
```

**Tabs:**
```python
import streamlit as st

tab1, tab2, tab3 = st.tabs(["📈 Chart", "📊 Data", "⚙️ Settings"])

with tab1:
    st.header("Chart View")
    # Add chart here

with tab2:
    st.header("Data View")
    # Add dataframe here

with tab3:
    st.header("Settings")
    # Add settings here
```

Expanders and Containers:

```python
import streamlit as st

# Expander (collapsible section)
with st.expander("Click to expand"):
    st.write("Hidden content revealed!")
    st.code("print('Hello')")

# Container (grouping elements)
with st.container():
    st.write("This is inside a container")
    col1, col2 = st.columns(2)
    col1.write("Left")
    col2.write("Right")

# Container with border
with st.container(border=True):
    st.write("Content with border")

# Empty placeholder (for dynamic updates)
placeholder = st.empty()
placeholder.text("Initial text")
# Later: placeholder.text("Updated text")
```

4. Data Visualization

Plotly Integration:

```python
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd

# Sample data
df = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=100),
    "value": [i + (i % 7) * 5 for i in range(100)],
    "category": ["A", "B", "C", "D"] * 25
})

# Plotly Express charts
fig = px.line(df, x="date", y="value", color="category", title="Time Series")
st.plotly_chart(fig, use_container_width=True)

# Scatter plot
fig_scatter = px.scatter(
    df, x="date", y="value",
    color="category", size="value",
    hover_data=["category"]
)
st.plotly_chart(fig_scatter, use_container_width=True)

# Bar chart
category_totals = df.groupby("category")["value"].sum().reset_index()
fig_bar = px.bar(category_totals, x="category", y="value", title="Category Totals")
st.plotly_chart(fig_bar, use_container_width=True)

# Graph Objects for more control
fig_go = go.Figure()
fig_go.add_trace(go.Scatter(
    x=df["date"],
    y=df["value"],
    mode="lines+markers",
    name="Values"
))
fig_go.update_layout(title="Custom Plotly Chart", hovermode="x unified")
st.plotly_chart(fig_go, use_container_width=True)
```

**Built-in Charts:**
```python
import streamlit as st
import pandas as pd
import numpy as np

# Sample data
chart_data = pd.DataFrame(
    np.random.randn(20, 3),
    columns=["A", "B", "C"]
)

# Simple line chart
st.line_chart(chart_data)

# Area chart
st.area_chart(chart_data)

# Bar chart
st.bar_chart(chart_data)

# Scatter chart (Streamlit 1.26+)
scatter_data = pd.DataFrame({
    "x": np.random.randn(100),
    "y": np.random.randn(100),
    "size": np.random.rand(100) * 100
})
st.scatter_chart(scatter_data, x="x", y="y", size="size")

# Map
map_data = pd.DataFrame({
    "lat": np.random.randn(100) / 50 + 37.76,
    "lon": np.random.randn(100) / 50 - 122.4
})
st.map(map_data)
```

**Matplotlib Integration:**
```python
import streamlit as st
import matplotlib.pyplot as plt
import numpy as np

# Create matplotlib figure
fig, ax = plt.subplots(figsize=(10, 6))
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.legend()
ax.set_title("Matplotlib Chart")

# Display in Streamlit
st.pyplot(fig)
```

5. Caching for Performance

Cache Data (for expensive data operations):

```python
import streamlit as st
import pandas as pd
import polars as pl
import requests
import time

@st.cache_data
def load_data(file_path: str) -> pd.DataFrame:
    """Load and cache data. Cache key: file_path."""
    time.sleep(2)  # Simulate slow load
    return pd.read_csv(file_path)

@st.cache_data(ttl=3600)  # Cache expires after 1 hour
def fetch_api_data(endpoint: str) -> dict:
    """Fetch data from API with time-based cache."""
    response = requests.get(endpoint, timeout=10)
    response.raise_for_status()  # Failed requests raise instead of being cached
    return response.json()

@st.cache_data(show_spinner="Loading data...")
def load_with_spinner(path: str) -> pl.DataFrame:
    """Show custom spinner while loading."""
    return pl.read_parquet(path)

# Using cached functions
df = load_data("data/sales.csv")  # First call: slow
df = load_data("data/sales.csv")  # Second call: instant (cached)

# Clear cache programmatically
if st.button("Clear cache"):
    st.cache_data.clear()
```

**Cache Resources (for global resources):**
```python
import streamlit as st
from sqlalchemy import create_engine

@st.cache_resource
def get_database_connection():
    """Cache database connection (singleton pattern)."""
    return create_engine("postgresql://user:pass@localhost/db")

@st.cache_resource
def load_ml_model():
    """Cache ML model (loaded once and shared across sessions and reruns)."""
    import joblib
    return joblib.load("model.pkl")

# Use cached resources
engine = get_database_connection()
model = load_ml_model()
```
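To make the `ttl` idea concrete, here is a plain-Python analogy of a time-based cache keyed by the function's arguments. It is a sketch for intuition only and not how Streamlit implements `st.cache_data`; the decorator name `ttl_cache` and its `clock` parameter are assumptions made for this illustration.

```python
import functools
import time
from typing import Any, Callable, Dict, Tuple


def ttl_cache(ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
    """Memoize a function by its positional arguments; entries expire after ttl_seconds."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        # Maps the argument tuple to (time stored, cached result)
        store: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        @functools.wraps(func)
        def wrapper(*args: Any) -> Any:
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # Fresh entry: skip the expensive call
            result = func(*args)
            store[args] = (now, result)
            return result

        wrapper.clear = store.clear  # Rough counterpart of a "clear cache" action
        return wrapper

    return decorator


@ttl_cache(ttl_seconds=3600)
def slow_square(x: int) -> int:
    time.sleep(0.01)  # Stand-in for a slow load
    return x * x
```

The same argument within the time window returns the stored result; a different argument, an expired entry, or a call to `clear()` triggers a fresh computation.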

6. Session State

Managing State:

```python
import streamlit as st

# Initialize state
if "counter" not in st.session_state:
    st.session_state.counter = 0

if "messages" not in st.session_state:
    st.session_state.messages = []

# Display current state
st.write(f"Counter: {st.session_state.counter}")

# Update state with buttons
col1, col2, col3 = st.columns(3)

if col1.button("Increment"):
    st.session_state.counter += 1
    st.rerun()

if col2.button("Decrement"):
    st.session_state.counter -= 1
    st.rerun()

if col3.button("Reset"):
    st.session_state.counter = 0
    st.rerun()

# State with widgets
st.text_input("Name", key="user_name")
st.write(f"Hello, {st.session_state.user_name}!")

# State callback
def on_change():
    st.session_state.processed = st.session_state.raw_input.upper()

st.text_input("Raw input", key="raw_input", on_change=on_change)
if "processed" in st.session_state:
    st.write(f"Processed: {st.session_state.processed}")
```

**Form State:**
```python
import streamlit as st

# Forms prevent rerunning on every widget change
with st.form("my_form"):
    st.write("Submit all at once:")
    name = st.text_input("Name")
    age = st.number_input("Age", min_value=0, max_value=120)
    color = st.selectbox("Favorite color", ["Red", "Green", "Blue"])

    # Every form needs a submit button
    submitted = st.form_submit_button("Submit")

if submitted:
    st.success(f"Thanks {name}! You're {age} and like {color}.")
```
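An alternative to updating the counter and then calling `st.rerun()` is to pass `on_click` callbacks to the buttons: callbacks run before the script reruns, so the updated value is already in place when the page is drawn. The sketch below keeps the state changes in small functions that work on any dict-like object; the names `init_counter`, `change_counter`, `reset_counter`, and `render` are illustrative assumptions.

```python
from typing import MutableMapping


def init_counter(state: MutableMapping) -> None:
    """Create the counter once; later reruns leave the stored value alone."""
    if "counter" not in state:
        state["counter"] = 0


def change_counter(state: MutableMapping, step: int) -> None:
    """Add step (which may be negative) to the counter."""
    state["counter"] += step


def reset_counter(state: MutableMapping) -> None:
    """Set the counter back to zero."""
    state["counter"] = 0


def render() -> None:
    """Draw the counter UI; call this from app.py."""
    import streamlit as st  # Imported here so the helpers above work without Streamlit

    init_counter(st.session_state)
    col1, col2, col3 = st.columns(3)
    col1.button("Increment", on_click=change_counter, args=(st.session_state, 1))
    col2.button("Decrement", on_click=change_counter, args=(st.session_state, -1))
    col3.button("Reset", on_click=reset_counter, args=(st.session_state,))
    st.write(f"Counter: {st.session_state['counter']}")
```

Because the helpers only use item access, they can be exercised with a plain `dict` in unit tests and with `st.session_state` in the running app.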

7. Multi-Page Applications

Directory Structure:

```
my_app/
├── app.py              # Main entry point (optional)
├── pages/
│   ├── 1_📊_Dashboard.py
│   ├── 2_📈_Analytics.py
│   └── 3_⚙️_Settings.py
└── utils/
    └── helpers.py
```

Main App (app.py):

```python
import streamlit as st

st.set_page_config(
    page_title="Multi-Page App",
    page_icon="🏠",
    layout="wide"
)

st.title("Welcome to My App")
st.write("Use the sidebar to navigate between pages.")

# Shared state initialization
if "user" not in st.session_state:
    st.session_state.user = None
```

**Page 1 (pages/1_📊_Dashboard.py):**
```python
import streamlit as st

st.set_page_config(page_title="Dashboard", page_icon="📊")

st.title("📊 Dashboard")
st.write("This is the dashboard page")

# Access shared state
if st.session_state.get("user"):
    st.write(f"Welcome back, {st.session_state.user}!")
```

**Page 2 (pages/2_📈_Analytics.py):**
```python
import streamlit as st

st.set_page_config(page_title="Analytics", page_icon="📈")

st.title("📈 Analytics")
st.write("This is the analytics page")

# Add analytics content
```
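With a `pages/` directory, the sidebar labels come from the file names: as a rough description of the convention, the leading number only controls ordering and underscores are shown as spaces. The sketch below approximates that mapping in plain Python so the naming scheme can be checked before creating files; `page_label` is a hypothetical helper and only an approximation of what Streamlit itself does.

```python
import re
from pathlib import Path


def page_label(filename: str) -> str:
    """Approximate the sidebar label for a page file such as '1_📊_Dashboard.py'."""
    stem = Path(filename).stem  # Drop the directory and the .py extension
    # Strip an optional leading order number and the separators after it
    rest = re.match(r"^\d*[_\- ]*(.*)$", stem).group(1)
    label = rest or stem  # Fall back to the stem for names like '1.py'
    return label.replace("_", " ").strip()


if __name__ == "__main__":
    for name in ["1_📊_Dashboard.py", "2_📈_Analytics.py", "3_⚙️_Settings.py"]:
        print(f"{name} -> {page_label(name)}")
```

Keeping the numeric prefix in the file name and out of the label is what lets pages be reordered by renaming files without changing what users see.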

8. Advanced Features

Status and Progress:

```python
import streamlit as st
import time

# Progress bar
progress = st.progress(0, text="Processing...")
for i in range(100):
    time.sleep(0.01)
    progress.progress(i + 1, text=f"Processing... {i + 1}%")

# Spinner
with st.spinner("Loading data..."):
    time.sleep(2)
st.success("Done!")

# Status messages
st.success("Operation successful!")
st.info("This is informational")
st.warning("Warning: Check your inputs")
st.error("An error occurred")
st.exception(ValueError("Example exception"))

# Toast notifications
st.toast("Data saved!", icon="✅")

# Balloons and snow
st.balloons()
st.snow()
```

**Chat Interface:**
```python
import streamlit as st

st.title("Chat Demo")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Chat input
if prompt := st.chat_input("What's on your mind?"):
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Generate response
    with st.chat_message("assistant"):
        response = f"You said: {prompt}"
        st.markdown(response)
    st.session_state.messages.append({"role": "assistant", "content": response})
```

**Data Editor:**
```python
import streamlit as st
import pandas as pd

# Editable dataframe
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Active": [True, False, True]
})

edited_df = st.data_editor(
    df,
    num_rows="dynamic",  # Allow adding/deleting rows
    column_config={
        "Name": st.column_config.TextColumn("Name", required=True),
        "Age": st.column_config.NumberColumn("Age", min_value=0, max_value=120),
        "Active": st.column_config.CheckboxColumn("Active")
    }
)

if st.button("Save changes"):
    st.write("Saved:", edited_df)
```
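The chat demo writes the whole reply at once. If a typing effect is wanted, one option is to feed a generator to `st.write_stream`, assuming a Streamlit release that provides it. The sketch below is illustrative: `stream_words` and `render_reply` are hypothetical names, and the echo reply stands in for a real model call.

```python
import time
from typing import Iterator


def stream_words(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield the text one word at a time, each followed by a space."""
    for word in text.split():
        yield word + " "
        if delay:
            time.sleep(delay)  # Pause between words to mimic a streaming backend


def render_reply(prompt: str) -> str:
    """Stream an echo reply into an assistant chat bubble and return the full text."""
    import streamlit as st  # Imported here so stream_words can be tested without Streamlit

    with st.chat_message("assistant"):
        # write_stream draws chunks as they arrive and returns the concatenated result
        return st.write_stream(stream_words(f"You said: {prompt}", delay=0.05))
```

The returned string can be appended to `st.session_state.messages` so the streamed reply is replayed as ordinary history on the next rerun.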

Complete Examples

Example 1: Sales Dashboard

```python
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
from datetime import datetime, timedelta

# Page config
st.set_page_config(
    page_title="Sales Dashboard",
    page_icon="📊",
    layout="wide"
)

# Custom CSS
st.markdown("""

""", unsafe_allow_html=True)

# Title
st.title("📊 Sales Analytics Dashboard")

# Sidebar filters
st.sidebar.header("Filters")

# Date range filter
date_range = st.sidebar.date_input(
    "Date Range",
    value=(datetime.now() - timedelta(days=30), datetime.now()),
    format="YYYY-MM-DD"
)

# Category filter
categories = st.sidebar.multiselect(
    "Categories",
    options=["Electronics", "Clothing", "Food", "Home", "Sports"],
    default=["Electronics", "Clothing", "Food"]
)

# Region filter
regions = st.sidebar.multiselect(
    "Regions",
    options=["North", "South", "East", "West"],
    default=["North", "South", "East", "West"]
)

# Load and filter data
@st.cache_data
def load_sales_data():
    """Generate sample sales data."""
    np.random.seed(42)

    dates = pd.date_range(start="2024-01-01", end="2025-12-31", freq="D")
    n = len(dates) * 10  # Multiple records per day

    return pd.DataFrame({
        "date": np.random.choice(dates, n),
        "category": np.random.choice(
            ["Electronics", "Clothing", "Food", "Home", "Sports"], n
        ),
        "region": np.random.choice(["North", "South", "East", "West"], n),
        "revenue": np.random.uniform(100, 5000, n),
        "units": np.random.randint(1, 50, n),
        "customer_id": np.random.randint(1000, 9999, n)
    })

# Load data
df = load_sales_data()

# The range picker holds fewer than two dates until both ends are chosen
if len(date_range) != 2:
    st.info("Select a start and an end date.")
    st.stop()

# Apply filters
filtered_df = df[
    (df["date"] >= pd.Timestamp(date_range[0])) &
    (df["date"] <= pd.Timestamp(date_range[1])) &
    (df["category"].isin(categories)) &
    (df["region"].isin(regions))
]

# KPI Metrics Row
st.subheader("Key Performance Indicators")
col1, col2, col3, col4 = st.columns(4)

total_revenue = filtered_df["revenue"].sum()
total_units = filtered_df["units"].sum()
total_orders = len(filtered_df)
unique_customers = filtered_df["customer_id"].nunique()

col1.metric(
    "Total Revenue",
    f"${total_revenue:,.0f}",
    delta=f"+{(total_revenue * 0.12):,.0f} vs last period"
)
col2.metric(
    "Units Sold",
    f"{total_units:,}",
    delta=f"+{int(total_units * 0.08):,}"
)
col3.metric(
    "Orders",
    f"{total_orders:,}",
    delta=f"+{int(total_orders * 0.05):,}"
)
col4.metric(
    "Unique Customers",
    f"{unique_customers:,}",
    delta=f"+{int(unique_customers * 0.03):,}"
)

# Charts Row
st.subheader("Revenue Analysis")

col1, col2 = st.columns(2)

with col1:
    # Revenue trend
    daily_revenue = filtered_df.groupby("date")["revenue"].sum().reset_index()
    fig_trend = px.line(
        daily_revenue,
        x="date",
        y="revenue",
        title="Daily Revenue Trend"
    )
    fig_trend.update_layout(hovermode="x unified")
    st.plotly_chart(fig_trend, use_container_width=True)

with col2:
    # Revenue by category
    category_revenue = filtered_df.groupby("category")["revenue"].sum().reset_index()
    fig_category = px.pie(
        category_revenue,
        values="revenue",
        names="category",
        title="Revenue by Category"
    )
    st.plotly_chart(fig_category, use_container_width=True)

# Second charts row
col1, col2 = st.columns(2)

with col1:
    # Regional comparison
    regional_data = filtered_df.groupby("region").agg({
        "revenue": "sum",
        "units": "sum"
    }).reset_index()

    fig_region = px.bar(
        regional_data,
        x="region",
        y="revenue",
        color="region",
        title="Revenue by Region"
    )
    st.plotly_chart(fig_region, use_container_width=True)

with col2:
    # Category by region heatmap
    pivot_data = filtered_df.pivot_table(
        values="revenue",
        index="category",
        columns="region",
        aggfunc="sum"
    )

    fig_heatmap = px.imshow(
        pivot_data,
        title="Revenue Heatmap: Category vs Region",
        color_continuous_scale="Blues",
        text_auto=".0f"
    )
    st.plotly_chart(fig_heatmap, use_container_width=True)

# Data Table
st.subheader("Detailed Data")

with st.expander("View Raw Data"):
    # Aggregated summary
    summary = filtered_df.groupby(["date", "category", "region"]).agg({
        "revenue": "sum",
        "units": "sum",
        "customer_id": "nunique"
    }).reset_index()
    summary.columns = ["Date", "Category", "Region", "Revenue", "Units", "Customers"]

    st.dataframe(
        summary.sort_values("Date", ascending=False),
        use_container_width=True,
        height=400
    )

    # Download button
    csv = summary.to_csv(index=False)
    st.download_button(
        label="Download CSV",
        data=csv,
        file_name=f"sales_data_{datetime.now().strftime('%Y%m%d')}.csv",
        mime="text/csv"
    )

# Footer
st.markdown("---")
st.caption(f"Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
```
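The KPI deltas in the dashboard are placeholder multiples of the current totals (for example `total_revenue * 0.12`), not real comparisons. A minimal sketch of a period-over-period delta follows; `format_delta` is a hypothetical helper, and how the previous-period total is obtained (for instance by filtering the preceding date window) is left as an assumption.

```python
from typing import Optional


def format_delta(current: float, previous: float) -> Optional[str]:
    """Return a signed percentage change such as '+12.0%', or None when it is undefined."""
    if previous == 0:
        return None  # No baseline to compare against
    change = (current - previous) / abs(previous) * 100
    return f"{change:+.1f}%"


if __name__ == "__main__":
    print(format_delta(112_000, 100_000))
    print(format_delta(90_000, 100_000))
```

The result can be passed as `delta=` to `st.metric`; a `None` delta simply shows the value without a change indicator.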

Example 2: Data Explorer Tool

python

import streamlit as st
import pandas as pd
import polars as pl
import plotly.express as px

st.set_page_config(page_title="Data Explorer", page_icon="🔍", layout="wide")

st.title("🔍 Interactive Data Explorer")

# File upload
uploaded_file = st.file_uploader(
    "Upload your data file",
    type=["csv", "xlsx", "parquet"],
    help="Supported formats: CSV, Excel, Parquet"
)

@st.cache_data
def load_uploaded_file(file, file_type):
    """Load uploaded file based on type."""
    if file_type == "csv":
        return pd.read_csv(file)
    elif file_type == "xlsx":
        return pd.read_excel(file)
    elif file_type == "parquet":
        return pd.read_parquet(file)

if uploaded_file is not None:
    # Determine file type
    file_type = uploaded_file.name.split(".")[-1].lower()

# Load data
with st.spinner("Loading data..."):
    df = load_uploaded_file(uploaded_file, file_type)

st.success(f"Loaded {len(df):,} rows and {len(df.columns)} columns")

# Data overview tabs
tab1, tab2, tab3, tab4 = st.tabs([
    "📋 Overview",
    "📊 Visualize",
    "🔢 Statistics",
    "🔍 Filter & Export"
])

with tab1:
    st.subheader("Data Preview")
    st.dataframe(df.head(100), use_container_width=True)

    col1, col2 = st.columns(2)
    with col1:
        st.write("**Column Types:**")
        type_df = pd.DataFrame({
            "Column": df.columns,
            "Type": df.dtypes.astype(str),
            "Non-Null": df.notna().sum(),
            "Null %": (df.isna().sum() / len(df) * 100).round(2)
        })
        st.dataframe(type_df, use_container_width=True)

    with col2:
        st.write("**Data Shape:**")
        st.write(f"- Rows: {len(df):,}")
        st.write(f"- Columns: {len(df.columns)}")
        st.write(f"- Memory: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

    with tab2:
        st.subheader("Quick Visualizations")

        # Get column types
        numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()
        categorical_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()

        chart_type = st.selectbox(
            "Chart Type",
            ["Histogram", "Scatter", "Line", "Bar", "Box Plot"]
        )

        if chart_type == "Histogram":
            col = st.selectbox("Select column", numeric_cols)
            bins = st.slider("Number of bins", 10, 100, 30)
            fig = px.histogram(df, x=col, nbins=bins, title=f"Distribution of {col}")
            st.plotly_chart(fig, use_container_width=True)

        elif chart_type == "Scatter":
            col1, col2 = st.columns(2)
            x_col = col1.selectbox("X axis", numeric_cols)
            y_col = col2.selectbox("Y axis", numeric_cols, index=min(1, len(numeric_cols)-1))
            color_col = st.selectbox("Color by (optional)", ["None"] + categorical_cols)

            fig = px.scatter(
                df, x=x_col, y=y_col,
                color=None if color_col == "None" else color_col,
                title=f"{y_col} vs {x_col}"
            )
            st.plotly_chart(fig, use_container_width=True)

        elif chart_type == "Line":
            x_col = st.selectbox("X axis", df.columns.tolist())
            y_cols = st.multiselect("Y axis (select multiple)", numeric_cols)
            if y_cols:
                fig = px.line(df, x=x_col, y=y_cols, title="Line Chart")
                st.plotly_chart(fig, use_container_width=True)

        elif chart_type == "Bar":
            cat_col = st.selectbox("Category", categorical_cols if categorical_cols else df.columns.tolist())
            val_col = st.selectbox("Value", numeric_cols)
            agg_func = st.selectbox("Aggregation", ["sum", "mean", "count", "median"])

            agg_data = df.groupby(cat_col)[val_col].agg(agg_func).reset_index()
            fig = px.bar(agg_data, x=cat_col, y=val_col, title=f"{agg_func.title()} of {val_col} by {cat_col}")
            st.plotly_chart(fig, use_container_width=True)

        elif chart_type == "Box Plot":
            val_col = st.selectbox("Value", numeric_cols)
            group_col = st.selectbox("Group by (optional)", ["None"] + categorical_cols)

            fig = px.box(
                df, y=val_col,
                x=None if group_col == "None" else group_col,
                title=f"Distribution of {val_col}"
            )
            st.plotly_chart(fig, use_container_width=True)

    with tab3:
        st.subheader("Statistical Summary")

        # Numeric statistics
        if numeric_cols:
            st.write("**Numeric Columns:**")
            st.dataframe(df[numeric_cols].describe(), use_container_width=True)

        # Categorical statistics
        if categorical_cols:
            st.write("**Categorical Columns:**")
            for col in categorical_cols[:5]:  # Limit to first 5
                with st.expander(f"{col} value counts"):
                    st.dataframe(
                        df[col].value_counts().head(20).reset_index(),
                        use_container_width=True
                    )

        # Correlation matrix
        if len(numeric_cols) > 1:
            st.write("**Correlation Matrix:**")
            corr = df[numeric_cols].corr()
            fig = px.imshow(
                corr,
                text_auto=".2f",
                color_continuous_scale="RdBu_r",
                aspect="auto"
            )
            st.plotly_chart(fig, use_container_width=True)

    with tab4:
        st.subheader("Filter & Export")

        # Dynamic filtering
        st.write("**Apply Filters:**")

        filtered_df = df.copy()

        for col in df.columns[:10]:  # Limit columns for UI
            # Covers every numeric dtype (int32, float32, ...), not only int64/float64
            if pd.api.types.is_numeric_dtype(df[col]):
                min_val, max_val = float(df[col].min()), float(df[col].max())
                if min_val < max_val:
                    range_val = st.slider(
                        f"{col} range",
                        min_val, max_val, (min_val, max_val),
                        key=f"filter_{col}"
                    )
                    filtered_df = filtered_df[
                        (filtered_df[col] >= range_val[0]) &
                        (filtered_df[col] <= range_val[1])
                    ]
            elif df[col].dtype == "object" and df[col].nunique() < 20:
                selected = st.multiselect(
                    f"{col}",
                    options=df[col].unique().tolist(),
                    default=df[col].unique().tolist(),
                    key=f"filter_{col}"
                )
                filtered_df = filtered_df[filtered_df[col].isin(selected)]

        st.write(f"**Filtered data: {len(filtered_df):,} rows**")
        st.dataframe(filtered_df.head(100), use_container_width=True)

        # Export
        col1, col2 = st.columns(2)
        with col1:
            csv = filtered_df.to_csv(index=False)
            st.download_button(
                "Download as CSV",
                csv,
                "filtered_data.csv",
                "text/csv"
            )
        with col2:
            import io

            # Write the workbook to an in-memory buffer and offer it as a download,
            # rather than writing a file on the server that the user never receives
            excel_buffer = io.BytesIO()
            with pd.ExcelWriter(excel_buffer, engine="openpyxl") as writer:
                filtered_df.to_excel(writer, index=False)
            st.download_button(
                "Download as Excel",
                excel_buffer.getvalue(),
                "filtered_data.xlsx",
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
            )

else:
    st.info("Please upload a data file to get started.")

    # Sample data option
    if st.button("Load sample data"):
        import numpy as np
        np.random.seed(42)

        sample_df = pd.DataFrame({
            "date": pd.date_range("2025-01-01", periods=100),
            "category": np.random.choice(["A", "B", "C"], 100),
            "value": np.random.randn(100) * 100 + 500,
            "count": np.random.randint(1, 100, 100)
        })
        sample_df.to_csv("/tmp/sample_data.csv", index=False)
        st.success("Sample data created! Upload '/tmp/sample_data.csv'")
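The explorer derives the file type by splitting the uploaded name on "."; for a name with no extension, that returns the whole name. A minimal sketch of a stricter check follows. The helper name `detect_file_type` and the `SUPPORTED_TYPES` set are assumptions made for illustration and are not part of Streamlit.

```python
from pathlib import PurePath

# Formats accepted by the uploader in the example above
SUPPORTED_TYPES = {"csv", "xlsx", "parquet"}


def detect_file_type(filename: str) -> str:
    """Return the lowercase extension of filename, or raise ValueError.

    Hypothetical helper for illustration; not a Streamlit API.
    """
    # PurePath.suffix is "" when there is no extension, ".csv" otherwise
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return suffix
```

In the app, `detect_file_type(uploaded_file.name)` could replace the manual split, with the `ValueError` surfaced through `st.error`.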



Example 3: ML Model Demo


python

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

st.set_page_config(page_title="ML Demo", page_icon="🤖", layout="wide")

st.title("🤖 Machine Learning Model Demo")

# Sidebar
st.sidebar.header("Model Configuration")

# Model parameters
n_estimators = st.sidebar.slider("Number of trees", 10, 200, 100)
max_depth = st.sidebar.slider("Max depth", 1, 20, 5)
test_size = st.sidebar.slider("Test size", 0.1, 0.5, 0.2)

# Generate sample data
@st.cache_data
def generate_sample_data(n_samples=1000):
    np.random.seed(42)

    # Features
    X = pd.DataFrame({
        "feature_1": np.random.randn(n_samples),
        "feature_2": np.random.randn(n_samples),
        "feature_3": np.random.uniform(0, 100, n_samples),
        "feature_4": np.random.choice(["A", "B", "C"], n_samples)
    })

    # Target: 1 when feature_1 > 0 or feature_3 > 50 (a deterministic rule, no noise added)
    y = (
        (X["feature_1"] > 0).astype(int) +
        (X["feature_3"] > 50).astype(int)
    ) >= 1
    y = y.astype(int)

    return X, y

X, y = generate_sample_data()

# Convert categorical
X_encoded = pd.get_dummies(X, columns=["feature_4"])

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=test_size, random_state=42
)

# Train model
# The training data is hashed as part of the cache key (no "_" prefix), so changing
# the test size, and therefore the split, retrains the model instead of reusing a stale one
@st.cache_resource
def train_model(n_est, depth, X_train, y_train):
    model = RandomForestClassifier(
        n_estimators=n_est,
        max_depth=depth,
        random_state=42
    )
    model.fit(X_train, y_train)
    return model

# Training
with st.spinner("Training model..."):
    model = train_model(n_estimators, max_depth, X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Results
st.subheader("Model Performance")

col1, col2, col3 = st.columns(3)
col1.metric("Accuracy", f"{accuracy:.2%}")
col2.metric("Training Samples", f"{len(X_train):,}")
col3.metric("Test Samples", f"{len(X_test):,}")

# Feature importance
st.subheader("Feature Importance")
importance_df = pd.DataFrame({
    "Feature": X_encoded.columns,
    "Importance": model.feature_importances_
}).sort_values("Importance", ascending=True)

fig = px.bar(
    importance_df,
    x="Importance",
    y="Feature",
    orientation="h",
    title="Feature Importance"
)
st.plotly_chart(fig, use_container_width=True)

# Interactive prediction
st.subheader("Make a Prediction")

with st.form("prediction_form"):
    col1, col2 = st.columns(2)

    with col1:
        f1 = st.number_input("Feature 1", value=0.0)
        f2 = st.number_input("Feature 2", value=0.0)

    with col2:
        f3 = st.number_input("Feature 3", min_value=0.0, max_value=100.0, value=50.0)
        f4 = st.selectbox("Feature 4", ["A", "B", "C"])

    submitted = st.form_submit_button("Predict")

    if submitted:
        # Prepare input
        input_data = pd.DataFrame({
            "feature_1": [f1],
            "feature_2": [f2],
            "feature_3": [f3],
            "feature_4": [f4]
        })
        input_encoded = pd.get_dummies(input_data, columns=["feature_4"])

        # Align columns
        for col in X_encoded.columns:
            if col not in input_encoded.columns:
                input_encoded[col] = 0
        input_encoded = input_encoded[X_encoded.columns]

        # Predict
        prediction = model.predict(input_encoded)[0]
        proba = model.predict_proba(input_encoded)[0]

        st.success(f"Prediction: **{'Positive' if prediction == 1 else 'Negative'}**")
        st.write(f"Confidence: {max(proba):.2%}")
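The "Align columns" step is needed because `pd.get_dummies` applied to a single row only creates a column for the category present in that row, so the input would otherwise lack the columns the model was trained on. A dependency-free sketch of the same idea follows. The helper `one_hot_row` is a hypothetical name, and its column names follow the `<prefix>_<category>` pattern that `pd.get_dummies` produces by default.

```python
def one_hot_row(value: str, categories: list[str], prefix: str) -> dict[str, int]:
    """Encode one categorical value against a fixed list of known categories.

    Hypothetical helper for illustration; not part of pandas or Streamlit.
    """
    if value not in categories:
        raise ValueError(f"Unknown category: {value!r}")
    # Every known category gets a column, so the layout never depends on the input
    return {f"{prefix}_{cat}": int(cat == value) for cat in categories}


row = one_hot_row("B", ["A", "B", "C"], prefix="feature_4")
print(row)  # {'feature_4_A': 0, 'feature_4_B': 1, 'feature_4_C': 0}
```

Encoding against the full category list up front means the column-filling loop is not needed, at the cost of keeping that list alongside the model.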


Deployment Patterns

Streamlit Cloud Deployment

yaml

# requirements.txt
streamlit>=1.32.0
pandas>=2.0.0
polars>=0.20.0
plotly>=5.18.0
numpy>=1.24.0

toml

# .streamlit/config.toml
[theme]
primaryColor="#1f77b4"
backgroundColor="#ffffff"
secondaryBackgroundColor="#f0f2f6"
textColor="#262730"
font="sans serif"

[server]
maxUploadSize=200
enableXsrfProtection=true

Docker Deployment

dockerfile

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# The slim image does not ship curl, which the HEALTHCHECK below needs
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8501

HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health

ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

bash

# Build and run
docker build -t my-streamlit-app .
docker run -p 8501:8501 my-streamlit-app

Best Practices

1. Use Caching Appropriately

python

# GOOD: Cache data loading
@st.cache_data
def load_data():
    return pd.read_csv("data.csv")

# GOOD: Cache resources (DB connections, models)
@st.cache_resource
def get_model():
    return load_model("model.pkl")

# AVOID: Caching with unhashable arguments
# Use _arg prefix to skip hashing
@st.cache_data
def process_data(_db_connection, query):
    return _db_connection.execute(query)
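Skipping an argument with the underscore prefix has a consequence worth seeing: the cached result is keyed only on the remaining arguments. The toy memoizer below is a simplified model of that convention and not Streamlit's implementation. The names `cache_skipping_underscore_args` and `run_query` are made up for this sketch.

```python
import functools
import inspect


def cache_skipping_underscore_args(func):
    """Toy memoizer: parameters whose names start with "_" are left out of the cache key."""
    signature = inspect.signature(func)
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        # Build the key only from parameters that do not start with an underscore
        key = tuple(
            (name, value)
            for name, value in bound.arguments.items()
            if not name.startswith("_")
        )
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []


@cache_skipping_underscore_args
def run_query(_connection, query):
    calls.append(query)
    return f"{_connection}:{query}"


first = run_query("conn-1", "SELECT 1")
second = run_query("conn-2", "SELECT 1")  # different connection, same query
```

Here `second` is served from the cache even though the connection differs. Skipping an argument is therefore only safe when that argument does not change the result.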

2. Organize Large Apps

python

# utils/data.py
def load_data():
    pass

# utils/charts.py
def create_chart(df):
    pass

# app.py
from utils.data import load_data
from utils.charts import create_chart

3. Handle State Carefully

python

# GOOD: Initialize state at the top
if "data" not in st.session_state:
    st.session_state.data = None

# GOOD: Use callbacks for complex updates
def on_filter_change():
    st.session_state.filtered_data = apply_filter(st.session_state.data)

st.selectbox("Filter", options, on_change=on_filter_change)
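An `on_change` callback runs before the script reruns, and a widget created with a `key` exposes its current value in `st.session_state` under that key, so the callback can read the new selection there. The sketch below uses a plain dict in place of `st.session_state` so that it runs without Streamlit. The `apply_filter` body, the `filter_choice` key, and the sample rows are all assumptions made for illustration.

```python
def apply_filter(rows, category):
    """Hypothetical filter: keep rows whose "category" matches, or all rows for "All"."""
    if category == "All":
        return list(rows)
    return [row for row in rows if row["category"] == category]


# A plain dict stands in for st.session_state so the logic runs without Streamlit
session_state = {
    "data": [
        {"category": "A", "value": 1},
        {"category": "B", "value": 2},
        {"category": "A", "value": 3},
    ],
    # In an app, st.selectbox(..., key="filter_choice") would maintain this entry
    "filter_choice": "A",
}


def on_filter_change():
    # Read the widget's current value through its key, then store the derived data
    session_state["filtered_data"] = apply_filter(
        session_state["data"], session_state["filter_choice"]
    )


on_filter_change()
```

After the call, `session_state["filtered_data"]` holds the two rows in category "A".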

4. Optimize Performance

python

# Use containers for layout stability
placeholder = st.empty()

# Batch widget updates in forms
with st.form("filters"):
    # Multiple widgets
    st.form_submit_button()

# Use columns for responsive layout
cols = st.columns([1, 2, 1])

Troubleshooting

Common Issues

Issue: App reruns on every interaction

python

# Use forms to batch inputs
with st.form("my_form"):
    input1 = st.text_input("Input")
    submit = st.form_submit_button()

Issue: Slow data loading

python

# Add caching
@st.cache_data(ttl=3600)
def load_data():
    return pd.read_csv("large_file.csv")

Issue: Memory issues with large files

python

# Limit the number of rows read
@st.cache_data
def load_large_file(path, nrows=10000):
    return pd.read_csv(path, nrows=nrows)

Issue: Widget state lost on rerun

python

# Persist in session state
if "value" not in st.session_state:
    st.session_state.value = default_value

# Use key parameter
st.text_input("Name", key="user_name")
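Passing `nrows` caps how many rows are read; it does not process the rest of the file. When the whole file is needed but does not fit comfortably in memory, it can be streamed in fixed-size chunks and reduced as it goes. The standard-library sketch below shows the idea. The helper `iter_chunks` and the column name `value` are assumptions made for illustration; pandas offers the same pattern through the `chunksize` parameter of `read_csv`.

```python
import csv
import io
from itertools import islice


def iter_chunks(file_obj, chunk_size):
    """Yield lists of at most chunk_size row dicts from an open CSV file."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# In-memory stand-in for a large file on disk
sample = io.StringIO("value\n1\n2\n3\n4\n5\n")

total = 0.0
chunk_sizes = []
for chunk in iter_chunks(sample, chunk_size=2):
    chunk_sizes.append(len(chunk))
    # Only one chunk is held in memory at a time
    total += sum(float(row["value"]) for row in chunk)
```

Only the running aggregate (`total` here) is kept between chunks, so peak memory depends on `chunk_size` and not on the size of the file.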

Version History


1.0.0 (2026-01-17): Initial release
- Basic app structure and widgets
- Layout and organization patterns
- Data visualization integration
- Caching strategies
- Session state management
- Multi-page applications
- Complete dashboard examples
- Deployment patterns
- Best practices and troubleshooting


Resources


Official Docs: https://docs.streamlit.io/
Gallery: https://streamlit.io/gallery
Components: https://streamlit.io/components
Cloud: https://streamlit.io/cloud
GitHub: https://github.com/streamlit/streamlit

Build beautiful data apps with pure Python - no frontend experience required!
