ibis-data

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Ibis Data Interface

Ibis 数据接口

Ibis provides a database-agnostic Python DataFrame API. Write queries once in Python; Ibis translates them to optimized SQL for the connected backend (DuckDB, PostgreSQL, SQLite, etc.).
Ibis 提供了一个与数据库无关的Python DataFrame API。只需用Python编写一次查询,Ibis就会将其转换为适配所连接后端(DuckDB、PostgreSQL、SQLite等)的优化SQL。

Why Ibis

为什么选择Ibis

  • Portability: Develop with DuckDB, deploy against PostgreSQL -- change only the connection
  • Lazy evaluation: Operations build an expression tree; nothing executes until
    .execute()
  • Full SQL power: Window functions, CTEs, joins, aggregations -- all through Python
  • No ORM: You get SQL performance without SQL strings
  • 可移植性:使用DuckDB开发,部署到PostgreSQL环境——只需修改连接配置即可
  • 惰性求值:操作会构建表达式树,直到调用
    .execute()
    才会执行
  • 完整SQL能力:窗口函数、CTE、连接、聚合——全部通过Python实现
  • 无ORM:无需编写SQL字符串即可获得SQL级别的性能

Connecting

连接数据库

python
import ibis
python
import ibis

DuckDB (default for local/parquet work)

DuckDB(本地/Parquet文件处理的默认选择)

con = ibis.duckdb.connect() con = ibis.duckdb.connect("my.duckdb")
con = ibis.duckdb.connect() con = ibis.duckdb.connect("my.duckdb")

PostgreSQL

PostgreSQL

con = ibis.postgres.connect(host="localhost", database="mydb", user="user", password="pass")
con = ibis.postgres.connect(host="localhost", database="mydb", user="user", password="pass")

SQLite

SQLite

con = ibis.sqlite.connect("my.sqlite")
con = ibis.sqlite.connect("my.sqlite")

Read files directly

直接读取文件

table = con.read_parquet("data.parquet") table = con.read_csv("data.csv")
undefined
table = con.read_parquet("data.parquet") table = con.read_csv("data.csv")
undefined

Core Operations

核心操作

python
undefined
python
undefined

Explore

探索数据

table.schema() # column names and types table.head(10) # preview rows table.describe() # summary statistics table.count().execute() # row count
table.schema() # 列名和类型 table.head(10) # 预览行数据 table.describe() # 统计摘要 table.count().execute() # 行数统计

Select and filter

选择与过滤

selected = table.select("id", "amount", "date") filtered = table.filter((table.amount > 100) & (table.date >= "2024-01-01")) sorted_data = table.order_by(table.amount.desc())
selected = table.select("id", "amount", "date") filtered = table.filter((table.amount > 100) & (table.date >= "2024-01-01")) sorted_data = table.order_by(table.amount.desc())

Transform

数据转换

enriched = table.mutate( revenue=table.quantity * table.unit_price, year=table.date.year(), size=ibis.case() .when(table.amount < 100, "small") .when(table.amount < 1000, "medium") .else_("large") .end() )
enriched = table.mutate( revenue=table.quantity * table.unit_price, year=table.date.year(), size=ibis.case() .when(table.amount < 100, "small") .when(table.amount < 1000, "medium") .else_("large") .end() )

Aggregate

聚合操作

summary = ( table.group_by("category") .aggregate( total=table.amount.sum(), avg=table.amount.mean(), count=table.count() ) )
summary = ( table.group_by("category") .aggregate( total=table.amount.sum(), avg=table.amount.mean(), count=table.count() ) )

Join

表连接

joined = ( orders .join(customers, orders.customer_id == customers.id, how="left") .select(orders.order_id, orders.amount, customers.name) )
joined = ( orders .join(customers, orders.customer_id == customers.id, how="left") .select(orders.order_id, orders.amount, customers.name) )

Window functions

窗口函数

ranked = table.mutate( rank=table.amount.rank().over( ibis.window(group_by="category", order_by=table.amount.desc()) ) )
ranked = table.mutate( rank=table.amount.rank().over( ibis.window(group_by="category", order_by=table.amount.desc()) ) )

Execute and export

执行与导出

df = summary.execute() # -> pandas DataFrame con.to_parquet(summary, "out.parquet") df.to_csv("out.csv", index=False)
undefined
df = summary.execute() # -> pandas DataFrame con.to_parquet(summary, "out.parquet") df.to_csv("out.csv", index=False)
undefined

API Reference

API 参考

APIWhat it covers
Table expressions
select
,
filter
,
mutate
,
group_by
,
agg
,
join
,
order_by
SelectorsChoose columns by name, type, or regex
Generic expressions
.cast()
,
.isnull()
,
.fillna()
,
case()
,
.ifelse()
Numeric expressions
sum()
,
mean()
,
std()
, rounding, logarithms
String expressionsSlicing, regex, case conversion, stripping
Temporal expressions
.year()
,
.month()
, interval arithmetic, formatting
Collection expressionsArray/map operations, unnesting
JSON expressionsPath-based extraction from JSON columns
API涵盖内容
Table expressions
select
,
filter
,
mutate
,
group_by
,
agg
,
join
,
order_by
Selectors按名称、类型或正则表达式选择列
Generic expressions
.cast()
,
.isnull()
,
.fillna()
,
case()
,
.ifelse()
Numeric expressions
sum()
,
mean()
,
std()
, 取整、对数运算
String expressions切片、正则、大小写转换、去除空白
Temporal expressions
.year()
,
.month()
, 间隔运算、格式化
Collection expressions数组/映射操作、展开嵌套
JSON expressions从JSON列中基于路径提取数据

Best Practices

最佳实践

  • Filter early: Reduce data volume before aggregations
  • Stay lazy: Chain operations before calling
    .execute()
  • Use selectors: Apply operations to multiple columns programmatically
  • Handle nulls: Check with
    .isnull()
    and handle with
    .fillna()
    explicitly
  • Check SQL: Use
    ibis.to_sql(expr)
    to inspect generated queries
  • 尽早过滤:在聚合前减少数据量
  • 保持惰性:在调用
    .execute()
    前链式调用操作
  • 使用选择器:以编程方式对多列应用操作
  • 处理空值:使用
    .isnull()
    检查并显式用
    .fillna()
    处理
  • 检查SQL:使用
    ibis.to_sql(expr)
    查看生成的查询语句

Installation

安装

bash
uv add "ibis-framework[duckdb]"        # DuckDB backend
uv add "ibis-framework[postgres]"      # PostgreSQL backend
bash
uv add "ibis-framework[duckdb]"        # DuckDB 后端
uv add "ibis-framework[postgres]"      # PostgreSQL 后端