Ibis Data Interface

Ibis 数据接口

Ibis provides a database-agnostic Python DataFrame API. Write queries once in Python; Ibis translates them to optimized SQL for the connected backend (DuckDB, PostgreSQL, SQLite, etc.).

Ibis 提供了一个与数据库无关的Python DataFrame API。只需用Python编写一次查询，Ibis就会将其转换为适配所连接后端（DuckDB、PostgreSQL、SQLite等）的优化SQL。

Why Ibis

为什么选择Ibis

Portability: Develop with DuckDB, deploy against PostgreSQL -- change only the connection
Lazy evaluation: Operations build an expression tree; nothing executes until
```
.execute()
```
Full SQL power: Window functions, CTEs, joins, aggregations -- all through Python
No ORM: You get SQL performance without SQL strings

可移植性：使用DuckDB开发，部署到PostgreSQL环境——只需修改连接配置即可
惰性求值：操作会构建表达式树，直到调用
```
.execute()
```
才会执行
完整SQL能力：窗口函数、CTE、连接、聚合——全部通过Python实现
无ORM：无需编写SQL字符串即可获得SQL级别的性能

Connecting

连接数据库

python

import ibis

python

import ibis

DuckDB (default for local/parquet work)

DuckDB（本地/Parquet文件处理的默认选择）

con = ibis.duckdb.connect() con = ibis.duckdb.connect("my.duckdb")

PostgreSQL

con = ibis.postgres.connect(host="localhost", database="mydb", user="user", password="pass")

SQLite

con = ibis.sqlite.connect("my.sqlite")

Read files directly

直接读取文件

table = con.read_parquet("data.parquet") table = con.read_csv("data.csv")

undefined

table = con.read_parquet("data.parquet") table = con.read_csv("data.csv")

undefined

Core Operations

核心操作

python

undefined

python

undefined

Explore

探索数据

table.schema() # column names and types table.head(10) # preview rows table.describe() # summary statistics table.count().execute() # row count

table.schema() # 列名和类型 table.head(10) # 预览行数据 table.describe() # 统计摘要 table.count().execute() # 行数统计

Select and filter

选择与过滤

selected = table.select("id", "amount", "date") filtered = table.filter((table.amount > 100) & (table.date >= "2024-01-01")) sorted_data = table.order_by(table.amount.desc())

Transform

数据转换

enriched = table.mutate( revenue=table.quantity * table.unit_price, year=table.date.year(), size=ibis.case() .when(table.amount < 100, "small") .when(table.amount < 1000, "medium") .else_("large") .end() )

Aggregate

聚合操作

summary = ( table.group_by("category") .aggregate( total=table.amount.sum(), avg=table.amount.mean(), count=table.count() ) )

Join

表连接

joined = ( orders .join(customers, orders.customer_id == customers.id, how="left") .select(orders.order_id, orders.amount, customers.name) )

Window functions

窗口函数

ranked = table.mutate( rank=table.amount.rank().over( ibis.window(group_by="category", order_by=table.amount.desc()) ) )

Execute and export

执行与导出

df = summary.execute() # -> pandas DataFrame con.to_parquet(summary, "out.parquet") df.to_csv("out.csv", index=False)

undefined

df = summary.execute() # -> pandas DataFrame con.to_parquet(summary, "out.parquet") df.to_csv("out.csv", index=False)

undefined

API Reference

API 参考

API	What it covers
Table expressions	`select` , `filter` , `mutate` , `group_by` , `agg` , `join` , `order_by`
Selectors	Choose columns by name, type, or regex
Generic expressions	`.cast()` , `.isnull()` , `.fillna()` , `case()` , `.ifelse()`
Numeric expressions	`sum()` , `mean()` , `std()` , rounding, logarithms
String expressions	Slicing, regex, case conversion, stripping
Temporal expressions	`.year()` , `.month()` , interval arithmetic, formatting
Collection expressions	Array/map operations, unnesting
JSON expressions	Path-based extraction from JSON columns

API	涵盖内容
Table expressions	`select` , `filter` , `mutate` , `group_by` , `agg` , `join` , `order_by`
Selectors	按名称、类型或正则表达式选择列
Generic expressions	`.cast()` , `.isnull()` , `.fillna()` , `case()` , `.ifelse()`
Numeric expressions	`sum()` , `mean()` , `std()` , 取整、对数运算
String expressions	切片、正则、大小写转换、去除空白
Temporal expressions	`.year()` , `.month()` , 间隔运算、格式化
Collection expressions	数组/映射操作、展开嵌套
JSON expressions	从JSON列中基于路径提取数据

Best Practices

最佳实践

Filter early: Reduce data volume before aggregations
Stay lazy: Chain operations before calling
```
.execute()
```
Use selectors: Apply operations to multiple columns programmatically
Handle nulls: Check with
```
.isnull()
```
and handle with
```
.fillna()
```
explicitly
Check SQL: Use
```
ibis.to_sql(expr)
```
to inspect generated queries

尽早过滤：在聚合前减少数据量
保持惰性：在调用
```
.execute()
```
前链式调用操作
使用选择器：以编程方式对多列应用操作
处理空值：使用
```
.isnull()
```
检查并显式用
```
.fillna()
```
处理
检查SQL：使用
```
ibis.to_sql(expr)
```
查看生成的查询语句

Installation

安装

bash

uv add "ibis-framework[duckdb]"        # DuckDB backend
uv add "ibis-framework[postgres]"      # PostgreSQL backend

bash

uv add "ibis-framework[duckdb]"        # DuckDB 后端
uv add "ibis-framework[postgres]"      # PostgreSQL 后端

ibis-data

Original

Translation

Ibis Data Interface

Ibis 数据接口

Why Ibis

为什么选择Ibis

Connecting

连接数据库

DuckDB (default for local/parquet work)

DuckDB（本地/Parquet文件处理的默认选择）

PostgreSQL

PostgreSQL

SQLite

SQLite

Read files directly

直接读取文件

Core Operations

核心操作

Explore

探索数据

Select and filter

选择与过滤

Transform

数据转换

Aggregate

聚合操作

Join

表连接

Window functions

窗口函数

Execute and export

执行与导出

API Reference

API 参考

Best Practices

最佳实践

Installation

安装