# sqlmodel-pg-kit

Reusable SQLModel + PostgreSQL kit with a src layout, sync/async engines, and generic CRUD repositories. Managed via uv (PEP 621/517) and built with hatchling. Includes minimal tests and a conda recipe.
## Features

- Config dataclass that builds sync/async URLs
- Engine and session factories (sync/async)
- Generic `Repository`/`AsyncRepository` with create/get/list/update/delete/bulk_insert
- Examples and notebooks for common SQLModel patterns
- PyPI/conda packaging setup and a smoke test
## Quickstart (uv)

- Create a venv: `uv venv`
- Editable install: `uv pip install -e .`
- Run tests: `uv pip install pytest && pytest -q`
## Local Postgres (Docker)

- Start the container: `docker compose up -d`
- Stop when done: `docker compose down`
- Default credentials match `DatabaseConfig`: user `appuser`, password `changeme`, database `appdb`
- Export `SQL_SSLMODE=disable` (the container does not use TLS by default)
- Port: 5433 on the host (mapped from container port 5432)
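The repo's compose file is not reproduced here; a minimal `docker-compose.yml` consistent with the defaults above might look like the following sketch (the service name and image tag are assumptions, not necessarily what the repo uses):

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: appdb
    ports:
      - "5433:5432"   # host 5433 -> container 5432, matching the note above
```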
## Usage

Configure the environment (either `SQL_*` or Postgres `PG*` variables). Example for a container address:

```bash
export SQL_HOST=192.168.64.8
export SQL_PORT=5432
export SQL_USER=postgres
export SQL_PASSWORD=change-me-strong
export SQL_DATABASE=appdb
export SQL_SSLMODE=disable
```
You can perform CRUD with SQLModel without writing raw SQL strings. This kit exposes:

- `create_all()` to create tables from models collected in `SQLModel.metadata`
- Generic repositories: `Repository(Model)` and `AsyncRepository(Model)` for CRUD and bulk insert
- Session helpers: `get_session()` and `get_async_session()` for custom queries using SQLModel/SQLAlchemy expressions
## Choose a Backend: SQLite or Postgres

- SQLite (in-memory) quick demo, no environment variables needed:

```python
from typing import Optional

from sqlmodel import SQLModel, Field

from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit import db  # access the engine factory

# Override the engine with an in-memory SQLite database
db.engine = db.create_engine("sqlite:///:memory:", echo=False)

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str

create_all()  # creates tables on the current engine (SQLite)
```
- SQLite (file) quick demo:

```python
from sqlmodel_pg_kit import db, create_all

db.engine = db.create_engine("sqlite:///./demo.db", echo=False)
create_all()
```
- Postgres (recommended for production) via environment variables:

```bash
export SQL_HOST=127.0.0.1
export SQL_PORT=5432
export SQL_USER=postgres
export SQL_PASSWORD=change-me-strong
export SQL_DATABASE=appdb
export SQL_SSLMODE=disable  # or require/verify-full as needed
```

Then your Python code can simply call:

```python
from sqlmodel_pg_kit import create_all

create_all()  # uses the default Postgres engine constructed from the environment
```
- Postgres explicit configuration (no environment variables needed):

```python
from sqlmodel_pg_kit import db, create_all

cfg = db.DatabaseConfig(
    host="127.0.0.1", port=5432, user="postgres",
    password="change-me-strong", database="appdb", sslmode="disable",
)
db.engine = db.create_engine(cfg.sync_url(), echo=False)
create_all()
```
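For reference, `DatabaseConfig.sync_url()` presumably assembles a SQLAlchemy connection URL from these fields. A stdlib-only sketch of equivalent logic (the `DemoConfig` class, the `url_for` helper, and the `postgresql+psycopg` driver prefix are illustrative assumptions, not the kit's actual API):

```python
from dataclasses import dataclass
from urllib.parse import quote_plus

@dataclass
class DemoConfig:
    """Hypothetical stand-in for DatabaseConfig, for illustration only."""
    host: str = "127.0.0.1"
    port: int = 5432
    user: str = "postgres"
    password: str = "change-me-strong"
    database: str = "appdb"
    sslmode: str = "disable"

    def url_for(self, driver: str) -> str:
        # Percent-encode credentials so special characters survive in the URL
        creds = f"{quote_plus(self.user)}:{quote_plus(self.password)}"
        return (
            f"postgresql+{driver}://{creds}@{self.host}:{self.port}"
            f"/{self.database}?sslmode={self.sslmode}"
        )

print(DemoConfig().url_for("psycopg"))
```

The same shape works for the async URL by swapping in an async driver name (e.g. `url_for("asyncpg")`).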
## CSV → SQLModel → Table (Auto-generate a Model)

Auto-generate a SQLModel class from a CSV header and import the rows.

- CLI example:

```bash
uv run python examples/06_csv_to_sqlmodel.py --csv ./data/molecules.csv --sqlite ./demo.db
# Or with Postgres via env vars:
# export SQL_HOST=...; export SQL_USER=...; ...
# uv run python examples/06_csv_to_sqlmodel.py --csv ./data/molecules.csv
```
- In code:

```python
from sqlmodel_pg_kit.csv_import import build_model_from_csv, insert_rows
from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit.db import get_session

spec, rows = build_model_from_csv("./data/molecules.csv", class_name="Molecules", table_name="molecules")
create_all()
with get_session() as s:
    n = insert_rows(spec.model, rows, s)
    print("inserted:", n)
```
Type inference is per column: bool if all values are true/false/1/0/yes/no; else int; else float; otherwise str. Empty strings and NA/NaN/None/null become NULL.
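These rules can be sketched in plain Python (a simplified illustration of the behavior described above, not the kit's actual implementation):

```python
NULLS = {"", "na", "nan", "none", "null"}
BOOLS = {"true", "false", "1", "0", "yes", "no"}

def infer_column_type(values: list[str]) -> type:
    """Infer a column type from string cell values, skipping null-like cells."""
    cells = [v.strip() for v in values if v.strip().lower() not in NULLS]
    if not cells:
        return str
    # bool wins only if every non-null cell is a recognized boolean token
    if all(c.lower() in BOOLS for c in cells):
        return bool
    # then try int, then float; fall back to str
    for candidate in (int, float):
        try:
            for c in cells:
                candidate(c)
            return candidate
        except ValueError:
            continue
    return str

print(infer_column_type(["1", "2", ""]))   # int: empty cell is treated as NULL
print(infer_column_type(["yes", "no"]))    # bool
print(infer_column_type(["0.5", "2"]))     # float
```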
## Sync CRUD (helpers)

```bash
uv run python examples/01_sync_crud.py
```

Shows create/get/list/update/delete using `Repository`. Minimal snippet:

```python
from typing import Optional

from sqlmodel import SQLModel, Field

from sqlmodel_pg_kit import create_all, Repository
from sqlmodel_pg_kit.db import get_session

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(index=True)
    age: Optional[int] = None

create_all()
repo = Repository(Hero)
with get_session() as s:
    h = repo.create(s, {"name": "Alice", "age": 20})
    h2 = repo.get(s, h.id)
    page = repo.list(s, page=1, size=5)
    h3 = repo.update(s, h.id, age=21)
    ok = repo.delete(s, h.id)
```
For a container-backed workflow, run `uv run python examples/07_postgres_minimal.py` (paired with notebooks/07_postgres_minimal.ipynb). That walkthrough mirrors a REST-style request cycle:

- wipes existing demo rows so each run is deterministic
- upserts seed rows via `Repository.bulk_insert`
- paginates filtered results with `Repository.list(..., where=..., order_by=..., page=..., size=...)`
- performs fuzzy search using `select(...).where(Model.name.ilike(...))`
- counts inventory with `session.exec(select(func.count(...))).scalar()` (compatible with SQLAlchemy 1.4/2.x)
- updates and deletes with the repository helpers you would call from PATCH/DELETE handlers
## Bulk insert + filters/pagination

```bash
uv run python examples/02_bulk_and_filters.py
```

Demonstrates `Repository.bulk_insert(rows)` and filtering with SQLModel expressions:

```python
from typing import Optional

from sqlmodel import select, SQLModel, Field

from sqlmodel_pg_kit import Repository
from sqlmodel_pg_kit.db import get_session

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    age: Optional[int] = None

with get_session() as s:
    Repository(Hero).bulk_insert(s, [{"name": "PG Hero", "age": 1}, {"name": "PG Hero", "age": 2}])
    heroes = s.exec(select(Hero).where(Hero.name == "PG Hero")).all()
```
## Relationships (Team ← Hero)

```bash
uv run python examples/03_relationships.py
```

Creates a Team, assigns heroes, and reads them back with `selectinload` eager loading.
## Async CRUD

```bash
uv run python examples/04_async_crud.py
```

Uses `get_async_session()` with `AsyncRepository` to write/read without raw SQL. Call `create_all()` once before the async operations.
## Tests via Makefile

- `make test-sqlite`: run fast smoke tests on SQLite (no Postgres needed)
- `make test`: run all tests found by pytest
- `make test-pg`: run Postgres integration once on the current Python (requires SQL_/PG env)
- `make test-pg-once`: run Postgres integration across Python 3.10–3.13 (requires `uv` and SQL_/PG env)
## Cheminformatics Example

This repo includes a practical multi-table example tailored for cheminformatics workflows: examples/05_cheminformatics.py.

What it shows:

- Molecules with descriptors: `smiles`, `selfies`, `qed`, `sa_score` (no RDKit dependency at runtime; compute upstream and store).
- Many-to-many datasets: `Dataset` and the link table `MoleculeDataset` for train/holdout/etc.
- Dataclass interop: a lightweight `@dataclass MoleculeDTO` that converts to a SQLModel for fast CRUD.
- Typical queries: threshold filters, pattern matching, eager-loaded relationships, join filters.
Run it (requires SQL_/PG env):

```bash
uv run python examples/05_cheminformatics.py
```
Step-by-step outline you can adapt:

- Define the models (the link table must be defined before it is referenced as `link_model`):

```python
from typing import List, Optional

from sqlmodel import SQLModel, Field, Relationship

class MoleculeDataset(SQLModel, table=True):
    molecule_id: int = Field(foreign_key="molecule.id", primary_key=True)
    dataset_id: int = Field(foreign_key="dataset.id", primary_key=True)

class Molecule(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    smiles: str = Field(index=True)
    selfies: Optional[str] = None
    qed: Optional[float] = Field(default=None, index=True)
    sa_score: Optional[float] = Field(default=None, index=True)
    datasets: List["Dataset"] = Relationship(back_populates="molecules", link_model=MoleculeDataset)

class Dataset(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(index=True)
    molecules: List[Molecule] = Relationship(back_populates="datasets", link_model=MoleculeDataset)
```
- Bring your RDKit pipeline data in via a dataclass:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MoleculeDTO:
    smiles: str
    selfies: Optional[str] = None
    qed: Optional[float] = None
    sa_score: Optional[float] = None

    def to_model(self) -> Molecule:
        return Molecule(**vars(self))
```
- Fast CRUD:

```python
from sqlmodel import select

from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit.db import get_session

create_all()  # once

with get_session() as s:
    s.add(Molecule(smiles="CCO", qed=0.5, sa_score=2.1))
    s.commit()

with get_session() as s:
    m = s.exec(select(Molecule).where(Molecule.smiles == "CCO")).one()
    m.qed = 0.55
    s.add(m); s.commit(); s.refresh(m)

with get_session() as s:
    s.delete(m); s.commit()
```
- Link datasets and run join queries:

```python
from sqlmodel import select

from sqlmodel_pg_kit.db import get_session

with get_session() as s:
    ds = Dataset(name="train"); s.add(ds); s.commit(); s.refresh(ds)
    m = s.exec(select(Molecule).where(Molecule.smiles == "CCO")).one()
    s.add(MoleculeDataset(molecule_id=m.id, dataset_id=ds.id)); s.commit()

with get_session() as s:
    stmt = (
        select(Molecule)
        .join(MoleculeDataset, Molecule.id == MoleculeDataset.molecule_id)
        .join(Dataset, Dataset.id == MoleculeDataset.dataset_id)
        .where(Dataset.name == "train")
    )
    train_mols = s.exec(stmt).all()
```
- Tips for high-throughput tasks:
  - Bulk insert many rows: use SQLAlchemy Core `insert(Model).values(list_of_dicts)`, then `s.commit()`.
  - Paginate: `select(Model).offset((page - 1) * size).limit(size)`.
  - Eager-load related rows: `.options(selectinload(Model.rel))` to avoid N+1 queries.
  - Partial updates: load the row, set fields, commit; or use Core `update()` when you don't need ORM instances.

See examples/05_cheminformatics.py for a complete, runnable walkthrough.
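The pagination arithmetic in the tips above is easy to get wrong at the edges; a small stdlib helper that clamps page and size (the `page_bounds` name is illustrative, not part of the kit):

```python
def page_bounds(page: int, size: int, max_size: int = 100) -> tuple[int, int]:
    """Return (offset, limit) for 1-based page numbers, clamping bad input."""
    page = max(page, 1)                  # treat page <= 0 as the first page
    size = min(max(size, 1), max_size)   # keep the limit within sane bounds
    return (page - 1) * size, size

offset, limit = page_bounds(page=3, size=25)
print(offset, limit)  # prints: 50 25
```

The resulting values feed directly into `.offset(offset).limit(limit)` on a select statement.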
## Cheminformatics: RDKit + Mordred

Below are patterns for integrating RDKit and Mordred descriptor pipelines.

Modeling patterns:

- Wide columns: put high-value fields directly on `Molecule` (e.g., `qed`, `sa_score`) with B-Tree indexes for fast filters.
- JSONB payload: store a large descriptor dict in a JSONB column when you need many descriptors but query only a subset.
- Normalized table (EAV): `MoleculeDescriptor(molecule_id, name, value)` when you frequently query and index specific descriptors by name.
JSONB example (Postgres):

```python
from typing import Optional, Dict

from sqlmodel import SQLModel, Field
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB

class Molecule(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    smiles: str = Field(index=True)
    descriptors: Dict[str, float] = Field(default_factory=dict, sa_column=Column(JSONB))
```
EAV example (note the surrogate primary key; SQLModel tables require one):

```python
from typing import Optional

from sqlmodel import SQLModel, Field

class MoleculeDescriptor(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    molecule_id: int = Field(foreign_key="molecule.id", index=True)
    name: str = Field(index=True)
    value: float = Field(index=True)
```
RDKit + Mordred pipeline sketch:

```python
# Optional installs in Jupyter:
# %pip install rdkit-pypi mordred

from rdkit import Chem
from rdkit.Chem import QED
from mordred import Calculator, descriptors

mol = Chem.MolFromSmiles("c1ccccc1O")
qed = float(QED.qed(mol))

calc = Calculator(descriptors, ignore_3D=True)
md = calc(mol)  # Result object: descriptor name -> value
# Keep only numeric values (Mordred reports errors as non-numeric objects)
desc = {str(k): float(v) for k, v in md.asdict().items() if isinstance(v, (int, float))}

from sqlmodel_pg_kit.db import get_session

with get_session() as s:
    m = Molecule(smiles="c1ccccc1O", descriptors=desc)
    s.add(m); s.commit(); s.refresh(m)
    # Filter by a descriptor threshold:
    # for JSONB, use SQLAlchemy JSON operators, or extract hot features into wide columns.
```
Indexing tips:

- Create B-Tree indexes on hot numeric columns (`qed`, `sa_score`, etc.).
- For JSONB, consider GIN indexes with `jsonb_path_ops` on frequently accessed keys.
- For EAV, add `(name, value)` composite indexes and partial indexes for the top N descriptor names.
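As a concrete illustration of these tips (index names are made up, and table/column names assume the default SQLModel table naming for the models above):

```sql
-- B-Tree index on a hot numeric column
CREATE INDEX idx_molecule_qed ON molecule (qed);

-- GIN index over the JSONB descriptor payload
CREATE INDEX idx_molecule_descriptors
    ON molecule USING GIN (descriptors jsonb_path_ops);

-- EAV: composite index, plus a partial index for one hot descriptor name
CREATE INDEX idx_moldesc_name_value ON moleculedescriptor (name, value);
CREATE INDEX idx_moldesc_tpsa
    ON moleculedescriptor (value) WHERE name = 'TPSA';
```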
## Jupyter Notebooks

Ready-to-run tutorial notebooks are included under notebooks/:

- notebooks/01_cheminformatics_quickstart.ipynb: end-to-end walkthrough (install, configuration, CRUD, joins, optional RDKit/Mordred computation). You can execute it in your Jupyter environment.
- notebooks/05_cheminformatics.ipynb: a teaching version of examples/05_cheminformatics.py, with step-by-step CRUD and joins tailored for learning.
- notebooks/01_sync_crud.ipynb: helpers-based create/get/list/update/delete (with an optional SQLite override).
- notebooks/02_bulk_and_filters.ipynb: bulk insert and SQLModel filtering examples.
- notebooks/03_relationships.ipynb: Team ↔ Hero relationships with eager loading.
- notebooks/04_async_crud.ipynb: async session CRUD, with an optional async SQLite override.
- notebooks/06_csv_import.ipynb: CSV → SQLModel auto-generated models and import (with SQLite/PG switching and filtered queries).
Typical start in Jupyter:

```python
%pip install -e . pytest  # kit + tests
# Optional dependencies for chem pipelines:
# %pip install rdkit-pypi mordred
```
## Micromamba environment

If you already have a micromamba env named sqlmodel:

```bash
micromamba activate sqlmodel
jupyter lab  # or jupyter notebook
```

Then open one of the notebooks under notebooks/ to follow along.
## Build & Publish

- PyPI build: `uv build`
- PyPI publish: `uv publish` (configure token via `uv keyring set ...`)
- Conda build: `conda build conda/`
## Notes

- For production, prefer `sslmode=verify-full`, use least-privileged DB users, and consider PgBouncer.
- Bring your own Alembic migrations; set `target_metadata = SQLModel.metadata` in env.py.
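For the Alembic note above, the relevant edit in a generated `migrations/env.py` looks roughly like this (the `app.models` module name is a placeholder for your own models package):

```python
# migrations/env.py (excerpt)
from sqlmodel import SQLModel

# Importing the model modules registers their tables on SQLModel.metadata.
import app.models  # noqa: F401  (placeholder: your own models package)

# Alembic autogenerate compares the database against this metadata.
target_metadata = SQLModel.metadata
```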