# sqlmodel-pg-kit

Reusable SQLModel + PostgreSQL kit with a src layout, sync/async engines, and generic CRUD repositories. Managed via uv (PEP 621/517) and built with hatchling. Includes minimal tests and a conda recipe.

## Features

- Config dataclass -> builds sync/async URLs
- Engine + Session factories (sync/async)
- Generic `Repository`/`AsyncRepository` with create/get/list/update/delete/bulk_insert
- Examples and notebooks for common SQLModel patterns
- PyPI/conda packaging setup, smoke test

## Quickstart (uv)

- Create venv: `uv venv`
- Editable install: `uv pip install -e .`
- Run tests: `uv pip install pytest && pytest -q`

## Local Postgres (Docker)

- Start the container: `docker compose up -d`
- Stop when done: `docker compose down`
- Default credentials match `DatabaseConfig`: user `appuser`, password `changeme`, database `appdb`
- Export `SQL_SSLMODE=disable` (the container does not use TLS by default)
- Port: 5433 (mapped from container port 5432)

## Usage

Configure the environment (either `SQL_*` or Postgres `PG*` variables). Example for a container address:

- `export SQL_HOST=192.168.64.8`
- `export SQL_PORT=5432`
- `export SQL_USER=postgres`
- `export SQL_PASSWORD=change-me-strong`
- `export SQL_DATABASE=appdb`
- `export SQL_SSLMODE=disable`

You can perform CRUD with SQLModel without writing raw SQL strings.
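As a rough illustration of how the `SQL_*` variables above combine into a connection URL (a standalone sketch; the kit's `DatabaseConfig` builds these URLs for you, and the `psycopg2` driver suffix here is an assumption):

```python
import os

def sync_url_from_env() -> str:
    """Assemble a sync Postgres URL from SQL_* variables (illustration only)."""
    host = os.environ.get("SQL_HOST", "127.0.0.1")
    port = os.environ.get("SQL_PORT", "5432")
    user = os.environ.get("SQL_USER", "postgres")
    password = os.environ.get("SQL_PASSWORD", "")
    database = os.environ.get("SQL_DATABASE", "appdb")
    sslmode = os.environ.get("SQL_SSLMODE", "require")
    return (f"postgresql+psycopg2://{user}:{password}@{host}:{port}"
            f"/{database}?sslmode={sslmode}")

# With the example values from this section:
os.environ.update({"SQL_HOST": "192.168.64.8", "SQL_PORT": "5432",
                   "SQL_USER": "postgres", "SQL_PASSWORD": "change-me-strong",
                   "SQL_DATABASE": "appdb", "SQL_SSLMODE": "disable"})
print(sync_url_from_env())
```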
This kit exposes:

- `create_all()` to create tables from models collected in `SQLModel.metadata`
- Generic repositories: `Repository(Model)` and `AsyncRepository(Model)` for CRUD and bulk insert
- Session helpers: `get_session()` and `get_async_session()` for custom queries using SQLModel/SQLAlchemy expressions

### Choose Backend: SQLite or Postgres

- SQLite (in-memory) quick demo, no environment variables needed:

```python
from typing import Optional

from sqlmodel import SQLModel, Field

from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit import db  # access engine factory

# Override the engine to SQLite in-memory
db.engine = db.create_engine("sqlite:///:memory:", echo=False)

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str

create_all()  # creates tables on the current engine (SQLite)
```

- SQLite (file) quick demo:

```python
from sqlmodel_pg_kit import db, create_all

db.engine = db.create_engine("sqlite:///./demo.db", echo=False)
create_all()
```

- Postgres (recommended for production) via environment variables:

```bash
export SQL_HOST=127.0.0.1
export SQL_PORT=5432
export SQL_USER=postgres
export SQL_PASSWORD=change-me-strong
export SQL_DATABASE=appdb
export SQL_SSLMODE=disable  # or require/verify-full as needed
```

Then your Python code can simply call:

```python
from sqlmodel_pg_kit import create_all

create_all()  # uses the default Postgres engine constructed from env
```

- Postgres explicit configuration (no env needed):

```python
from sqlmodel_pg_kit import db, create_all

cfg = db.DatabaseConfig(
    host="127.0.0.1",
    port=5432,
    user="postgres",
    password="change-me-strong",
    database="appdb",
    sslmode="disable",
)
db.engine = db.create_engine(cfg.sync_url(), echo=False)
create_all()
```

### CSV → SQLModel → Table (Auto-generate Model)

Auto-generate a SQLModel class from a CSV header and import the rows.
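The per-column type inference the importer relies on can be sketched roughly like this (a standalone illustration, not the kit's actual `csv_import` code):

```python
def infer_type(values: list[str]) -> type:
    """Infer a column's Python type from raw CSV cell strings: bool, then int,
    then float, falling back to str. NA-like cells are ignored."""
    na = {"", "na", "nan", "none", "null"}
    cells = [v.strip().lower() for v in values if v.strip().lower() not in na]
    if not cells:
        return str  # all-NULL column; no basis for a narrower type
    if all(v in {"true", "false", "1", "0", "yes", "no"} for v in cells):
        return bool
    for candidate in (int, float):
        try:
            for v in cells:
                candidate(v)  # raises ValueError on non-numeric text
            return candidate
        except ValueError:
            continue
    return str

print(infer_type(["yes", "no", "NA"]))   # bool
print(infer_type(["1", "2", "3.5"]))     # float
```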
- CLI example:

```bash
uv run python examples/06_csv_to_sqlmodel.py --csv ./data/molecules.csv --sqlite ./demo.db
# Or with Postgres via env vars:
# export SQL_HOST=...; export SQL_USER=...; ...
# uv run python examples/06_csv_to_sqlmodel.py --csv ./data/molecules.csv
```

- In code:

```python
from sqlmodel_pg_kit.csv_import import build_model_from_csv, insert_rows
from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit.db import get_session

spec, rows = build_model_from_csv("./data/molecules.csv", class_name="Molecules", table_name="molecules")
create_all()
with get_session() as s:
    n = insert_rows(spec.model, rows, s)
    print("inserted:", n)
```

Type inference rules per column: bool if all values are true/false/1/0/yes/no; else int; else float; otherwise str. Empty/NA/NaN/None/null values become NULL.

### Sync CRUD (helpers)

```bash
uv run python examples/01_sync_crud.py
```

Shows create/get/list/update/delete using `Repository`. Minimal snippet:

```python
from typing import Optional

from sqlmodel import SQLModel, Field

from sqlmodel_pg_kit import create_all, Repository
from sqlmodel_pg_kit.db import get_session

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(index=True)
    age: Optional[int] = None

create_all()
repo = Repository(Hero)
with get_session() as s:
    h = repo.create(s, {"name": "Alice", "age": 20})
    h2 = repo.get(s, h.id)
    page = repo.list(s, page=1, size=5)
    h3 = repo.update(s, h.id, age=21)
    ok = repo.delete(s, h.id)
```

For a container-backed workflow, run `uv run python examples/07_postgres_minimal.py` (paired with `notebooks/07_postgres_minimal.ipynb`). That walkthrough mirrors a REST-style request cycle:

- wipes existing demo rows so each run is deterministic
- upserts seed rows via `Repository.bulk_insert`
- paginates filtered results with `Repository.list(..., where=..., order_by=..., page=..., size=...)`
- performs fuzzy search using `select(...).where(Model.name.ilike(...))`
- counts inventory with `session.exec(select(func.count(...))).scalar()` (compatible with SQLAlchemy 1.4/2.x)
- updates and deletes with the repository helpers you would call from PATCH/DELETE handlers

### Bulk insert + filters/pagination

```bash
uv run python examples/02_bulk_and_filters.py
```

Demonstrates `Repository.bulk_insert(rows)` and filtering with SQLModel expressions:

```python
from typing import Optional

from sqlmodel import select, SQLModel, Field

from sqlmodel_pg_kit import Repository
from sqlmodel_pg_kit.db import get_session

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    age: Optional[int] = None

with get_session() as s:
    Repository(Hero).bulk_insert(s, [{"name": "PG Hero", "age": 1}, {"name": "PG Hero", "age": 2}])
    heroes = s.exec(select(Hero).where(Hero.name == "PG Hero")).all()
```

### Relationships (Team <- Hero)

```bash
uv run python examples/03_relationships.py
```

Creates a `Team`, assigns heroes, and reads back with `selectinload` eager loading.

### Async CRUD

```bash
uv run python examples/04_async_crud.py
```

Uses `get_async_session()` with `AsyncRepository` to write/read without raw SQL. You may call `create_all()` once before async operations.

## Tests via Makefile

- `make test-sqlite`: run fast smoke tests on SQLite (no Postgres needed)
- `make test`: run all tests found by pytest
- `make test-pg`: run Postgres integration once on the current Python (requires SQL_*/PG* env)
- `make test-pg-once`: run Postgres integration across Python 3.10–3.13 (requires `uv` and SQL_*/PG* env)

## Cheminformatics Example

This repo includes a practical multi-table example tailored for cheminformatics workflows: `examples/05_cheminformatics.py`.

What it shows:

- Molecules with descriptors: `smiles`, `selfies`, `qed`, `sa_score` (no RDKit dependency at runtime; compute upstream and store).
- Many-to-many datasets: `Dataset` and link table `MoleculeDataset` for train/holdout/etc.
- Dataclass interop: a lightweight `@dataclass MoleculeDTO` that converts to SQLModel for fast CRUD.
- Typical queries: threshold filters, pattern matching, eager-loaded relationships, join filters.

Run it (requires SQL_*/PG* env):

```bash
uv run python examples/05_cheminformatics.py
```

Step-by-step outline you can adapt:

1) Define models

```python
from typing import List, Optional

from sqlmodel import SQLModel, Field, Relationship

# Define the link table first so it can be passed as link_model below.
class MoleculeDataset(SQLModel, table=True):
    molecule_id: int = Field(foreign_key="molecule.id", primary_key=True)
    dataset_id: int = Field(foreign_key="dataset.id", primary_key=True)

class Molecule(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    smiles: str = Field(index=True)
    selfies: Optional[str] = None
    qed: Optional[float] = Field(default=None, index=True)
    sa_score: Optional[float] = Field(default=None, index=True)
    datasets: List["Dataset"] = Relationship(back_populates="molecules", link_model=MoleculeDataset)

class Dataset(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(index=True)
    molecules: List[Molecule] = Relationship(back_populates="datasets", link_model=MoleculeDataset)
```

2) Bring your RDKit pipeline data via a dataclass

```python
from dataclasses import dataclass

@dataclass
class MoleculeDTO:
    smiles: str
    selfies: Optional[str] = None
    qed: Optional[float] = None
    sa_score: Optional[float] = None

    def to_model(self) -> Molecule:
        return Molecule(**vars(self))
```

3) Fast CRUD

```python
from sqlmodel import select

from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit.db import get_session

create_all()  # once

with get_session() as s:
    s.add(Molecule(smiles="CCO", qed=0.5, sa_score=2.1))
    s.commit()

with get_session() as s:
    m = s.exec(select(Molecule).where(Molecule.smiles == "CCO")).one()
    m.qed = 0.55
    s.add(m); s.commit(); s.refresh(m)

with get_session() as s:
    s.delete(m); s.commit()
```

4) Link datasets and join queries

```python
with get_session() as s:
    ds = Dataset(name="train"); s.add(ds); s.commit(); s.refresh(ds)
    m = s.exec(select(Molecule).where(Molecule.smiles == "CCO")).one()
    s.add(MoleculeDataset(molecule_id=m.id, dataset_id=ds.id)); s.commit()

with get_session() as s:
    stmt = (
        select(Molecule)
        .join(MoleculeDataset, Molecule.id == MoleculeDataset.molecule_id)
        .join(Dataset, Dataset.id == MoleculeDataset.dataset_id)
        .where(Dataset.name == "train")
    )
    train_mols = s.exec(stmt).all()
```

5) Tips for high-throughput tasks

- Bulk insert many rows: use SQLAlchemy Core `insert(Model).values(list_of_dicts)` then `s.commit()`.
- Paginate: `select(Model).offset((page - 1) * size).limit(size)`.
- Eager-load related rows: `.options(selectinload(Model.rel))` to avoid N+1 queries.
- Partial updates: load the row, set fields, commit; or use Core `update()` when you don't need ORM instances.

See `examples/05_cheminformatics.py` for a complete, runnable walkthrough.

## Cheminformatics: RDKit + Mordred

Below are patterns for integrating RDKit and Mordred descriptor pipelines.

- Modeling patterns:
  - Wide columns: put high-value fields directly on `Molecule` (e.g., `qed`, `sa_score`) with B-Tree indexes for fast filters.
  - JSONB payload: store a large descriptor dict in a JSONB column when you need many descriptors but query only a subset.
  - Normalized table (EAV): `MoleculeDescriptor(molecule_id, name, value)` when you frequently query and index specific descriptors by name.
- JSONB example (Postgres):

```python
from typing import Optional, Dict

from sqlmodel import SQLModel, Field
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB

class Molecule(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    smiles: str = Field(index=True)
    descriptors: Dict[str, float] = Field(default_factory=dict, sa_column=Column(JSONB))
```

- EAV example:

```python
class MoleculeDescriptor(SQLModel, table=True):
    # Composite primary key: one value per (molecule, descriptor name)
    molecule_id: int = Field(foreign_key="molecule.id", primary_key=True)
    name: str = Field(primary_key=True, index=True)
    value: float = Field(index=True)
```

- RDKit + Mordred pipeline sketch:

```python
# Optional installs in Jupyter:
# %pip install rdkit-pypi mordred
from rdkit import Chem
from rdkit.Chem import QED
from mordred import Calculator, descriptors

from sqlmodel import select
from sqlmodel_pg_kit.db import get_session

mol = Chem.MolFromSmiles("c1ccccc1O")
qed = float(QED.qed(mol))

calc = Calculator(descriptors, ignore_3D=True)
md = calc(mol)  # returns a mapping-like object
desc = {str(k): float(v) for k, v in md.items() if isinstance(v, (int, float))}

with get_session() as s:
    m = Molecule(smiles="c1ccccc1O", descriptors=desc)
    s.add(m); s.commit(); s.refresh(m)
    # Filter by a descriptor threshold:
    # for JSONB, use SQLAlchemy JSON operators, or extract hot features into wide columns
```

Indexing tips:

- Create B-Tree indexes on hot numeric columns: `qed`, `sa_score`, etc.
- For JSONB, consider GIN indexes (with `jsonb_path_ops`) on frequently accessed keys.
- For EAV, add `(name, value)` composite indexes and partial indexes for the top N descriptor names.

## Jupyter Notebooks

Ready-to-run tutorial notebooks are included under `notebooks/`:

- `notebooks/01_cheminformatics_quickstart.ipynb`: end-to-end walkthrough (install, configuration, CRUD, joins, optional RDKit/Mordred computation). You can execute it in your Jupyter environment.
- `notebooks/05_cheminformatics.ipynb`: a teaching version of `examples/05_cheminformatics.py`, step-by-step CRUD and joins tailored for learning.
- `notebooks/01_sync_crud.ipynb`: helpers-based create/get/list/update/delete (with an optional SQLite override).
- `notebooks/02_bulk_and_filters.ipynb`: bulk insert and SQLModel filtering examples.
- `notebooks/03_relationships.ipynb`: Team ↔ Hero relationships with eager loading.
- `notebooks/04_async_crud.ipynb`: async session CRUD, with an optional async SQLite override.
- `notebooks/06_csv_import.ipynb`: CSV → SQLModel auto-generated models and imports (with SQLite/PG switching and filter queries).

Typical start in Jupyter:

```python
%pip install -e . pytest  # kit + tests
# Optional dependencies for chem pipelines:
# %pip install rdkit-pypi mordred
```

### Micromamba environment

If you already have a micromamba env named `sqlmodel`:

```bash
micromamba activate sqlmodel
jupyter lab  # or jupyter notebook
```

Then open one of the notebooks under `notebooks/` to follow along.

## Build & Publish

- PyPI build: `uv build`
- PyPI publish: `uv publish` (configure token via `uv keyring set ...`)
- Conda build: `conda build conda/`

## Notes

- For production, prefer `sslmode=verify-full`, least-privileged DB users, and consider PgBouncer.
- Bring your own Alembic migrations; set `target_metadata = SQLModel.metadata` in `env.py`.
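For the Alembic note in this README, the `env.py` wiring could look like the fragment below (a sketch only, not runnable outside an `alembic` invocation; the model module name is hypothetical):

```python
# alembic/env.py (fragment) -- adapt the env.py that `alembic init` generates
from sqlmodel import SQLModel
from alembic import context  # provided by the alembic runtime

# Import your model modules first so their tables register on SQLModel.metadata.
# import myapp.models  # hypothetical module name

# Point autogenerate at the SQLModel metadata so `alembic revision --autogenerate`
# can diff your models against the live schema.
target_metadata = SQLModel.metadata
```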