sqlmodel-pg-kit

Reusable SQLModel + PostgreSQL kit with src layout, sync/async engines, and generic CRUD repositories. Managed via uv (PEP 621/517) and built with hatchling. Includes minimal tests and a conda recipe.

Features

  • Config dataclass -> builds sync/async URLs
  • Engine + Session factories (sync/async)
  • Generic Repository/AsyncRepository with create/get/list/update/delete/bulk_insert
  • Examples and notebooks for common SQLModel patterns
  • PyPI/conda packaging setup, smoke test

Quickstart (uv)

  • Create venv: uv venv
  • Editable install: uv pip install -e .
  • Run tests: uv pip install pytest && pytest -q

Usage

  • Configure the environment (either SQL_* or Postgres PG* variables). Example for a container reachable at 192.168.64.8:
    • export SQL_HOST=192.168.64.8
    • export SQL_PORT=5432
    • export SQL_USER=postgres
    • export SQL_PASSWORD=change-me-strong
    • export SQL_DATABASE=appdb
    • export SQL_SSLMODE=disable
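The SQL_* variables above are assembled into a single SQLAlchemy connection URL. A minimal sketch of that assembly, assuming the psycopg driver (the helper name `build_sync_url` is illustrative, not the kit's API; the kit's `DatabaseConfig` does the equivalent internally):

```python
import os

def build_sync_url() -> str:
    """Assemble a Postgres connection URL from SQL_* environment variables.

    Hypothetical helper for illustration; driver name (psycopg) and
    defaults are assumptions, not the kit's documented behavior.
    """
    host = os.environ.get("SQL_HOST", "127.0.0.1")
    port = os.environ.get("SQL_PORT", "5432")
    user = os.environ.get("SQL_USER", "postgres")
    password = os.environ.get("SQL_PASSWORD", "")
    database = os.environ.get("SQL_DATABASE", "postgres")
    sslmode = os.environ.get("SQL_SSLMODE", "disable")
    return (
        f"postgresql+psycopg://{user}:{password}@{host}:{port}/{database}"
        f"?sslmode={sslmode}"
    )
```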

You can perform CRUD with SQLModel without writing raw SQL strings. This kit exposes:

  • create_all() to create tables from models collected in SQLModel.metadata
  • Generic repositories: Repository(Model) and AsyncRepository(Model) for CRUD and bulk insert
  • Session helpers: get_session() and get_async_session() for custom queries using SQLModel/SQLAlchemy expressions

Choose Backend: SQLite or Postgres

  • SQLite (in-memory) quick demo — no environment variables needed:
from typing import Optional
from sqlmodel import SQLModel, Field
from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit import db  # access engine factory

# Override engine to SQLite in-memory
db.engine = db.create_engine("sqlite:///:memory:", echo=False)

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str

create_all()  # creates tables on the current engine (SQLite)
  • SQLite (file) quick demo:
from sqlmodel_pg_kit import db, create_all

db.engine = db.create_engine("sqlite:///./demo.db", echo=False)
create_all()
  • Postgres (recommended for production) via environment variables:
export SQL_HOST=127.0.0.1
export SQL_PORT=5432
export SQL_USER=postgres
export SQL_PASSWORD=change-me-strong
export SQL_DATABASE=appdb
export SQL_SSLMODE=disable  # or require/verify-full as needed

Then your Python code can just call:

from sqlmodel_pg_kit import create_all
create_all()  # uses the default Postgres engine constructed from env
  • Postgres explicit configuration (no env needed):
from sqlmodel_pg_kit import db, create_all

cfg = db.DatabaseConfig(
    host="127.0.0.1", port=5432, user="postgres",
    password="change-me-strong", database="appdb", sslmode="disable",
)
db.engine = db.create_engine(cfg.sync_url(), echo=False)
create_all()

CSV → SQLModel → Table (Autogenerate Model)

Autogenerate a SQLModel class from a CSV header and import the rows.

  • CLI example:
uv run python examples/06_csv_to_sqlmodel.py --csv ./data/molecules.csv --sqlite ./demo.db
# Or with Postgres via env vars:
# export SQL_HOST=...; export SQL_USER=...; ...
# uv run python examples/06_csv_to_sqlmodel.py --csv ./data/molecules.csv
  • In code:
from sqlmodel_pg_kit.csv_import import build_model_from_csv, insert_rows
from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit.db import get_session

spec, rows = build_model_from_csv("./data/molecules.csv", class_name="Molecules", table_name="molecules")
create_all()
with get_session() as s:
    n = insert_rows(spec.model, rows, s)
print("inserted:", n)

Type inference rules per column: bool if all values are true/false/1/0/yes/no; otherwise int; otherwise float; otherwise str. Empty strings and NA/NaN/None/null become NULL.
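The precedence above (bool, then int, then float, then str, with null-like tokens ignored) can be sketched as a small function. This is an illustration of the stated rules, not the kit's actual implementation; the name `infer_column_type` is hypothetical:

```python
NULL_TOKENS = {"", "na", "nan", "none", "null"}
BOOL_TOKENS = {"true", "false", "1", "0", "yes", "no"}

def infer_column_type(values):
    """Infer a column's Python type from its string values.

    Illustrative sketch of the rules described above; null-like tokens
    are skipped, and bool takes precedence over int for 1/0 columns.
    """
    cleaned = [v for v in values if v.strip().lower() not in NULL_TOKENS]
    if not cleaned:
        return str
    if all(v.strip().lower() in BOOL_TOKENS for v in cleaned):
        return bool
    for typ in (int, float):
        try:
            for v in cleaned:
                typ(v)  # raises ValueError if any value does not parse
            return typ
        except ValueError:
            continue
    return str
```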

Sync CRUD (helpers)

uv run python examples/01_sync_crud.py

Shows create/get/list/update/delete using Repository. Minimal snippet:

from typing import Optional
from sqlmodel import SQLModel, Field
from sqlmodel_pg_kit import create_all, Repository
from sqlmodel_pg_kit.db import get_session

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(index=True)
    age: Optional[int] = None

create_all()
repo = Repository(Hero)
with get_session() as s:
    h = repo.create(s, {"name": "Alice", "age": 20})
    h2 = repo.get(s, h.id)
    page = repo.list(s, page=1, size=5)
    h3 = repo.update(s, h.id, age=21)
    ok = repo.delete(s, h.id)

Bulk insert + filters/pagination

uv run python examples/02_bulk_and_filters.py

Demonstrates Repository.bulk_insert(rows) and filtering with SQLModel expressions:

from typing import Optional
from sqlmodel import select, SQLModel, Field
from sqlmodel_pg_kit import Repository
from sqlmodel_pg_kit.db import get_session

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    age: Optional[int] = None

with get_session() as s:
    Repository(Hero).bulk_insert(s, [{"name":"PG Hero","age":1},{"name":"PG Hero","age":2}])
    heroes = s.exec(select(Hero).where(Hero.name == "PG Hero")).all()

Relationships (Team ↔ Hero)

uv run python examples/03_relationships.py

Creates a Team, assigns heroes, and reads back with selectinload eager loading.

Async CRUD

uv run python examples/04_async_crud.py

Uses get_async_session() with AsyncRepository to write/read without raw SQL. Call create_all() once before running async operations.

Tests via Makefile

  • make test-sqlite: run fast smoke tests on SQLite (no Postgres needed)
  • make test: run all tests found by pytest
  • make test-pg: run Postgres integration once on current Python (requires SQL_/PG env)
  • make test-pg-once: run Postgres integration across Python 3.10–3.13 (requires uv and SQL_/PG env)

Cheminformatics Example

This repo includes a practical multi-table example tailored for cheminformatics workflows: examples/05_cheminformatics.py.

What it shows:

  • Molecules with descriptors: smiles, selfies, qed, sa_score (no RDKit dependency at runtime; compute upstream and store).
  • Many-to-many datasets: Dataset and link table MoleculeDataset for train/holdout/etc.
  • Dataclass interop: lightweight @dataclass MoleculeDTO that converts to SQLModel for fast CRUD.
  • Typical queries: threshold filters, pattern matching, eager-loaded relationships, join filters.

Run it (requires SQL_/PG env):

uv run python examples/05_cheminformatics.py

Step-by-step outline you can adapt:

  1. Define models
from typing import Optional, List
from sqlmodel import SQLModel, Field, Relationship

# Define the link table first so link_model can reference the class directly
class MoleculeDataset(SQLModel, table=True):
    molecule_id: int = Field(foreign_key="molecule.id", primary_key=True)
    dataset_id: int = Field(foreign_key="dataset.id", primary_key=True)

class Molecule(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    smiles: str = Field(index=True)
    selfies: Optional[str] = None
    qed: Optional[float] = Field(default=None, index=True)
    sa_score: Optional[float] = Field(default=None, index=True)
    datasets: List["Dataset"] = Relationship(back_populates="molecules", link_model=MoleculeDataset)

class Dataset(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(index=True)
    molecules: List[Molecule] = Relationship(back_populates="datasets", link_model=MoleculeDataset)
  2. Bring your RDKit pipeline data via a dataclass
from dataclasses import dataclass
from typing import Optional

@dataclass
class MoleculeDTO:
    smiles: str
    selfies: Optional[str] = None
    qed: Optional[float] = None
    sa_score: Optional[float] = None

    def to_model(self) -> Molecule:
        return Molecule(**vars(self))
  3. Fast CRUD
from sqlmodel import select
from sqlmodel_pg_kit import create_all
from sqlmodel_pg_kit.db import get_session

create_all()  # once

with get_session() as s:
    s.add(Molecule(smiles="CCO", qed=0.5, sa_score=2.1))
    s.commit()

with get_session() as s:
    m = s.exec(select(Molecule).where(Molecule.smiles == "CCO")).one()
    m.qed = 0.55
    s.add(m); s.commit(); s.refresh(m)

with get_session() as s:
    s.delete(m); s.commit()
  4. Link datasets and join queries
with get_session() as s:
    ds = Dataset(name="train"); s.add(ds); s.commit(); s.refresh(ds)
    m = s.exec(select(Molecule).where(Molecule.smiles == "CCO")).one()
    s.add(MoleculeDataset(molecule_id=m.id, dataset_id=ds.id)); s.commit()

with get_session() as s:
    stmt = (
        select(Molecule)
        .join(MoleculeDataset, Molecule.id == MoleculeDataset.molecule_id)
        .join(Dataset, Dataset.id == MoleculeDataset.dataset_id)
        .where(Dataset.name == "train")
    )
    train_mols = s.exec(stmt).all()
  5. Tips for high-throughput tasks
  • Bulk insert many rows: use SQLAlchemy Core insert(Model).values(list_of_dicts), then s.commit().
  • Paginate: select(Model).offset((page-1)*size).limit(size).
  • Eager-load related rows: .options(selectinload(Model.rel)) to avoid N+1 queries.
  • Partial updates: load the row, set fields, commit; or use Core update() when you don't need ORM instances.
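The bulk-insert and pagination tips can be sketched with plain SQLAlchemy Core against an in-memory SQLite database (table and column names are illustrative, not the kit's models):

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, insert, select,
)

# Illustrative table; the same pattern applies to any SQLModel table object.
metadata = MetaData()
hero = Table(
    "hero", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("age", Integer),
)
engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

rows = [{"name": f"hero-{i}", "age": i} for i in range(25)]
with engine.begin() as conn:  # begin() commits on successful exit
    conn.execute(insert(hero), rows)  # single bulk insert, not 25 round-trips

page, size = 2, 10
with engine.connect() as conn:
    stmt = select(hero).order_by(hero.c.id).offset((page - 1) * size).limit(size)
    page_rows = conn.execute(stmt).all()
```

Ordering by a stable key (here `id`) before `offset`/`limit` matters: without it, Postgres is free to return pages in any order.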

See examples/05_cheminformatics.py for a complete, runnable walkthrough.

Cheminformatics: RDKit + Mordred

Below are patterns to integrate RDKit and Mordred descriptor pipelines.

  • Modeling patterns:

    • Wide columns: put high-value fields directly on Molecule (e.g., qed, sa_score) with B-tree indexes for fast filters.
    • JSONB payload: store a large descriptor dict in a JSONB column when you need many descriptors but query only a subset.
    • Normalized table (EAV): MoleculeDescriptor(molecule_id, name, value) when you frequently query and index specific descriptors by name.
  • JSONB example (Postgres):

from typing import Optional, Dict
from sqlmodel import SQLModel, Field
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB

class Molecule(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    smiles: str = Field(index=True)
    descriptors: Dict[str, float] = Field(default_factory=dict, sa_column=Column(JSONB))
  • EAV example:
class MoleculeDescriptor(SQLModel, table=True):
    molecule_id: int = Field(foreign_key="molecule.id", index=True)
    name: str = Field(index=True)
    value: float = Field(index=True)
  • RDKit + Mordred pipeline sketch:
# Optional installs in Jupyter:
# %pip install rdkit-pypi mordred

from rdkit import Chem
from rdkit.Chem import QED
from mordred import Calculator, descriptors

mol = Chem.MolFromSmiles("c1ccccc1O")
qed = float(QED.qed(mol))
calc = Calculator(descriptors, ignore_3D=True)
md = calc(mol)  # returns a mapping-like Result of descriptor -> value
desc = {str(k): float(v) for k, v in md.items() if isinstance(v, (int, float))}

from sqlmodel_pg_kit.db import get_session
from sqlmodel import select

with get_session() as s:
    m = Molecule(smiles="c1ccccc1O", descriptors=desc)
    s.add(m); s.commit(); s.refresh(m)
    # filter by a descriptor threshold
    # for JSONB: use SQLAlchemy JSON operators or extract into wide columns for hot features

Indexing tips:

  • Create B-tree indexes on hot numeric columns: qed, sa_score, etc.
  • For JSONB, consider GIN indexes with jsonb_path_ops on frequently accessed keys.
  • For EAV, add (name, value) composite indexes and partial indexes for the top N descriptor names.
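These indexes can be declared in SQLAlchemy and their DDL rendered against the PostgreSQL dialect without a live database. A sketch, assuming the JSONB/EAV table shapes from the examples above (index names are illustrative):

```python
from sqlalchemy import Column, Float, Index, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

metadata = MetaData()

# JSONB payload table: GIN index with jsonb_path_ops for containment queries
molecule = Table(
    "molecule", metadata,
    Column("id", Integer, primary_key=True),
    Column("descriptors", postgresql.JSONB),
)
gin_idx = Index(
    "ix_molecule_descriptors", molecule.c.descriptors,
    postgresql_using="gin",
    postgresql_ops={"descriptors": "jsonb_path_ops"},
)

# EAV table: composite (name, value) index for per-descriptor range filters
eav = Table(
    "moleculedescriptor", metadata,
    Column("molecule_id", Integer, index=True),
    Column("name", String),
    Column("value", Float),
)
composite_idx = Index("ix_md_name_value", eav.c.name, eav.c.value)

pg = postgresql.dialect()
gin_ddl = str(CreateIndex(gin_idx).compile(dialect=pg))
composite_ddl = str(CreateIndex(composite_idx).compile(dialect=pg))
```

Printing `gin_ddl` shows the `CREATE INDEX ... USING gin (descriptors jsonb_path_ops)` statement that would be emitted by `metadata.create_all()` on Postgres.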

Jupyter Notebooks

Ready-to-run tutorial notebooks are included under notebooks/:

  • notebooks/01_cheminformatics_quickstart.ipynb: end-to-end walkthrough (install, configuration, CRUD, joins, optional RDKit/Mordred computation). You can execute it in your Jupyter environment.
  • notebooks/05_cheminformatics.ipynb: a teaching version of examples/05_cheminformatics.py, with step-by-step CRUD and joins tailored for learning.
  • notebooks/01_sync_crud.ipynb: helpers-based create/get/list/update/delete (with optional SQLite override).
  • notebooks/02_bulk_and_filters.ipynb: bulk insert and SQLModel filtering examples.
  • notebooks/03_relationships.ipynb: Team ↔ Hero relationships with eager loading.
  • notebooks/04_async_crud.ipynb: async session CRUD, with optional async SQLite override.
  • notebooks/06_csv_import.ipynb: CSV → SQLModel automatic model generation and import (including SQLite/PG switching and filtered queries).

Typical start in Jupyter:

%pip install -e . pytest  # kit + tests
# Optional dependencies for chem pipelines:
# %pip install rdkit-pypi mordred

Micromamba environment

If you already have a micromamba env named sqlmodel:

micromamba activate sqlmodel
jupyter lab  # or jupyter notebook

Then open one of the notebooks under notebooks/ to follow along.

Build & Publish

  • PyPI build: uv build
  • PyPI publish: uv publish (configure token via uv keyring set ...)
  • Conda build: conda build conda/

Notes

  • For production, prefer sslmode=verify-full, least-privileged DB users, and consider PgBouncer.
  • Bring your own Alembic migrations; set target_metadata = SQLModel.metadata in env.py.
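The Alembic wiring mentioned above amounts to a couple of lines in env.py. A sketch (the `app.models` module path is hypothetical; use your own project's model module):

```python
# alembic/env.py (excerpt)
from sqlmodel import SQLModel

import app.models  # noqa: F401  -- import your models so their tables register on SQLModel.metadata

# Alembic autogenerate diffs the database against this metadata object
target_metadata = SQLModel.metadata
```

Without the model import, SQLModel.metadata is empty and `alembic revision --autogenerate` produces drop statements for every table.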

Apple container start commands

# Stop and delete any existing container
container stop pg-app
container delete pg-app
  
container run \
  --name pg-app \
  --detach \
  --mount type=tmpfs,target=/var/lib/postgresql/data \
  --volume ./docker-init:/docker-entrypoint-initdb.d \
  --env POSTGRES_USER=postgres \
  --env POSTGRES_PASSWORD=change-me-strong \
  --env POSTGRES_INITDB_ARGS="--encoding=UTF8 --locale=en_US.UTF-8" \
  postgres:16

container ls --all
container logs pg-app

Set environment variables

export SQL_HOST=192.168.64.8
export SQL_PORT=5432
export SQL_USER=postgres
export SQL_PASSWORD=change-me-strong
export SQL_DATABASE=appdb
export SQL_SSLMODE=disable

Run make tests with uv

Test the connection: you can now verify the PostgreSQL connection with psql:

psql -h $SQL_HOST -p $SQL_PORT -U $SQL_USER -d $SQL_DATABASE -c '\l'
make test-pg-once

Verify data with psql

  • List tables: \dt
  • Inspect example tables (depending on the example you ran), e.g.:
    • SELECT * FROM hero ORDER BY id DESC LIMIT 10;
    • SELECT * FROM dataset ORDER BY id DESC LIMIT 10;