Issues with serializability and database storage in Python

Issues with serializability and database storage in Python
Photo by Nguyen Khiem / Unsplash

This article might not really apply to Java/Kotlin developers, seeing as Spring+Hibernate handles much of the heavy lifting for this issue.

I realized that python developers don't have a good idiomatic way of baking in database persistence into their applications. While writing my LLM-Sonder application (a library to enable LLM Agents to talk to each other), I wrote it code-first, with database and serializability taking a back seat.

I wrote code to enable users to create LLM Agent conversations by instantiating classes – ConversationWithAgents(participants = ...) However, when exposing the framework via a web server, I find myself rewriting some classes when trying to write serializable forms.

It arises again when trying to make the relationships savable in databases.

Let’s define the three forms of data we will be talking about:

  • Relationship as classes and types in code — Object
  • Sendable over the wire — Serializable
  • Savable in a database — Persistence

The two forms (serializable and persistence) are sometimes correlated:

  1. A serialized object might contain nested objects. We might want to avoid sending the entire nested object over the wire, just an id for the object. This would be the foreign key of the relational structure.
  2. As serialized data comes in, we would need to deserialize and persist the data. It would be easiest if the serialized form exactly maps to the persistence form, so that no conversion will have to be made.

However, the three forms usually diverge too. For example, Many to Many relationships is expressed differently in object form (code) and persistence form (SQL if relational).

class ConversationWithAgents:
	participants: list[ConversationParticipants]
	start_time: datetime
	end_time: datetime
-- Conversation table
CREATE TABLE ConversationWithAgents(
    conversation_id INT PRIMARY KEY,
    start_time TIMESTAMP,
    end_time TIMESTAMP
);

-- Participant types table
CREATE TABLE ParticipantType (
    type_id INT PRIMARY KEY,
    type_name VARCHAR(50) NOT NULL
);

-- Participant table 
CREATE TABLE ConversationParticipants(
    participant_id INT PRIMARY KEY,
    type_id INT NOT NULL,
    name VARCHAR(100),
    FOREIGN KEY (type_id) REFERENCES ParticipantType(type_id)
);

-- Junction table for Conversation-Participant relationship
CREATE TABLE ConversationParticipant (
    conversation_id INT,
    participant_id INT,
    join_time TIMESTAMP,
    leave_time TIMESTAMP,
    PRIMARY KEY (conversation_id, participant_id),
    FOREIGN KEY (conversation_id) REFERENCES Conversation(conversation_id),
    FOREIGN KEY (participant_id) REFERENCES Participant(participant_id)
);

The ideal state would be to define a single class and have all three forms available for free, e.g. ConversationWithAgents(participants=...).to_db() and only defining custom representations when less/more data is required (e.g. when sending data from a webserver to the frontend, you might want to remove some sensitive/irrelevant fields. This would be the Data Transfer Objects (DTOs) pattern.)

ORMs

The common suggestion/implementation for persistence in python is using an ORM framework like SQLAlchemy. This is mostly great, but the issue is that classes are no longer declared in native python types.

from sqlalchemy import Column, Integer, String, DateTime, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class ConversationParticipant(Base):
    __tablename__ = 'conversation_participants'

    id = Column(Integer, primary_key=True)
    name = Column(String)  # or any other relevant fields
    conversation_id = Column(Integer, ForeignKey('conversations_with_agents.id'))

class ConversationWithAgents(Base):
    __tablename__ = 'conversations_with_agents'

    id = Column(Integer, primary_key=True)
    start_time = Column(DateTime)
    end_time = Column(DateTime)
    participants = relationship("ConversationParticipant", back_populates="conversation")

ConversationParticipant.conversation = relationship("ConversationWithAgents", back_populates="participants")

This was an issue because I wanted to publish a python package that users can import and use with their other python code. This makes the package less interoperable with other packages.

ORMs provide the object and persistence forms out of the box, 2 out of the 3 forms that we will require. To get a serializable form, you’ll typically have to bring your own.

for row in resultproxy:
    row_as_dict = row._mapping  # SQLAlchemy 1.4 and greater
    # row_as_dict = dict(row)  # SQLAlchemy 1.3 and earlier

Dataclasses

Python dataclasses are handy for their built in serializability. You get the object and serializable forms, but no persistence.

from dataclasses import dataclass, asdict
import json
import sqlite3
from typing import List, Optional

   @dataclass
   class Message:
       content: str
       sender: str
       timestamp: float
       id: Optional[int] = None

   @dataclass
   class Conversation:
       title: str
       messages: List[Message]
       id: Optional[int] = None

       def to_json(self) -> str:
           return json.dumps(asdict(self))

       @classmethod
       def from_json(cls, json_str: str) -> 'Conversation':
           data = json.loads(json_str)
           data['messages'] = [Message(**msg) for msg in data['messages']]
           return cls(**data)

       def save_to_db(self, conn: sqlite3.Connection):
           cursor = conn.cursor()
           cursor.execute('''
               INSERT INTO conversations (title) VALUES (?)
           ''', (self.title,))
           self.id = cursor.lastrowid

           for msg in self.messages:
               cursor.execute('''
                   INSERT INTO messages (conversation_id, content, sender, timestamp)
                   VALUES (?, ?, ?, ?)
               ''', (self.id, msg.content, msg.sender, msg.timestamp))

       @classmethod
       def load_from_db(cls, conn: sqlite3.Connection, id: int) -> 'Conversation':
           cursor = conn.cursor()
           cursor.execute('SELECT title FROM conversations WHERE id = ?', (id,))
           title = cursor.fetchone()[0]

           cursor.execute('SELECT content, sender, timestamp FROM messages WHERE conversation_id = ?', (id,))
           messages = [Message(content=row[0], sender=row[1], timestamp=row[2]) for row in cursor.fetchall()]

           return cls(title=title, messages=messages, id=id)

How Hibernate handles this

Now, it seems like Java developers don't have to worry about any of this. Hibernate allows the creation of a single object that handles all three forms.

import javax.persistence.*;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

import lombok.Getter;
import lombok.Setter;

@Entity
@Table(name = "conversations_with_agents")
@Getter @Setter
public class ConversationWithAgents {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "start_time")
    private LocalDateTime startTime;

    @Column(name = "end_time")
    private LocalDateTime endTime;

    @OneToMany(mappedBy = "conversation", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<ConversationParticipant> participants = new ArrayList<>();

    // Helper method to add a participant
    public void addParticipant(ConversationParticipant participant) {
        participants.add(participant);
        participant.setConversation(this);
    }
}

@Entity
@Table(name = "conversation_participants")
@Getter @Setter
public class ConversationParticipant {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "conversation_id")
    private ConversationWithAgents conversation;
}

Usage example:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class Main {
    public static void main(String[] args) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("your-persistence-unit");
        EntityManager em = emf.createEntityManager();

        em.getTransaction().begin();

        ConversationWithAgents conversation = new ConversationWithAgents();
        conversation.setStartTime(LocalDateTime.now());

        ConversationParticipant participant1 = new ConversationParticipant();
        participant1.setName("Agent1");
        ConversationParticipant participant2 = new ConversationParticipant();
        participant2.setName("Agent2");

        conversation.addParticipant(participant1);
        conversation.addParticipant(participant2);

        em.persist(conversation);
        em.getTransaction().commit();

        // Query
        ConversationWithAgents retrievedConversation = em.find(ConversationWithAgents.class, conversation.getId());
        System.out.println("Conversation started at: " + retrievedConversation.getStartTime());
        for (ConversationParticipant participant : retrievedConversation.getParticipants()) {
            System.out.println("Participant: " + participant.getName());
        }

        em.close();
        emf.close();
    }
}