Simplifying Serialization with Marshmallow: An Introduction with SQLAlchemy

Introduction

Serialization is the conversion of an object into a form that is easy to transport. When building a full-stack application, it acts as a translator that lets data flow smoothly between the frontend, the backend, and any other component. It is an essential tool for letting components communicate and exchange information with each other.

JSON serialization converts objects into JSON and is used to exchange data between the server and client over a network. SQLAlchemy serialization is more database-focused: it converts SQLAlchemy models or database query results into a format suitable for storage or transmission (usually JSON). SQLAlchemy itself does not ship a JSON serializer, so this step is usually handled by hand-written methods or by an add-on such as SQLAlchemy-serializer.

While both of these approaches can be incredibly useful, dedicated libraries have become increasingly popular among developers as data structures grow more complex, because they streamline serialization. Marshmallow is a great example: it is a Python library that converts complex data types, such as objects, to and from native Python data types. This blog will delve deeper into the comparison between Marshmallow and SQLAlchemy serialization.

The Challenges of Serialization

Data serialization is a crucial component when connecting different components and services across networks. Data needs to be converted quickly and efficiently into an easily transmittable and reconstructable format. Given the vast amounts of data that are constantly exchanged, there are some challenges that developers face when serializing data.

Divergence of Data Formats: When the backend and the frontend interact, there are many diverse sets of data formats like JSON, XML, etc. Serialization needs to be able to translate between these different formats.
Nested Structures and Relationships: In the case of relational databases, like those you would work with using SQLAlchemy, nested data structures add new complexities. When serializing and de-serializing relationships between objects or tables, it is easy to run into recursion errors from nested data.
Performance: As databases become more complex and there is an increase in the flow of data, inefficient serialization can impact system performance. This can decrease the speed of a system and overall responsiveness.
Time: Simplifying serialization processes can increase the capacity and efficiency of the developer, thus reducing development overheads. Streamlined serialization processes promote faster exchange of data.
Compatibility Across Technology: In a full-stack application, systems will utilize diverse technologies like RESTful APIs, databases, or other components. This can lead to complications in serialization as data needs to be serialized in a consistent and replicable process.
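
The recursion problem described under Nested Structures and Relationships can be reproduced without a database. Below is a minimal sketch, assuming two hypothetical records (user and community) that point at each other the way two related models would. The helper name dumps_or_error is made up for illustration.

```python
import json

# Hypothetical records that reference each other, like two related models.
user = {"id": 1, "username": "ada", "communities": []}
community = {"id": 10, "name": "Bakers", "user": user}
user["communities"].append(community)


def dumps_or_error(obj):
    """Return the JSON string, or the error message if serialization fails."""
    try:
        return json.dumps(obj)
    except ValueError as exc:
        return f"error: {exc}"


# The cycle user -> community -> user cannot be written as JSON.
cyclic_result = dumps_or_error(user)

# Dropping the back-reference breaks the cycle and makes it serializable.
flat_community = {k: v for k, v in community.items() if k != "user"}
flat_user = {**user, "communities": [flat_community]}
flat_result = dumps_or_error(flat_user)
```

Excluding one side of the relationship is the same idea that the serialization libraries discussed later expose through options such as exclude rules.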

Overview of Serialization Tools

In Python, there are a few commonly used libraries that streamline serialization, such as the built-in json module, Marshmallow, and SQLAlchemy-serializer. When using a web framework extension like Flask-RESTful together with SQLAlchemy, a developer either uses one of these libraries or writes serialization methods by hand to convert Python objects into easily transmitted data like JSON. For clarification, Flask-RESTful only facilitates the creation of RESTful APIs; it does not provide built-in tools for serializing SQLAlchemy models.

Manual Serialization Process

Define Models: First, you define data models to determine the structure of the database. In this example, a User has many Communities, which is a one-to-many relationship.

 from flask import Flask
 from flask_sqlalchemy import SQLAlchemy

 app = Flask(__name__)
 app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///app.db"
 db = SQLAlchemy(app)

 class User(db.Model):
     #explicit table name so the "users.id" foreign key below resolves
     __tablename__ = "users"

     id = db.Column(db.Integer, primary_key=True)
     username = db.Column(db.String, nullable=False, unique=True)
     email = db.Column(db.String, nullable=False, unique=True)

     #relationships
     #back_populates must name the attribute on the other model ("user")
     communities = db.relationship("Community", back_populates="user", cascade="all, delete-orphan")

 class Community(db.Model):
     __tablename__ = "communities"

     id = db.Column(db.Integer, primary_key=True)
     name = db.Column(db.String, nullable=False)
     description = db.Column(db.String, nullable=False)
     user_id = db.Column(db.Integer, db.ForeignKey("users.id", ondelete="CASCADE"))

     #relationships
     user = db.relationship("User", back_populates="communities")

Create Schemas/Build Methods: Without using a library, developers will often create serialization schemas to convert data to and from JSON. You could also build serialization methods within the class models. Below is an example of a schema that creates a to_dict() and from_dict() method to serialize data as it is passed from server to client.

 class UserSchema:
     @staticmethod
     def to_dict(user):
         return {
             'id': user.id,
             'username': user.username,
             'email': user.email,
             'communities': [CommunitySchema.to_dict(community) for community in user.communities]
         }

     @staticmethod
     def from_dict(data):
         return User(
             username=data['username'],
             email=data['email'],
             communities=[CommunitySchema.from_dict(community_data) for community_data in data.get('communities', [])]
         )

 class CommunitySchema:
     @staticmethod
     def to_dict(community):
         return {
             'id': community.id,
             'name': community.name,
             'description': community.description,
             'user_id': community.user_id
         }

     @staticmethod
     def from_dict(data):
         return Community(
             name=data['name'],
             description=data['description'],
             #user_id is absent when a community is nested under a new user
             user_id=data.get('user_id')
         )

The to_dict() method serializes the data, converting a model instance into a dictionary that can be sent as JSON, and the from_dict() method deserializes, converting a dictionary (parsed from JSON) back into a model instance. You can control which fields are returned by changing the keys in the return dictionary.
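
To make the two directions concrete, here is a small round trip. It is only a sketch: plain dataclasses stand in for the SQLAlchemy models so that it runs without a database, and the function names (user_to_dict, user_from_dict, community_to_dict) are illustrative rather than taken from the schema classes above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Plain dataclasses stand in for the SQLAlchemy models (an assumption made
# so the sketch runs without a database).
@dataclass
class Community:
    name: str
    description: str
    user_id: Optional[int] = None
    id: Optional[int] = None


@dataclass
class User:
    username: str
    email: str
    communities: List[Community] = field(default_factory=list)
    id: Optional[int] = None


def community_to_dict(community):
    """Serialize: object -> dictionary."""
    return {
        "id": community.id,
        "name": community.name,
        "description": community.description,
        "user_id": community.user_id,
    }


def user_to_dict(user):
    """Serialize a user together with its nested communities."""
    return {
        "id": user.id,
        "username": user.username,
        "email": user.email,
        "communities": [community_to_dict(c) for c in user.communities],
    }


def user_from_dict(data):
    """Deserialize: dictionary -> object."""
    return User(
        username=data["username"],
        email=data["email"],
        communities=[
            Community(
                name=c["name"],
                description=c["description"],
                user_id=c.get("user_id"),
            )
            for c in data.get("communities", [])
        ],
    )


payload = {
    "username": "ada",
    "email": "ada@example.com",
    "communities": [{"name": "Bakers", "description": "Bread and pastry"}],
}
user = user_from_dict(payload)   # dictionary -> object
round_trip = user_to_dict(user)  # object -> dictionary
```

Every new field has to be added in both directions by hand, which is the maintenance cost the libraries below try to remove.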

Using the Schema: In the Flask route, using RESTful APIs, when a request is made from the client side, schemas are used to convert the data, and then send it back to the client in a more readable form. In the example below, the to_dict() method is used to convert the requested user data to a dictionary.


 from models import app, db, User, Community
 from flask_restful import Api, Resource
 from flask_migrate import Migrate
 from user_schema import UserSchema

 #db is already bound to app in models.py, so db.init_app(app) is not needed
 migrate = Migrate(app, db)
 api = Api(app)

 class Users(Resource):
     def get(self):
         #query the User model, not the Users resource
         users = [UserSchema.to_dict(u) for u in User.query.all()]
         return users, 200
 api.add_resource(Users, "/users")

 ...

Using SQLAlchemy-serializer: SQLAlchemy-serializer is a mixin that adds serialization to SQLAlchemy models without the need to write a to_dict() function yourself. The mixin provides to_dict(), which can be combined with rules and arguments to exclude or add fields. Below is a simple example using our User and Community models.

 from sqlalchemy_serializer import SerializerMixin

 #each model inherits the mixin to gain to_dict()
 class User(db.Model, SerializerMixin):
     __tablename__ = "users"

     id = db.Column(db.Integer, primary_key=True)
     username = db.Column(db.String, nullable=False, unique=True)
     email = db.Column(db.String, nullable=False, unique=True)

     #relationships
     communities = db.relationship("Community", back_populates="user", cascade="all, delete-orphan")
     #can be used within the model to prevent recursion errors
     #the trailing comma makes this a tuple rather than a plain string
     serialize_rules = ('-communities.user',)

 class Community(db.Model, SerializerMixin):
     __tablename__ = "communities"

     id = db.Column(db.Integer, primary_key=True)
     name = db.Column(db.String, nullable=False)
     description = db.Column(db.String, nullable=False)
     user_id = db.Column(db.Integer, db.ForeignKey("users.id", ondelete="CASCADE"))

     #relationships
     user = db.relationship("User", back_populates="communities")
     serialize_rules = ('-user.communities',)

 class Users(Resource):
     def get(self):
         users = [u.to_dict(only=("username", "email")) for u in User.query.all()]
         return users, 200

In addition to serialize_rules in the model, to_dict() accepts an only argument that restricts the output to the fields you name. Although this is a great library to use, Marshmallow is more versatile and flexible, and it can be used with more than just SQLAlchemy.

Enter Marshmallow

Marshmallow is a Python library that is used for serialization and deserialization. Unlike more traditional serialization methods, Marshmallow simplifies the serialization process by declaring data schemas. Schema classes can be created to specify how objects should be serialized and deserialized. This can make your code more concise, as you no longer need to build your serialization methods. Furthermore, schemas will usually be built in a separate file from the model, leading to increased organization and maintainability of your code.

Although this blog focuses on SQLAlchemy, Marshmallow is not tied to it. It can handle a variety of data, from simple Python data types to complex nested structures, and it integrates with frameworks like Flask. This makes it an incredibly useful library for a wide range of applications.

Marshmallow has many powerful capabilities, so many that this blog will not be able to cover all of them. Instead, in the next section we will take a closer look at some of the most notable features within its schema definition.

Marshmallow - Powerful Features in Schema

Nested Fields and Recursive Serialization

Example:

from marshmallow import fields
from models.user import User
from app_setup import ma
from schemas.community_schema import CommunitySchema

class UserSchema(ma.SQLAlchemySchema):
    class Meta():
        model = User
        load_instance = True
        fields = ["id", "username", "email", "communities"]
    #only expects a tuple or list of field names, not a bare string
    communities = fields.List(fields.Nested(CommunitySchema(only=("name",))))

Marshmallow allows you to gather nested information, as long as the other model has a foreign key relating to the parent model. This is an example where you could easily grab the communities belonging to each User, without having to query all of the communities.
Furthermore, you can explicitly specify which attributes you want to serialize or deserialize, using only and exclude. In this case, we want a list containing just the name of each community the User owns.

Validation and Deserialization

Example:

from marshmallow import fields, validate, validates, ValidationError
from models.user import User
from app_setup import ma

class UserSchema(ma.SQLAlchemySchema):
    class Meta():
        model = User
        load_instance = True
        fields = ['id', 'username', 'email']

    username = fields.String(required=True, validate=validate.Length(min=3, max=20))
    email = fields.String(required=True, validate=validate.Length(min=2, max=256))

Marshmallow also supports validation during deserialization, which adds another layer of protection to your database. In this example, the length of the input is being validated.

Field Level Preprocessing

Example:

...
from marshmallow import fields, pre_load

#change the value as it is serialized
username = fields.Function(lambda user: user.username.upper())

#or use this decorator to change incoming data before it is deserialized
@pre_load
def preprocess_data(self, data, **kwargs):
    if 'username' in data:
        data['username'] = data['username'].lower()
    if 'email' in data:
        data['email'] = data['email'].lower()
    return data

Here you can see two different ways to transform data. Marshmallow fields do not take a preprocess argument, so the first approach uses fields.Function, which computes the serialized value of a single field from the object; here the username is returned in all uppercase. The same pattern works for any conversion you can write as a function, such as lowercase, capitalize, title, converting to a list, or parsing a string into a datetime object. You can check out the full documentation for Marshmallow to learn more about the different abilities.
The pre_load decorator, imported from marshmallow, registers a schema method that runs before deserialization begins. It receives the raw input, so it can manipulate the entire data structure before it is validated and loaded, which gives it a broader scope than a single field. Its counterpart, pre_dump, runs before serialization and receives the object being serialized rather than a dictionary of input data.

Polymorphic Serialization

Example:

...

class CommunitySchema(ma.SQLAlchemySchema):
    id = fields.Integer()
    name = fields.String()

class BakingCommSchema(CommunitySchema):
    baked_good = fields.String()

class SportCommSchema(CommunitySchema):
    sport = fields.String()

class UserSchema(ma.SQLAlchemySchema):
    id = fields.Integer()
    username = fields.String()
    email = fields.Email()

    communities = fields.List(fields.Nested(CommunitySchema))

The CommunitySchema acts as the base schema that defines the fields shared by all of the other communities. In this case, that would be the name and id. Then you have the subschemas BakingCommSchema and SportCommSchema, which inherit from CommunitySchema and add type-specific fields.
Then, in UserSchema you can have a single field to represent communities of different types that belong to a specific user. Note that Marshmallow does not select a subschema on its own: fields.Nested(CommunitySchema) always uses CommunitySchema, so only id and name are serialized. To have BakingCommSchema serialize a community that has a baked_good field, you need to add the dispatch yourself, for example with a custom field or an extension such as marshmallow-oneofschema.
Keeping that dispatch in one place removes the need for conditional checks scattered through the serialization code. As you extend the community class to include new subtypes, you only add a new subschema. Furthermore, the code will be more consistent, and the data will be easier to read when this information is displayed on the frontend.

Custom Field Types

Example:

...

#create a custom field type (define it before the schema that uses it)
class ListOfStringsField(fields.Field):
    def _validate(self, value):
        #run any validators passed through validate= first
        super()._validate(value)
        if not isinstance(value, list):
            raise ValidationError("Field must be a list.")
        for item in value:
            if not isinstance(item, str):
                raise ValidationError("All items in the list must be strings.")

    def _serialize(self, value, attr, obj, **kwargs):
        return value

class UserSchema(ma.SQLAlchemySchema):
    class Meta():
        model = User
        load_instance = True
        fields = ['id', 'username', 'email', 'interests']

    username = fields.String(required=True, validate=validate.Length(min=3, max=20))
    email = fields.String(required=True, validate=validate.Length(min=2, max=256))
    #option 1: validate through built-in fields and validate
    #interests = fields.List(fields.String(), validate=validate.Length(min=1))
    #option 2: use the custom field
    interests = ListOfStringsField()

Marshmallow allows you to create custom field types by subclassing fields.Field and supplying your own serialization and deserialization logic. This is incredibly useful when you need more specific validation, nesting, preprocessing, or polymorphism than the built-in fields offer. This is a simple example that compares the validate parameter to this more flexible approach.
The _validate method performs validation on the field's value during deserialization. Here it makes sure the field is a list and that the items in the list are strings. The _serialize method returns the serialized value. In this simple example, it just returns the original value.
A custom field can be more useful than the validate parameter because it widens the scope of what you can check. In the built-in version, fields.List(fields.String()) already confirms that the value is a list of strings, and validate.Length(min=1) only adds a check that the list is not empty. A customized field type lets you go further and keep more complex, custom logic in one reusable place.

Conclusion

Serialization is critical for data exchange between components in a full-stack application. It promotes better communication between the frontend and backend. However, as data formats diverge, data structures become more complex, and performance demands grow, there is a need for more robust libraries. Marshmallow stands out for its versatility when serializing data. This library not only simplifies serialization, but also allows for consistent and replicable serialization processes, regardless of data type. This blog only scratches the surface of all the uses of Marshmallow, but if you want to read more, check out the official Marshmallow documentation!