Intro to MongoDB – Part 3 – Polymorphic Schemas and Transactions

POLYMORPHIC SCHEMA

OO languages allow functions to manipulate child classes as though they were their parent classes, calling methods that are defined in the parent but may have been overridden with different implementations in the child. This feature of OO languages is referred to as polymorphism.

Relational databases, with their focus on tables with a fixed schema, do not allow us to define a related set of schemas for a table so that we could store any object in our hierarchy in the same table (and retrieve it using the same mechanism).

The flexibility that MongoDB offers by not enforcing a particular schema for all documents in a collection provides several benefits to the application programmer over an RDBMS solution:

  • Better mapping of object-oriented inheritance and polymorphism
  • Simpler migrations between schemas with less application downtime
  • Better support for semi-structured domain data

Effectively using MongoDB requires recognizing when a polymorphic schema may benefit your application and not over-normalizing your schema by replicating the same data layout you might use for a relational database system.
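As a rough sketch of what this buys you, here are documents of two different shapes living in one "collection" (simulated with a plain array; the shape types and fields are hypothetical), handled by a single function that dispatches on a shared field, much as polymorphic code dispatches on an object's class:

```javascript
// A hypothetical "shapes" collection: documents share a "type" field but
// otherwise have different layouts -- something a single fixed relational
// table cannot express.
const shapes = [
  { _id: 1, type: "circle", radius: 2 },
  { _id: 2, type: "rectangle", width: 3, height: 4 },
];

// One function handles every variant by dispatching on the shared field.
function area(doc) {
  switch (doc.type) {
    case "circle":
      return Math.PI * doc.radius * doc.radius;
    case "rectangle":
      return doc.width * doc.height;
    default:
      throw new Error(`unknown shape type: ${doc.type}`);
  }
}

console.log(shapes.map(area)); // one area per document, whatever its shape
```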

TRANSACTIONS

Relational database schemas often rely on the existence of atomic multistatement transactions to ensure data consistency: either all of the statements in a group succeed, or all of the statements fail, moving the database from one self-consistent state to another. MongoDB’s document model and its atomic update operations enable an approach that maintains consistency where a relational database would use a transaction. We can use an approach known as compensation to mimic the transactional behavior of relational databases.

The patterns we use in MongoDB to mitigate the lack of atomic multidocument update operations include:

1) document embedding and complex updates for basic operations

2) optimistic update with compensation for when we really need a two-phase commit protocol

When designing your application to use MongoDB, more than in relational databases, you must keep in mind which updates you need to be atomic and design your schema appropriately.
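The compensation idea can be sketched in a few lines. This is only the shape of the pattern, not MongoDB's full two-phase-commit recipe: the accounts are plain in-memory objects, the failure is simulated, and a real implementation would record transaction state in its own collection and apply each step with an atomic update:

```javascript
// In-memory stand-ins for documents in an "accounts" collection
// (names and balances are hypothetical).
const accounts = { A: { balance: 100 }, B: { balance: 50 } };

// Transfer `amount` from `src` to `dst` in two steps; if the second step
// fails, a compensating update undoes the first, mimicking a rollback.
function transfer(src, dst, amount, failSecondStep = false) {
  if (accounts[src].balance < amount) return false; // optimistic precondition
  accounts[src].balance -= amount;                  // step 1: debit
  try {
    if (failSecondStep) throw new Error("simulated crash");
    accounts[dst].balance += amount;                // step 2: credit
    return true;
  } catch (e) {
    accounts[src].balance += amount;                // compensation: refund
    return false;
  }
}

transfer("A", "B", 30);       // succeeds: A = 70, B = 80
transfer("A", "B", 30, true); // fails; compensation restores A = 70
```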


Intro to MongoDB – Part 2

Because MongoDB does not support joins or multidocument transactions at all, it has been able to implement an automatic sharding solution with much better scaling and performance characteristics than you would normally get if relational joins and transactions had to be taken into account.

Embedding data vs Referencing

Embedded data –

Embedded data is the denormalized form; embedding is the approach that provides the best performance and data-consistency guarantees. An example blog schema:

{
  "_id": "First Post",
  "author": "Rick",
  "text": "This is my first post",
  "comments": [
    { "author": "Stuart", "text": "Nice post!" },
    ...
  ]
}
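Because the comments live inside the post document, adding a comment touches exactly one document and is therefore atomic. The snippet below simulates the effect of a `$push` update in plain JavaScript (the new comment's author and text are invented for illustration); in the shell this would be a single update with the `$push` operator on `db.posts`:

```javascript
// The embedded blog post from above, as a plain JavaScript object.
const post = {
  _id: "First Post",
  author: "Rick",
  text: "This is my first post",
  comments: [{ author: "Stuart", text: "Nice post!" }],
};

// Simulates the effect of:
//   db.posts.update({ _id: "First Post" },
//                   { $push: { comments: { author: "Pat", text: "+1" } } })
function pushComment(doc, comment) {
  doc.comments.push(comment); // appends in place, like $push
  return doc;
}

pushComment(post, { author: "Pat", text: "+1" });
```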

Referencing Data

If your application may query data in many different ways, or you are not able to anticipate the patterns in which data may be queried, a more “normalized” approach may be better. The advantage of normalizing your data model into multiple collections is the increased flexibility it gives you in performing queries.

// db.posts schema
{
  "_id": "First Post",
  "author": "Rick",
  "text": "This is my first post"
}

// db.comments schema
{
  "_id": ObjectId(...),
  "post_id": "First Post",
  "author": "Stuart",
  "text": "Nice post!"
}
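With referenced data, reading a post together with its comments now takes two queries, an application-level join. A minimal sketch, with the two collections simulated as plain arrays:

```javascript
// Stand-ins for db.posts and db.comments.
const posts = [
  { _id: "First Post", author: "Rick", text: "This is my first post" },
];
const comments = [
  { _id: 1, post_id: "First Post", author: "Stuart", text: "Nice post!" },
];

// Two lookups, like db.posts.findOne({...}) followed by
// db.comments.find({ post_id: ... }), stitched together in the application.
function findPostWithComments(postId) {
  const post = posts.find((p) => p._id === postId);                  // query 1
  const postComments = comments.filter((c) => c.post_id === postId); // query 2
  return { ...post, comments: postComments };
}

findPostWithComments("First Post");
```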

Schema design in MongoDB tends to be more of an art than a science, and one of the earlier decisions you need to make is whether to embed a one-to-many relationship as an array of subdocuments or to follow a more relational approach and reference documents by their _id value.

Intro to MongoDB – Part 1

Understanding RDBMS 

Data is arranged in tables with rows and columns, and the important concept of normalization of the database schema is followed. First normal form (1NF) requires each cell to hold a single value (otherwise multiple values must be stored as blobs of text, which is inefficient when doing a LIKE query). To avoid redundancy of data, more tables are often created and related by primary keys. Though this is an efficient way of storing data, it often means a join of tables is required to yield a query result. A join is an expensive read operation; modern databases have reduced its cost with caching mechanisms, but even with such optimizations, joining tables is one of the most expensive operations that relational databases do. Additionally, if you end up needing to scale your database to multiple servers, you introduce the problem of generating a distributed join, a complex and generally slow operation.

Denormalizing for Performance
The dirty little secret (which isn’t really so secret) about relational databases is that once we have gone through the data modeling process to generate our nice nth normal form data model, it’s often necessary to denormalize the model to reduce the number of JOIN operations required for the queries we execute frequently.

MongoDB: Who Needs Normalization, Anyway?

In MongoDB, data is stored in documents. This means that where the first normal form in relational databases required that each row-column intersection contain exactly one value, MongoDB allows you to store an array of values if you so desire. Because MongoDB can natively encode such multivalued properties, we can get many of the performance benefits of a denormalized form without the attendant difficulties in updating redundant data.
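For instance, a post's tags can live in an array on the post itself (the tags field here is a hypothetical example), and a query can match directly against the array's elements, where 1NF would force a separate post_tags table and a join. A sketch with the collection simulated as an array:

```javascript
// Each document carries its multivalued "tags" property directly.
const posts = [
  { _id: "First Post", tags: ["mongodb", "schema"] },
  { _id: "Second Post", tags: ["sql"] },
];

// Equivalent in spirit to: db.posts.find({ tags: "mongodb" })
// MongoDB matches a scalar query value against each element of an array field.
function findByTag(tag) {
  return posts.filter((p) => p.tags.includes(tag));
}

findByTag("mongodb"); // matches only the first post
```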

MongoDB Document Format

Documents in MongoDB are modeled after the JSON (JavaScript Object Notation) format, but are actually stored in BSON (Binary JSON). Briefly, what this means is that a MongoDB document is a dictionary of key-value pairs, where the value may be one of a number of types:

  • Primitive JSON types (e.g., number, string, Boolean)
  • Primitive BSON types (e.g., datetime, ObjectId, UUID, regex)
  • Arrays of values
  • Objects composed of key-value pairs
  • Null
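As a small illustration, a single document can mix most of these types when built as a plain JavaScript object (a JavaScript Date maps to the BSON datetime type; ObjectId and UUID are BSON/driver types, so they are left out of this sketch):

```javascript
// One document exercising several of the value types listed above.
const doc = {
  title: "Intro to MongoDB", // string
  views: 42,                 // number
  published: true,           // Boolean
  created_at: new Date(),    // datetime (stored as a BSON date)
  tags: ["nosql", "bson"],   // array of values
  author: { name: "Rick" },  // embedded object of key-value pairs
  deleted_at: null,          // null
};
```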