Why proponents of marriage equality should love graph databases, Part 1: A Reply to 'The Database Engineering Perspective'

Written on 10:28:00 AM by S. Potter

This is a response to the excellent 'Gay marriage: the database engineering perspective' blog post from November 2008 by Sam Hughes. The one thing his post did not do was offer a NoSQL alternative to the gay marriage database problem, but I will not hold that against him, because it was a comprehensive look at relational database schema designs for modelling marriage!

This response is a little dabble (i.e. not as thorough as the original post) into how graph databases could model monogamous same-sex marriages without much trouble and also allow for polyamorous marriages quite naturally. My goal here is to demonstrate how graph databases provide a great deal of flexibility for very little work, especially when the application is as concerned with the connections between entities (aka records in an RDBMS or nodes in a graph database) as with the entities themselves.

First off I want to look at the inherent problems with modelling real-world data in relational databases and how the graph database approach can overcome many, if not all, of these problems. Then I will launch into a specific snippet of code to demonstrate modelling marriages that can be between any two consenting adults rather than just one man and one woman. Then I hope to outline how this could be extended to model polyamorous (including polygamous) relationships without much more work. Near the end I will also include a section that discusses the shortcomings of the graph database approach, for completeness. My discussion will focus on Neo4J, as it is the only comprehensive graph database I am aware of that is open source.

Update (2010-03-13): Boris (in the comments) mentioned that there is another open source graph database called HyperGraphDB.

Update (2010-04-03): Johannes (via email) told me about InfoGrid, which looks like a very interesting alternative to Neo4J with a slightly different way of doing things.

What is wrong with relational databases?

For the last 14 years I have been using, designing, maintaining and administering relational databases in some capacity as a software programmer, developer, engineer and now as an applications architect. During that time I have found the following problems with using relational databases:
  • A lot of real-world data is only partially structured: this is probably the biggest problem with using relational databases (for everything) the way they are supposed to be used. Sure, you can add a blob, clob or text field to a table and store an arbitrary, record-dependent structure in it for the application to parse, but that would be a violation of all things relational, and you would pay a heavy price for it depending on how big these "blobs" generally are. Not to mention that you miss out on query-ability, which is something relational databases do well on highly structured datasets with the relevant indexes defined and a decent attempt at normalization. Note: I think relational databases are a fine thing when used to model data that truly is highly structured in the wild, where the entities, not the relationships, matter the most.
  • Object to relational mapping (ORM) constraints and disjoints: For almost a decade I have been using ORM libraries such as TopLink (Java), Persistence (C++), Hibernate (Java), ActiveRecord (Ruby), DataMapper (Ruby), SQLObject (Python) and others. Each had its own specific problems at times, but they also shared a set of common problems that made for a far from seamless integration between the object oriented layer (in which a vast number of business systems and web applications are written today) and the distinct properties of a relational database. These common object to relational constraints and disjoints are not an ORM tool issue; rather, they are a problem of trying to shove a round peg into a square hole (paraphrasing a comment made on page 3 of 'The Neo Database: A Technology Introduction').
  • Weak and inefficient "traversal" support: When I first started writing rich domain models in the nineties (that is last century for all you whipper-snappers), the focus of design was on the actual entities, not the relationships between them. Sure, occasionally you had to model a relationship as an "entity", but for the most part you did your best to reduce your domain relationships to a combination of contains a(n) (aka has_one in ActiveRecord), contained by (aka belongs_to in ActiveRecord), one-to-many (aka has_many in ActiveRecord) and/or many-to-many (aka has_and_belongs_to_many in ActiveRecord), because you can't attach attributes to a relationship in a relational database unless you turn it into an entity. If you are interested in deep and/or rich object graphs, however, you pay a penalty in any RDBMS in the form of multiple joins, which are expensive operations. Even with all the right indices defined and queries optimized, you will find that more than three levels of indirection (aka JOINs) take a while on even a medium-sized dataset. So as application developers we are forced to add lazy loading and/or customized eager loading logic to our rich object domain layer in a number of cases where performance matters, which in my view pollutes business logic and makes the code less maintainable going forward. This is far from ideal. It is also not ideal that dataset traversal patterns change with the introduction of new features, so you will often need to fork lazy loading logic in the business logic layer to satisfy your new requirements, adding a lot more complexity to manage in the application tier.
  • Maintaining and evolving relational schemas: modifying relational database schemas has always been a little tricky at best. Today I take advantage of ActiveRecord's "migrations", which loosely order (by creation timestamp) a set of relational schema modifications to run (and we wrote a simple extension to wrap them in a transaction to save our sanity - why wasn't that the default to begin with? anyway...). Even though there is a method to the madness now, it is still a little crazy, especially when it comes time to run these migrations on the production server. My heart always skips a beat or two when I run the rake db:migrate RAILS_ENV=production command (yes, even after taking a snapshot).
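
To make the traversal point a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is made up for illustration: the person and knows tables, the since column and the reachable_in_three_hops helper are my assumptions, not something from Sam's post or from any real application. It shows two of the complaints above: the relationship has to become its own table before it can carry an attribute, and every extra level of indirection costs another JOIN.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The relationship gets its own table (an "entity") so that it can
    -- carry an attribute such as the year it started.
    CREATE TABLE knows (
        from_id INTEGER NOT NULL REFERENCES person(id),
        to_id   INTEGER NOT NULL REFERENCES person(id),
        since   INTEGER NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave")],
)
conn.executemany(
    "INSERT INTO knows VALUES (?, ?, ?)",
    [(1, 2, 2001), (2, 3, 2005), (3, 4, 2009)],
)


def reachable_in_three_hops(conn, start_id):
    """Return names exactly three hops from start_id.

    Each hop is one more self-join on the relationship table, so the
    query has to be rewritten whenever the traversal depth changes.
    """
    rows = conn.execute(
        """
        SELECT p.name
        FROM knows AS k1
        JOIN knows AS k2 ON k2.from_id = k1.to_id
        JOIN knows AS k3 ON k3.from_id = k2.to_id
        JOIN person AS p ON p.id = k3.to_id
        WHERE k1.from_id = ?
        ORDER BY p.name
        """,
        (start_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(reachable_in_three_hops(conn, 1))  # ['Dave']
```

On four rows this is instant, of course; the sketch only shows the shape of the query, not the cost on a realistically sized dataset.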

It has been fun thinking out loud, but I have stuff to do today. Hey, I sometimes have a life, honest! :) So I have made an outline of what is to come in the next parts of this topic. I have already written some (bad) Java code to model same-sex and opposite-sex marriages consistently using a graph database (in this case Neo4J), but I plan on offering a Python snippet too, since I really can't stand the look of Java any more and the Java API doesn't make the case for graph databases to those coming from terser-feeling languages like Python, Ruby, Haskell, Erlang and JavaScript (well, if you use sane APIs like jQuery at least).

What is coming in Part 2?

  • How can graph databases overcome relational database shortcomings?
  • Gay marriage: the graph database solution

What is coming in Part 3?

  • Polyamorous marriage using a graph database
  • Graph databases: problems to watch out for