SQL and simple polymorphism

SQL is about tables and sets. Object-oriented programing is about types, subtypes, and polymorphism. There are nasty, nasty frameworks available to try to move between the two worlds (say, by serializing and deserializing Java objects) — this posting, however, is not about those. Instead, it’s about how to make SQL support the basic benefits of polymorphism in what we might call a SQL-y way, independent of any programming language.

Similar entities

Imagine that you want to deal with a set of geographical entities, similar to what I handle in OurAirports:

cities
airports
navigation transmitters
points of interest (e.g. buildings, etc.)

The normal object-oriented approach would be to create a base class such as “Location” with all the properties these have in common (unique id, latitude, longitude, elevation, name, description, etc.) and then create subclasses adding additional information for each one (such as frequencies for the navigation transmitters, or identifiers for the airport). The ~~normal~~ naive SQL approach, on the other hand, is to repeat the information in four separate tables.

The base table

SQL can, however, handle polymorphism in a fairly elegant way — I think database people call these “disjoint subtypes”, but feel free to ignore that term. Instead of a base class, we create a base table (I’m using the MySQL dialect, but most SQLs should have similar capabilities):

create table Locations (
  id bigint primary key auto_increment,
  type enum ('airport', 'city', 'poi', 'navaid') not null,
  latitude_deg double not null,
  longitude_deg double not null,
  elevation_m int not null,
  title varchar(128) not null,
  description text not null,
  # add indices as needed
);

(A purer implementation would use a separate LocationTypes table for flexibility, rather than the enum value — I’ll probably do that in OurAirports, but it would make these examples a bit more verbose.)

The subtype tables

Next, create a table for each of your subtypes, where the primary key is also a foreign key pointing to the Locations table:

create table Airports (
  id bigint primary key,
  iati_code char(3),
  icao_code char(4),
  local_code varchar(4),
  # etc.
  foreign key id references Locations(id)
);

Using the same id for the base and derived table ensures that each entity has an id that is unique across all subtables, and simplifies SQL statements for accessing and modifying the tables.

Polymorphism

Now, when you want to deal with locations in general, regardless of subtype, simply query the Locations table:

select L.* from Locations L where latitude_deg > 60;

You can update base information without even knowing the subtype:

update Locations set latitude_deg=45.5 where id=111;

When you want to deal with a specific kind of location, do a simple join:

select L.*, A.* from Locations L 
join Airports A on L.id=A.id
where A.latitude_deg > 60;

Views

To make this even easier, define a view containing the join:

drop view if exists AirportsView;
create view AirportsView as
select * from Locations L 
join Airports A on L.id=A.id;

Now you can simply query AirportsView as if it were a single table:

select * from AirportsView where latitude_deg>60;

Inserting and deleting

Inserting is slightly more complex, because it involves two steps: first, create the base entry, then link to it from the derived entry:

insert into Locations 
 (latitude_deg, longitude_deg, elevation_m, name) 
values
 (45.5, -75.5, 100, 'Sample Airport');

insert into Airports (id, iata_code) values
(last_insert_id(), 'AAA');

Stored procedures and/or triggers can automate this fairly nicely, if you don’t want to have to embed too much database logic in application code. Deletion similarly requires two steps:

delete from Airports where id=111;

delete from Locations where id=111;

A simple post-deletion trigger for Airports, though, could automate this and avoid the risk of mistakes in source code.

Eating the dogfood

Unfortunately, I didn’t design OurAirports this way from the start, so I have a bit of refactoring to do. When I’m finished, however, I expect mapping to be much simpler, since I will no longer have to execute 5 or 6 separate queries to find all the icons to display on a map. If anyone has any better ideas, please let me know; otherwise, stay tuned.

Tagged: databases, programming, tips