From time to time we detect query patterns that are not well fitted to the BTree+ structures provided by InnoDB. One such situation is when you need to traverse a hierarchy (tree) or graph structure with many nodes. Specialist databases exist for this such as Neo4J. However there exists a simple solution in the form of OQGraph which is distributed with MariaDB and is documented here.
The OQGRAPH engine is based on an original idea by Open Query founder Arjen Lentz, and was developed in-house with Antony Curtis at Open Query.A good simple example of how OQGraph can be applied, is the friend of a friend traversal where you want to determine the path between two people in a social network. I've used MariaDB, installing OQGraph as a plugin.
INSTALL SONAME 'ha_oqgraph';
create database test; use test;
First I'll create a person table to hold a bunch of people from my facebook account, a friendship table to represent the various connections that exist between those people, and lastly the OQGraph table which is what we will query in order to find the relationships.
CREATE TABLE `person` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` char(12) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=17 DEFAULT CHARSET=latin1
CREATE TABLE `friendship` ( `person` int(10) unsigned NOT NULL, `friend` int(10) unsigned NOT NULL, PRIMARY KEY (`person`,`friend`), KEY `idx_friend_person` (`friend`,`person`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `friendship_graph` ( `latch` varchar(32) DEFAULT NULL, `origid` bigint(20) unsigned DEFAULT NULL, `destid` bigint(20) unsigned DEFAULT NULL, `weight` double DEFAULT NULL, `seq` bigint(20) unsigned DEFAULT NULL, `linkid` bigint(20) unsigned DEFAULT NULL, KEY `latch` (`latch`,`origid`,`destid`) USING HASH, KEY `latch_2` (`latch`,`destid`,`origid`) USING HASH ) ENGINE=OQGRAPH DEFAULT CHARSET=latin1 `data_table`=friendship `origid`=person `destid`=friend
In the CREATE statement for the OQGraph table, we pass the origin table (data_table) and fields so it understands where to source the data.
The person table simply contains names and IDs.
+----+----------+ | id | name | +----+----------+ | 1 | ewen | | 2 | piotr | | 3 | mixa | | 4 | maciek | | 5 | kenny | | 6 | lefred | | 7 | peter | | 8 | celeste | | 9 | stephen | | 10 | carolina | | 11 | mark | | 12 | harriet | | 13 | dani | | 14 | juanjo | | 15 | ander | | 16 | ana | +----+----------+
And the friendship table was populated with the combined ID values to represent each connection. For example here we record Ander knows Ana, and Ana knows Ander.
insert into friendship values (16,15); insert into friendship values (15,16);
Now we can find the shortest path between two nodes (people).
MariaDB [test]> select p.name from friendship_graph fg join person p on (fg.linkid=p.id)where fg.latch = '1' and origid = 1 and destid = 2; +-------+ | name | +-------+ | ewen | | piotr | +-------+
Since Piotr and I know each other directly, there is just a straight link between the two of us. What about something less direct?
MariaDB [test]> select p.name from friendship_graph fg join person p on (fg.linkid=p.id)where fg.latch = '1' and origid = 8 and destid = 2; +---------+ | name | +---------+ | celeste | | ewen | | piotr | +---------+
In this case Celeste knows me, and since I know Piotr we have linked the two nodes (people) via a third node. Can this follow a longer chain, can we traverse more than one level of abstraction?
MariaDB [test]> select p.name from friendship_graph fg join person p on (fg.linkid=p.id)where fg.latch = '1' and origid = 8 and destid = 16; +---------+ | name | +---------+ | celeste | | ewen | | juanjo | | ander | | ana | +---------+
Again, Celeste knows Ewen, who knows Juanjo, he in turn knows Ander and Ander is friends with Ana.
What is the "latch" field in the CREATE TABLE statement?
The latch field is the given mechanism for communicating with the OQGraph engine and it allows us to set the algorithm to use.
Summary of Implemented latch commands
Latch | Alternative | additional where clause fields | Graph operation |
---|---|---|---|
NULL | (unspecified) | (none) | List original data |
(empty string) | 0 | (none extra) | List all vertices in linkid column |
(empty string) | 0 | origid | List all first hop vertices from origid in linkid column |
dijkstras | 1 | origid, destid | Find shortest path using Dijkstras algorithm between origid and destid, with traversed vertex ids in linkid column |
dijkstras | 1 | origid | Find all vertices reachable from origid, listed in linkid column, and report sum of weights of vertices on path to given vertex in weight |
dijkstras | 1 | destid | Find all vertices reachable from destid, listed in linkid column, and report sum of weights of vertices on path to given vertex in weight |
breadth_first | 2 | origid | List vertices reachable from origid in linkid column |
breadth_first | 2 | destid | List vertices from which a path can be found to destid in linkid column |
breadth_first | 2 | origid, destid | Visit all vertices possible between origid and destid, report in linkid column |
The MariaDB documentation on OQGraph can be found on the MariaDB knowledge base.
To solve Graph type queries natively in MySQL, it is necessary to both consider the schema model to use such as nested sets or adjacency lists and in some cases write stored procedures to provide the recursion necessary. OQGraph presents a simple solution to a complex problem without introducing additional complications to the stack and without the need to develop sophisticated SQL. Facebook have created their own solution called TAO which interacts with MySQL, but to my knowledge is not currently open source.