SQL - Parent child relationship trees
I'm still tyring to finish this writeup on my understanding of oracle trees. Please feel free to post comments/suggestions.
I've been working on a project where the client can manage a list of categories with an unlimited level of subcategories. The result is a tree structure that looks something like this:
category1
--subcategory1
--subcategory2
----subsubcategory1
category2
--subcategory3
Note that the first category1 has 3 tiers (category, subcategory and subsubcategory) while category2 has only 2 tiers.
I need to be able to create a unlimited level tree structure from this data for web site navigation. To get this to work I created a parent child relationship in a DB table as shown below.
--- using oracle style sql
CREATE TABLE categories (
category_id INT NOT NULL PRIMARY KEY,
category_name VARCHAR2(128) NOT NULL,
parent_id INT NULL
);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(1,'category1',NULL);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(2,'subcategory1',1);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(3,'subcategory2',1);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(4,'subsubcategory1',3);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(5,'category2',NULL);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(6,'subcategory3',5);
COMMIT;
You end up with rows of data in the table that look like the following matrix.
category_id | category_name | parent_id |
---|---|---|
1 | category1 | NULL |
2 | subcategory1 | 1 |
3 | subcategory2 | 1 |
4 | subsubcategory1 | 3 |
5 | category2 | NULL |
6 | subcategory3 | 5 |
The category_id column is the unique primary key for records in this table. The parent_id is a foreign key to the primary key column in the same table. The parent_id links the current row to a parent record in the same table. In short, the parent_id tells us if the current record has a parent (NULL means no parent) and who that parent is. We will use this relationship to "Walk the tree" where given any category_id, we can find all of its parents and all of its children within the tree.
When building a navigation menu, we tend to start at the top and work our way down. To do that, we need to start with all the top level parent categories. We know we are at the "top" of the list when we have a record with no parent (NULL for the parent_id).
--- Select all top level categories
SELECT * FROM categories WHERE parent_id IS NULL;
category_id category_name parent_id
----------- ------------- ---------
1 category1 (null)
6 category2 (null)
2 record(s) selected
If you look at the original tree drawing, this query does return the top level parent records. Now its time to put this simple parent/child relationship to the test and if it can be usefull as it is or if we need to add more data to make it work.
Lets try to create the navigation.
SELECT * FROM categories;
CATEGORY_ID CATEGORY_NAME PARENT_ID
-------------- ---------------- ------------
1 category1 (null)
2 subcategory1 1
3 subcategory2 1
4 subsubcategory1 3
5 category2 (null)
6 subcategory3 5
6 record(s) selected
Everything looks ok, but there is a hidden problem. How do we know these records are in the correct order? In this case, pure dum luck. If we change our example data we can see the problem more clearly:
DROP TABLE categories;
CREATE TABLE categories (
CATEGORY_ID NUMBER(22,0) NOT NULL,
CATEGORY_NAME VARCHAR2(128) NOT NULL,
PARENT_ID NUMBER(22,0) NULL,
SORT_ORDER NUMBER(22,0) NOT NULL
);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(1, 'category3', NULL);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(2, 'subcategory5', 1);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(3, 'subcategory4', 1);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(4, 'subsubcategory4', 3);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(5, 'category2', NULL);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(6, 'subcategory3', 5);
COMMIT;
SELECT * FROM categories;
CATEGORY_ID CATEGORY_NAME PARENT_ID
-------------- ---------------- ------------
1 category3 (null)
2 subcategory5 1
3 subcategory4 1
4 subsubcategory4 3
5 category2 (null)
6 subcategory3 5
6 record(s) selected
Woah! Whats going on here? We havent told Oracle how to sort the results, so Oracle is picking the method it thinks is best. A good rule of when working with data sets is to allways specify an ORDER BY clause to guarntee you get the results back in the way you expect.
So, how can we fix the order of this recordset? We cant. We need more data to allow us to tell Oracle how to sort these records. We need a sort order column. So lets revisit our table creation statment with our new colum "sort_order" with values for sorting:
DROP TABLE categories;
CREATE TABLE categories (
category_id INT NOT NULL PRIMARY KEY,
category_name VARCHAR2(128) NOT NULL,
parent_id INT NULL,
sort_order INT NOT NULL
);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(1,'category1',NULL,1);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(2,'subcategory1',1,1);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(3,'subcategory2',1,2);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(4,'subsubcategory1',3,1);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(5,'category2',NULL,2);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(6,'subcategory3',5,1);
COMMIT;
I've added a sort order column and I have chosen to order each record within the higharchy level that I know it should be returned within.
This means that sort_orders will only be unique within a tier for a given parent. If I lost you with that last statement, just keep reading, we will address that issue soon.
Here is how our data looks now:
SELECT * FROM categories;
CATEGORY_ID CATEGORY_NAME PARENT_ID SORT_ORDER
-------------- ---------------- ------------ -------------
1 category3 (null) 2
2 subcategory5 1 2
3 subcategory4 1 1
4 subsubcategory4 3 1
5 category2 (null) 1
6 subcategory3 5 1
6 record(s) selected
Here we see the basic sorting problem. We now have all the information that we should need to create and sort our higharchy.
So where do we go from here? Lets see what happens when we sort on the sort_order column:
SELECT * FROM categories ORDER BY sort_order;
CATEGORY_ID CATEGORY_NAME PARENT_ID SORT_ORDER
-------------- ---------------- ------------ -------------
3 subcategory4 1 1
4 subsubcategory4 3 1
5 category2 (null) 1
6 subcategory3 5 1
1 category3 (null) 2
2 subcategory5 1 2
6 record(s) selected
We are not taking into account the higharchy level, so the sorting does not work. (as a side note, we could do absolute sorting where the sort order column is unique, but lets continue to explore this solution first)
What we really need to do is somehow get the results returned already ordered by the higharchy level, then sub-sort by the sort order. SQL does not support returning trees in this way (that I know of) so after searching google I found a suggestion from oracle on how to do this in oracle 9i (the Database I'm currently using)...if you are using 10g, it should be even easier. Check out the article for more info:
http://www.oracle.com/technology/oramag/oracle/05-may/o35asktom.html
Anywho, their version of the solution is a bit hard to understand, so here is my simpler version that solves our problem:
SELECT category_name,
category_id,
level,
sort_order
FROM categories
START WITH parent_id IS NULL
CONNECT BY PRIOR category_id = parent_id
ORDER SIBLINGS BY sort_order;
This is an oracle specific solution. The SYS_CONNECT_BY_PATH function, START WITH and CONNECT BY PRIOR statements are all non-standard oracle features. I have no idea how you would do this in pure SQL...I assume you would need to write some procedual code to do the work for you.
Lets break down how this works: