Understanding Hierarchical Data
When it comes to storing and querying hierarchical data, there are many ways to approach the problem. Among these is the use of parent-child tree structures, which allow for the representation of relationships between nodes in a tree-like structure.
In this article, we will explore the basics of parent-child tree structures and the typical queries used when dealing with them.
Examples of Parent-Child Structures
Parent-child tree structures can be found in various data domains, from businesses to music genres. Many organizations structure their companies hierarchically, with each department reporting to one or more higher-level departments or executives.
- Family trees are also common examples, where each node represents a family member, with parents and children forming the parent-child relationships.
- The file system on a computer is another example of a parent-child tree structure.
- Each folder is a node, with its parent folder being the node above it in the hierarchy.
- Menus, art and music genres, and even projects can also be represented as parent-child tree structures.
Storage of Parent-Child Structures in Relational Databases
The most common way of storing parent-child tree structures in a relational database is to use a single table to hold all nodes, with each node having a unique ID, a reference to its parent node ID, and any other relevant data. This type of table is commonly referred to as a parent-child table.
To associate nodes with their parent nodes in a parent-child table, you can add a column to the table that holds the ID of the parent node. This column is called a foreign key column, and it references the primary key column of the parent node.
Typical Queries Run on Parent-Child Tree Structures
Once a parent-child tree structure is stored in a relational database, there are a variety of queries that can be used to retrieve and manipulate the data.
1. List All Children of One Parent
To list all the immediate children of a parent node, you can use a simple query with a WHERE clause that matches the parent node’s ID. For example, suppose we have a parent-child table with columns for ID, first name, last name, and parent ID.
The following query lists all of Karen’s children:
SELECT * FROM people WHERE parent_id = 3;
2. List a Parent Node for a Child Node
If you have the ID of a child node and want to find its parent node, you can use a self-join query.
A self-join is a query that joins a table to itself, using different aliases to distinguish between the two instances of the table. For example, this query finds the parent node of Karen:
SELECT p.* FROM people p JOIN people c ON p.id = c.parent_id WHERE c.id = 7;
3. Get a Generation Number (or Tree Level) for Each Node
If you want to find the level of a node in the tree structure, you can use a recursive common table expression (CTE). A recursive CTE is a query that includes a reference to itself, allowing for the iteration over a set of data to produce a complete result set.
For example, this query adds a generation number to each node:
WITH RECURSIVE tree AS (
SELECT id, first_name, last_name, parent_id, 1 AS generation
FROM people
WHERE parent_id IS NULL
UNION ALL
SELECT p.id, p.first_name, p.last_name, p.parent_id, t.generation + 1
FROM people p
JOIN tree t ON p.parent_id = t.id
)
SELECT * FROM tree ORDER BY generation, last_name;
4. List All Descendants
To list all the descendants of a given parent node, you can extend the previous query with a left join.
A left join includes all the rows from the left table and any matching rows from the right table, resulting in a null value for the right table columns if there is no match. For example, this query lists all of Karen’s descendants:
WITH RECURSIVE descendants AS (
SELECT * FROM people WHERE id = 7
UNION ALL
SELECT people.* FROM people JOIN descendants ON descendants.id = people.parent_id
)
SELECT * FROM descendants LEFT JOIN (SELECT id AS root_id FROM people WHERE id = 7) AS root ON 1 = 1 WHERE root_id IS NULL;
5. Generate a Tree View of Hierarchical Data
A tree view is a graphical representation of hierarchical data, which can be generated using a combination of the CAST function, the RIGHT function, and the ORDER BY clause.
For example, this query generates a tree view of all people in the database:
WITH RECURSIVE tree AS (
SELECT id, first_name, last_name, parent_id, 1 AS generation
FROM people
WHERE parent_id IS NULL
UNION ALL
SELECT p.id, p.first_name, p.last_name, p.parent_id, t.generation + 1
FROM people p
JOIN tree t ON p.parent_id = t.id
)
SELECT CAST(REPEAT(' ', generation - 1) AS VARCHAR(255)) || first_name || ' ' || last_name AS name FROM tree ORDER BY lpad(id::text, 6, '0');
Conclusion
In this article, we have explored the basics of parent-child tree structures and the typical queries used when dealing with them. By understanding the fundamentals of hierarchical data, storage in relational databases, and the various types of queries available, you can effectively store and manipulate hierarchical data in your applications.
Whether you’re working with organizational structures or music genres, parent-child tree structures are a powerful tool for representing and querying hierarchical data. In this article, we’ve explored the basics of parent-child tree structures and the typical queries used when dealing with them.
We’ve seen how hierarchical data can be represented in a relational database using a single table with primary and foreign key columns. We’ve also looked at various types of queries that can be used for retrieving and manipulating hierarchical data, including listing all children of one parent, listing a parent node for a child node, and generating a tree view of hierarchical data.
By understanding the fundamentals of parent-child tree structures, we can efficiently store and manipulate hierarchical data in our applications, making them more powerful and flexible.