Performance comparison of hierarchical models, Django and PostgreSQL

Good day, dear readers.

Today's article will be devoted to the comparison of models of working with hierarchical data in PostgreSQL via Django app. In the article I specifically do not use a clean implementation in the database, because I'm interested in is performance in an environment close to combat.

All tests were performed on my data and results are relevant for this data on your data the result may differ from that obtained in the article.

Model implementation of the hierarchical structures in the database

For working with such structures in PostgreSQL you can use the following modely:

Adjacency model (AM) — model, when the column is stored parent;
Nested Sets (NS) model when paired column stores the range of all sub-elements;
Materialised Path model (MP) model, when the column is stored the full path to the element.

Also details about the implementation of hierarchical structures in a relational database, you can read here.

To implement them in Django selected the following tools:

AM — regular recursion is based on the Django ForeignKey;
NS — Modul django-mptt;
MP module ltree PostgreSQL wrapper LtreeField;

testing Methodology

Testing is conducted on the dataset of 150 thousand companies. Time will be measured
the following query:

Reading the tree

Reading arbitrary node the

Insert the node
Move a subtree between levels

query Time will be deemed to be the average time of 1000 queries for each item, to a randomly selected tree item.

Hardware test stand

the

CPU Core i5 2,5 GHz
RAM 1600 MHz DDR3
Samsung SSD 850 EVO 500GB

Software test stand

the

Python 2.7
Postgres 9.6
Django 1.8
psycopg2

Description test

For testing created by separate Django application. It created 3 models, one for each storage scheme described above. This is done to ensure that at any moment it was possible to conduct similar testing on different datasets.

this application includes two commands:

the

load_tree — loads the data for the test
analize — executes the collection and analysis of data based on a specified number of requests

To run the application and collect statistics you need to perform a command sequence:

the

python manage.py migrate
python manage.py load_tree <path to data file>
python manage.py load_tree analize <number of requests for analysis>

the results of the command analize stored in report.

Results

After running the specified commands to their data, I received the following rezultaty. In order to model names didn't fool you, below are the models from the test and storage models

the

Raw — adjacency model

Mptt nested sets

Ltree — materialised path

Reading the tree

read_tree_chart

As you can see from the chart model Mptt and Ltree showed approximately the same result, but the native model Raw was faster.

Reading arbitrary node

read_node_chart

In this test, unlike the previous paragraph, the model Mptt equal in speed to the standard implementation, but Ltree on the contrary worsen your score. It should be noted that in the model Raw for more descendants uses a recursive query (WITH RECURSION), and, in spite of this, he worked faster than anyone.

insert_node_chart

When you insert a node behind again model Ltree, but in this case it is most likely due to the fact that the path field in which is stored the tree consisted of the id of records, so I had to do 2 queries (insert and then update a field path), which consequently affect performance.

Move a subtree between levels

move_node_chart

the moving of the node with the subtree of Mptt showed the worst result, it is most likely related to the fact that when you move it needs to recalculate all the fields from the migrated nodes, which is not a quick operation. Move Raw is the fastest, because, in fact, it is just updating one field. the Ltree is not much behind the leader, as it needs to update the paths from all nodes in that subtree.

Results

email in Slovenia implementation hierarchical structur I was expecting a better result will show implementation of the model MP(Ltree), but, to my surprise, she showed the second result, losing the native implementation of the model AM(Raw). Well, the implementation of the model NS(Mptt) got 3rd place as 2 of the 4 tests he lost by a wide margin from the competition.

Summary table with results

the the the the

model	insert_node	move_node	read_node	read_tree
Ltree	0.001955	0.010375	0.008745	0.025522
Mptt	0.001006	0.855293	0.00104	0.115597
Raw	0.001002	0.001	0.001012	0.021957

it is also necessary to note that results may vary for other datasets, and therefore before making a decision it is recommended to conduct a similar test for their data.

Used articles:

Representing Trees in PostgreSQL
Trees in SQL: Nested Sets and Materialized Path
Storing trees in a database.
Benchmark tree structure for Django
Hierarchical data structures and Doctrine
Ltree
Django-mptt
LtreeFiled

PS This is a cross post original article from my blog

/ > PS Updated reading schedule tree after adding collations

Article based on information from habrahabr.ru

Поиск по этому блогу

computer express