Performance comparison of hierarchical models, Django and PostgreSQL
Good day, dear readers.
Today's article will be devoted to the comparison of models of working with hierarchical data in PostgreSQL via Django app. In the article I specifically do not use a clean implementation in the database, because I'm interested in is performance in an environment close to combat.
All tests were performed on my data and results are relevant for this data on your data the result may differ from that obtained in the article.
Model implementation of the hierarchical structures in the database
For working with such structures in PostgreSQL you can use the following modely:
-
the
- Adjacency model (AM) — model, when the column is stored parent; the
- Nested Sets (NS) model when paired column stores the range of all sub-elements; the
- Materialised Path model (MP) model, when the column is stored the full path to the element.
Also details about the implementation of hierarchical structures in a relational database, you can read here.
To implement them in Django selected the following tools:
-
the
- AM — regular recursion is based on the Django ForeignKey; the
- NS — Modul django-mptt; the
- MP module ltree PostgreSQL wrapper LtreeField;
testing Methodology
Testing is conducted on the dataset of 150 thousand companies. Time will be measured
the following query:
-
the
- Reading the tree
- Insert the node the
- Move a subtree between levels
Reading arbitrary node the
query Time will be deemed to be the average time of 1000 queries for each item, to a randomly selected tree item.
Hardware test stand
the
-
the
- CPU Core i5 2,5 GHz the
- RAM 1600 MHz DDR3 the
- Samsung SSD 850 EVO 500GB
Software test stand
the
-
the
- Python 2.7 the
- Postgres 9.6 the
- Django 1.8 the
- psycopg2
Description test
For testing created by separate Django application. It created 3 models, one for each storage scheme described above. This is done to ensure that at any moment it was possible to conduct similar testing on different datasets.
this application includes two commands:
the
-
the
load_tree
— loads the data for the test
the analize
— executes the collection and analysis of data based on a specified number of requests
To run the application and collect statistics you need to perform a command sequence:
the
python manage.py migrate
python manage.py load_tree <path to data file>
python manage.py load_tree analize <number of requests for analysis>
the results of the command analize
stored in report.
Results
After running the specified commands to their data, I received the following rezultaty. In order to model names didn't fool you, below are the models from the test and storage models
the
Raw — adjacency model
Mptt nested sets
Ltree — materialised path
Reading the tree
As you can see from the chart model Mptt and Ltree showed approximately the same result, but the native model Raw was faster.
Reading arbitrary node
In this test, unlike the previous paragraph, the model Mptt equal in speed to the standard implementation, but Ltree on the contrary worsen your score. It should be noted that in the model Raw for more descendants uses a recursive query (WITH RECURSION), and, in spite of this, he worked faster than anyone.
When you insert a node behind again model Ltree, but in this case it is most likely due to the fact that the path field in which is stored the tree consisted of the id of records, so I had to do 2 queries (insert and then update a field path), which consequently affect performance.
Move a subtree between levels
the moving of the node with the subtree of Mptt showed the worst result, it is most likely related to the fact that when you move it needs to recalculate all the fields from the migrated nodes, which is not a quick operation. Move Raw is the fastest, because, in fact, it is just updating one field. the Ltree is not much behind the leader, as it needs to update the paths from all nodes in that subtree.
Results
email in Slovenia implementation hierarchical structur I was expecting a better result will show implementation of the model MP(Ltree), but, to my surprise, she showed the second result, losing the native implementation of the model AM(Raw). Well, the implementation of the model NS(Mptt) got 3rd place as 2 of the 4 tests he lost by a wide margin from the competition.
Summary table with results
model | insert_node | move_node | read_node | read_tree |
---|---|---|---|---|
Ltree | 0.001955 | 0.010375 | 0.008745 | 0.025522 |
Mptt | 0.001006 | 0.855293 | 0.00104 | 0.115597 |
Raw | 0.001002 | 0.001 | 0.001012 | 0.021957 |
it is also necessary to note that results may vary for other datasets, and therefore before making a decision it is recommended to conduct a similar test for their data.
Used articles:
-
the
- Representing Trees in PostgreSQL the
- Trees in SQL: Nested Sets and Materialized Path the
- Storing trees in a database. the
- Benchmark tree structure for Django the
- Hierarchical data structures and Doctrine the
- Ltree the
- Django-mptt the
- LtreeFiled
PS This is a cross post original article from my blog
/ > PS Updated reading schedule tree after adding collations
Комментарии
Отправить комментарий