Performance comparison of hierarchical models, Django and PostgreSQL

Good day, dear readers.


Today's article will be devoted to the comparison of models of working with hierarchical data in PostgreSQL via Django app. In the article I specifically do not use a clean implementation in the database, because I'm interested in is performance in an environment close to combat.


All tests were performed on my data and results are relevant for this data on your data the result may differ from that obtained in the article.


Model implementation of the hierarchical structures in the database


For working with such structures in PostgreSQL you can use the following modely:


    the
  1. Adjacency model (AM) — model, when the column is stored parent;
  2. the
  3. Nested Sets (NS) model when paired column stores the range of all sub-elements;
  4. the
  5. Materialised Path model (MP) model, when the column is stored the full path to the element.

Also details about the implementation of hierarchical structures in a relational database, you can read here.


To implement them in Django selected the following tools:


    the
  1. AM — regular recursion is based on the Django ForeignKey;
  2. the
  3. NS — Modul django-mptt;
  4. the
  5. MP module ltree PostgreSQL wrapper LtreeField;

testing Methodology


Testing is conducted on the dataset of 150 thousand companies. Time will be measured
the following query:


    the
  1. Reading the tree
  2. Reading arbitrary node the

  3. Insert the node
  4. the
  5. Move a subtree between levels

query Time will be deemed to be the average time of 1000 queries for each item, to a randomly selected tree item.


Hardware test stand


the
    the
  • CPU Core i5 2,5 GHz
  • the
  • RAM 1600 MHz DDR3
  • the
  • Samsung SSD 850 EVO 500GB

Software test stand


the
    the
  • Python 2.7
  • the
  • Postgres 9.6
  • the
  • Django 1.8
  • the
  • psycopg2

Description test


For testing created by separate Django application. It created 3 models, one for each storage scheme described above. This is done to ensure that at any moment it was possible to conduct similar testing on different datasets.


this application includes two commands:


the
    the
  • load_tree — loads the data for the test
  • the
  • analize — executes the collection and analysis of data based on a specified number of requests

To run the application and collect statistics you need to perform a command sequence:


the
python manage.py migrate
python manage.py load_tree <path to data file>
python manage.py load_tree analize <number of requests for analysis>

the results of the command analize stored in report.


Results


After running the specified commands to their data, I received the following rezultaty. In order to model names didn't fool you, below are the models from the test and storage models


the

    Raw — adjacency model

    Mptt nested sets

    Ltree — materialised path


Reading the tree


read_tree_chart


As you can see from the chart model Mptt and Ltree showed approximately the same result, but the native model Raw was faster.


Reading arbitrary node


read_node_chart


In this test, unlike the previous paragraph, the model Mptt equal in speed to the standard implementation, but Ltree on the contrary worsen your score. It should be noted that in the model Raw for more descendants uses a recursive query (WITH RECURSION), and, in spite of this, he worked faster than anyone.


insert_node_chart


When you insert a node behind again model Ltree, but in this case it is most likely due to the fact that the path field in which is stored the tree consisted of the id of records, so I had to do 2 queries (insert and then update a field path), which consequently affect performance.


Move a subtree between levels


move_node_chart


the moving of the node with the subtree of Mptt showed the worst result, it is most likely related to the fact that when you move it needs to recalculate all the fields from the migrated nodes, which is not a quick operation. Move Raw is the fastest, because, in fact, it is just updating one field. the Ltree is not much behind the leader, as it needs to update the paths from all nodes in that subtree.


Results


email in Slovenia implementation hierarchical structur I was expecting a better result will show implementation of the model MP(Ltree), but, to my surprise, she showed the second result, losing the native implementation of the model AM(Raw). Well, the implementation of the model NS(Mptt) got 3rd place as 2 of the 4 tests he lost by a wide margin from the competition.


Summary table with results


the the the the
model insert_node move_node read_node read_tree
Ltree 0.001955 0.010375 0.008745 0.025522
Mptt 0.001006 0.855293 0.00104 0.115597
Raw 0.001002 0.001 0.001012 0.021957

it is also necessary to note that results may vary for other datasets, and therefore before making a decision it is recommended to conduct a similar test for their data.


Used articles:


    the
  1. Representing Trees in PostgreSQL
  2. the
  3. Trees in SQL: Nested Sets and Materialized Path
  4. the
  5. Storing trees in a database.
  6. the
  7. Benchmark tree structure for Django
  8. the
  9. Hierarchical data structures and Doctrine
  10. the
  11. Ltree
  12. the
  13. Django-mptt
  14. the
  15. LtreeFiled

PS This is a cross post original article from my blog

/ > PS Updated reading schedule tree after adding collations
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

When the basin is small, or it's time to choose VPS server

From Tomsk to Silicon Valley and Back