Database connection pooling, and why you need it

When your project becomes popular and every millisecond of request processing matters, you have to look for bottlenecks in the system. A large share of that time is often spent executing SQL queries from the application against the database. Let's see what can be optimized in how the application works with the database.


Theory


To understand this better, let's look at the steps performed when an application issues a query, for example a simple SELECT, to the database (a plain-JDBC sketch of the client side follows the list):
  1. The application opens a connection to the database and sends the query to the server.
  2. The server parses the SQL query.
  3. The server optimizes the query based on rules and table statistics; the result is a query execution plan.
  4. The server executes the query according to the previously built plan and sends the results back to the client.
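For reference, this is roughly what the client side of these steps looks like in plain JDBC. It is only a minimal sketch: the connection URL and credentials are placeholders, and the query uses the test table described later in the measurements.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class PlainQuery {
    public static void main(String[] args) throws Exception {
        // step 1: open a connection to the database (placeholder URL and credentials)
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "user", "password")) {
            // steps 2-4 happen on the server: parse, build the execution plan, execute
            try (PreparedStatement st = con.prepareStatement(
                    "SELECT a.* FROM test.test_table a WHERE id = ?")) {
                st.setObject(1, 1);
                try (ResultSet rs = st.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(2));
                    }
                }
            }
        } // the connection is closed here, so the next query pays the full cost again
    }
}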

So where can we save time?
The first step, opening a connection to the server, is quite expensive, and we can eliminate it by preparing a pool of already open connections in advance and handing one to the application whenever it needs it.

We can also avoid re-executing steps two and three by using bind variables in our queries and caching the execution plan that the server builds in step three.
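In JDBC terms this means reusing one PreparedStatement with bind variables instead of building a new SQL string for every lookup, and, with the PostgreSQL driver, optionally lowering the prepare threshold so the server-side plan is reused. A minimal sketch, assuming the test table described in the next section and a plain driver connection:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.postgresql.PGStatement;

class BindVariablesExample {
    // run the same query many times, paying for parsing and planning only once
    static long countFound(Connection con) throws SQLException {
        long found = 0;
        try (PreparedStatement st = con.prepareStatement(
                "SELECT a.* FROM test.test_table a WHERE id = ?")) {
            // PostgreSQL-specific: use a server-side prepared statement from the first execution
            // (the cast assumes a plain driver connection, not a pooled wrapper)
            ((PGStatement) st).setPrepareThreshold(1);
            for (int id = 0; id < 10; id++) {
                st.setObject(1, id); // only the bound value changes between executions
                try (ResultSet rs = st.executeQuery()) {
                    if (rs.next()) {
                        found++;
                    }
                }
            }
        }
        return found;
    }
}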

Today most database drivers support connection pooling out of the box. Still, there is always the temptation to write your own implementation that will supposedly be faster. Let's measure how much we gain from connection pooling and statement caching, both with the off-the-shelf solution and with a home-grown one.


Measurement method


For the tests we will use the freely available PostgreSQL DBMS, and the client will be written in Java. In the database we create a small test table test.test_table (about 10 rows) consisting of a primary key id and a string value. To simulate clients querying the database in parallel, we create threads that perform simple lookups by primary key in this table. Each thread is given a particular connection pool implementation, which lets us compare performance: every thread measures the total time spent executing 100 queries.
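The article does not give the exact DDL, so the following is only a sketch of a matching table: the schema and table name come from the query used in the test, while the column name value and its type are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

class CreateTestTable {
    public static void main(String[] args) throws Exception {
        // connection parameters are placeholders
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "user", "password")) {
            try (Statement st = con.createStatement()) {
                st.execute("CREATE SCHEMA IF NOT EXISTS test");
                // the second column name "value" is an assumption
                st.execute("CREATE TABLE IF NOT EXISTS test.test_table ("
                        + "id integer PRIMARY KEY, value varchar(100))");
            }
            // about 10 rows keyed 0..9, to match rnd.nextInt(10) in the test thread
            try (PreparedStatement ins = con.prepareStatement(
                    "INSERT INTO test.test_table (id, value) VALUES (?, ?)")) {
                for (int i = 0; i < 10; i++) {
                    ins.setInt(1, i);
                    ins.setString(2, "value " + i);
                    ins.addBatch();
                }
                ins.executeBatch();
            }
        }
    }
}

The test thread itself looks like this: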

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Random;

class TestThread extends Thread {
    private DBPool pool;
    private long workTime = 0;
    private long foundStr = 0;

    TestThread(DBPool pool) {
        this.pool = pool; // the pool implementation under test is passed in when the thread is created
    }

    @Override
    public void run() {
        workTime = System.currentTimeMillis(); // note the start time
        Connection con = null;
        PreparedStatement st = null;
        ResultSet rs = null;
        Random rnd = new Random(); // used to pick a primary key for the query
        for (int i = 0; i < 100; i++) {
            try {
                con = pool.getConnection(); // get a connection to the DB
                // send the query for parsing and building of the execution plan
                st = con.prepareStatement("SELECT a.* FROM test.test_table a WHERE id = ?");
                st.setObject(1, rnd.nextInt(10));
                rs = st.executeQuery(); // execute the query
                if (rs.next()) {
                    String tmp = rs.getString(2); // process the result
                    if (tmp != null) {
                        foundStr++;
                    }
                }
            } catch (SQLException ex) {
                // execution error, print it to the console
                System.out.println("Pool " + pool + " exception " + ex);
            } finally {
                // and at the end carefully close all used objects
                try {
                    if (rs != null)
                        rs.close();
                } catch (SQLException e) {
                    // ignore
                }
                try {
                    if (st != null)
                        st.close();
                } catch (SQLException e) {
                    // ignore
                }
                try {
                    if (con != null)
                        pool.putConnection(con); // return the connection to the pool
                } catch (SQLException e) {
                    // ignore
                }
            }
        }
        workTime = System.currentTimeMillis() - workTime; // total time spent on 100 queries
    }
}


Now let's create several connection pools and compare their performance.
The first is the classic approach: for every request it opens a connection to the server and closes it after the query has been executed.
class DBPool {
    private String url, user, password;

    DBPool(String url, String user, String password) throws ClassNotFoundException {
        this.url = url;
        this.user = user;
        this.password = password;
        Class.forName("org.postgresql.Driver"); // make sure the driver is registered
    }

    protected DBPool() {
        // for the subclasses below, which manage their own connections
    }

    public Connection getConnection() throws SQLException {
        return DriverManager.getConnection(url, user, password); // a new connection for every request
    }

    public void putConnection(Connection connection) throws SQLException {
        connection.close(); // simply close it
    }
}
The second uses the special pooling (caching) DataSource class of the PostgreSQL JDBC driver, PGPoolingDataSource, which lets you set the maximum pool size and the initial number of connections. In addition to the pool settings, the driver's statements have a setPrepareThreshold setting, which controls after how many executions a query is cached as a server-side prepared statement and no longer requires parsing and building an execution plan.
class DBPoolCache extends DBPool {
    private PGPoolingDataSource source;

    DBPoolCache(String host, String database, String user, String password) {
        source = new PGPoolingDataSource();
        source.setDataSourceName("A Data Source");
        source.setServerName(host);
        source.setDatabaseName(database);
        source.setUser(user);
        source.setPassword(password);
        source.setMaxConnections(20);     // maximum pool size
        source.setInitialConnections(20); // how many connections are opened right away
    }

    public Connection getConnection() throws SQLException {
        return source.getConnection();
    }

    public void putConnection(Connection connection) throws SQLException {
        connection.close(); // for a pooling DataSource, close() returns the connection to the pool
    }
}

And finally our own pool implementation, in which we ourselves cache both the database connections and the results of parsing the SQL statements (PreparedStatement).
class DBPoolCacheMy extends DBPool {
    private PGSimpleDataSource source;
    private BlockingQueue<Connection> connections = new ArrayBlockingQueue<Connection>(20);

    DBPoolCacheMy(String host, String database, String user, String password) throws SQLException {
        source = new PGSimpleDataSource();
        source.setServerName(host);
        source.setDatabaseName(database);
        source.setUser(user);
        source.setPassword(password);
        for (int i = 0; i < 20; i++) { // prepare the connections up front
            connections.add(new MyConnection(source.getConnection()));
        }
    }

    public Connection getConnection() throws SQLException {
        try { // try to get a free connection
            return connections.poll(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return null;
        }
    }

    public void putConnection(Connection connection) throws SQLException {
        connections.add(connection); // return it to the queue
    }
}

We also need our own connection class, which wraps a database connection and implements caching of PreparedStatements.
class MyConnection implements Connection {
    private Connection connection;
    private ConcurrentHashMap<String, PreparedStatement> statements = new ConcurrentHashMap<String, PreparedStatement>();

    protected MyConnection(Connection connection) {
        this.connection = connection;
    }

    public PreparedStatement prepareStatement(String sql) throws SQLException {
        PreparedStatement statement = statements.get(sql);
        if (statement == null) {
            statement = new MyStatement(connection.prepareStatement(sql));
            statements.put(sql, statement);
        }
        return statement;
    }

    .....
}
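The elided methods are simply the rest of the Connection interface, which have to forward to the wrapped connection. For illustration only, a couple of them might look like this inside MyConnection (a sketch, not taken from the original code):

public Statement createStatement() throws SQLException {
    return connection.createStatement(); // plain delegation to the real connection
}

public boolean isClosed() throws SQLException {
    return connection.isClosed();
}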

As well as our own class implementing the PreparedStatement interface, which ignores close() so that the cached statement can be reused.
class MyStatement implements PreparedStatement {
    private PreparedStatement statement;

    MyStatement(PreparedStatement statement) throws SQLException {
        this.statement = statement;
        // use a server-side prepared statement starting from the first execution
        ((PGStatement) statement).setPrepareThreshold(1);
    }

    public void close() throws SQLException {
        // ignore, the statement stays cached
    }

    .....
}
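With all three implementations in place, the comparison can be driven by a small harness along the following lines. This is only a sketch: the connection parameters are placeholders, it relies on the TestThread(DBPool) constructor shown above, and it measures wall-clock time around the threads rather than reading each thread's private workTime field.

import java.util.ArrayList;
import java.util.List;

class PoolBenchmark {
    // runs threadCount TestThreads against the given pool and returns the elapsed wall-clock time
    static long run(DBPool pool, int threadCount) throws InterruptedException {
        List<TestThread> threads = new ArrayList<TestThread>();
        for (int i = 0; i < threadCount; i++) {
            threads.add(new TestThread(pool));
        }
        long start = System.currentTimeMillis();
        for (TestThread t : threads) {
            t.start();
        }
        for (TestThread t : threads) {
            t.join(); // wait until every thread has executed its 100 queries
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        // connection parameters are placeholders
        DBPool simple = new DBPool("jdbc:postgresql://localhost:5432/test", "user", "password");
        DBPool boxed = new DBPoolCache("localhost", "test", "user", "password");
        DBPool custom = new DBPoolCacheMy("localhost", "test", "user", "password");

        for (int threads = 1; threads <= 10; threads++) {
            System.out.println(threads + " thread(s): "
                    + "no pool = " + run(simple, threads) + " ms, "
                    + "PGPoolingDataSource = " + run(boxed, threads) + " ms, "
                    + "own pool = " + run(custom, threads) + " ms");
        }
    }
}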



Conclusion



Finally, let's compare the performance of the three connection pool implementations by running the tests with the number of concurrent threads varying from 1 to 10. The result is the following dependence of the total task execution time on the number of threads.



The graph shows that caching database connections is clearly worth doing: it gives a significant boost in system performance. Writing a home-grown cache for connections and PreparedStatements, however, brings no tangible benefit over the off-the-shelf pool.
Article based on information from habrahabr.ru
