Designing a document-oriented API in PostgreSQL: full-text search and saving many documents (Part 2)

In the first installment, I created a solid save function, plus another function that creates an editable document-oriented table on the fly. They work properly and do exactly what we need, but we can do much more. Specifically: I want full-text search, indexed on the fly, and the ability to save many documents within a transaction.

Let's do this.

Full-text search


Our document-oriented table has a search field of type tsvector, which is indexed with a GIN index for speed. I want to update that field every time a document is saved, and I don't want a lot of API noise when I do it.
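As a refresher, the table that create_document_table builds looked roughly like this in the first installment (a sketch; customer_docs is just an example name):

create table customer_docs(
  id serial primary key,
  body jsonb not null,
  search tsvector,
  created_at timestamptz default now() not null,
  updated_at timestamptz default now() not null
);

-- the GIN index that speeds up full-text queries against search
create index idx_customer_docs_search on customer_docs using GIN(search);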

For this, I'll resort to a few conventions.

Usually, the fields worth putting in a full-text index have fairly predictable names. Such as:

  • A first or last name, or an e-mail address
  • The name or description of something
  • Address information

I want to check the document at save time for any keys I'd like to index, and then store the result in the search field. I can do that with a function I call update_search:
create function update_search(tbl varchar, id int)
returns boolean
as $$
  //get the record
  var found = plv8.execute("select body from " + tbl + " where id=$1", [id])[0];
  if(found){
    var doc = JSON.parse(found.body);
    //keys that, by convention, are worth indexing
    var searchFields = ["name","email","first","first_name",
                        "last","last_name","description","title",
                        "street", "city", "state", "zip"];
    var searchVals = [];
    for(var key in doc){
      if(searchFields.indexOf(key.toLowerCase()) > -1){
        searchVals.push(doc[key]);
      }
    };

    if(searchVals.length > 0){
      var updateSql = "update " + tbl + " set search = to_tsvector($1) where id = $2";
      plv8.execute(updateSql, [searchVals.join(" "), id]);
    }
    return true;
  }else{
    return false;
  }
$$ language plv8;

I'm using JavaScript (PLV8) again here, pulling the document by its ID. Then I loop over the keys, checking whether any of them are ones I might want to index; if so, I push their values into an array.

If there were any hits, I concatenate the values and store them in the document's search field using to_tsvector, a built-in Postgres function that takes plain text and turns it into an indexable value.
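If you haven't seen to_tsvector before, here's a quick illustration of what it produces (the sample text is made up):

select to_tsvector('Larry larry@widgets.com Portland');
-- with the default english config this yields something like:
-- 'larri':1 'larry@widgets.com':2 'portland':3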

Here it is! Executing this script, we get the following:

[image]

Perfect: now I can just plug this into the end of my save_document function, and it will be called within the same transaction every time I save something:

create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", [tbl])[0];

  //create the table on the fly if it doesn't exist yet
  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  if(id){
    result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", [doc_string, id]);
  }else{
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", [doc_string]);
    //write the generated id back into the document body
    id = result[0].id;
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *;", [JSON.stringify(doc), id]);
  }

  //run the search indexer
  plv8.execute("select update_search($1, $2);", [tbl, id]);
  return result[0] ? result[0].body : null;
$$ language plv8;
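A quick sanity check, assuming the customer_docs table from before (the sample document is made up):

select save_document('customer_docs', '{"name" : "Larry", "email" : "larry@widgets.com"}');
-- returns the saved document with its generated id;
-- the search field is populated behind the scenes by update_search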

Saving multiple documents


At the moment I can pass a single document to save_document, but I'd also like to be able to pass it an array. I can do that by checking the argument's type and then running a loop:

create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  var doc = JSON.parse(doc_string);

  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", [tbl])[0];
  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  //function that executes our SQL statement
  var executeSql = function(theDoc){
    var result = null;
    var id = theDoc.id;
    var toSave = JSON.stringify(theDoc);

    if(id){
      result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", [toSave, id]);
    }else{
      result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", [toSave]);

      id = result[0].id;
      //put the id back on the document
      theDoc.id = id;
      //resave it
      result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *;", [JSON.stringify(theDoc), id]);
    }
    plv8.execute("select update_search($1,$2);", [tbl, id]);
    return result ? result[0].body : null;
  };

  var out = null;

  //was an array passed in?
  if(doc instanceof Array){
    for(var i = 0; i < doc.length; i++){
      executeSql(doc[i]);
    }
    //just report back how many documents were saved
    out = JSON.stringify({count : i, success : true});
  }else{
    out = executeSql(doc);
  }
  return out;
$$ language plv8;
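A bulk save then looks like this (again assuming customer_docs; the count/success shape comes straight from the function above):

select save_document('customer_docs', '[{"name" : "Larry"}, {"name" : "Susie"}]');
-- {"count":2,"success":true}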

The nice thing about using JavaScript here is that the logic for a routine like this is quite simple (as opposed to PLPGSQL). I pulled the save process out into a separate function (this is JavaScript, after all) so I can avoid duplication.

Then I check whether the input argument is an array. If it is, I loop over its members and call executeSql for each one, reporting back a count at the end.

If it's not an array, I just do exactly what was done before and return the entire document. The result:

[image]

Great! The best thing about this is that it all happens inside a transaction. I love it!
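To be concrete about that: the whole bulk save is a single statement, so it either fully succeeds or fully fails, and it respects any surrounding transaction. A quick sketch:

begin;
select save_document('customer_docs', '[{"name" : "Larry"}, {"name" : "Susie"}]');
rollback;
-- after the rollback, neither document exists in customer_docs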

Node oddities


If only this worked perfectly from Node! I tried it from both .NET and Node; with .NET everything just works (strangely enough) using the Npgsql library. From Node, not so much.

In short: the node_pg driver does a very strange transformation when it sees an array of objects as an input parameter. Consider the following:

var pg = require("pg");
var assert = require("assert");

var run = function (sql, params, next) {
  //args.connectionString holds app config defined elsewhere
  pg.connect(args.connectionString, function (err, db, done) {
    //throw if there's a connection error
    assert.ok(err === null, err);

    db.query(sql, params, function (err, result) {
      //we have the results, release the connection
      done();
      pg.end();
      if(err){
        next(err, null);
      }else{
        next(null, result.rows);
      }
    });
  });
};

run("select * from save_document($1, $2)", ['customer_docs', {name : "Larry"}], function(err,res){
//works just fine
}

This is regular Node/PG code. At the bottom, the run function is set up to invoke my save_document function and pass in some data. When PG sees an object as an input, it turns it into a string and the save goes fine.

If, however, you send in an array...

run("select * from save_document($1, $2)", 
['customer_docs', [{name : "Larry"}, {name : "Susie"}], 
function(err,res){
//crashes hard
}

...I get back an error telling me the input is invalid JSON. The Postgres error message reports that the JSON is badly formatted:

{"{name : "Larry"}, ...}

What the... Indeed it is. I tried to work out what's happening; it simply looks like the node_pg driver flattens the outer array, perhaps by calling Underscore's flatten method. I don't know. To get around it, you need to change the call to the following:

run("select * from save_document($1, $2)", 
['customer_docs', JSON.stringify([{name : "Larry"}, {name : "Susie"}]), 
function(err,res){
//Works fine
}
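From the database's side this makes sense: save_document expects a single jsonb value. A JSON-encoded string of the array casts cleanly, while the driver's flattened form looks like a Postgres array literal, which is not JSON. A sketch you can try in psql:

-- one valid jsonb value holding both documents:
select '[{"name":"Larry"},{"name":"Susie"}]'::jsonb;

-- the flattened form fails if you try to cast it;
-- uncommenting this raises "invalid input syntax for type json":
-- select '{"{name : "Larry"}"}'::jsonb;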

Go!


Saving now works smoothly, and that makes me happy. In the next article I'll dial in queries and also create a full-text search function.