Designing document-oriented APIs in PostgreSQL: full-text search and saving many documents (Part 2)
In the first installment, I created a solid save function, plus another function that creates editable document-oriented tables on the fly. They work properly and do exactly what we need, but we can do so much more. Specifically: I want full-text search that is indexed on the fly, and the ability to save many documents within a single transaction.
Let's do this.
Full-text search
Our document-oriented table has a search field of type tsvector, which is indexed with a GIN index for speed. I want to update that field every time a document is saved, and I don't want a lot of noise from the API when I do it.
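For reference, here is roughly the shape of the table that create_document_table builds (a sketch based on Part 1; the table name customer_docs and the created_at column are illustrative assumptions):

-- a sketch of the document table from Part 1; names are illustrative
create table customer_docs(
  id serial primary key,
  body jsonb not null,
  search tsvector,
  created_at timestamptz default now(),
  updated_at timestamptz default now()
);
-- the GIN index that keeps full-text queries against search fast
create index idx_customer_docs_search on customer_docs using gin(search);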
To that end, I'll rely on some conventions.
Typically, the fields worth full-text indexing have fairly predictable names, such as:
- A first name, last name, or e-mail address
- The name or description of something
- Address information (street, city, state, zip)
At save time, I'd like to check the document for any keys I want to index, and then store their values in the search field. I can do that with a function I call update_search:
create function update_search(tbl varchar, id int)
returns boolean
as $$
  //get the record
  var found = plv8.execute("select body from " + tbl + " where id=$1", id)[0];
  if(found){
    var doc = JSON.parse(found.body);
    var searchFields = ["name","email","first","first_name",
                        "last","last_name","description","title",
                        "street","city","state","zip"];
    var searchVals = [];
    //collect the values of any keys we want indexed
    for(var key in doc){
      if(searchFields.indexOf(key.toLowerCase()) > -1){
        searchVals.push(doc[key]);
      }
    }
    if(searchVals.length > 0){
      var updateSql = "update " + tbl + " set search = to_tsvector($1) where id=$2";
      plv8.execute(updateSql, searchVals.join(" "), id);
    }
    return true;
  }else{
    return false;
  }
$$ language plv8;
Once again I'm using JavaScript (PLV8) for this, pulling the document by its ID. Then I loop over its keys, checking whether any of them are ones I want to index, and if so, push their values into an array.
If there are any hits, I concatenate those values and store them in the document's search field using Postgres's built-in to_tsvector function, which takes plain text and transforms it into an indexable value.
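To see what to_tsvector actually produces, here is a quick example (using the english configuration explicitly; note how words get stemmed while the e-mail address is kept as a single token):

select to_tsvector('english', 'Larry Smith larry@example.com');
-- 'larri':1 'larry@example.com':3 'smith':2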
That's it! Run this script and the update_search function is ready to use.

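To try it out by hand, you might run something like this (assuming the customer_docs table exists and has a row whose id is 1):

select update_search('customer_docs', 1);
-- and then query against the indexed search field
select id, body from customer_docs where search @@ to_tsquery('larry');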
Perfect. Now I can plug this into the end of my save_document function, and it will be called in the same transaction every time I save something:
create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;
  //create the table on the fly if it doesn't exist yet
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];
  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }
  if(id){
    result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", doc_string, id);
  }else{
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", doc_string);
    id = result[0].id;
    //put the id back on the document and resave it
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *;", JSON.stringify(doc), id);
  }
  //run the search indexer (select, not perform: plv8.execute runs plain SQL)
  plv8.execute("select update_search($1, $2)", tbl, id);
  return result[0] ? result[0].body : null;
$$ language plv8;
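A quick sanity check of the single-document path might look like this (again with the hypothetical customer_docs table, which save_document creates if it doesn't exist):

select save_document('customer_docs', '{"name":"Larry","email":"larry@example.com"}');
-- returns the saved body, e.g. {"id": 1, "name": "Larry", "email": "larry@example.com"}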
Saving multiple documents
At the moment I can pass a single document to save_document, but I'd like to be able to pass it an array as well. I can do that by checking the argument's type and then looping:
create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  var doc = JSON.parse(doc_string);
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];
  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }
  //function that executes our SQL statement for a single document
  var executeSql = function(theDoc){
    var result = null;
    var id = theDoc.id;
    var toSave = JSON.stringify(theDoc);
    if(id){
      result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", toSave, id);
    }else{
      result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", toSave);
      id = result[0].id;
      //put the id back on the document
      theDoc.id = id;
      //resave it
      result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *;", JSON.stringify(theDoc), id);
    }
    plv8.execute("select update_search($1,$2)", tbl, id);
    return result[0] ? result[0].body : null;
  };
  var out = null;
  //was an array passed in?
  if(doc instanceof Array){
    for(var i = 0; i < doc.length; i++){
      executeSql(doc[i]);
    }
    //just report back how many documents were saved
    out = JSON.stringify({count : doc.length, success : true});
  }else{
    out = executeSql(doc);
  }
  return out;
$$ language plv8;
The nice thing about working with JavaScript here is that the logic for this kind of routine is quite simple (as opposed to PLPGSQL). I pulled the entire save process out into a separate function, this is JavaScript after all, so I can avoid duplication.
Next I check whether the input argument is an array. If it is, I loop over its members, calling executeSql for each one and returning a summary once the loop has run.
If it's not an array, everything works exactly as before, and the whole document is returned. The result:

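Concretely, saving several documents at once might look like this (hypothetical customer_docs table again):

select save_document('customer_docs', '[{"name":"Larry"},{"name":"Susie"}]');
-- returns {"count":2,"success":true}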
Great! The best thing about this is that it all happens inside a transaction. I love it!
Node oddities
If only this worked perfectly from Node! I've tried it from both .NET and Node; with .NET everything just works (surprisingly), using the Npgsql library. From Node, not so much.
In short: the node_pg driver does a very strange transformation when it sees an array of objects as an input parameter. Consider the following:
var pg = require("pg");
var assert = require("assert");
//connection settings (adjust for your environment)
var args = { connectionString : "postgres://localhost/dox" };

var run = function (sql, params, next) {
  pg.connect(args.connectionString, function (err, db, done) {
    //throw if there's a connection error
    assert.ok(err === null, err);
    db.query(sql, params, function (err, result) {
      //we have the results, release the connection
      done();
      pg.end();
      if(err){
        next(err, null);
      }else{
        next(null, result.rows);
      }
    });
  });
};

run("select * from save_document($1, $2)", ['customer_docs', {name : "Larry"}], function(err, res){
  //works just fine
});
This is regular Node/PG code. At the end, the run function is set up to call my save_document function and pass in some data. When the PG driver sees an object as an input parameter, it turns it into a string, and the save goes fine.
But if you send in an array...
run("select * from save_document($1, $2)",
['customer_docs', [{name : "Larry"}, {name : "Susie"}],
function(err,res){
//crashes hard
}
I get back an error telling me the JSON is invalid. The error message (from Postgres) reports that it's due to badly formatted JSON, which looks like this:
{"{name : "Larry"}, ...}
Yeah... that. I've tried to work out exactly what's happening; it simply looks like the node_pg driver flattens the outer array, perhaps by calling something like Underscore's flatten method. I don't know. To get around it, you need to change the call to the following, serializing the array yourself so the driver just passes a single string through:
run("select * from save_document($1, $2)",
['customer_docs', JSON.stringify([{name : "Larry"}, {name : "Susie"}]),
function(err,res){
//Works fine
}
Go!
Saving is smooth now, and that makes me happy. In the next article I'll set up the finders, and I'll also create a full-text search function.