ElasticSearch is one of the most popular solutions for full-text search (FTS). It is based on Apache Lucene, a powerful and flexible framework for text processing, and adds components on top of it such as indices and an easy-to-use REST API. PostgreSQL, in turn, also provides FTS features and is widely used to store huge volumes of data. The aim of this post is to describe a prototype of a proxy server which allows using PostgreSQL via the ElasticSearch protocol. The source code of the prototype is published on the GitHub project page.
Motivation
There are a number of reasons to use such a server. Let’s look at some of them.
One of the simplest reasons is the intention to use as few applications as possible, since each application in an infrastructure requires configuration and maintenance effort. In that case, we can remove one complex component of the infrastructure and replace it with a simpler one. It also removes the problem of keeping two separate data-store components synchronized.
Another example is migrating a system to PostgreSQL as its main data storage. In that case, pg-elastic can help to move from one platform to another faster, without rewriting the client application immediately. Of course, it is better not to use it as a permanent solution, but it can dramatically reduce the number of problems during migration.
And the third reason is a performance boost in some special cases that may apply to your application. PostgreSQL has several index types aimed at FTS workloads, and some of them are optimised to perform a narrow range of operations very fast.
Prototype implementation
The basic idea of the solution is to fully simulate the ElasticSearch REST API and proxy all requests to a PostgreSQL instance. The server is written in Go, which makes it easy to process requests asynchronously and provides other useful tools, as well as fast execution and simple deployment.
All documents and configurations are stored in PostgreSQL tables, mainly in the JSONB format, which allows storing non-structured data in PostgreSQL. The workflow of request processing is shown in the following schema.
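To make the storage model concrete, here is a rough sketch of how an ElasticSearch document could map to a JSONB row. The table and column names below are assumptions for illustration only; the actual pg-elastic schema may differ.

```python
import json

# Hypothetical illustration: an ElasticSearch document body,
# serialized for storage in a JSONB column.
doc = {
    "message": "trying out Elasticsearch",
    "user": "kimchy",
    "post_date": "2018-01-15T00:00:00",
}

# One row per document: the ES index/type/id become ordinary columns,
# while the document body goes into a JSONB column as serialized JSON.
row = ("twitter", "tweet", "1", json.dumps(doc))

# Corresponding hypothetical DDL and parameterized INSERT:
ddl = """
CREATE TABLE IF NOT EXISTS documents (
    index_name text,
    doc_type   text,
    doc_id     text,
    body       jsonb
);
"""
insert_sql = "INSERT INTO documents VALUES (%s, %s, %s, %s);"

print(row[3])
```

With such a layout, ElasticSearch queries can be answered with ordinary SQL over the `body` column, using PostgreSQL's JSONB operators and FTS functions.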
The current version of pg-elastic is just a prototype which supports only a few basic types of requests and was developed as a proof of concept. More information about the supported API can be found on the GitHub project page.
Usage
Installation
pg_elastic is written in the Go programming language, which is required to build the application. All dependencies can be automatically downloaded via the go get command.
IMPORTANT! pg_elastic requires Go compiler version 1.9 or above.
To download the source code with all dependencies into the $GOPATH environment use:
go get github.com/asp437/pg_elastic
To build the application and place the executable file in the current directory use:
go build github.com/asp437/pg_elastic
To build and install the application into $GOPATH/bin use:
go install github.com/asp437/pg_elastic
The built executable can be found at $GOPATH/bin/pg_elastic
To run the dev version of the application from inside its directory execute:
go run main.go
Basic actions
The following example, written in Python, shows basic requests based on the tests included with pg_elastic, using the elasticsearch_dsl library as an easy-to-use ElasticSearch driver.
First of all, we should create a document type to store:
from datetime import datetime
from elasticsearch_dsl import DocType, Text, Date

class Tweet(DocType):
    message = Text(analyzer='english')
    user = Text()
    post_date = Date()

    class Meta:
        index = 'twitter'
        doc_type = 'tweet'
The next piece of code shows how to store documents:
message = 'trying out Elasticsearch'
user = 'kimchy'
tweet = Tweet(meta={'id': 1}, message=message, user=user, post_date=datetime.now())
tweet.save()
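For reference, tweet.save() above boils down to a single ElasticSearch REST call that pg_elastic has to answer. The sketch below builds the equivalent request body with only the standard library; the date value is written out by hand here instead of calling datetime.now().

```python
import json

# Document body that the driver sends for tweet.save();
# post_date is serialized by hand for this illustration.
body = {
    "message": "trying out Elasticsearch",
    "user": "kimchy",
    "post_date": "2018-01-15T12:00:00",
}

# Equivalent raw request (index, type, and id come from the
# Tweet class Meta and the meta={'id': 1} argument):
#   PUT http://localhost:9200/twitter/tweet/1
payload = json.dumps(body)
print(payload)
```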
The following code searches over the collection of documents:
from elasticsearch_dsl import Search

s = Search(index="twitter") \
    .query("match", message="try")
response = s.execute()

s = Search(index="twitter") \
    .query("match", message="trying")
response = s.execute()

s = Search(index="twitter") \
    .query("match", message="elastic")
response = s.execute()
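Each Search call above is sent to the server as a POST to the index's _search endpoint with a match query in the request body; the query DSL generated for the first search looks like this:

```python
import json

# Query DSL body produced by Search(index="twitter").query("match", message="try");
# sent as: POST http://localhost:9200/twitter/_search
query = {"query": {"match": {"message": "try"}}}

print(json.dumps(query))
```

Since the message field uses the english analyzer, both "try" and "trying" stem to the same term, which is presumably why the example queries both forms against the stored document.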
Migration
To migrate data from existing ElasticSearch cluster into PostgreSQL instance for further usage with pg_elastic
use migration tool which is available in the pg_elastic_migrate
subpackage.
It is automatically downloaded together with pg_elastic
itself.
To build the application and place the executable file in the current directory use:
go build github.com/asp437/pg_elastic/pg_elastic_migrate
To build and install it into $GOPATH/bin use:
go install github.com/asp437/pg_elastic/pg_elastic_migrate
To use the tool, you should provide information about both the ElasticSearch source server and the PostgreSQL destination instance. For example (all parameters are optional):
pg_elastic_migrate -elasticsearch-host=localhost:9200 -postgresql-host=localhost:5432 -postgresql-user=postgres -postgresql-password=postgres -postgresql-database=postgres
The built executable can be found at $GOPATH/bin/pg_elastic_migrate.
Use the -h flag to get more information about the tool’s parameters.
Conclusion
The development of the prototype showed that fully simulating the behavior of ElasticSearch in PostgreSQL is quite a challenging task. Nevertheless, it may be worth it in some cases, especially for systems that use PostgreSQL as the main data storage and ElasticSearch as an FTS engine, since it removes the synchronization process between the two.
Last modified on 2018-01-15