pg-elastic - Proxy to use PostgreSQL via ElasticSearch protocol

ElasticSearch is one of the most popular solutions for Full-Text Search. It is based on Apache Lucene, a powerful and flexible framework for text processing, and adds components on top of it such as indices and an easy-to-use REST API. On the other hand, PostgreSQL also provides FTS features and is widely used to store huge volumes of data. The aim of this post is to describe a prototype of a proxy server which allows using PostgreSQL via the ElasticSearch protocol. The source code of the prototype is published on the GitHub project page.

Motivation

There are a number of reasons to use such a server. Let’s look at some of them.

One of the simplest reasons is an intention to use as few applications as possible, since each application in the infrastructure requires configuration and maintenance effort. In that case, we can remove one complex component of the infrastructure and replace it with a simpler one. It also removes the problem of synchronizing two separate data-store components.

Another example is migrating a system to PostgreSQL as the main data storage. In that case, pg-elastic can help to move from one platform to another faster, without rewriting the client application immediately. Clearly, it is better not to use this as a permanent solution, but it can dramatically reduce the number of problems during migration.

And the third reason is a performance boost in some special cases which may apply to your application. PostgreSQL has several index types aimed at FTS workloads, and some of them are optimized for a narrow range of operations which they perform very fast.

Prototype implementation

The basic idea of the solution is to fully simulate the ElasticSearch REST API and proxy all requests to a PostgreSQL instance. The server is written in Go in order to process requests asynchronously in an easy way, and to benefit from other useful tooling, fast execution speed, and easy deployment.

All documents and configuration are stored in PostgreSQL tables, mainly in the JSONB format, which allows storing non-structured data in PostgreSQL. The workflow of request processing is shown in the following schema.

Request processing

The current version of pg-elastic is just a prototype which supports only a few basic types of requests and was developed as a proof of concept. More information about the supported API can be found on the GitHub project page.
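To illustrate the idea, the following Python sketch shows how an ElasticSearch-style match query could be translated into a PostgreSQL full-text search statement over a JSONB column. This is a simplified illustration, not the actual pg-elastic code; the table layout (one table per index with a document JSONB column) and all names here are assumptions:

```python
def match_query_to_sql(index, doc_type, field, text):
    """Translate an ElasticSearch-style match query into a parameterized
    PostgreSQL full-text search statement over a JSONB 'document' column.
    The schema (one table per index) is assumed for illustration only."""
    sql = (
        f"SELECT document FROM {index} "
        "WHERE document->>'_type' = %s "
        "AND to_tsvector('english', document->>%s) "
        "@@ plainto_tsquery('english', %s)"
    )
    return sql, (doc_type, field, text)

# Build the SQL for a query like {"match": {"message": "trying"}}
sql, params = match_query_to_sql('twitter', 'tweet', 'message', 'trying')
```

The real proxy also has to handle analyzers, scoring, and pagination, but the core of the translation is mapping a match clause to a `to_tsvector @@ to_tsquery` predicate.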

Usage

Installation

pg_elastic is written in the Go programming language, which is required to build the application. All dependencies can be downloaded automatically via the go get command.

IMPORTANT! pg_elastic requires Go compiler version 1.9 or above.

To download the source code with all dependencies into $GOPATH use:

go get github.com/asp437/pg_elastic

To build the application and place the executable file in the current directory use:

go build github.com/asp437/pg_elastic

To build and install the application into $GOPATH/bin use:

go install github.com/asp437/pg_elastic

The built executable file can be found at $GOPATH/bin/pg_elastic.

To run the dev version of the application from inside its directory execute:

go run main.go

Basic actions

The following example, written in Python, shows basic requests based on the tests included with pg_elastic, using the elasticsearch_dsl library as an easy-to-use ElasticSearch driver.

First of all, we should create a document type to store:

from datetime import datetime
from elasticsearch_dsl import DocType, Date, Search, Text

class Tweet(DocType):
    message = Text(analyzer='english')
    user = Text()
    post_date = Date()

    class Meta:
        index = 'twitter'
        doc_type = 'tweet'
The next piece of code shows how to store documents:

message = 'trying out Elasticsearch'
user = 'kimchy'
tweet = Tweet(meta={'id': 1}, message=message, user=user,
              post_date=datetime.now())
tweet.save()

The following code searches over the collection of documents:

s = Search(index="twitter") \
    .query("match", message="try")
response = s.execute()

s = Search(index="twitter") \
    .query("match", message="trying")
response = s.execute()

s = Search(index="twitter") \
    .query("match", message="elastic")
response = s.execute()

Migration

To migrate data from an existing ElasticSearch cluster into a PostgreSQL instance for further usage with pg_elastic, use the migration tool available in the pg_elastic_migrate subpackage. It is downloaded automatically together with pg_elastic itself.

To build the application and place the executable file in the current directory use:

go build github.com/asp437/pg_elastic/pg_elastic_migrate

To build and install it into $GOPATH/bin use:

go install github.com/asp437/pg_elastic/pg_elastic_migrate

To use the tool, you should provide information about both the ElasticSearch source server and the PostgreSQL destination instance. For example (all parameters are optional):

pg_elastic_migrate -elasticsearch-host=localhost:9200 -postgresql-host=localhost:5432 -postgresql-user=postgres -postgresql-password=postgres -postgresql-database=postgres

The built executable file can be found at $GOPATH/bin/pg_elastic_migrate.

Use the -h flag to get more information about the tool’s parameters.
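Conceptually, the migration tool reads documents from the ElasticSearch cluster and writes them into the JSONB tables. A minimal Python sketch of the per-document transformation (the row layout and function name are assumptions for illustration, not the tool’s actual code):

```python
import json

def es_hit_to_row(hit):
    """Convert one ElasticSearch search hit into an (id, type, document)
    tuple ready for insertion into a JSONB column. The row layout is an
    assumption for illustration."""
    doc = dict(hit['_source'])      # copy the original document body
    doc['_type'] = hit['_type']     # keep the type inside the JSONB value
    return hit['_id'], hit['_type'], json.dumps(doc)

row = es_hit_to_row({'_id': '1', '_type': 'tweet',
                     '_source': {'user': 'kimchy',
                                 'message': 'trying out Elasticsearch'}})
```

The real tool also has to page through the source cluster (e.g. with the scroll API) and create the destination tables, but each document ultimately becomes a JSONB value like this.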

Conclusion

The development of the prototype showed that fully simulating the behavior of ElasticSearch in PostgreSQL is quite a challenging task. Nevertheless, it may be worth it in some cases, especially for systems that use PostgreSQL as the main data storage and ElasticSearch as an FTS engine, since it removes the synchronization process.


Written by Aleksandr Parfenov in PostgreSQL on 15/01/2018. Tags: PostgreSQL, FTS, ElasticSearch
