ElasticSearch Search Engine

Identity Card

Plugin Label: ElasticSearch Search Engine
Short Description: ElasticSearch implementation to index all files and search a whole repository quickly.
Plugin Identifier: index.elasticsearch
Author: Charles du Jeu
Dependencies: access.smb, access.imap, access.swift, access.s3, access.inbox, access.demo, access.fs, access.dropbox, access.webdav, access.sftp_psl, access.smbicewind, access.sftp, access.ftp


To get the latest version of Elastica, go to the Elastica project site and download the zip archive from the "Download" section. Then unzip the whole archive and copy its lib/Elastica folder into plugins/index.elasticsearch in your Pydio folder.

Why ElasticSearch?
ElasticSearch is based on Apache Lucene, and is more powerful and efficient than the Lucene Search Engine it can replace in Pydio.

Why Elastica?
Elastica is currently the PHP client for ElasticSearch that offers the most features.

Install ElasticSearch on Ubuntu:
ElasticSearch runs on Java, so you first have to install OpenJDK 7. To do so, run the following commands:

$ sudo apt-get update
$ sudo apt-get install openjdk-7-jre-headless -y

Then download the latest package version (0.90.2 in this example) with the following command:
$ wget https://downloads.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.2.deb

Now run the following commands to install the package, start the service and delete the downloaded file:
$ sudo dpkg -i elasticsearch-0.90.2.deb
$ sudo service elasticsearch start
$ rm elasticsearch-0.90.2.deb
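
The service can take a few seconds to start answering HTTP requests after the commands above. As a minimal sketch, assuming ElasticSearch listens on its default address localhost:9200, the hypothetical helper below (wait_for_es is not part of ElasticSearch or Pydio) polls the root endpoint with curl until it responds or the number of tries is exhausted.

```shell
#!/bin/sh
# Assumption: ElasticSearch listens on localhost:9200 unless ES_URL is set.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: poll the root endpoint once per second.
# $1 = maximum number of tries (defaults to 10).
# Returns 0 as soon as the server answers, 1 if it never does.
wait_for_es() {
    tries="${1:-10}"
    while [ "$tries" -gt 0 ]; do
        # -s: silent, -f: treat HTTP errors as failures,
        # --max-time: do not hang on an unresponsive host
        if curl -s -f -o /dev/null --max-time 2 "$ES_URL"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage: wait_for_es 15 && curl -XGET "$ES_URL"
```

Running the tests of the next section only after this helper returns 0 avoids "connection refused" errors caused by a server that is still starting.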

In order to test ElasticSearch you can do the following (in a terminal):

- Create an index called "test_index":
$ curl -XPUT 'localhost:9200/test_index'

- Then add a document with the type "test_type" with the following command:
$ curl -XPUT 'localhost:9200/test_index/test_type/1' -d '{"user":"test", "message":"this is a test"}'
This adds a document with an id equal to 1 that contains two fields, "user" and "message", whose respective values are "test" and "this is a test".
The document is added to the index "test_index" under the type "test_type".

- You can retrieve a document by its id:
$ curl -XGET 'localhost:9200/test_index/test_type/1'
If you followed the previous steps, the result should contain something like this (abridged):
    ...
    "_source" : {
        "user":"test",
        "message":"this is a test"
    }

- You can retrieve documents with queries (there are many different query types; see the ElasticSearch query documentation for more information):
$ curl -XGET 'localhost:9200/test_index/test_type/_search' -d '{
    "query" : { "match" : { "message":"this is a test" } }
}'
The result should list the matching document under "hits", with something like this (abridged):
                ...
                "_source" : {
                    "user":"test",
                    "message":"this is a test"
                }
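
When the same kind of search is repeated for other fields or texts, the request body can be generated instead of typed by hand. The sketch below uses a hypothetical helper, match_query_body, which is not part of ElasticSearch or Pydio. It assumes that neither argument contains quotes or backslashes, because it does no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON body of a match query.
# $1 = field name, $2 = text to search for.
# Assumption: the arguments contain no quotes or backslashes (no escaping here).
match_query_body() {
    printf '{"query":{"match":{"%s":"%s"}}}\n' "$1" "$2"
}

# Print the body that searches the "message" field of the test document.
match_query_body message 'this is a test'
# prints {"query":{"match":{"message":"this is a test"}}}
```

The output can be handed to curl through command substitution, for example with -d "$(match_query_body message 'this is a test')" on the _search URL used in the test.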

Use ElasticSearch in Pydio:

To use ElasticSearch in Pydio, first follow the ElasticSearch and Elastica installation steps above. Then go to the Pydio GUI and set up ElasticSearch the same way you would set up Lucene.

Go to Pydio Settings->Workspaces and Users->Workspaces, choose your workspace and go to repository features. There, remove the Lucene Search Engine (if you want to use ElasticSearch instead) and look for the meta plugin "ElasticSearch Search Engine" to add it. You can then run the indexation and use the search function when you open the chosen workspace.

Concerning the use of Elastica:

- Here is a code snippet that indexes a document, just like the test we made previously:
$client = new Elastica\Client(array("host" => "localhost", "port" => "9200"));
$index = $client->getIndex("test_index");
$type = $index->getType("test_type");

/* build the document from its id and its fields */
$data = array();
$data["user"] = "test";
$data["message"] = "this is a test";
$id = 1;
$doc = new Elastica\Document($id, $data);

/* send the document to the index, then refresh so that it becomes searchable */
$type->addDocument($doc);
$index->refresh();

- Here is a code snippet that fetches documents by id and by query (the equivalent of the curl examples used for the test):
$client = new Elastica\Client(array("host" => "localhost", "port" => "9200"));
$index = $client->getIndex("test_index");
$type = $index->getType("test_type");

/* document fetched by id */
$doc = $type->getDocument(1);

/* documents fetched by query: match the "message" field against the text */
$query = "this is a test";
$matchQuery = new Elastica\Query\Match();
$matchQuery->setField("message", $query);

$resultSet = $type->search($matchQuery);
$results = $resultSet->getResults();

/* now you can loop over the results */
foreach($results as $hit){
    /* print the data from the hit */
    print_r($hit->getData());
}

Plugin parameters

Max results displayed *
Set the maximum results that will be displayed. (type: Integer, default: 200)
Parse Content Until *
Skip content parsing and indexation for files bigger than this size (must be in Bytes). (type: String, default: 500000)
HTML files *
List of extensions to consider as HTML file and parse content. (type: String, default: html,htm)
Text files *
List of extensions to consider as Text file and parse content. (type: String, default: txt)
Unoconv Path
Full path on the server to the 'unoconv' binary. (type: String)
PdftoText Path
Full path on the server to the 'pdftotext' binary. (type: String)
Automatically append a * after the user query to make the search broader. (type: Boolean, default: false)
Wildcard limitation
For the sake of performance, it is not recommended to use a wildcard as the very first character of a query string. Lucene recommends requiring at least 3 characters from the user before a wildcard. You can still set this to 0 if your use cases require it. (type: Integer, default: 3)
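
To illustrate how the automatic wildcard and the wildcard limitation could interact, here is a minimal sketch. It is an assumption about the behaviour described above, not the plugin's actual code, and broaden_query is a hypothetical name. The helper appends a * only when the query has at least the configured number of characters and does not already end with a wildcard.

```shell
#!/bin/sh
# Hypothetical helper, NOT the plugin's implementation.
# $1 = user query, $2 = minimum characters before a wildcard (defaults to 3).
broaden_query() {
    query="$1"
    min_chars="${2:-3}"
    # Nothing to broaden for an empty query.
    [ -n "$query" ] || return 0
    case "$query" in
        *\*) printf '%s\n' "$query"; return 0 ;;   # already ends with a wildcard
    esac
    if [ "${#query}" -ge "$min_chars" ]; then
        printf '%s*\n' "$query"    # long enough: broaden the search
    else
        printf '%s\n' "$query"     # too short: leave the query untouched
    fi
}

broaden_query 'rep'     # prints rep*
broaden_query 're'      # prints re
broaden_query 're' 0    # prints re*
```

With the limit set to 0, even very short queries are broadened, which matches more terms but is the costly case the description warns about.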

Instance parameters

ElasticSearch Host *
ElasticSearch Server host (without http). (type: String, default: localhost)
ElasticSearch Port *
ElasticSearch Server port (default 9200). (type: Integer, default: 9200)
Index Content *
Parses the file when possible and index its content (see plugin global options). (type: Boolean, default: false)
Index Meta Fields
Which additional fields to index and search. (type: String)
Repository keywords
If your workspace path is defined dynamically by specific keywords like AJXP_USER, or your own, mention them here. (type: String)