ElasticSearch Search Engine


Identity Card

Status: Core
Plugin Label: ElasticSearch Search Engine
Short Description: ElasticSearch implementation to index all files and search a whole repository quickly.
Plugin Identifier: index.elasticsearch
Author: Charles du Jeu
Url: docs/references/plugins/index/elasticsearch
Dependencies: access.smb, access.imap, access.swift, access.s3, access.inbox, access.demo, access.fs, access.dropbox, access.webdav, access.sftp_psl, access.smbicewind, access.sftp, access.ftp

Documentation

To get the latest version of Elastica, go to the Elastica site and download the zip archive from the "Download" section. Then unzip the whole archive and copy its lib/Elastica folder into the plugins/index.elasticsearch folder of your Pydio installation.
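
The copy step described above can be sketched as a small shell function. This is only an illustration: the function name install_elastica and both of its arguments (the folder obtained by unzipping the archive, and the root of the Pydio installation) are hypothetical and must be adapted to your own paths.

```shell
#!/bin/sh
# Copy the Elastica library from an unzipped archive into the Pydio plugin folder.
# Both arguments are placeholders: adjust them to your own paths.
install_elastica() {
    archive_dir="$1"   # folder created by unzipping the Elastica archive
    pydio_root="$2"    # root of the Pydio installation
    src="$archive_dir/lib/Elastica"
    dest="$pydio_root/plugins/index.elasticsearch"

    # Stop early if the archive does not have the expected layout.
    if [ ! -d "$src" ]; then
        echo "Elastica library not found in $src" >&2
        return 1
    fi

    mkdir -p "$dest"
    # -R copies the whole folder, giving plugins/index.elasticsearch/Elastica
    cp -R "$src" "$dest/"
}
```

For example, install_elastica ./elastica-unzipped /var/www/pydio would copy the library, assuming those two example paths exist on your server.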

Why ElasticSearch?
ElasticSearch is built on Apache Lucene, and is more powerful and efficient than the Lucene search engine.

Why Elastica?
Elastica is the PHP client for ElasticSearch that currently offers the most features.

Install ElasticSearch on Ubuntu:
ElasticSearch requires Java, so OpenJDK 7 must be installed first. To install it, run the following commands:

$ sudo apt-get update
$ sudo apt-get install openjdk-7-jre-headless -y

Then download the latest package version (0.90.2 in this example) with the following command:
$ wget https://downloads.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.2.deb

Now run the following commands to install the package, start the service, and remove the downloaded file:
$ sudo dpkg -i elasticsearch-0.90.2.deb
$ sudo service elasticsearch start
$ rm elasticsearch-0.90.2.deb
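
Before going further, it can be useful to confirm that the service really answers over HTTP. The sketch below is one possible way to do it, assuming curl is installed and the node listens on localhost:9200; the function names es_is_up and wait_for_es are hypothetical helpers, not part of ElasticSearch or Pydio.

```shell
#!/bin/sh
# Succeeds (exit status 0) when an ElasticSearch node answers over HTTP.
# Arguments: host (default localhost) and port (default 9200).
es_is_up() {
    host="${1:-localhost}"
    port="${2:-9200}"
    # -s silences progress output, -f makes curl fail on HTTP errors,
    # -o /dev/null discards the JSON banner returned by the node.
    curl -s -f -o /dev/null "http://$host:$port/"
}

# Polls the default node until it answers, or gives up after some attempts.
# Arguments: attempts (default 10) and delay in seconds (default 1).
wait_for_es() {
    attempts="${1:-10}"
    delay="${2:-1}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if es_is_up; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

A typical call would be wait_for_es 30 1, which retries for about thirty seconds; the service may need a few seconds after starting before it accepts connections.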

To test ElasticSearch, you can run the following in a terminal:

- Create an index called "test_index":
$ curl -XPUT 'localhost:9200/test_index'

- Then add a document with the type "test_type" with the following command:
$ curl -XPUT 'localhost:9200/test_index/test_type/1' -d '{"user":"test", "message":"this is a test"}'
This adds a document with id 1 containing two fields, "user" and "message", whose values are "test" and "this is a test" respectively.
The document is added to the index "test_index" under the type "test_type".

- You can retrieve the documents by their id:
$ curl -XGET 'localhost:9200/test_index/test_type/1'
If you followed the previous steps the result should be something like this:
{
    "_index":"test_index",
    "_type":"test_type",
    "_id":"1",
    "_version":1,
    "exists":true,
    "_source" : {
        "user":"test",
        "message":"this is a test"
    }
}

- You can retrieve documents by queries (many other query types exist; see the ElasticSearch query documentation for more information):
$ curl -XGET 'localhost:9200/test_index/test_type/_search' -d '{
    "query":{
        "match":{
            "message":"this is a test"
        }
    }
}'
The result should be something like this:
{
    "took":143,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":0.30685282,
        "hits":[
            {
                "_index":"test_index",
                "_type":"test_type",
                "_id":"1",
                "_score":0.30685282,
                "_source" : {
                    "user":"test",
                    "message":"this is a test"
                }
            }
        ]
    }
}
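
Once these checks succeed, the test index is no longer needed and can be removed with a DELETE request on the index name. The helper below is a sketch under a few assumptions: es_delete_index is a hypothetical function name, and ES_HOST and ES_PORT are hypothetical environment variables that default to localhost and 9200.

```shell
#!/bin/sh
# Deletes one index by name. Host and port come from the hypothetical
# ES_HOST and ES_PORT variables, defaulting to localhost:9200.
es_delete_index() {
    index="$1"
    # Refuse an empty name so the request never targets the server root.
    if [ -z "$index" ]; then
        echo "usage: es_delete_index <index_name>" >&2
        return 2
    fi
    curl -s -XDELETE "http://${ES_HOST:-localhost}:${ES_PORT:-9200}/$index"
}
```

Calling es_delete_index test_index sends the same request as curl -XDELETE 'localhost:9200/test_index'. The index name is mandatory here on purpose, to avoid deleting more than intended by mistake.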


Use ElasticSearch in Pydio:

To use ElasticSearch in Pydio, first complete the ElasticSearch and Elastica installation steps above. Then open the Pydio GUI and configure ElasticSearch the same way you would configure Lucene.

Go to Pydio Settings->Workspaces and Users->Workspaces, choose your workspace and open its repository features. There, remove the Lucene Search Engine (if you want to use ElasticSearch instead) and search for the meta plugin "ElasticSearch Search Engine". You can then run the indexation and use the search function in the chosen workspace.


Concerning the use of Elastica:

- Here is a code snippet that indexes a document, just like the test made previously:
/* connect to the local ElasticSearch node */
$client = new Elastica\Client(array("host" => "localhost", "port" => "9200"));
$index = $client->getIndex("test_index");

/* create the index on first use */
if(!$index->exists()){
    $index->create();
}

$index->open();
$type = $index->getType("test_type");

/* build the document: id 1 with the fields "user" and "message" */
$data = array();
$data["user"] = "test";
$data["message"] = "this is a test";
$id = 1;
$doc = new Elastica\Document($id, $data);

$type->addDocument($doc);
$index->close();

- Here is a code snippet that fetches documents by id and by query (the equivalent of the curl test examples):
$client = new Elastica\Client(array("host" => "localhost", "port" => "9200"));
$index = $client->getIndex("test_index");

if(!$index->exists()){
    $index->create();
}

$index->open();
$type = $index->getType("test_type");

/* document fetched by id */
$doc = $type->getDocument(1);

/* documents fetched by query */
$query = "this is a test";
$matchQuery = new Elastica\Query\Match();
/* match the text of $query against the "message" field */
$matchQuery->setFieldQuery("message", $query);

$resultSet = $type->search($matchQuery);
$results = $resultSet->getResults();

/* now you can iterate over the results */

foreach($results as $hit){
    /* print the data from the hit */
    print_r($hit->getSource());
}

Plugin parameters

| Label | Identifier | Description | Type | Default |
| --- | --- | --- | --- | --- |
| Max results displayed * | MAX_RESULTS | Set the maximum number of results that will be displayed. | Integer | 200 |
| Parse Content Until * | PARSE_CONTENT_MAX_SIZE | Skip content parsing and indexing for files bigger than this size (in bytes). | String | 500000 |
| HTML files * | PARSE_CONTENT_HTML | List of extensions to consider as HTML files whose content is parsed. | String | html,htm |
| Text files * | PARSE_CONTENT_TXT | List of extensions to consider as text files whose content is parsed. | String | txt |
| Unoconv Path | UNOCONV | Full path on the server to the 'unoconv' binary. | String | |
| PdftoText Path | PDFTOTEXT | Full path on the server to the 'pdftotext' binary. | String | |
| Auto-Wildcard | AUTO_WILDCARD | Automatically append a * after the user query to make the search broader. | Boolean | false |
| Wildcard limitation | WILDCARD_LIMITATION | For the sake of performance, it is not recommended to use a wildcard as the very first character of a query string. Lucene recommends requiring at least 3 characters from the user before the wildcard. You can still set it to 0 if your use case requires it. | Integer | 3 |
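
The following shell sketch is not the plugin's code. It only illustrates one way PARSE_CONTENT_MAX_SIZE, AUTO_WILDCARD and WILDCARD_LIMITATION could be read from their descriptions. The function names are hypothetical, and treating WILDCARD_LIMITATION as the minimum number of characters a query must have before a * is appended is an assumption.

```shell
#!/bin/sh
# Succeeds when the file is small enough to have its content parsed.
# Arguments: file path, maximum size in bytes (default 500000).
should_parse_content() {
    file="$1"
    max_size="${2:-500000}"
    [ -f "$file" ] || return 1
    # wc -c counts bytes; the arithmetic expansion strips any padding spaces.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -le "$max_size" ]
}

# Prints the query with a trailing * when auto-wildcard would apply.
# Arguments: user query, minimum number of characters (default 3).
apply_auto_wildcard() {
    query="$1"
    min_prefix="${2:-3}"
    case "$query" in
        # The user already typed a trailing wildcard: leave the query alone.
        *\*) printf '%s\n' "$query"; return 0 ;;
    esac
    # Assumption: only queries with at least min_prefix characters are widened.
    if [ -n "$query" ] && [ "${#query}" -ge "$min_prefix" ]; then
        printf '%s\n' "${query}*"
    else
        printf '%s\n' "$query"
    fi
}
```

With the default values, a query such as report would become report*, while a two-letter query would be left unchanged, and a file larger than 500000 bytes would be indexed without its content.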

Instance parameters

| Label | Identifier | Description | Type | Default |
| --- | --- | --- | --- | --- |
| ElasticSearch Host * | ELASTICSEARCH_HOST | ElasticSearch server host (without http). | String | localhost |
| ElasticSearch Port * | ELASTICSEARCH_PORT | ElasticSearch server port (default 9200). | Integer | 9200 |
| Index Content * | index_content | Parses the file when possible and indexes its content (see plugin global options). | Boolean | false |
| Index Meta Fields | index_meta_fields | Which additional fields to index and search. | String | |
| Repository keywords | repository_specific_keywords | If your workspace path is defined dynamically by specific keywords like AJXP_USER, or your own, mention them here. | String | |