Backup / Recover
Always plan for the worst.
Take steps to limit time loss and avoid headaches by doing regular backups of your Pydio Cells instance and data. You'll be all set for a speedy recovery in case of a data or configuration loss (that can happen if you're hit by a disaster such as hard-drive failure, power loss, bad upgrade process, etc.)
This procedure is adapted for simple mono-node installation, in such case, you have to backup 2 sources of data:
CELLS_WORKING_DIRcontains the configuration, the log files, some indexes that uses key-value stores (typically for search indexes) and the default data location. If you have followed our recommanded best practices, it is located under
/var/cellsin Linux-like machines. Please refer to the working directories page for further details. . The corresponding database
If you have made a multi-node installation, be aware that you must backup both storage and datasource for each one of your nodes.
Backup configuration and default data location
In the case of a vanilla installation, the location defined by
CELLS_WORKING_DIR environment variable contains almost everything:
- it contains all configuration
- it is also the parent of the
datasubfolder that contains the default datasources that are created at installation time. Please note that these default datasources contains in addition to the default business datasources, the hidden internal datasources that are used to store among others version files, thumbnails, etc.
Create a backup of this folder, for instance:
# Stop Cells before doing the backup sudo systemctl stop cells # Using rsync, first time will be longer and then we will only incrementally add and/or remove new files rsync -avr --delete /var/cells/ /home/pydio/backups/cells # Restart Cells sudo systemctl start cells
In this kind of setup, you cannot perform the backup while the service is running: the default key-value store that is used by some of the services, for e.g log and search indexes, must be stopped when you copy the files or the restored version will be corrupted.
WARNING: proceed with extra care with rsync when using the
inverting source and target folders will wipe everything in the source folders...
So be careful to really use:
rsync -avr --delete <source folder> <target folder>
- We advise you to use rsync to regularly backup this folder: after first time, the followings will be quicker because it only transfers the difference.
- If you ever happen to mess with your installation without impacting the DB you can always restore this folder to your latest backup and then do a
./cells startwith a new (or a restored old) binary. You should be good to go.
- All the information about network, database and plugins configuration are stored inside the
Backup additionnal datasources
File system datasource
All file based datasources are defined by 4 things:
- the configuration that is stored in the
pydio.jsonfile (see above)
- the files that are stored in this datasource
- a related index that is stored as tables in the configured DB
- for file system based DS: a
.minio.sysfolder that is located in the parent folder of the datasource root folder. This hidden folder contains s3/minio meta data for the files and is shared with all datasources that are located in the same parent folder.
To backup datasources, we must backup configuration, files and indexes. The index and the content of the
.minio.sys folder can be automatically restored after data restoration by running a resync from the admin interface. But including this folder in your backups will gain quite some time on restoration.
For "structured storage" datasource, indexes can be rebuilt automatically. But indexes are really important for "flat storage" datasources, as files are stored as blob inside the storage, and if loosing indexes does not loose any actual data, it will loose the files and folders names and structure! Specific tools are provided to dump DB index directly into your storage (inside a specific file) to secure your backup/recovery processes.
You can find more details about datasources here. It is a good read to understand how data is actually stored inside the storage.
This is one of the main advantage of using a S3 backend for your datasources: this system is fully decoupled from your Pydio Cells instance. You should be able to directly manage your backup policies via your S3 provider manager interface.
The configuration of this datasource is fully stored in the main
pydio.json file and is backed up and restored together with your main system.
Note that the Enterprise distribution offers two more datasource types (namely Google Cloud and Azure Blob storage) that are to be backed up the same way.
Backup the database
In a vanilla single node instance, at installation, you have configured a database connection, usually the
cells DB. In such case, you only need to backup (and restore) this single database.
To perform a backup, you can use default mysql tool that is usually installed with your DB software.
Typically, on Linux:
mysqldump -u pydio -p cells > /home/pydio/backups/cells_sqldump_$(date -u +%Y-%m-%d).sql
-u <user>: defines the user to be used
-pprompts you for the mysql password
cells: your database name
> /home/pydio/backups/cells_sqldump_$(date -u +%Y-%m-%d).sqltells the tool where to store resulting data.
You might adapt these options to your use case.
Restoring the data
If you have followed the above steps, restoring the data is quite easy:
- Restore the content of the
/var/cellson Linux or
~/Library/Application\ Support/Pydio/cellson MacOS), insure files ownership and permissions are OK.
- Restore the database from your sql backup file: make sure to create an empty
cellswith correct owner and use the following command:
mysql -u pydio -p cells < /home/pydio/backups/cells_sqldump_<relevant_date>.sql
- Optionally restore the folders of any additional external filesystem datasources
Post restoration and before launching the app
If you restore the application on a different server or if some of the configuration like database, URLs, IP addresses have changed, you must double check in the
pydio.json configuration file that can be found at the root of the
CELLS_WORKING_DIR folder and adapt the values to the new one, typically:
- if hostname as changed, change
.defaults.urlInternalproperty, you might also want to check
.PeerAdressproperty of the various DS
- if public URL has changed, you have to change all occurrences of it (currently 4 in v1.4 and newer)
- if DB configuration has changed: adapt
You can then retrieve the relevant Pydio Cells binary and simply relaunch the app.
You should connect as an
admin user after first restart and check that all datasources are correctly up and running, and optionally also rerun a sync if you where not able to correctly backup and restore the index and parent
If you want to uninstall Cells, you need to :
- Stop cells service
- Drop the database cells
- Empty the Working Directories
WARNING: be careful not to remove the parent 'pydio' folder, other applications, typically the Cells Sync Client, have their working default directory as siblings of the
cells folder and store configuration and data inside it.