Changing the default datasource storage type
Created on 2023/03/06,During the installation process, the application creates 3 default datasources. Namely:
- pydiods1: the default datasource for shared files. The
Common Files
workspace (that exists at startup) points here. - personal: contains "My Files"-like workspaces for each user.
- cellsdata: is used to create cells when no root node has been defined upon creation.
By default, these datasources are local FS based and the corresponding folders are created under <CELLS_DATA_DIR>
<CELLS_DATA_DIR>
is defined as <CELLS_WORKING_DIR>/data/
if not explicitely overridden.
Furthermore, the application also create three S3 buckets that are used to store internal and technical data:
- thumbs: stores the thumbnails.
- versions: stores the old versions of the files, when the versioning is enabled on a given DS.
- binaries: service docstore. It stores various things, like e.g. avatars, custom backgrounds, etc.
In the default setup, these 3 buckets are folders that are siblings of the pydiods1
folder and can be also found under <CELLS_DATA_DIR>
. These folders are created during the installation process.
More generally speaking, the system uses the object service that is linked to the default datasource to create these buckets, see defaults.datasource
attribute in the pydio.json
configuration file. The system uses this object service to retrieve connection information: where to find and how to connect to the technical buckets.
As from version 2.1 of Pydio Cells, if you want to change the default layout for both default DSs and technical buckets, we strongly advise to do so during the installation process: in the last page of the installation wizard, under "Advanced Configuration", you can choose another location for the default local FS based datasources or switch to S3-based default datasources.
This process is quite straight forward and self-explaining.
Validate datasources are correctly configured
In order to double check everything is working correctly, you might perform following checks:
- Change your account avatar: it insures the
binaries
implicit DS runs OK - Upload an image: this checks the
thumbs
implicit DS - Turn versioning on for a workspace and modify a file: this validates
versions
implicit DS - Create a Cell with no root folder
- Add a few files in both
My Files
andCommon Files
folder: you can then double check that the crorresponding buckets for instance in your S3 account contain the expected tree structure.
Legacy (version < v2.1) - Change default location still using local file system
The update process in Pydio Cells is quite robust and easy to run. We strongly advise that you use the latest version (see above).
Yet, if you are stuck with an older version, we still leave here this legacy procedure as a hint for you to change the default layout.
It can be quite tricky and could lead to an unstable system, so you have to understand what is hapenning and follow exactly the below steps.
My Files and Cells Data
It is quite easy to tweak these:
Enterprise Distribution
You just have to edit the default template path by going to:
Admin console >> show advanced parameters >> Left column menu >> Data management >> Template Path
If you have created new datasources, you might just then impact the default template paths to use you newly created DS, for instance:
# Cells
Path = DataSources.cellsdatanew + "/" + User.Name;
# My Files
Path = DataSources.personalnew + "/" + User.Name;
To use auto-completion: first move your cursor after the dot, then press CTRL + SPACE
If you want to mutualise the DS, you might also implement a more complex layout. For instance, you might do the following:
- create a new datasource (let say
dynamicws
) - create a temporary workspace that point to it and create 2 folders
personal
andcellsdata
in it - delete the temporary workspace
- impact the template path to have:
# Cells
Path = DataSources.dynamicws + "/cellsdata/" + User.Name;
# Personal Files
Path = DataSources.dynamicws + "/personal/" + User.Name;
Home distribution
As you cannot edit template paths, the only solution you have is to delete the existing default cellsdata
and personal
DSs and recreate them where it fits you most. You do not have to change anything else then.
Common files
This is just a default workspace with Read/Write permissions for everyone. You can easily edit the workspace to point to another datasource.
Technical buckets
The trick here is the way the buckets connection info are computed. It was our implementer choice to say:
- we will use a raw s3 bucket to store and retrieve our internal technical stuff (typically thumbnails)
- these buckets do not need to be real datasources (they do not need neither index nor sync mechanism)
- yet, as each defined datasource has an
object
service that might expose more than one bucket, let use one of these - so let's define a default datasource, retrieve its object service and assume it also exposes the additionnal buckets we need.
In a fresh vanilla install, the default datasource is pydiods1
that exposes the <CELLS_WORKING_DIR>/data/pydiods1
folder of your file system. The underlying object service exposes all then folders that are in<CELLS_WORKING_DIR>/data/
as S3 buckets. So we just create the thumbs
, versions
, and binaries
folders during installation.
If you were to change this, you can:
- create a new datasource that points towards the desired path (for instance
defaultds
) - adapt your
pydio.json
file to have something like:
"defaults": {
"database": "...",
"datasource": "defaultds",
"url": "https://pydio.example.com",
"urlInternal": "https://pydio.example.com:443"
},
- save and restart the app.
Legacy (version < v2.1) - Switch to Amazon S3
One more time you should really use a recent version and do this during installation process rather than following the below procedure at you own risks
You can also tune your configuration to rather have all your Datasources and Bucket on S3. We present here a simple setup that introduces well the steps you have to go through. Feel free to then adapt to your specific use case.
Create the required buckets
The main difference here is that you (still) have to manually create your buckets in your S3 administration console before creating the DSs in Cells.
Note that buckets names in AWS have to be globally unique, so you cannot use the names they have in the local vanilla setup. Let us use for instance:
- com-example-cells-personal
- com-example-cells-cells
- com-example-cells-common
- com-example-cells-thumbs
- com-example-cells-versions
- com-example-cells-binaries
Note that due to their implicit declaration (see above) via the default DS, the thumb, versions and binaries must be buckets of the same storage than the ds that is defined as default DS. Otherwise you will have to tweak the config even a little bit more.
Create the new datasources
Then, using the admin console of the web UI, we create new DSs pointing to these newly created buckets for personnal, cells and common.
default datasource | NewDatasource | Usage |
---|---|---|
pydiods1 | s3common | Common files |
personal | s3personal | My files for users |
cellsdata | s3cells | Root nodes for cells |
In Home distribution, as you cannot edit the default template path for cellsdata and personal files, you rather have to delete the existing default ds and recreate them with the exact same name pointing toward your newly created s3 buckets.
Adapt the template paths to point to the new datasources
Template paths are the mechanism that enable to have dynamic workspaces, typically depending on the name of the loǵged in user for personnal files. In Cells ED you can modify this by going to:
Admin console >> show advanced parameters >> Left column menu >> Data management >> Template Path
With the above naming, you should then modify the 2 default template path to reather have:
# Cells
Path = DataSources.s3cells + "/" + User.Name;
# My Files
Path = DataSources.s3personal + "/" + User.Name;
To use auto-completion: first move your cursor after the dot, then press CTRL + SPACE
Adapt config by directly impacting pydio.json
The pydio.json
file that is at the root of the CELLS_WORKING_DIR
is the main configuration file of your instance/node and is also dynamically updated by the app when an admin make some changes via the admin console. You should rather handle it with extra care and we always advise to:
- shutdown cells before editing the file
- do a proper backup of your file before modifying it.
Thus said, let's proceed to the next step by editing this file:
Modify the higlighted part, defaults.datasource
attribute value, that is pydiods1
by default, to your new datasource, in our example s3common
.
Also update the following properties in the services
section to use the bucket names you have created for the implicit DSs (thumbs, versions and binaries):
json
...
"pydio.docstore-binaries": {
"bucket": "binaries",
"datasource": "default"
},
...
"pydio.thumbs_store": {
"bucket": "thumbs",
"datasource": "default"
},
...
"pydio.versions-store": {
"bucket": "versions",
"datasource": "default"
},
must become (using our names, adapt to your specific context):
json
...
"pydio.docstore-binaries": {
"bucket": "com-example-cells-binaries",
"datasource": "default"
},
...
"pydio.thumbs_store": {
"bucket": "com-example-cells-thumbs",
"datasource": "default"
},
...
"pydio.versions-store": {
"bucket": "com-example-cells-versions",
"datasource": "default"
},
Final STEP
You should be now fully setup. Just restart the app and perform a few tests. If everything works fine, you can now delete the default datasources that you do not use anymore.
Troubleshooting
Note that the above procedure does not include migration of existing data, after restart you will then typically have lost all existing thumbnails that will be blank.
In order to double check everything is working correctly, you might perform following checks:
- Change your account avatar: it insures the
binaries
implicit DS runs OK - Upload an image: this checks the
thumbs
implicit DS - Turn versioning on for a workspace and modify a file: this validates
versions
implicit DS - Create a Cell with no root folder
- Add a few files in both
My Files
andCommon Files
folder: you can then double check that the crorresponding buckets in your S3 account contain the expected tree structure.