Pydio Cells v3.0 - Accelerated Performance

Created on 2021/11/30

Cells V3 comes with tons of improvements for an even more streamlined user experience. One of the main focuses for our development team was server speed and performance under the hood. This article will look at the new datasource format we implemented, improvements to internal communication between services, and caching optimizations.

1 - New “Flat” Datasources

Pydio users often ask for the ability to "keep the tree structure" of files visible on the storage. To achieve this, Cells’ datasource relied on unidirectional synchronization between storage and the internal index. While it allowed files to be modified directly without going through Pydio, it also brought its share of issues and performance limitations with large numbers of files – so much so that sometimes a "resync" of the datasource was required to fix index issues.

The “structured data” flow: 

[Figure: DataSource Struct.png]

Cells V3 introduces a new datasource format that keeps the tree structure only in Cells’ internal indexes and stores files as a flat structure on the storage. This is more in line with the "object storage" design and brings huge performance gains, as a "sync" is no longer required to maintain the indexes. Where structured datasources had to wait for storage events before updating indexes, flat datasources update indexes directly at upload or modification.

The “flat data” flow:

[Figure: DataSource Flat.png]

This allows the files to appear more quickly in the UI and, above all, fixes the issue of moving/renaming huge folders. Instead of reacting to events and applying complex algorithms to detect data changes, move/rename is just a matter of updating the index while the data itself is left untouched. This speeds up move/rename operations (inside a datasource) by a factor of 10 to 100, depending on the folder size!
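
To make the idea concrete, here is a minimal Go sketch of a rename that only touches an index. It is an illustration, not Cells' actual code: the Index type, its methods, and the "obj-N" storage keys are hypothetical names invented for this example. The point it shows is that the tree lives in the index while stored objects keep their keys, so a rename rewrites paths and copies no data.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Index is a hypothetical in-memory stand-in for an internal index:
// it maps a logical tree path to the key of an object on flat storage.
type Index struct {
	entries map[string]string // logical path -> storage object key
}

// NewIndex returns an empty index.
func NewIndex() *Index {
	return &Index{entries: make(map[string]string)}
}

// Put records that the file at path is stored under objectKey.
func (i *Index) Put(path, objectKey string) {
	i.entries[path] = objectKey
}

// Rename moves oldPath (a file or a whole folder) to newPath by rewriting
// index keys only. Object keys are untouched, so no data is copied.
// It returns the number of index entries updated.
func (i *Index) Rename(oldPath, newPath string) int {
	prefix := oldPath + "/"
	// Collect matches first so the map is not modified while ranging over it.
	var matches []string
	for path := range i.entries {
		if path == oldPath || strings.HasPrefix(path, prefix) {
			matches = append(matches, path)
		}
	}
	for _, path := range matches {
		key := i.entries[path]
		delete(i.entries, path)
		i.entries[newPath+strings.TrimPrefix(path, oldPath)] = key
	}
	return len(matches)
}

// Paths returns all logical paths in sorted order.
func (i *Index) Paths() []string {
	paths := make([]string, 0, len(i.entries))
	for p := range i.entries {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}

func main() {
	idx := NewIndex()
	idx.Put("docs/report.pdf", "obj-1")
	idx.Put("docs/2021/notes.txt", "obj-2")
	idx.Put("images/logo.png", "obj-3")

	n := idx.Rename("docs", "archive")
	fmt.Println("index entries updated:", n)
	for _, p := range idx.Paths() {
		fmt.Println(p, "->", idx.entries[p])
	}
}
```

In this sketch the cost of a rename depends on the number of index entries under the folder, not on the size of the files, which is the property the flat format relies on.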

2 - Faster Internal Communication

At its core, Cells is designed with a “microservice” architecture, splitting all domain-specific features into independent services that communicate with each other via APIs. This is a great way to provide stability (as long as the API “contract” is honored, everything works as expected, even if the underlying implementation is totally rewritten) and scalability (each microservice can be distributed or replicated on multiple servers).

This type of architecture requires a “message bus” to convey all event-based and request/response communication between services. Rather than reinvent the wheel, Cells relies on NATS.io to handle multi-node deployments. But in many cases, Cells is deployed on a single-node server, and our development team found that we could skip the NATS network layer when running Cells on a single machine.

Adding an “in-memory” communication layer simplifies communication, improves performance dramatically, and eliminates the need for NATS on single-node deployments. The internal web proxy also benefits: it now implements its own internal DNS resolver, avoiding the multiple “caddy restarting…” messages that could be seen at startup before all services were running.

The biggest performance gains can be seen in Cells' start-up/restart time, which has decreased from an average of 20s to 8s.
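
The general pattern behind an in-memory communication layer can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the Cells implementation: the Bus interface, the MemoryBus type, and the "tree.changes" topic are hypothetical names. The sketch shows how services can code against one bus abstraction while a single-node deployment delivers messages inside the process, with no network round trip.

```go
package main

import (
	"fmt"
	"sync"
)

// Handler processes one message published on a topic.
type Handler func(msg []byte)

// Bus is a hypothetical abstraction over the message bus. A multi-node
// deployment could back it with a network broker; a single-node deployment
// can use the in-memory implementation below.
type Bus interface {
	Subscribe(topic string, h Handler)
	Publish(topic string, msg []byte)
}

// MemoryBus delivers messages to subscribers living in the same process.
type MemoryBus struct {
	mu       sync.RWMutex
	handlers map[string][]Handler
}

// NewMemoryBus returns a bus with no subscribers.
func NewMemoryBus() *MemoryBus {
	return &MemoryBus{handlers: make(map[string][]Handler)}
}

// Subscribe registers h to be called for every message on topic.
func (b *MemoryBus) Subscribe(topic string, h Handler) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[topic] = append(b.handlers[topic], h)
}

// Publish synchronously calls every handler registered for topic,
// in subscription order.
func (b *MemoryBus) Publish(topic string, msg []byte) {
	b.mu.RLock()
	// Copy the slice so handlers run without holding the lock.
	hs := append([]Handler(nil), b.handlers[topic]...)
	b.mu.RUnlock()
	for _, h := range hs {
		h(msg)
	}
}

func main() {
	var bus Bus = NewMemoryBus()
	bus.Subscribe("tree.changes", func(msg []byte) {
		fmt.Printf("index service received: %s\n", msg)
	})
	bus.Publish("tree.changes", []byte("created /docs/report.pdf"))
}
```

Because callers only see the Bus interface, swapping the in-memory implementation for a networked one would not require changes in the services themselves.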

3 - Caching to Improve Response Times

The team also introduced a number of new caching mechanisms that improve response time on many frequently used requests. Typically, user Access Control Lists have to be checked on each request to ensure users have the proper access permissions. These ACLs don’t change frequently, so they’re a good candidate for in-memory caching. Files’ and folders’ internal data can also be cached with a short “Time To Live” to improve response times for high-frequency requests. 

These are small changes, but they have yielded a 5X impact on performance for these functions.

These cache layers are all based on storing data inside process RAM. A hard limit is fixed on each cache, with a default value of 8MB. If you have plenty of memory on the server, we recommend raising this limit using the CELLS_CACHES_HARD_LIMIT environment variable, expressed in MB.
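
As an illustration of how such a limit might be read, the Go sketch below parses CELLS_CACHES_HARD_LIMIT as a number of megabytes. The variable name and the 8MB default come from the text above; everything else is an assumption, including the hardLimitBytes helper, the fallback to the default on invalid input, and the use of 1MB = 1024 × 1024 bytes. Cells may handle these cases differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultHardLimitMB is the default cache limit stated in the article.
const defaultHardLimitMB = 8

// hardLimitBytes converts the raw value of CELLS_CACHES_HARD_LIMIT
// (expressed in MB) to bytes. Falling back to the default when the
// value is unset, non-numeric, or not positive is an assumption of
// this sketch.
func hardLimitBytes(raw string) int64 {
	mb, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || mb <= 0 {
		mb = defaultHardLimitMB
	}
	return mb * 1024 * 1024
}

func main() {
	limit := hardLimitBytes(os.Getenv("CELLS_CACHES_HARD_LIMIT"))
	fmt.Printf("cache hard limit: %d bytes\n", limit)
}
```

With the variable set to 64 in the environment before start-up, this sketch would compute a limit of 64MB instead of the default 8MB.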

Need to Balance Ease-of-Use with Security? Pydio Cells Can Help.

If your organization is serious about secure document sharing and collaboration, you need to check out Pydio Cells. Cells was developed specifically to help enterprises balance the need to collaborate effectively with the need to keep data secure.

With robust admin controls, advanced automation capabilities, and a seamless, intuitive end-user experience, Pydio is the right choice for organizations looking to balance performance and security without compromising on either. Try Cells live for yourself, or talk to a Pydio document sharing specialist.