...
SkySync is a highly scalable file and folder transfer engine capable of managing synchronization and copy processing operations between many different file management platforms. It offers significant flexibility around all aspects of enterprise file logistics. However, with flexibility comes a level of complexity in certain product configuration and deployment options.
This whitepaper provides prescriptive guidance through a series of scenario-based deployment models and uses those models to highlight the configuration decisions each one requires. The ultimate purpose of this document is to educate migration/synchronization administrators on the concepts necessary for a successful and well-performing SkySync deployment.
Transfer Performance Factors
Before this document addresses the various performance models, it is important to understand the factors that influence transfer performance. These variables all have a significant effect, positive or negative, on the throughput that any migration will achieve.
Below is a list of the most critical factors that influence transfer performance:
1) Corpus Profile: For the purposes of this document, the collection of all documents and folders located in any given storage platform is known as the “Corpus”. The composition of the corpus can have a significant impact on transfer throughput. Particularly when at least one of the transfer endpoints employs an API for managing content (essentially any endpoint other than a standard server file system), the size and number of documents can have a dramatic effect on performance.
Given 100GB of data, if that 100GB consists of (10,240) 10MB files, transfer throughput will be considerably higher than if that data consisted of (209,715) 500KB files. This is because SkySync will have to make approximately 200,000 more API calls to transfer the 100GB of 500KB files vs 100GB of 10MB files.
So if the corpus is weighted toward many small files rather than relatively fewer large files, transfer throughput should generally be expected to be lower due to the latency cost of significantly more API calls.
2) Source or Destination Rate Limiting: When at least one of the transfer endpoints employs an API for managing content, those systems will generally employ some form of API “rate limiting”. Essentially, most API-based endpoint platforms implement algorithms that throttle the number of API calls that can be made within a specified period of time.
This is a necessary device to ensure that all tenant environments have an opportunity to experience the same level of performance. It also provides a mechanism to guard against Denial of Service (DoS) attacks that could potentially cause all tenant environments to become inoperable.
Typically, when a highly scalable transfer application such as SkySync “trips” the API throttle mechanism, the API platform will send a rate limit message to the calling application. When this happens, SkySync has no choice but to “back off” and wait for an increasing amount of time until throttle messages are no longer encountered. This process of throttling and backing off can result in a significant decrease in transfer throughput. SkySync utilizes “Smart-Throttling” technology to pull and push data as fast as it possibly can while respecting these rate limit messages.
3) Source Platform Read Performance: It is possible to saturate a source platform with requests such that adding additional job processing servers and threads no longer improves transfer performance. When saturation occurs, end users are typically affected as well, so transfer performance must be balanced against the ability of end users to continue performing their job functions.
...
Ultimately, it is impossible to accurately predict transfer throughput performance until the end-to-end infrastructure is configured and testing is performed. Based on various transfer throughput metrics, it is possible to make small adjustments to configuration or major adjustments like adding hardware resources to improve the weakest link within the throughput performance chain. Initial phases of deployment and configuration will include a series of performance tuning configuration adjustments. Throughput metrics should not be gathered until performance tuning has been completed.
Guidance Models
SkySync can scale from very small, easy to implement single server deployments to very large multi-server clustered deployments. For simple implementations with a few million documents or less where transfer performance isn’t of primary concern, SkySync can be implemented very easily. But for larger corpus transfer solutions or when throughput is of high value, it’s important to understand the correct architectural considerations.
...
The models below provide general architectural examples that serve as a good starting point for solution design. SkySync can be easily scaled up if necessary and prudent.
Model A – Low Volume / Modest Throughput
This model represents deployments that include the transfer or synchronization of (3) million file and folder objects or less. Achievable throughput will be somewhat modest and generally isn’t the top priority for the solution. This will be an easy configuration with minimal planning requirements.
...
While this whitepaper can help and deployments can be fully managed by typical administrative staff, SkySync recommends that the Client Solutions Silver Launch Package be utilized to ensure solution success.
Model A SkySync Architecture Guidance
The following architecture configuration concepts apply to Model A:
This is a single server model with no clustering configured
8+GB RAM, 60GB+ system drive, dual core processor or better
Windows Server 2008 SP2 or newer, fully patched, with .NET framework 4.5 or newer
SkySync installation generally consists of accepting the default answers which results in the creation and usage of a local SQL CE database
SQL CE supports a maximum database size of 4GB
If the 4GB SQL CE database size is reached, it is possible to convert the database to a full SQL Server database format
When SkySync is deployed to use SQL CE for the database, a maximum of (2) jobs may be run simultaneously
This model is intended for ease of implementation as opposed to high throughput and/or redundancy
Model B – Moderate Volume and/or Moderate Throughput
This model represents deployments that include the transfer or synchronization of (10) million file and folder objects or less at a moderate throughput. This model essentially represents the scale of what can be reasonably accomplished with a single server solution that is provisioned with better than standard resources. This solution will require some planning around database configuration and overall server specifications.
...
Again, this whitepaper will be useful in deploying this solution model, but SkySync strongly encourages all customers to take advantage of the Client Solutions Gold Launch Package to ensure solution success.
Model B SkySync Architecture Guidance
...
This is a single server model with no clustering configured
32+GB RAM, 60GB+ system drive (solid state if possible), 4 or 8 core processor or better
Windows Server 2008 SP2 or newer, fully patched, with .NET framework 4.5 or newer
Before SkySync installation, SQL Server Express or full SQL Server Standard/Enterprise should be deployed on the SkySync processing server
Database Planning and Tuning Concepts (below) should also be implemented
SQL Server Express supports a maximum database size of 10GB
SQL Server Standard/Enterprise maximum database size is not a factor for this model
When SkySync is deployed to use SQL Server Express/Standard/Enterprise for the database, a maximum of (6) jobs may be run simultaneously by default
SkySync Tuning Concepts (below) may be employed to increase throughput when processing server resources are available and rate limiting is not a factor
This model is intended for advanced, single server implementation as opposed to the simpler default install that uses SQL CE
Model C – High Volume and/or High Throughput
This model represents deployments that include the transfer or synchronization of more than (10) million file and folder objects and/or transfers or synchronizations that require a very high throughput. Achievable throughput will be significantly determined by available bandwidth, network latency, SQL Server I/O and other resources, number of processing servers/threads and the level of API rate limiting on either the source or destination.
...
This solution will require very careful planning around database configuration, SkySync cluster configuration, cluster specifications, and cluster location with respect to source and destination location. To be able to achieve the highest levels of throughput, SkySync Client Solutions will need to be engaged to assist with planning, deployment and cluster tuning processes.
Model C SkySync Architecture Guidance
...
“Standard” storage performance for an organization's disk subsystem will generally be sufficient when there are approximately one to four processing servers. Beyond that, or when a very large volume of file and folder objects (tens of millions) will be under transfer management, storage volume performance must be better than standard. For these extreme-scale SkySync solutions with (5) or more clustered processing servers, Solid State Drive (SSD) class or Tier 1 class storage should be considered; some organizations consider SSD-class performance to be Tier 0. The point here is that very high IOPS support and very low latency become the single most important factors in ensuring high sustained transfer throughput and overall solution stability across the SkySync processing cluster. The SkySync transfer engine can be massively parallelized, and given the amount of data logging that occurs during transfer operations, this can put tremendous read/write pressure on the SQL Server database disk subsystem.
Guidance for disk performance is simple for extreme-scale solutions. If throughput is the highest priority, then the SkySync SQL Server should be provisioned with the best disk the organization has access to. If the disk is strong enough, additional processing servers can be added to the cluster; if not, adding processing servers will actually make transfer performance or cluster stability worse. The key, then, is understanding whether the disk subsystem is “keeping up” with requests.
The easiest way to verify that the SQL disk I/O subsystem is servicing requests fast enough is to monitor the Disk Queue Length available in Resource Monitor. To open it, launch the Windows Task Manager, navigate to the “Performance” tab where basic system performance can be viewed, and then click the “Open Resource Monitor” link at the bottom of the Task Manager window.
Once the Resource Monitor is open, navigate to the “Disk” tab and observe the disk queue length for the logical disk associated with the volume where the SkySync database and TempDB database data files are stored. The general rule of thumb is that Disk Queue Length should be less than the total number of physical drives that comprise the volume. That can get complicated with certain storage arrays where the number of disks for a given volume is obfuscated or, in the case of an SSD volume, there may only be a single drive.
In general, the ideal Disk Queue Length is less than 1. Anything in the single-digit range during heavy processing is generally acceptable. If Disk Queue Length is in the low double-digit range, the disk is somewhat underpowered and is beginning to impact performance. If Disk Queue Length continues to climb, or sits in the hundreds or thousands, then the disk subsystem is certainly insufficient to handle incoming requests. In this situation, the disk subsystem will need to be upgraded to improve transfer throughput. If this is not possible, then it is recommended to shut down one or more SkySync processing servers while monitoring Disk Queue Length until reasonable numbers are achieved and maximum transfer throughput is identified.
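As a complementary check from within SQL Server itself, average read and write latency per database file can be reviewed with the sys.dm_io_virtual_file_stats DMV. The sketch below is a generic example rather than a SkySync-supplied script; the database names in the WHERE clause are placeholders and should be adjusted to the actual SkySync database name.

```sql
-- Average I/O latency per file for the SkySync and TempDB databases (names are placeholders).
-- Sustained averages well above roughly 20-30 ms suggest the disk subsystem is falling behind.
SELECT
    DB_NAME(vfs.database_id)                              AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id     = vfs.file_id
WHERE DB_NAME(vfs.database_id) IN (N'SkySync', N'tempdb')  -- placeholder database names
ORDER BY avg_write_latency_ms DESC;
```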
...
As data and log files grow in SQL Server, by default the newly allocated segment is overwritten with zeros during the growth (extent) operation. This is a security measure designed to ensure that data from old files that once consumed the same space on disk cannot be read by SQL administrators. It is an edge-case scenario, but it is a minor security risk, so Instant File Initialization is disabled by default, which allows the “zero overwrite” to occur.
This becomes a relevant concept as SQL databases grow by substantial amounts (another topic addressed below). An extent operation of several hundred megabytes or even gigabytes can take many seconds or even minutes to occur under the default condition when zeros must be written. During this time, the tables in the database are blocked. Blocked tables are very bad for SkySync’s highly tuned transfer scheduler engine. This condition can cause unexpected behavior as job processing threads try to figure out how to continue.
There are (2) ways to combat this condition. The best way to ensure fast extent operations is to implement Instant File Initialization. Another way, also recommended as a best practice, is to pre-create multiple data files that are pre-sized in anticipation of the amount of data that may potentially be stored in the database. This latter option will be addressed below in further detail.
The issue with the second option is that it can be difficult for inexperienced SkySync Administrators and DBAs to determine just how big those pre-sized files should be. So, the tendency is to over-allocate which can waste precious high performance disk space. Essentially, the best solution is a combination of the two. Pre-create and conservatively pre-size the SkySync and TempDB database data files to minimize extent operations. Then enable Instant File Initialization to ensure that if there must be extent operations, they happen quickly.
Steps to Enable Instant File Initialization
...
5) Restart the SQL Server service
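On recent SQL Server builds (2016 SP1 and later), it is possible to confirm that Instant File Initialization is actually in effect for the database engine service. A minimal check, assuming one of those builds:

```sql
-- Verify Instant File Initialization for the database engine service.
-- The instant_file_initialization_enabled column is not present on older builds.
SELECT servicename,
       service_account,
       instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server (%';
```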
Pre-Creating and Pre-Sizing SQL Data Files
SQL Server experts suggest that multiple database files can have a meaningful impact on database performance. Paul Randal is one of the leading authorities on all things SQL Server, and his company, SQLSkills, maintains a website full of useful and reliable information regarding SQL Server. Among the many articles in his “In Recovery…” blog is one that highlights the benefits of properly scaling out database data files and draws attention to the potential performance pitfalls of not doing it correctly.
When a database is first created in SQL Server through the user interface, the database will be created with a single “mdf” data file by default. It is possible to create this database with the primary “mdf” data file and then multiple “ndf” data files as well. This gives the database an opportunity to store content across multiple data files.
Configuring SQL data files in this way allows for improved data access performance because it allows SQL Server to multi-thread disk I/O operations. This is particularly helpful when the administrator has the freedom to store each of the data files on a unique, high performance disk volume. This technique can be used to enhance the ability of the SkySync solution to scale out with many processing servers.
The general rule of thumb for the number of data files is ¼ to ½ the number of physical cores available to SQL Server. For an 8-core SQL Server, deploying (2) to (4) data files is ideal; for a 16-core SQL Server, (4) to (8) files would be appropriate. With fewer cores, trend toward the ½ figure; with 16 cores or more, trending toward the ¼ figure is typical.
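As a concrete illustration, the sketch below pre-creates a SkySync database with four pre-sized data files, following the rule of thumb for an 8-core SQL Server. The database name, file paths, and sizes are placeholders and assume the database is created ahead of the SkySync installation; size the files to the expected corpus and available storage.

```sql
-- Hypothetical example: one .mdf plus three .ndf data files for an 8-core SQL Server.
-- Names, paths, and sizes are placeholders; adjust to the expected workload.
CREATE DATABASE SkySync
ON PRIMARY
    (NAME = SkySync_Data1, FILENAME = 'D:\SQLData\SkySync_Data1.mdf', SIZE = 10GB, FILEGROWTH = 1GB),
    (NAME = SkySync_Data2, FILENAME = 'D:\SQLData\SkySync_Data2.ndf', SIZE = 10GB, FILEGROWTH = 1GB),
    (NAME = SkySync_Data3, FILENAME = 'D:\SQLData\SkySync_Data3.ndf', SIZE = 10GB, FILEGROWTH = 1GB),
    (NAME = SkySync_Data4, FILENAME = 'D:\SQLData\SkySync_Data4.ndf', SIZE = 10GB, FILEGROWTH = 1GB)
LOG ON
    (NAME = SkySync_Log, FILENAME = 'E:\SQLLogs\SkySync_Log.ldf', SIZE = 4GB, FILEGROWTH = 1GB);
```

Placing each data file on its own high-performance volume, as described above, gives SQL Server the best opportunity to parallelize I/O across them.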
...
Given the highly transactional nature of any migration platform, database maintenance plans become very important. Without proper maintenance, indexes can quickly become fragmented resulting in reduced performance.
Info: It is beyond the scope of this document to provide specific scripts for database maintenance. However, it is important to run index defragmentation and reorganization scripts at least twice per week to ensure that SkySync tables are properly maintained. If maintenance scripts are not standardized in the organization, Ola Hallengren provides very useful scripts that can rebuild indexes across all tables in a database, but only when necessary.
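If a quick check is needed to see whether index maintenance is overdue, fragmentation can be reviewed with a standard DMV query such as the sketch below. Run it in the SkySync database; the 5% filter and 1,000-page floor are common rules of thumb, not SkySync-specific values.

```sql
-- Review index fragmentation in the current database (run against the SkySync database).
-- Common guidance: reorganize above roughly 5-10% fragmentation, rebuild above roughly 30%.
SELECT
    OBJECT_NAME(ips.object_id)        AS table_name,
    i.name                            AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id  = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 5
  AND ips.page_count > 1000   -- very small indexes are generally not worth maintaining
ORDER BY ips.avg_fragmentation_in_percent DESC;
```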
...
As with any migration solution, the data in the database can generally be considered transient. In other words, even if the database is lost, organizational content is not lost. This means that while database backups are useful to minimize lost migration or synchronization processing time, frequent log backups are not necessary.
Info: For these reasons, it is recommended that the SkySync database be configured for the Simple Recovery model. There is no need to consume storage space and processing resources retaining and backing up the transaction log.
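For reference, switching an existing database to the Simple Recovery model is a one-line change (the database name below is a placeholder):

```sql
-- Place the SkySync database (placeholder name) in the Simple recovery model.
ALTER DATABASE SkySync SET RECOVERY SIMPLE;
```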
...
With the SkySync database operating in the Simple Recovery model, database backups are straightforward. For high throughput solutions, a nightly full backup is generally sufficient. However, if the Recovery Point Objective (RPO) for the organization indicates that a full day of migration/synchronization processing is too much time loss, then additional incremental backups can be scheduled at intervals throughout the day.
Ideally, database backups should be executed with compression enabled to minimize backup storage size and backup duration. However, this comes at the cost of increased CPU utilization on the SQL Server, which is important to consider.
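As a sketch of what the schedule described above might look like in practice, the statements below take a compressed full backup plus an optional intra-day differential (SQL Server's differential backup fills the “incremental” role mentioned above). The database name and backup paths are placeholders.

```sql
-- Nightly compressed full backup of the SkySync database (name and path are placeholders).
BACKUP DATABASE SkySync
TO DISK = N'F:\Backups\SkySync_Full.bak'
WITH COMPRESSION, CHECKSUM, INIT, STATS = 10;

-- Optional intra-day differential backup if the RPO requires more frequent recovery points.
BACKUP DATABASE SkySync
TO DISK = N'F:\Backups\SkySync_Diff.bak'
WITH DIFFERENTIAL, COMPRESSION, CHECKSUM, INIT, STATS = 10;
```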
...
2) Click on the performance tab in the SkySync configuration window:
3) Increase or decrease parallel writes as necessary by (1) or (2) at a time.
...