...

This model represents deployments that transfer or synchronize (10) million or fewer file and folder objects at a moderate throughput.  It essentially represents the scale of what can reasonably be accomplished with a single-server solution that is provisioned with better-than-standard resources.  This solution will require some planning around database configuration and overall server specifications. Again, this whitepaper will be useful in deploying this solution model, but SkySync strongly encourages all customers to take advantage of the Client Solutions Gold Launch Package to ensure solution success.

 

Model B SkySync Architecture Guidance

...

This model represents deployments that include the transfer or synchronization of more than (10) million file and folder objects and/or transfers or synchronizations that require a very high throughput.  Achievable throughput will be significantly determined by available bandwidth, network latency, SQL Server I/O and other resources, number of processing servers/threads and the level of API rate limiting on either the source or destination. This solution will require very careful planning around database configuration, SkySync cluster configuration, cluster specifications, and cluster location with respect to source and destination location.  To be able to achieve the highest levels of throughput, SkySync Client Solutions will need to be engaged to assist with planning, deployment and cluster tuning processes.

...

Model C SkySync Architecture Guidance

...

“Standard” storage performance for a given organization's disk subsystem will be sufficient when there are approximately 1 to 3, or possibly 4, processing servers.  With more servers than that, or when a very large volume of file and folder objects will be under transfer management (tens of millions), storage volume performance must be better than standard. For these extreme scale SkySync solutions with (5) or more clustered processing servers, it’s good to start thinking about Solid State Drive (SSD) class or Tier 1 class storage.  Some organizations consider SSD class performance to be Tier 0.  The point here is that very high IOPS support and very low latency become the single most important factor in ensuring high sustained transfer throughput and overall solution stability throughout the SkySync processing cluster.  The SkySync transfer engine can be massively parallelized and, given the amount of data logging that occurs during data transfer operations, this can put tremendous READ/WRITE pressure on the SQL Server database disk subsystem.

 

Guidance for disk performance is simple for extreme scale solutions.  If throughput is the highest priority, then the SkySync SQL Server should be provisioned with the best available disk the organization has access to.  If the disk is strong enough, then additional processing servers can be added to the cluster.  If not, then adding additional processing servers will actually make transfer performance or cluster stability worse.  So the key then becomes how to understand whether or not the disk subsystem is “keeping up” with requests.

 

The easiest way to confirm that the SQL disk I/O subsystem is servicing requests fast enough is to monitor the Disk Queue Length available in Resource Monitor.  To do this, launch the Windows Task Manager and navigate to the “Performance” tab, where basic system performance can be viewed.  Then click the “Open Resource Monitor” link at the bottom of the Task Manager.

Once the Resource Monitor is open, navigate to the “Disk” tab and observe the disk queue length for the logical disk associated with the volume where the SkySync database and TempDB database data files are stored. The general rule of thumb is that Disk Queue Length should be less than the total number of physical drives that comprise the volume.  That can get complicated with certain storage arrays where the number of disks for a given volume is obfuscated or, in the case of an SSD volume, there may only be a single drive.  

In general, the ideal Disk Queue Length is less than 1.  Anything in the single digit range during heavy processing is generally OK.  If Disk Queue Length is in the low double-digit range, then the disk is a little underpowered and is beginning to impact performance.  But if Disk Queue Length continues to climb, or is in the hundreds or thousands range, then the disk subsystem is certainly insufficient to handle incoming requests.  In this situation, the disk subsystem will need to be upgraded to improve transfer throughput.  If this is not possible, then it is recommended to shut down one or more SkySync processing servers while monitoring Disk Queue Length until reasonable numbers are achieved and maximum transfer throughput is identified.
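The bands described above can be captured in a small helper, which may be handy when readings are logged and reviewed later. This is only a sketch: the function name and band labels are hypothetical, and because the text does not say where "low double-digit" ends, the sketch assumes that every reading from 10 up to 99 falls in the "underpowered" band.

```python
def classify_disk_queue_length(queue_length: float) -> str:
    """Map a Disk Queue Length reading to the bands described in the text."""
    if queue_length < 0:
        raise ValueError("queue length cannot be negative")
    if queue_length < 1:
        return "ideal"
    if queue_length < 10:
        # Single digits during heavy processing are generally OK.
        return "acceptable"
    if queue_length < 100:
        # Assumption: the whole 10-99 range is treated as "underpowered".
        return "underpowered"
    # Hundreds or thousands: the disk subsystem cannot keep up.
    return "insufficient"


if __name__ == "__main__":
    for sample in (0.4, 6, 14, 850):
        print(sample, classify_disk_queue_length(sample))
```

A reading that stays in the "insufficient" band corresponds to the case where the disk must be upgraded or processing servers shut down.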

...

As data and log files grow in SQL Server, by default, SQL Server will overwrite any data in the new segment with zeros during the extent operation.  This is a security measure designed to make sure that data from any old files that once consumed the same space on disk cannot be read by SQL administrators.  It is a pretty edge case scenario and only a very minor security risk, but because of it Instant File Initialization is not enabled by default, which allows the “zero overwrite” to occur.

...

This becomes a relevant concept as SQL databases grow by substantial amounts (another topic addressed below).  An extent operation of several hundred megabytes or even gigabytes can take many seconds or even minutes to occur under the default condition when zeros must be written.  During this time, the tables in the database are blocked.  Blocked tables are very bad for SkySync’s highly tuned transfer scheduler engine.  This condition can cause unexpected behavior as job processing threads try to figure out how to continue.
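To get a feel for why large extent operations are slow when zeros must be written, the duration can be approximated as the growth size divided by the sustained write throughput of the volume. The sketch below is plain arithmetic, not a measurement: the function name is hypothetical and the throughput figure in the example is an assumed value chosen only for illustration.

```python
def zero_fill_seconds(growth_mb: float, write_mb_per_s: float) -> float:
    """Rough time to zero-fill a file growth of growth_mb megabytes.

    Assumes the volume sustains write_mb_per_s for the whole operation,
    which is optimistic on a disk that is also serving transfer traffic.
    """
    if growth_mb < 0:
        raise ValueError("growth size cannot be negative")
    if write_mb_per_s <= 0:
        raise ValueError("write throughput must be positive")
    return growth_mb / write_mb_per_s


if __name__ == "__main__":
    # Hypothetical case: a 4 GB growth on a volume sustaining 100 MB/s.
    print(round(zero_fill_seconds(4096, 100), 2), "seconds")
```

Under contention from the processing cluster the effective throughput would likely be lower, so real durations may be longer than this estimate.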

 

There are (2) ways to combat this condition.  The best way to ensure fast extent operations is to implement Instant File Initialization.  Another way, also recommended as a best practice, is to pre-create multiple data files that are pre-sized in anticipation of the amount of data that may potentially be stored in the database.  This latter option will be addressed below in further detail.  

The issue with the second option is that it can be difficult for inexperienced SkySync Administrators and DBAs to determine just how big those pre-sized files should be.  So, the tendency is to over-allocate which can waste precious high performance disk space.  Essentially, the best solution is a combination of the two.  Pre-create and conservatively pre-size the SkySync and TempDB database data files to minimize extent operations.  Then enable Instant File Initialization to ensure that if there must be extent operations, they happen quickly.

...

When a database is first created in SQL Server through the user interface, the database will be created with a single “mdf” data file by default.  It is possible to create this database with the primary “mdf” data file and then multiple “ndf” data files as well.  This gives the database an opportunity to store content across multiple data files. Configuring SQL data files in this way allows for improved data access performance because it allows SQL Server to multi-thread disk I/O operations.  This is particularly helpful when the administrator has the freedom to store each of the data files on a unique, high performance disk volume.  This technique can be used to enhance the ability of the SkySync solution to scale out with many processing servers. 

The general rule of thumb for the number of data files that should be created is ¼ to ½ the number of physical cores available to SQL Server.  For an 8 core SQL Server, deploying (2) to (4) data files is ideal.  For a 16 core SQL Server (4) to (8) files would be good.  For fewer cores, trend towards the “1/2” number.  For a machine with 16 cores or more, trending towards the “1/4” number is generally typical. 
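The rule of thumb above is simple arithmetic, sketched here for convenience. The function names are hypothetical, and rounding down with a floor of one file is an assumption for core counts that do not divide evenly. The suggested count follows one reading of the text: the "1/2" number below 16 cores and the "1/4" number at 16 cores or more.

```python
from typing import Tuple


def data_file_range(physical_cores: int) -> Tuple[int, int]:
    """Return (low, high) data file counts: 1/4 to 1/2 of physical cores."""
    if physical_cores < 1:
        raise ValueError("at least one physical core is required")
    low = max(1, physical_cores // 4)     # assumption: round down, minimum 1
    high = max(low, physical_cores // 2)  # never below the low end
    return low, high


def suggested_data_files(physical_cores: int) -> int:
    """Pick one end of the range based on core count (one reading of the text)."""
    low, high = data_file_range(physical_cores)
    return low if physical_cores >= 16 else high


if __name__ == "__main__":
    for cores in (4, 8, 16, 32):
        print(cores, data_file_range(cores), suggested_data_files(cores))
```

For the two examples in the text, this gives a range of 2 to 4 files for 8 cores and 4 to 8 files for 16 cores.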

...

Given the highly transactional nature of any migration platform, database maintenance plans become very important.  Without proper maintenance, indexes can quickly become fragmented resulting in reduced performance.

...

Info

It is beyond the scope of this document to provide specific scripts for database maintenance.  However, it is important to run index defragmentation and reorganization scripts at least twice per week to ensure that SkySync tables are properly maintained.  If maintenance scripts are not standardized in the organization, Ola Hallengren provides very useful scripts that can rebuild indexes across all tables in a database but only when necessary.

...

Like any migration solution, the data in the database can generally be considered transient.  In other words, even if the database is lost, organizational content is not lost.  This means that while database backups are useful to minimize any lost migration or synchronization processing time, recent log backups are not necessary.

...

Info

For these reasons, it is recommended that the SkySync database be configured for the Simple Recovery model.  There is no need to waste storage space and processing resources on data logging. 

...

With the SkySync database operating in the Simple Recovery model, database backups are straightforward.  For high throughput solutions, a nightly full backup is generally sufficient.  However, if the Recovery Point Objective (RPO) for the organization indicates that a full day of migration/synchronization processing is too much time to lose, then additional differential backups can be scheduled at intervals throughout the day.
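The relationship between the RPO and the number of backups can be worked out with a short calculation. The sketch below assumes evenly spaced backups over a 24-hour day and counts the nightly full backup as one of them; the function names are hypothetical.

```python
import math


def backups_per_day(rpo_hours: float) -> int:
    """Minimum evenly spaced backups per day so that no gap exceeds the RPO."""
    if rpo_hours <= 0:
        raise ValueError("RPO must be a positive number of hours")
    return max(1, math.ceil(24 / rpo_hours))


def additional_backups(rpo_hours: float) -> int:
    """Backups needed in addition to the single nightly full backup."""
    return backups_per_day(rpo_hours) - 1


if __name__ == "__main__":
    for rpo in (24, 12, 6, 5):
        print(f"RPO {rpo}h: {additional_backups(rpo)} additional backup(s)")
```

With an RPO of 24 hours the nightly full backup alone is enough, while an RPO of 6 hours calls for three additional backups spread through the day.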

...

Ideally, database backups should be executed with compression enabled to minimize backup storage size and backup duration.  However, this will come at a cost of increased CPU utilization on the SQL Server which is important to consider.

...

There are limits to scaling (up and out) the number of processing threads as well as the number of parallel writes.  These limits are discussed below.

...

Scaling Up

In the world of server processing, conventional wisdom indicates that when any one server resource reaches 80% of its capacity, it is time to consider scaling server resources (either up or out). 
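As an illustration of the 80% guideline, the check below flags any resource at or above the threshold. The resource names and sample readings are hypothetical, and utilization is assumed to be expressed as a fraction between 0 and 1.

```python
from typing import Dict, List

SCALE_THRESHOLD = 0.80  # the 80% guideline from the text


def resources_needing_scale(
    utilization: Dict[str, float], threshold: float = SCALE_THRESHOLD
) -> List[str]:
    """Return the names of resources at or above the threshold, sorted by name."""
    return sorted(name for name, used in utilization.items() if used >= threshold)


if __name__ == "__main__":
    # Hypothetical readings for one processing server.
    sample = {"cpu": 0.85, "memory": 0.60, "disk": 0.80, "network": 0.40}
    print(resources_needing_scale(sample))
```

An empty result suggests the server still has headroom; any listed resource is a candidate for scaling up or out.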

...

     2) Click on the performance tab in the SkySync configuration window:


     3) Increase or decrease parallel writes as necessary by (1) or (2) at a time.

...