Having come from a background of implementing enterprise document management solutions, we have found it an interesting journey looking at how to provide the same level of functionality and scalability on SharePoint. One area where this matters is storage. One of the push-backs we used to get from clients when recommending a SharePoint solution related to database storage, cost and performance. Traditional document management products store metadata in the database and have intelligent hierarchical storage management (HSM) tools for managing documents external to the database, typically on NAS/SAN storage, with the ability to migrate documents from one tier to the next as part of a managed document lifecycle.
Some of the issues our clients had with SharePoint were that the databases would bloat to incredible sizes, making them very hard to back up and restore within the available timeframes; that the database files were stored on the most expensive Tier 1 storage, increasing storage costs dramatically compared to storing documents on file shares (typically 90-95% of a 200GB database would be document blobs, increasing Tier 1 storage requirements tenfold or more compared to a legacy ECM product, which would only be storing metadata in the database); that huge databases caused performance issues; and that there was little control over where documents were stored.
When considering SharePoint and how it would handle tens of millions of documents, we had to look at how to address this and offer our clients a similar experience with lower database impact. Using External BLOB Storage (EBS) or Remote BLOB Storage (RBS) functionality, typically with third-party tools such as those provided by AvePoint, meant that we could externalise blobs from the SQL database and store them on the file system. These tools also provide a level of HSM and allow the blobs to be saved to the most cost-effective storage based on where they are in their lifecycle. For example, newly created documents which are accessed frequently should be on faster disk than those that are at the end of their lifecycle and very rarely accessed.
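The lifecycle-based tiering described above can be sketched in a few lines. This is a minimal illustration only: the tier names, the age thresholds and the `choose_tier` function are hypothetical and are not taken from AvePoint's tools or any other product.

```python
from datetime import date

# Hypothetical tiers: (maximum days since last access, tier name).
# Thresholds are illustrative assumptions, not product defaults.
TIERS = [
    (30, "tier1-fast-disk"),
    (365, "tier2-nas"),
]
ARCHIVE_TIER = "tier3-archive"


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier for a blob from how recently it was accessed."""
    age_days = (today - last_accessed).days
    for max_age_days, tier in TIERS:
        if age_days <= max_age_days:
            return tier
    # Anything older than every threshold goes to the cheapest storage.
    return ARCHIVE_TIER
```

A real HSM policy would usually consider more than last access date (document type, retention rules, legal hold), but the principle of mapping lifecycle stage to storage cost is the same.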
Whilst RBS provided a great alternative to storing blobs in the database, storage still posed a problem when version control was used. Many of our clients want to maintain a full version history, and enabling version control on a document library results in huge storage requirements. Consider an organisation that produces 1TB of documents per year on file shares. Quite often the documents are edited multiple times, but a new document is only created when a user decides to save the document with a new name to indicate a new version (a very simplistic example). In SharePoint with versioning enabled, each saved edit can create a new version, so there can be many more copies of the same document. Capacity planning is therefore critical when planning storage requirements for SharePoint. There are plenty of tools to help with this, but it is often overlooked and we have seen storage run out far sooner than predicted.
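To make the capacity planning point concrete, here is a rough back-of-the-envelope sketch. It assumes every version is stored as a full copy and that documents average a fixed number of versions; the function names and the figure of five versions per document are assumptions for illustration, and only the 1TB per year comes from the example above.

```python
def versioned_storage_tb(new_docs_tb: float, avg_versions: float) -> float:
    """Yearly storage when every version is kept as a full copy."""
    return new_docs_tb * avg_versions


def years_until_full(capacity_tb: float, new_docs_tb: float,
                     avg_versions: float) -> float:
    """How long a given capacity lasts at a constant yearly growth rate."""
    return capacity_tb / versioned_storage_tb(new_docs_tb, avg_versions)


# 1TB of new documents per year, assumed average of 5 versions each.
yearly_tb = versioned_storage_tb(1.0, 5)       # 5.0 TB per year
lifetime = years_until_full(10.0, 1.0, 5)      # 2.0 years for 10TB
```

Under these assumptions, 10TB of storage that would have lasted ten years on file shares is full in two, which is the kind of gap that catches out plans based on file share growth alone.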
A great new piece of functionality to assist with this storage challenge is SharePoint 2013’s Shredded Storage capability. Rather than save a complete copy of the document every time you edit it (as SharePoint 2010 does), SharePoint 2013 will only save the changes that have been made. This not only reduces storage requirements but also reduces the amount of data being transferred across the network. This is achieved using the MS-FSSHTTP protocol (File Synchronization via SOAP over HTTP) and improves communication not only between SharePoint and the end-user client application, but also between SharePoint and SQL Server. Shredded Storage works on any file type (e.g. PDF): SQL Server stores each document as multiple blobs rather than as a single blob. The end result is a reduction in the size of content databases and more efficient use of storage.
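The idea of storing a document as multiple blobs and writing only what changed can be illustrated with a toy chunk store. This is a simplified sketch of the general technique and not SharePoint's actual implementation: the `ChunkStore` class, the hash-based lookup and the tiny fixed chunk size are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # Deliberately tiny so the example is easy to follow.


class ChunkStore:
    """Toy store that keeps each document version as a list of chunks."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}    # digest -> chunk bytes
        self.versions: list[list[str]] = []   # one digest list per version

    def save_version(self, data: bytes) -> int:
        """Save a version and return how many new chunks were written."""
        written = 0
        manifest = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks not already stored cost extra space.
                self.chunks[digest] = chunk
                written += 1
            manifest.append(digest)
        self.versions.append(manifest)
        return written

    def read_version(self, index: int) -> bytes:
        """Reassemble a version from its chunks."""
        return b"".join(self.chunks[d] for d in self.versions[index])


store = ChunkStore()
first = store.save_version(b"AAAABBBBCCCCDDDD")    # writes 4 chunks
second = store.save_version(b"AAAABBBBXXXXDDDD")   # writes 1 chunk
```

The second version changes only one of four chunks, so only one new chunk is written, yet both versions can still be read back in full. Fixed-size chunking like this only helps when edits overwrite data in place; an insertion would shift every later chunk, which is one reason real systems are more sophisticated.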