SharePoint 2010 includes a number of enhancements that make the product far more scalable, especially when it comes to handling and searching very large lists and document libraries. Following on from my ongoing thoughts about using SharePoint as a platform for Enterprise Content Management (ECM), these changes certainly help, and they answer a lot of the negativity hurled at SharePoint’s ECM capabilities.
SharePoint lists can now scale to meet the requirements of most ECM projects, and a single list or document library can contain tens of millions of items. I am aware of tests that have shown that 50 million documents can be handled by a single library. Now, I am not for one minute advocating that you should go out and do this, but it does show that the product is far more scalable than before. The benefit of SharePoint is that it can be designed and implemented in so many different ways. Typically you would spread your ECM platform across multiple libraries and, for larger implementations, across multiple site collections.
This means that hundreds of millions of items can be managed in a large archiving scenario, although you would need to use the FAST search engine to query a repository of this size. Microsoft recommend switching to FAST when you exceed 100 million crawled items. Another benefit of 2010 is that you can have multiple indexers crawling content at the same time.
With the ability to handle large numbers of documents comes a need for more flexible storage management (see my other blog on StoragePoint). SharePoint 2010, in conjunction with SQL Server 2008, is now able to store documents outside of the database, where traditionally they were stored as BLOBs (Binary Large Objects). This capability is known as Remote BLOB Storage (RBS). The problem with keeping BLOBs in the database is that they have a massive impact on the size of the SharePoint content databases (especially for document-centric implementations), storage costs are high, and backup and restore times are longer. RBS can store the data on a cheaper disk solution (compared to the expensive disk solutions usually selected for SQL Server), on a SAN, or potentially even in the cloud.
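To make the idea of externalising BLOBs concrete, here is a minimal conceptual sketch in Python. It is not the RBS API or anything SharePoint or SQL Server actually exposes: `ExternalBlobStore`, `ContentDatabase` and their methods are hypothetical names invented purely for illustration. The point is only that the database row keeps a small reference while the document bytes live somewhere cheaper.

```python
import hashlib


class ExternalBlobStore:
    """Hypothetical stand-in for cheaper storage outside the database."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Use a content hash as the reference handed back to the database.
        blob_id = hashlib.sha256(data).hexdigest()
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


class ContentDatabase:
    """Hypothetical content database that stores references, not bytes."""

    def __init__(self, blob_store: ExternalBlobStore):
        self.rows = {}
        self.blob_store = blob_store

    def save(self, name: str, data: bytes) -> None:
        # Only the size and the blob reference are kept in the row.
        self.rows[name] = {
            "size": len(data),
            "blob_id": self.blob_store.put(data),
        }

    def load(self, name: str) -> bytes:
        # Resolve the reference back to the document bytes.
        return self.blob_store.get(self.rows[name]["blob_id"])


store = ExternalBlobStore()
db = ContentDatabase(store)
db.save("report.docx", b"x" * 1000)
restored = db.load("report.docx")
```

Because the rows in this toy model hold only a size and a reference, the "database" stays small however large the documents get, which is the same reason externalised BLOBs help with database size and backup times.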
Microsoft’s aim is to move users away from hierarchical approaches to document filing and towards a more metadata-driven approach. One feature of SharePoint 2010 that encapsulates this approach is Metadata Driven Navigation, which is particularly useful for large lists. This feature allows managed metadata fields to appear in tree-view controls in the left-hand navigation pane, from which users can select specific values for any field to filter the results.
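The difference between browsing folders and filtering by metadata can be sketched in a few lines of Python. This is only an illustration of the idea, not SharePoint code: the `Document` class, the `filter_by_metadata` helper and the field names `Department` and `DocType` are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(documents, **selected):
    """Return documents whose metadata matches every selected field value."""
    return [
        doc for doc in documents
        if all(doc.metadata.get(key) == value for key, value in selected.items())
    ]


# A flat library: no folders, every document simply carries its own metadata.
library = [
    Document("contract-001.docx", {"Department": "Legal", "DocType": "Contract"}),
    Document("invoice-104.pdf", {"Department": "Finance", "DocType": "Invoice"}),
    Document("contract-002.docx", {"Department": "Finance", "DocType": "Contract"}),
]

# Picking one value in the tree view corresponds to a single filter...
contracts = filter_by_metadata(library, DocType="Contract")

# ...and picking values for two fields narrows the results further.
finance_contracts = filter_by_metadata(
    library, DocType="Contract", Department="Finance"
)
```

The same document shows up under whichever combination of values a user picks, which a single folder hierarchy cannot offer without duplicating files.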
Microsoft have also looked at other ways to improve the handling of large lists by introducing Compound Index support, automatic index management and query throttling. In addition, Enterprise Content Types, Enterprise Policies, Content Organiser, Centralized eDiscovery and integrated FAST Search help improve the product’s ability to manage very high volumes. I’ll look at some of these in more detail in future posts.