SharePoint storage challenges and a nod to Shredded Storage

Having come from a background of implementing enterprise document management solutions, I have found it an interesting journey looking at how to provide the same level of functionality and scalability on SharePoint.  One of these areas is storage.  One of the push-backs we used to get from clients when recommending a SharePoint solution related to database storage, costs and performance.  Traditional document management products store metadata in the database and use intelligent hierarchical storage management (HSM) tools to manage documents outside the database, typically on NAS/SAN storage, with the ability to migrate documents from one tier to the next as part of managed document lifecycle management.

Some of the issues our clients had with SharePoint were that:

  • the databases would bloat to incredible sizes, making them very hard to back up and restore within the available timeframes;
  • the database files were stored on the most expensive Tier 1 storage, increasing storage costs dramatically compared to storing documents on file shares (typically 90-95% of a 200GB database would be document blobs, increasing Tier 1 storage requirements tenfold compared to a legacy ECM product which would only be storing metadata);
  • huge databases caused performance issues;
  • there was a lack of control over where documents were stored.
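To put rough numbers on the blob problem, here is a minimal Python sketch of the arithmetic.  The function names are my own, and the 200GB and 90-95% figures are the illustrative ones quoted above rather than measurements from a particular farm.

```python
def metadata_only_size_gb(database_gb, blob_fraction):
    """Size left in the database once document blobs are stored elsewhere."""
    if not 0.0 <= blob_fraction < 1.0:
        raise ValueError("blob_fraction must be in the range [0, 1)")
    return database_gb * (1.0 - blob_fraction)


def tier1_multiplier(blob_fraction):
    """How many times more Tier 1 storage is needed when blobs stay in the database."""
    if not 0.0 <= blob_fraction < 1.0:
        raise ValueError("blob_fraction must be in the range [0, 1)")
    return 1.0 / (1.0 - blob_fraction)


if __name__ == "__main__":
    for fraction in (0.90, 0.95):
        remaining = metadata_only_size_gb(200, fraction)
        print(f"{fraction:.0%} blobs: about {remaining:.0f} GB of metadata, "
              f"about {tier1_multiplier(fraction):.0f}x the Tier 1 storage")
```

The multiplier climbs quickly as the blob share rises, which is why externalising blobs matters most for document-heavy content databases.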

When considering SharePoint and how it would handle tens of millions of documents, we had to look at how to address this and offer our clients a similar experience with lower database impact.  Using EBS/RBS functionality, typically with third party tools such as those provided by AvePoint, meant that we could externalise blobs from the SQL database and store them on the file system.  These tools also provide a level of HSM and allow the blobs to be saved to the most cost-effective storage based on where they are in their lifecycle.  For example, newly created documents which are accessed frequently should be on faster disk than those that are at the end of their lifecycle and very rarely accessed.
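The tiering idea can be sketched in a few lines of Python.  This is only an illustration of an age-based rule: the tier names and the 90 and 365 day thresholds are assumptions of mine, not settings from AvePoint or any other RBS product.

```python
from datetime import date

# Hypothetical policy: (maximum days since last access, tier name), fastest tier first.
TIER_POLICY = [(90, "tier1-fast-disk"), (365, "tier2-nas")]
ARCHIVE_TIER = "tier3-archive"


def choose_tier(last_accessed, today):
    """Pick a storage tier from the number of days since the blob was last accessed."""
    age_days = (today - last_accessed).days
    for max_age, tier in TIER_POLICY:
        if age_days <= max_age:
            return tier
    return ARCHIVE_TIER


if __name__ == "__main__":
    today = date(2013, 1, 31)
    for accessed in (date(2013, 1, 1), date(2012, 6, 1), date(2010, 1, 1)):
        print(accessed, "->", choose_tier(accessed, today))
```

A real HSM rule would usually look at more than last access (content type, retention status and so on), but the principle of mapping lifecycle stage to storage cost is the same.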

Whilst RBS provided a great alternative to storing blobs in the database, storage still posed a problem when version control was used.  Many of our clients want to maintain a full version history, and when version control is enabled on a document library it results in huge storage requirements.  Consider an organisation that produces 1TB of documents per year on file shares.  Quite often the documents are edited multiple times but a new document is only created when a user decides to save the document with a new name to indicate a new version (a very simplistic example).  In SharePoint, it is possible for there to be many, many more copies of the document and therefore capacity planning is critical when planning storage requirements for SharePoint.  There are plenty of tools to help with this but it is often overlooked, and we have seen storage run out far sooner than predicted.
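As a back-of-the-envelope illustration, the sketch below estimates storage when every retained version is a full copy.  The average of five versions per document is purely an assumption for the example; the 1TB per year figure is the one used above.

```python
def full_copy_storage_gb(new_docs_gb, avg_versions):
    """Storage needed when each retained version is stored as a complete copy."""
    if avg_versions < 1:
        raise ValueError("avg_versions must be at least 1")
    return new_docs_gb * avg_versions


def cumulative_storage_gb(new_docs_gb_per_year, avg_versions, years):
    """Running total at the end of each year, assuming a constant yearly volume."""
    yearly = full_copy_storage_gb(new_docs_gb_per_year, avg_versions)
    return [yearly * year for year in range(1, years + 1)]


if __name__ == "__main__":
    # 1TB (1024GB) of new documents per year, five retained versions each (assumed).
    for year, total in enumerate(cumulative_storage_gb(1024, 5, 3), start=1):
        print(f"Year {year}: {total} GB")
```

Even a modest version count multiplies the file share figure several times over, which is the gap that tends to get missed in capacity plans.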

A great new piece of functionality to assist with this storage challenge is SharePoint 2013’s Shredded Storage capability.  Rather than save a complete copy of the document every time you edit it (as SharePoint 2010 does), SharePoint 2013 will only save the changes that have been made.  This not only reduces storage requirements but also reduces the amount of data being transferred across the network.  This is achieved using the MS-FSSHTTP protocol and improves communication not only between SharePoint and the end-user client application, but also between SharePoint and SQL Server.  Shredded Storage works on any file type (e.g. PDF): SQL Server stores each document as multiple blobs, rather than as a single blob.  The end result is a reduction in the size of content databases and more efficient use of storage.
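To make the idea of storing a document as several blobs more concrete, here is a toy Python model.  It is emphatically not SharePoint's actual algorithm: the fixed 4 byte chunk size, the ChunkStore class and the use of SHA-256 to recognise unchanged chunks are all assumptions chosen to keep the example small.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def split_chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size pieces."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Toy store that keeps each distinct chunk once and a manifest per version."""

    def __init__(self):
        self.chunks = {}    # chunk hash -> chunk bytes
        self.versions = []  # each version is a list of chunk hashes

    def save_version(self, data):
        """Store a version and return how many bytes were newly written."""
        written = 0
        manifest = []
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk
                written += len(chunk)
            manifest.append(key)
        self.versions.append(manifest)
        return written

    def read_version(self, index):
        """Reassemble a version from its chunks."""
        return b"".join(self.chunks[key] for key in self.versions[index])


if __name__ == "__main__":
    store = ChunkStore()
    print("first save wrote", store.save_version(b"AAAABBBBCCCC"), "bytes")
    print("second save wrote", store.save_version(b"AAAAXXXXCCCC"), "bytes")
```

The second save only writes the chunk that changed, yet both versions can still be read back in full, which is the effect the post describes for versioned documents.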

Posted in scalability, SharePoint 2010, SharePoint 2013

SharePoint 2013 Service Applications

There are fewer fundamental architectural changes with SharePoint 2013 than there were with 2010, but there are a few changes and a number of new service applications that are worth a mention.

Search Service

This has been totally rebuilt from the ground up and replaces the mixed bag of options that were available with the various flavours of SharePoint 2010.  The new search service combines FAST with some of the features in the 2010 search service and even some bits from Index Server.  SharePoint 2013 is a much more search centric product as will become evident once you start to use it, and as I will discuss in future posts.

Office Web Apps

This is no longer a service application that is bundled with specific license agreements but instead is a separate product with its own license.  It should be installed on a dedicated server, which should not have SharePoint, SQL Server, MS Office, Exchange or Lync installed on it.  It is now a standalone app that serves as the viewing engine for SharePoint 2013, Exchange 2013 and Lync 2013; if it is only being used for viewing then no license is required, but if you want to use it for editing documents then you need to purchase a license.  There are a few improvements in Office Web Apps: full screen viewing is supported, even through web parts; it now supports multi-authoring for PowerPoint, Word, Excel and OneNote documents; user-unfriendly URLs have been removed; and it provides a handy document preview whenever you hover over a search result.

Web Analytics

This is no longer a separate service application and has been incorporated into the new search infrastructure that powers many of the new features in 2013.  The two have been combined to provide a much richer and more powerful search tool that not only returns rich search results but can also provide data on relevance, suggestions and user activity.  Each site now has a site settings option called Popularity Trends which outputs data to an Excel spreadsheet (usage.xlsx) which details daily hits and unique users.

Machine Translation Service Application

This provides the capability to translate content using the Bing cloud based translation service.  Objects that can be translated include files, pages, sites and term sets.  It can be run asynchronously, or synchronously for on the fly translations.  The translation activity is managed by a new timer job.

Work Management Service Application (WMSA)

This is a useful new MySites application that provides a single task list, pulling together all of a user's tasks from Exchange 2013, Project Server 2013 and SharePoint 2013.  It works both ways, enabling a user to complete a task in the source application or through their MySite.  This requires the installation of Exchange Web Services (EWS) and the Exchange Web Services Managed API.

App Management Service

This is the new service for handling the new 2013 app store.  More about this in a future post.

Posted in SharePoint 2013

SharePoint 2013 Upgrade

You can only upgrade to SharePoint 2013 from SharePoint 2010 so those of you looking to upgrade directly from 2007 will have to take the 2010 route first.

There is no in-place upgrade option for 2013, not that we ever used it, so every upgrade needs to be onto new kit and either done via the DB attach method or by creating a new farm and using third party tools to migrate the content.

2013 needs:

  • 64 bit Windows Server 2008 R2 SP1 or 64 bit Windows Server 2012.
  • 64 bit SQL Server 2008 R2 SP1 or 2012.
    • 2012 SP1 is needed if you are planning to use BI

When you move 2010 site collections across to 2013, they remain in 2010 mode and SharePoint 2013 maintains both a 14 hive and a 15 hive to support both 2010 and 2013 mode site collections.  You can still view and access 2010 mode site collections but they will only provide the 2010 level of functionality; site collection administrators will get visual warnings on the page when a site is in 2010 mode.

If you have 2007 site collections in your 2010 farm, then you should upgrade them to 2010 mode before moving them to 2013.  There is no visual upgrade tool in 2013 like there was in 2010, so to upgrade 2010 site collections to 2013 mode you will have to either run a PowerShell command or use the Upgrade this Site Collection item in Site Settings (site collection administrators only).

Microsoft recommend that site collection administrators are left to upgrade their own site collections after the farm upgrade, rather than as part of it.  The exceptions of course are those site collections which may need particular attention, such as high volume, highly customised, or critical sites.

Once a site collection has been marked for upgrade to 2013 mode, an item is added to a new upgrade queue which is processed by an Upgrade Site Collection timer job.  This timer job runs every minute and can run parallel upgrades; there is a throttle applied to prevent the server being over utilised by this activity.

It is possible to view what a 2010 mode site will look like when it is upgraded to 2013 mode by using the Create an Evaluation Site Collection function.  A daily timer job processes this request and copies the 2010 site collection to a new 2013 mode site collection within the same content database, gives it a URL with the same name as the source site collection and appends -eval on the end.  By default this is retained for 31 days after which point it will be deleted.  The idea is to let a site collection administrator look at the site in 2013 mode and determine what changes will be needed before upgrading the site for real.
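The naming and retention rules described above are simple enough to sketch in Python.  This is only a model of the behaviour as stated in this post (the -eval suffix and the 31 day default); the function names and the example URL are hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 31  # default retention as stated in this post


def evaluation_site_url(source_url):
    """Build the evaluation site URL by appending -eval to the source URL."""
    return source_url.rstrip("/") + "-eval"


def evaluation_expiry(created, retention_days=RETENTION_DAYS):
    """Date after which the evaluation site collection is due for deletion."""
    return created + timedelta(days=retention_days)


if __name__ == "__main__":
    print(evaluation_site_url("http://intranet/sites/legal"))
    print(evaluation_expiry(date(2013, 4, 1)))
```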

There is also a site collection health checker that can be run against both 2010 mode and 2013 mode site collections.

Posted in SharePoint 2013

SharePoint Evolution Conference

I’ll be attending the SharePoint Evolution Conference in London next week and expect it to be a really interesting event.  Last year’s event was well organised and the evening entertainment certainly made the whole event a great success – thank you AvePoint!

This year there will be over 113 sessions delivered by global SharePoint experts, Microsoft speakers and MVPs across 3 days, covering Business, Technical, Developer, Information Worker, Community and Case Study tracks and Ask the Experts sessions.  To make it even better, they will send out a DVD pack including recordings of all of the sessions, so no more need to panic over which session to go to.

I’m keen to see how they divide time between SharePoint 2010 and 2013 topics and hope to pick up some interesting pieces of information that I’ll publish on here on my return.  Follow me on twitter @rearcardoor for live updates during the conference…  hopefully I’ll be able to make a few tweets but if the presentations are too engrossing then I may have to make a difficult choice… to tweet or not to tweet… what was the question again?

Posted in SharePoint 2010, SharePoint 2013

Capabilities and features in SharePoint 2013

This is a great page for finding out about the new stuff in 2013:
Some of the key bits that I like are:
App Store – the whole world is becoming an ‘appy’ place… and the new App Store just brings SharePoint in line with modern thinking and modern approaches to product distribution.  In the early stages there were a few ‘buggy’ apps that only really served as sales tools to get people interested in a new feature.  Now there is a more robust selection of apps to choose from.
Cross Site Publishing – a powerful new feature that provides the ability to publish and share libraries across other site collections.  I can already see plenty of uses for this in existing client projects that we are working on.
http://technet.microsoft.com/en-us/library/jj219688%28v=office.15%29.aspx
Social Network – the whole product seems a lot more social networky and geared to following people and content.  My first impressions are that most development has gone into this area.
Design Manager – replaces SharePoint Designer for the branding of sites and provides an easier and more intuitive interface to create and brand sites.
SkyDrive – provides offline storage which I think will prove to be a hit over time as SharePoint is used for ECM.  I always found SharePoint Workspace a bit limiting and unstable so it’s great to have an alternative.
Elimination of In-place Upgrade – in-place upgrades are no longer supported, so the common approach will be the DB attach method.
Office Web Apps – this is now a separate product in 2013 (no longer included with Enterprise).
FAST Search – is now included as part of 2013, improving the overall search experience.
Folders in Document Sets – the ability to add folders to a document set is a useful improvement but, as ever, my advice is to be cautious about how you use folders.  In certain situations they can prove to be very useful but their use should be controlled and restricted so that there cannot be uncontrolled growth.
Shredded Storage – versioning now only saves differences and not the full document.  SharePoint 2013 automatically parses the document contents as it goes into the DB and checks for duplicate elements.  This should significantly reduce storage costs where versioning is heavily used.
Friendlier Error Messages – at last… nuff said.
Live Document Preview – all the documents in a document library will have dots next to their names.  If you click on those dots, you will get a fully navigable preview of the document in a nice-looking preview window.  The preview window also allows you to zoom by double-clicking.
Site Notebook – Microsoft have included a shared notebook in each new team site.  Everyone who has access to your site will be able to use the notebook.  Click on the “notebook” link in the Quick Launch (the navigation menu on the left) to open it in the OneNote Web App.
Posted in SharePoint 2013

Finding Duplicate Documents in SharePoint 2010

SharePoint warns you if you are about to save a duplicate of a document and it does this by matching the filename.  This only applies when you are saving your new document to the same location in SharePoint where the original exists.

  • If you save Document1 to Folder A, SharePoint will warn you if you then try to save Document1 again to Folder A.
  • If you save Document1 to Folder A, SharePoint will not warn you if you then try to save Document1 to Folder B.

The example used above could equally apply to Document Libraries and Document Sets as well as to Folders.  For most organisations this is not a problem and is considered to be a minor risk compared to the effort of reporting on and controlling duplicates.  This is not unique to SharePoint either, as most ECM products allow the same document to be saved in different locations.

Duplicates can be detected using a simple PowerShell script that looks for documents across a site that have the same name. There are scripts that check for duplicate content by calculating an MD5 hash of the file contents but I have found this does not work for Office documents, which might have different metadata applied to each copy of the document.  The following link contains a great script for comparing using the MD5 approach:

http://blog.pointbeyond.com/2011/08/24/finding-duplicate-documents-in-sharepoint-using-powershell/
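For readers who want to see the content-hash idea in isolation, here is a minimal Python sketch.  It works on an in-memory mapping of paths to bytes rather than on SharePoint itself, and the function name and sample paths are my own; it is not a port of the linked script.

```python
import hashlib
from collections import defaultdict


def find_duplicate_content(files):
    """Group paths whose content has the same MD5 hash.

    files maps a path to the file's bytes; only groups of two or more are returned.
    """
    paths_by_hash = defaultdict(list)
    for path, content in files.items():
        paths_by_hash[hashlib.md5(content).hexdigest()].append(path)
    return [sorted(paths) for paths in paths_by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    sample = {
        "FolderA/Document1.docx": b"same bytes",
        "FolderB/Copy of Document1.docx": b"same bytes",
        "FolderB/Other.docx": b"different bytes",
    }
    for group in find_duplicate_content(sample):
        print("Identical content:", ", ".join(group))
```

Because the hash covers every byte, two Office documents that differ only in their embedded metadata will not be grouped together, which is the limitation mentioned above.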

The script at the bottom of this article performs a comparison by document name.

To use this script, open the PowerShell ISE (“powershell_ise.exe”).  Copy and paste the code below into the script window and save it as a file with a .ps1 extension, e.g. DuplicateByNameCheck.ps1.  You will need to edit the last line of the file to point to the site that you want to check – see image below.

On running the script you will get an output window showing the duplicates; clicking on the FileName column header will group files with the same name.

Duplicate By Name Code

# Load the SharePoint server object model (run this on a SharePoint server).
# Alternative: Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint")

function Get-DuplicateFiles ($RootSiteUrl)
{
    # Alternative: $spSite = Get-SPSite -Identity $RootSiteUrl
    $spSite = New-Object Microsoft.SharePoint.SPSite($RootSiteUrl)
    $Items = @()
    $Duplicates = @()

    foreach ($spWeb in $spSite.AllWebs)
    {
        Write-Host "Checking" $spWeb.Title "for duplicate documents"
        foreach ($list in $spWeb.Lists)
        {
            # Only document libraries, skipping system (_*) and SitePages libraries
            if ($list.BaseType -eq "DocumentLibrary" -and $list.RootFolder.Url -notlike "_*" -and $list.RootFolder.Url -notlike "SitePages*")
            {
                foreach ($item in $list.Items)
                {
                    # Ignore folders and empty files
                    if ($item.File.Length -gt 0)
                    {
                        $record = New-Object -TypeName System.Object
                        $record | Add-Member NoteProperty FileName ($item.File.Name)
                        $record | Add-Member NoteProperty FullPath ($spWeb.Url + "/" + $item.Url)
                        $Items += $record
                    }
                }
            }
        }
        $spWeb.Dispose()
    }
    $spSite.Dispose()

    # Group by file name once every web has been read; keep names that occur more than once
    $duplicateItems = $Items | Group-Object FileName | Where-Object { $_.Count -gt 1 }
    foreach ($dup in $duplicateItems)
    {
        foreach ($item in $dup.Group)
        {
            $Duplicates += $item
        }
    }

    return $Duplicates | Out-GridView
}

Get-DuplicateFiles "http://portal.denallix.com/sites/ifdemo"

Posted in ECM, SharePoint 2010

ECM Bits – Just save it to SharePoint

One of the areas in which organisations often get confused is around whether or not to force people to save all documents to SharePoint.  This tends to be the case in more regulated industries where stricter controls are placed over document locations, security and auditing.  This is also one of the reasons that organisations are reluctant to move to SharePoint as an ECM platform as they are concerned that business documents will not be correctly filed and tagged.

Actually, it is possible to limit the locations that users can save documents to when using the Microsoft Office Save dialog box.   Whilst this may not be quite as rigid and bulletproof as other ECM system lockdowns, it may be enough for most organisations who want to drive users to save documents into SharePoint.

The way to implement this is through the Group Policy Management Console and configuring the Activate Restricted Browsing and Approve Locations policy settings.

  • Activate Restricted Browsing – enabling this policy setting allows you to go on and restrict the locations available to users in the Save As dialog box.
  • Approve Locations – this is where you define the acceptable locations that a user can save documents to.

For example, you can prevent users from saving files to their desktops and instead only provide them with options to save documents into SharePoint document libraries.

As mentioned earlier, this is not bulletproof and savvy users are often able to work their way around this restriction, but for the majority of users this is sufficient to encourage them to work with SharePoint as their primary repository.

Note that the list of locations can be restricted to one or more Microsoft Office applications. A typical example is to force save locations for Word, Excel and PowerPoint but not for applications such as Access.  This ensures that ‘documents’ are saved into the ECM repository but users are free to determine where they want to save databases.  This of course is a very simplistic example and each organisation will have different requirements.
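The per-application restriction can be pictured with a small Python sketch.  This only models the decision described above; it is not how Office or Group Policy evaluates the setting, and the application names, URLs and function name are hypothetical.

```python
# Hypothetical approved locations, keyed by Office application.
# Applications that are not listed (Access, for example) are left unrestricted.
APPROVED_LOCATIONS = {
    "Word": ["http://intranet/sites/legal/"],
    "Excel": ["http://intranet/sites/finance/"],
    "PowerPoint": ["http://intranet/sites/legal/", "http://intranet/sites/finance/"],
}


def is_save_allowed(application, target):
    """Return True if the target location is acceptable for the given application."""
    approved = APPROVED_LOCATIONS.get(application)
    if approved is None:
        return True  # no restriction configured for this application
    target = target.lower()
    return any(target.startswith(prefix.lower()) for prefix in approved)


if __name__ == "__main__":
    print(is_save_allowed("Word", "http://intranet/sites/legal/Contracts/nda.docx"))
    print(is_save_allowed("Word", "C:\\Users\\me\\Desktop\\nda.docx"))
    print(is_save_allowed("Access", "C:\\Users\\me\\Desktop\\stock.accdb"))
```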

Posted in ECM, SharePoint 2010

Making Documents Unique – Part 3

Proving that it all works

The first thing to do is to add a new document to a document library and ensure that the Document ID is being assigned.  To do this, navigate to a document library and upload or create a new document in the normal way.  Once you have uploaded the document, select the document drop down menu from the document library and select View Properties.

[Image: view properties]

You should now see that the document properties contain a new Document ID property as shown below.

[Image: properties]

Modifying the View to show the Document ID

It would be useful to be able to see the Document ID in the Document Library view so let’s add it now.  If you notice, there is a small downward facing arrow to the right of the Library name in the breadcrumb trail.  If you click on this you get a quick context menu from where you can modify the view.  Select Modify this View.

[Image: modify this view]

A form will appear showing which columns are selected for the current view; ensure that the Document ID property is checked and then click on OK.

[Image: Document ID check box]

The Document ID will now appear as a column in your current view.

Adding the Find by Document ID web part

There is a new “Find by Document ID” search web part that you can use and which appears by default in Document Centers and Records Centers.  This enables you to immediately locate a document via its Document ID, no matter where it resides in the Site Collection.

To add the web part to the current page, click on the Edit Page button and then click into the page where you want to insert the web part.  You will notice that the Ribbon is enabled, so click on the Insert heading on the Ribbon and then click on Web Part as shown below.

[Image: insert web part]

Click on the Search category in the left hand list and then select Find by Document ID in the Web Parts list.

[Image: web part select]

Then click on the Add button and hey presto the web part will appear on your page.  Click on the Page heading on the toolbar and then click on the Save & Close button to exit from editing mode.

[Image: web part]

Have a go: type in a Document ID and then click on the search icon to the right of the search box.  The document will then be loaded automatically in the appropriate application.

Manual Searching

When you activate the Document ID Service, it adds a new ASPX page in the layouts directory called DocIdRedir.aspx.  This page accepts one querystring parameter which represents the unique Document ID as shown below.

http://<sitecollectionurl>/_layouts/DocIdRedir.aspx?ID=LEGAL-1-1

This URL defines a permanent link to the document, even if it is moved within the Site Collection.  You can try this yourself by cutting and pasting the above into the browser and substituting the Document ID with one of your own.
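If you need to generate these links from another system, the URL can be assembled with a few lines of Python.  The helper name and the example site collection URL are hypothetical; the page name and the ID parameter are the ones shown above.

```python
from urllib.parse import quote


def doc_id_url(site_collection_url, document_id):
    """Build a DocIdRedir.aspx link for a Document ID within a site collection."""
    base = site_collection_url.rstrip("/")
    return f"{base}/_layouts/DocIdRedir.aspx?ID={quote(document_id, safe='')}"


if __name__ == "__main__":
    print(doc_id_url("http://intranet/sites/legal", "LEGAL-1-1"))
```

Storing the result against a business record, rather than the document's location-based URL, means the link keeps working if the document is moved within the Site Collection.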

In the next and final article we will look at how to use the Document ID within documents.

Posted in ECM, SharePoint 2010, Uncategorized

Making Documents Unique – Part 2

How to Implement Document IDs

As mentioned previously, the Document ID Service is set at the Site Collection level so once you are in a site, you need to go to Site Actions and then click on Site Settings.

Click on Site Collection Features (If you are in a sub site you will have to go to the parent Site Collection level by clicking Go to top level site settings).

By default the feature is not activated, unless you are using a Document Centre or Records Centre, and must be activated here.  To activate it simply click on Activate and the status will change to Active as shown below.  At this point make sure you are aware of which Site Collection you are in as you will need to know this later on when setting up the timer job.

[Image: Document ID feature]

Once the feature is activated then you will have a new option under Site Collection Administration called Document ID Settings.

[Image: Site Collection Administration]

Click on the link and you will be presented with a Document ID Settings form.

[Image: Document ID Settings]

This form allows you to do the following:

  1. Click on the first tick box to turn on Document IDs for the Site Collection.
  2. You can enter a 4 – 12 character prefix which will appear before the numeric part of the Document IDs.  I recommend that you use a different prefix for each Site Collection.
  3. Click on the second tick box to force all existing Document IDs in the Site Collection to be reset with the new prefix.  This assumes that you have already set up Document IDs and are coming into this form in order to amend the prefix.
  4. You can specify which search scope to use when performing a Document ID search
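If prefixes are being handed out centrally, a quick check like the Python sketch below can catch bad values before anyone types them into the form.  The 4 to 12 character rule comes from the form described above; restricting the prefix to letters and digits is an assumption on my part, and the function name is hypothetical.

```python
def is_valid_prefix(prefix):
    """Check a proposed Document ID prefix against the 4 to 12 character rule.

    Assumption: only letters and digits are accepted.
    """
    return 4 <= len(prefix) <= 12 and prefix.isalnum()


if __name__ == "__main__":
    for candidate in ("LEGAL", "HR", "FINANCEDOCS2013", "HR DOCS"):
        print(candidate, "->", "ok" if is_valid_prefix(candidate) else "rejected")
```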

However the work doesn’t stop here.  You will need to jump into SharePoint Central Administration to check on a couple of timer jobs.  Within Central Administration, click on the Monitoring link in the left hand navigation panel to access all of the monitoring functions.

Timer Jobs

Under Timer Jobs click on Review Job Definitions and you will be presented with a list of all the timer jobs that have been set up in SharePoint.  If you scroll down the list you will see a group of Document ID jobs.  The number of entries here will depend upon the number of Site Collections that you have that are set up to use the Document ID feature.

[Image: timer job list]

The first thing that you need to do is locate the Document ID enable/disable job for the Site Collection that you used to activate the feature; the site collection is shown in the second column.  The purpose of this timer job is to make changes to the underlying Document and Document Set content types for those Site Collections that have the feature enabled or disabled; it adds three new columns to the content types i.e. DocID, PersistID and Static URL.  The purpose of these fields will not be covered in this article but note that if you deactivate the feature, the columns remain and the existing document IDs are preserved although no new ones will be added.  Even though the IDs remain, you will no longer be able to search on them once the feature is deactivated.

Click Document ID enable/disable job for the relevant Site Collection.

[Image: Document ID enable/disable job]

You just need to make sure that this job is configured to run at a suitable interval for your organisation.  Click on Run Now if you want to kick off the job immediately.

The next thing you need to do is set up a schedule for the Document ID assignment job.  Similar to the above, locate the timer job for the relevant Site Collection and set up a schedule.  This job assigns the Document IDs to the content.  Again, click Run Now if you want to run the job immediately.

[Image: Document ID assignment job]

Each new document (and every existing document if you checked the reset check box above) will now be assigned a Document ID when added to that Site Collection.

Note that the Document IDs may take a little time to initiate so don’t worry if they are not instantly visible.  Once you add a document it will be automatically assigned a Document ID which will be visible on the View Properties form.

[Image: document properties]

In the next article we will look at how to use the Document ID.

Posted in ECM, Records Management, SharePoint 2010, Uncategorized

Making Documents Unique

A Fundamental ECM Requirement

A fundamental tenet of any ECM solution is that it must be possible to uniquely identify a document and retrieve it based on a unique identifier.  This identifier is typically an incremental numeric value that increments by 1 for each document that is added to the system.

The lack of such an identifier is one of the fundamental problems that many people have had with SharePoint, and it has now been addressed by the Document ID service in SharePoint 2010.  Behind the scenes, documents have always had a unique identifier in SharePoint but this was (and still is) a globally unique identifier (GUID) which isn’t particularly user friendly and looks something like this –

{31EC2020-3AEF-1069-A2DD-08012B30309D}

Before SharePoint 2010, the other way to identify a document was the URL that pointed to the document based on its location.  For most users this wasn’t an issue, as documents were saved into a specific location and never needed to be moved.  For large scale ECM projects, this did become an issue, especially when documents were transactional in nature and tied to business processes.  In these scenarios, a document could potentially be moved from one library to another as part of the business process and therefore be assigned a new URL identifier; this made it difficult when trying to provide uniform access to the document regardless of its location.  Many organisations integrate their line of business systems with their ECM system and like to store the unique ID of the document against a business record.  An example would be linking the scanned image of an invoice to an invoice record in an accounts payable system.  Not having a unique ID makes this linking very difficult and can result in broken links if the target document moves within SharePoint.

The Document ID Service

Thankfully SharePoint 2010 introduced the Document ID Service which delivers this missing piece of functionality.  The new Document ID service is activated as a feature at the Site Collection level and only applies to Document Libraries; it therefore only works for documents and not list items.  What it does is assign unique IDs to documents when they are initially created.  The Document ID is a numeric value and SharePoint allows you to prefix it with a text value which is typically unique between Site Collections e.g. “LEGAL”, “HRDOCS”, “FINANCE”.  The unique ID looks something like this:

LEGAL-1-101

Note that if you move a document then its Document ID does not change but if you copy a document, the new document will be assigned a new Document ID.
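The move versus copy behaviour can be modelled with a toy Python class.  This is a sketch of the rule described above, not of how SharePoint stores or generates IDs: the SiteCollection class is hypothetical and the middle segment of the ID is fixed at 1 purely for illustration.

```python
import itertools


class SiteCollection:
    """Toy model: IDs are issued on creation, kept on move, reissued on copy."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._counter = itertools.count(1)
        self.libraries = {}  # library name -> {document id: file name}

    def _next_id(self):
        # Middle segment fixed at 1 for illustration only.
        return f"{self.prefix}-1-{next(self._counter)}"

    def add(self, library, name):
        """Add a new document and return its freshly assigned Document ID."""
        doc_id = self._next_id()
        self.libraries.setdefault(library, {})[doc_id] = name
        return doc_id

    def move(self, doc_id, source, target):
        """Move a document between libraries; the Document ID is unchanged."""
        name = self.libraries[source].pop(doc_id)
        self.libraries.setdefault(target, {})[doc_id] = name
        return doc_id

    def copy(self, doc_id, source, target):
        """Copy a document; the copy is treated as new and gets a new Document ID."""
        return self.add(target, self.libraries[source][doc_id])


if __name__ == "__main__":
    legal = SiteCollection("LEGAL")
    original = legal.add("Contracts", "contract.docx")
    moved = legal.move(original, "Contracts", "Archive")
    copied = legal.copy(moved, "Archive", "Drafts")
    print(original, moved, copied)
```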

How People are Using it

Document IDs are proving to be useful for organisations that have document centric processes that rely on being able to uniquely identify a document and/or provide rapid access to documents via search.

An example use case could play out as follows:

  • A legal firm creates a contract using Microsoft Word.
  • The document template automatically applies the Document ID to the document footer or as a reference.
  • The contract is emailed to a third party in PDF format for review.
  • The third party calls back to discuss the contract.
  • By asking the third party to state the document reference, the legal firm can instantly search for and view the contract.

This is a very simple example but it shows the benefit of being able to access a document instantly via a Document ID search, rather than having to browse through sites, libraries and folders to locate the document.

Potential Problem

One of the problems that I have with Document IDs is that they are only unique to the Site Collection and therefore depend upon the prefix in order to make them unique across the SharePoint Farm.  If you decide to use the same prefix in two Site Collections then there is a risk for duplication of IDs.  This issue won’t affect too many organisations but must be borne in mind when designing a large scale archiving or ECM solution.

One of the risks of having duplicate Document IDs is that if you perform a Document ID search then it will only return the first document that matches that ID.  There is therefore a risk of accessing the wrong document.  Again, this would probably only happen if you copied a document to another Site Collection which had the same Document ID prefix.
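A simple audit of the prefixes in use is one way to spot this risk early.  The Python sketch below assumes you already have a list of site collections and their prefixes (the mapping shown is made up), and it treats prefixes as case-insensitive, which is my own cautious assumption.

```python
from collections import defaultdict


def shared_prefixes(prefix_by_site_collection):
    """Return each prefix that is used by more than one site collection."""
    sites_by_prefix = defaultdict(list)
    for site, prefix in prefix_by_site_collection.items():
        sites_by_prefix[prefix.upper()].append(site)
    return {prefix: sorted(sites)
            for prefix, sites in sites_by_prefix.items() if len(sites) > 1}


if __name__ == "__main__":
    farm = {
        "/sites/legal": "LEGAL",
        "/sites/contracts": "legal",
        "/sites/hr": "HRDOCS",
    }
    for prefix, sites in shared_prefixes(farm).items():
        print(f"Prefix {prefix} is shared by: {', '.join(sites)}")
```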

My preference would be to be able to implement the ID at Farm level so that the Document ID is unique across all Site Collections.  Having the option to implement at the Site Collection or Farm level would be a useful addition to the feature and will hopefully be on Microsoft’s roadmap.

If you have software development skills in your organisation then it is possible to override the default Document ID behaviour with your own unique ID, which could be set Farm wide, but this is outside the scope of this article.

All in all it is a useful feature for those organisations looking to use SharePoint as a platform for ECM or archiving.

Posted in ECM, SharePoint 2010