Sunday 19 September 2010

ASP.Net Routing and virtual article-based extensionless website Urls

This post has moved to http://wishfulcode.com/2010/09/19/asp-net-routing-and-virtual-article-based-extensionless-website-urls/

There is an architectural difficulty in reconciling the new ASP.Net penchant for functionally semantic Url patterns (via ASP.Net Routing) with sites whose Url patterns are editorially semantic. That is, the patterns match various sections of a site which may all be functionally very similar, and so should be routed to the same handling code for an appropriate view to be returned.
My team were faced with this dilemma earlier this year whilst developing the http://www.glamourmagazine.co.uk site. The site's content is stored in Umbraco, so it has an editorial tree of more than 50,000 articles, each with a differing template choice. For this site, we wrote our own data reading and rendering API on top of ASP.Net, bypassing the Umbraco API.
The difficulty with using older methods to route extensionless requests to ASP.Net (and then to custom IHttpHandlers) is that once the pipeline has handed off to an ASP.Net handler, it's difficult to get back to another IIS handler. An example is a request which should result in IIS finding the default document for a directory. Once we are directing all extensionless requests to ASP.Net, if our ASP.Net handler can't continue (because, say, no article or aspx page exists at the virtual path), ASP.Net will simply throw an error and display the error page configured in the Custom Errors section of the web.config. This is why, still today, there are two places to configure error pages: one for the ASP.Net handler, and one for IIS.

UrlRoutingModule, Catch-All Parameters, IRouteHandler and IRouteConstraint

If we lean on ASP.Net Routing, in the same way ASP.Net MVC does for its (often extensionless) virtual controller Urls, we can get a non-invasive solution to this problem with minimal hassle.
Traditionally, under ASP.Net Routing, if you want to set up a route you need to know roughly how many Url segments to expect and what the pattern should be. The problem on a site like Glamour is that there will be a varying number of Url segments, defined by a team of editors at runtime to match the editorial layout of the site. Furthermore, the Url pattern of articles is mostly irrelevant to the code.
ASP.Net Routing comes pre-configured when you create an MVC site in Visual Studio, but if you already have an ASP.Net site you can start using ASP.Net Routing without affecting anything else by including the following module in the web.config (in the <modules> section under <system.webServer> when running IIS7's integrated pipeline):
<add name="UrlRoutingModule" type="System.Web.Routing.UrlRoutingModule, System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" />


and adding the normal route registration in the global.asax (or your site’s application class):


protected void Application_Start(object sender, EventArgs e)
{
    // Set up custom routing (e.g. MVC Controllers)
    RegisterRoutes(RouteTable.Routes);
}

public static void RegisterRoutes(RouteCollection routes)
{
    // RouteCollection and RouteTable live in System.Web.Routing;
    // IgnoreRoute is the MVC extension method (plain .NET 4 routing also offers routes.Ignore)
    routes.IgnoreRoute("{resource}.axd/{*pathInfo}");
    routes.IgnoreRoute("{resource}.aspx/{*pathInfo}");
}


Adding a new route whose Url pattern is a single catch-all parameter (ie: all Urls will match the rule), giving it a RouteConstraint, and placing the rule after all other rules (in case we actually want to use MVC or other routes) will route every remaining url to our handler for articles. This handler class (EditorialRouteHandler below) implements IRouteHandler, and must implement GetHttpHandler, which returns an IHttpHandler. In our case we simply return a System.Web.UI.Page, setting its master page file to the one chosen in the Umbraco data-store, and its data to the Node retrieved from the CMS:


//create a route for pages in editorial tree
var editorialPageRouter = new Route(
    "{*page}",                  //catch-all page parameter
    new EditorialRouteHandler() //handler
);

//add it to the RouteCollection
routes.Add("EditorialPageRouter", editorialPageRouter);


The final step is to add a Route Constraint to the rule. This allows the handler to tell the routing system that it has not found a match for the Url, so that processing can continue to another rule, or back to IIS. This is accomplished by implementing the IRouteConstraint interface, whose Match method only needs to return a boolean specifying whether the specified url matches the purpose of the rule.


//add a constraint that the page must exist in the cms, or we don't route
editorialPageRouter.Constraints = new RouteValueDictionary();
editorialPageRouter.Constraints.Add("EditorialPageFoundConstraint", new EditorialRouteHandler());


The following is a sample class which implements IRouteHandler and IRouteConstraint and returns a System.Web.UI.Page if a match is found. If you want to follow this pattern, you'll need to implement the mechanism for retrieving the data and the master page appropriate for the Url specified:


public class EditorialRouteHandler : IRouteHandler, IRouteConstraint
{
    public System.Web.IHttpHandler GetHttpHandler(RequestContext requestContext)
    {
        //we should not be here if the constraint has not found a result and set it
        if (EditorialPageContext.Current.CurrentNode == null)
            throw new Exception("Umbraco Node Not Set. Ensure the correct EditorialRouteHandler Route Constraint has been set.");

        //create a new page object, and change the masterpage
        Page pageResult = new Page
        {
            AppRelativeVirtualPath = "~/fake.ashx", //this must be set to something irrelevant to work around a bug in ASP.Net
            MasterPageFile = "my-master-page.master" //set this to a value retrieved from your CMS, e.g. GetTemplatePath(EditorialPageContext.Current.CurrentNode)
        };

        //return page with new masterpage
        return pageResult;
    }

    protected virtual string GetTemplatePath(INode node)
    {
        //todo: search data-store for the master page path for the current data
        throw new NotImplementedException();
    }

    protected virtual INode GetNodeByUrl(string virtualPath)
    {
        //todo: search data-store for data with the given url
        throw new NotImplementedException();
    }

    public bool Match(System.Web.HttpContextBase httpContext, Route route, string parameterName, RouteValueDictionary values, RouteDirection routeDirection)
    {
        if (!values.ContainsKey("page"))
            return false;

        // get virtual path (null means there was no value for page, i.e. the domain was requested without a path)
        string virtualPath = (values["page"] ?? "/").ToString();

        // case insensitive
        virtualPath = virtualPath.ToLowerInvariant();

        var currentNode = GetNodeByUrl(virtualPath);

        //if we get a result, set the current node so the handler can build the page up
        if (currentNode != null)
        {
            //set current node on the context, could be HttpContext
            EditorialPageContext.Current.CurrentNode = currentNode;

            //inform routing that we have found a result
            return true;
        }

        //inform routing we have not found a result
        return false;
    }
}

Final Tip


If you turn off runAllManagedModulesForAllRequests (an attribute on the <modules> element in the web.config's <system.webServer> section), you will make sure that the route only gets called for extensionless urls, and not for Urls which should be handled by something else in IIS (even if the file does not exist). In most cases, however, you must then install hotfix 980368 from Microsoft to allow handlers to be called for Urls without extensions.

Tuesday 22 June 2010

Managing code and configuration synchronisation in an IIS web farm


This post has moved to http://wishfulcode.com/2010/06/22/managing-code-and-configuration-synchronisation-in-an-iis-web-farm/

When we think about version control, the most common purpose we associate with it is source code.

When we think about maintaining a farm of web servers, the initial problems to focus on solving are:

  1. Synchronisation – different nodes in the farm should not (unless explicitly told to) have different configuration or serve different content.
  2. Horizontal Scalability – I want to be able to add as many servers as required, without needing to spend time setting them up and without new builds taking more time to deploy.
  3. Reliability – no single point of failure (although this has varying levels – inability to run a new deployment is less severe than inability to serve content to users).
The scenario begs for a solution involving some kind of repository both of content (ie: runtime code) and configuration (ie: IIS setup), and most sensibly one that asks the member servers in the farm to pull content and configuration from an authoritative source, rather than having to maintain a list of servers to sequentially push to.

Synchronising Content

This sounds very similar to the feature set of many Distributed Version Control Systems. Thanks to James Freiwirth's investigation and code (and persistence!), we started with a set of commands in a script that would instruct a folder to fetch the latest revision of a set of files under Git version control and update to that revision. So now we could have multiple servers pulling from a central Git repository on another server and maintaining the same version between themselves. What's more, by using Git the following features are gained:

  • It's index-based – Git fetches a revision and stores it in its index before applying that revision to the working directory. That means that, even on a slow connection, applying changes is very quick – no more half-changed working directory whilst waiting for large files to transfer. FTP, I'm talking to you!
  • It's optimised – Git will only fetch change deltas, and it's also very good at detecting repeated content in multiple files.
  • It's distributed – All the history of your runtime code folder is maintained on each server. If you lose the remote source of the repositories, not only do you not lose any data (the entire history exists on each node), you can still push, pull, commit and roll back between the remaining repositories.
So you can commit from anywhere to your deployment Git repository and have every server in your web farm pick up the changes – and you instantly gain revision control for all deployments. If you are manually copying files to your deployment environment and, now, to the origin of your repository (some environments can't help this - shock!), you never have to worry about overwriting files you'll later regret, and you can see exactly what changed, and when, in your server environment. Or, if you have a build server producing your website's runtime code like us, you can script the build output to be committed to your Git repository (using NAnt, for example). James is a member of my development team, and he really changed the way we think about DVCS by introducing us to Git quite early on.
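
James's scripts are PowerShell, but the pull-based sync step itself is simple. Purely as an illustration (the repository path, remote and branch names here are hypothetical, not our actual setup), the same fetch-then-apply idea could be sketched in C# by shelling out to git:

using System.Diagnostics;

class DeploymentSync
{
    // run a git command in the given working copy and wait for it to finish
    static void RunGit(string workingCopy, string arguments)
    {
        var startInfo = new ProcessStartInfo("git", arguments)
        {
            WorkingDirectory = workingCopy,
            UseShellExecute = false
        };
        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
        }
    }

    static void Main()
    {
        const string workingCopy = @"D:\Sites\example-site"; // hypothetical path to this server's working copy

        // fetch the latest revision into the local object store first...
        RunGit(workingCopy, "fetch origin");

        // ...then apply it to the working directory in one quick step
        RunGit(workingCopy, "reset --hard origin/master");
    }
}

Because the fetch happens before anything touches the working directory, the switch-over is only as slow as the second command, regardless of connection speed.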



Synchronising IIS

Maintaining web server configuration across all servers is just as important as being able to synchronise content. In the past, options were limited. With IIS7's Shared Configuration feature we gain the ability to store a semantic, real-time representation of IIS configuration at any file path. If we can still store this information locally, but have it synchronised across all the servers, then we satisfy all three requirements of synchronisation, scalability and reliability.

To accomplish this, we can use exactly the same method for synchronising IIS configuration that we use for content. We set up a git repository, put the IIS configuration in it, pull it down to each server and point each server's IIS at the working copy of the revision control repository. We then have history, comments and rollback ability for IIS configuration. Being able to see the difference made by each IIS configuration change is, on its own, invaluable in our multi-site environment.



Practical setup (on Amazon EC2)

The final task is to identify what process runs on all the servers to keep them always pulling the latest version of both the content and the configuration. The best we've used so far is simple Windows 2008 Task Scheduler PowerShell scripts, which James gives examples of. However, these scripts can themselves change over time, since they need to know which repositories to synchronise. This calls for yet another revision-controlled repository. The scheduled tasks on the servers only run stub files which define a key identifying which farm (and therefore which sites) a server needs, and then run another PowerShell script, retrieved from a central git repository, which ensures the correct content repositories for that farm are created and up to date.

The end result is a set of web servers which are completely autonomous for their runtime, and which poll central repositories for updated content and configuration.

If we then create a virtualised image with a Scheduled Task running the stub powershell script, we have the ability at any time to increase the capacity of a server farm simply by starting new servers and pointing the traffic at them. These new servers will each pull in the latest configuration and content.

Why not use the MS Web Deploy Tool?

Microsoft's road map for IIS and ASP.Net includes interesting projects concerning deployment and server farm management. The Web Deploy tool is impressive in that it can synchronise IIS configuration and content (and some other server setup) even between IIS versions. However, it is very package-based. We'd still need a system either to pull the latest package down to each local server and deploy it, or to remotely push to every server we know about. This essentially puts us back at the step I defined at the beginning of this post – needing a way to maintain a farm of servers and building something to manage the execution of those packages. There are scenarios where I do use this tool, and I'm sure that this and other tools will evolve to the point where we can get as much control and flexibility as we achieve with 'git-deploy' quite soon.

Friday 30 April 2010

Project SUMO / Silverlight for Umbraco Media Objects

This post has moved to http://wishfulcode.com/2010/04/30/project-sumo-silverlight-for-umbraco-media-objects/

Not long ago, I was invited to join a small group tasked, with Microsoft's help, with creating an open source project: a set of tools for the Umbraco community that would enhance both users' and developers' experiences with Umbraco.

It was decided that we would build a tool offering features not implemented in the default asp.net implementation of Umbraco's Media section. The idea of enhancing the tool with features such as media object editing, bulk saving and uploading, and cross-platform (including mobile) support led us to choose Silverlight as the application platform.

Collaboratively, it was an interesting exercise in the use of tools, role-assumption, personal initiative... and maybe some all-nighters.

The project ended up following a good progression:
1. Proof of concept. This took the form of an initial Silverlight application with a tree and a list of media items. We wanted to see if what we wanted to achieve on the client-side was quick to develop using Silverlight.
2. SketchFlow. Warren Buckley rapidly prototyped the different controls that would be available in the client app, which allowed the team to discuss issues before they became a reality.
3. Main Architecture & Development. This is the side I enjoyed the most. We came up with a client-server architecture which exposed media objects as their byte[] data through a WCF service, and an MVVM pattern for interacting with this data inside the Silverlight client and binding it to a UI.
4. Advanced Functionality and Skinning. With a solid foundation we are now in a great position to extend the project in order to rapidly gain new features, to build very similar clients for different silverlight platforms (like Windows Phone), and to keep improving the user experience.

The WCF side of things is architected so that the client is aware only of the data contract (IUSTFMediaService), while the project contains a default implementation which looks at Umbraco's default media objects (UmbracoDefaultImageMediaService). More advanced Umbraco developers can therefore make their own implementation of IUSTFMediaService to deal with any custom media document types or architectures, without changing anything in the Silverlight application.
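
The real contract lives in the SUMO source; purely as a sketch of the shape such a byte[]-based contract might take (the interface name, operations and parameters below are my own illustration, not the project's actual API):

using System.ServiceModel;

//hypothetical sketch of a media contract in the spirit of IUSTFMediaService
[ServiceContract]
public interface IMediaServiceSketch
{
    //list the ids of media items under a given media folder node
    [OperationContract]
    int[] GetChildMediaIds(int parentId);

    //return the raw bytes of a media item so the client can render or edit it
    [OperationContract]
    byte[] GetMediaBytes(int mediaId);

    //save edited bytes back to the media item
    [OperationContract]
    void SaveMediaBytes(int mediaId, byte[] data);
}

A custom implementation of the real contract can then be swapped in on the server side, leaving the Silverlight client untouched.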

 The project will be available as an easy-to-install package for Umbraco, with releases and source code available at the SUMO codeplex site.

I've had the pleasure of working with some great people from the community since the beginning of this project including Darren Ferguson, Warren Buckley, Alex Norcliffe, Adam Shallcross and Will Coleman from Microsoft.

More to come on the user scenarios the app will enable, and some great code practices we found along the way.

Tuesday 16 March 2010

Windows Phone 7 Series and OData / WCF Data Services feeds

This post has moved to http://wishfulcode.com/2010/03/16/windows-phone-7-series-and-odata-wcf-data-services-feeds/

Whilst the Windows Phone 7 Series development tools were only released yesterday, you can tell that various other Microsoft teams have been working hard to make sure their technologies are usable in the new environment.

The first thing I attempted to do on the phone was build some apps which interacted with their data through WCF Data Services. I found it wasn't yet as simple as the normal .net experience, but you can tell that's where things are headed.

Things to keep in mind when attempting to consume OData services in a Windows Phone 7 Series Silverlight app:
  • At dev time, you won't for now be able to simply "Add Service Reference" to your data service. You'll have to grab the preview client tools (which you should reference from your consuming silverlight library and the application itself). You'll then need to manually generate the classes for the client.
  • The rest of the experience is pretty similar to desktop Silverlight apps - you have to code in an asynchronous manner (see the sketch after this list).
  • Remember that your app will be running inside a virtual machine (if using the emulator), and so won't be able to access Urls on your dev machine as localhost. Instead, you can reference your development machine via its name.
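
As a minimal sketch of that asynchronous pattern (the service Uri, entity set and entity class below are hypothetical, and this assumes the preview OData client library is referenced as described above):

using System;
using System.Data.Services.Client;
using System.Linq;

//hypothetical entity class, standing in for the manually generated client classes
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CatalogueReader
{
    //reference the dev machine by name rather than localhost (see above)
    private readonly DataServiceContext context =
        new DataServiceContext(new Uri("http://mydevmachine/Catalogue.svc"));

    public void LoadProducts()
    {
        var query = context.CreateQuery<Product>("Products");

        //queries must be executed asynchronously on the phone
        query.BeginExecute(asyncResult =>
        {
            var products = query.EndExecute(asyncResult).ToList();
            //marshal back to the UI thread (e.g. via the Dispatcher) before binding
        }, null);
    }
}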
Once up and running, it'll be a very quick, simple and powerful way to reach your data, and the included best-practice templates for the phone projects suggest things are heading in a great direction.

Sunday 14 March 2010

Windows Mobile has adjusted your clock


I've had a lot of Windows Mobile phones over the years, and have always loved the platform. When there was a bit of a pause in new features, I decided to go into exile and test out a different platform for a while, so I got an iPhone. Apart from the app store, it's not been a great experience! Especially this morning: having been used to Windows Mobile letting you know that the clock has been automatically adjusted for daylight saving time, I realised that when I woke up and saw it said 7:00, there was no way to know whether that was the old time or the adjusted one...

Saturday 6 March 2010

Azure: Deploying ASP.net websites instead of web application projects as web roles

This post has moved to http://wishfulcode.com/2010/03/06/azure-deploying-asp-net-websites-instead-of-web-application-projects-as-web-roles/

The Azure tools for Visual Studio have great support for creating web roles out of Web Application projects, but no built-in support for simple websites (that is, asp.net sites that are compiled at runtime). There are many scenarios where not having a Web Application project is going to be the right decision for development and production, and the good news is that any IIS-servable directory can be packaged and deployed using simple commands from the Azure SDK.

The process of taking projects / code / files from local to cloud with Azure is to:
  1. Prepare a local directory with a ready-to-run output of the application's build.
  2. Prepare a service definition file (describes the roles to package).
  3. Package into an Azure package file using CSPack.
  4. Upload to production or staging environments on Azure, alongside a service configuration file (determines how many instances and what kinds of storage are available to the roles).
The Visual Studio add-in can handle this for web roles using Web Application projects, but we have to do it manually for website projects.

Preparing the project

The best folder structure for working locally is to create a parent directory with each role as a subdirectory. The service definition/configuration and package files can then be stored in the parent. So here I've created the default Web Site from Visual Studio at TestWebsite1\WebRole1, and just to prove that we don't need to add or modify anything from this default set-up I haven't changed anything, except to add a hello-world default.aspx:
<h1>Hello World</h1>
    <p><b><%= Environment.MachineName %></b> running <i><%=Environment.OSVersion.VersionString %></i> on <i><%=DateTime.Now.ToString("s") %></i></p>
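
Locally, the working folder then looks roughly like this (the definition, configuration and package files are created in the following steps and sit alongside the role directory):

TestWebsite1\
    ServiceDefinition.csdef
    ServiceConfiguration.cscfg
    WebRole1\
        default.aspx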

Packaging the project for Azure

First off, we need to create a service definition file, which just describes which kinds of roles are in our application and what their requirements are. In this case we only have a single WebRole (and we'll give it some storage for fun), and I've called the file ServiceDefinition.csdef:
<?xml version="1.0" encoding="utf-8"?>
<ServiceDefinition name="Azure1_umbraco" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebRole1">
    <InputEndpoints>
      <InputEndpoint name="HttpIn" protocol="http" port="80" />
    </InputEndpoints>
    <ConfigurationSettings>
      <Setting name="DiagnosticsConnectionString" />
    </ConfigurationSettings>
    <LocalResources>
      <LocalStorage name="LocalStorage1" cleanOnRoleRecycle="false" sizeInMB="100" />
    </LocalResources>
  </WebRole>
</ServiceDefinition>

Next we use the CSPack tool to package up the site into a single file (testwebsite1.cspkg) for upload to Azure. The tool will automatically look for the supporting files for each role under a directory with the role's name:
cspack ServiceDefinition.csdef /out:testwebsite1.cspkg

We'll also need to create a simple service configuration file (ServiceConfiguration.cscfg) which describes how many instances to give each role:
<?xml version="1.0"?>
<ServiceConfiguration serviceName="Azure1_umbraco" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <Instances count="1" />
    <ConfigurationSettings>
      <Setting name="DiagnosticsConnectionString" value="UseDevelopmentStorage=true" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>

Deploying the website to Azure

Now that we've finished everything we need to do locally, we can move to the Azure online management tool, where we'll need to create a new Hosted Service. Once that's made, we select either the production or staging environment for the service and select Deploy.... All we need to do here is select the package and configuration files, and give the deployment a name:


Azure will spend some time uploading the package, after which your service will be in the Stopped state.

After clicking Run the site should be instantly available.

In a later post, you'll see why I wanted to make sure we could use a web deployment without requiring a Web Application project.

Friday 5 March 2010

Azure: Investigating a developer's dream

This post has moved to http://wishfulcode.com/2010/03/05/azure-investigating-a-developers-dream/

In 2009, we decided we had to leave our current hosting contract and find a new provider. I started looking at alternatives to the traditional multi-year hosting contracts that had held us back at Condé Nast Digital for years, and in May I decided to test out if the myths about Amazon EC2 were true. Yes, it really did give you servers (windows in our case) at the click of a button... with a GUI and everything. The flexibility won me over, and the stability meant I couldn't turn back. By October we had moved our entire 15+ site portfolio to Amazon, serving millions of pages each day. And the first server I switched on is still running...

I quickly learnt that in order to benefit from the cloud (and by that, we mean Amazon's extensive data-centers and redundancy services) we had to move as many components of our applications as possible to the services provided by Amazon. So we moved all of our digital assets to the storage service, and we introduced our high-traffic sites to elastic load-balancing (instead of running our own HAProxy instance, which we're still doing for smaller sites).

At the end of the day though, these are all, by definition, simple services which still require you to not only manage the code and configuration of your application layer, but also manage the running and reliability of the server environment. As fun as that is (and no I'm not being sarcastic, I'm a geek) it takes up a lot of time when all I really want to be focusing on is developing new apps.

In steps Azure, which offers a fully managed scalable environment for an application - be it a website, web service, or task worker. It seems to have reached a level of maturity where a middle ground has been achieved, offering enough compatibility to ensure that if an app follows today's best practices, it'll most likely work on the Azure platform. The methodology is a little different - you set up applications as roles, and then configure instances to run those roles on. The configuration lets you select server capabilities and whether you want local, NTFS-compatible instance storage.

For someone who believes in the .net way of doing things, Azure sounds very comprehensive and exciting - being able to write both windows-service-style apps as well as websites, and have a very quick time from coding to large-scale deployment. As I investigate using Azure within a production development team for high-traffic sites, I'll be posting a series of best practices I find along the way, hopefully involving:
  • Which apps will and won't work in Azure, and creative ways we can get around any which won't.
  • What the best migration plan is for existing code and data.
  • How to deploy effectively: integration with development, build and staging processes. (We've only just finished putting a great deployment process in place for our sites on Windows on EC2, but hopefully the lessons learnt there will be transferable.)
As it turns out, there's a rumour going around that RDP access will be enabled for Azure instances. I'm guessing (and hoping), though, that given how managed the Azure guest operating systems are at the moment, even if this is enabled the platform will still largely manage itself.

The killer feature will be whether there is not only enough of a management API to match something like Amazon's – where you can, for example, turn hosting services on and off – but also an API to fully manage your application's lifecycle, including deployment and scalability. It does look like the potential is there...