Skip to main content

Grids, data centres and reliability

In my work with the Grid Computing Now! Knowledge Transfer Network, I talk about "virtualisation" and "service-oriented architecture" just as much as "grid" itself. People sometimes ask what is the difference between these concepts. My first answer is perhaps rather glib - I say that I don't care as long as the technology gets the job done. Although this is not a straight answer, those of us on the GCN! team believe it is important to put the business answers before any notion of technological purity.

But if we turn to the question as stated, I think that as long as a solution includes the key concepts of virtualised resources and dynamic allocation of applications across those resource, then that to me is enough to call the system a grid. But, of course, we can go further.
A recent conversation reminded me of the important point that distributed systems typically have to manage failure. As systems scale to many machines and many sites, then some of those are going to fail some of the time. The systems have to be resilient enough to adapt and recover. Systems also have to cope with additions and deletions from the set of available resources.

This is most obvious in cycle-stealing grids, which use spare power of desktop PCs, and of scientific grids, which link many research sites across the world. The interesting question is whether this also applies to the data centre. That seems to depend to some extent on how the system is designed. For example, Google is built specifically around this approach; they have always used lots of generic systems and just replaced resources when they fail. I believe Ebay's massive server farms use the same dynamic approach.

This question arose in a conversation I had with Liam Newcombe, an independent consultant. We were supposed to be talking about Green IT (of which more another time), but our discussion wandered to include all sorts of ideas. Liam is working on an open source model of data centre reliability and performance. He believes that reliability is best achieved by adopting this approach of explicitly allowing for it within the software - rather than, for example, attempting to make the hardware itself ultra-reliable.

A key question must be how high up the stack does this awareness have to extend? Can we write applications without worrying about this or does every application have to have some potential adaptability built in? It's a fascinating topic and I look forward to reading the book the Liam is co-authoring, in due course.

Comments

Popular posts from this blog

2016 has been a good year

So much has happened over the last year with our Enterprise Architecture practice that it's hard to write a succinct summary.  For my day-to-day experience as enterprise architect, the biggest change is that I now have a team to work with.  This time last year, I was in the middle of a 12-month secondment to create the EA practice, working mainly on my own.  Now my post has been made permanent and I have recruited two members of staff to help meet the University's architectural needs.

I have spent a lot of the year meeting people, listening to their concerns and explaining how architecture can help them.  This communication remains vital, the absolute core of what we do and we will continue to meet people in this way.  We also talk to people in other Universities in order to learn from what they are doing and to share our own experience back.  A highlight in this regard was my trip to the USA last January.

Our biggest deliverable for the past year was the design of the data wa…

A new EA Repository

One of my goals since starting this job two years ago has always been to create a repository for architecture documents.  The idea is to have a central store where people can find information about the University's applications, data sources, business processes, and other architectural information.  This store will make it easier for us to explain our plans, to show the current state of the University's information systems, and to explain what Enterprise Architecture is all about.

It's taken a long time to reach this goal, mainly because we're often had more pressing and immediate work to be done.  The creation of a repository is one of those tasks that is very important but never quite urgent.  So I'm now very happy to say that we are in the process of deploying a repository and modelling tool.


This is the culmination of a careful process to select the most appropriate tool for our needs.  We began by organising several workshops to gather requirements from a rang…

A brief summary of our major initiatives

I notice that in 2016 I wrote 34 posts on this blog.  This is only my fifth post in 2017 and we're already three-quarters of the way through the year.  Either I've suddenly got lazier, or else I've had less time to spend writing here.  As I'm not inclined to think of myself as especially lazy, I'm plumping for the latter explanation.

There really is a lot going on.  The University has several major initiatives under way, many of which need input from the Enterprise Architecture section.

The Service Excellence programme is overhauling (the buzzword is "transforming") our administrative processes for HR, Finance, and Student Administration.  Linked to this is a programme to procure an integrated ERP system to replace the adminstrative IT systems. 

Enabling Digital Transformation is a programme to put the middleware and architecture in place so that we can make our processes "digital first".  We're implementing an API framework, a notification…