
Grids, data centres and reliability

In my work with the Grid Computing Now! Knowledge Transfer Network, I talk about "virtualisation" and "service-oriented architecture" just as much as "grid" itself. People sometimes ask what the difference is between these concepts. My first answer is perhaps rather glib: I say that I don't care, as long as the technology gets the job done. Although this is not a straight answer, those of us on the GCN! team believe it is important to put the business answers before any notion of technological purity.

But if we turn to the question as stated, I think that as long as a solution includes the key concepts of virtualised resources and dynamic allocation of applications across those resources, that is enough for me to call the system a grid. But, of course, we can go further.

A recent conversation reminded me of an important point: distributed systems typically have to manage failure. As systems scale to many machines and many sites, some of those machines are going to fail some of the time. The systems have to be resilient enough to adapt and recover. They also have to cope with resources being added to and removed from the available pool.
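To put a rough number on "some of the time", here is a minimal sketch. It assumes, as a simplification, that each machine is independently unavailable with the same probability p (real failures are often correlated). The chance that at least one of n machines is down is then 1 - (1 - p)^n. With a hypothetical p = 0.001, that is about 0.1% for a single machine but roughly 63% for a thousand machines.

```python
def prob_any_failure(n_machines: int, p_fail: float) -> float:
    """Probability that at least one of n independent machines is down.

    Assumes each machine is down with the same probability p_fail,
    independently of the others (a simplifying assumption).
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    # Complement of "every machine is up at the same time".
    return 1.0 - (1.0 - p_fail) ** n_machines


# Hypothetical per-machine failure probability of 0.001.
for n in (1, 10, 100, 1000):
    print(n, round(prob_any_failure(n, 0.001), 3))
```

As the pool grows, failure stops being an exceptional event and becomes a normal operating condition, which is why the software has to expect it.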

This is most obvious in cycle-stealing grids, which use the spare power of desktop PCs, and in scientific grids, which link many research sites across the world. The interesting question is whether this also applies to the data centre. That seems to depend to some extent on how the system is designed. For example, Google is built specifically around this approach: it has always used large numbers of generic machines and simply replaced them when they fail. I believe eBay's massive server farms use the same dynamic approach.

This question arose in a conversation I had with Liam Newcombe, an independent consultant. We were supposed to be talking about Green IT (of which more another time), but our discussion wandered to include all sorts of ideas. Liam is working on an open source model of data centre reliability and performance. He believes that reliability is best achieved by explicitly allowing for failure within the software, rather than, for example, attempting to make the hardware itself ultra-reliable.
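The idea of allowing for failure in software, rather than relying on ultra-reliable hardware, can be sketched as follows. This is not Liam's model: it is a minimal illustration, assuming a hypothetical pool of interchangeable nodes (`NodePool`) and a hypothetical `run_with_failover` helper that moves a task to the next node when one fails and drops the failed node from the pool. The pool also accepts new nodes, reflecting resources joining as well as leaving.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class NodeFailure(Exception):
    """Raised by a (hypothetical) node when it cannot complete a task."""


@dataclass
class NodePool:
    """A dynamic set of interchangeable worker nodes (illustrative only)."""
    nodes: List[str] = field(default_factory=list)

    def add(self, node: str) -> None:
        # New resources can join the pool at any time.
        if node not in self.nodes:
            self.nodes.append(node)

    def remove(self, node: str) -> None:
        # Failed or retired resources simply leave the pool.
        if node in self.nodes:
            self.nodes.remove(node)


def run_with_failover(pool: NodePool, task: Callable[[str], str]) -> str:
    """Try the task on each node in turn, dropping nodes that fail."""
    for node in list(pool.nodes):  # iterate over a copy; the pool may shrink
        try:
            return task(node)
        except NodeFailure:
            # Treat the failure as routine: discard the node and move on.
            pool.remove(node)
    raise RuntimeError("no healthy nodes left in the pool")


# Usage: node-a is (hypothetically) broken, so the task moves to node-b.
broken = {"node-a"}


def compute(node: str) -> str:
    if node in broken:
        raise NodeFailure(node)
    return f"done on {node}"


pool = NodePool(["node-a", "node-b"])
print(run_with_failover(pool, compute))  # done on node-b
print(pool.nodes)                        # ['node-b']
pool.add("node-c")                       # a replacement machine joins
print(pool.nodes)                        # ['node-b', 'node-c']
```

The application code (`compute`) knows nothing about failover here; whether that separation can always be maintained is exactly the question of how far up the stack failure awareness must reach.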

A key question must be how far up the stack this awareness has to extend. Can we write applications without worrying about it, or does every application need some potential adaptability built in? It's a fascinating topic, and I look forward to reading the book that Liam is co-authoring in due course.
