I've been asked about the University's Research Data Initiative and what effect it will have on the work of Applications Division. I'm not actually involved with the initiative but I do know some of the background, so in this post I will attempt to explain briefly what research data is about in general and why it is important, then give a few links about the University's initiative.
The root of all this is that computers and networking are introducing new ways for academics to undertake and share their research. Sensors, microscopes, telescopes, particle accelerators, satellites, social media, audio and video can all produce vast amounts of data that can be of interest to researchers. We can store, search and analyse petabytes of data, and make data available for others to share.
For example, astronomers are collecting "sky surveys" - images of the entire sky, in various wavelengths (radio, X-ray, visual, infra-red, ...), matching the objects between these surveys, and scanning the data for interesting anomalies. Particle physicists are collecting vast amounts of data from the Large Hadron Collider and associated simulations of particle interactions. Digital microscopes, MRI scanners, and similar devices create detailed (and therefore large) image files. Satellites and ocean sensors collect climate data. Biologists are mapping genes of many species, comparing the genetic codes of individuals to understand genetic variation, and modelling how proteins created from genes map to the phenotype. Actual maps are used in geosciences and in many social sciences. Social sciences also share census information and less formal data, in some cases from Twitter feeds or other social media. The expressive arts are creating digital copies of important artworks. Archaeologists are creating virtual models of interesting sites. The performing arts can store videos of performances. And so on.
This has been described as a "fourth paradigm" for science. First there was experimental science, then theoretical science. These were followed by computational science - the use of complex models running on parallel multiprocessors, as done by EPCC at Edinburgh for example. The fourth paradigm, if you believe the story, is data-intensive science. This was a major aspect of the UK e-Science initiative.
http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_jim_gray_transcript.pdf
http://en.wikipedia.org/wiki/E-Science
The UK research councils, which fund much of the university research in the UK, now require academics to keep their research data and make it available. This makes it easier for other academics to verify research and to build on previous work. Arguably, the data will become as important as the published papers.
This requires infrastructure, from the actual disks through to catalogues and indexes for describing and searching the data. Our University is well placed in this regard: we are one of the co-founders of the Digital Curation Centre, which advises academics on how to manage data, and we have the experience of the Digital Library team.
The University's Research Data Initiative will give every researcher in the University access to half a terabyte of file storage and will advise them on how to structure, describe and index that data. The IS Infrastructure division are taking the lead on the storage aspects and the Library are leading on the metadata side. You can read more information at the following links.
http://www.itc.isg.ed.ac.uk/docs/open/Paper_A_business_case_RDS_RDM_Feb2012_penultimate.pdf
http://datablog.is.ed.ac.uk/2013/05/21/background-to-the-research-data-blog/
As this work is being led by other divisions within IS, I expect that it won't have an immediate effect on Applications Division. At a guess, we might become involved if links to research data become part of the University's REF submissions or of an individual researcher's profile, in which case Pure would need to process and display this information. This is conjecture, though. We might also be involved if particular research projects require advice or support for databases. Although much research data is stored in files, the metadata is typically held in databases and we could help researchers set these up.
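To give a flavour of what "metadata in a database, data in files" might look like, here is a minimal sketch in Python using SQLite. It is purely illustrative and not part of the University's initiative: the table name, the column names, the function names and the example file paths are all my own assumptions. A real catalogue would follow whatever metadata guidance the Library provides.

```python
import sqlite3


def create_catalogue(conn):
    """Create a hypothetical metadata table describing data files stored elsewhere."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS dataset (
            id          INTEGER PRIMARY KEY,
            title       TEXT NOT NULL,
            creator     TEXT NOT NULL,
            description TEXT,
            file_path   TEXT NOT NULL UNIQUE  -- where the actual data lives
        )
        """
    )
    # An index so that searching by creator does not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_dataset_creator ON dataset (creator)"
    )


def add_dataset(conn, title, creator, description, file_path):
    """Record one dataset's metadata and return its generated id."""
    cur = conn.execute(
        "INSERT INTO dataset (title, creator, description, file_path) "
        "VALUES (?, ?, ?, ?)",
        (title, creator, description, file_path),
    )
    conn.commit()
    return cur.lastrowid


def find_by_creator(conn, creator):
    """Return (title, file_path) pairs for one creator, sorted by title."""
    rows = conn.execute(
        "SELECT title, file_path FROM dataset WHERE creator = ? ORDER BY title",
        (creator,),
    )
    return rows.fetchall()
```

The point of the design is that the database holds only the description and location of each dataset, so it stays small and searchable however large the files themselves become.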