A project for scientific data preservation in France
PREDON as a MADICS action
The group is a member of the GdR "MADICS" (Masses de Données, Informations et Connaissances en Sciences, Big Data - Data Science).
PREDONx Workshops
Extended workshops on Data Preservation (PREDONx) are organised on a yearly basis. Stay tuned for more information, and contact Cristinel Diaconu, diaconu(at)cppm.in2p3.fr, if interested. Former editions here:
PREDON at AMU Days October 2015
PREDON/AMU Big Data Day: Access, Preservation, Reproducibility
IndexMed Workshop, October 14, 2014, Aix-Marseille University, Campus Saint-Charles, "Methods and tools for the mining of multi-sources and heterogeneous data in ecology".
Big Data PR2I AMU Day, October 15, 2015:
Predon Poster
PREDON at the International Workshop on High Performance Data Intensive Computing (HPDIC'2015)
More details about HPDIC2015 here.
"This year, we open new directions related to the preservation of data in cooperation with the PREDON group. The preservation of scientific data remains nevertheless a challenge due to the complexity of the data structure, the fragility of the custom-made software environments as well as the lack of rigorous approaches in workflows and algorithms."
PREDON Workshop, APC Paris, 4-5 November 2014
The PREDONx 2014 workshop took place in Paris at APC. The website and the agenda can be found here:
https://indico.cern.ch/event/338461/
Registrations are open; please send your proposals for contributions (there is no participation fee, and limited support for travel within France is available).
Scientific Data Preservation 2014
January 21st, 2014: The PREDON document "Scientific Data Preservation", a fact-finding white paper produced following the 2012/2013 workshops, is available here.
"Data observatories, based on open access policies and coupled with multi-disciplinary techniques for indexing and mining may lead to truly new paradigms in science. It is therefore of outmost importance to pursue a coherent and vigorous approach to preserve the scientific data at long term. The preservation remains nevertheless a challenge due to the complexity of the data structure, the fragility of the custom-made software environments as well as the lack of rigorous approaches in workflows and algorithms. [...]"
"The present document includes contributions from the participants to the PREDON Study Group, as well as invited papers, related to the scientific case, methodology and technology. This document should be read as a “facts finding” resource pointing to a concrete and significant scientific interest for long term research data preservation, as well as to cutting edge methods and technologies to achieve this goal. A sustained and coherent and long term action in the area of scientific data preservation would be highly beneficial."
Challenges
Scientific data collected with modern sensors or dedicated detectors very often exceed the scope of the initial scientific design. These data are increasingly obtained at the cost of large material and human efforts. A large class of scientific experiments are in fact unique because of their large scale, with very small chances of being repeated or superseded by new experiments in the same domain: for instance, high energy physics and astrophysics experiments involve multi-year and even multi-decade developments and are unlikely to be repeated. Other scientific experiments are unique by nature, as in earth science or medical sciences, since the collected data are “time-stamped” and therefore cannot be reproduced by new experiments or observations. The new knowledge obtained using these data (“data observatories”) should be preserved for the long term, so that access and re-use remain possible and enhance the return on the initial investment. It is therefore of utmost importance to pursue a coherent and vigorous approach to preserving scientific data for the long term. Preservation nevertheless remains a challenge due to the complexity of the data structure, the fragility of the custom-made software environments, and the lack of rigorous approaches in workflows and algorithms.
Mission
One of the main missions of this project is to reinforce the efforts to preserve scientific data in France. The proposed research program, together with the associated events and workshops, should be seen as ingredients towards the creation, at the national level and in strong connection with the international organisation, of scientific data infrastructures for long-term preservation and access.
To address the challenges listed above, the PREDON consortium proposes a research program aimed at solving the most urgent problems, which presently leave a large number of scientific data sets orphaned (and therefore lost):
• Specific technological and methodological issues for long-term data preservation
• Data mining algorithms and workflows for large/big scientific data sets
• Experimental interfaces and formats for scientific data collection, analysis and exploitation in the long term.
The approach proposed by the present project is based on a number of principles which we believe are essential for widely accepted, robust and sustainable long-term scientific data preservation:
• Multi-disciplinarity and unification
• Open access
• International connection
Structure and contact
The members of this project are scientists from IN2P3, INSU, INS2I, IRD, CINES (more details soon).
Contact: Cristinel Diaconu, diaconu(at)cppm.in2p3.fr
The PREDON project is supported by the Inter-disciplinary Mission of CNRS.
PREDON is structured in 4 working groups:
Working Package | Objectives | Participants (*coordinator)
WP1 Technologies and Methodologies | Explore methodologies and technologies suitable for a coherent and robust scientific data preservation in a multi-disciplinary context and on a multi-platform computing centre | CINES*, APC
WP2 Algorithms and Workflows | Investigate generic and mathematically robust workflows and algorithms for data mining suited for data and workflow preservation; data- and process-based workflows and mining techniques to be used in a multi-disciplinary environment towards long term data preservation | LAM, LIRMM, LIPADE*, LIPN
WP3 Data formats and interfaces | A parallel approach for data collection, storage, processing, analysis and preservation with the aim to achieve common standards for scientific data treatment | APC, CPPM, LAM*, LPSC
WP4 General coordination | Program coordination, dissemination, communication and cooperation | CPPM*
Workshops and future events
- 2012: The first PREDON workshop was held in December 2012.
- 2013: An extended workshop, PREDONx, will take place on 14-15 November 2013 in Marseille. Web page, agenda and registration here.
- 2014: "DPHEP Full Costs of Curation" workshop: https://indico.cern.ch/conferenceDisplay.py?confId=276820
- 2014: LOPS@ICDE Workshop on LOng term Preservation for big Scientific data. LOPS will be held in conjunction with the 30th IEEE International Conference on Data Engineering, Chicago, IL, USA, March 31-April 4, 2014.
- iPRES 2014: The call for contributions for iPRES 2014, to be held in Melbourne in October, is now open: http://ipres2014.org/call-contributions The iPRES 2014 Coordinating Committee invites contributions of papers, posters, demonstrations, tutorials and workshops related to the increasingly broad world of digital preservation.
- EGI Community Forum: Call for participation: http://cf2014.egi.eu/programme/cfp.html Submission of abstracts at: http://go.egi.eu/CF2014-CfP
- FYI: there is a track on data and knowledge preservation:
Data and knowledge preservation and curation (Track Leaders: J. Shiers, A. Fresa)
This track focuses on applications in data and knowledge preservation and curation and discusses best practices, lessons learnt, shared solutions and common challenges, covering all fields of research. The track will also address the technical and non-technical aspects of using e-infrastructures for data preservation and curation. The convenors are looking for submissions concerning, for example, workflow management, skills improvement, global services, solutions with multidisciplinary applications, business cases, amongst others. Contributors are encouraged to present their experiences, also in terms of concrete stories to be shared with other participants. Demonstrations are particularly welcome.
- The H2020 calls: more details on the topic that explicitly mentions "preservation" can be found at: http://ec.europa.eu/research/participants/portal/desktop/en/opportunities/h2020/topics/2137-einfra-1-2014.html The deadline for submission is September 2014.
- RDA Plenary March 2014: https://rd-alliance.org/rda-third-plenary-meeting.html
- IDCC14: "Commodity, catalyst or change-agent? Data-driven transformations in research, education, business & society", 24-27 February 2014, Omni San Francisco Hotel, California Street, San Francisco, USA http://www.dcc.ac.uk/events/idcc14/
Recent talks
- Talk at the MASTODONS colloquium, January 23/24, 2014, Institut de Physique du Globe, Paris (see reference below).
- CHEP2013, October 13-18, 2013, International Conference on "Computing in High Energy Physics":
- C. Diaconu, "PREDON: a project for scientific data preservation in France" (presented in a parallel session: agenda)
- FRéDoc 2013, October 7-10, 2013: The Renatis network, the CNRS national network of scientific information professionals, will hold its next FRéDoc meeting from 7 to 10 October in Aussois on the theme "Management and valorisation of research data". The meeting will bring together scientific and technical information (IST) professionals, information system administrators, quality managers, researchers and other members of the scientific community from CNRS and other research and higher-education institutions, as well as speakers (...)
Reference documents
- MADICS Logo: