Swiss Data Science Center: Empowering data-driven science
16.05.2019 | Anna Ettlin
Data science methods have applications in many different fields in academia and industry. To foster the adoption of data-driven science, the Swiss Data Science Center was founded two and a half years ago. Jointly backed by ETH Zurich and EPFL and supported by the computer science departments of both universities, the centre has already made significant progress towards its goal of making data science more open, transparent and accessible in both academia and industry.
In 2012, Harvard Business Review called the role of the data scientist ‘the sexiest job of the 21st century’, catapulting the term ‘data science’ into the mainstream. But there’s more to data science than the media hype: its methods – which combine mathematics, statistics and computer science – can be applied to solve various problems in scientific research and industry. The ETH Board has named data science as one of its four strategic focus areas for the years 2017 to 2020, so that Switzerland may profit from the new possibilities created by digitalisation.
As an essential step towards that goal, ETH Zurich and EPFL jointly established the Swiss Data Science Center (SDSC) in January 2017. The centre, located in both Zurich and Lausanne and supported equally by both universities, serves to facilitate research by fostering collaboration between data scientists and experts from different disciplines. “The mission of the centre is to help with the adoption of data science and machine learning methods in academia and industry”, says Olivier Verscheure, Executive Director of SDSC. “We do this by connecting parties that wouldn’t otherwise come into contact with each other.”
Two years into the SDSC’s operations, this approach is already showing its first successes: the first academic projects have reached the halfway mark, industry collaborations are ramping up, and the data science software platform the centre has created is drawing international interest. But how does the SDSC fit in with existing research groups and data science consulting companies?
Filling the gaps
The world of data science is not easy to navigate: the data scientist, the owner of the data and the researcher who stands to learn the most from the data are often three different stakeholders, each with their own interests and concerns. “When a medical or environmental research group wants to apply data science methods and machine learning to their research, they cannot always collaborate with data science research groups because the data scientists have to focus on their own research and publications and, although interested, might not be able to spare the resources to help out”, Verscheure explains.
This is where the Swiss Data Science Center comes in, complementing the excellent computer science departments of ETH Zurich and EPFL.
The SDSC has a dedicated team of around 15 data scientists distributed among the offices in Zurich and Lausanne. They work on academic and, more recently, industry projects in collaboration with various research groups and companies. “It is important to understand that the centre is neither an academic lab nor a consulting firm”, Verscheure says. “The SDSC enables a unique synergy between academia and industry in both data science and across carefully selected domains. This will allow the centre to foster scientific breakthroughs with a significant impact on society.”
Data science where it’s needed
Collaboration between external scientists and SDSC data scientists is key to academic projects at the centre. Currently, the centre also works as a funding agency: two-thirds of the funding it receives from the ETH Board is funnelled back into scientific projects. Once a year, the centre calls for project proposals that would apply data science methods to other areas of scientific research. The most promising projects are fully funded by the centre, usually for two years, and are assigned an SDSC data scientist. The data scientist and the project initiator adapt and apply the right data science methods to the research questions at hand and jointly publish the results.
Although the idea of collaborative projects was initially unfamiliar to the research groups, Verscheure observes that most project teams have come to appreciate them, and the centre is in demand among scientists. Eighteen projects were accepted in the first call for proposals at the end of 2017 and are now past the halfway mark. Ten more projects, accepted a few months ago, are just starting out.
The academic projects showcase the true breadth of applications of data science. “The biggest focus areas are health and life sciences, as well as environmental sciences”, says Andreas Krause, Academic Co-Director of the SDSC and professor at the Department of Computer Science at ETH Zurich. For data science, the sky’s the limit: other projects focus on cosmology, political and social sciences and even architecture. Krause, himself a machine learning expert who helped found the SDSC, sees great potential in the application of data science methods to other scientific domains. “Machine learning may not solve everything, but it can contribute to new insights, especially since many scientific problems today involve large and highly complex data sets”, he elaborates.
Future-proofing the industry
Having established its academic operations, the SDSC has also started branching out into collaborations with industry, where the demand for data science is equally great. “Tech companies like Google and Facebook obviously do not need us – they have their own teams of data scientists”, says Olivier Verscheure. Instead, the centre focuses on traditional industries, such as manufacturing, banking and biopharmaceutical companies. “There are many traditional companies in Switzerland that are leading worldwide. Some of them are over a hundred years old”, elaborates Verscheure. “If they miss the turn towards digitalisation, the consequences would be dire.”
In industry, as in academia, the SDSC differentiates itself from consulting firms that offer ready-made solutions. Instead, the centre aims to bridge the gap between data scientists looking for a job in traditional industries and the industries themselves. Olivier Verscheure explains the issues young data scientists coming straight from university might face: “The companies and the data scientists have completely different expectations. The company is looking for someone who will radically disrupt their business model. The data scientists expect to do deep mathematics and machine learning, but they often face difficulties even getting access to the right data. They are isolated from their peers and surrounded by people who might be reluctant to accept data science or are unable to understand the need for it owing to communication barriers. Because of this, traditional companies often have trouble retaining data scientists for more than a year.”
The SDSC offers both parties the opportunity to learn from, and adapt to, each other by hiring data scientists for the companies. “Together with the companies, we look for data scientists best suited to their needs”, Verscheure explains. “The company finances the data scientist, but we are the ones who actually hire them.” Thus, the data scientist remains embedded in a team of peers, where they can still learn about data science and stay current in the fast-moving field. Their time is spent working on company projects, so they also learn the inner workings of the industry and the communication skills necessary to convey data science concepts to stakeholders at the company.
“Our goal is that a year or two from now, the company will be ready to hire the data scientist directly – and the data scientist will know what to expect”, says Verscheure. “Very soon, the majority of data science jobs will be within traditional industries, so it’s important that we bridge this final gap between the world of academia and industry.” The demand from industry is high, and the relatively young industry cell of the centre already collaborates with companies such as Bühler Group.
A platform for open science
Data science faces another challenge: for the results to be reproducible, the data, as well as the algorithms and computational resources used to analyse it, should be made available to other researchers. However, the data is often sensitive, the computational resources too valuable and the code might not run the same way a few months or years later owing to software changes. This limits data-driven research.
The Swiss Data Science Center aims to address these issues as well, especially since it, too, faces them in its daily work. For this purpose, a dedicated team of software engineers at the centre is developing a software platform called RENKU. Named after a Japanese form of collaborative poetry, RENKU is an open platform that facilitates data science collaborations by storing and tracking data, the methods applied to it and the results gathered, as well as managing computational resources and access to data. “With RENKU, we aim to make scientific research more open, transparent and reproducible, and to provide researchers with access to data and computational resources such as SWITCH and the Swiss National Supercomputing Centre in Lugano”, elaborates Andreas Krause.
In the two years since its founding, the team at the SDSC has already produced a first working version of RENKU and is now working to implement additional features. “We hope that RENKU will help foster the adoption of data-driven methods, lowering the bar for researchers of different backgrounds to work together”, states Olivier Verscheure. To this end, the centre uses RENKU in its academic and industry projects. The unique solution has already attracted international interest, with renowned universities considering operating their own RENKU instances to make their data and data-driven research more transparent. “Our next goal is to increase the adoption of the platform”, says the Executive Director. “Imagine a network of RENKU instances between ETH, EPFL and other world-class universities, where scientists can seamlessly exchange data without losing ownership, build on each other’s research and be credited for what they bring to the community.”
Unique strengths
With nearly thirty scientific projects, several industry collaborations and a working software platform, the SDSC has made significant progress towards its goals in the two and a half years since its founding. Both Olivier Verscheure and Andreas Krause were pleasantly surprised by the speed and efficiency with which the project took off. “The people in Switzerland are doers”, says the Executive Director. “They are very pragmatic, and once they reach a consensus, they put it into action quickly and efficiently.” Andreas Krause adds: “We have succeeded in attracting top talents with a wide range of expertise on machine learning, signal processing, systems, privacy and security and so on – and the market is highly competitive.”
Spread as it is between the offices in Zurich and Lausanne, the diverse SDSC team has also fostered the spirit of collaboration between the two federal universities. This partnership is unique in the world of data science. “While data science centres are popping up left, right, and centre, most international universities have their own centre, compartmentalising data science”, Verscheure explains. “The SDSC benefits from a unique positioning, with its own team of scientists and two top universities joining forces.” The centre gets the best of both worlds: a dedicated team and a steering committee made up of representatives of the entire ETH Domain, not only from the computer science departments, but also from mathematics and engineering, thus ensuring an integral approach to data science. The SDSC also contributes to education at ETH Zurich and EPFL, both to the Master’s programme in Data Science and to the DAS and CAS continuing education programmes.
Looking forward
With these core strengths established, the SDSC is ready to face the challenges of tomorrow. “We are still growing and expanding our collaborations with academia and industry. The Center will strive to expand beyond the ETH Domain and become a truly national institute for data science and AI services. We have also initiated discussions at the international level”, says Olivier Verscheure. “In the next years, we are hoping to increase adoption of the RENKU platform, grow our industry cell and produce the first scientific breakthroughs in the academic projects.” In the long term, the Executive Director aims to stop relying primarily on funding from the ETH Board for scientific projects. “We started out by funding projects to demonstrate to the scientific community how they can profit from collaborations with us”, he explains. “In the future, we hope that the research groups will come to us and we will jointly apply for funding to the Swiss National Science Foundation, Horizon 2020 or similar agencies.”
Has the ETH Board’s goal to prepare Switzerland for digitalisation been fulfilled? “We have set the process in motion”, says Andreas Krause. “But data science is causing such a fundamental change in the academic world, in industry and in society as a whole, that we certainly can’t rest on our laurels just yet. There is a huge potential for the centre to be a catalyst, bringing together domain experts, data scientists in the centre and basic researchers in data science to have impact in ways not previously possible.”
Examples of academic projects at the SDSC
Deep Learning for Observational Cosmology – DLOC
Scientists from the Department of Computer Science and the Department of Physics at ETH Zurich are working with the Swiss Data Science Center to explore ways to use machine learning methods to enhance the analysis of cosmological data. In particular, one of the goals is to create a generative model that accelerates some of the computationally expensive cosmological simulations.
A Research Platform for Data-Driven Democracy Studies in Switzerland – DemocraSci
Initiated by researchers from the Department of Management, Technology and Economics at ETH Zurich and the Department of Informatics at the University of Zurich, this project aims to make data science methods applicable to political science by developing a document processing and analysis chain for documents of parliamentary proceedings from the last 125 years. This project has attracted the interest of the Federal Government in Bern.
Delivering Added-value To Antarctica – ACE-DATA
Researchers from the EPFL/Swiss Polar Institute, the British Antarctic Survey, the PSI and others are working with the Swiss Data Science Center to combine data gathered by different research groups during expeditions to the Southern Ocean into open-access data sets, allowing cross-disciplinary, data-driven science.