'Reaching for the stars' has become a down-to-earth endeavor: data-driven, information-rich and fast-paced, advances in information technology are at the very heart of astronomy. Data-based science is becoming recognized as a stand-alone research area, ranking with theory and experimentation. Astroinformatics is a new discipline emerging at the intersection of astronomy/astrophysics, applied computer science and advanced statistics.

"The history of astronomy is the history of receding horizons". Claimed nearly a century ago by astronomer and eponym of the Hubble Space Telescope, Edwin Hubble, this phrase still holds truth for modern-day research. Powerful telescopes scan the skies in a quest to find answers to the fundamental questions about the origin of life, space - the universe. The data collected by these high-performance instruments has reached overwhelming proportions and the plethora of new data both enables and challenges cosmological research. Scientists increasingly rely on the data-intensive discipline of computer science to organize, explore, visualize and mine data for new astronomical discoveries. The petabyte age of big data has ingested astronomy, creating a new data-intensive science paradigm of 21st-century research, named 'astroinformatics'. Where until recently, questions such as 'Is this object a star or a galaxy?' or 'What do billions of extrasolar planets have in common?' could only be answered by arduously studying photographic plates, it is now possible to gain deep, quick and accurate insight into the works of the universe. The synthesis of computer science and astronomy challenges the boundaries of present knowledge, enables scientific breakthrough and reveals new horizons for leading-edge research.

Systems used to capture and process astronomical data increasingly need to perform well and scale with growing data volumes to meet researchers' needs. Scalable, parallel computing plays a major role in astroinformatics, making it possible to obtain results from exponentially growing datasets: tasks are carried out in parallel, with multiple calculations happening simultaneously, which reduces execution time. But although computing platforms can scale to thousands of cores, they are also more complex and present steep learning curves for scientists. As these platforms advance, the scientific workflow becomes less efficient, because data processing and programming practices cannot keep up with new processing techniques and hardware advances. The cycle of narrower dataset scope, higher platform complexity and ever-increasing data volumes is constantly accelerating.
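To make the idea concrete, here is a minimal, self-contained Python sketch (not from the article) in which the same per-image computation runs first sequentially and then across a pool of worker processes. The function and its toy workload are hypothetical stand-ins for a real pipeline step such as source detection.

```python
from multiprocessing import Pool


def measure_sources(image_id: int) -> int:
    # Stand-in for an expensive per-image computation; here just a
    # deterministic toy workload so the example is self-contained.
    return sum(i * i for i in range(100_000)) % (image_id + 1)


if __name__ == "__main__":
    images = list(range(16))

    # Sequential: one image at a time.
    results_seq = [measure_sources(img) for img in images]

    # Parallel: the same independent tasks spread over four worker processes.
    with Pool(processes=4) as pool:
        results_par = pool.map(measure_sources, images)

    # Independent tasks give identical results either way; only the
    # wall-clock time differs.
    assert results_seq == results_par
```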

Prof. Gustavo Alonso and his PhD student Stefan Müller argue that practical tools providing a flexible interface between the application domain and parallel computing enable researchers to accomplish their analyses simply and conveniently, without having to modify parallel computing code. By inserting just two decorators (annotations) into sequential Python code, application developers can use Pydron to automatically parallelize it. Python is popular among astronomers thanks to its powerful libraries, such as SciPy. Using Pydron has three major advantages: it parallelizes new code, improves existing code and helps with data reduction. Astronomy processing code changes continuously as scientists work on data and refine their methods. With Pydron, scientists can write sequential code that is easier to maintain and adapt, increasing their productivity: Pydron semi-automatically parallelizes the code and scales its execution while requiring minimal code changes. In one experiment, Pydron detected and measured sources in 576 images from the Hubble Space Telescope, each approximately 250 MB; the execution time dropped from more than four hours to less than three minutes, a more than 100-fold speedup.
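The article shows no code, but based on the two decorators described in the Pydron publication, @schedule and @functional, an annotated pipeline might look roughly like the sketch below. The helper function, its toy "measurement" and the exact import path are assumptions for illustration, not Pydron's documented examples.

```python
import pydron  # assumes Pydron is installed; decorator names per the paper


def measure(path):
    """Hypothetical stand-in for real source detection on one image."""
    with open(path, "rb") as f:
        return len(f.read())  # toy 'measurement'


@pydron.functional
def detect_sources(path):
    # Marked side-effect free: Pydron may run these calls in parallel,
    # locally or on remote workers.
    return measure(path)


@pydron.schedule
def analyze(paths):
    # Pydron translates this function's execution into a data-flow graph;
    # the independent loop iterations become parallel tasks.
    results = []
    for p in paths:
        results.append(detect_sources(p))
    return results
```

The point of the annotations is that the control flow stays ordinary, readable, sequential Python, which is easy to maintain and adapt, while Pydron decides which calls can safely execute concurrently.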
