OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing
Wei, Shoulin; Wang, Feng; Deng, Hui; Liu, Cuiyin; Dai, Wei; Liang, Bo; Mei, Ying; Shi, Congming; Liu, Yingbo; Wu, Jingping
doi: 10.1088/1538-3873/129/972/024001
pmid: N/A
The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework to support rapidly developing high-performance processing pipelines of astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs facilitated by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems for developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing a high-performance data processing system of astronomical telescopes and for significantly reducing software development expenses.
Visibility Estimation for the CHARA/JouFLU Exozodi Survey
Nuñez, Paul D.; Brummelaar, Theo ten; Mennesson, Bertrand; Scott, Nicholas J.
doi: 10.1088/1538-3873/129/972/024002
pmid: N/A
We discuss the estimation of the interferometric visibility (fringe contrast) for the Exozodi survey conducted at the CHARA array with the JouFLU beam combiner. We investigate the use of the statistical median to estimate the uncalibrated visibility from an ensemble of fringe exposures. Under a broad range of operating conditions, numerical simulations indicate that this estimator has a smaller bias compared with other estimators. We also propose an improved method for calibrating visibilities, which not only takes into account the time interval between observations of calibrators and science targets, but also the uncertainties of the calibrators’ raw visibilities. We test our methods with data corresponding to stars that do not display the exozodi phenomenon. The results of our tests show that the proposed method yields smaller biases and errors. The relative reduction in bias and error is generally modest, but can be as high as for the brightest stars of the CHARA data and statistically significant at the 95% confidence level (CL).
Morphology-based Query for Galaxy Image Databases
Shamir, Lior
doi: 10.1088/1538-3873/129/972/024003
pmid: N/A
Galaxies of rare morphology are of paramount scientific interest, as they carry important information about the past, present, and future Universe. Once a rare galaxy is identified, studying it more effectively requires a set of galaxies of similar morphology, allowing generalization and statistical analysis that cannot be done when N = 1. Databases generated by digital sky surveys can contain a very large number of galaxy images, and therefore once a rare galaxy of interest is identified it is possible that more instances of the same morphology are also present in the database. However, when a researcher identifies a certain galaxy of rare morphology in the database, it is virtually impossible to mine the database manually in the search for galaxies of similar morphology. Here we propose a computer method that can automatically search databases of galaxy images and identify galaxies that are morphologically similar to a certain user-defined query galaxy. That is, the researcher provides an image of a galaxy of interest, and the pattern recognition system automatically returns a list of galaxies that are visually similar to the target galaxy. The algorithm uses a comprehensive set of descriptors, allowing it to support different types of galaxies, and it is not limited to a finite set of known morphologies. While the list of returned galaxies is neither clean nor complete, it contains a far higher frequency of galaxies of the morphology of interest, providing a substantial reduction of the data. Such algorithms can be integrated into data management systems of autonomous digital sky surveys such as the Large Synoptic Survey Telescope (LSST), where the number of galaxies in the database is extremely large. The source code of the method is available at http://vfacstaff.ltu.edu/lshamir/downloads/udat.
Cosmic Ray Removal in Fiber Spectroscopic Image
Bai, Zhongrui; Zhang, Haotong; Yuan, Hailong; Carlin, Jeffrey L.; Li, Guangwei; Lei, Yajuan; Dong, Yiqiao; Yang, Huiqin; Zhao, Yongheng; Cao, Zihuang
doi: 10.1088/1538-3873/129/972/024004
pmid: N/A
Single-exposure spectra in large spectral surveys are valuable for time domain studies such as stellar variability, but there is no available method to eliminate cosmic rays for single-exposure, multi-fiber spectral images. In this paper, we describe a new method to detect and remove cosmic rays in multi-fiber spectroscopic single exposures. Through the use of two-dimensional profile fitting and a noise model that considers the position-dependent errors, we successfully detect as many as 80% of the cosmic rays and correct the cosmic ray polluted pixels to an average accuracy of 97.8%. Multiple tests and comparisons with both simulated data and real LAMOST data show that the method works properly in detection rate, false detection rate, and validity of cosmic ray correction.
C3, A Command-line Catalog Cross-match Tool for Large Astrophysical Catalogs
Riccio, Giuseppe; Brescia, Massimo; Cavuoti, Stefano; Mercurio, Amata; di Giorgio, Anna Maria; Molinari, Sergio
doi: 10.1088/1538-3873/129/972/024005
pmid: N/A
Modern Astrophysics is based on multi-wavelength data organized into large and heterogeneous catalogs. Hence, efficient, reliable and scalable catalog cross-matching methods play a crucial role in the era of the petabyte scale. Furthermore, multi-band data often have very different angular resolution, requiring the highest generality of cross-matching features, mainly in terms of region shape and resolution. In this work we present C3 (Command-line Catalog Cross-match), a multi-platform application designed to efficiently cross-match massive catalogs. It is based on a multi-core parallel processing paradigm and conceived to be executed as a stand-alone command-line process or integrated within any generic data reduction/analysis pipeline, providing the maximum flexibility to the end-user, in terms of portability, parameter configuration, catalog formats, angular resolution, region shapes, coordinate units and cross-matching types. Using real data, extracted from public surveys, we discuss the cross-matching capabilities and computing time efficiency also through a direct comparison with some publicly available tools, chosen among the most used within the community, and representative of different interface paradigms. We verified that the C3 tool has excellent capabilities to perform an efficient and reliable cross-matching between large data sets. Although the elliptical cross-match and the parametric handling of angular orientation and offset are known concepts in the astrophysical context, their availability in the presented command-line tool makes C3 competitive in the context of public astronomical tools.
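To make the idea of a positional cross-match concrete, here is a minimal sketch of the simplest case: a circular match region with a nearest-neighbour rule. It is not the C3 implementation or its interface; the function names, the brute-force loop, and the (ra, dec)-tuple catalog layout are assumptions chosen for illustration only. Elliptical regions and multi-core execution, which the abstract mentions, are not covered.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = (math.sin(d_dec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cross_match(cat_a, cat_b, radius_arcsec):
    """For each source in cat_a, find the nearest source in cat_b within the radius.

    cat_a, cat_b: lists of (ra_deg, dec_deg) tuples (hypothetical layout).
    Returns a list of (index_a, index_b, separation_arcsec).
    Brute force, O(len(cat_a) * len(cat_b)); fine for a demonstration only.
    """
    matches = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best = None
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (j, sep)
        if best is not None:
            matches.append((i, best[0], best[1]))
    return matches

# Two tiny made-up catalogs: the first pair is offset by 1 arcsec in declination.
catalog_a = [(10.0, 20.0), (150.0, -30.0)]
catalog_b = [(10.0, 20.0 + 1.0 / 3600.0), (200.0, 5.0)]
print(cross_match(catalog_a, catalog_b, radius_arcsec=2.0))
```

For catalogs of realistic size, the double loop would normally be replaced by a spatial index (for example a sky pixelization or a k-d tree), which is where a parallel design pays off.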
An Archive of Spectra from the Mayall Fourier Transform Spectrometer at Kitt Peak
Pilachowski, C. A.; Hinkle, K. H.; Young, M. D.; Dennis, H. B.; Gopu, A.; Henschel, R.; Hayashi, S.
doi: 10.1088/1538-3873/129/972/024006
pmid: N/A
We describe the SpArc science gateway for spectral data obtained using the Fourier Transform Spectrometer (FTS) in operation at the Mayall 4-m telescope at the Kitt Peak National Observatory during the period from 1975 through 1995. SpArc is hosted by Indiana University Bloomington and is available for public access. The archive includes nearly 10,000 individual spectra of more than 800 different astronomical sources including stars, nebulae, galaxies, and solar system objects. We briefly describe the FTS instrument itself and summarize the conversion of the original interferograms into spectral data and the process for recovering the data into FITS files. The architecture of the archive is discussed and the process for retrieving data from the archive is introduced. Sample use cases showing typical FTS spectra are presented.
A New Approach to the Internal Calibration of Reverberation-Mapping Spectra
Fausnaugh, M. M.
doi: 10.1088/1538-3873/129/972/024007
pmid: N/A
We present a new procedure for the internal (night-to-night) calibration of time-series spectra, with specific applications to optical AGN reverberation mapping data. The traditional calibration technique assumes that the narrow [O iii] λ5007 emission-line profile is constant in time; given a reference [O iii] λ5007 line profile, nightly spectra are aligned by fitting for a wavelength shift, a flux rescaling factor, and a change in the spectroscopic resolution. We propose the following modifications to this procedure: (1) we stipulate a constant spectral resolution for the final calibrated spectra, (2) we employ a more flexible model for changes in the spectral resolution, and (3) we use a Bayesian modeling framework to assess uncertainties in the calibration. In a test case using data for MCG+08-11-011, these modifications result in a calibration precision of ∼1 millimagnitude, which is approximately a factor of five improvement over the traditional technique. At this level, other systematic issues (e.g., the nightly sensitivity functions and Fe ii contamination) limit the final precision of the observed light curves. We implement this procedure as a Python package (mapspec), which we make available to the community.
Citations and Team Sizes
Abt, Helmut A.
doi: 10.1088/1538-3873/129/972/024008
pmid: N/A
I explore whether small or large teams produce the most important astronomical results, on average, using citation counts as my metric. I present evidence that citation counts indicate the importance of papers. For the 1343 papers published in A&A, ApJ, and MNRAS in 2012 January-February, I considered 4.5 years' worth of citations. In each journal, there are larger citation counts for papers from large teams than from small teams by a factor of about 2. To check whether the results from 2012 were unusual, I collected data from 2013 for A&A and found it to be the same as that for 2012. Could the preponderance of papers by large teams be due to self-citations (i.e., citing and cited papers sharing one or more authors)? To answer this, I looked at 136 papers with one to 266 authors and discovered a linear relation that ranges from a 12.7% self-citation rate for single-author papers to a 45.9% self-citation rate for papers with 100 authors. Correcting for these factors is not enough to explain the predominance of the papers with large teams. Then I computed citations per author. While large teams average more citations than small ones by a factor of 2, individuals on small teams average more citations than individuals on large teams by a factor of 6. The papers by large teams often have far more data, but those by small teams tend to discuss basic physical processes.
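The linear self-citation relation can be made concrete with a short sketch. The abstract quotes only two points (12.7% for one author, 45.9% for 100 authors), so the line below is simply the straight line through those two points; the fitted coefficients of the paper are not given here and may differ. The function names are hypothetical, and applying the line beyond 100 authors would be an extrapolation.

```python
def self_citation_rate(n_authors):
    """Self-citation rate in percent from the line through the two quoted points.

    Assumption: a straight line through (1 author, 12.7%) and (100 authors, 45.9%).
    """
    slope = (45.9 - 12.7) / (100 - 1)  # percentage points per additional author
    return 12.7 + slope * (n_authors - 1)

def citations_without_self(total_citations, n_authors):
    """Estimate the citation count left after removing the expected self-citations."""
    return total_citations * (1.0 - self_citation_rate(n_authors) / 100.0)

for n in (1, 10, 50, 100):
    print(n, round(self_citation_rate(n), 1))
```

Under this assumption the rate grows by roughly a third of a percentage point per additional author.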
Indicators of Stellar Mass in the Photometric H-band
Lester, John B.; Khatu, V. C.; Neilson, Hilding R.
doi: 10.1088/1538-3873/129/972/024201
pmid: N/A
Extensive infrared spectral surveys, such as the APOGEE survey in the H-band, are now being conducted, many targeting the Galactic Bulge and recording observations of primarily red giant stars. However, because stars of different masses converge to the red giant region, the masses of single red giant stars are poorly constrained. These surveys are now using spectral resolving powers that are high enough to measure the equivalent widths of individual spectral lines, which are mostly from molecular species. Because other observations can constrain or determine the star’s luminosity and radius, we have computed spherical stellar atmospheres for a fixed luminosity and radius but for a range of masses. We then computed the H-band flux spectrum for each model and searched for spectral lines that are sensitive to mass. Our synthetic spectra reveal many lines of CO that become weaker with increasing stellar mass. To explore this, we created a ratio of equivalent widths using a representative, unblended CO line and an unblended OH line that did not vary with mass. We found that this ratio varied about 30% over the mass range from to . We repeated the spectral analysis using spherical model stellar atmospheres computed with a composition solar and found that the ratio displayed a very similar dependence on mass. The presence in the H-band of spectral features sensitive to the masses of red giant stars opens up the potential of constraining more tightly the physical properties of the stars making up the galactic bulge and globular clusters.
Analysis of Scattering from Archival Pulsar Data using a CLEAN-based Method
Tsai, Jr-Wei; Simonetti, John H.; Kavic, Michael
doi: 10.1088/1538-3873/129/972/024301
pmid: N/A
In this work, we adopted a CLEAN-based method to determine the scatter time, τ, from archived pulsar profiles under both the thin screen and uniform medium scattering models and to calculate the scatter time frequency scale index α, where τ ∝ ν^α. The value of α is −4.4, if a Kolmogorov spectrum of the interstellar medium turbulence is assumed. We deconvolved 1342 profiles from 347 pulsars over a broad range of frequencies and dispersion measures. In our survey, in the majority of cases the scattering effect was not significant compared to pulse profile widths. For a subset of 21 pulsars scattering at the lowest frequencies was large enough to be measured. Because reliable scatter time measurements were determined only for the lowest frequency, we were limited to using upper limits on scatter times at higher frequencies for the purpose of our scatter time frequency slope estimation. We scaled the deconvolved scatter time to 1 GHz assuming α = −4.4 and considered our results in the context of other observations which yielded a broad relation between scatter time and dispersion measure.
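Scaling a measured scatter time to a reference frequency follows directly from a power law of the form τ ∝ ν^α. The sketch below shows that arithmetic; the function name and the example values are illustrative assumptions, and the default index is the Kolmogorov value of −4.4 quoted in the abstract.

```python
def scale_scatter_time(tau_obs, freq_obs_ghz, freq_ref_ghz=1.0, alpha=-4.4):
    """Scale a scatter time from the observing frequency to a reference frequency.

    Assumes tau is proportional to freq**alpha, so
    tau_ref = tau_obs * (freq_ref / freq_obs) ** alpha.
    The returned value has the same time unit as tau_obs.
    """
    if freq_obs_ghz <= 0 or freq_ref_ghz <= 0:
        raise ValueError("frequencies must be positive")
    return tau_obs * (freq_ref_ghz / freq_obs_ghz) ** alpha

# Illustrative (made-up) input: 10 ms of scattering measured at 100 MHz.
tau_1ghz_ms = scale_scatter_time(10.0, freq_obs_ghz=0.1)
print(f"{tau_1ghz_ms:.3e} ms at 1 GHz")
```

Because α is strongly negative, a scatter time that is easily measured at low frequency shrinks by several orders of magnitude at 1 GHz, consistent with the abstract's remark that scattering was measurable only at the lowest frequencies.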