07.02.2021 Views

SkyShot - Volume 1, Issue 1: Autumn 2020

The inaugural issue of SkyShot, an online publication for promoting understanding and appreciation for outer space. As an international community, we share the work of undergraduate and high school students through a multidisciplinary, multimedia approach. Features research papers, astrophotography, informative articles, guides, and poetry in astronomy, astrophysics, and aerospace.



a unique purpose, such as convolution layers for generating feature maps from the image, pooling layers for extracting key features such as edges, dense layers for combining features, and dropout layers that prevent overfitting to the training set. [10]
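
As a rough illustration of the first two layer types, the sketch below applies a convolution and a max-pooling step to a toy image in plain Python. The image, the hand-picked edge kernel, and the function names are assumptions made for this example; a real CNN learns its kernels during training and adds dense and dropout layers, which are omitted here.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation inside a convolution layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical hand-picked kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]

feature_map = relu(convolve2d(image, edge_kernel))  # 4x4 map, strongest at the edge
pooled = max_pool2d(feature_map)                    # 2x2 summary of the feature map
```

The feature map is nonzero only in the column where the brightness changes, and pooling keeps that response while shrinking the map.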

This method was applied to galaxy classification by researchers at the National Astronomical Observatory of Japan (NAOJ). The Subaru Telescope, an 8.2-meter optical-infrared telescope at Maunakea, Hawaii, serves as a robust source of data and images of galaxies due to its wide coverage, high resolution, and high sensitivity. [11] In fact, earlier this year, astronomers used Subaru Telescope data to train an algorithm to learn theoretical galaxy colors and search for specific spectroscopic signatures, or light frequency combinations. The algorithm was used to identify galaxies in the early stage of formation from data containing over 40 million objects. This study led to the discovery of HSC J1631+4426, a relatively young galaxy that broke the previous record for lowest oxygen abundance. [12]

In addition, NAOJ researchers have been able to detect nearly 560,000 galaxies in the images and have had access to big data from the Subaru/Hyper Suprime-Cam (HSC) Survey, which contains deeper band images and has a higher spatial resolution than images from the Sloan Digital Sky Survey. Using a convolutional neural network (CNN) with 14 layers, they could classify galaxies as non-spirals, Z-spirals, or S-spirals. [10]

This application presents several important takeaways for computational astrophysics. The first is the augmentation of data in the training set. Since the number of non-spiral galaxies was significantly greater than the number of spiral galaxies, the researchers needed more training images of Z-spiral and S-spiral galaxies. To achieve this without acquiring new images from scratch, they flipped, rotated, and rescaled the existing Z-spiral and S-spiral images, generating a training set with roughly similar numbers of each galaxy type.
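
A minimal sketch of this kind of augmentation, assuming images stored as nested lists and hypothetical label names, might look like the following. Rescaling is left out for brevity. One detail is geometric rather than taken from the study: a mirror flip reverses the winding direction of a spiral, so the sketch swaps the S and Z labels on flipped copies. Whether the original pipeline handled flips this way is an assumption.

```python
from collections import Counter


def rotate90(image):
    """Rotate a 2D list 90 degrees clockwise; rotation keeps the winding direction."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror left-right; a mirror image reverses the winding direction."""
    return [row[::-1] for row in image]


# Label each mirrored copy should receive (hypothetical label names).
MIRROR_LABEL = {"S": "Z", "Z": "S", "non-spiral": "non-spiral"}


def augment(image, label):
    """Return 8 labeled variants: 4 rotations, each with and without a mirror flip."""
    variants = []
    current = image
    for _ in range(4):
        variants.append((current, label))
        variants.append((flip_horizontal(current), MIRROR_LABEL[label]))
        current = rotate90(current)
    return variants


# Toy imbalanced dataset: 1 S-spiral, 1 Z-spiral, 8 non-spirals (placeholder pixels).
tiny = [[1, 2], [3, 4]]
dataset = [(tiny, "S"), (tiny, "Z")] + [(tiny, "non-spiral")] * 8

augmented = []
for image, label in dataset:
    if label == "non-spiral":
        augmented.append((image, label))           # majority class is left as-is
    else:
        augmented.extend(augment(image, label))    # minority classes are expanded

counts = Counter(label for _, label in augmented)
```

In this toy case, only the two spiral images are expanded, which brings all three classes to the same count.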

Second, the accuracy of AI models may decrease when working with rare celestial bodies or phenomena, because fewer training examples are available. The galaxy classification CNN originally achieved an accuracy of 97.5%, identifying spirals in over 76,000 galaxies in a testing dataset. However, this value decreased to only 90% when the model was trained on a set with fewer than 100 images per galaxy type, which suggests that rarer galaxy types could pose problems.
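
For readers who want to see how such accuracy figures are computed, here is a small sketch. The labels are made up for illustration and are not results from the study. The per-class breakdown is an addition of this example, included because a single overall number can hide weak performance on a rare class.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of objects whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_recall(true_labels, predicted_labels):
    """For each true class, the fraction of its members that were recovered."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels for illustration only.
truth = ["non-spiral"] * 6 + ["S"] * 2 + ["Z"] * 2
preds = ["non-spiral"] * 6 + ["S", "non-spiral"] + ["Z", "Z"]

overall = accuracy(truth, preds)
recall = per_class_recall(truth, preds)
```

Here one of the two S-spirals is missed, so the overall accuracy stays high while the recall for that rare class falls to one half.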

A final important takeaway concerns the impact of misclassification and of differences between the training dataset and the testing dataset. When the model was applied to the testing set of galaxy images, it found roughly equal numbers of S-spirals and Z-spirals. This contrasted with the training set, in which S-spiral galaxies were more common. Although this may appear concerning, as one would expect the distribution of galaxy types to remain consistent, the training set may not have been representative, likely due to human selection and visual inspection bias. In addition, the authors point out that the criterion for what constitutes a clear spiral is ambiguous, and that the training set images were classified by eye. As a result, while the training set only included images with unambiguous spirals, the testing set may have included more ambiguous cases, causing the model to classify them incorrectly.
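
One simple way to notice this kind of mismatch is to compare class fractions in the training labels with those in the model's predictions. The sketch below does that with made-up labels; the function names and numbers are assumptions for illustration, not values from the study.

```python
from collections import Counter


def class_fractions(labels):
    """Share of each class among the given labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in counts}


def fraction_shift(train_labels, predicted_labels):
    """Predicted fraction minus training fraction, per class."""
    train = class_fractions(train_labels)
    pred = class_fractions(predicted_labels)
    return {
        label: pred.get(label, 0.0) - train.get(label, 0.0)
        for label in sorted(set(train) | set(pred))
    }


# Made-up labels: S-spirals over-represented in training, balanced in predictions.
train_labels = ["S"] * 6 + ["Z"] * 4
predicted_labels = ["S"] * 5 + ["Z"] * 5

shifts = fraction_shift(train_labels, predicted_labels)
```

A large shift does not say which set is wrong. It only flags that one of them may not be representative.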

Several strategies can be used to combat such issues in scientific machine learning research. In terms of datasets, possible options include creating a new, larger training sample or employing numerical simulations to create mock images. Alternatively, a completely different machine learning approach, unsupervised learning, could be used. Unsupervised learning would not require humans to visually classify the training dataset, as the learning model would identify patterns and create classes on its own. [10]
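
To make the idea of a model creating classes on its own concrete, here is a minimal k-means clustering sketch in plain Python. K-means is chosen only as a familiar example of unsupervised learning; the source does not say which unsupervised method would be used. The two-number feature vectors are hypothetical stand-ins for measurements extracted from galaxy images.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-feature measurements for six objects (no labels supplied).
features = [
    (0.10, 0.20), (0.90, 0.80), (0.20, 0.10),
    (0.80, 0.90), (0.15, 0.15), (0.85, 0.85),
]

centroids, assignments = kmeans(features, k=2)
```

No labels are given to the algorithm. It groups the six objects into two clusters from the feature values alone, and a person would still need to interpret what each cluster represents.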

In fact, researchers at the Computational Astrophysics Research Group at the University of California, Santa Cruz have taken a very similar approach to the task of galaxy classification, focusing on galaxy morphologies, such as amorphous elliptical or spheroidal. Their deep learning framework, named Morpheus, takes in image data from astronomers and performs pixel-level classification of the image's features, allowing it to discern distinct objects within the same image rather than merely classifying the image as a whole (like the models used by the NAOJ researchers). A notable benefit of this approach is that Morpheus can discover galaxies by itself and would not require as much visual inspection or human involvement, which can be fairly high for traditional deep learning approaches: the NAOJ researchers worked with a dataset that required nearly 100,000 volunteers. [13] This is crucial, given that Morpheus could be used to analyze very large surveys, such as the Legacy Survey of Space and Time, which would capture over 800 panoramic images per night. [13]
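
The difference between whole-image and pixel-level classification can be sketched as follows. This is not the Morpheus architecture or its API. It assumes a model has already produced one score map per class, picks the best class at each pixel, and groups touching pixels of the same class into separate objects. The class names and scores are hypothetical.

```python
def classify_pixels(score_maps):
    """Pick, for every pixel, the class whose score map is highest there."""
    classes = list(score_maps)
    height = len(score_maps[classes[0]])
    width = len(score_maps[classes[0]][0])
    return [
        [max(classes, key=lambda c: score_maps[c][r][col]) for col in range(width)]
        for r in range(height)
    ]


def label_objects(class_map, background="background"):
    """Group 4-connected pixels that share a non-background class into objects."""
    height, width = len(class_map), len(class_map[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for r in range(height):
        for c in range(width):
            if seen[r][c] or class_map[r][c] == background:
                continue
            cls, stack, pixels = class_map[r][c], [(r, c)], []
            seen[r][c] = True
            while stack:  # flood fill from the first unseen pixel of this object
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and class_map[ny][nx] == cls):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            objects.append((cls, sorted(pixels)))
    return objects


# Hypothetical per-class score maps for a 3x4 image containing two sources.
score_maps = {
    "background": [[0.1, 0.9, 0.9, 0.1], [0.1, 0.9, 0.9, 0.2], [0.8, 0.8, 0.8, 0.8]],
    "spheroid":   [[0.8, 0.05, 0.05, 0.1], [0.7, 0.05, 0.05, 0.1], [0.1, 0.1, 0.1, 0.1]],
    "disk":       [[0.1, 0.05, 0.05, 0.8], [0.2, 0.05, 0.05, 0.7], [0.1, 0.1, 0.1, 0.1]],
}

class_map = classify_pixels(score_maps)
objects = label_objects(class_map)
```

A whole-image classifier would return one label for this image. Here the two sources come out as separate objects, each with its own class and pixel list.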

Examples of a Hubble Space Telescope image and its classification results using Morpheus. [13]
