Student: Issac Ward
Abstract
The Square Kilometre Array radio telescope will produce multidimensional images of unprecedented size, and finding faint galaxies in this data will be difficult. This project aims to develop a novel approach to finding such galaxies in the wavelet domain using convolutional neural networks. By operating on data formatted in the JPEG2000 standard, decompression parameters can be specified to test the effectiveness of convolutional neural networks at different resolutions and quality levels. Intel Movidius' Neural Compute Sticks will be investigated as a means to parallelise large convolutional neural network inferencing tasks.
Introduction
The 'gfinder' program (see references for link) has been developed as a proof of concept for using convolutional neural networks (CNNs) on JPEG2000 formatted data for sourcefinding. The artificial input data consists of 24 1000-component '.jpx' files, where each component is a 2-dimensional image (RA by DEC) with an associated observation frequency. Consecutive components correspond to consecutive frequencies, and in this way the input data is 3-dimensional. The '.jpx' files have the 3-dimensional locations of each galaxy encoded into their metadata, and this information will be taken as ground truth. The goal of the project is to use the 'gfinder' program to train a CNN capable of iterating over portions of the input data set to locate galaxies in any given component (which can then be checked against ground truth). This sourcefinding process is likely to be slow, and will be sped up by reducing the dimensions and quality of the input as well as by utilising Intel Movidius' 'Neural Compute Sticks' (NCS) to parallelise the task.
Main tasks
Handling input
To decompress and decode the input data, gfinder uses the Kakadu SDK: a closed-source library that fully implements ISO/IEC 15444-1 (JPEG2000 Part 1). The tools provided by this library can be used to decompress each component (where each component is a 3600x3600 pixel grayscale image). The decompression parameters that will be investigated are:
1. Discard levels: each discard level scales the length of both of the image's axes by 1/2, effectively decreasing the area by a factor of 4. The input data has been encoded with 7 levels, allowing images to be decompressed with no discard (3600x3600), one level of discard (1800x1800), two levels of discard (900x900) and so forth.
2. Quality layers: each reduction in the number of quality layers used further sacrifices the image's finer detail whilst keeping its low frequency detail. The input data has been encoded with 15 quality layers which are built up sequentially; quality layer 4 also has all the detail of quality layers 1, 2 and 3. Note that the decompressed image's dimensions are not affected by changing the quality layers.
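The effect of discard levels on decompressed dimensions described above can be sketched as a small helper (a minimal illustration with a hypothetical function name; in practice the Kakadu SDK performs this scaling during decompression):

```python
def dims_after_discard(width, height, levels):
    """Each discard level halves both axes, reducing the area by a factor of 4."""
    for _ in range(levels):
        width = (width + 1) // 2
        height = (height + 1) // 2
    return width, height

# A 3600x3600 component at discard levels 0 through 3
for d in range(4):
    print(d, dims_after_discard(3600, 3600, d))
```

This reproduces the sequence given above: no discard (3600x3600), one level (1800x1800), two levels (900x900), and so forth.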
It is understood that the input data suffers from instrumental noise around sources with a high signal-to-noise ratio and from stochastic background noise. The instrumental noise is considered high frequency detail that may be reducible by decompressing the '.jpx' file with fewer quality layers, which has the effect of smoothing over high frequency detail. This effect can be observed in Figure 1, which shows a source with high instrumental noise decompressed with only 1 quality layer and with 15 quality layers (the maximum permitted by the source file).
Training, validating and evaluating
Formally, the CNNs produced by the 'gfinder' program are binary classifiers; that is, they are only able to determine whether an input belongs to one of two classes: galaxy or noise. In order to train this network, the locations of each galaxy in the input files are read from the file's metadata and 16x16 pixel images are decompressed with galaxies at their centres. The data is then augmented to aid generalisation and reduce overfitting by translating the input images (rotation is not applied, as the sources always appear with instrumental noise in a vertical pattern), which preserves each unit's class label (Krizhevsky et al., 2017). A matching set of noise is then decompressed such that the set is balanced. 80% of this set is used for training, 10% for validation and 10% for evaluation.
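The augmentation and split described above can be sketched as follows (a minimal NumPy illustration; the function names, shift range and shuffling details are assumptions, not gfinder's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_translate(images, max_shift=2):
    """Augment by small random translations; rotation is avoided since
    the instrumental noise always appears in a vertical pattern."""
    out = []
    for img in images:
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        out.append(np.roll(np.roll(img, dy, axis=0), dx, axis=1))
    return np.stack(out)

def split_80_10_10(x, y):
    """Shuffle a balanced set and split it into training (80%),
    validation (10%) and evaluation (10%) subsets."""
    idx = rng.permutation(len(x))
    a, b = int(0.8 * len(x)), int(0.9 * len(x))
    return ((x[idx[:a]], y[idx[:a]]),
            (x[idx[a:b]], y[idx[a:b]]),
            (x[idx[b:]], y[idx[b:]]))
```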
During the training process, the data is fed to the CNN until the loss begins to increase, at which point the CNN is reloaded from the point at which this occurred and is considered optimally trained. In order to test the performance of CNNs produced with different architectures, a validation process is run in which the CNN is exposed to images from the validation set and is tasked with classifying them. From this process the classification accuracy of the CNN can be determined.
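The early-stopping scheme described here can be sketched as follows (the callables stand in for the actual framework's training, validation and checkpoint operations; all names are hypothetical):

```python
def train_with_early_stopping(train_step, val_loss, save, restore, max_epochs=100):
    """Train until the loss begins to increase, then restore the model
    from the best checkpoint and consider it optimally trained."""
    best = float("inf")
    for epoch in range(max_epochs):
        train_step()
        loss = val_loss()
        if loss >= best:   # loss has begun to increase: stop training
            restore()      # reload the weights saved at the best epoch
            break
        best = loss
        save()             # checkpoint the current best weights
    return best
```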
Figure 2: Probability heatmap showing 7 likely source locations
Larger regions in the input data are also used to test the sourcefinding performance of trained CNNs. This process involves striding over the input component and feeding images of aforementioned size (in this case 16x16 pixels) into the CNN - producing a 2-dimensional probability heatmap which can be interpreted to find the locations of galaxies in each frequency channel. These locations are then validated against the input file's metadata to measure the CNN's accuracy.
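The striding process can be sketched as follows (a minimal illustration; the stride of 4 pixels and the `classify` callable, which stands in for the trained CNN, are assumptions):

```python
import numpy as np

def heatmap(component, classify, win=16, stride=4):
    """Stride a win x win window over a 2-D component, feeding each
    region to the classifier to build a probability heatmap."""
    h, w = component.shape
    rows = (h - win) // stride + 1
    cols = (w - win) // stride + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = component[i*stride:i*stride + win, j*stride:j*stride + win]
            out[i, j] = classify(patch)  # probability of a galaxy
    return out
```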
The evaluation process works identically (since there is no unlabelled data in the input set) and produces images showing the probability as a heatmap and resultant localization of sources (shown in Figure 2 and Figure 3 respectively).
Figure 3: Localization and highlighting outputted by gfinder of likely sources in Figure 2
Figure 4: Inferencing duration on varying numbers of consecutive components using varying numbers of NCS devices
Increasing time performance
As shown in Figure 4, this process can take upwards of one minute on the test computer (Dell XPS 9360, Intel Kaby Lake i5 processor) for a single component to be entirely inferenced. The Movidius NCS is a 'plug 'n' play' styled option for computer vision inferencing tasks that leverages 'VPUs' to provide cheap inferencing in low-power environments (Movidius, 2017). By attaching three of them to a powered USB hub, the test computer can control their simultaneous operation by loading pieces of the input component directly onto the devices as they are decompressed from the input file. The devices then return a softmaxed two-element array where the elements represent the probability of the input region containing noise or a galaxy respectively. This result (identical to the output of the CPU's inferencing, except that it is a half precision rather than a full precision floating point value) can then be aggregated into a probability heatmap as mentioned.
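The fan-out of decompressed regions across several devices can be sketched as a simple worker pool (an illustration, not gfinder's implementation: the `infer` callable stands in for the ncsdk's load-tensor/get-result pair, and device handles are represented abstractly):

```python
import queue
import threading

def parallel_inference(patches, devices, infer):
    """Distribute decompressed patches across multiple NCS devices,
    preserving input order in the returned results."""
    work = queue.Queue()
    results = [None] * len(patches)
    for item in enumerate(patches):
        work.put(item)

    def worker(device):
        while True:
            try:
                i, patch = work.get_nowait()
            except queue.Empty:
                return  # no work left for this device
            results[i] = infer(device, patch)

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every device pulls from the same queue, adding devices only helps while the queue stays non-empty, which matches the bottleneck behaviour discussed below.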
Regarding time performance, Figure 4 demonstrates that adding more NCS devices does not improve time performance once the data flow bottleneck is the decompression process; there is an improvement when going from one to two devices (as the inferencing process is the bottleneck), but going from two to three yields no improvement. Note that this test was conducted when decompressing with the most computationally demanding parameters, hence the decompression bottleneck.
As of 11/02/2018, the 'ncsdk' (see references for link), which provides the backend for interfacing with the NCS, does not support 3-dimensional convolution layers, which are pivotal for exploiting the relationships along the input data's frequency axis. An attempt to take advantage of these relationships via post processing has been made by increasing the probability of pixels in the output heatmap that have adjacent high-probability pixels, as true sources will exist across multiple components.
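This post-processing idea can be sketched as follows (the neighbour `weight` and the clipping to [0, 1] are assumed illustrative choices, not gfinder's actual parameters):

```python
import numpy as np

def boost_adjacent(heatmaps, weight=0.5):
    """Boost each heatmap pixel using the same pixel in the neighbouring
    frequency components, since true sources persist across consecutive
    components while noise does not."""
    heatmaps = np.asarray(heatmaps, dtype=float)  # shape: (components, rows, cols)
    boosted = heatmaps.copy()
    boosted[1:] += weight * heatmaps[:-1]   # contribution from previous component
    boosted[:-1] += weight * heatmaps[1:]   # contribution from next component
    return np.clip(boosted, 0.0, 1.0)       # keep values as valid probabilities
```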
Results
The following results include only inferencing time, not post processing time.
Figure 5: Duration of inferencing process on 5 components with varying discard levels applied.
Figure 6: True positive accuracy on 5 components with varying discard levels applied.
Figure 7: Duration of inferencing process on 5 components with varying quality layers used.
Figure 8: True positives and false positives on 5 components with varying quality layers used.
Four CNNs were trained with different discard levels, and their time performance and accuracy were tested, as shown in Figure 5 and Figure 6 respectively. Note that discard level 3 would result in an input image size of 4x4 pixels, whereas the CNN's architecture involves a 5x5 convolutional layer applied to the input, so 3x3 convolutional layers were used for this network instead (the point being to show the detriment in performance when the input size becomes comparable to the size of the features in the input data).
The effect of discard levels on inferencing on both the CPU and three NCS' was investigated on small inputs (a single component) and larger inputs (five consecutive components).
The effect of quality layers on inferencing on both the CPU and three NCS' was investigated on small inputs (a single component) and larger inputs (five consecutive components) also.
Discussion
The observed effect of discard levels on inferencing performance demonstrates the noise reduction obtained by applying consecutive discard levels to the input file during decompression of a given component. As further discard levels are applied, fine-detail noise is smoothed as the image's dimensions are reduced. This effect is desirable up to the point where it begins to smooth the fainter galaxies into the background noise, making them undetectable to the CNN. Although this parameter can be leveraged to increase completeness and decrease time complexity, it is undesirable in that it requires retraining the network for different discard levels, as the input dimensions change. Note that to use such small resolution inputs, the filters of the CNN had to be scaled to be comparable to the source sizes: 8x8 and 4x4 pixels in the case of discard levels 2 and 3 respectively. The NCS cannot load tensors this small into memory for inferencing, and as such it was impossible to run tests on the NCS for discard levels beyond 1.
Reducing the quality layers has a similar effect, except that the image's dimensions remain the same. Again the high frequency detail is smoothed into the background as quality layers are reduced, up to the point where the fainter sources are smoothed into the background. This can be observed in Figure 8; applying smoothing down to quality layer 5 decreases false positives and increases the number of true positives located. As expected, the time taken is proportional to the quality layers used, as the complexity of the decompression process is directly proportional to the number of quality layers decompressed.
Unsurprisingly, the effect of the NCS on time performance is dependent on the program's data flow bottleneck: if it is bottlenecked by the inferencing process and not the decompressing process, more NCS devices will allow for better time performance. The 'gfinder' program warns the user of the main bottleneck during the evaluation process (the bottleneck will depend on the decompression parameters specified). As previously discussed, when the inferencing process is the bottleneck (provided that the NCS' throughput can be taken advantage of through the USB 3.0 hub being used), more NCS devices will reduce the time taken to inference an input image. Interestingly, when the complexity of the decompression is increased, the bottleneck can change from the inferencing process to the decompression process, at which point the program duration will be independent of the number of NCS devices (the queue which handles the data to be inferenced will be empty when the NCS' attempt to dequeue a new input).
These results have unique applications for source finding applications using deep learning. Seemingly, the program’s result is unaffected and in some cases can be improved when working at lower quality and resolution levels. This conclusion agrees with current knowledge regarding CNNs and image compression: “networks are resilient to JPEG and JPEG2000 compression distortions. It is only at very low quality levels … that the performance begins to decrease. This means that we can be reasonably confident that deep networks will perform well on compressed data” (Dodge & Karam, 2016).
The 'gfinder' program, installation guide and tutorial are all available at https://github.com/ICRAR/gfinder
Acknowledgments
This project would not have been possible without the guidance and insight provided by Dr. Slava Kitaeff. Thanks also go to Jurek Tadek Malarecki for aiding in handling the input data. This project relies heavily on the ncsdk and the Kakadu SDK, developed by the Intel Movidius team and Prof. David Taubman respectively, whom I thank. I am especially indebted to the International Centre for Radio Astronomy Research (ICRAR) for providing me with the summer studentship placement that allowed me to work on this project.
References
Dodge, S., Karam, L. (2016). Understanding how image quality affects deep neural networks. arXiv preprint arXiv:1604.04004.
Krizhevsky, A., Sutskever, I. and Hinton, G. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), pp.84-90.
Movidius. (2017). Getting started with the NCS. [ONLINE] Available at: https://developer.movidius.com/start. [Accessed 13 February 2018].
Movidius. (2017). ncsdk GitHub repository. [ONLINE] Available at: https://github.com/Movidius/ncsdk. [Accessed 12 February 2018].
Ward, I. (2017). gfinder GitHub repository. [ONLINE] Available at: https://github.com/ICRAR/gfinder. [Accessed 15 February 2018].