We just released NGAS version 11.0. The most prominent change in this version is initial python 3.5+ support, with 2.7 still being supported.
On top of that this version contains quite a few improvements over 10.0. A complete changelog can be found in the official documentation.
We just released NGAS version 10.0. This version contains quite a few improvements over 9.1 and 9.0:
- The ARCHIVE, QARCHIVE, REARCHIVE and BBCPARC commands now use the same underlying code. All the small differences between the commands have been kept, so they should behave exactly as before. This was a required step before implementing other improvements and bug fixes.
- The archiving commands listed above are now more efficient in how they calculate the checksum of the incoming data. If the processing plug-in promises not to change the data, the checksum is calculated on the incoming stream instead of on the file afterwards, reducing disk access and response times.
- Partial content retrieval for the RETRIEVE command has been implemented. This feature was present in the ALMA branch of the NGAS code, and has now been incorporated into ours.
- We merged the latest ALMA mirroring code into our code base. This and the point above should ensure that NGAS 10.0 is ALMA-compatible.
- Unified all the CRC checksumming code, and how the different variants are chosen.
- We have improved response times in scenarios where many parallel RETRIEVE commands are issued. Worst-case times with 100 parallel requests were brought down from tens of seconds to about 2 seconds (i.e., an order of magnitude).
- Moved the data-check background thread's checksum calculation to a separate pool of processes (to avoid tying up the main process). The checksumming also pauses/resumes depending on whether the server is serving requests or not, to avoid exhausting access to the disk.
- Added the ability to write plug-ins that react to each file archiving (e.g., to trigger further processing).
- Added support for (our) latest bbcp release, which includes CRC32c checksum support.
- Fixed a few small problems with different installation scenarios.
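The stream-checksum improvement above can be sketched as follows. This is a minimal illustration, not the actual NGAS archiving code: the function name, block size and interface are made up.

```python
import zlib

def archive_stream(instream, outfile, blocksize=65536):
    """Write incoming data to disk while updating a running CRC32 on each
    block, so no second pass over the file is needed for the checksum.

    Illustrative sketch only -- not the actual NGAS plug-in API."""
    crc = 0
    with open(outfile, 'wb') as f:
        while True:
            block = instream.read(blocksize)
            if not block:
                break
            f.write(block)
            # zlib.crc32 accepts a running value, so the checksum is
            # accumulated block by block as the data streams in.
            crc = zlib.crc32(block, crc) & 0xFFFFFFFF
    return crc
```

The key point is that the checksum is folded into the same loop that writes the data, which is why it only pays off when the processing plug-in guarantees the data is stored unmodified.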
We just released NGAS version 9.1. This version contains a few improvements over 9.0:
- NGAS is now hosted in a public GitHub repository (see previous post)
- CI has been set up to ensure that the tests run correctly against SQLite3, MySQL and PostgreSQL
- Janitor Thread changes:
- Plug-ins: Instead of having a fixed, single module with all the business logic of the Janitor Thread, its individual components have been broken down into separate modules which are loaded and run using a standard interface. This makes the whole Janitor Thread logic simpler. It also allows us to support user-written plug-ins that can be run as part of the janitor thread.
- The execution of the Janitor Thread doesn't actually happen in a thread anymore, but in a separate process. This takes some load off the main NGAS process. In most places we still call it a thread, though; this will keep changing over time as we find these occurrences.
- The NGAS server script, the daemon script and the SystemV init script have been made more flexible, removing the need to have more than one version of each of them.
- Some cleanup has been done on the NGAS client-side HTTP code to remove duplication and offer a better interface both internally and externally.
- Self-archiving of logfiles is now optional
- Some incorrect database result-handling code has been fixed, making the code behave consistently across different databases
- Misc bug fixes and code cleanups
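The plug-in mechanism described for the Janitor Thread could look roughly like this. This is a hedged sketch of the idea of loading components by module name through a standard entry point; the names (`run_janitor_plugins`, `run`) are illustrative and not the actual NGAS interface.

```python
import importlib

def run_janitor_plugins(plugin_names, context):
    """Load each janitor component by module name and invoke its run()
    entry point with a shared context object.

    Illustrative sketch -- module and function names are assumptions,
    not the real NGAS plug-in contract."""
    for name in plugin_names:
        plugin = importlib.import_module(name)
        # Every plug-in module exposes the same entry point, so new
        # user-written components can be added purely via configuration.
        plugin.run(context)
```

The benefit of this design is that the janitor loop itself stays trivial: adding behaviour means adding a module name to a list, not editing a monolithic module.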
After some time preparing for this, we have finally published our code in a public GitHub repository:
Together with this new repository, the following other sites are available:
- Documentation: https://ngas.readthedocs.io
- Travis CI builds: https://travis-ci.org/ICRAR/ngas
Both the documentation and the Travis builds are automatically triggered after every push to our master branch, so they can be considered up to date.
Additionally, the Travis build executes our automatic tests against three different database engines, ensuring they remain stable across every change (and alerting us if that is not the case).
We just released NGAS version 9.0. This version consists mainly of big code cleanups, which make NGAS easier to maintain:
- Switched from our own home-brewed logging package to the standard logging module
- Unified time conversion routines, eliminating heaps of old code
- Removed the entire pcc set of modules.
- General bug fixes and improvements
We just released NGAS version 8.0. The most important changes included in this version are:
- Re-structured NGAS python packages: importing NGAS python packages is now simpler and doesn't alter the python path in any way. The different packages can be installed either as zipped eggs, exploded eggs, or in development mode. This makes NGAS python packages easier to install on any platform/environment where setuptools or pip is available.
- Initial support for logical containers. Logical containers are groups of files, similar to how directories group files in a filesystem.
- Streamlined CRC32c support throughout QARCHIVE and subscription flows.
- Stabilization of unit test suite: Now the unit test suite shipped with NGAS runs reliably on most computers. This made it possible to have a continuous integration environment (based on a private Jenkins installation) to monitor the health of the software after each change on the code.
- Improved SQL interaction, making sure we use prepared statements all over the place, and standard PEP-249 python modules for database connectivity.
- Improved server- and client-side connection handling.
- General bug fixes and improvements
We have just released a new major version of NGAS (MWA-7.0.2). This version features a number of bug fixes and improvements, as well as a major new feature called logical containers. Logical containers allow the logical grouping of files archived in NGAS, and containers can be nested recursively; think of them as a virtual directory structure. In fact it is possible to archive a complete directory with all its sub-directories in NGAS and retrieve it again. Please note that this version has not yet been used in any operational environment and, although we believe it to be stable, it might not be as stable as the older versions. We also believe that this version is fully backwards compatible with older versions, i.e. you should be able to run a mixed environment with old and new versions together. Obviously, it is not possible to use the container-related commands with the older versions.
Currently NGAS uses either the zlib.crc32 or the binascii.crc32 implementation. More recently we have investigated the speed of the CRC32 calculations in great detail and found that these software implementations are indeed pretty slow and can't keep up with the fast file systems we are using. We therefore looked at the Intel SSE4.2 CRC32C implementation and tested its speed as well. It is indeed far superior (as you would expect). However, since it uses the Castagnoli polynomial (hence the 'C' at the end) rather than the IEEE 802.3 polynomial, the resulting value differs from the zlib and binascii implementations. CRC32C has significant advantages when it comes to the probability of missing corruption in bigger data buffers, so it would seem natural to switch to it inside NGAS. Unfortunately this is not straightforward for existing archives, for three reasons:
- The existing values in the DB had been calculated using the IEEE 802.3 polynomial and thus the DataCheck thread would report corruptions for every single file.
- The CRC plugin can only be specified for the DataCheck thread and is not used by the various archiving plugins. In fact the use of binascii.crc32 is hardcoded in some of these plugins, which in turn would lead to inconsistent checksums.
- If we used only the SSE4.2 hardware acceleration, NGAS would suddenly become seriously platform-dependent; in fact it would only work on more recent Intel processors.
A possible solution has to address all three issues and then come up with an update path for the existing CRC values in the DB. The third issue seems the easiest to tackle, since implementations exist that fall back to a software CRC32C if the hardware does not support it. Obviously this comes with a performance hit similar to that of the original CRC32 implementations. The implementations I've found are still limited to little-endian platforms, but that should be less problematic and also fairly easy to fix, if deemed necessary.
Let's assume that we have this code in place and callable by a CRC plugin; then we still need to fix issue 2. This issue should be fixed in any case, since it has a fairly high potential to lead to inconsistent CRCs. A fix would require a more generic CRC plugin setting in the configuration, which is also obeyed by the archiving plugins.
The migration path will potentially be quite time consuming, since it requires calculating and checking the old CRC32 and then, if correct, calculating the new one and replacing the value in the DB, for every single file in the archive. This could be optimised by calculating both checksums on every block in one go. The only correct place to perform this update operation is inside the DataCheck thread itself, but that requires a (most likely temporary) modification of that code. Note that the operation could be performed one server at a time. During the update it would probably be good to use a bookkeeping table holding the old and the new CRC, the execution time for both, and a status code.
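The one-pass migration idea above can be sketched as follows, under stated assumptions: the software CRC32C fallback here is a plain table-driven implementation of the Castagnoli polynomial (standing in for an SSE4.2-accelerated version), and the function names and block size are my own, not NGAS code.

```python
import zlib

# Reflected Castagnoli polynomial, as used by CRC32C (and by the SSE4.2
# crc32 instruction). Table-driven software fallback for other platforms.
_CRC32C_TABLE = []
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
    _CRC32C_TABLE.append(c)

def crc32c(data, crc=0):
    """Software CRC32C (Castagnoli), byte at a time; accepts a running
    value so it can be chained block by block, like zlib.crc32."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def dual_checksum(stream, blocksize=65536):
    """Compute the old CRC32 (IEEE 802.3) and the new CRC32C in a single
    pass over the data, so each file is only read once during migration."""
    old = new = 0
    while True:
        block = stream.read(blocksize)
        if not block:
            break
        old = zlib.crc32(block, old) & 0xFFFFFFFF
        new = crc32c(block, new)
    return old, new
```

The two running checksums share one read loop, which is exactly the optimisation mentioned above: the I/O cost of the migration is a single scan per file, with the old value available for verification before the new one is written to the DB.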
Together with Dongwei Fan from the NAOC/ChinaVO we have just installed an instance of NGAS on the ChinaVO Cloud infrastructure. This will initially be used for training purposes, but also to transfer some data between ICRAR and NAOC.
We have just released a new version of NGAS, which includes many enhancements and bug fixes. The Fabric-based installation procedure has been completely overhauled and now fully supports installations on Mac OSX as well. As always, this release is also available from the NGAS server.
Just worked on the NGAS notification and e-mail configuration. Since it is a pretty bad idea to put personal e-mail addresses into the NGAS config files, we've come up with a slightly better solution and created Google e-mail groups specific to the various installations. Creating groups is an interactive process, since it requires entering a Captcha, both for the group creation and for inviting group members. We've also configured a default subject prefix, which will be added to the existing subject field of an e-mail before relaying it to the subscribers. Since the notifications will be sent by the NGAS servers on the various machines, we also needed to register the sender as a group member. We've used the generic e-mail address ngas@ngas.ddns.net for that. Since this is a non-existent e-mail address, this user was added using the 'Direct add members' option in the Google groups management interface. This has been tested on the AWS NGAS instance and is working fine.
Since we are using AWS EC2 for release testing, we have now assigned the instance a static IP and host name using a dynamic DNS service. The DDNS name is ngas.ddns.net. This host will always run the latest released version of NGAS as well as the NGAS portal. That NGAS server also contains the tar-file of the release itself.
Just a few remarks about the nice plots we are now using in operations for monitoring the NGAS file ingestion across the whole globe. Here is an example:
These are fully interactive plots: they allow zooming and panning, and show the actual values when hovering with the mouse pointer. They are created straight from the NGAS DB using the excellent dygraphs JavaScript library. dygraphs is very flexible, but still fairly easy to use. I'm planning to write a short tutorial on how to embed a graph like the one above into a web page.
With the upgrade of the MIT network and some minor modifications to the NGAS multi-stream data transfer, we are now able to achieve quite stunning data rates half-way around the globe from Perth to MIT. This proves that our initial assumption about the bottleneck being the internal network of the MIT cluster was indeed correct. However, we were still surprised by the actual rate we can now achieve: we expected around 5 Gbps at most, but instead we now seem to be limited only by the actual production data rate.
The Murchison Widefield Array will start observing again next week, after the Australian summer break and some downtime to fix damage caused by a lightning strike at the end of last year. The MWA is currently, by a fair margin, the instrument producing the biggest data stream captured by NGAS systems: as soon as observations start, the MWA data (384 MB/s) is captured by two fairly standard servers equipped with two disk arrays each. These two servers each run two NGAS server instances, and the data is then sent down to Perth using our enhanced MWA version of the subscription mechanism. If everything runs smoothly the disk arrays stay fairly empty, since the data is transferred almost in real time. The whole setup is depicted here and in a paper (arXiv version). Altogether the MWA archive currently controls about 2 PB of data (multiple copies), including the data (~600 TB) we transferred to the NGAS cluster at MIT in Boston; all of this happened between July and December 2013. In Perth NGAS has been integrated with the Hierarchical Storage Manager (HSM) of the Pawsey Centre, which means that most of the data is on tape. The MIT archive is disk only.

