A new project enables data to be read directly from compressed IoT files
The Network Computing, Communications and Storage research group at Aarhus University has developed a completely new way to compress data. The new technique makes it possible to analyze data directly in compressed files, and it may have a major impact on the so-called “data tsunami” from the massive number of IoT devices.
The method will now be further developed, and it will form the framework for an end-to-end solution to help scale down the exponentially increasing volumes of data from IoT devices.
“Today, if you need just 1 byte of data from a 100 MB compressed file, you usually have to decompress a significant part of the whole file to access the data. Our technology enables random access to the compressed data. It means that you can access 1 byte of data at the cost of decompressing fewer than 100 bytes, which is several orders of magnitude less than with state-of-the-art technologies. This could have a huge impact on data accessibility, data processing speed and the cloud storage infrastructure,” says Associate Professor Qi Zhang from Aarhus University.
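The general idea of random access to compressed data can be illustrated with a minimal sketch. It is not the research group's method, which the article does not describe. It simply compresses fixed-size chunks independently with Python's standard zlib module, so that reading one byte only requires decompressing the chunk that holds it. The chunk size, the function names and the simulated sensor readings are all assumptions made for illustration.

```python
import zlib

CHUNK_SIZE = 64  # hypothetical chunk size in bytes, chosen only for illustration


def compress_chunked(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress each fixed-size chunk on its own so it can be decoded alone."""
    return [
        zlib.compress(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def read_byte(chunks: list[bytes], offset: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Return the byte at `offset`, decompressing only the chunk that contains it."""
    index, position = divmod(offset, chunk_size)
    return zlib.decompress(chunks[index])[position]


# Made-up sensor readings: a slowly changing value, as in a simple time series.
readings = bytes(20 + (i // 50) % 5 for i in range(10_000))

chunks = compress_chunked(readings)
value = read_byte(chunks, 7_321)  # decompresses one 64-byte chunk, not all 10,000 bytes
```

Independent chunks this small compress poorly with a general-purpose compressor, because every chunk carries its own overhead. That trade-off between compression ratio and access cost is the gap a purpose-built technique would have to close.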
Compressed IoT data
The compression technique makes it feasible to compress IoT data (typically time series data) in real time before the data is sent to the cloud. After this, typical data analytics can be carried out directly on the compressed data. There is no need to decompress all of the data, or even large parts of it, in order to carry out an analysis.
This could potentially alleviate the ever-increasing pressure on the communication and data storage infrastructure. The research group believes that the project’s results will serve as a foundation for the development of sustainable IoT solutions, and that they could have a profound impact on digitalization:
“Today, IoT data is constantly being streamed to the cloud, and as a consequence of the massive number of IoT devices deployed globally, exponential data growth is expected. Conventionally, to allow fast, frequent data retrieval and analysis, it is preferable to store the data in an uncompressed form.
“The drawback here is the use of more storage space. If you keep the data in compressed form, however, it takes time to decompress the data before you can access and analyze it. Our project outcome has the potential not only to reduce data storage space but also to accelerate data analysis,” says Qi Zhang.