Hiding messages in VoIP packets
A group of researchers from the Institute of Telecommunications of the Warsaw University of Technology have devised a relatively simple way of hiding information within VoIP packets exchanged during a phone conversation.
They called the method TranSteg, and they have proved its effectiveness by creating a proof-of-concept implementation that allowed them to send 2.2MB (in each direction) during a 9-minute call.
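To put that figure in perspective, the short calculation below converts it into an average covert bit rate. It is only a back-of-the-envelope sketch: it assumes "2.2MB" means 2.2 million bytes and that the transfer was spread evenly over the whole nine minutes, neither of which the researchers state explicitly. The function name is illustrative.

```python
def covert_rate_bits_per_second(total_bytes: float, call_seconds: float) -> float:
    """Average hidden-channel bit rate for one direction of the call."""
    return total_bytes * 8 / call_seconds


# Assumption: 2.2 MB = 2.2 * 10**6 bytes, sent evenly over a 9-minute call.
rate = covert_rate_bits_per_second(2.2e6, 9 * 60)
print(f"{rate / 1000:.1f} kbit/s")  # prints "32.6 kbit/s"
```

Under those assumptions the hidden channel averages a little over 32 kbit/s in each direction.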
IP telephony allows users to make phone calls over data networks that use the Internet Protocol (IP). The actual conversation consists of two audio streams, and the Real-Time Transport Protocol (RTP) is used to transport the voice data required for the communication to succeed.
RTP, however, can transport different kinds of data, and the TranSteg method takes advantage of this fact.
“Typically, in steganographic communication it is advised for covert data to be compressed in order to limit its size. In TranSteg it is the overt data that is compressed to make space for the steganogram,” explain the researchers. “The main innovation of TranSteg is to, for a chosen voice stream, find a codec that will result in a similar voice quality but smaller voice payload size than the originally selected.”
In fact, this same approach can – in theory – be successfully used with video streaming and other services where it is possible to compress the overt data without making its quality suffer much.
To send the data undetected over a VoIP connection, both the sending and the receiving machine must be configured in advance. They need to know that packets marked as carrying a payload encoded with one codec are actually carrying voice encoded with another codec, one that compresses the voice data more efficiently and so leaves space for the steganographic message.
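The packing step can be illustrated with a minimal sketch. It is not the researchers' implementation, and the codec pair is an assumption made for the example: the packet is presumed to be labeled as G.711 (160 bytes per 20 ms of audio at 64 kbit/s), while the voice has been transcoded to a 32 kbit/s codec (80 bytes per 20 ms). The function names, the layout with voice first and hidden data second, and the zero padding are all hypothetical, and the transcoding itself is not shown.

```python
from typing import Tuple

# Assumed sizes for one 20 ms packet; the codec choice is illustrative only.
OVERT_PAYLOAD_BYTES = 160  # what a payload labeled as G.711 is expected to hold
VOICE_BYTES = 80           # the same 20 ms after transcoding to a 32 kbit/s codec
COVERT_BYTES = OVERT_PAYLOAD_BYTES - VOICE_BYTES  # space freed for the steganogram


def embed(transcoded_voice: bytes, covert_chunk: bytes) -> bytes:
    """Build a payload of the overt size from transcoded voice plus hidden data."""
    if len(transcoded_voice) != VOICE_BYTES:
        raise ValueError("transcoded voice frame has an unexpected size")
    if len(covert_chunk) > COVERT_BYTES:
        raise ValueError("covert chunk does not fit in the freed space")
    # Zero padding keeps the sketch simple; the true chunk length is not stored.
    return transcoded_voice + covert_chunk.ljust(COVERT_BYTES, b"\x00")


def extract(payload: bytes) -> Tuple[bytes, bytes]:
    """Split a received payload back into (transcoded voice, hidden data)."""
    if len(payload) != OVERT_PAYLOAD_BYTES:
        raise ValueError("payload does not have the overt codec's size")
    return payload[:VOICE_BYTES], payload[VOICE_BYTES:]


# Usage: the receiver, which knows the arrangement, recovers both parts.
voice_frame = bytes(VOICE_BYTES)  # placeholder for a real transcoded frame
payload = embed(voice_frame, b"secret")
voice_out, hidden_out = extract(payload)
```

With these assumed sizes, each packet frees 80 bytes. At 50 packets per second that comes to 32 kbit/s, which is in the same range as the 2.2MB per nine-minute call reported above.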
The method is efficient at sending and receiving the data, but to be considered good enough to use, it must also be undetectable by outside observers.
According to the paper, efficient transfer can be achieved whether one or both participants in the conversation use VoIP phones or intermediate network nodes. Undetectability, however, is achieved only when both the sending and the receiving node are VoIP phones, since in that case the format of the voice payloads does not change as the packets traverse the network.