Message Type Identification of Binary Network Protocols using Continuous Segment Similarity
Protocol reverse engineering based on traffic traces infers the behavior of unknown network protocols by analyzing observable network messages. To perform correct deduction of message semantics or behavior analysis, accurate message type identification is an essential first step. However, identifying message types is particularly difficult for binary protocols, whose structural features are hidden in their densely packed data representation. We leverage the intrinsic structural features of binary protocols and propose an accurate method for discriminating message types. Our approach uses a similarity measure with continuous value range by comparing feature vectors where vector elements correspond to the fields in a message, rather than discrete byte values. This enables a better recognition of structural patterns, which remain hidden when only exact value matches are considered. We combine Hirschberg alignment with DBSCAN as cluster algorithm to yield a novel inference mechanism. By applying novel autoconfiguration schemes, we do not require manually configured parameters for the analysis of an unknown protocol, as required by earlier approaches. Results of our evaluations show that our approach has considerable advantages in message type identification result quality and also execution performance over previous approaches.
READ FULL TEXT