July 27, 2018
Gibran Gómez
The use of TLS is spreading rapidly among malware families, since it makes it possible for them to evade the most widely used content-based detection techniques. Nevertheless, by its nature, such traffic exposes some characteristics that the protocol does not hide: the number of packets transmitted, their sizes, their inter-arrival times and their direction, as well as any feature derived from these attributes or a combination of them, such as the largest packet sent or received, the number of packets sent before a response arrives from the other direction, or the total size of the conversation. If such attributes are similar enough across connections, we should be able to find patterns that help us characterize malware behavior when it performs the same actions towards an endpoint.
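To make the idea of "features that TLS does not hide" concrete, the following is a minimal sketch of how a single connection could be turned into a feature vector. It assumes the packets of one TLS connection have already been extracted from a capture and labeled with a timestamp (seconds), a size (bytes) and a direction; the `Packet` class and the `flow_features` function are hypothetical names, and the feature set is only an illustration of the attributes listed above, not the exact set used in this work.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # capture time in seconds (assumed)
    size: int         # packet size in bytes (assumed)
    outbound: bool    # True if sent by the client, False if received from the server


def flow_features(packets):
    """Compute illustrative, encryption-agnostic features for one TLS connection."""
    if not packets:
        raise ValueError("a flow must contain at least one packet")
    # Order packets by time so inter-arrival times and bursts are meaningful.
    pkts = sorted(packets, key=lambda p: p.timestamp)
    sent = [p.size for p in pkts if p.outbound]
    received = [p.size for p in pkts if not p.outbound]
    # Time gaps between consecutive packets, regardless of direction.
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    # Number of packets the client sends before the first packet from the server.
    sent_before_response = 0
    for p in pkts:
        if not p.outbound:
            break
        sent_before_response += 1
    return {
        "num_packets": len(pkts),
        "num_sent": len(sent),
        "num_received": len(received),
        "total_bytes": sum(p.size for p in pkts),
        "max_sent": max(sent, default=0),
        "max_received": max(received, default=0),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        "sent_before_response": sent_before_response,
    }


# Usage with a small, made-up flow (not real malware traffic).
flow = [
    Packet(0.00, 517, True),
    Packet(0.05, 1400, False),
    Packet(0.06, 1200, False),
    Packet(0.10, 126, True),
    Packet(0.11, 51, True),
    Packet(0.20, 300, False),
]
print(flow_features(flow))
```

Dictionaries like this one, computed for every connection, could then be stacked into a feature matrix and fed to clustering or classification algorithms; none of the values depend on decrypting the payload.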
Therefore, the present work analyzes the TLS traffic produced by different malware families using machine learning (ML) clustering and classification techniques, in order to demonstrate that it is possible to build models that act as behavioral signatures. Such models can characterize the TLS traffic generated by malware clients, which makes it possible to determine whether malicious connections are taking place inside a network.