In this thesis, we first address the need to measure large-scale Internet traffic in order to gain useful insights into security and traffic trends in large Internet Service Providers (ISPs) and Internet eXchange Points (IXPs). To this end, we design Flowyager, a system for querying network-wide flow data in near real time. Next, we propose FlowDNS, which augments flow data with domain names to infer the actual service or domain to which the traffic belongs. This system lays the foundation for monitoring the services in use and gives network operators the chance to predict their bandwidth demands. To gain a more comprehensive picture, we combine the results from these systems with active measurement techniques. This allows us to discover the existence and origin of hidden characteristics of Internet traffic. For instance, when querying Flowyager in a large European ISP, we detect a large amount of Internet traffic using port number 0. Complementing the passive measurement results with active measurements, we find that this traffic is mostly caused by fragmentation, scanning, and misconfigured devices. Finally, given the widespread use of Virtual Private Networks (VPNs) for remote work during the COVID-19 pandemic, we characterize VPN traffic in the Internet. We use active measurement techniques to detect VPN servers and analyze their security aspects. Then, with the help of FlowDNS, we detect VPN traffic on the Internet and provide insights into its patterns.
This dissertation helps researchers and network operators gain insights into some hidden characteristics of Internet traffic. It also provides the means to search network flow data for specific traffic patterns and to investigate their characteristics.