This post is going to be a little bit different than most of my others. It is going to cover the very basic process of analyzing strange network traffic and figuring out what is causing it. I’ll give some background on what I noticed and what our setup was. At the end of this post, I’ll also explain the exact cause of the problem and how to resolve it, in the hopes that I can save someone else some investigation time. This post is not intended to say that this is the only, or even the best, way of finding this information. It is simply the approach I took given our resources and system setup. Having said that, here is the background:
This happened at a small Mac-based IT consulting company. They provide IT support for small to medium-sized businesses, and do a large amount of work remotely from their offices. This work is done through either point-to-point VPN tunnels using ARD, or over the internet using specialized software which I will not name at this time. They recently replaced an aging Cisco router/firewall with a brand new Meraki, with the idea of simplifying the setup of those point-to-point VPNs. I was in the process of adding various firewall rules to the Meraki when I noticed some very odd traffic patterns (shown below) to a few of the internal systems.
These patterns seemed to indicate that massive amounts of data were being sent to these systems almost every night between midnight and 1AM. The amount of data varied from machine to machine, but on average it seemed that each machine was getting between 3 and 6 GB of data every night. A quick examination of the systems did not turn up anything out of the ordinary, and a malware scan came up empty. This was concerning (and odd) for a number of reasons:
- If the network had been breached, why was so much data coming in, rather than going out?
- If this was an attempt at a DoS attack, why only do it between midnight and 1AM when it would not matter if the systems went down?
- If this was the result of malware activity, where did it come from and why was it not detected?
- What was this mystery data, and why could we not find it on the affected machines?
- Why was it always the same amount of data for each individual machine, yet a different amount from one machine to the next?
My first instinct was to get as much information from the firewall logs as I could. Unfortunately, as you can see in the screenshot above, Meraki does not give much information. I was able to see the times and dates of the traffic spikes, I was able to tell that the majority of it was TCP traffic, and I was able to tell that the majority of it was related to a specific application. That was it. At this point, it seemed likely that there was either some sort of runaway process on the machines, or they had been infected with some new sort of malware that was targeting Macs.
I did a fair amount of research but could not find any other reports of malware behaving in this manner. I examined the suspect systems much more closely, looking for any sign of runaway processes, large amounts of data in unusual places, unexpected network connections being made, etc. I came up totally blank on everything. As far as I could tell, the systems were behaving perfectly, didn’t have any large amounts of data where data shouldn’t be, and they didn’t seem to be connecting to any C&C server at any point during the work day.
At this point, we had a decision to make. Do we assume the machines have been compromised, disconnect them from the network, wipe them clean, and live with not knowing exactly what was going on? Or do we leave them on the network, risking further breach if they were infected, sniff the traffic, and find out exactly what was going on? My preference was to figure out exactly what was going on, but I had to clear that with the CEO, who eventually approved it. In the end, we figured that since this had been going on for at least a month without anyone noticing, one more night was unlikely to make much of a difference as far as getting the machines off the network.
The Culprit is Found!
So, the decision was made to leave the machines on overnight and do a full packet capture from each one. This was discovered on a Friday, so I would have all weekend to analyze the results (luckily it did not take that long) and report back. I set up the capture and let it run overnight, collecting the results the next morning in the form of a standard pcap file. I chose the smallest file (3.5 GB) and opened it in Wireshark. I filtered the results down to TCP traffic only (since the Meraki was able to at least tell us it was TCP traffic), and left it sorted by timestamp. I then scrolled down to the approximate time that the behavior started and sure enough there it was.
The first thing I noticed, right off the bat, was that all of the traffic was originating on client networks and being passed to the company’s internal network through the point-to-point VPNs. That was odd, but also somewhat comforting. If the traffic was coming from them, chances are it was something that our systems were requesting for some reason, rather than an outside actor. This also led me to my first real hypothesis of what could be going on. It took only quick glances at the next dozen packets for everything to be confirmed. Every packet, every single one in those 3.5 GB of data, was coming in or going out over port 3283. If you are reading this blog, you probably know exactly what that means – Apple Remote Desktop.
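The same "is it all one port?" check can be scripted instead of eyeballed in Wireshark. The sketch below was not part of the original investigation. It assumes a classic pcap file (not pcapng) captured on Ethernet with IPv4 traffic, and the function name `tally_tcp_ports` and the file name in the usage note are made up for illustration. It reads the capture one record at a time, so a multi-gigabyte file does not need to fit in memory.

```python
import struct
from collections import Counter

# Magic bytes for classic pcap files, mapped to the struct byte-order prefix.
# Assumption: the capture is classic pcap, not pcapng.
PCAP_MAGIC = {
    b"\xd4\xc3\xb2\xa1": "<", b"\xa1\xb2\xc3\xd4": ">",   # microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<", b"\xa1\xb2\x3c\x4d": ">",   # nanosecond timestamps
}

def tally_tcp_ports(path):
    """Count, per port number, how many TCP packets used it as source or destination."""
    counts = Counter()
    with open(path, "rb") as f:
        header = f.read(24)                      # pcap global header
        endian = PCAP_MAGIC.get(header[:4])
        if len(header) < 24 or endian is None:
            raise ValueError("not a classic pcap file")
        while True:
            record = f.read(16)                  # per-packet record header
            if len(record) < 16:
                break
            _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
            frame = f.read(incl_len)
            if len(frame) < incl_len:
                break                            # truncated capture
            # Assumption: Ethernet framing. Skip anything that is not IPv4.
            if len(frame) < 34 or frame[12:14] != b"\x08\x00":
                continue
            if frame[23] != 6:                   # IP protocol field: 6 means TCP
                continue
            tcp_start = 14 + (frame[14] & 0x0F) * 4   # Ethernet + IP header length
            if len(frame) < tcp_start + 4:
                continue
            src_port, dst_port = struct.unpack("!HH", frame[tcp_start:tcp_start + 4])
            counts.update({src_port, dst_port})  # set: count each packet once per port
    return counts
```

Calling `tally_tcp_ports("overnight.pcap").most_common(5)` on a capture like the one described here would be expected to put 3283 at the top of the list, with the remaining entries being the ephemeral ports on the other side of each connection.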
So there I had it, ARD was requesting massive amounts of data from clients between midnight and 1AM. But why? What does ARD request from clients that goes to each workstation at exactly the same time, and can take up that much space? I came to that answer pretty quickly after glancing through the program preferences. System Reports – from every single client machine – to every single workstation that had interacted with them via ARD.
Mystery Solved. Now to Solve the Problem…
If you install ARD using the default installation options (basically just click next, next, next until it’s done), you will tell it to set itself as a reporting server for each client it accesses. This means that if you have 10 different default installations of ARD connecting to a client, that client will have 10 different reporting servers that it will send reports to. Every night. Now, if you have 500 clients being administered by those 10 ARD installations, that becomes 5,000 reports being sent out every single night.
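The multiplication is worth spelling out, because it also shows how much a single Task Server helps. This is plain arithmetic using the illustrative numbers from the paragraph above, not figures measured on this network, and `nightly_reports` is a made-up helper name.

```python
def nightly_reports(clients, reporting_servers):
    """Each client sends one report set to every ARD install registered as its reporting server."""
    return clients * reporting_servers

# Ten default ARD installs, each registered as a reporting server on 500 clients.
print(nightly_reports(500, 10))  # 5000

# The same 500 clients reporting to a single Task Server.
print(nightly_reports(500, 1))   # 500
```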
A quick look through the ARD SysAdmin Manual shows the following: “avoid having all the clients upload their report information at the same time. As the number of clients grows, the network usage from the clients as they upload their report data could come in bursts over a short period of time overwhelming the network buffer on the Task Server. In such a case, you will probably give yourself your own denial-of-service attack.”
Well, that sounds familiar!
So, for those setting up their own networks from scratch, or who are running into the same problem described here, the ideal way to set up ARD reporting is as follows. Have a single “Task Server” and make sure that all of your ARD installs point to that Task Server. Turn off report gathering for all ARD installs except for that Task Server. If you have a large number of computers checking in, create lists of clients on the Task Server (perhaps by department). You can then set the clients in each list to send their reports at different times of the day, thereby eliminating the possibility of DoSing yourself.
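Working out the staggered times by hand gets tedious with more than a few lists, so here is a minimal planning sketch. It does not talk to ARD at all; the times it produces still have to be entered for each list in ARD itself. The list names, the 01:00 start time, and the 30 minute gap are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

def stagger_report_times(client_lists, start="01:00", gap_minutes=30):
    """Give each client list its own upload time, spaced gap_minutes apart."""
    first = datetime.strptime(start, "%H:%M")
    schedule = {}
    for index, name in enumerate(client_lists):
        slot = first + timedelta(minutes=index * gap_minutes)
        schedule[name] = slot.strftime("%H:%M")   # times wrap around past midnight
    return schedule

# Hypothetical department lists.
for name, when in stagger_report_times(["Accounting", "Design", "Sales"]).items():
    print(f"{name}: upload reports at {when}")
```

How wide the gap should be depends on how long one list takes to finish uploading, which is something to measure on your own network rather than guess.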
If you are already in the situation we were in and just want to stop the reports for now, here is how to do it:
- First go to all of your ARD installs and disable report gathering from Preferences > Reporting
- Now from one ARD window select all of the clients that you want to stop the reporting from and “Get Info”
- Lastly, go to the Reporting tab and disable all reports