Phishing. If it started with an “F” you’d love it


Phishing. If it started with an ‘F’, you’d love it. Phishing, as they say, happens. Teaching people how to recognize and avoid phishing is an entire industry. The same goes for understanding how to pull information from the phishing you receive so you can build intelligence. Instead of telling you about it, though, I want to walk through a recent example of some of those techniques in action.

In this case, a phishing email came into the targeted enterprise early on a Tuesday morning. It is normal for phishing to hit early; after all, it is a common habit to grind through email first thing in the morning before starting the day. Monday mornings are definitely more common than Tuesdays. However, that Monday was the day following Easter, so it was not unexpected that many people had taken the day off. One goal in phishing is identical to a goal of candy marketing: you want to shorten the time between the email’s arrival and when it is read. The longer it sits, the lower its impact and the higher the probability that it will be ignored or overlooked. In fact, a good technique in baselining traffic is to find where the highest and lowest points of email volume fall, especially for outgoing email. Then you can map phishing responses to that information and gain a sense of activity and of when you might want heightened alerting or detection.
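As a rough illustration of that baselining idea, here is a minimal Python sketch that buckets outgoing mail by weekday and hour and reports the busiest and quietest buckets. The timestamps are made-up sample data, and the function names (hourly_baseline, peak_and_trough) are my own for illustration; a real baseline would be built from weeks of mail logs, not six rows.

```python
from collections import Counter
from datetime import datetime


def hourly_baseline(timestamps):
    """Count messages per (weekday, hour) bucket from ISO 8601 timestamps."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        # weekday(): Monday is 0, Sunday is 6
        counts[(dt.weekday(), dt.hour)] += 1
    return counts


def peak_and_trough(counts):
    """Return the busiest and quietest observed (weekday, hour) buckets."""
    busiest = max(counts, key=counts.get)
    quietest = min(counts, key=counts.get)
    return busiest, quietest


# Hypothetical sample: outgoing mail timestamps pulled from a mail log.
sent = [
    "2021-04-06T07:05:00", "2021-04-06T07:20:00", "2021-04-06T07:45:00",
    "2021-04-06T13:10:00", "2021-04-06T16:30:00", "2021-04-06T16:40:00",
]

baseline = hourly_baseline(sent)
busiest, quietest = peak_and_trough(baseline)
print("busiest bucket:", busiest, "quietest bucket:", quietest)
```

With the buckets in hand, the arrival time of a phish and the timing of clicks or replies can be laid against the same grid to see whether it landed in a peak.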

Anyway, the alarm was rung for this particular phish due to a couple of characteristics. It was sent as if it came natively from the same enterprise, and it was very well aligned in topic and theme to the audience phished. That group of recipients also contained some sensitive targets, which more than anything bubbled this particular phish up to the attention of the threat and SOC teams. The phish contained a link, and while the SOC chased down the target page, threat intel ran a series of email intelligence procedures against the phish. I’ll save the effort to track down the target webpage and how it collected credentials for another post. Let us focus on the email intelligence steps for the moment. As a set of objectives, we needed to determine if the phish:


Which order is the most “correct” to take when examining a phish is probably worth debate, but I’ve always been a fan of just beginning. Barring cases where one step would inhibit another that has not yet been performed, order is relatively flexible. We worked in what seemed a sensible order.

First was a check to see if we had seen the phish before. Finding a negative, next was an examination of the complete email:

1. Its sender was represented as coming from the enterprise.
2. Emails were sent individually, with no email lists or entries in the cc field.
3. The subject line was clear and in properly formed English.
4. It opened with a generic but common greeting.
5. The text following contained an equally well-formed call to action.
6. A link to take action sat below the call to action.
7. Following the call to action was a closing sentence and signature block.

The signature block was formatted to the enterprise guidelines. The person supposedly sending the email was fictitious. Overall, the email contained a theme and each component reinforced that theme, from the greeting to the closing.

I’m a fan of measuring vertical space from the subject line to the end of an email. It is a simple but handy technique to help identify templates. Templates tend to have size constraints on inputs into user-defined fields. Measuring vertical space can provide hints about how much variance is allowed and where customization creeps into the template. That can lead to interesting intelligence work around identifying phishing as a service (PhaaS) or use of a phishing kit that provides standardization.
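One way to put a number on vertical space is to count the rows a body would occupy when wrapped at a fixed width, then compare that across samples. The sketch below is an assumption about how to approximate it, not the exact measurement we used; the 78-column width and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def vertical_lines(body, width=78):
    """Approximate rendered height: rows used after wrapping at `width` columns."""
    total = 0
    for line in body.splitlines():
        # A blank line still takes one row; long lines wrap onto extra rows.
        total += max(1, -(-len(line) // width))
    return total


def height_profile(bodies, width=78):
    """Summarize height across samples; a tight spread hints at a template."""
    heights = [vertical_lines(b, width) for b in bodies]
    return {
        "min": min(heights),
        "max": max(heights),
        "mean": mean(heights),
        "spread": pstdev(heights),
    }


# Hypothetical bodies standing in for collected samples of one phish.
bodies = [
    "Hello,\n\nPlease verify your account.\n\nThanks",
    "Hello,\n\nPlease verify your account today.\n\nThanks",
]
profile = height_profile(bodies)
print(profile)
```

A spread at or near zero across many recipients suggests fixed-size fields; samples that break the pattern are the ones worth reading by hand.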

After that high-level examination, the email headers were dumped and the email deconstructed. The trail of Received headers was interesting to read but ultimately provided little that we could act on. The Message-ID was a bit more handy, as were the sender email and a few other odds and ends. For every data point we took the following process:

Ultimately, the information drawn from the email headers was a dead end. We had not seen any of it previously, either as individual elements or as compounds of multiple elements. That was good and bad. Undaunted, we moved into investigating the components.
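For anyone who wants to try the header dump described above, Python's standard email package is enough to pull out the Received trail, the Message-ID and the sender. This is a minimal sketch; the raw message is invented sample data using documentation-reserved domains and addresses, and the dictionary keys are simply names I chose.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw message; hosts and IPs are reserved documentation values.
RAW = """\
Received: from mail.example.net (mail.example.net [203.0.113.10])
\tby mx.example.com with ESMTP id abc123; Mon, 1 Jan 2024 07:02:11 +0000
Received: from localhost (unknown [198.51.100.7])
\tby mail.example.net with SMTP id xyz789; Mon, 1 Jan 2024 07:02:09 +0000
Message-ID: <20240101070209.12345@mail.example.net>
From: "Account Services" <accounts@example.com>
Subject: Please verify your account

Body text here.
"""


def extract_data_points(raw):
    """Pull the header elements worth searching on, alone or in combination."""
    msg = message_from_string(raw)
    display_name, address = parseaddr(msg.get("From", ""))
    message_id = (msg.get("Message-ID") or "").strip("<> ")
    return {
        "received": msg.get_all("Received", []),  # newest hop first
        "message_id": message_id,
        "message_id_domain": message_id.rpartition("@")[2],
        "sender_name": display_name,
        "sender_address": address,
        "sender_domain": address.rpartition("@")[2],
    }


points = extract_data_points(RAW)
for key, value in points.items():
    print(key, "=>", value)
```

A mismatch between the Message-ID domain and the sender domain, as in this sample, is the kind of small detail that becomes a search term.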

The link provided in the email was to a website that went down a few hours after the first wave of phishing was sent. The domain was newly created the night before via a South African registrar and hosted by a different South Asia provider. WHOIS information was blocked. The domain naming was loosely linked to the theme, though the TLD was non-standard. The IP space was linked, with low confidence, to a bulletproof hosting service. The SOC had a snapshot of the webpage drawn from network traffic. It was a copycat of a page we hosted that prompted users to provide credentials. The credentials were harvested, packaged and pushed to another server (which was inaccessible). That turned into a dead end, but did speak to objective #3. Within the page, we were not lucky enough for them to have reused the Google Analytics code embedded in the page, but they did use the other scripts. Some of those were used in only a few specific places, providing a relatively small pool of calls to search within to define a time frame. The hope was to get insight into possible testing on their part and more DNS to sleuth. Sadly, we were thwarted there, but we did discover a few other people who had clicked the link but had not yet reported the phish.

The greeting was too generic to be useful, but the call to action and email theme were more specific. We had around 60 samples of the email at this point from people who had received the phish. Across those samples, the call to action and text were 86% identical (Python automation is wonderful). That helped define likely boilerplate text and phrasing. In a few cases, the name and title of the person in focus for the phish were not a good fit, something a person typing the email manually would likely have changed. The same was true in a few other areas of the call to action and closing sentence. This helped define the edges of where user-defined input fields likely existed in the template boilerplate. It also started firming up the idea that the phishing was generated by a service or kit (objective #4).
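A comparison like that can be sketched with nothing more than difflib from the standard library. This is one plausible way to get a similarity figure and to locate the template's variable fields, not necessarily the script we ran; the two sample sentences and the function names are invented for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_similarity(texts):
    """Average pairwise similarity ratio (0.0 to 1.0) across all samples."""
    ratios = [
        SequenceMatcher(None, a, b).ratio() for a, b in combinations(texts, 2)
    ]
    return sum(ratios) / len(ratios)


def variable_spans(reference, other):
    """Return (reference_text, other_text) pairs where the word runs differ."""
    ref_words, other_words = reference.split(), other.split()
    matcher = SequenceMatcher(None, ref_words, other_words)
    return [
        (" ".join(ref_words[i1:i2]), " ".join(other_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical samples standing in for two recipients' copies of the phish.
samples = [
    "Dear colleague, Director Smith asks that you verify your account today.",
    "Dear colleague, Manager Jones asks that you verify your account today.",
]

print(f"mean similarity: {mean_similarity(samples):.0%}")
print("likely template fields:", variable_spans(samples[0], samples[1]))
```

The stretches that stay equal across every pair are the boilerplate; the spans that change are where the template accepts input.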

As with all elements, raw and compound searches were performed for the theme and call to action. That led to some of the first good information that could speak to targeting (objective #1). We polled (Python again) a number of open source and private sources of phishing data, but got negative results. A raw search fared a little better, but a compound search on fragments led to a completely different result. Another entity in our economic niche had circulated a notice via their website to the people they serve, alerting them not to respond to a phish that was a near-identical match to ours! That provided another stream of information, and continuing pivots uncovered another dozen identical and near-identical phishes and a good-sized list of likely ones, starting a little more than a month prior. Where entities had posted snapshots or excerpts of the phish, examination shored up the thought that a service or kit was in action. The few with the biggest deviation appeared to hark back to a point where a change was made in the underlying text. That helped answer the biggest question management had about the phish and shifted the thinking from an attack singularly focused on us to something wider against our market niche.
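To make the raw versus compound distinction concrete: a raw search quotes the whole passage, while a compound search breaks it into short fragments and requires several of them together, which survives small edits to the template. The sketch below assumes a search source that accepts quoted phrases joined with AND; that syntax, the fragment size and the sample sentence are all illustrative assumptions.

```python
def fragments(text, size=4):
    """Slide a window of `size` words across the text to produce fragments."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def raw_query(text):
    """Exact-phrase search on the full passage."""
    return f'"{text}"'


def compound_query(text, size=4, step=4):
    """AND together quoted fragments, taking every `step`-th window."""
    frags = fragments(text, size)[::step]
    return " AND ".join(f'"{f}"' for f in frags)


# Hypothetical call-to-action text.
call_to_action = "please verify your account before access is suspended"

print(raw_query(call_to_action))
print(compound_query(call_to_action))
```

Varying the fragment size and step changes how tolerant the compound query is to reworded copies of the same phish.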

We also re-examined the DNS for the domain in our phishing email and for those in the uncovered ones. It showed a definitive pattern and hinted toward more domains, which we added to the suspicious/likely phished list. Finding archive snapshots (Python!) of those subdomains helped pin that verification down.
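I have not said which archive we pulled from, so treat the following as one possible approach: a sketch against the Internet Archive's Wayback Machine availability endpoint, which returns the closest snapshot for a URL as JSON. The hostname in the example is a placeholder, the helper names are mine, and the lookup function needs network access to run.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine availability API is the archive source.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(domain, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": domain}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url"), closest.get("timestamp")
    return None


def lookup(domain, timestamp=None, timeout=10):
    """Query the archive for one domain (performs a network request)."""
    with urlopen(availability_url(domain, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))


# Placeholder hostname; no request is made until lookup() is called.
print(availability_url("login.example.com", "20240101"))
```

Looping lookup() over the list of suspicious subdomains gives a quick yes or no on whether a snapshot exists before anyone spends time on manual review.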

Another point of interest was the signature. While the name and email were fictitious, the number was not. It matched an office in our enterprise and only appeared in three places on the website. One of those was a link off the copied webpage used in the phish. That page contained several points of contact and a form that allowed individuals to create custom lists of publications we provided. A few items stood out immediately. First, the fictitious name was a montage of the names in the points of contact. Second, by pulling down the various publications we found emails matching the targets of the phishing. The list was a bit larger than just that pool of victims, and it was not discernible why some, but not all, were targeted. It did help address objective #2, as the sensitive emails that rang the alarm on the phish were attributable to this location. The copied webpage also contained a note about verifying accounts, which aligned to the theme of the phish. Both of these facts lowered confidence in the idea that an insider was involved or that inside information had leaked.
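The comparison between published addresses and phished recipients is simple set arithmetic once the addresses are extracted. Here is a minimal sketch, assuming the publications have already been converted to plain text; the regular expression is a deliberately loose approximation of an email address, and every address shown is invented.

```python
import re

# Loose pattern for illustration; it is not a full RFC 5322 address parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_emails(documents):
    """Collect every address found in the given texts, lowercased."""
    found = set()
    for text in documents:
        found.update(match.lower() for match in EMAIL_RE.findall(text))
    return found


def compare(harvested, recipients):
    """Split addresses by whether they were published, targeted, or both."""
    recipients = {r.lower() for r in recipients}
    return {
        "targeted_and_published": harvested & recipients,
        "published_not_targeted": harvested - recipients,
        "targeted_not_published": recipients - harvested,
    }


# Hypothetical publication text and phish recipient list.
publications = [
    "Contact a.lee@example.com or B.Ortiz@example.com.",
    "Editor: c.khan@example.com",
]
phish_recipients = ["a.lee@example.com", "c.khan@example.com", "d.roy@example.com"]

result = compare(harvest_emails(publications), phish_recipients)
for bucket, addresses in result.items():
    print(bucket, sorted(addresses))
```

A recipient who lands in the targeted-but-not-published bucket is a reason to keep looking for a second source of addresses.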

In the wrap-up, we had gleaned enough intelligence on the email to speak clearly to three of the four objectives and had good hints toward the fourth. It was a good exercise of email intelligence procedures and helped us get a win.

Image by Gino Crescoli from Pixabay

About the author

Monty St John

Monty is a security professional with more than two decades of experience in threat intelligence, digital forensics, malware analytics, quality services, software engineering, development, IT/informatics, project management and training. He is an ISO 17025 laboratory auditor and assessor, reviewing and auditing 40+ laboratories. Monty is also a game designer and publisher who has authored more than 24 products and 35 editorial works.