DETECTING AND THWARTING SPEAR PHISHING ATTACKS IN ELECTRONIC MESSAGES
A computer-implemented method may comprise receiving an electronic message from a purported known sender; accessing a database of known senders and determining whether the sender matches one of the known senders. The degree of similarity of the sender to at least one of the known senders may then be quantified. The received message may then be determined to be legitimate when the purported known sender is determined to match one of the known senders. The received electronic message may be flagged as being suspect when the purported known sender does not match one of the plurality of known senders and the quantified degree of similarity of the purported known sender to one of the known senders is greater than a threshold value. A perceptible cue may then be generated when the received message has been flagged as being suspect, to alert the recipient that the flagged message is likely illegitimate.
The present application is related in subject matter to commonly-owned and co-pending U.S. patent application Ser. No. 14/542,939 filed on Nov. 17, 2014 entitled “Methods and Systems for Phishing Detection”, which is incorporated herein by reference in its entirety.
BACKGROUND
A spear phishing email is an email that appears to be from a known person or entity, but is not. The spear phisher often knows the recipient victim's name, address, job title and professional network. The spear phisher knows a lot about his intended victim, thanks to the quantity and rich variety of information available publicly through online sources, the media and social networks.
Spear phishing is a growing threat. Spear phishing is, however, very different from a phishing attack. The differences between a phishing attack and a spear phishing attack may include the following:
- The target of a spear phishing attack is usually a member of the corporate market, and especially people who have access to sensitive resources of the company. Typical targets are accountants, lawyers and top management executives. In contrast, phishing attacks tend to target all end users more indiscriminately.
- Most often, a spear phishing attack is initiated only after a thorough analysis of the target victim. This analysis is aided by the great amount of personal and professional information available on social networks (including, for example, Facebook, Twitter, LinkedIn and the like), company websites and other media. Consequently, a spear phishing attack is often crafted to be unique to the targeted individual. Phishing attacks, on the other hand, tend to be more indiscriminate, typically targeting thousands of people.
- In the first phase of a spear phishing attack, the email purports to originate from a well-known (to the targeted victim) and trusted individual, such as a coworker. In contrast, phishing emails typically appear to originate from a trusted company (PayPal, Dropbox, Apple, Google, etc.).
- The second phase of a spear phishing attack has a different modus operandi: a malicious attachment or a malicious Uniform Resource Locator (URL) that leads the victim to install malware that will perform malicious operations (e.g., theft of data). Alternatively, the spear phishing email may contain text in the body of the email that induces or dupes the victim to perform a predetermined action (e.g., send a wire transfer, disclose sensitive information or the like). In contrast, phishing attacks typically rely on the inclusion of a malicious URL only.
According to one embodiment, to protect a user from a spear phishing attack, a protection layer may be applied for each phase of the spear phishing attack. That is, during the first phase of the spear phishing attack, one embodiment detects whether an impersonation of a known sender is likely. During the second phase of the spear phishing attack, a detection procedure may be carried out to determine whether the suspicious email contains a malicious attachment, a malicious URL or suspect text in the body of the email.
According to one embodiment, to detect whether an email constitutes a potential spear phishing attack, the “From” email address (the sender's email address) may be scrutinized to detect whether the sender is a legitimate, known and trusted entity or is potentially an impersonation of the same. According to one embodiment, when a user receives an email, a check may be carried out to determine if the sender's email address is a known contact of the email recipient. If the sender's email address looks like but is in any way different from a known contact of the recipient, the email recipient may be warned (through the generation of a visual and/or audio cue, for example) that the email is at least potentially illegitimate, since impersonating a known contact is the essence of a spear phishing attack.
One embodiment is configured to protect the user (e.g., an email recipient) by carrying out activities including:
- 1. Managing, for the protected user, a list of his or her known email contacts called KNOWN_CONTACTS;
- 2. Managing, for the protected user, a list of blacklisted email contacts called BLACKLIST;
- 3. Checking each incoming email to determine whether the sender email address looks like the email address of a known email contact; and
- 4. Warning the end user if an incoming email is determined to be potentially illegitimate.
Managing List of Known Email Contacts
According to one embodiment, a list of the protected user's known email contacts, called KNOWN_CONTACTS, may be created and maintained. All email addresses in this list may be stored in lowercase. According to one embodiment, the KNOWN_CONTACTS list may be initially seeded from the protected user's address book. According to one embodiment, the protected user's address book, for performance and accuracy reasons, may not be used if it exceeds a predetermined maximum number of entries (1,000, for example). This predetermined maximum number of entries may be represented by an ADDRESS_BOOK_MAX_SIZE variable (whose default value may be set at 1,000). Very large address books may, for example, be associated with very large companies that share the whole company address book with all employees.
Another source of legitimate email addresses with which to populate the KNOWN_CONTACTS list is the sender addresses of emails received by the end user, with the exception of automated emails such as email alerts, newsletters, advertisements or any email that has been sent by an automated process. The email addresses of people to whom the end user has sent an email are another source of legitimate email addresses. According to one embodiment, KNOWN_CONTACTS may be updated in one or more of the following cases:
- When the address book is updated;
- When the protected user receives an email from a non-suspect new contact, with the exception of automated emails such as email alerts, newsletters, advertisements or any email that has been sent by an automated process; and/or
- When the end user sends an email to a new contact.
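The list management described above can be sketched in Python as follows. This is only an illustrative sketch: the function names and the `automated` flag are hypothetical, and how automated emails are recognized is not specified in the text, so that decision is left to the caller.

```python
ADDRESS_BOOK_MAX_SIZE = 1000  # default value given in the text


def seed_known_contacts(address_book):
    """Build the initial KNOWN_CONTACTS set, stored in lowercase.

    An address book with more than ADDRESS_BOOK_MAX_SIZE entries is
    not used, and the list starts out empty.
    """
    if len(address_book) > ADDRESS_BOOK_MAX_SIZE:
        return set()
    return {address.lower() for address in address_book}


def add_known_contact(known_contacts, address, automated=False):
    """Record a new contact unless the email came from an automated process.

    The `automated` flag is a hypothetical input supplied by the caller.
    """
    if not automated:
        known_contacts.add(address.lower())
```

A set is used here so that membership checks on incoming sender addresses stay fast; any container with equivalent lookup behavior would serve.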
Managing List of Blacklisted Contacts
According to one embodiment, a list of blacklisted email contacts called BLACKLIST may also be established and managed. All email addresses in this list are stored as lowercase. According to one embodiment, if an email is sent by a sender whose email address belongs to BLACKLIST, then that email will be dropped and will not be delivered to the protected user.
Detecting a Potentially Suspect or Illegitimate Email Address
When a protected user receives an email, a check may be carried out to determine whether the sender's email address is known. The KNOWN_CONTACTS list may be consulted for this purpose. If the email address is not known (e.g., is not present in the KNOWN_CONTACTS list), a determination may be carried out, according to one embodiment, to determine whether the email address looks like or is otherwise similar to a known address. An email address is made up of a local part, the @ symbol and a domain part:
- The local part is the left side of the email address, before the @ symbol. For example, john.smith is the local part of john.smith@gmail.com.
- The domain is the right side of the email address, after the @ symbol. For example, gmail.com is the domain of john.smith@gmail.com.
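Splitting an address into these two parts might look like the following Python sketch. The helper name `split_address` is hypothetical, and lowercasing is assumed here only because the lists described earlier store addresses in lowercase.

```python
def split_address(address):
    """Split an email address into (local part, domain), both lowercased."""
    # rpartition splits on the last '@', so a local part that itself
    # contains '@' (possible in quoted form) is kept in one piece.
    local, separator, domain = address.lower().rpartition("@")
    if not separator or not local or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    return local, domain
```

For example, `split_address("john.smith@gmail.com")` yields the local part `john.smith` and the domain `gmail.com`.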
According to one embodiment, an email may be considered to be suspect or potentially illegitimate if both of the following conditions are met:
- The email address is not in KNOWN_CONTACTS, and
- The local part of the email address is equal or close to the local part of an email address of KNOWN_CONTACTS.
According to one embodiment, a detection process may be carried out to determine whether the local part of the received email address has been spoofed, to appear to resemble the local part of an email address in the KNOWN_CONTACTS list. According to one embodiment, such a detection process may utilize a string metric to compare the local part of an email address in the KNOWN_CONTACTS with the local part of the received email address. A string metric (also known as a string similarity metric or string distance function) is a metric that measures distance (“inverse similarity”) between two text strings for approximate string matching or comparison and in fuzzy string searching. A string metric may provide a number that is an indication of the distance or similarity between two (e.g., alpha or alphanumeric) strings.
One embodiment utilizes the Levenshtein Distance (also known as Edit Distance). The Levenshtein Distance between two sequences of characters is the minimum number of single-character edits (i.e., insertions, deletions or substitutions) required to change one sequence of characters (e.g., the local part of the received email address) into the other (e.g., the local part of an email address in the KNOWN_CONTACTS list). One embodiment, therefore, computes a string metric such as the Levenshtein Distance to detect whether the local part of the received email address has likely been spoofed. Other string metrics that may be used in this context include, for example, the Damerau-Levenshtein distance. Others may be used to good benefit as well.
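For illustration, the Levenshtein Distance can be computed with the standard dynamic-programming approach. The Python sketch below is one common formulation and is not taken from the original application; the function name is hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]
```

With this function, the local parts `john.smith` and `john.smlth` are at distance 1, since a single substitution turns one into the other.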
According to one embodiment, an email address is considered as suspect if the string metric (the Levenshtein Distance in one implementation) d between the local part of the email address and the local part of an email address of KNOWN_CONTACTS is such that
d ≤ STRING_METRIC_DISTANCE_THRESHOLD
One implementation may include the following functionality:
Above, the minimum length for the local part of the email address has been set at 6 characters and the STRING_METRIC_DISTANCE_THRESHOLD has been set at 2. Of course, other values may be substituted for these values. Indeed, the parameters STRING_METRIC_DISTANCE_THRESHOLD and localpart_min_length may be readily configured according to operational conditions and according to the security policies of the deploying organization.
For example, if the STRING_METRIC_DISTANCE_THRESHOLD is increased, a greater number of spoofing attempts may be detected, but a greater number of false positives (email addresses that are legitimate but are flagged as potentially illegitimate) may be generated. A greater number of false positives may erode the user experience and degrade the confidence of the protected user in the system and may lead the user to disregard flagged emails.
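The detection rule described in this section can be sketched in Python as follows. This is a sketch under stated assumptions: the function names are hypothetical, the values 2 and 6 are the example values mentioned in the text, and the minimum length is assumed to apply to the local part of the received address, which the text does not state explicitly.

```python
STRING_METRIC_DISTANCE_THRESHOLD = 2  # example value from the text
LOCALPART_MIN_LENGTH = 6              # example value from the text


def levenshtein(a, b):
    """Levenshtein Distance between strings a and b (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def is_suspect(sender, known_contacts):
    """Return True if sender is unknown but its local part is equal or
    close to the local part of an address in known_contacts."""
    sender = sender.lower()
    if sender in known_contacts:
        return False  # exact match: a known contact
    local = sender.rpartition("@")[0]
    # Assumption: short local parts are skipped, since a distance of 2
    # says little about very short strings.
    if len(local) < LOCALPART_MIN_LENGTH:
        return False
    return any(
        levenshtein(local, known.rpartition("@")[0])
        <= STRING_METRIC_DISTANCE_THRESHOLD
        for known in known_contacts
    )
```

With `john.smith@gmail.com` as the only known contact, both `john.smlth@gmail.com` (local part at distance 1) and `john.smith@gmai1.com` (identical local part, different domain) would be reported as suspect. Raising the threshold widens the set of local parts that match, which is the source of the additional false positives discussed above.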
Flagging an Email as Potentially Illegitimate/Generating Warning Cue
If the email address is suspect, a visual (for example) cue (such as a message) may be generated to warn the protected user. According to one embodiment, the protected user may then be called upon to make a decision to:
- confirm that the email address is suspect—the email address is then added to BLACKLIST and the email is dropped; or
- deny that the email address is suspect—the email address is then added to KNOWN_CONTACTS and the email is delivered to the protected user.
One implementation may include the following functionality:
As shown at B38, if the purported known sender does not match one of the plurality of known senders in the database of known senders and the quantified degree of similarity of the purported known sender of the electronic message to one of the plurality of known senders of electronic messages is indeed greater than the threshold value, the received electronic message may be flagged as being suspect. Thereafter, a visual and/or other perceptible cue, warning message, dialog box and the like may be generated when the received electronic message has been flagged as being suspect, to alert the recipient thereof that the flagged electronic message is likely illegitimate.
According to one embodiment, the electronic message may be or may comprise an email. In Block B33, the quantifying may comprise calculating a string metric of the difference between the purported sender and one of the plurality of known senders in the database of known senders. In one embodiment, the string metric may comprise a Levenshtein distance between the purported sender and one of the plurality of known senders in the database of known senders.
After block B39, a prompt may be generated, to solicit a decision confirming the flagged electronic message as being suspect or a decision denying that the flagged electronic message is suspect. Thereafter, the electronic message flagged as suspect may be dropped when the prompted decision is to confirm that the flagged electronic message is suspect and the flagged electronic message may be delivered to its intended recipient when the prompted decision is to deny that the flagged electronic message is suspect.
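Taken together, the handling of an incoming message might be sketched in Python as follows. All names here are hypothetical: `looks_like_known` stands in for the similarity check (for example, a thresholded Levenshtein Distance on the local part) and `confirm_suspect` stands in for the prompt shown to the protected user. Adding non-suspect new senders to KNOWN_CONTACTS is left out of this sketch.

```python
from enum import Enum


class Verdict(Enum):
    DROPPED = "dropped"
    DELIVERED = "delivered"


def process_message(sender, known_contacts, blacklist,
                    looks_like_known, confirm_suspect):
    """Decide whether a message from sender is dropped or delivered.

    looks_like_known(sender, known_contacts) -> bool  (similarity check)
    confirm_suspect(sender) -> bool                   (user's decision)
    """
    sender = sender.lower()
    if sender in blacklist:
        return Verdict.DROPPED    # blacklisted senders are never delivered
    if sender in known_contacts:
        return Verdict.DELIVERED  # exact match: treated as legitimate
    if looks_like_known(sender, known_contacts):
        # Flagged as suspect: the user is warned and asked to decide.
        if confirm_suspect(sender):
            blacklist.add(sender)
            return Verdict.DROPPED
        known_contacts.add(sender)
        return Verdict.DELIVERED
    # Unknown sender that does not resemble any known contact.
    return Verdict.DELIVERED
```

Passing the two checks in as callables keeps the flow independent of the particular string metric and of how the warning is presented to the user.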
Any reference to an engine in the present specification refers, generally, to a program (or group of programs) that performs a particular function or series of functions that may be related to functions executed by other programs (e.g., the engine may perform a particular function in response to another program or may cause another program to execute its own function). Engines may be implemented in software or hardware as in the context of an appropriate hardware device such as an algorithm embedded in a processor or application-specific integrated circuit.
Embodiments of the present invention are related to the use of computing devices 412, 408, 410 to detect and compute a probability that a received email may be or may include a spear phishing attack. According to one embodiment, the methods and systems described herein may be provided by one or more computing devices 412, 408, 410 in response to processor(s) 502 executing sequences of instructions contained in memory 504. Such instructions may be read into memory 504 from another computer-readable medium, such as data storage device 507. Execution of the sequences of instructions contained in memory 504 causes processor(s) 502 to perform the steps and have the functionality described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the described embodiments. Thus, embodiments are not limited to any specific combination of hardware circuitry and software. Indeed, it should be understood by those skilled in the art that any suitable computer system may implement the functionality described herein. The computing devices may include one or a plurality of microprocessors working to perform the desired functions. In one embodiment, the instructions executed by the microprocessor or microprocessors are operable to cause the microprocessor(s) to perform the steps described herein. The instructions may be stored in any computer-readable medium. In one embodiment, they may be stored on a non-volatile semiconductor memory external to the microprocessor, or integrated with the microprocessor. In another embodiment, the instructions may be stored on a disk and read into a volatile semiconductor memory before execution by the microprocessor.
While certain example embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the embodiments disclosed herein. Thus, nothing in the foregoing description is intended to imply that any particular feature, characteristic, step, module, or block is necessary or indispensable. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the embodiments disclosed herein.
Claims
1. A computer-implemented method, comprising:
- receiving an electronic message from a purported known sender over a computer network;
- accessing a database configured to store a plurality of known senders of electronic messages and determining whether the purported known sender of the electronic message matches one of the plurality of known senders of electronic messages in the database of known senders;
- quantifying a degree of similarity of the purported known sender of the electronic message to at least one of the plurality of known senders of electronic messages stored in the database;
- determining the received electronic message to be legitimate when the purported known sender is determined to match one of the plurality of known senders in the database of known senders;
- flagging the received electronic message as being suspect when: the purported known sender does not match one of the plurality of known senders in the database of known senders; and the quantified degree of similarity of the purported known sender of the electronic message to one of the plurality of known senders of electronic messages is greater than a threshold value; and
- generating at least a visual cue when the received electronic message has been flagged as being suspect, to alert a recipient thereof that the flagged electronic message is likely illegitimate.
2. The computer-implemented method of claim 1, wherein the electronic message comprises an email.
3. The computer-implemented method of claim 1, wherein quantifying comprises calculating a string metric of a difference between the purported sender and one of the plurality of known senders in the database of known senders.
4. The computer-implemented method of claim 1, wherein quantifying comprises calculating a Levenshtein distance between the purported sender and one of the plurality of known senders in the database of known senders.
5. The computer-implemented method of claim 1, further comprising prompting for a decision confirming the flagged electronic message is suspect or a decision denying that the flagged electronic message is suspect.
6. The computer-implemented method of claim 5, further comprising dropping the flagged electronic message when the prompted decision is to confirm that the flagged electronic message is suspect and delivering the flagged electronic message when the prompted decision is to deny that the flagged electronic message is suspect.
7. The computer-implemented method of claim 1, wherein accessing also accesses a database of blacklisted senders of electronic messages and dropping the received electronic message if a sender of the received electronic message matches an entry in the database of blacklisted senders of electronic messages.
8. A computing device configured to determine whether a received electronic message comprises a spear phishing attack, comprising:
- at least one processor;
- at least one data storage device coupled to the at least one processor;
- a plurality of processes spawned by said at least one processor, the processes including processing logic for:
- receiving an electronic message from a purported known sender over a computer network;
- accessing a database configured to store a plurality of known senders of electronic messages and determining whether the purported known sender of the electronic message matches one of the plurality of known senders of electronic messages in the database of known senders;
- quantifying a degree of similarity of the purported known sender of the electronic message to at least one of the plurality of known senders of electronic messages stored in the database;
- determining the received electronic message to be legitimate when the purported known sender is determined to match one of the plurality of known senders in the database of known senders;
- flagging the received electronic message as being suspect when: the purported known sender does not match one of the plurality of known senders in the database of known senders; and the quantified degree of similarity of the purported known sender of the electronic message to one of the plurality of known senders of electronic messages is greater than a threshold value; and
- generating at least a visual cue when the received electronic message has been flagged as being suspect, to alert a recipient thereof that the flagged electronic message is likely illegitimate.
9. The computing device of claim 8, wherein the electronic message comprises an email.
10. The computing device of claim 8, wherein quantifying comprises calculating a string metric of a difference between the purported sender and one of the plurality of known senders in the database of known senders.
11. The computing device of claim 8, wherein quantifying comprises calculating a Levenshtein distance between the purported sender and one of the plurality of known senders in the database of known senders.
12. The computing device of claim 8, wherein the processes further comprise processing logic for prompting for a decision confirming the flagged electronic message is suspect or a decision denying that the flagged electronic message is suspect.
13. The computing device of claim 12, wherein the processes further comprise processing logic for dropping the flagged electronic message when the prompted decision is to confirm that the flagged electronic message is suspect and for delivering the flagged electronic message when the prompted decision is to deny that the flagged electronic message is suspect.
14. The computing device of claim 8, wherein the processes further comprise processing logic for accessing a database of blacklisted senders of electronic messages and dropping the received electronic message if a sender of the received electronic message matches an entry in the database of blacklisted senders of electronic messages.
15. A tangible, non-transitory machine-readable data storage device having data stored thereon representing sequences of instructions which, when executed by a computing device, cause the computing device to:
- receive an electronic message from a purported known sender over a computer network;
- access a database configured to store a plurality of known senders of electronic messages and determine whether the purported known sender of the electronic message matches one of the plurality of known senders of electronic messages in the database of known senders;
- quantify a degree of similarity of the purported known sender of the electronic message to at least one of the plurality of known senders of electronic messages stored in the database;
- determine the received electronic message to be legitimate when the purported known sender is determined to match one of the plurality of known senders in the database of known senders;
- flag the received electronic message as being suspect when: the purported known sender does not match one of the plurality of known senders in the database of known senders; and the quantified degree of similarity of the purported known sender of the electronic message to one of the plurality of known senders of electronic messages is greater than a threshold value; and
- generate at least a visual cue when the received electronic message has been flagged as being suspect, to alert a recipient thereof that the flagged electronic message is likely illegitimate.
16. The tangible, non-transitory machine-readable data storage device of claim 15, wherein the electronic message comprises an email.
17. The tangible, non-transitory machine-readable data storage device of claim 15, wherein quantifying comprises calculating a string metric of a difference between the purported sender and one of the plurality of known senders in the database of known senders.
18. The tangible, non-transitory machine-readable data storage device of claim 15, wherein quantifying comprises calculating a Levenshtein distance between the purported sender and one of the plurality of known senders in the database of known senders.
19. The tangible, non-transitory machine-readable data storage device of claim 15, wherein the stored sequences of instructions further comprise prompting for a decision confirming the flagged electronic message is suspect or a decision denying that the flagged electronic message is suspect.
20. The tangible, non-transitory machine-readable data storage device of claim 15, wherein the stored sequences of instructions further comprise dropping the flagged electronic message when the prompted decision is to confirm that the flagged electronic message is suspect and delivering the flagged electronic message when the prompted decision is to deny that the flagged electronic message is suspect.
21. The tangible, non-transitory machine-readable data storage device of claim 15, wherein the stored sequences of instructions further comprise accessing a database of blacklisted senders of electronic messages and dropping the received electronic message if a sender of the received electronic message matches an entry in the database of blacklisted senders of electronic messages.
Type: Application
Filed: Sep 22, 2015
Publication Date: Mar 23, 2017
Inventor: Sebastien GOUTAL (CYSOING)
Application Number: 14/861,846