Patents by Inventor Mitesh Dalal

Mitesh Dalal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11956272
    Abstract: Aspects of the disclosure relate to identifying legitimate websites and removing false positives from domain discovery analysis. Based on a list of known legitimate domains, a computing platform may generate a baseline dataset of feature vectors corresponding to the known legitimate domains. Subsequently, the computing platform may receive information identifying a first domain for analysis and may execute one or more machine learning algorithms to compare the first domain to the baseline dataset. Based on execution of the one or more machine learning algorithms, the computing platform may generate first domain classification information indicating that the first domain is a legitimate domain. In response to determining that the first domain is a legitimate domain, the computing platform may send one or more commands directing a domain identification system to remove the first domain from a list of indeterminate domains maintained by the domain identification system.
    Type: Grant
    Filed: November 22, 2022
    Date of Patent: April 9, 2024
    Assignee: Proofpoint, Inc.
    Inventors: Hung-Jen Chang, Gaurav Mitesh Dalal, Ali Mesdaq
  • Publication number: 20240095289
    Abstract: To find enriching contextual information for an abbreviated domain name, a data enrichment engine can comb through web content source code corresponding to the abbreviated domain name. From textual content in the web content source code, the data enrichment engine can identify words with initial characters that match characters of the abbreviated domain name to thereby establish a relationship there-between. This relationship can facilitate more accurate and efficient domain name classification. The data enrichment engine can query a WHOIS server to find out if candidate domains having initial characters that match the characters of the abbreviated domain name are registered to the same entity. If so, keywords can be extracted from the candidate domains and used to find more relevant domains for domain risk analysis and detection. Candidate domains determined by the data enrichment engine can be provided to a downstream computing facility such as a domain filter.
    Type: Application
    Filed: November 17, 2023
    Publication date: March 21, 2024
    Inventors: Gaurav Mitesh Dalal, Ali Mesdaq, Hung-Jen Chang
  • Patent number: 11868412
    Abstract: To find enriching contextual information for an abbreviated domain name, a data enrichment engine can comb through web content source code corresponding to the abbreviated domain name. From textual content in the web content source code, the data enrichment engine can identify words with initial characters that match characters of the abbreviated domain name to thereby establish a relationship there-between. This relationship can facilitate more accurate and efficient domain name classification. The data enrichment engine can query a WHOIS server to find out if candidate domains having initial characters that match the characters of the abbreviated domain name are registered to the same entity. If so, keywords can be extracted from the candidate domains and used to find more relevant domains for domain risk analysis and detection. Candidate domains determined by the data enrichment engine can be provided to a downstream computing facility such as a domain filter.
    Type: Grant
    Filed: November 19, 2021
    Date of Patent: January 9, 2024
    Assignee: Proofpoint, Inc.
    Inventors: Gaurav Mitesh Dalal, Ali Mesdaq, Hung-Jen Chang
  • Publication number: 20230308463
    Abstract: A threat actor identification system that obtains domain data for a set of domains, generates domain clusters, determines whether the domain clusters are associated with threat actors, and presents domain data for the clusters that are associated with threat actors to brand owners that are associated with the threat actors. The clusters may be generated based on similarities in web page content, domain registration information, and/or domain infrastructure information. For each cluster, a clustering engine determines whether the cluster is associated with a threat actor, and for clusters that are associated with threat actors, corresponding domain information is stored for presentation to brand owners to whom the threat actor poses a threat.
    Type: Application
    Filed: May 24, 2023
    Publication date: September 28, 2023
    Inventors: Gaurav Mitesh Dalal, Hung-Jen Chang, Ali Mesdaq
  • Patent number: 11700272
    Abstract: A threat actor identification system that obtains domain data for a set of domains, generates domain clusters, determines whether the domain clusters are associated with threat actors, and presents domain data for the clusters that are associated with threat actors to brand owners that are associated with the threat actors. The clusters may be generated based on similarities in web page content, domain registration information, and/or domain infrastructure information. For each cluster, a clustering engine determines whether the cluster is associated with a threat actor, and for clusters that are associated with threat actors, corresponding domain information is stored for presentation to brand owners to whom the threat actor poses a threat.
    Type: Grant
    Filed: February 3, 2021
    Date of Patent: July 11, 2023
    Assignee: PROOFPOINT, INC.
    Inventors: Gaurav Mitesh Dalal, Hung-Jen Chang, Ali Mesdaq
  • Publication number: 20230205823
    Abstract: An intelligent clustering system has a dual-mode clustering engine for mass-processing and stream-processing. A tree data model is utilized to describe heterogenous data elements in an accurate and uniform way and to calculate a tree distance between each data element and a cluster representative. The clustering engine performs element clustering, through sequential or parallel stages, to cluster the data elements based at least in part on calculated tree distances and parameter values reflecting user-provided domain knowledge on a given objective. The initial clusters thus generated are fine-tuned by undergoing an iterative self-tuning process, which continues when new data is streamed from data source(s). The clustering engine incorporates stage-specific domain knowledge through stage-specific configurations. This hybrid approach combines strengths of user domain knowledge and machine learning power.
    Type: Application
    Filed: March 7, 2023
    Publication date: June 29, 2023
    Inventors: Hung-Jen Chang, Gaurav Mitesh Dalal, Ali Mesdaq
  • Patent number: 11671456
    Abstract: A rules engine is adapted for analyzing each match produced by a domain discovery system as matching a seed domain. Utilizing a natural language processing (NLP) library, the rules engine determines segments from the match, assigns a lexical category to each segment based on the context in how a seed domain string is used, and compares the lexical category of the segment that is closest to the seed domain string with a lexical category of the seed domain string. Based on the comparing, the rules engine determines whether the match is relevant to the seed domain and, if not, the match produced by the domain discovery system is identified as a false positive and automatically removed from a set of matches produced by the domain discovery system for the seed domain.
    Type: Grant
    Filed: May 11, 2020
    Date of Patent: June 6, 2023
    Assignee: PROOFPOINT, INC.
    Inventors: Gaurav Mitesh Dalal, Hung-Jen Chang, Ali Mesdaq
  • Publication number: 20230169783
    Abstract: Disclosed is an effective domain name defense solution in which a domain name string may be provided to or obtained by a computer embodying a visual domain analyzer. The domain name string may be rendered or otherwise converted to an image. An optical character recognition function may be applied to the image to read out a text string which can then be compared with a protected domain name to determine whether the text string generated by the optical character recognition function from the image converted from the domain name string is similar to or matches the protected domain name. This visual domain analysis can be dynamically applied in an online process or proactively applied in an offline process to hundreds of millions of domain names.
    Type: Application
    Filed: January 12, 2023
    Publication date: June 1, 2023
    Inventors: Gaurav Mitesh Dalal, Ali Mesdaq, Sharon Huffner, Harold Nguyen
  • Patent number: 11636161
    Abstract: An intelligent clustering system has a dual-mode clustering engine for mass-processing and stream-processing. A tree data model is utilized to describe heterogenous data elements in an accurate and uniform way and to calculate a tree distance between each data element and a cluster representative. The clustering engine performs element clustering, through sequential or parallel stages, to cluster the data elements based at least in part on calculated tree distances and parameter values reflecting user-provided domain knowledge on a given objective. The initial clusters thus generated are fine-tuned by undergoing an iterative self-tuning process, which continues when new data is streamed from data source(s). The clustering engine incorporates stage-specific domain knowledge through stage-specific configurations. This hybrid approach combines strengths of user domain knowledge and machine learning power.
    Type: Grant
    Filed: July 16, 2019
    Date of Patent: April 25, 2023
    Assignee: PROOFPOINT, INC.
    Inventors: Hung-Jen Chang, Gaurav Mitesh Dalal, Ali Mesdaq
  • Publication number: 20230079326
    Abstract: Aspects of the disclosure relate to identifying legitimate websites and removing false positives from domain discovery analysis. Based on a list of known legitimate domains, a computing platform may generate a baseline dataset of feature vectors corresponding to the known legitimate domains. Subsequently, the computing platform may receive information identifying a first domain for analysis and may execute one or more machine learning algorithms to compare the first domain to the baseline dataset. Based on execution of the one or more machine learning algorithms, the computing platform may generate first domain classification information indicating that the first domain is a legitimate domain. In response to determining that the first domain is a legitimate domain, the computing platform may send one or more commands directing a domain identification system to remove the first domain from a list of indeterminate domains maintained by the domain identification system.
    Type: Application
    Filed: November 22, 2022
    Publication date: March 16, 2023
    Inventors: Hung-Jen Chang, Gaurav Mitesh Dalal, Ali Mesdaq
  • Patent number: 11580760
    Abstract: Disclosed is an effective domain name defense solution in which a domain name string may be provided to or obtained by a computer embodying a visual domain analyzer. The domain name string may be rendered or otherwise converted to an image. An optical character recognition function may be applied to the image to read out a text string which can then be compared with a protected domain name to determine whether the text string generated by the optical character recognition function from the image converted from the domain name string is similar to or matches the protected domain name. This visual domain analysis can be dynamically applied in an online process or proactively applied in an offline process to hundreds of millions of domain names.
    Type: Grant
    Filed: May 4, 2020
    Date of Patent: February 14, 2023
    Assignee: PROOFPOINT, INC.
    Inventors: Gaurav Mitesh Dalal, Ali Mesdaq, Sharon Huffner, Harold Nguyen
  • Patent number: 11539745
    Abstract: Aspects of the disclosure relate to identifying legitimate websites and removing false positives from domain discovery analysis. Based on a list of known legitimate domains, a computing platform may generate a baseline dataset of feature vectors corresponding to the known legitimate domains. Subsequently, the computing platform may receive information identifying a first domain for analysis and may execute one or more machine learning algorithms to compare the first domain to the baseline dataset. Based on execution of the one or more machine learning algorithms, the computing platform may generate first domain classification information indicating that the first domain is a legitimate domain. In response to determining that the first domain is a legitimate domain, the computing platform may send one or more commands directing a domain identification system to remove the first domain from a list of indeterminate domains maintained by the domain identification system.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: December 27, 2022
    Assignee: Proofpoint, Inc.
    Inventors: Hung-Jen Chang, Gaurav Mitesh Dalal, Ali Mesdaq
  • Publication number: 20220245351
    Abstract: Aspects of the disclosure relate to detecting random and/or algorithmically-generated character sequences in domain names. A computing platform may train a machine learning model based on a set of semantically-meaningful words. Subsequently, the computing platform may receive a seed string and a set of domains to be analyzed in connection with the seed string. Based on the machine learning model, the computing platform may apply a classification algorithm to the seed string and the set of domains, where applying the classification algorithm to the seed string and the set of domains produces a classification result. Thereafter, the computing platform may store the classification result.
    Type: Application
    Filed: November 18, 2021
    Publication date: August 4, 2022
    Inventors: Hung-Jen Chang, Gaurav Mitesh Dalal, Ali Mesdaq
  • Patent number: 11194871
    Abstract: To find enriching contextual information for an abbreviated domain name, a data enrichment engine can comb through web content source code corresponding to the abbreviated domain name. From textual content in the web content source code, the data enrichment engine can identify words with initial characters that match characters of the abbreviated domain name to thereby establish a relationship there-between. This relationship can facilitate more accurate and efficient domain name classification. The data enrichment engine can query a WHOIS server to find out if candidate domains having initial characters that match the characters of the abbreviated domain name are registered to the same entity. If so, keywords can be extracted from the candidate domains and used to find more relevant domains for domain risk analysis and detection. Candidate domains determined by the data enrichment engine can be provided to a downstream computing facility such as a domain filter.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: December 7, 2021
    Assignee: Proofpoint, Inc.
    Inventors: Gaurav Mitesh Dalal, Ali Mesdaq, Hung-Jen Chang
  • Publication number: 20210374526
    Abstract: A domain processing system receives or collects raw data containing sample domains each having a known class identity indicating whether a domain is conducting an email campaign. The domain processing system extracts features from each of the sample domains and selects features of interest from the features, including at least a feature particular to a seed domain and features particular to email activities over a time line that includes days before and after a domain creation date. The features of interest are used to create feature vectors which, in turn, are used to train a machine learning model, the training including optimizing a neural network structure iteratively until stopping criteria are satisfied. The trained model functions as an email campaign domain classifier operable to classify candidate domains with unknown class identities such that each of the candidate domain is classified as conducting or not conducting an email campaign.
    Type: Application
    Filed: March 30, 2021
    Publication date: December 2, 2021
    Inventors: Hung-Jen Chang, Gaurav Mitesh Dalal, Ali Mesdaq
  • Publication number: 20210160269
    Abstract: A threat actor identification system that obtains domain data for a set of domains, generates domain clusters, determines whether the domain clusters are associated with threat actors, and presents domain data for the clusters that are associated with threat actors to brand owners that are associated with the threat actors. The clusters may be generated based on similarities in web page content, domain registration information, and/or domain infrastructure information. For each cluster, a clustering engine determines whether the cluster is associated with a threat actor, and for clusters that are associated with threat actors, corresponding domain information is stored for presentation to brand owners to whom the threat actor poses a threat.
    Type: Application
    Filed: February 3, 2021
    Publication date: May 27, 2021
    Inventors: Gaurav Mitesh Dalal, Hung-Jen Chang, Ali Mesdaq
  • Publication number: 20210112030
    Abstract: Taking a zero-configuration approach, a domain name discovery system utilizes, in an iterative process, WHOIS data and infrastructure data for a seed domain to automatically discover domain names having registration and/or infrastructure details that match those of the seed domain. Registration information such as a registered email address associated with a domain name discovered through WHOIS data matching or infrastructure data matching is utilized in a reverse lookup for domain names having infrastructure or WHOIS registered information that fully matches the information associated with the domain name discovered through the iterative process. Domain names discovered through WHOIS data matching, infrastructure data matching, and reverse lookup can be presented through a user interface on a client device communicatively connected to the domain name discovery system over a network. The domain name discovery can be performed periodically or in near real time responsive to receiving a new seed domain.
    Type: Application
    Filed: December 21, 2020
    Publication date: April 15, 2021
    Inventors: Gaurav Mitesh Dalal, Ali Mesdaq
  • Patent number: 10965701
    Abstract: A threat actor identification system that obtains domain data for a set of domains, generates domain clusters, determines whether the domain clusters are associated with threat actors, and presents domain data for the clusters that are associated with threat actors to brand owners that are associated with the threat actors. The clusters may be generated based on similarities in web page content, domain registration information, and/or domain infrastructure information. For each cluster, a clustering engine determines whether the cluster is associated with a threat actor, and for clusters that are associated with threat actors, corresponding domain information is stored for presentation to brand owners to whom the threat actor poses a threat.
    Type: Grant
    Filed: January 14, 2019
    Date of Patent: March 30, 2021
    Assignee: Proofpoint, Inc.
    Inventors: Gaurav Mitesh Dalal, Hung-Jen Chang, Ali Mesdaq
  • Publication number: 20210067557
    Abstract: A rules engine is adapted for analyzing each match produced by a domain discovery system as matching a seed domain. Utilizing a natural language processing (NLP) library, the rules engine determines segments from the match, assigns a lexical category to each segment based on the context in how a seed domain string is used, and compares the lexical category of the segment that is closest to the seed domain string with a lexical category of the seed domain string. Based on the comparing, the rules engine determines whether the match is relevant to the seed domain and, if not, the match produced by the domain discovery system is identified as a false positive and automatically removed from a set of matches produced by the domain discovery system for the seed domain.
    Type: Application
    Filed: May 11, 2020
    Publication date: March 4, 2021
    Inventors: Gaurav Mitesh Dalal, Hung-Jen Chang, Ali Mesdaq
  • Publication number: 20210042371
    Abstract: To find enriching contextual information for an abbreviated domain name, a data enrichment engine can comb through web content source code corresponding to the abbreviated domain name. From textual content in the web content source code, the data enrichment engine can identify words with initial characters that match characters of the abbreviated domain name to thereby establish a relationship there-between. This relationship can facilitate more accurate and efficient domain name classification. The data enrichment engine can query a WHOIS server to find out if candidate domains having initial characters that match the characters of the abbreviated domain name are registered to the same entity. If so, keywords can be extracted from the candidate domains and used to find more relevant domains for domain risk analysis and detection. Candidate domains determined by the data enrichment engine can be provided to a downstream computing facility such as a domain filter.
    Type: Application
    Filed: March 29, 2019
    Publication date: February 11, 2021
    Inventors: Gaurav Mitesh Dalal, Ali Mesdaq, Hung-Jen Chang