Patents by Inventor Lili Diao

Lili Diao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Detecting network entities that pose a cybersecurity risk to a private computer network

Patent number: 11973791

Abstract: A risk knowledge graph is created from information on risk events involving network entities of a private computer network. Each of the network entities is represented as a node in the risk knowledge graph. The nodes are connected by edges that represent the risk events. The nodes are grouped into communities of related nodes. A response action is performed against a community to mitigate a cybersecurity risk posed by the community.

Type: Grant

Filed: October 4, 2021

Date of Patent: April 30, 2024

Assignee: Trend Micro Incorporated

Inventors: Zhijie Li, ZhengBao Zhang, Lili Diao
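
The graph structure described in the abstract above can be sketched roughly as follows. This is an illustration only, not the patented method: it assumes network entities are the nodes and risk events are the edges, and it uses connected components as a simple stand-in for community grouping. The entity names, event labels, and function names are hypothetical.

```python
from collections import defaultdict


def build_risk_graph(risk_events):
    """Build an undirected graph: nodes are entities, edges carry a risk event label."""
    graph = defaultdict(dict)
    for source, target, event in risk_events:
        graph[source][target] = event
        graph[target][source] = event
    return graph


def find_communities(graph):
    """Group nodes into connected components (assumed stand-in for community detection)."""
    seen, communities = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(n for n in graph[node] if n not in members)
        seen |= members
        communities.append(members)
    return communities


# Hypothetical risk events: (entity, entity, description of the event linking them)
events = [
    ("laptop-17", "user-alice", "suspicious login"),
    ("user-alice", "mail-server", "phishing email opened"),
    ("printer-3", "user-bob", "outdated firmware access"),
]
communities = find_communities(build_risk_graph(events))
```

A response action would then be applied to every member of a community judged risky, for example isolating all three entities in the first group.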

Automatic charset and language detection with machine learning

Patent number: 11449794

Abstract: A language-based machine learning approach for automatically detecting the universal charset and the language of a received document is disclosed. The language-based machine learning approach employs a plurality of text document samples in different languages, after converting them to a selected Unicode style (if their original encoding schemes are not the selected Unicode), to generate a plurality of language-based machine learning models during the training stage. During the application stage, vector representations of the received document for different combinations of charsets and their respective applicable languages are tested against the plurality of machine learning models to ascertain the charset and language combination that is most similar to its associated machine learning model, thereby identifying the charset and language of the received document.

Type: Grant

Filed: August 21, 2019

Date of Patent: September 20, 2022

Assignee: Trend Micro Incorporated

Inventor: Lili Diao
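
The two stages in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: character bigram counts as the vector representation, cosine similarity as the comparison, the tiny training samples, and the candidate charset table. The patent text does not state these choices.

```python
import math
from collections import Counter


def bigram_vector(text):
    """Count character bigrams in Unicode text (assumed vector representation)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two bigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Training stage: one model per language, built from samples already in Unicode.
MODELS = {
    "english": bigram_vector("this is a simple sample of english text"),
    "russian": bigram_vector("привет мир это простой текст на русском языке"),
}

# Hypothetical table of candidate charsets and the languages each may carry.
CANDIDATES = {"ascii": ["english"], "cp1251": ["russian"], "koi8_r": ["russian"]}


def detect(raw_bytes):
    """Application stage: score every decodable (charset, language) pair, keep the best."""
    best = (None, None, -1.0)
    for charset, languages in CANDIDATES.items():
        try:
            text = raw_bytes.decode(charset)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this charset
        vector = bigram_vector(text)
        for language in languages:
            score = cosine(vector, MODELS[language])
            if score > best[2]:
                best = (charset, language, score)
    return best[:2]


received = "это простой текст".encode("cp1251")
detected = detect(received)
```

Decoding the same bytes as KOI8-R succeeds but yields unrelated letters, so only the correct charset produces bigrams that resemble the Russian model.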

Two stage virus detection

Patent number: 8935788

Abstract: A two stage virus detection system detects viruses in target files. In the first stage, a training application receives a master virus pattern file recording all known virus patterns and generates a features list containing fundamental virus signatures from the virus patterns, a novelty detection model, a classification model, and a set of segmented virus pattern files. In the second stage, a detection application scans a target file for viruses using the generated outputs from the first stage rather than using the master virus pattern file directly to do traditional pattern matching. The results of the scan can vary in detail depending on a fuzzy scan level. For fuzzy scan level “1,” the existence of a virus is returned. For fuzzy scan level “2,” the grant virus type found is returned. For fuzzy scan level “3,” the exact virus name is returned.

Type: Grant

Filed: October 15, 2008

Date of Patent: January 13, 2015

Assignee: Trend Micro Inc.

Inventors: Lili Diao, Vincent Chan, Patrick Mg Lu

Identification of normal scripts in computer systems

Patent number: 8838992

Abstract: A machine learning model is used to identify normal scripts in a client computer. The machine learning model may be built by training using samples of known normal scripts and samples of known potentially malicious scripts and may take into account lexical and semantic characteristics of the sample scripts. The machine learning model and a feature set may be provided to the client computer by a server computer. In the client computer, the machine learning model may be used to classify a target script. The target script does not have to be evaluated for malicious content when classified as a normal script. Otherwise, when the target script is classified as a potentially malicious script, the target script may have to be further evaluated by an anti-malware or sent to a back-end system.

Type: Grant

Filed: April 28, 2011

Date of Patent: September 16, 2014

Assignee: Trend Micro Incorporated

Inventors: Xuewen Zhu, Lili Diao, Da Li, Dibin Tang

Identifying sensitive expressions in images for languages with large alphabets

Patent number: 8699796

Abstract: One embodiment relates to a method of identifying sensitive expressions in images for a language with a large alphabet. The method is performed using a computer and includes (i) extracting an image from a message, (ii) extracting image character-blocks (i.e. normalized pixel graphs) from the image, and (iii) predicting characters to which the character-blocks correspond using a multi-class learning model, wherein the multi-class learning model is trained using a derived list of sensitive characters which is a subset of the large alphabet. In addition, (iv) the characters may be combined into string text, and (v) the string text may be searched for matches with a predefined list of sensitive expressions. Another embodiment relates to a method of training a multi-class learning model so that the model predicts characters to which image character-blocks correspond. Other embodiments, aspects and features are also disclosed herein.

Type: Grant

Filed: November 11, 2008

Date of Patent: April 15, 2014

Assignee: Trend Micro Incorporated

Inventors: Lili Diao, Jonathan J. Oliver

Method and arrangement for automatic charset detection

Patent number: 8560466

Abstract: The invention relates, in an embodiment, to a method for handling a received document. The method includes receiving a plurality of text document samples. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes generating fundamental units from the plurality of text document samples for charsets of the plurality of text document samples. Training includes extracting a subset of said fundamental units as feature lists and converting the feature lists into a set of feature vectors. Training further includes generating the set of machine learning models from the set of feature vectors. The method includes applying the set of machine learning models against a set of target document feature vectors converted from the received document. The method includes decoding the received document to obtain decoded content of the received document based on at least the first encoding scheme.

Type: Grant

Filed: February 26, 2010

Date of Patent: October 15, 2013

Assignee: Trend Micro Incorporated

Inventors: Lili Diao, Yun-chian Cheng

Zero day malware scanner

Patent number: 8375450

Abstract: A training model for malware detection is developed using common substrings extracted from known malware samples. The probability of each substring occurring within a malware family is determined and a decision tree is constructed using the substrings. An enterprise server receives indications from client machines that a particular file is suspected of being malware. The suspect file is retrieved and the decision tree is walked using the suspect file. A leaf node is reached that identifies a particular common substring, a byte offset within the suspect file at which it is likely that the common substring begins, and a probability distribution that the common substring appears in a number of malware families. A hash value of the common substring is compared (exact or approximate) against the corresponding substring in the suspect file. If positive, a result is returned to the enterprise server indicating the probability that the suspect file is a member of a particular malware family.

Type: Grant

Filed: October 5, 2009

Date of Patent: February 12, 2013

Assignee: Trend Micro, Inc.

Inventors: Jonathan James Oliver, Cheng-Lin Hou, Lili Diao, YiFun Liang, Jennifer Rihn
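
The leaf-node check described in the abstract above might look like the following sketch. It covers only the final comparison step, uses an exact SHA-256 match (the abstract also allows approximate comparison), and skips the decision tree walk. The leaf layout, field names, family names, and sample bytes are all hypothetical.

```python
import hashlib


def substring_hash(data):
    """Hash a byte string (SHA-256 is an assumed choice of hash function)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical leaf reached by walking the decision tree.
LEAF = {
    "offset": 4,                           # byte offset where the substring is expected
    "length": 6,                           # length of the common substring
    "hash": substring_hash(b"EVILXX"),     # hash of the common substring
    "family_probabilities": {"FamilyA": 0.8, "FamilyB": 0.2},
}


def check_leaf(file_bytes, leaf):
    """Return the family probabilities if the hashed substring matches, else None."""
    start = leaf["offset"]
    candidate = file_bytes[start:start + leaf["length"]]
    if substring_hash(candidate) == leaf["hash"]:
        return leaf["family_probabilities"]
    return None


suspect_result = check_leaf(b"HDR:EVILXX-rest-of-file", LEAF)
clean_result = check_leaf(b"clean file contents", LEAF)
```

Comparing hashes rather than raw substrings means the scanner does not need to carry the malware bytes themselves.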

Methods for matching image-based texual information with regular expressions

Patent number: 8260054

Abstract: A method for matching an image-form textual string in an image to a regular expression is disclosed. The method includes constructing a representation of the regular expression and generating a candidate string of characters from the image-form textual string. The method further includes ascertaining whether there exists a match between the image-form textual string and the regular expression; the match is deemed achieved if a probability value associated with the match is above a predetermined matching threshold.

Type: Grant

Filed: September 22, 2008

Date of Patent: September 4, 2012

Assignee: Trend Micro Incorporated

Inventors: Jonathan James Oliver, Lili Diao
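
One way to picture the probability threshold in the abstract above is the sketch below. It is an assumption-heavy illustration, not the patented procedure: each character position carries a few recognized candidates with probabilities, Python's `re` module stands in for the representation of the regular expression, and the match probability is computed by brute-force enumeration. The candidate data and the 0.8 threshold are hypothetical.

```python
import itertools
import re


def match_probability(char_candidates, pattern):
    """Sum the probabilities of all candidate strings that fully match the pattern."""
    regex = re.compile(pattern)
    total = 0.0
    for combo in itertools.product(*char_candidates):
        text = "".join(ch for ch, _ in combo)
        if regex.fullmatch(text):
            prob = 1.0
            for _, p in combo:
                prob *= p
            total += prob
    return total


# Hypothetical recognizer output: per position, (character, probability) candidates.
candidates = [
    [("c", 0.9), ("e", 0.1)],
    [("a", 0.6), ("o", 0.4)],
    [("t", 1.0)],
]

MATCHING_THRESHOLD = 0.8  # assumed value
probability = match_probability(candidates, r"c[ao]t")
matched = probability > MATCHING_THRESHOLD
```

Here "cat" and "cot" both satisfy the pattern, so their probabilities (0.54 and 0.36) add up to about 0.9 and the match is accepted. Enumeration grows exponentially with string length, so a practical system would need a more efficient search.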

Lightweight SVM-based content filtering system for mobile phones

Patent number: 8023974

Abstract: In one embodiment, a content filtering system generates a support vector machine (SVM) learning model in a server computer and provides the SVM learning model to a mobile phone for use in classifying text messages. The SVM learning model may be generated in the server computer by training a support vector machine with sample text messages that include spam and legitimate text messages. A resulting intermediate SVM learning model from the support vector machine may include a threshold value, support vectors and alpha values. The SVM learning model in the mobile phone may include the threshold value, the features, and the weights of the features. An incoming text message may be parsed for the features. The weights of features found in the incoming text message may be added and compared to the threshold value to determine whether or not the incoming text message is spam.

Type: Grant

Filed: February 15, 2007

Date of Patent: September 20, 2011

Assignee: Trend Micro Incorporated

Inventors: Lili Diao, Vincent Chan, Patrick MG Lu
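
The phone-side step in the abstract above (add the weights of the features found, then compare to the threshold) can be sketched as follows. The feature words, weights, threshold, whitespace tokenizer, and the choice to count each feature once are hypothetical; in the described system the weights would come from the server-side SVM training.

```python
def is_spam(message, feature_weights, threshold):
    """Add the weights of features present in the message and compare to the threshold."""
    tokens = set(message.lower().split())  # assumed tokenizer: lowercase, split on whitespace
    score = sum(weight for feature, weight in feature_weights.items() if feature in tokens)
    return score > threshold


# Hypothetical lightweight model as it might be shipped to the phone.
FEATURE_WEIGHTS = {"free": 1.2, "winner": 1.5, "prize": 0.9, "meeting": -1.1, "lunch": -0.8}
THRESHOLD = 1.0

spam_flag = is_spam("You are a WINNER claim your FREE prize", FEATURE_WEIGHTS, THRESHOLD)
ham_flag = is_spam("Lunch meeting moved to noon", FEATURE_WEIGHTS, THRESHOLD)
```

The first message scores 3.6 and is flagged; the second scores -1.9 and is not. Only a dictionary lookup and a sum run on the phone, which is what keeps the filter lightweight.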

METHOD AND ARRANGEMENT FOR AUTOMATIC CHARSET DETECTION

Publication number: 20110213736

Abstract: The invention relates, in an embodiment, to a method for handling a received document. The method includes receiving a plurality of text document samples. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes generating fundamental units from the plurality of text document samples for charsets of the plurality of text document samples. Training includes extracting a subset of said fundamental units as feature lists and converting the feature lists into a set of feature vectors. Training further includes generating the set of machine learning models from the set of feature vectors. The method includes applying the set of machine learning models against a set of target document feature vectors converted from the received document. The method includes decoding the received document to obtain decoded content of the received document based on at least the first encoding scheme.

Type: Application

Filed: February 26, 2010

Publication date: September 1, 2011

Inventors: Lili Diao, Yun-chian Cheng

Method and arrangement for SIM algorithm automatic charset detection

Patent number: 7827133

Abstract: The invention relates, in an embodiment, to a computer-implemented method for handling a target document, the target document having been transmitted electronically and involving an encoding scheme. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes using SIM (Similarity Algorithm) to generate the set of machine learning models from feature vectors obtained from the plurality of text document samples. The method also includes applying the set of machine learning models against a set of target document feature vectors converted from the target document to detect the encoding scheme. The method includes decoding the target document to obtain decoded content of the document based on at least the first encoding scheme.

Type: Grant

Filed: February 26, 2010

Date of Patent: November 2, 2010

Assignee: Trend Micro Inc.

Inventor: Lili Diao

Lightweight content filtering system for mobile phones

Patent number: 7756535

Abstract: In one embodiment, a content filtering system includes a feature list and a learning model. The feature list may be a subset of a dictionary that was used to train the content filtering system to identify classification (e.g., spam, phishing, porn, legitimate text messages, etc.) of text messages during a training stage. The learning model may include representative vectors, each of which represents a particular class of text messages. The learning model and the feature list may be generated in a server computer during the training stage and then subsequently provided to the mobile phone. An incoming text message in the mobile phone may be parsed for occurrences of feature words included in the feature list and then converted to an input vector. The input vector may be compared to the learning model to determine the classification of the incoming text message.

Type: Grant

Filed: July 7, 2006

Date of Patent: July 13, 2010

Assignee: Trend Micro Incorporated

Inventors: Lili Diao, Jackie Cao, Vincent Chan
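
The classification flow in the abstract above (parse the message for feature words, convert it to an input vector, compare against one representative vector per class) can be sketched as below. The feature list, the representative vectors, and the use of cosine similarity for the comparison are assumptions for illustration; the abstract does not specify the comparison measure.

```python
import math

# Hypothetical feature list and per-class representative vectors from the server.
FEATURE_LIST = ["free", "prize", "password", "verify", "meeting", "lunch"]
REPRESENTATIVE_VECTORS = {
    "spam": [3, 2, 0, 0, 0, 0],
    "phishing": [0, 0, 3, 2, 0, 0],
    "legitimate": [0, 0, 0, 0, 2, 2],
}


def to_input_vector(message):
    """Count occurrences of each feature word in the message."""
    words = message.lower().split()
    return [words.count(feature) for feature in FEATURE_LIST]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(message):
    """Return the class whose representative vector is most similar to the message."""
    vector = to_input_vector(message)
    return max(REPRESENTATIVE_VECTORS, key=lambda c: cosine(vector, REPRESENTATIVE_VECTORS[c]))


label = classify("please verify your password now")
```

The example message contains only "verify" and "password", so its vector overlaps with the phishing class alone.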

METHOD AND ARRANGEMENT FOR SIM ALGORITHM AUTOMATIC CHARSET DETECTION

Publication number: 20100153320

Abstract: The invention relates, in an embodiment, to a computer-implemented method for handling a target document, the target document having been transmitted electronically and involving an encoding scheme. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes using SIM (Similarity Algorithm) to generate the set of machine learning models from feature vectors obtained from the plurality of text document samples. The method also includes applying the set of machine learning models against a set of target document feature vectors converted from the target document to detect the encoding scheme. The method includes decoding the target document to obtain decoded content of the document based on at least the first encoding scheme.

Type: Application

Filed: February 26, 2010

Publication date: June 17, 2010

Inventor: Lili Diao

Automatic charset detection using SIM algorithm with charset grouping

Patent number: 7711673

Abstract: The invention relates, in an embodiment, to a computer-implemented method for automatic charset detection, which includes detecting an encoding scheme of a target document. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes using SIM (Similarity Algorithm) to generate the set of machine learning models from feature vectors obtained from the plurality of text document samples. The method also includes applying the set of machine learning models against a set of target document feature vectors converted from the target document to detect the encoding scheme.

Type: Grant

Filed: September 28, 2005

Date of Patent: May 4, 2010

Assignee: Trend Micro Incorporated

Inventor: Lili Diao

Automatic charset detection using support vector machines with charset grouping

Patent number: 7689531

Abstract: The invention relates, in an embodiment, to a computer-implemented method for automatic charset detection, which includes detecting an encoding scheme of a target document. The method includes training, using a plurality of text document samples, to obtain a set of machine learning models. Training includes using a SVM (Support Vector Machine) technique to generate the set of machine learning models from feature vectors obtained from the plurality of text document samples. The method also includes applying the set of machine learning models against a set of target document feature vectors converted from the target document to detect the encoding scheme.

Type: Grant

Filed: September 28, 2005

Date of Patent: March 30, 2010

Assignee: Trend Micro Incorporated

Inventors: Lili Diao, Yun-chian Cheng

METHODS FOR MATCHING IMAGE-BASED TEXUAL INFORMATION WITH REGULAR EXPRESSIONS

Publication number: 20100074534

Abstract: A method for matching an image-form textual string in an image to a regular expression is disclosed. The method includes constructing a representation of the regular expression and generating a candidate string of characters from the image-form textual string. The method further includes ascertaining whether there exists a match between the image-form textual string and the regular expression; the match is deemed achieved if a probability value associated with the match is above a predetermined matching threshold.

Type: Application

Filed: September 22, 2008

Publication date: March 25, 2010

Inventors: Jonathan James Oliver, Lili Diao