Patents by Inventor Ching-Yung Lin

Ching-Yung Lin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

PREDICTING THE BUSINESS IMPACT OF TWEET CONVERSATIONS

Publication number: 20160019659

Abstract: A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages. The method also includes clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders. The method additionally includes merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists. Each of the tweet groups and each of the subgroups corresponds to a respective different one of the conversations when unable to be split, clustered, or merged.
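
The first two steps of this abstract (grouping by hashtag and time interval, then splitting on time separation) can be illustrated with a minimal Python sketch. It is not the claimed method: the tuple layout `(timestamp, hashtag, text)`, the function names, and the `window` and `max_gap` thresholds are all assumptions made for illustration, and the secondary-hashtag, clustering, and merging steps are left out.

```python
from collections import defaultdict

def group_tweets(tweets, window=3600):
    """Group (timestamp, hashtag, text) tuples by hashtag and time window.

    `window` is a hypothetical interval length in seconds.
    """
    groups = defaultdict(list)
    for ts, tag, text in tweets:
        groups[(tag, ts // window)].append((ts, tag, text))
    return dict(groups)

def split_by_gap(group, max_gap=600):
    """Split one group wherever consecutive tweets are more than
    `max_gap` seconds apart (a hypothetical threshold)."""
    subgroups = []
    for tweet in sorted(group):
        if subgroups and tweet[0] - subgroups[-1][-1][0] <= max_gap:
            subgroups[-1].append(tweet)   # close enough: same subgroup
        else:
            subgroups.append([tweet])     # gap too large: start a new one
    return subgroups

tweets = [(0, "#a", "x"), (100, "#a", "y"), (2000, "#a", "z"), (50, "#b", "w")]
groups = group_tweets(tweets)
subgroups = split_by_gap(groups[("#a", 0)])
```

With the sample data, the three `#a` tweets land in one group, and the 1,900-second gap before the last one separates it into its own subgroup.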

Type: Application

Filed: June 24, 2015

Publication date: January 21, 2016

Inventors: YURDAER N. DOGANATA, CHING-YUNG LIN, DAVID CORBALAN LUNA, JORDI C. MESTRE, XAVIER NOGUERA PAGES, MERCAN TOPKARA, ZHEN WEN, DANNY L. YEH
PREDICTING THE BUSINESS IMPACT OF TWEET CONVERSATIONS

Publication number: 20160019565

Abstract: A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages. The method also includes clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders. The method additionally includes merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists. Each of the tweet groups and each of the subgroups corresponds to a respective different one of the conversations when unable to be split, clustered, or merged.

Type: Application

Filed: June 3, 2015

Publication date: January 21, 2016

Inventors: YURDAER N. DOGANATA, CHING-YUNG LIN, DAVID C. LUNA, JORDI C. MESTRE, XAVIER N. PAGES, MERCAN TOPKARA, ZHEN WEN, DANNY L. YEH
Generating data from imbalanced training data sets

Patent number: 9224104

Abstract: Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on the calculated distance of a data sample. Generated data samples are selected from a number of highest ranking distance score buckets. The generated data samples selected from the number of highest ranking distance score buckets are injected into the minority data class.
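
The bucket-and-select step described in this abstract might look roughly like the Python sketch below. Several details are assumptions, not taken from the patent: Euclidean distance, a fixed `bucket_width`, and the reading that buckets farther from the majority-class kernel center rank higher. The function and parameter names are hypothetical.

```python
import math

def select_synthetic(candidates, majority_center, bucket_width=1.0, top_buckets=2):
    """Bucket generated samples by distance to a majority-class kernel
    center and keep the samples from the highest-ranking buckets.

    Assumption: a larger distance means a higher rank.
    """
    buckets = {}
    for sample in candidates:
        d = math.dist(sample, majority_center)      # Euclidean distance
        buckets.setdefault(int(d // bucket_width), []).append(sample)
    chosen = []
    for key in sorted(buckets, reverse=True)[:top_buckets]:
        chosen.extend(buckets[key])
    return chosen

candidates = [(0.5, 0.0), (1.5, 0.0), (2.5, 0.0), (3.5, 0.0)]
injected = select_synthetic(candidates, majority_center=(0.0, 0.0))
```

The selected samples would then be appended to the minority class before training.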

Type: Grant

Filed: September 24, 2013

Date of Patent: December 29, 2015

Assignee: International Business Machines Corporation

Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
GENERATING DATA FROM IMBALANCED TRAINING DATA SETS

Publication number: 20150356464

Abstract: Determining a number of kernels within a model is provided. A number of kernels that include data samples of a majority data class of an imbalanced training data set is determined based on a set of generated artificial data samples for a minority data class of the imbalanced training data set. The number of kernels within the model is generated based on the set of generated artificial data samples. A likelihood of the set of generated artificial data samples being included in the majority data class of the imbalanced training data set is calculated. Parameters of each kernel in the number of kernels are updated based on the likelihood of the set of generated artificial data samples being included in the majority data class of the imbalanced training data set. Each kernel in the number of kernels is adjusted based on the updated parameters.

Type: Application

Filed: August 20, 2015

Publication date: December 10, 2015

Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
INSIDER THREAT PREDICTION

Publication number: 20150286819

Abstract: A method for predicting insider threat includes mining electronic data of an organization corresponding to activity of an entity, determining features of the electronic data corresponding to the activity of the entity, classifying the features corresponding to the activity of the entity, determining sequences of classified features matching one or more patterns of insider threat, scoring the entity according to matches of the classified features to the one or more patterns of insider threat, and predicting an insider threat corresponding to the entity according to the score.
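
As a rough illustration of the matching and scoring steps, the Python sketch below scores an entity by the threat patterns its classified activity sequence contains. The event labels, the per-pattern weights, and the choice to treat a pattern as an in-order (not necessarily contiguous) subsequence are assumptions; the abstract does not specify them.

```python
def contains_subsequence(events, pattern):
    """True if `pattern` occurs in `events` in order, gaps allowed."""
    it = iter(events)
    # `step in it` advances the iterator, so order is enforced.
    return all(step in it for step in pattern)

def threat_score(events, patterns):
    """Sum the weights of all (pattern, weight) pairs matched by `events`."""
    return sum(w for pattern, w in patterns if contains_subsequence(events, pattern))

events = ["login", "download", "usb_copy", "logout"]   # hypothetical labels
patterns = [
    (("download", "usb_copy"), 2.0),   # matches: download precedes usb_copy
    (("usb_copy", "download"), 5.0),   # no match: wrong order
]
score = threat_score(events, patterns)
```

A prediction step could then compare `score` against a threshold chosen by the organization.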

Type: Application

Filed: April 7, 2014

Publication date: October 8, 2015

Applicant: International Business Machines Corporation

Inventors: Anni R. Coden, Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
VISUALIZING CONFLICTS IN ONLINE MESSAGES

Publication number: 20150106360

Abstract: Visualizing social media conflict is provided. Textual messages by a set of human users connected via a network regarding a particular topic are collected. Active users in the set of human users authoring a number of textual messages regarding the particular topic more than a threshold number of textual messages are selected. Keywords are selected that occur more than a threshold number of times within the textual messages regarding the particular topic. A sentiment score is computed for each of the keywords occurring more than the threshold number of times within the textual messages using a keyword co-occurrence graph. A sentiment of each of the active users is determined based on the computed sentiment score of each of the selected keywords that are authored by a particular active user.

Type: Application

Filed: October 10, 2013

Publication date: April 16, 2015

Applicant: International Business Machines Corporation

Inventors: Nan Cao, Ching-Yung Lin, Fei Wang, Zhen Wen
Finding a top-K diversified ranking list on graphs

Patent number: 9009147

Abstract: A method, system and computer program product for finding a diversified ranking list for a given query. In one embodiment, a multitude of data items responsive to the query are identified, a marginal score is established for each data item, and a set, or ranking list, of the data items is formed based on these scores. This ranking list is formed by forming an initial set, and one or more data items are added to the ranking list based on the marginal scores of the data items. In one embodiment, each of the data items has a measured relevance and a measured diversity value, and the marginal scores for the data items are based on the measured relevance and the measured diversity values of the data items.
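
A greedy selection driven by marginal scores can be sketched in Python as follows. The marginal score used here (relevance minus a weighted similarity to items already selected) is an assumed stand-in, not the formula from the patent, and `relevance`, `similarity`, and `weight` are hypothetical inputs.

```python
def diversified_top_k(relevance, similarity, k, weight=1.0):
    """Greedily build a ranking list of up to k items.

    relevance:  dict mapping item -> relevance to the query
    similarity: function (item, item) -> similarity in [0, 1]
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def marginal(item):
            # Penalize items similar to what is already in the list.
            penalty = sum(similarity(item, s) for s in selected)
            return relevance[item] - weight * penalty
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected

relevance = {"a": 1.0, "b": 0.9, "c": 0.5}
sim = lambda x, y: 0.8 if {x, y} == {"a", "b"} else 0.0
ranking = diversified_top_k(relevance, sim, k=2)
```

In this toy input, "b" is nearly as relevant as "a" but highly similar to it, so the less relevant but dissimilar "c" is picked second.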

Type: Grant

Filed: August 19, 2011

Date of Patent: April 14, 2015

Assignee: International Business Machines Corporation

Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen
GENERATING DATA FROM IMBALANCED TRAINING DATA SETS

Publication number: 20150088791

Abstract: Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on the calculated distance of a data sample. Generated data samples are selected from a number of highest ranking distance score buckets. The generated data samples selected from the number of highest ranking distance score buckets are injected into the minority data class.

Type: Application

Filed: September 24, 2013

Publication date: March 26, 2015

Applicant: International Business Machines Corporation

Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
COMPOSITE PROPENSITY PROFILE DETECTOR

Publication number: 20150058273

Abstract: Detecting propensity profile for a person may comprise receiving artifacts associated with the person; detecting profile characteristics for the person based on the artifacts; receiving a plurality of predefined profiles comprising a plurality of characteristics and relationships between the characteristics over time, each of the plurality of predefined profiles specifying an indication of propensity; matching the profile characteristics for the person with one or more of the plurality of predefined profiles; and outputting one or more propensity indicators based on the matching, the propensity indicators comprising at least an expressed strength of a given propensity in the person at a given time.

Type: Application

Filed: August 20, 2013

Publication date: February 26, 2015

Applicant: International Business Machines Corporation

Inventors: Anni R. Coden, Keith C. Houck, Ching-Yung Lin, Wanyi Lin, Peter K. Malkin, Shimei Pan, Youngja Park, Justin D. Weisz
SEQUENTIAL ANOMALY DETECTION

Publication number: 20150052090

Abstract: A dataset including at least one temporal event sequence is collected. A one-class sequence classifier f(x) that obtains a decision boundary is statistically learned. At least one new temporal event sequence is evaluated, wherein the at least one new temporal event sequence is outside of the dataset. It is determined whether the at least one new temporal event sequence is one of a normal sequence or an abnormal sequence based on the evaluation. Numerous additional aspects are disclosed.

Type: Application

Filed: August 16, 2013

Publication date: February 19, 2015

Applicant: International Business Machines Corporation

Inventors: Ching-Yung Lin, Yale Song, Zhen Wen
Inferring user interests using social network correlation and attribute correlation

Patent number: 8838688

Abstract: Methods and apparatus are provided for inferring user interests from both direct and indirect social neighbors. User interests are inferred from social neighbors by exploiting the correlation among multiple attributes of a user, in addition to the social correlation of an attribute among a group of users. Attributes of a user are inferred by obtaining an inferred set of attributes comprised of one or more attributes of social neighbors of the user. Thereafter, the inferred set is modified using a user attribute correlation model describing a probability that the attributes in the inferred set co-occur on the user and one or more of the social neighbors. An inference quality of the obtained attributes can optionally be obtained based on social network properties of the social neighbors. Interactions with the user and/or the social neighbors can be employed to solicit feedback to improve the one or more inferred attributes.

Type: Grant

Filed: May 31, 2011

Date of Patent: September 16, 2014

Assignee: International Business Machines Corporation

Inventors: Ching-Yung Lin, Zhen Wen
Determining the importance of data items and their characteristics using centrality measures

Patent number: 8818918

Abstract: Computer-implemented methods, systems, and articles of manufacture for determining the importance of a data item. A method includes: (a) receiving a node graph; (b) approximating a number of neighbor nodes of a node; and (c) calculating an average shortest path length of the node to the remaining nodes using the approximation step, where this calculation demonstrates the importance of a data item represented by the node. Another method includes: (a) receiving a node graph; (b) building a decomposed line graph of the node graph; (c) calculating stationary probabilities of incident edges of a node graph node in the decomposed line graph; and (d) calculating a summation of the stationary probabilities of the incident edges associated with the node, where the summation demonstrates the importance of a data item represented by the node. Both methods have at least one step carried out using a computer device.
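
For reference, the quantity in the first method, the average shortest path length from one node to the others, can be computed exactly on a small unweighted graph with breadth-first search, as in the Python sketch below. This is the exact baseline, not the neighbor-count approximation the abstract describes; the adjacency-dict representation and the function name are assumptions.

```python
from collections import deque

def average_shortest_path_length(graph, source):
    """Exact mean hop distance from `source` to every reachable node.

    graph: dict mapping node -> iterable of neighbor nodes (unweighted).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    others = [d for n, d in dist.items() if n != source]
    return sum(others) / len(others) if others else float("inf")

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
center = average_shortest_path_length(path, "b")
end = average_shortest_path_length(path, "a")
```

A smaller average indicates a more central node; in the three-node path, the middle node "b" scores lower than the endpoint "a".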

Type: Grant

Filed: April 28, 2011

Date of Patent: August 26, 2014

Assignee: International Business Machines Corporation

Inventors: Ching-Yung Lin, Hanghang Tong, Jimeng Sun, Spyridon Papadimitriou, U Kang
Privacy-aware on-line user role tracking

Patent number: 8775335

Abstract: Access is obtained to a first nonnegative factor matrix and a second nonnegative factor matrix obtained by factorizing a nonnegative asymmetric matrix which represents a set of data which tracks time-stamped activities of a plurality of entities. The first nonnegative factor matrix is representative of initial role membership of the entities, and the second nonnegative factor matrix is representative of initial role activity descriptions. At a given one of the time stamps, while holding a change in the first nonnegative factor matrix constant, a change in the second nonnegative factor matrix is updated to reflect time variance of the set of data at the given one of the time stamps, without accessing actual data values at previous ones of the time stamps.

Type: Grant

Filed: August 5, 2011

Date of Patent: July 8, 2014

Assignee: International Business Machines Corporation

Inventors: Ching-Yung Lin, Hanghang Tong, Fei Wang
Method and system for managing and querying large graphs

Patent number: 8645339

Abstract: A method, system and computer program product for managing and querying a graph. The method includes the steps of: receiving a graph; partitioning the graph into homogeneous blocks; compressing the homogeneous blocks; and storing the compressed homogeneous blocks in files where at least one of the steps is carried out using a computer device.
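
The partition-then-compress idea can be illustrated with a small Python sketch. It assumes the nodes have already been ordered so that fixed-size square blocks of the adjacency matrix are mostly all-zero or all-one; how the patent obtains homogeneous blocks is not shown here. The function name and the use of `zlib` are illustrative choices, and the blocks are kept in a dict instead of being written to files.

```python
import zlib

def compress_blocks(adj, block):
    """Cut a 0/1 adjacency matrix into block x block tiles and compress each.

    Returns a dict mapping (block_row, block_col) -> zlib-compressed bytes.
    """
    n = len(adj)
    out = {}
    for r in range(0, n, block):
        for c in range(0, n, block):
            # Row-major bytes of the tile, one byte per matrix cell.
            cells = bytes(
                adj[i][j]
                for i in range(r, min(r + block, n))
                for j in range(c, min(c + block, n))
            )
            out[(r // block, c // block)] = zlib.compress(cells)
    return out

adj = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
blocks = compress_blocks(adj, block=2)
```

Homogeneous tiles compress well because each consists of a single repeated byte value.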

Type: Grant

Filed: November 11, 2011

Date of Patent: February 4, 2014

Assignee: International Business Machines Corporation

Inventors: U Kang, Ching-Yung Lin, Jimeng Sun, Hanghang Tong
System and method for social inference based on distributed social sensor system

Patent number: 8620916

Abstract: A method (and system) for data acquisition includes downloading a user's sent materials from a communication data repository, analyzing the sent materials and extracting data portions that are authored by the user, generating statistical values from the extracted data, transmitting the generated statistical values to one or multiple repositories, receiving the generated statistical values on one or multiple server machines, and aggregating statistical values of multiple users.

Type: Grant

Filed: March 9, 2012

Date of Patent: December 31, 2013

Assignee: International Business Machines Corporation

Inventors: Ching-Yung Lin, Dmitry A. Rekesh
System and method for social inference based on distributed social sensor system

Patent number: 8615515

Abstract: A method (and system) for data acquisition includes extracting information from user communications and allowing a user to control the information to be extracted. The method of data acquisition may include downloading a user's sent materials from a communication data repository, analyzing the downloaded materials and extracting data portions that are authored by the user, generating statistical values from the extracted data, transmitting the generated statistical values to one or multiple repositories, receiving the generated statistical values on one or multiple server machines, and aggregating statistical values of multiple users.

Type: Grant

Filed: May 9, 2008

Date of Patent: December 24, 2013

Assignee: International Business Machines Corporation

Inventors: Ching-Yung Lin, Dmitry A. Rekesh
Method and system for detecting anomalies in a bipartite graph

Patent number: 8612169

Abstract: A method of detecting anomalies from a bipartite graph includes analyzing the graph to determine a row-cluster membership, a column-cluster membership and a non-negative residual matrix, and in a processor, detecting the anomalies from the non-negative residual matrix.

Type: Grant

Filed: April 26, 2011

Date of Patent: December 17, 2013

Assignee: International Business Machines Corporation

Inventors: Ching-Yung Lin, Hanghang Tong
METHOD AND SYSTEM FOR MANAGING AND QUERYING LARGE GRAPHS

Publication number: 20130124488

Abstract: A method, system and computer program product for managing and querying a graph. The method includes the steps of: receiving a graph; partitioning the graph into homogeneous blocks; compressing the homogeneous blocks; and storing the compressed homogeneous blocks in files where at least one of the steps is carried out using a computer device.

Type: Application

Filed: November 11, 2011

Publication date: May 16, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: U Kang, Ching-Yung Lin, Jimeng Sun, Hanghang Tong
MEASURING THE GOODNESS OF A TOP-K DIVERSIFIED RANKING LIST

Publication number: 20130046769

Abstract: A method, system and computer program product for measuring a relevance and diversity of a ranking list to a given query. The ranking list is comprised of a set of data items responsive to the query. In one embodiment, the method comprises calculating a measured relevance of the set of data items to the query using a defined relevance measuring procedure, and determining a measured diversity value for the ranking list using a defined diversity measuring procedure. The measured relevance and the measured diversity value are combined to obtain a measure of the combined relevance and diversity of the ranking list. The measured relevance of the set of data items may be based on the individual relevance of each of the data items to the query, and the diversity value may be based on the similarities of the data items to each other.
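
One simple way to combine the two measures is sketched below in Python: total relevance of the listed items minus a weighted sum of their pairwise similarities. This particular combination, the `weight` parameter, and the function name are assumptions for illustration; the abstract only states that a relevance measure and a diversity measure are combined.

```python
from itertools import combinations

def ranking_goodness(items, relevance, similarity, weight=1.0):
    """Score a ranking list: higher relevance and lower redundancy are better.

    relevance:  dict mapping item -> relevance to the query
    similarity: function (item, item) -> similarity in [0, 1]
    """
    total_relevance = sum(relevance[i] for i in items)
    redundancy = sum(similarity(a, b) for a, b in combinations(items, 2))
    return total_relevance - weight * redundancy

relevance = {"a": 1.0, "b": 0.9, "c": 0.5}
sim = lambda x, y: 0.8 if {x, y} == {"a", "b"} else 0.0
redundant = ranking_goodness(["a", "b"], relevance, sim)
diverse = ranking_goodness(["a", "c"], relevance, sim)
```

Under this scoring, the list `["a", "c"]` outranks `["a", "b"]` even though "b" is individually more relevant than "c", because "a" and "b" are highly similar.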

Type: Application

Filed: August 19, 2011

Publication date: February 21, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen
FINDING A TOP-K DIVERSIFIED RANKING LIST ON GRAPHS

Publication number: 20130046768

Abstract: A method, system and computer program product for finding a diversified ranking list for a given query. In one embodiment, a multitude of data items responsive to the query are identified, a marginal score is established for each data item, and a set, or ranking list, of the data items is formed based on these scores. This ranking list is formed by forming an initial set, and one or more data items are added to the ranking list based on the marginal scores of the data items. In one embodiment, each of the data items has a measured relevance and a measured diversity value, and the marginal scores for the data items are based on the measured relevance and the measured diversity values of the data items.

Type: Application

Filed: August 19, 2011

Publication date: February 21, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen
