Patents by Inventor Ching-Yung Lin
Ching-Yung Lin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20160019659
Abstract: A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages. The method also includes clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders. The method additionally includes merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists. Each of the tweet groups and each of the subgroups corresponds to a respective different one of the conversations when unable to be split, clustered, or merged.
Type: Application
Filed: June 24, 2015
Publication date: January 21, 2016
Inventors: YURDAER N. DOGANATA, CHING-YUNG LIN, DAVID CORBALAN LUNA, JORDI C. MESTRE, XAVIER NOGUERA PAGES, MERCAN TOPKARA, ZHEN WEN, DANNY L. YEH
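The first two steps of the abstract above (grouping by hashtag, then splitting on time separation) can be illustrated with a minimal Python sketch. It is not the claimed method: the tuple layout, the function name `group_by_hashtag_and_gap`, and the single `max_gap_seconds` threshold are assumptions made for illustration, and secondary hashtags, clustering, and merging are left out.

```python
from collections import defaultdict


def group_by_hashtag_and_gap(tweets, max_gap_seconds):
    """Group tweets by hashtag, then split each group at large time gaps.

    tweets: iterable of (timestamp_seconds, hashtag, text) tuples
            (a hypothetical layout, not taken from the application).
    Returns a list of (hashtag, [(timestamp, text), ...]) groups.
    """
    by_tag = defaultdict(list)
    for timestamp, hashtag, text in tweets:
        by_tag[hashtag].append((timestamp, text))

    groups = []
    for hashtag, items in by_tag.items():
        items.sort()  # chronological order within one hashtag
        current = [items[0]]
        for previous, item in zip(items, items[1:]):
            # A gap larger than the threshold starts a new subgroup.
            if item[0] - previous[0] > max_gap_seconds:
                groups.append((hashtag, current))
                current = []
            current.append(item)
        groups.append((hashtag, current))
    return groups


sample = [(0, "#a", "x"), (10, "#a", "y"), (1000, "#a", "z"), (5, "#b", "w")]
result = group_by_hashtag_and_gap(sample, max_gap_seconds=60)
```

With the sample data, the two early `#a` tweets stay together, the late `#a` tweet forms its own subgroup, and `#b` is a separate group.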
-
Publication number: 20160019565
Abstract: A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages. The method also includes clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders. The method additionally includes merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists. Each of the tweet groups and each of the subgroups corresponds to a respective different one of the conversations when unable to be split, clustered, or merged.
Type: Application
Filed: June 3, 2015
Publication date: January 21, 2016
Inventors: YURDAER N. DOGANATA, CHING-YUNG LIN, DAVID C. LUNA, JORDI C. MESTRE, XAVIER N. PAGES, MERCAN TOPKARA, ZHEN WEN, DANNY L. YEH
-
Patent number: 9224104
Abstract: Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on the calculated distance of a data sample. Generated data samples are selected from a number of highest ranking distance score buckets. The generated data samples selected from the number of highest ranking distance score buckets are injected into the minority data class.
Type: Grant
Filed: September 24, 2013
Date of Patent: December 29, 2015
Assignee: International Business Machines Corporation
Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
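The distance-bucket selection described in the abstract above might look roughly like the following Python sketch. Several details are assumptions, since the abstract does not specify them: Euclidean distance, equal-width buckets, and the choice to rank the buckets farthest from the majority kernel center highest. The function name `select_by_distance_buckets` is hypothetical.

```python
import math


def select_by_distance_buckets(generated, kernel_center, num_buckets, top_buckets):
    """Bucket generated samples by distance to a majority-class kernel center
    and keep the samples in the highest-ranking (here: farthest) buckets.

    Assumptions: Euclidean distance, equal-width buckets, farther ranks higher.
    """
    if not generated or top_buckets <= 0:
        return []

    distances = [math.dist(sample, kernel_center) for sample in generated]
    low, high = min(distances), max(distances)
    # Fall back to a width of 1.0 when every sample is equally distant.
    width = (high - low) / num_buckets or 1.0

    buckets = [[] for _ in range(num_buckets)]
    for sample, distance in zip(generated, distances):
        index = min(int((distance - low) / width), num_buckets - 1)
        buckets[index].append(sample)

    selected = []
    for bucket in reversed(buckets[-top_buckets:]):  # farthest bucket first
        selected.extend(bucket)
    return selected


candidates = [(0, 0), (1, 0), (5, 0), (10, 0)]
chosen = select_by_distance_buckets(candidates, (0, 0), num_buckets=2, top_buckets=1)
```

In this sketch the selected samples would then be appended to the minority class; reversing the ranking direction only requires taking buckets from the other end of the list.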
-
Publication number: 20150356464
Abstract: Determining a number of kernels within a model is provided. A number of kernels that include data samples of a majority data class of an imbalanced training data set is determined based on a set of generated artificial data samples for a minority data class of the imbalanced training data set. The number of kernels within the model is generated based on the set of generated artificial data samples. A likelihood of the set of generated artificial data samples being included in the majority data class of the imbalanced training data set is calculated. Parameters of each kernel in the number of kernels are updated based on the likelihood of the set of generated artificial data samples being included in the majority data class of the imbalanced training data set. Each kernel in the number of kernels is adjusted based on the updated parameters.
Type: Application
Filed: August 20, 2015
Publication date: December 10, 2015
Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
-
Publication number: 20150286819
Abstract: A method for predicting insider threat includes mining electronic data of an organization corresponding to activity of an entity, determining features of the electronic data corresponding to the activity of the entity, classifying the features corresponding to the activity of the entity, determining sequences of classified features matching one or more patterns of insider threat, scoring the entity according to matches of the classified features to the one or more patterns of insider threat, and predicting an insider threat corresponding to the entity according to the score.
Type: Application
Filed: April 7, 2014
Publication date: October 8, 2015
Applicant: International Business Machines Corporation
Inventors: Anni R. Coden, Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
-
Publication number: 20150106360
Abstract: Visualizing social media conflict is provided. Textual messages by a set of human users connected via a network regarding a particular topic are collected. Active users in the set of human users authoring a number of textual messages regarding the particular topic more than a threshold number of textual messages are selected. Keywords are selected that occur more than a threshold number of times within the textual messages regarding the particular topic. A sentiment score is computed for each of the keywords occurring more than the threshold number of times within the textual messages using a keyword co-occurrence graph. A sentiment of each of the active users is determined based on the computed sentiment score of each of the selected keywords that are authored by a particular active user.
Type: Application
Filed: October 10, 2013
Publication date: April 16, 2015
Applicant: International Business Machines Corporation
Inventors: Nan Cao, Ching-Yung Lin, Fei Wang, Zhen Wen
-
Patent number: 9009147
Abstract: A method, system and computer program product for finding a diversified ranking list for a given query. In one embodiment, a multitude of data items responsive to the query are identified, a marginal score is established for each data item, and a set, or ranking list, of the data items is formed based on these scores. This ranking list is formed by forming an initial set, and one or more data items are added to the ranking list based on the marginal scores of the data items. In one embodiment, each of the data items has a measured relevance and a measured diversity value, and the marginal scores for the data items are based on the measured relevance and the measured diversity values of the data items.
Type: Grant
Filed: August 19, 2011
Date of Patent: April 14, 2015
Assignee: International Business Machines Corporation
Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen
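The idea of growing a ranking list by marginal score, as described in the abstract above, can be sketched with a simple greedy loop in Python. The marginal score used here (relevance minus a weighted similarity penalty against already chosen items) is one common form and is an assumption, not the formula from the patent; `diversified_ranking`, `weight`, and the toy data are hypothetical.

```python
def diversified_ranking(relevance, similarity, k, weight=1.0):
    """Greedily build a ranking list of up to k items.

    relevance:  dict mapping item -> relevance to the query
    similarity: function (item_a, item_b) -> similarity in [0, 1]
    Marginal score (assumed form): relevance minus weighted similarity
    to the items already in the ranking list.
    """
    remaining = set(relevance)
    ranking = []
    while remaining and len(ranking) < k:
        def marginal(item):
            penalty = sum(similarity(item, chosen) for chosen in ranking)
            return relevance[item] - weight * penalty

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=marginal)
        ranking.append(best)
        remaining.remove(best)
    return ranking


scores = {"a": 1.0, "b": 0.9, "c": 0.5}


def toy_similarity(x, y):
    # "a" and "b" are near-duplicates; everything else is unrelated.
    return 0.8 if {x, y} == {"a", "b"} else 0.0


top_two = diversified_ranking(scores, toy_similarity, k=2)
```

Here `b` is more relevant than `c`, but its similarity to the already selected `a` lowers its marginal score, so `c` is picked second.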
-
Publication number: 20150088791
Abstract: Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on the calculated distance of a data sample. Generated data samples are selected from a number of highest ranking distance score buckets. The generated data samples selected from the number of highest ranking distance score buckets are injected into the minority data class.
Type: Application
Filed: September 24, 2013
Publication date: March 26, 2015
Applicant: International Business Machines Corporation
Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
-
Publication number: 20150058273
Abstract: Detecting propensity profile for a person may comprise receiving artifacts associated with the person; detecting profile characteristics for the person based on the artifacts; receiving a plurality of predefined profiles comprising a plurality of characteristics and relationships between the characteristics over time, each of the plurality of predefined profiles specifying an indication of propensity; matching the profile characteristics for the person with one or more of the plurality of predefined profiles; and outputting one or more propensity indicators based on the matching, the propensity indicators comprising at least an expressed strength of a given propensity in the person at a given time.
Type: Application
Filed: August 20, 2013
Publication date: February 26, 2015
Applicant: International Business Machines Corporation
Inventors: Anni R. Coden, Keith C. Houck, Ching-Yung Lin, Wanyi Lin, Peter K. Malkin, Shimei Pan, Youngja Park, Justin D. Weisz
-
Publication number: 20150052090
Abstract: A dataset including at least one temporal event sequence is collected. A one-class sequence classifier f(x) that obtains a decision boundary is statistically learned. At least one new temporal event sequence is evaluated, wherein the at least one new temporal event sequence is outside of the dataset. It is determined whether the at least one new temporal event sequence is one of a normal sequence or an abnormal sequence based on the evaluation. Numerous additional aspects are disclosed.
Type: Application
Filed: August 16, 2013
Publication date: February 19, 2015
Applicant: International Business Machines Corporation
Inventors: Ching-Yung Lin, Yale Song, Zhen Wen
-
Patent number: 8838688
Abstract: Methods and apparatus are provided for inferring user interests from both direct and indirect social neighbors. User interests are inferred from social neighbors by exploiting the correlation among multiple attributes of a user, in addition to the social correlation of an attribute among a group of users. Attributes of a user are inferred by obtaining an inferred set of attributes comprised of one or more attributes of social neighbors of the user. Thereafter, the inferred set is modified using a user attribute correlation model describing a probability that the attributes in the inferred set co-occur on the user and one or more of the social neighbors. An inference quality of the obtained attributes can optionally be obtained based on social network properties of the social neighbors. Interactions with the user and/or the social neighbors can be employed to solicit feedback to improve the one or more inferred attributes.
Type: Grant
Filed: May 31, 2011
Date of Patent: September 16, 2014
Assignee: International Business Machines Corporation
Inventors: Ching-Yung Lin, Zhen Wen
-
Patent number: 8818918
Abstract: Computer-implemented methods, systems, and articles of manufacture for determining the importance of a data item. A method includes: (a) receiving a node graph; (b) approximating a number of neighbor nodes of a node; and (c) calculating an average shortest path length of the node to the remaining nodes using the approximation step, where this calculation demonstrates the importance of a data item represented by the node. Another method includes: (a) receiving a node graph; (b) building a decomposed line graph of the node graph; (c) calculating stationary probabilities of incident edges of a node graph node in the decomposed line graph; and (d) calculating a summation of the stationary probabilities of the incident edges associated with the node, where the summation demonstrates the importance of a data item represented by the node. Both methods have at least one step carried out using a computer device.
Type: Grant
Filed: April 28, 2011
Date of Patent: August 26, 2014
Assignee: International Business Machines Corporation
Inventors: Ching-Yung Lin, Hanghang Tong, Jimeng Sun, Spyridon Papadimitriou, U Kang
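For reference, the quantity named in step (c) of the first method above, the average shortest path length from one node to the remaining nodes, can be computed exactly on a small unweighted graph with breadth-first search. The Python sketch below is that exact baseline only; it does not implement the neighbor-count approximation the abstract describes, and the adjacency-dict layout and function name are assumptions.

```python
from collections import deque


def average_shortest_path_length(adjacency, source):
    """Exact average shortest path length from source to every reachable node.

    adjacency: dict mapping node -> iterable of neighbor nodes (unweighted).
    Returns float('inf') if no other node is reachable.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    reachable = len(distance) - 1  # exclude the source itself
    if reachable == 0:
        return float("inf")
    return sum(distance.values()) / reachable


path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
center_score = average_shortest_path_length(path_graph, "b")
end_score = average_shortest_path_length(path_graph, "a")
```

A lower value indicates a more central node: in the three-node path, the middle node `b` scores lower than the endpoint `a`.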
-
Patent number: 8775335
Abstract: Access is obtained to a first nonnegative factor matrix and a second nonnegative factor matrix obtained by factorizing a nonnegative asymmetric matrix which represents a set of data which tracks time-stamped activities of a plurality of entities. The first nonnegative factor matrix is representative of initial role membership of the entities, and the second nonnegative factor matrix is representative of initial role activity descriptions. At a given one of the time stamps, while holding a change in the first nonnegative factor matrix constant, a change in the second nonnegative factor matrix is updated to reflect time variance of the set of data at the given one of the time stamps, without accessing actual data values at previous ones of the time stamps.
Type: Grant
Filed: August 5, 2011
Date of Patent: July 8, 2014
Assignee: International Business Machines Corporation
Inventors: Ching-Yung Lin, Hanghang Tong, Fei Wang
-
Patent number: 8645339
Abstract: A method, system and computer program product for managing and querying a graph. The method includes the steps of: receiving a graph; partitioning the graph into homogeneous blocks; compressing the homogeneous blocks; and storing the compressed homogeneous blocks in files where at least one of the steps is carried out using a computer device.
Type: Grant
Filed: November 11, 2011
Date of Patent: February 4, 2014
Assignee: International Business Machines Corporation
Inventors: U Kang, Ching-Yung Lin, Jimeng Sun, Hanghang Tong
-
Patent number: 8620916
Abstract: A method (and system) for data acquisition includes downloading a user's sent materials from a communication data repository, analyzing the sent materials and extracting data portions that are authored by the user, generating statistical values from the extracted data, transmitting the generated statistical values to one or multiple repositories, receiving the generated statistical values on one or multiple server machines, and aggregating statistical values of multiple users.
Type: Grant
Filed: March 9, 2012
Date of Patent: December 31, 2013
Assignee: International Business Machines Corporation
Inventors: Ching-Yung Lin, Dmitry A. Rekesh
-
Patent number: 8615515
Abstract: A method (and system) for data acquisition includes extracting information from user communications and allowing a user to control the information to be extracted. The method of data acquisition may include downloading a user's sent materials from a communication data repository, analyzing the downloaded materials and extracting data portions that are authored by the user, generating statistical values from the extracted data, transmitting the generated statistical values to one or multiple repositories, receiving generated statistical values on one or multiple server machines, and aggregating statistical values of multiple users.
Type: Grant
Filed: May 9, 2008
Date of Patent: December 24, 2013
Assignee: International Business Machines Corporation
Inventors: Ching-Yung Lin, Dmitry A. Rekesh
-
Patent number: 8612169
Abstract: A method of detecting anomalies from a bipartite graph includes analyzing the graph to determine a row-cluster membership, a column-cluster membership and a non-negative residual matrix, and in a processor, detecting the anomalies from the non-negative residual matrix.
Type: Grant
Filed: April 26, 2011
Date of Patent: December 17, 2013
Assignee: International Business Machines Corporation
Inventors: Ching-Yung Lin, Hanghang Tong
-
Publication number: 20130124488
Abstract: A method, system and computer program product for managing and querying a graph. The method includes the steps of: receiving a graph; partitioning the graph into homogeneous blocks; compressing the homogeneous blocks; and storing the compressed homogeneous blocks in files where at least one of the steps is carried out using a computer device.
Type: Application
Filed: November 11, 2011
Publication date: May 16, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: U Kang, Ching-Yung Lin, Jimeng Sun, Hanghang Tong
-
Publication number: 20130046769
Abstract: A method, system and computer program product for measuring a relevance and diversity of a ranking list to a given query. The ranking list is comprised of a set of data items responsive to the query. In one embodiment, the method comprises calculating a measured relevance of the set of data items to the query using a defined relevance measuring procedure, and determining a measured diversity value for the ranking list using a defined diversity measuring procedure. The measured relevance and the measured diversity value are combined to obtain a measure of the combined relevance and diversity of the ranking list. The measured relevance of the set of data items may be based on the individual relevance of each of the data items to the query, and the diversity value may be based on the similarities of the data items to each other.
Type: Application
Filed: August 19, 2011
Publication date: February 21, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen
-
Publication number: 20130046768
Abstract: A method, system and computer program product for finding a diversified ranking list for a given query. In one embodiment, a multitude of data items responsive to the query are identified, a marginal score is established for each data item, and a set, or ranking list, of the data items is formed based on these scores. This ranking list is formed by forming an initial set, and one or more data items are added to the ranking list based on the marginal scores of the data items. In one embodiment, each of the data items has a measured relevance and a measured diversity value, and the marginal scores for the data items are based on the measured relevance and the measured diversity values of the data items.
Type: Application
Filed: August 19, 2011
Publication date: February 21, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen