Patents by Inventor Ching-Yung Lin

Ching-Yung Lin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160019659
    Abstract: A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages. The method also includes clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders. The method additionally includes merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists. Each of the tweet groups and each of the subgroups corresponds to a respective different one of the conversations when it cannot be split, clustered, or merged.
    Type: Application
    Filed: June 24, 2015
    Publication date: January 21, 2016
    Inventors: YURDAER N. DOGANATA, CHING-YUNG LIN, DAVID CORBALAN LUNA, JORDI C. MESTRE, XAVIER NOGUERA PAGES, MERCAN TOPKARA, ZHEN WEN, DANNY L. YEH
  • Publication number: 20160019565
    Abstract: A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages. The method also includes clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders. The method additionally includes merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists. Each of the tweet groups and each of the subgroups corresponds to a respective different one of the conversations when it cannot be split, clustered, or merged.
    Type: Application
    Filed: June 3, 2015
    Publication date: January 21, 2016
    Inventors: YURDAER N. DOGANATA, CHING-YUNG LIN, DAVID C. LUNA, JORDI C. MESTRE, XAVIER N. PAGES, MERCAN TOPKARA, ZHEN WEN, DANNY L. YEH
  • Patent number: 9224104
    Abstract: Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on its calculated distance. Generated data samples are selected from a number of highest-ranking distance score buckets. The generated data samples selected from the number of highest-ranking distance score buckets are injected into the minority data class.
    Type: Grant
    Filed: September 24, 2013
    Date of Patent: December 29, 2015
    Assignee: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
  • Publication number: 20150356464
    Abstract: Determining a number of kernels within a model is provided. A number of kernels that include data samples of a majority data class of an imbalanced training data set is determined based on a set of generated artificial data samples for a minority data class of the imbalanced training data set. The number of kernels within the model is generated based on the set of generated artificial data samples. A likelihood of the set of generated artificial data samples being included in the majority data class of the imbalanced training data set is calculated. Parameters of each kernel in the number of kernels are updated based on the likelihood of the set of generated artificial data samples being included in the majority data class of the imbalanced training data set. Each kernel in the number of kernels is adjusted based on the updated parameters.
    Type: Application
    Filed: August 20, 2015
    Publication date: December 10, 2015
    Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
  • Publication number: 20150286819
    Abstract: A method for predicting insider threat includes mining electronic data of an organization corresponding to activity of an entity, determining features of the electronic data corresponding to the activity of the entity, classifying the features corresponding to the activity of the entity, determining sequences of classified features matching one or more patterns of insider threat, scoring the entity according to matches of the classified features to the one or more patterns of insider threat, and predicting an insider threat corresponding to the entity according to the score.
    Type: Application
    Filed: April 7, 2014
    Publication date: October 8, 2015
    Applicant: International Business Machines Corporation
    Inventors: Anni R. Coden, Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
  • Publication number: 20150106360
    Abstract: Visualizing social media conflict is provided. Textual messages about a particular topic, written by a set of human users connected via a network, are collected. Active users, i.e., users who author more than a threshold number of textual messages about the particular topic, are selected from the set. Keywords that occur more than a threshold number of times within the textual messages about the particular topic are selected. A sentiment score is computed for each selected keyword using a keyword co-occurrence graph. The sentiment of each active user is determined from the computed sentiment scores of the selected keywords that the user authored.
    Type: Application
    Filed: October 10, 2013
    Publication date: April 16, 2015
    Applicant: International Business Machines Corporation
    Inventors: Nan Cao, Ching-Yung Lin, Fei Wang, Zhen Wen
  • Patent number: 9009147
    Abstract: A method, system and computer program product for finding a diversified ranking list for a given query. In one embodiment, a multitude of data items responsive to the query are identified, a marginal score is established for each data item, and a set, or ranking list, of the data items is formed based on these scores. The ranking list is built by forming an initial set and then adding one or more data items to it based on their marginal scores. In one embodiment, each of the data items has a measured relevance and a measured diversity value, and the marginal scores for the data items are based on the measured relevance and the measured diversity values of the data items.
    Type: Grant
    Filed: August 19, 2011
    Date of Patent: April 14, 2015
    Assignee: International Business Machines Corporation
    Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen
  • Publication number: 20150088791
    Abstract: Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on its calculated distance. Generated data samples are selected from a number of highest-ranking distance score buckets. The generated data samples selected from the number of highest-ranking distance score buckets are injected into the minority data class.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Wan-Yi Lin, Yinglong Xia
  • Publication number: 20150058273
    Abstract: Detecting a propensity profile for a person may comprise receiving artifacts associated with the person; detecting profile characteristics for the person based on the artifacts; receiving a plurality of predefined profiles comprising a plurality of characteristics and relationships between the characteristics over time, each of the plurality of predefined profiles specifying an indication of propensity; matching the profile characteristics for the person with one or more of the plurality of predefined profiles; and outputting one or more propensity indicators based on the matching, the propensity indicators comprising at least an expressed strength of a given propensity in the person at a given time.
    Type: Application
    Filed: August 20, 2013
    Publication date: February 26, 2015
    Applicant: International Business Machines Corporation
    Inventors: Anni R. Coden, Keith C. Houck, Ching-Yung Lin, Wanyi Lin, Peter K. Malkin, Shimei Pan, Youngja Park, Justin D. Weisz
  • Publication number: 20150052090
    Abstract: A dataset including at least one temporal event sequence is collected. A one-class sequence classifier f(x) that obtains a decision boundary is statistically learned. At least one new temporal event sequence is evaluated, wherein the at least one new temporal event sequence is outside of the dataset. It is determined whether the at least one new temporal event sequence is one of a normal sequence or an abnormal sequence based on the evaluation. Numerous additional aspects are disclosed.
    Type: Application
    Filed: August 16, 2013
    Publication date: February 19, 2015
    Applicant: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Yale Song, Zhen Wen
  • Patent number: 8838688
    Abstract: Methods and apparatus are provided for inferring user interests from both direct and indirect social neighbors. User interests are inferred from social neighbors by exploiting the correlation among multiple attributes of a user, in addition to the social correlation of an attribute among a group of users. Attributes of a user are inferred by obtaining an inferred set of attributes comprised of one or more attributes of social neighbors of the user. Thereafter, the inferred set is modified using a user attribute correlation model describing a probability that the attributes in the inferred set co-occur on the user and one or more of the social neighbors. An inference quality of the obtained attributes can optionally be obtained based on social network properties of the social neighbors. Interactions with the user and/or the social neighbors can be employed to solicit feedback to improve the one or more inferred attributes.
    Type: Grant
    Filed: May 31, 2011
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Zhen Wen
  • Patent number: 8818918
    Abstract: Computer-implemented methods, systems, and articles of manufacture for determining the importance of a data item. A method includes: (a) receiving a node graph; (b) approximating a number of neighbor nodes of a node; and (c) calculating an average shortest path length of the node to the remaining nodes using the approximation step, where this calculation demonstrates the importance of a data item represented by the node. Another method includes: (a) receiving a node graph; (b) building a decomposed line graph of the node graph; (c) calculating stationary probabilities of incident edges of a node graph node in the decomposed line graph, and (d) calculating a summation of the stationary probabilities of the incident edges associated with the node, where the summation demonstrates the importance of a data item represented by the node. Both methods have at least one step carried out using a computer device.
    Type: Grant
    Filed: April 28, 2011
    Date of Patent: August 26, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Hanghang Tong, Jimeng Sun, Spyridon Papadimitriou, U Kang
  • Patent number: 8775335
    Abstract: Access is obtained to a first nonnegative factor matrix and a second nonnegative factor matrix obtained by factorizing a nonnegative asymmetric matrix which represents a set of data which tracks time-stamped activities of a plurality of entities. The first nonnegative factor matrix is representative of initial role membership of the entities, and the second nonnegative factor matrix is representative of initial role activity descriptions. At a given one of the time stamps, while holding a change in the first nonnegative factor matrix constant, a change in the second nonnegative factor matrix is updated to reflect time variance of the set of data at the given one of the time stamps, without accessing actual data values at previous ones of the time stamps.
    Type: Grant
    Filed: August 5, 2011
    Date of Patent: July 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Hanghang Tong, Fei Wang
  • Patent number: 8645339
    Abstract: A method, system and computer program product for managing and querying a graph. The method includes the steps of: receiving a graph; partitioning the graph into homogeneous blocks; compressing the homogeneous blocks; and storing the compressed homogeneous blocks in files, where at least one of the steps is carried out using a computer device.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: February 4, 2014
    Assignee: International Business Machines Corporation
    Inventors: U Kang, Ching-Yung Lin, Jimeng Sun, Hanghang Tong
  • Patent number: 8620916
    Abstract: A method (and system) for data acquisition includes downloading a user's sent materials from a communication data repository, analyzing the sent materials and extracting data portions that are authored by the user, generating statistical values from the extracted data, transmitting the generated statistical values to one or multiple repositories, receiving the generated statistical values on one or multiple server machines, and aggregating statistical values of multiple users.
    Type: Grant
    Filed: March 9, 2012
    Date of Patent: December 31, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Dmitry A. Rekesh
  • Patent number: 8615515
    Abstract: A method (and system) for data acquisition includes extracting information from user communications and allowing a user to control the information to be extracted. The method of data acquisition may include downloading a user's sent materials from a communication data repository, analyzing the downloaded materials and extracting data portions that are authored by the user, generating statistical values from the extracted data, transmitting the generated statistical values to one or multiple repositories, receiving the generated statistical values on one or multiple server machines, and aggregating statistical values of multiple users.
    Type: Grant
    Filed: May 9, 2008
    Date of Patent: December 24, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Dmitry A. Rekesh
  • Patent number: 8612169
    Abstract: A method of detecting anomalies from a bipartite graph includes analyzing the graph to determine a row-cluster membership, a column-cluster membership and a non-negative residual matrix, and in a processor, detecting the anomalies from the non-negative residual matrix.
    Type: Grant
    Filed: April 26, 2011
    Date of Patent: December 17, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ching-Yung Lin, Hanghang Tong
  • Publication number: 20130124488
    Abstract: A method, system and computer program product for managing and querying a graph. The method includes the steps of: receiving a graph; partitioning the graph into homogeneous blocks; compressing the homogeneous blocks; and storing the compressed homogeneous blocks in files, where at least one of the steps is carried out using a computer device.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: U Kang, Ching-Yung Lin, Jimeng Sun, Hanghang Tong
  • Publication number: 20130046769
    Abstract: A method, system and computer program product for measuring a relevance and diversity of a ranking list to a given query. The ranking list is comprised of a set of data items responsive to the query. In one embodiment, the method comprises calculating a measured relevance of the set of data items to the query using a defined relevance measuring procedure, and determining a measured diversity value for the ranking list using a defined diversity measuring procedure. The measured relevance and the measured diversity value are combined to obtain a measure of the combined relevance and diversity of the ranking list. The measured relevance of the set of data items may be based on the individual relevance of each of the data items to the query, and the diversity value may be based on the similarities of the data items to each other.
    Type: Application
    Filed: August 19, 2011
    Publication date: February 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen
  • Publication number: 20130046768
    Abstract: A method, system and computer program product for finding a diversified ranking list for a given query. In one embodiment, a multitude of data items responsive to the query are identified, a marginal score is established for each data item, and a set, or ranking list, of the data items is formed based on these scores. The ranking list is built by forming an initial set and then adding one or more data items to it based on their marginal scores. In one embodiment, each of the data items has a measured relevance and a measured diversity value, and the marginal scores for the data items are based on the measured relevance and the measured diversity values of the data items.
    Type: Application
    Filed: August 19, 2011
    Publication date: February 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jingrui He, Ravi B. Konuru, Ching-Yung Lin, Hanghang Tong, Zhen Wen
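The first two steps of publications 20160019659 and 20160019565 (grouping tweets by hashtag and time interval, then splitting groups by secondary hashtags and time separation) can be illustrated with a minimal sketch. It assumes each tweet carries a timestamp in seconds, one primary hashtag, and a set of secondary hashtags; the `Tweet` class, the fixed `window`, and the `max_gap` threshold are hypothetical choices, not details from the filings. The later clustering and merging steps are not covered.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tweet:
    time: float                          # seconds since an arbitrary epoch (assumed)
    hashtag: str                         # primary hashtag (assumed: the first one)
    secondary: frozenset = frozenset()   # any further hashtags
    text: str = ""


def group_tweets(tweets, window):
    """Group tweets by (primary hashtag, index of the fixed time window)."""
    groups = defaultdict(list)
    for t in tweets:
        groups[(t.hashtag, int(t.time // window))].append(t)
    return dict(groups)


def split_group(group, max_gap):
    """Split a group by its secondary-hashtag set, then cut each part
    wherever two consecutive tweets are more than max_gap seconds apart."""
    by_secondary = defaultdict(list)
    for t in group:
        by_secondary[t.secondary].append(t)
    subgroups = []
    for members in by_secondary.values():
        members.sort(key=lambda t: t.time)
        current = [members[0]]
        for prev, cur in zip(members, members[1:]):
            if cur.time - prev.time > max_gap:
                subgroups.append(current)
                current = []
            current.append(cur)
        subgroups.append(current)
    return subgroups


tweets = [Tweet(0, "#ai"), Tweet(30, "#ai"), Tweet(500, "#ai"), Tweet(40, "#ml")]
groups = group_tweets(tweets, window=3600)
subs = split_group(groups[("#ai", 0)], max_gap=120)
print([[t.time for t in s] for s in subs])  # [[0, 30], [500]]
```

Tweets that share a primary hashtag and window but are separated by a long silence end up in different subgroups, which is the intuition behind using time separation as a split criterion.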
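The oversampling procedure in patent 9224104 (publication 20150088791) can be sketched under several assumptions that the abstract does not state: candidates are generated by interpolating between random pairs of minority samples (a SMOTE-style swap-in), the majority class is treated as a single kernel centered on its mean, buckets have equal width in distance, and "highest-ranking" buckets are taken to be those farthest from the majority center. The function names and the `n_buckets` and `top_buckets` parameters are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def generate_samples(minority, count, rng):
    """Assumed generator: interpolate between two random minority samples."""
    out = []
    for _ in range(count):
        a, b = rng.sample(minority, 2)
        lam = rng.random()
        out.append(tuple(x + lam * (y - x) for x, y in zip(a, b)))
    return out


def inject_balanced(minority, majority, count, n_buckets=5, top_buckets=2, seed=0):
    rng = random.Random(seed)
    center = centroid(majority)  # assumed: one kernel covering the majority class
    candidates = generate_samples(minority, count, rng)
    dists = [math.dist(c, center) for c in candidates]
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / n_buckets or 1.0  # avoid zero width if all distances match
    buckets = [[] for _ in range(n_buckets)]
    for c, d in zip(candidates, dists):
        idx = min(int((d - lo) / width), n_buckets - 1)
        buckets[idx].append(c)
    # Assumed ranking: buckets farther from the majority center rank higher.
    selected = [c for b in buckets[-top_buckets:] for c in b]
    return minority + selected


minority = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (8.0, 8.0)]
majority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
balanced = inject_balanced(minority, majority, count=50)
print(len(balanced) - len(minority), "generated samples injected")
```

Keeping only the far buckets discards synthetic points that drift toward the majority kernel, which is one plausible reading of why the abstract ranks samples by distance before injecting them.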
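The greedy construction described for patent 9009147 and publication 20130046768 (start from an initial set, then add items by marginal score), together with the combined relevance-and-diversity measure of publication 20130046769, can be sketched as follows. The marginal score used here, relevance minus a weighted maximum similarity to already-ranked items, is a Maximal Marginal Relevance style swap-in, and the combined measure (total relevance plus weighted one-minus-mean-pairwise-similarity) is likewise an assumption; neither formula is taken from the filings.

```python
def marginal_score(i, ranking, relevance, similarity, weight):
    """Relevance of item i minus a penalty for its closest already-ranked item."""
    redundancy = max(similarity[i][j] for j in ranking)
    return relevance[i] - weight * redundancy


def diversified_ranking(relevance, similarity, k, weight=0.5):
    """relevance[i]: relevance of item i to the query;
    similarity[i][j]: similarity of items i and j, in [0, 1]."""
    if k <= 0 or not relevance:
        return []
    candidates = list(range(len(relevance)))
    # Assumed initial set: the single most relevant item.
    ranking = [max(candidates, key=lambda i: relevance[i])]
    while len(ranking) < min(k, len(candidates)):
        remaining = [i for i in candidates if i not in ranking]
        best = max(remaining, key=lambda i: marginal_score(
            i, ranking, relevance, similarity, weight))
        ranking.append(best)
    return ranking


def relevance_diversity(ranking, relevance, similarity, weight=0.5):
    """Assumed combined measure: total relevance plus weighted
    (1 - mean pairwise similarity) of the ranked items."""
    total = sum(relevance[i] for i in ranking)
    pairs = [(a, b) for n, a in enumerate(ranking) for b in ranking[n + 1:]]
    mean_sim = sum(similarity[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return total + weight * (1.0 - mean_sim)


relevance = [0.9, 0.85, 0.5]
similarity = [[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.1],
              [0.1, 0.1, 1.0]]
print(diversified_ranking(relevance, similarity, k=2))  # [0, 2]
```

Items 0 and 1 are near duplicates, so the greedy step prefers the less relevant but dissimilar item 2, and the combined measure scores `[0, 2]` above the pure-relevance list `[0, 1]`.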
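The first method of patent 8818918 treats a node's average shortest path length to the remaining nodes as a measure of its importance. The sketch below computes that quantity exactly with breadth-first search on an unweighted, undirected graph given as a hypothetical adjacency dictionary. The patent instead approximates the number of neighbor nodes to make this scale, so this exact version is only practical for small graphs.

```python
from collections import deque


def average_shortest_path(graph, source):
    """graph: dict mapping node -> iterable of neighbours (undirected, unweighted).
    Returns the mean hop distance from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


def rank_by_importance(graph):
    """Lower average shortest path length = more central = more important."""
    return sorted(graph, key=lambda n: average_shortest_path(graph, n))


graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
print(rank_by_importance(graph))  # ['hub', 'c', 'a', 'b', 'd']
```

Running one BFS per node costs O(V·(V+E)) overall, which is exactly the cost an approximation of the neighborhood counts is meant to avoid on large graphs.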
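The scoring stage of publication 20150286819 (classify activity features, match sequences of classified features against insider-threat patterns, score the entity, predict from the score) can be sketched as below. Everything concrete is an assumption: the rule-based `classify` function, the pattern labels and weights, the choice to treat a pattern as matched when its labels occur in order but not necessarily contiguously, and the additive score with a fixed threshold.

```python
def is_subsequence(pattern, sequence):
    """True if the labels in pattern occur in sequence in the same order."""
    it = iter(sequence)
    return all(step in it for step in pattern)  # 'in' consumes the iterator


def threat_score(events, classify, patterns):
    """Sum the weights of every pattern matched by the classified event sequence."""
    labels = [classify(e) for e in events]
    return sum(weight for pattern, weight in patterns if is_subsequence(pattern, labels))


def predict_threat(events, classify, patterns, threshold):
    return threat_score(events, classify, patterns) >= threshold


def classify(event):
    """Hypothetical rule-based classifier mapping (action, hour) to a label."""
    action, hour = event
    if action == "login" and (hour < 6 or hour > 22):
        return "off_hours_access"
    return {"download": "bulk_download", "usb": "removable_media"}.get(action, "routine")


# Hypothetical patterns: (ordered label sequence, weight).
PATTERNS = [
    (("off_hours_access", "bulk_download", "removable_media"), 0.7),
    (("removable_media",), 0.2),
]

events = [("login", 2), ("email", 10), ("download", 3), ("usb", 4)]
print(predict_threat(events, classify, PATTERNS, threshold=0.5))  # True
```

Matching ordered subsequences rather than single events lets a sequence such as off-hours access followed by bulk download and removable media outweigh any one of those actions on its own.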