Patents by Inventor Kubilay Atasu

Kubilay Atasu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11222054
    Abstract: Two sets X2 and X1 of histograms of words, and a vocabulary V are accessed. Each of the two sets is representable as a sparse matrix, each row of which corresponds to a histogram. Each histogram is representable as a sparse vector, whose dimension is determined by a dimension of the vocabulary. Two phases compute distances between pairs of histograms. The first phase includes computations performed for each histogram and for each word in the vocabulary to obtain a dense, floating-point vector y. The second phase includes computing, for each histogram, a sparse-matrix, dense-vector multiplication between a matrix-representation of the set X1 of histograms and the vector y. The multiplication is performed to obtain distances between all histograms of the set X1 and each histogram X2[j]. Distances between all pairs of histograms are obtained, based on which distances between documents can subsequently be assessed.
    Type: Grant
    Filed: March 12, 2018
    Date of Patent: January 11, 2022
    Assignee: International Business Machines Corporation
    Inventors: Kubilay Atasu, Cesar Berrospi Ramis, Nikolas Ioannou, Thomas Patrick Parnell, Charalampos Pozidis, Vasileios Vasileiadis
  • Patent number: 11176186
    Abstract: In an approach for construing similarities between datasets, a processor accesses a pair of sets of feature weights, wherein the sets of feature weights include a query dataset and comprises first weights associated to first features and a reference dataset and comprises second weights associated to second features. Based on similarities between the first features and the second features, a processor discovers flows from the first features to the second features, wherein the flows maximize an overall similarity between the pair of sets of feature weights. Based on the similarities and the flows, a processor computes pair contributions to the overall similarity in order to obtain contributive elements, wherein the pair contributions are contributions of pairs joining the first features to the second features. A processor ranks the contributive elements to obtain respective ranks. A processor returns a result comprising the contributive elements and indications to the respective ranks.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: November 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kubilay Atasu, Cesar Berrospi Ramis
  • Publication number: 20210303609
    Abstract: In an approach for construing similarities between datasets, a processor accesses a pair of sets of feature weights, wherein the sets of feature weights include a query dataset and comprises first weights associated to first features and a reference dataset and comprises second weights associated to second features. Based on similarities between the first features and the second features, a processor discovers flows from the first features to the second features, wherein the flows maximize an overall similarity between the pair of sets of feature weights. Based on the similarities and the flows, a processor computes pair contributions to the overall similarity in order to obtain contributive elements, wherein the pair contributions are contributions of pairs joining the first features to the second features. A processor ranks the contributive elements to obtain respective ranks. A processor returns a result comprising the contributive elements and indications to the respective ranks.
    Type: Application
    Filed: March 27, 2020
    Publication date: September 30, 2021
    Inventors: Kubilay Atasu, Cesar Berrospi Ramis
  • Patent number: 11042604
    Abstract: The example embodiments of the invention notably are directed to a computer-implemented method for assessing distances between pairs of histograms. Each of the histograms is a representation of a digital object; said representation comprises bins associating weights to respective vectors. Such vectors represent respective features of said digital object. This method basically revolves around computing distances between pairs of histograms. That is, for each pair {p, q} of histograms p and q of said pairs of histograms, the method computes a distance between p and q of said each pair {p, q}. In more detail, said distance is computed according to a cost of moving p into q, so as to obtain a flow matrix F, whose matrix elements F_{i,j} indicate, for each pair {i,j} of bins of p and q, how much weight of a bin i of p has to flow to a bin j of q to move p into q. This is achieved by minimizing a quantity Σ_{i,j} F_{i,j}·C_{i,j}, where C_{i,j} is a matrix element of a cost matrix C representing said cost.
    Type: Grant
    Filed: December 4, 2018
    Date of Patent: June 22, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kubilay Atasu, Thomas Mittelholzer
  • Patent number: 10839255
    Abstract: A method for parallelizing a training of a model using a matrix-factorization-based collaborative filtering algorithm may be provided. The model can be used in a recommender system for a plurality of users and a plurality of items. The method includes providing a sparse training data matrix, selecting a number of user-item co-clusters, and building a user model data matrix by matrix factorization such that a computational load for executing the determining updated elements of the factorized sparse training data matrix is evenly distributed across the heterogeneous computing resources.
    Type: Grant
    Filed: May 15, 2017
    Date of Patent: November 17, 2020
    Assignee: International Business Machines Corporation
    Inventors: Kubilay Atasu, Celestine Duenner, Thomas Mittelholzer, Thomas Parnell, Charalampos Pozidis, Michail Vlachos
  • Patent number: 10803346
    Abstract: A cascaded finite-state-transducer array includes a plurality of finite-state-transducers, the finite-state-transducers being distributed in space. The finite-state-transducer array is configured with dedicated data transfer channels between the finite-state-transducers to transfer specific data types. Each data stream on a dedicated data transfer channel may transmit a particular data type, which may be sorted in increasing order of start offsets or token IDs.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: October 13, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
  • Publication number: 20200175092
    Abstract: The example embodiments of the invention notably are directed to a computer-implemented method for assessing distances between pairs of histograms. Each of the histograms is a representation of a digital object; said representation comprises bins associating weights to respective vectors. Such vectors represent respective features of said digital object. This method basically revolves around computing distances between pairs of histograms. That is, for each pair {p, q} of histograms p and q of said pairs of histograms, the method computes a distance between p and q of said each pair {p, q}. In more detail, said distance is computed according to a cost of moving p into q, so as to obtain a flow matrix F, whose matrix elements F_{i,j} indicate, for each pair {i,j} of bins of p and q, how much weight of a bin i of p has to flow to a bin j of q to move p into q. This is achieved by minimizing a quantity Σ_{i,j} F_{i,j}·C_{i,j}, where C_{i,j} is a matrix element of a cost matrix C representing said cost.
    Type: Application
    Filed: December 4, 2018
    Publication date: June 4, 2020
    Inventors: Kubilay Atasu, Thomas Mittelholzer
  • Patent number: 10474707
    Abstract: In one embodiment, a computer-implemented method includes receiving a regular expression (regex) and input data. One or more spans are identified representing one or more matches in which the regex matches at least a portion of the input data. Each span corresponds to a corresponding match and includes a start offset of the corresponding match in the input data and an end offset of the corresponding match in the input data. The one or more matches are identified in a sequence. An order of the sequence of the one or more spans is modified. One or more filtered spans are generated, by a computer processor, by filtering out a subset of the one or more spans that are each contained by at least one other span in the one or more spans. The identifying, the modifying, and the filtering are performed at streaming rate.
    Type: Grant
    Filed: September 21, 2015
    Date of Patent: November 12, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Kubilay Atasu
  • Patent number: 10467272
    Abstract: In one embodiment, a computer-implemented method includes receiving a regular expression (regex) and input data. One or more spans are identified representing one or more matches in which the regex matches at least a portion of the input data. Each span corresponds to a corresponding match and includes a start offset of the corresponding match in the input data and an end offset of the corresponding match in the input data. The one or more matches are identified in a sequence. An order of the sequence of the one or more spans is modified. One or more filtered spans are generated, by a computer processor, by filtering out a subset of the one or more spans that are each contained by at least one other span in the one or more spans. The identifying, the modifying, and the filtering are performed at streaming rate.
    Type: Grant
    Filed: November 30, 2015
    Date of Patent: November 5, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Kubilay Atasu
  • Publication number: 20190278850
    Abstract: Two sets X2 and X1 of histograms of words, and a vocabulary V are accessed. Each of the two sets is representable as a sparse matrix, each row of which corresponds to a histogram. Each histogram is representable as a sparse vector, whose dimension is determined by a dimension of the vocabulary. Two phases compute distances between pairs of histograms. The first phase includes computations performed for each histogram and for each word in the vocabulary to obtain a dense, floating-point vector y. The second phase includes computing, for each histogram, a sparse-matrix, dense-vector multiplication between a matrix-representation of the set X1 of histograms and the vector y. The multiplication is performed to obtain distances between all histograms of the set X1 and each histogram X2[j]. Distances between all pairs of histograms are obtained, based on which distances between documents can subsequently be assessed.
    Type: Application
    Filed: March 12, 2018
    Publication date: September 12, 2019
    Inventors: Kubilay Atasu, Cesar Berrospi Ramis, Nikolas Ioannou, Thomas Patrick Parnell, Charalampos Pozidis, Vasileios Vasileiadis
  • Publication number: 20190163999
    Abstract: A cascaded finite-state-transducer array includes a plurality of finite-state-transducers, the finite-state-transducers being distributed in space. The finite-state-transducer array is configured with dedicated data transfer channels between the finite-state-transducers to transfer specific data types. Each data stream on a dedicated data transfer channel may transmit a particular data type, which may be sorted in increasing order of start offsets or token IDs.
    Type: Application
    Filed: December 28, 2018
    Publication date: May 30, 2019
    Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
  • Patent number: 10198646
    Abstract: A cascaded finite-state-transducer array includes a plurality of finite-state-transducers, the finite-state-transducers being distributed in space. The finite-state-transducer array is configured with dedicated data transfer channels between the finite-state-transducers to transfer specific data types. Each data stream on a dedicated data transfer channel may transmit a particular data type, which may be sorted in increasing order of start offsets or token IDs.
    Type: Grant
    Filed: July 1, 2016
    Date of Patent: February 5, 2019
    Assignee: International Business Machines Corporation
    Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
  • Patent number: 10169487
    Abstract: One or more embodiments may provide the capability to enumerate maximal cliques of graph data by constructing and traversing a search tree through a single sequential pass on an adjacency list. The adjacency list may be generated so as to enable the at least one maximal clique to be generated in one single sequential pass.
    Type: Grant
    Filed: April 4, 2016
    Date of Patent: January 1, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kubilay Atasu, Silvio Dragone, Christoph Hagleitner, Robert R. McCune
  • Publication number: 20180330192
    Abstract: A method for parallelizing a training of a model using a matrix-factorization-based collaborative filtering algorithm may be provided. The model can be used in a recommender system for a plurality of users and a plurality of items. The method includes providing a sparse training data matrix, selecting a number of user-item co-clusters, and building a user model data matrix by matrix factorization such that a computational load for executing the determining updated elements of the factorized sparse training data matrix is evenly distributed across the heterogeneous computing resources.
    Type: Application
    Filed: May 15, 2017
    Publication date: November 15, 2018
    Inventors: Kubilay Atasu, Celestine Duenner, Thomas Mittelholzer, Thomas Parnell, Charalampos Pozidis, Michail Vlachos
  • Patent number: 10055510
    Abstract: A method is provided for searching a graph to identify cliques using a set of processing elements (PEs), a first PE of the set of PEs having access to an adjacency list of a seed vertex of the graph, the adjacency list of the seed vertex including a set of vertices. The method includes: generating a data structure for each intermediate vertex of the set of vertices, the data structure indicating the respective intermediate vertex and an additional list of intermediate vertices of the set of vertices; storing the generated data structures; for each buffered data structure, receiving the buffered data structure and configuring the available PE to receive an adjacency list of the intermediate vertex indicated in the respective data structure and to select from the adjacency list a set of further vertices that are adjacent to the seed vertex and are part of the additional list.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: August 21, 2018
    Assignee: International Business Machines Corporation
    Inventors: Kubilay Atasu, Silvio Dragone
  • Patent number: 9983876
    Abstract: A non-deterministic finite state machine module for use in a regular expression matching system. The system includes a computational unit implementing a non-deterministic finite state machine representing a regular expression, wherein the computational unit is configured to: receive an input data stream, wherein an occurrence of the regular expression is determined, and an activation signal; process the input data stream with respect to the non-deterministic finite state machine depending on the activation signal; and provide at least one branch data output for initializing an additional non-deterministic finite state machine module if the processing of an element of the input data stream according to the non-deterministic finite state machine results in a branching of a processing thread.
    Type: Grant
    Filed: February 20, 2014
    Date of Patent: May 29, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kubilay Atasu, Christoph Hagleitner, Raphael Polig, Frederick R. Reiss
  • Patent number: 9875045
    Abstract: A device for matching, in input data, a regular expression with back-references, represented by a finite-state machine (FSM). The device comprises a plurality of parallel processing elements (PPEs), an interconnection network for interconnecting the PPEs with each other, and a memory for receiving and storing input data. The PPEs process the input data stored in the memory, based on backtracking to process the back-references, and implement FA next state logic to generate new active FA configurations or mark themselves as available to receive active FA configurations. The interconnection network retrieves active FA configurations from the PPEs and allocates the active FA configurations to available PPEs. The PPEs are configured to match a regular expression in the input data.
    Type: Grant
    Filed: July 27, 2015
    Date of Patent: January 23, 2018
    Assignee: International Business Machines Corporation
    Inventors: Kubilay Atasu, Silvio Dragone
  • Publication number: 20180018152
    Abstract: A system and method to hardware-accelerate finite state transducer libraries and their compilation toolchains. In an embodiment, a computer-implemented method for partitioning an UIMA-PEAR file into software-based and hardware-accelerated components may comprise creating a data-flow graph representation of the UIMA-PEAR-file, flattening hierarchies of the data-flow graph representation, and selecting the components to be hardware accelerated from the flattened hierarchies of the data-flow graph representation based on data dependencies of data types produced and consumed by each component of the flattened data-flow graph.
    Type: Application
    Filed: July 18, 2016
    Publication date: January 18, 2018
    Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
  • Publication number: 20180005060
    Abstract: A cascaded finite-state-transducer array includes a plurality of finite-state-transducers, the finite-state-transducers being distributed in space. The finite-state-transducer array is configured with dedicated data transfer channels between the finite-state-transducers to transfer specific data types. Each data stream on a dedicated data transfer channel may transmit a particular data type, which may be sorted in increasing order of start offsets or token IDs.
    Type: Application
    Filed: July 1, 2016
    Publication date: January 4, 2018
    Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
  • Patent number: 9858056
    Abstract: A system and method to hardware-accelerate finite state transducer libraries and their compilation toolchains. In an embodiment, a computer-implemented method for partitioning an UIMA-PEAR file into software-based and hardware-accelerated components may comprise creating a data-flow graph representation of the UIMA-PEAR-file, flattening hierarchies of the data-flow graph representation, and selecting the components to be hardware accelerated from the flattened hierarchies of the data-flow graph representation based on data dependencies of data types produced and consumed by each component of the flattened data-flow graph.
    Type: Grant
    Filed: July 18, 2016
    Date of Patent: January 2, 2018
    Assignee: International Business Machines Corporation
    Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
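
Illustrative note on the span-filtering step described in the abstracts of patents 10474707 and 10467272: those abstracts speak of reordering the spans of regex matches and then filtering out every span that is contained by at least one other span. The sketch below shows one simple offline way to get that effect in Python, assuming each span is a (start offset, end offset) pair and that "contained" means the other span starts no later and ends no earlier. The function name `filter_contained_spans` and the sort-then-sweep approach are assumptions made for illustration; this is not the streaming-rate method the patents claim.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start_offset, end_offset) of one regex match


def filter_contained_spans(spans: List[Span]) -> List[Span]:
    """Drop every span that is contained by at least one other span.

    Hypothetical helper for illustration only; it works offline on a
    complete list and is not the patented streaming implementation.
    """
    # Reorder: ascending start offset, and for equal starts the longest
    # span first, so any span that could contain another comes before it.
    ordered = sorted(spans, key=lambda s: (s[0], -s[1]))

    kept: List[Span] = []
    max_end: Optional[int] = None  # largest end offset seen so far
    for start, end in ordered:
        # Every earlier span starts at or before `start`; if one of them
        # also ends at or after `end`, the current span is contained.
        if max_end is None or end > max_end:
            kept.append((start, end))
            max_end = end
    return kept


if __name__ == "__main__":
    matches = [(0, 5), (1, 3), (2, 7), (2, 4), (6, 7)]
    print(filter_contained_spans(matches))
```

With this ordering a single pass is enough, because a span can only be contained by one that sorts before it. Exact duplicates collapse to a single copy as a side effect.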