Patents by Inventor Kubilay Atasu
Kubilay Atasu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11222054
Abstract: Two sets X2 and X1 of histograms of words, and a vocabulary V are accessed. Each of the two sets is representable as a sparse matrix, each row of which corresponds to a histogram. Each histogram is representable as a sparse vector, whose dimension is determined by a dimension of the vocabulary. Two phases compute distances between pairs of histograms. The first phase includes computations performed for each histogram and for each word in the vocabulary to obtain a dense, floating-point vector y. The second phase includes computing, for each histogram, a sparse-matrix, dense-vector multiplication between a matrix-representation of the set X1 of histograms and the vector y. The multiplication is performed to obtain distances between all histograms of the set X1 and each histogram X2[j]. Distances between all pairs of histograms are obtained, based on which distances between documents can subsequently be assessed.
Type: Grant
Filed: March 12, 2018
Date of Patent: January 11, 2022
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Cesar Berrospi Ramis, Nikolas Ioannou, Thomas Patrick Parnell, Charalampos Pozidis, Vasileios Vasileiadis
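The two-phase structure described in the abstract of patent 11222054 can be illustrated with a minimal sketch. The abstract does not say what the phase-one computation is, so the sketch assumes that entry y[w] is the Euclidean distance from vocabulary word w to the closest word present in histogram X2[j], in the style of a relaxed word mover's distance. The names `embeddings`, `phase_one`, `phase_two`, and `all_distances`, and the dict-based sparse rows, are hypothetical choices made for illustration only.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length coordinate tuples."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def phase_one(target_hist, embeddings):
    """Build the dense vector y for one histogram X2[j].

    Assumption (not stated in the abstract): y[w] is the distance from
    vocabulary word w to the nearest word that occurs in target_hist.
    """
    present = [w for w, weight in target_hist.items() if weight > 0]
    return [min(euclidean(vec, embeddings[w]) for w in present)
            for vec in embeddings]


def phase_two(X1, y):
    """Sparse-matrix, dense-vector product: one distance per row of X1.

    Each row of X1 is a sparse histogram stored as {word_index: weight}.
    """
    return [sum(weight * y[w] for w, weight in row.items()) for row in X1]


def all_distances(X1, X2, embeddings):
    """Return D where D[j][i] relates histogram X1[i] to histogram X2[j]."""
    return [phase_two(X1, phase_one(hist, embeddings)) for hist in X2]


# Toy vocabulary of three words embedded in the plane.
embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
X1 = [{0: 0.5, 1: 0.5}, {2: 1.0}]
X2 = [{0: 1.0}]
distances = all_distances(X1, X2, embeddings)
```

The appeal of this layout is that phase one costs one pass over the vocabulary per histogram of X2, after which all distances to X1 come from a single sparse product rather than from a separate computation per pair.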
-
Patent number: 11176186
Abstract: In an approach for construing similarities between datasets, a processor accesses a pair of sets of feature weights, wherein the sets of feature weights include a query dataset and comprises first weights associated to first features and a reference dataset and comprises second weights associated to second features. Based on similarities between the first features and the second features, a processor discovers flows from the first features to the second features, wherein the flows maximize an overall similarity between the pair of sets of feature weights. Based on the similarities and the flows, a processor computes pair contributions to the overall similarity in order to obtain contributive elements, wherein the pair contributions are contributions of pairs joining the first features to the second features. A processor ranks the contributive elements to obtain respective ranks. A processor returns a result comprising the contributive elements and indications to the respective ranks.
Type: Grant
Filed: March 27, 2020
Date of Patent: November 16, 2021
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Cesar Berrospi Ramis
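The contribution-and-ranking step in the abstract of patent 11176186 can be sketched as follows. The sketch assumes the flows have already been found by some solver, and it assumes that the contribution of a feature pair (i, j) is the flow between the two features multiplied by their similarity. Neither detail is stated in the abstract, and the function name `rank_pair_contributions` is hypothetical.

```python
def rank_pair_contributions(similarity, flow, top_k=None):
    """Rank feature pairs by their share of the overall similarity.

    similarity[i][j]: similarity between query feature i and reference feature j.
    flow[i][j]: weight moved from query feature i to reference feature j
                (assumed to be precomputed elsewhere).
    Returns a list of (rank, (i, j), contribution), best pair first.
    """
    contributions = []
    for i, row in enumerate(flow):
        for j, moved in enumerate(row):
            if moved > 0:
                # Assumed definition: contribution = flow * similarity.
                contributions.append(((i, j), moved * similarity[i][j]))
    contributions.sort(key=lambda item: item[1], reverse=True)
    ranked = [(rank, pair, value)
              for rank, (pair, value) in enumerate(contributions, start=1)]
    return ranked if top_k is None else ranked[:top_k]


similarity = [[0.9, 0.1],
              [0.2, 0.8]]
flow = [[0.5, 0.0],
        [0.25, 0.25]]
ranked = rank_pair_contributions(similarity, flow)
overall = sum(value for _, _, value in ranked)
```

Returning the rank alongside each pair mirrors the abstract's "contributive elements and indications to the respective ranks", which is what lets a caller explain why two datasets were judged similar.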
-
Publication number: 20210303609
Abstract: In an approach for construing similarities between datasets, a processor accesses a pair of sets of feature weights, wherein the sets of feature weights include a query dataset and comprises first weights associated to first features and a reference dataset and comprises second weights associated to second features. Based on similarities between the first features and the second features, a processor discovers flows from the first features to the second features, wherein the flows maximize an overall similarity between the pair of sets of feature weights. Based on the similarities and the flows, a processor computes pair contributions to the overall similarity in order to obtain contributive elements, wherein the pair contributions are contributions of pairs joining the first features to the second features. A processor ranks the contributive elements to obtain respective ranks. A processor returns a result comprising the contributive elements and indications to the respective ranks.
Type: Application
Filed: March 27, 2020
Publication date: September 30, 2021
Inventors: Kubilay Atasu, Cesar Berrospi Ramis
-
Patent number: 11042604
Abstract: The example embodiments of the invention notably are directed to a computer-implemented method for assessing distances between pairs of histograms. Each of the histograms is a representation of a digital object; said representation comprises bins associating weights to respective vectors. Such vectors represent respective features of said digital object. This method basically revolves around computing distances between pairs of histograms. That is, for each pair {p, q} of histograms p and q of said pairs of histograms, the method computes a distance between p and q of said each pair {p, q}. In more detail, said distance is computed according to a cost of moving p into q, so as to obtain a flow matrix F, whose matrix elements $F_{i,j}$ indicate, for each pair {i,j} of bins of p and q, how much weight of a bin i of p has to flow to a bin j of q to move p into q. This is achieved by minimizing a quantity $\sum_{i,j} F_{i,j} \cdot C_{i,j}$, where $C_{i,j}$ is a matrix element of a cost matrix C representing said cost.
Type: Grant
Filed: December 4, 2018
Date of Patent: June 22, 2021
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Thomas Mittelholzer
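For orientation, the minimization named in the abstract of patent 11042604 can be written out as an optimization problem. The objective is taken from the abstract. The constraints shown are those of the classical earth mover's distance and are an assumption here: the abstract as listed does not state them, and the patent may use different or relaxed constraints.

```latex
% Objective from the abstract; constraints are the classical EMD ones (assumed).
\[
  d(p, q) \;=\; \min_{F \ge 0} \; \sum_{i,j} F_{i,j} \cdot C_{i,j}
  \quad \text{subject to} \quad
  \sum_{j} F_{i,j} = p_i , \qquad
  \sum_{i} F_{i,j} = q_j .
\]

% Worked example with two bins per histogram.
\[
  p = (0.5,\; 0.5), \qquad q = (1,\; 0), \qquad
  C = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
\]
% The column constraint for q_2 = 0 forces F_{1,2} = F_{2,2} = 0,
% so the row constraints give F_{1,1} = F_{2,1} = 0.5 and
\[
  d(p, q) \;=\; 0.5 \cdot 0 + 0.5 \cdot 2 \;=\; 1 .
\]
```

In this small example the constraints leave only one feasible flow, so no search is needed. In general the problem is a linear program over all entries of F.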
-
Patent number: 10839255
Abstract: A method for parallelizing a training of a model using a matrix-factorization-based collaborative filtering algorithm may be provided. The model can be used in a recommender system for a plurality of users and a plurality of items. The method includes providing a sparse training data matrix, selecting a number of user-item co-clusters, and building a user model data matrix by matrix factorization such that a computational load for executing the determining updated elements of the factorized sparse training data matrix is evenly distributed across the heterogeneous computing resources.
Type: Grant
Filed: May 15, 2017
Date of Patent: November 17, 2020
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Celestine Duenner, Thomas Mittelholzer, Thomas Parnell, Charalampos Pozidis, Michail Vlachos
-
Patent number: 10803346
Abstract: A cascaded finite-state-transducer array includes a plurality of finite-state-transducers, the finite-state-transducers being distributed in space. The finite-state-transducer array is configured with dedicated data transfer channels between the finite-state-transducers to transfer specific data types. Each data stream on a dedicated data transfer channel may transmit a particular data type, which may be sorted in increasing order of start offsets or token IDs.
Type: Grant
Filed: December 28, 2018
Date of Patent: October 13, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
-
Publication number: 20200175092
Abstract: The example embodiments of the invention notably are directed to a computer-implemented method for assessing distances between pairs of histograms. Each of the histograms is a representation of a digital object; said representation comprises bins associating weights to respective vectors. Such vectors represent respective features of said digital object. This method basically revolves around computing distances between pairs of histograms. That is, for each pair {p, q} of histograms p and q of said pairs of histograms, the method computes a distance between p and q of said each pair {p, q}. In more detail, said distance is computed according to a cost of moving p into q, so as to obtain a flow matrix F, whose matrix elements $F_{i,j}$ indicate, for each pair {i,j} of bins of p and q, how much weight of a bin i of p has to flow to a bin j of q to move p into q. This is achieved by minimizing a quantity $\sum_{i,j} F_{i,j} \cdot C_{i,j}$, where $C_{i,j}$ is a matrix element of a cost matrix C representing said cost.
Type: Application
Filed: December 4, 2018
Publication date: June 4, 2020
Inventors: Kubilay Atasu, Thomas Mittelholzer
-
Patent number: 10474707
Abstract: In one embodiment, a computer-implemented method includes receiving a regular expression (regex) and input data. One or more spans are identified representing one or more matches in which the regex matches at least a portion of the input data. Each span corresponds to a corresponding match and includes a start offset of the corresponding match in the input data and an end offset of the corresponding match in the input data. The one or more matches are identified in a sequence. An order of the sequence of the one or more spans is modified. One or more filtered spans are generated, by a computer processor, by filtering out a subset of the one or more spans that are each contained by at least one other span in the one or more spans. The identifying, the modifying, and the filtering are performed at streaming rate.
Type: Grant
Filed: September 21, 2015
Date of Patent: November 12, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Kubilay Atasu
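The identify, reorder, and filter steps in the abstract of patent 10474707 can be sketched in software. This is an offline approximation under stated assumptions, not the streaming-rate method the patent claims: overlapping spans are collected with a lookahead trick, the "modified order" is assumed to be a sort by start offset (longer span first on ties), and the helper names `find_spans` and `drop_contained` are hypothetical.

```python
import re


def find_spans(pattern, text):
    """Return one (start, end) span per start offset where pattern matches.

    Wrapping the pattern in a lookahead lets matches overlap, which plain
    re.finditer would not report. Numbered back-references inside `pattern`
    would be shifted by the extra group, so this sketch assumes there are none.
    """
    lookahead = re.compile("(?=(" + pattern + "))")
    return [m.span(1) for m in lookahead.finditer(text)]


def drop_contained(spans):
    """Filter out every span that is contained by another span.

    After sorting by start offset (ascending) and end offset (descending),
    a span is contained by an earlier one exactly when its end does not
    exceed the largest end seen so far, so a single pass is enough.
    Exact duplicates are treated as contained and dropped.
    """
    ordered = sorted(spans, key=lambda s: (s[0], -s[1]))
    kept, max_end = [], -1
    for start, end in ordered:
        if end > max_end:
            kept.append((start, end))
            max_end = end
    return kept


spans = find_spans(r"a+", "aaa b aa")
filtered = drop_contained(spans)
```

Once the spans are ordered this way the filter needs only one running value (`max_end`), which hints at why reordering before filtering makes a streaming implementation plausible.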
-
Patent number: 10467272
Abstract: In one embodiment, a computer-implemented method includes receiving a regular expression (regex) and input data. One or more spans are identified representing one or more matches in which the regex matches at least a portion of the input data. Each span corresponds to a corresponding match and includes a start offset of the corresponding match in the input data and an end offset of the corresponding match in the input data. The one or more matches are identified in a sequence. An order of the sequence of the one or more spans is modified. One or more filtered spans are generated, by a computer processor, by filtering out a subset of the one or more spans that are each contained by at least one other span in the one or more spans. The identifying, the modifying, and the filtering are performed at streaming rate.
Type: Grant
Filed: November 30, 2015
Date of Patent: November 5, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Kubilay Atasu
-
Publication number: 20190278850
Abstract: Two sets X2 and X1 of histograms of words, and a vocabulary V are accessed. Each of the two sets is representable as a sparse matrix, each row of which corresponds to a histogram. Each histogram is representable as a sparse vector, whose dimension is determined by a dimension of the vocabulary. Two phases compute distances between pairs of histograms. The first phase includes computations performed for each histogram and for each word in the vocabulary to obtain a dense, floating-point vector y. The second phase includes computing, for each histogram, a sparse-matrix, dense-vector multiplication between a matrix-representation of the set X1 of histograms and the vector y. The multiplication is performed to obtain distances between all histograms of the set X1 and each histogram X2[j]. Distances between all pairs of histograms are obtained, based on which distances between documents can subsequently be assessed.
Type: Application
Filed: March 12, 2018
Publication date: September 12, 2019
Inventors: Kubilay ATASU, Cesar BERROSPI RAMIS, Nikolas IOANNOU, Thomas Patrick PARNELL, Charalampos POZIDIS, Vasileios VASILEIADIS
-
Publication number: 20190163999
Abstract: A cascaded finite-state-transducer array includes a plurality of finite-state-transducers, the finite-state-transducers being distributed in space. The finite-state-transducer array is configured with dedicated data transfer channels between the finite-state-transducers to transfer specific data types. Each data stream on a dedicated data transfer channel may transmit a particular data type, which may be sorted in increasing order of start offsets or token IDs.
Type: Application
Filed: December 28, 2018
Publication date: May 30, 2019
Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
-
Patent number: 10198646
Abstract: A cascaded finite-state-transducer array includes a plurality of finite-state-transducers, the finite-state-transducers being distributed in space. The finite-state-transducer array is configured with dedicated data transfer channels between the finite-state-transducers to transfer specific data types. Each data stream on a dedicated data transfer channel may transmit a particular data type, which may be sorted in increasing order of start offsets or token IDs.
Type: Grant
Filed: July 1, 2016
Date of Patent: February 5, 2019
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
-
Patent number: 10169487
Abstract: One or more embodiments may provide the capability to enumerate maximal cliques of graph data by constructing and traversing a search tree through a single sequential pass on an adjacency list. The adjacency list may be generated so as to enable the at least one maximal clique to be generated in one single sequential pass.
Type: Grant
Filed: April 4, 2016
Date of Patent: January 1, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Kubilay Atasu, Silvio Dragone, Christoph Hagleitner, Robert R. McCune
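To make the task in the abstract of patent 10169487 concrete, here is a minimal sketch of maximal clique enumeration over an adjacency list. It uses the textbook Bron-Kerbosch recursion (without pivoting), swapped in purely for illustration. It is not the single-sequential-pass method the patent describes, and the function name `maximal_cliques` is hypothetical. The recursion does, however, trace out a search tree in which each node extends the current clique by one vertex.

```python
def maximal_cliques(adjacency):
    """Enumerate all maximal cliques of an undirected graph.

    adjacency maps each vertex to the set of its neighbours.
    Uses the basic Bron-Kerbosch recursion, not the patented method.
    """
    cliques = []

    def expand(clique, candidates, excluded):
        # No vertex can extend the clique and none was skipped: it is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(clique))
            return
        for v in list(candidates):
            expand(clique | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            # v has been fully explored; move it from candidates to excluded.
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Triangle 1-2-3 with a pendant edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = maximal_cliques(graph)
```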
-
Publication number: 20180330192
Abstract: A method for parallelizing a training of a model using a matrix-factorization-based collaborative filtering algorithm may be provided. The model can be used in a recommender system for a plurality of users and a plurality of items. The method includes providing a sparse training data matrix, selecting a number of user-item co-clusters, and building a user model data matrix by matrix factorization such that a computational load for executing the determining updated elements of the factorized sparse training data matrix is evenly distributed across the heterogeneous computing resources.
Type: Application
Filed: May 15, 2017
Publication date: November 15, 2018
Inventors: Kubilay Atasu, Celestine Duenner, Thomas Mittelholzer, Thomas Parnell, Charalampos Pozidis, Michail Vlachos
-
Patent number: 10055510
Abstract: A method is provided for searching a graph to identify cliques using a set of processing elements (PEs), a first PE of the set of PEs having access to an adjacency list of a seed vertex of the graph, the adjacency list of the seed vertex including a set of vertices. The method includes: generating a data structure for each intermediate vertex of the set of vertices, the data structure indicating the respective intermediate vertex and an additional list of intermediate vertices of the set of vertices; storing the generated data structures; for each buffered data structure, receiving the buffered data structure and configuring the available PE to receive an adjacency list of the intermediate vertex indicated in the respective data structure and to select from the adjacency list a set of further vertices that are adjacent to the seed vertex and are part of the additional list.
Type: Grant
Filed: November 4, 2015
Date of Patent: August 21, 2018
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Silvio Dragone
-
Patent number: 9983876
Abstract: A non-deterministic finite state machine module for use in a regular expression matching system. The system includes a computational unit implementing a non-deterministic finite state machine representing a regular expression, wherein the computational unit is configured to: receive an input data stream, wherein an occurrence of the regular expression is determined, and an activation signal; process the input data stream with respect to the non-deterministic finite state machine depending on the activation signal; and provide at least one branch data output for initializing an additional non-deterministic finite state machine module if the processing of an element of the input data stream according to the non-deterministic finite state machine results in a branching of a processing thread.
Type: Grant
Filed: February 20, 2014
Date of Patent: May 29, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Kubilay Atasu, Christoph Hagleitner, Raphael Polig, Frederick R Reiss
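The idea of a non-deterministic finite state machine processing an input stream, as in the abstract of patent 9983876, can be sketched in software. The sketch keeps a set of active states, so a "branch" shows up as one state producing several successors. The patent instead describes hardware modules that hand a branch off to an additional module, which this sketch does not model. The transition-table encoding and the name `run_nfa` are assumptions made for illustration.

```python
def run_nfa(transitions, start, accepting, stream):
    """Yield every offset at which the NFA is in an accepting state.

    transitions maps (state, symbol) to the set of successor states.
    A new processing thread is started at each offset, so occurrences
    of the pattern are found anywhere in the stream.
    """
    active = set()
    for offset, symbol in enumerate(stream):
        active.add(start)  # an occurrence may begin at this offset
        next_active = set()
        for state in active:
            # Several successors for one state correspond to a branch.
            next_active |= transitions.get((state, symbol), set())
        active = next_active
        if active & accepting:
            yield offset


# NFA for the regex a(b|bc): reading 'b' in state 1 branches into 2 and 3.
transitions = {
    (0, "a"): {1},
    (1, "b"): {2, 3},
    (3, "c"): {4},
}
match_ends = list(run_nfa(transitions, start=0, accepting={2, 4}, stream="xabc"))
```

Here the offsets reported are those of the last symbol of each occurrence: offset 2 for "ab" and offset 3 for "abc".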
-
Patent number: 9875045
Abstract: A device for matching, in input data, a regular expression with back-references, represented by a finite-state machine (FSM). The device comprises a plurality of parallel processing elements (PPEs), an interconnection network for interconnecting the PPEs with each other, and a memory for receiving and storing input data. The PPEs process the input data stored in the memory, based on backtracking to process the back-references, and implement FA next state logic to generate new active FA configurations or mark themselves as available to receive active FA configurations. The interconnection network retrieves active FA configurations from the PPEs and allocates the active FA configurations to available PPEs. The PPEs are configured to match a regular expression in the input data.
Type: Grant
Filed: July 27, 2015
Date of Patent: January 23, 2018
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Silvio Dragone
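As a reminder of what a regular expression with back-references looks like in practice, the short sketch below uses Python's standard `re` module, whose matcher also relies on backtracking. It illustrates only the matching semantics referred to in the abstract of patent 9875045, not the parallel hardware device. The pattern and sample text are hypothetical.

```python
import re

# \1 is a back-reference: it must match the same text that group 1 captured,
# so this pattern finds a word immediately repeated after a single space.
repeated_word = re.compile(r"\b(\w+) \1\b")

text = "the the cat sat sat down"
matches = [(m.span(), m.group(0)) for m in repeated_word.finditer(text)]
```

Because the text a back-reference must match is only known at run time, such patterns cannot in general be compiled into an ordinary finite automaton, which is why backtracking appears in the abstract.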
-
Publication number: 20180018152
Abstract: A system and method to hardware-accelerate finite state transducer libraries and their compilation toolchains. In an embodiment, a computer-implemented method for partitioning an UIMA-PEAR file into software-based and hardware-accelerated components may comprise creating a data-flow graph representation of the UIMA-PEAR-file, flattening hierarchies of the data-flow graph representation, and selecting the components to be hardware accelerated from the flattened hierarchies of the data-flow graph representation based on data dependencies of data types produced and consumed by each component of the flattened data-flow graph.
Type: Application
Filed: July 18, 2016
Publication date: January 18, 2018
Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
-
Publication number: 20180005060
Abstract: A cascaded finite-state-transducer array includes a plurality of finite-state-transducers, the finite-state-transducers being distributed in space. The finite-state-transducer array is configured with dedicated data transfer channels between the finite-state-transducers to transfer specific data types. Each data stream on a dedicated data transfer channel may transmit a particular data type, which may be sorted in increasing order of start offsets or token IDs.
Type: Application
Filed: July 1, 2016
Publication date: January 4, 2018
Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu
-
Patent number: 9858056
Abstract: A system and method to hardware-accelerate finite state transducer libraries and their compilation toolchains. In an embodiment, a computer-implemented method for partitioning an UIMA-PEAR file into software-based and hardware-accelerated components may comprise creating a data-flow graph representation of the UIMA-PEAR-file, flattening hierarchies of the data-flow graph representation, and selecting the components to be hardware accelerated from the flattened hierarchies of the data-flow graph representation based on data dependencies of data types produced and consumed by each component of the flattened data-flow graph.
Type: Grant
Filed: July 18, 2016
Date of Patent: January 2, 2018
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Akihiro Nakayama, Raphael Polig, Tong Xu