Patents by Inventor Mauricio J. Serrano
Mauricio J. Serrano has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10984073
Abstract: A processor can scan a portion of a vector to identify first nonzero entries. The processor can scan another portion of the vector to identify second nonzero entries. The processor can scale a portion of a matrix using the first nonzero entries to generate first intermediate elements. The processor can scale another portion of the matrix using the second nonzero entries to generate second intermediate elements. The processor can store the first intermediate elements in a first buffer and store the second intermediate elements in a second buffer. The processor can copy a subset of the first intermediate elements from the first buffer to a memory and copy a subset of the second intermediate elements from the second buffer to the memory. The subsets of first and second intermediate elements can be aggregated to generate an output vector.
Type: Grant
Filed: August 8, 2019
Date of Patent: April 20, 2021
Assignee: International Business Machines Corporation
Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
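The flow in this abstract can be illustrated with a minimal Python sketch: the vector is scanned in two portions for nonzero entries, the matching matrix columns are scaled into two separate buffers, and the buffered intermediates are aggregated into the output. A dense matrix stands in for the sparse one, and all names are illustrative, not from the patent.

```python
def spmv_buffered(matrix, vector, split):
    """Multiply matrix @ vector by scanning two portions of the vector
    for nonzeros, scaling the matching matrix columns into separate
    buffers, then aggregating the buffers into an output vector."""
    n_rows = len(matrix)

    def scale_portion(cols):
        buffer = []
        for j in cols:
            if vector[j] != 0:  # scan for nonzero entries
                # scale column j of the matrix by the nonzero entry
                buffer.append([matrix[i][j] * vector[j] for i in range(n_rows)])
        return buffer

    first_buffer = scale_portion(range(0, split))             # first portion
    second_buffer = scale_portion(range(split, len(vector)))  # second portion

    # copy the intermediate elements to "memory" and aggregate them
    output = [0] * n_rows
    for partial in first_buffer + second_buffer:
        for i in range(n_rows):
            output[i] += partial[i]
    return output
```

Because each zero vector entry is skipped during the scan, no work is spent scaling columns that cannot contribute to the output.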
-
Patent number: 10884942
Abstract: Various embodiments execute a program with improved cache efficiency. In one embodiment, a first subset of operations of a program is performed on a plurality of objects stored in one or more data structures. The first subset of operations has a regular memory access pattern. After each operation in the first subset of operations has been performed, results of the operation are stored in one of the plurality of queues. Each queue in the plurality of queues is associated with a different cacheable region of a memory. A second subset of operations in the program is performed utilizing at least one queue in the plurality of queues. The second subset of operations utilizes results of the operations in the first subset of operations stored in the queue. The second subset of operations has an irregular memory access pattern that is regularized by localizing memory locations accessed by the second subset of operations to the cacheable region of memory associated with the at least one queue.
Type: Grant
Filed: May 19, 2016
Date of Patent: January 5, 2021
Assignee: International Business Machines Corporation
Inventors: William Pettit Horn, Joefon Jann, Manoj Kumar, Jose Eduardo Moreira, Pratap Chandra Pattnaik, Mauricio J. Serrano, Ilie Gabriel Tanase
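The two-phase structure described here can be sketched in a few lines of Python: a regular streaming pass enqueues each result under the cacheable region its target falls in, and a second pass drains one queue at a time so the otherwise-irregular writes stay localized. The region size and names are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict

def two_phase_update(objects, targets, region_size):
    """Phase 1: stream over `objects` (regular access pattern) and
    enqueue each result under the region its target index falls in.
    Phase 2: drain one queue at a time, so the irregular writes into
    `targets` are confined to a single cacheable region at a time."""
    queues = defaultdict(list)          # one queue per cacheable region
    for idx, value in objects:          # regular, streaming pass
        queues[idx // region_size].append((idx, value))

    for region in sorted(queues):       # irregular pass, now regularized
        for idx, value in queues[region]:
            targets[idx] += value       # all accesses land in one region
    return targets
```

The queues trade a modest amount of extra memory traffic for the guarantee that phase 2 touches only one region of `targets` at a time, which is what keeps the working set cache-resident.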
-
Patent number: 10795683
Abstract: Predicting indirect branch instructions may comprise predicting a target address for a fetched branch instruction. Accuracy of the target address may be tracked. The fetched branch instruction may be flagged as a problematic branch instruction based on the tracking. A pattern cache may be trained for predicting a more accurate target address for the fetched branch instruction, and the next time the fetched branch instruction is again fetched, a target address may be predicted from the pattern cache.
Type: Grant
Filed: June 11, 2014
Date of Patent: October 6, 2020
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Tejas Karkhanis, Brian R. Konigsburg, David S. Levitan, Douglas R. G. Logan, Mauricio J. Serrano
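A toy software model can make the flow above concrete: a simple last-target predictor serves as the default, mispredictions are counted per branch, and once a branch is flagged as problematic a pattern cache keyed by the branch address and recent target history takes over. The threshold, history depth, and all names are illustrative assumptions, not details from the patent.

```python
class IndirectBranchPredictor:
    """Toy model: last-target prediction by default, plus a pattern
    cache (keyed by branch address and recent target history) that is
    trained once a branch has been flagged as problematic."""
    THRESHOLD = 2              # mispredictions before a branch is flagged

    def __init__(self):
        self.last_target = {}       # addr -> last observed target
        self.misses = {}            # addr -> misprediction count
        self.flagged = set()        # problematic branches
        self.pattern_cache = {}     # (addr, history) -> target
        self.history = ()           # recent global target history

    def predict(self, addr):
        if addr in self.flagged:
            key = (addr, self.history)
            if key in self.pattern_cache:
                return self.pattern_cache[key]
        return self.last_target.get(addr)

    def update(self, addr, actual):
        if self.predict(addr) != actual:        # track accuracy
            self.misses[addr] = self.misses.get(addr, 0) + 1
            if self.misses[addr] >= self.THRESHOLD:
                self.flagged.add(addr)          # flag as problematic
        if addr in self.flagged:                # train the pattern cache
            self.pattern_cache[(addr, self.history)] = actual
        self.last_target[addr] = actual
        self.history = (self.history + (actual,))[-2:]
```

On an alternating target sequence the last-target predictor is always wrong, while the trained pattern cache predicts every target once the alternation has been seen once.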
-
Patent number: 10713056
Abstract: A non-limiting example of a computer-implemented method for implementing wide vector execution for an out-of-order processor includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
Type: Grant
Filed: November 8, 2017
Date of Patent: July 14, 2020
Assignee: International Business Machines Corporation
Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
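The partitioning idea is a hardware mechanism, but it can be mimicked in software: the register file is split into equal slices, one per vector execution unit, and a wide instruction is dispatched slice by slice across the units. This is only a conceptual sketch under that simplifying assumption; none of the names come from the patent.

```python
class WideVectorUnit:
    """Toy model: in single-thread mode the vector register file is
    partitioned into per-unit slices that together execute one wide
    vector instruction."""
    def __init__(self, total_width, num_units):
        assert total_width % num_units == 0
        self.slice = total_width // num_units
        self.num_units = num_units

    def wide_add(self, a, b):
        # dispatch one slice of the wide instruction to each unit
        result = []
        for u in range(self.num_units):
            lo, hi = u * self.slice, (u + 1) * self.slice
            result.extend(x + y for x, y in zip(a[lo:hi], b[lo:hi]))
        return result
```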
-
Patent number: 10705847
Abstract: A non-limiting example of a computer-implemented method for implementing wide vector execution for an out-of-order processor includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
Type: Grant
Filed: August 1, 2017
Date of Patent: July 7, 2020
Assignee: International Business Machines Corporation
Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
-
Patent number: 10664279
Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
Type: Grant
Filed: January 4, 2019
Date of Patent: May 26, 2020
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
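A prefetch prediction vector of this kind is naturally modeled as a small bitmask over the cache lines that follow the missing line. The sketch below decodes such a vector and shows one plausible way to train it from later hits; the vector width and function names are assumptions for illustration, not from the patent.

```python
def lines_to_prefetch(miss_line, prediction_vector, vector_bits=4):
    """Decode a prefetch prediction vector: bit i set means the i-th
    cache line after the missing line should be prefetched."""
    return [miss_line + i + 1
            for i in range(vector_bits)
            if prediction_vector & (1 << i)]

def train_vector(miss_line, later_hits, vector_bits=4):
    """Build a prediction vector by setting a bit for each contiguous
    follower line that was actually used after the miss."""
    vec = 0
    for hit in later_hits:
        offset = hit - miss_line - 1
        if 0 <= offset < vector_bits:
            vec |= 1 << offset
    return vec
```

Encoding the followers as individual bits, rather than a single run length, lets the predictor skip lines in the window that were never used.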
-
Patent number: 10558429
Abstract: Techniques for determining connected components of a graph via incremental graph analysis algorithms are provided. In one example, a computer-implemented method comprises analyzing, by a system operatively coupled to a processor, a first differential value representing an initial incremental difference of elements between selected initial elements of an initial vector and selected input elements of an input vector associated with a graph. The method further comprises recurringly analyzing, by the system, a second differential value representing a subsequent incremental difference of elements between selected updated elements of an updated initial vector and selected additional elements of another input vector associated with the graph until the second differential value is zero.
Type: Grant
Filed: August 24, 2016
Date of Patent: February 11, 2020
Assignee: International Business Machines Corporation
Inventors: Mauricio J. Serrano, Ilie Gabriel Tanase
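The "iterate until the differential is zero" pattern above is the shape of incremental label propagation: each round, only the vertices whose label changed in the previous round propagate, and the algorithm terminates when no labels change. A minimal Python sketch under that reading (the specific vectors and operators in the patent are not reproduced here):

```python
def connected_components(n, edges):
    """Incremental label propagation: each round only the vertices whose
    label changed last round (the 'differential') propagate, stopping
    when the differential shrinks to zero."""
    labels = list(range(n))                 # initial vector: own vertex id
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    frontier = set(range(n))                # nonzero initial differential
    while frontier:                         # stop when differential is zero
        changed = set()
        for u in frontier:
            for v in neighbors[u]:
                if labels[u] < labels[v]:   # propagate the smaller label
                    labels[v] = labels[u]
                    changed.add(v)
        frontier = changed                  # next round's differential
    return labels
```

Because labels only decrease and are bounded below, each vertex changes at most n times, so the differential must eventually reach zero.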
-
Publication number: 20190361955
Abstract: A processor can scan a portion of a vector to identify first nonzero entries. The processor can scan another portion of the vector to identify second nonzero entries. The processor can scale a portion of a matrix using the first nonzero entries to generate first intermediate elements. The processor can scale another portion of the matrix using the second nonzero entries to generate second intermediate elements. The processor can store the first intermediate elements in a first buffer and store the second intermediate elements in a second buffer. The processor can copy a subset of the first intermediate elements from the first buffer to a memory and copy a subset of the second intermediate elements from the second buffer to the memory. The subsets of first and second intermediate elements can be aggregated to generate an output vector.
Type: Application
Filed: August 8, 2019
Publication date: November 28, 2019
Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
-
Publication number: 20190294585
Abstract: A computing device and a method of allocating vector register files in a simultaneously-multithreaded (SMT) processor core are provided. A request for a first number (M) of vector register files is received from a borrower thread of the processor core. One or more available donor threads of the processor core are identified. A second number (N) of the vector register files, of the identified one or more available donor threads, are assigned to the borrower thread, where N ≤ M. The borrower thread is parameterized to create a virtualized vector register file for the borrower thread, based on a width of the N vector register files of the identified one or more donor threads.
Type: Application
Filed: March 21, 2018
Publication date: September 26, 2019
Inventors: Mauricio J. Serrano, Giles B. Frazier, Silvia Melitta Mueller
-
Patent number: 10417304
Abstract: Methods and systems for multiplying a matrix and a vector are described. In an example, the vector may be partitioned into a plurality of vector partitions. The matrix may be partitioned into a plurality of matrix partitions. A plurality of threads may be scheduled to multiply each matrix partition with its corresponding vector partition to determine intermediate elements. Intermediate elements determined by each thread may be stored in a local buffer assigned to the corresponding thread. Intermediate elements may be copied from a particular buffer to a memory in response to the particular buffer being full. Upon completion of the plurality of threads, the intermediate elements copied to the memory may be aggregated to generate an output vector that may be a result of multiplication between the matrix and the vector.
Type: Grant
Filed: December 15, 2017
Date of Patent: September 17, 2019
Assignee: International Business Machines Corporation
Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
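The threaded scheme above can be sketched with Python's thread pool: the matrix is partitioned (by rows here, a simplification of the partitioning described in the abstract), each worker multiplies its partition into a thread-local buffer, and the buffers are aggregated into the output at the end. All names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_matvec(matrix, vector, num_threads=2):
    """Partition the matrix across threads; each worker multiplies its
    partition into a local buffer, and the buffers are aggregated into
    the output vector once all threads complete."""
    n = len(matrix)
    chunk = (n + num_threads - 1) // num_threads

    def worker(start):
        local = []                      # thread-local buffer
        for row in matrix[start:start + chunk]:
            local.append(sum(m * v for m, v in zip(row, vector)))
        return start, local

    output = [0] * n
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for start, local in pool.map(worker, range(0, n, chunk)):
            output[start:start + len(local)] = local   # aggregate buffers
    return output
```

Keeping each thread's intermediates in a private buffer avoids contended writes to the shared output until the single aggregation step.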
-
Patent number: 10379857
Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an instruction cache. In response to the CLA subsequently hitting in the instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the instruction cache.
Type: Grant
Filed: August 23, 2018
Date of Patent: August 13, 2019
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
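The PFQ lifecycle described above (allocate on miss, update on hit, age out into a backing array, consult the backing array at prefetch time) can be modeled in a few lines. FIFO aging and the queue capacity are simplifying assumptions; names are illustrative, not from the patent.

```python
from collections import OrderedDict

class PrefetchFilterQueue:
    """Toy PFQ: misses allocate an entry, later hits bump its prefetch
    value, aged-out entries move to a backing array, and the backing
    array is consulted when a prefetch is issued for that address."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pfq = OrderedDict()    # cache line address -> prefetch value
        self.backing = {}           # aged-out entries

    def on_miss(self, cla):
        if cla not in self.pfq:
            if len(self.pfq) >= self.capacity:
                old_cla, value = self.pfq.popitem(last=False)  # age out
                self.backing[old_cla] = value
            self.pfq[cla] = 0       # allocate a fresh entry

    def on_hit(self, cla):
        if cla in self.pfq:
            self.pfq[cla] += 1      # update the associated prefetch value

    def lines_to_prefetch(self, cla):
        count = self.backing.get(cla, 0)
        # the line itself plus the learned number of sequential lines
        return [cla + i for i in range(count + 1)]
```

The filter's point is visible in the model: a line that was never followed by sequential hits ages out with a value of zero, so no useless sequential prefetches are issued for it later.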
-
Publication number: 20190188239
Abstract: Methods and systems for multiplying a matrix and a vector are described. In an example, the vector may be partitioned into a plurality of vector partitions. The matrix may be partitioned into a plurality of matrix partitions. A plurality of threads may be scheduled to multiply each matrix partition with its corresponding vector partition to determine intermediate elements. Intermediate elements determined by each thread may be stored in a local buffer assigned to the corresponding thread. Intermediate elements may be copied from a particular buffer to a memory in response to the particular buffer being full. Upon completion of the plurality of threads, the intermediate elements copied to the memory may be aggregated to generate an output vector that may be a result of multiplication between the matrix and the vector.
Type: Application
Filed: December 15, 2017
Publication date: June 20, 2019
Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
-
Publication number: 20190138312
Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
Type: Application
Filed: January 4, 2019
Publication date: May 9, 2019
Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
-
Publication number: 20190042266
Abstract: Embodiments of the present invention include methods, systems, and computer program products for implementing wide vector execution in a single thread mode for an out-of-order processor. A non-limiting example of the computer-implemented method includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
Type: Application
Filed: November 8, 2017
Publication date: February 7, 2019
Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
-
Publication number: 20190042265
Abstract: Embodiments of the present invention include methods, systems, and computer program products for implementing wide vector execution in a single thread mode for an out-of-order processor. A non-limiting example of the computer-implemented method includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
Type: Application
Filed: August 1, 2017
Publication date: February 7, 2019
Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
-
Patent number: 10175987
Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
Type: Grant
Filed: March 17, 2016
Date of Patent: January 8, 2019
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
-
Publication number: 20180365012
Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an instruction cache. In response to the CLA subsequently hitting in the instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the instruction cache.
Type: Application
Filed: August 23, 2018
Publication date: December 20, 2018
Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
-
Patent number: 10078514
Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an upper level instruction cache. In response to the CLA subsequently hitting in the upper level instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the upper level instruction cache.
Type: Grant
Filed: May 11, 2016
Date of Patent: September 18, 2018
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
-
Patent number: 9983878
Abstract: Branch prediction is provided by generating a first index from a previous instruction address and from a first branch history vector having a first length. A second index is generated from the previous instruction address and from a second branch history vector that is longer than the first vector. Using the first index, a first branch prediction is retrieved from a first branch prediction table. Using the second index, a second branch prediction is retrieved from a second branch prediction table. Based upon additional branch history data, the first branch history vector and the second branch history vector are updated. A first hash value is generated from a current instruction address and the updated first branch history vector. A second hash value is generated from the current instruction address and the updated second branch history vector. One of the branch predictions is selected based upon the hash values.
Type: Grant
Filed: May 15, 2014
Date of Patent: May 29, 2018
Assignee: International Business Machines Corporation
Inventors: David S. Levitan, Jose E. Moreira, Mauricio J. Serrano
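The core idea, two prediction tables indexed by histories of different lengths with a mechanism to choose between them, can be sketched as a simplified software model. The chooser below tracks per-branch accuracy rather than reproducing the patent's hash-based selection, and the counter widths and table sizes are assumptions for illustration.

```python
class TwoHistoryPredictor:
    """Simplified sketch: one table indexed with a short branch history,
    one with a long history, and a chooser that tracks which table has
    been more accurate for each branch."""
    def __init__(self, bits_short=4, bits_long=8, size=256):
        self.size = size
        self.short_hist = 0
        self.long_hist = 0
        self.mask_s = (1 << bits_short) - 1
        self.mask_l = (1 << bits_long) - 1
        self.table_s = [0] * size      # 2-bit counters, short history
        self.table_l = [0] * size      # 2-bit counters, long history
        self.chooser = [0] * size      # >0 favors the long-history table

    def _index(self, addr, hist):
        return (addr ^ hist) % self.size     # index from address + history

    def predict(self, addr):
        p_s = self.table_s[self._index(addr, self.short_hist)] >= 2
        p_l = self.table_l[self._index(addr, self.long_hist)] >= 2
        return p_l if self.chooser[addr % self.size] > 0 else p_s

    def update(self, addr, taken):
        i_s = self._index(addr, self.short_hist)
        i_l = self._index(addr, self.long_hist)
        p_s = self.table_s[i_s] >= 2
        p_l = self.table_l[i_l] >= 2
        if p_s != p_l:                 # reward whichever table was right
            c = addr % self.size
            delta = 1 if p_l == taken else -1
            self.chooser[c] = max(-2, min(2, self.chooser[c] + delta))
        for table, i in ((self.table_s, i_s), (self.table_l, i_l)):
            table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
        # update both branch history vectors with the outcome
        self.short_hist = ((self.short_hist << 1) | taken) & self.mask_s
        self.long_hist = ((self.long_hist << 1) | taken) & self.mask_l
```

The short-history table adapts quickly to simple branches, while the long-history table can separate occurrences of a branch that correlate with distant prior outcomes.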
-
Patent number: 9928158
Abstract: A method for detecting a software-race condition in a program includes copying a state of a transaction of the program from a first core of a multi-core processor to at least one additional core of the multi-core processor, running the transaction, redundantly, on the first core and the at least one additional core given the state, outputting a result of the first core and the at least one additional core, and detecting a difference in the results between the first core and the at least one additional core, wherein the difference indicates the software-race condition.
Type: Grant
Filed: January 30, 2016
Date of Patent: March 27, 2018
Assignee: International Business Machines Corporation
Inventors: Harold W. Cain, III, David M. Daly, Michael C. Huang, Kattamuri Ekanadham, Jose E. Moreira, Mauricio J. Serrano
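The detection principle, run the same transaction redundantly from copies of the same starting state and flag any divergence in the results, can be shown in a small single-process Python sketch. Redundant cores are stood in for by repeated runs, and the "racy" transaction below simulates an unsynchronized read through a shared mutable object; all names are illustrative.

```python
import copy

def detect_race(transaction, state, runs=2):
    """Run `transaction` redundantly from deep copies of the same
    starting state; differing results indicate that the transaction
    depends on something outside the copied state (a racy input)."""
    results = [transaction(copy.deepcopy(state)) for _ in range(runs)]
    return any(r != results[0] for r in results[1:])

# A transaction whose outcome depends only on its copied state is stable.
assert not detect_race(lambda s: sum(s), [1, 2, 3])

# One that reads shared mutable data outside the copied state can differ
# between redundant runs, which is what flags the race.
shared = {"ticks": 0}
def racy(state):
    shared["ticks"] += 1          # stands in for an unsynchronized read
    return sum(state) + shared["ticks"]
assert detect_race(racy, [1, 2, 3])
```

A matching result does not prove the absence of a race, only that none manifested in these runs, which mirrors the probabilistic nature of the hardware scheme.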