Patents by Inventor Mauricio J. Serrano
Mauricio J. Serrano has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10984073
Abstract: A processor can scan a portion of a vector to identify first nonzero entries. The processor can scan another portion of the vector to identify second nonzero entries. The processor can scale a portion of a matrix using the first nonzero entries to generate first intermediate elements. The processor can scale another portion of the matrix using the second nonzero entries to generate second intermediate elements. The processor can store the first intermediate elements in a first buffer and store the second intermediate elements in a second buffer. The processor can copy a subset of the first intermediate elements from the first buffer to a memory and copy a subset of the second intermediate elements from the second buffer to the memory. The subsets of first and second intermediate elements can be aggregated to generate an output vector.
Type: Grant
Filed: August 8, 2019
Date of Patent: April 20, 2021
Assignee: International Business Machines Corporation
Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
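The flow in this abstract can be illustrated with a minimal Python sketch: the vector is scanned in two portions for nonzero entries, the matching matrix columns are scaled into two separate buffers, and the buffered intermediates are aggregated into the output. A dense matrix stands in for the sparse one, and all names are illustrative, not from the patent.

```python
def spmv_buffered(matrix, vector, split):
    """Multiply matrix @ vector by scanning two portions of the vector
    for nonzeros, scaling the matching matrix columns into separate
    buffers, then aggregating the buffers into an output vector."""
    n_rows = len(matrix)

    def scale_portion(cols):
        buffer = []
        for j in cols:
            if vector[j] != 0:  # scan for nonzero entries
                # scale column j of the matrix by the nonzero entry
                buffer.append([matrix[i][j] * vector[j] for i in range(n_rows)])
        return buffer

    first_buffer = scale_portion(range(0, split))             # first portion
    second_buffer = scale_portion(range(split, len(vector)))  # second portion

    # copy the intermediate elements to "memory" and aggregate them
    output = [0] * n_rows
    for partial in first_buffer + second_buffer:
        for i in range(n_rows):
            output[i] += partial[i]
    return output
```

Because each zero vector entry is skipped during the scan, no work is spent scaling columns that cannot contribute to the output.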
-
Patent number: 10884942
Abstract: Various embodiments execute a program with improved cache efficiency. In one embodiment, a first subset of operations of a program is performed on a plurality of objects stored in one or more data structures. The first subset of operations has a regular memory access pattern. After each operation in the first subset of operations has been performed, results of the operation are stored in one of the plurality of queues. Each queue in the plurality of queues is associated with a different cacheable region of a memory. A second subset of operations in the program is performed utilizing at least one queue in the plurality of queues. The second subset of operations utilizes results of the operations in the first subset of operations stored in the queue. The second subset of operations has an irregular memory access pattern that is regularized by localizing memory locations accessed by the second subset of operations to the cacheable region of memory associated with the at least one queue.
Type: Grant
Filed: May 19, 2016
Date of Patent: January 5, 2021
Assignee: International Business Machines Corporation
Inventors: William Pettit Horn, Joefon Jann, Manoj Kumar, Jose Eduardo Moreira, Pratap Chandra Pattnaik, Mauricio J. Serrano, Ilie Gabriel Tanase
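The two-phase structure described here can be sketched in a few lines of Python: a regular streaming pass enqueues each result under the cacheable region its target falls in, and a second pass drains one queue at a time so the otherwise-irregular writes stay localized. The region size and names are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict

def two_phase_update(objects, targets, region_size):
    """Phase 1: stream over `objects` (regular access pattern) and
    enqueue each result under the region its target index falls in.
    Phase 2: drain one queue at a time, so the irregular writes into
    `targets` are confined to a single cacheable region at a time."""
    queues = defaultdict(list)          # one queue per cacheable region
    for idx, value in objects:          # regular, streaming pass
        queues[idx // region_size].append((idx, value))

    for region in sorted(queues):       # irregular pass, now regularized
        for idx, value in queues[region]:
            targets[idx] += value       # all accesses land in one region
    return targets
```

The queues trade a modest amount of extra memory traffic for the guarantee that phase 2 touches only one region of `targets` at a time, which is what keeps the working set cache-resident.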
-
Patent number: 10795683
Abstract: Predicting indirect branch instructions may comprise predicting a target address for a fetched branch instruction. Accuracy of the target address may be tracked. The fetched branch instruction may be flagged as a problematic branch instruction based on the tracking. A pattern cache may be trained for predicting a more accurate target address for the fetched branch instruction, and the next time the fetched branch instruction is again fetched, a target address may be predicted from the pattern cache.
Type: Grant
Filed: June 11, 2014
Date of Patent: October 6, 2020
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Tejas Karkhanis, Brian R. Konigsburg, David S. Levitan, Douglas R. G. Logan, Mauricio J. Serrano
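A toy software model can make the flow above concrete: a simple last-target predictor serves as the default, mispredictions are counted per branch, and once a branch is flagged as problematic a pattern cache keyed by the branch address and recent target history takes over. The threshold, history depth, and all names are illustrative assumptions, not details from the patent.

```python
class IndirectBranchPredictor:
    """Toy model: last-target prediction by default, plus a pattern
    cache (keyed by branch address and recent target history) that is
    trained once a branch has been flagged as problematic."""
    THRESHOLD = 2              # mispredictions before a branch is flagged

    def __init__(self):
        self.last_target = {}       # addr -> last observed target
        self.misses = {}            # addr -> misprediction count
        self.flagged = set()        # problematic branches
        self.pattern_cache = {}     # (addr, history) -> target
        self.history = ()           # recent global target history

    def predict(self, addr):
        if addr in self.flagged:
            key = (addr, self.history)
            if key in self.pattern_cache:
                return self.pattern_cache[key]
        return self.last_target.get(addr)

    def update(self, addr, actual):
        if self.predict(addr) != actual:        # track accuracy
            self.misses[addr] = self.misses.get(addr, 0) + 1
            if self.misses[addr] >= self.THRESHOLD:
                self.flagged.add(addr)          # flag as problematic
        if addr in self.flagged:                # train the pattern cache
            self.pattern_cache[(addr, self.history)] = actual
        self.last_target[addr] = actual
        self.history = (self.history + (actual,))[-2:]
```

On an alternating target sequence the last-target predictor is always wrong, while the trained pattern cache predicts every target once the alternation has been seen once.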
-
Patent number: 10713056
Abstract: A non-limiting example of a computer-implemented method for implementing wide vector execution for an out-of-order processor includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
Type: Grant
Filed: November 8, 2017
Date of Patent: July 14, 2020
Assignee: International Business Machines Corporation
Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
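The partitioning idea is a hardware mechanism, but it can be mimicked in software: the register file is split into equal slices, one per vector execution unit, and a wide instruction is dispatched slice by slice across the units. This is only a conceptual sketch under that simplifying assumption; none of the names come from the patent.

```python
class WideVectorUnit:
    """Toy model: in single-thread mode the vector register file is
    partitioned into per-unit slices that together execute one wide
    vector instruction."""
    def __init__(self, total_width, num_units):
        assert total_width % num_units == 0
        self.slice = total_width // num_units
        self.num_units = num_units

    def wide_add(self, a, b):
        # dispatch one slice of the wide instruction to each unit
        result = []
        for u in range(self.num_units):
            lo, hi = u * self.slice, (u + 1) * self.slice
            result.extend(x + y for x, y in zip(a[lo:hi], b[lo:hi]))
        return result
```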
-
Patent number: 10705847
Abstract: A non-limiting example of a computer-implemented method for implementing wide vector execution for an out-of-order processor includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
Type: Grant
Filed: August 1, 2017
Date of Patent: July 7, 2020
Assignee: International Business Machines Corporation
Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
-
Patent number: 10664279
Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
Type: Grant
Filed: January 4, 2019
Date of Patent: May 26, 2020
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
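A prefetch prediction vector of this kind is naturally modeled as a small bitmask over the cache lines that follow the missing line. The sketch below decodes such a vector and shows one plausible way to train it from later hits; the vector width and function names are assumptions for illustration, not from the patent.

```python
def lines_to_prefetch(miss_line, prediction_vector, vector_bits=4):
    """Decode a prefetch prediction vector: bit i set means the i-th
    cache line after the missing line should be prefetched."""
    return [miss_line + i + 1
            for i in range(vector_bits)
            if prediction_vector & (1 << i)]

def train_vector(miss_line, later_hits, vector_bits=4):
    """Build a prediction vector by setting a bit for each contiguous
    follower line that was actually used after the miss."""
    vec = 0
    for hit in later_hits:
        offset = hit - miss_line - 1
        if 0 <= offset < vector_bits:
            vec |= 1 << offset
    return vec
```

Encoding the followers as individual bits, rather than a single run length, lets the predictor skip lines in the window that were never used.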
-
Patent number: 10558429
Abstract: Techniques for determining connected components of a graph via incremental graph analysis algorithms are provided. In one example, a computer-implemented method comprises analyzing, by a system operatively coupled to a processor, a first differential value representing an initial incremental difference of elements between selected initial elements of an initial vector and selected input elements of an input vector associated with a graph. The method further comprises recurringly analyzing, by the system, a second differential value representing a subsequent incremental difference of elements between selected updated elements of an updated initial vector and selected additional elements of another input vector associated with the graph until the second differential value is zero.
Type: Grant
Filed: August 24, 2016
Date of Patent: February 11, 2020
Assignee: International Business Machines Corporation
Inventors: Mauricio J. Serrano, Ilie Gabriel Tanase
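The "iterate until the differential is zero" pattern above is the shape of incremental label propagation: each round, only the vertices whose label changed in the previous round propagate, and the algorithm terminates when no labels change. A minimal Python sketch under that reading (the specific vectors and operators in the patent are not reproduced here):

```python
def connected_components(n, edges):
    """Incremental label propagation: each round only the vertices whose
    label changed last round (the 'differential') propagate, stopping
    when the differential shrinks to zero."""
    labels = list(range(n))                 # initial vector: own vertex id
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    frontier = set(range(n))                # nonzero initial differential
    while frontier:                         # stop when differential is zero
        changed = set()
        for u in frontier:
            for v in neighbors[u]:
                if labels[u] < labels[v]:   # propagate the smaller label
                    labels[v] = labels[u]
                    changed.add(v)
        frontier = changed                  # next round's differential
    return labels
```

Because labels only decrease and are bounded below, each vertex changes at most n times, so the differential must eventually reach zero.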
-
Publication number: 20190361955
Abstract: A processor can scan a portion of a vector to identify first nonzero entries. The processor can scan another portion of the vector to identify second nonzero entries. The processor can scale a portion of a matrix using the first nonzero entries to generate first intermediate elements. The processor can scale another portion of the matrix using the second nonzero entries to generate second intermediate elements. The processor can store the first intermediate elements in a first buffer and store the second intermediate elements in a second buffer. The processor can copy a subset of the first intermediate elements from the first buffer to a memory and copy a subset of the second intermediate elements from the second buffer to the memory. The subsets of first and second intermediate elements can be aggregated to generate an output vector.
Type: Application
Filed: August 8, 2019
Publication date: November 28, 2019
Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
-
Publication number: 20190294585
Abstract: A computing device and a method of allocating vector register files in a simultaneously-multithreaded (SMT) processor core are provided. A request for a first number (M) of vector register files is received from a borrower thread of the processor core. One or more available donor threads of the processor core are identified. A second number (N) of the vector register files, of the identified one or more available donor threads, are assigned to the borrower thread, where N ≤ M. The borrower thread is parameterized to create a virtualized vector register file for the borrower thread, based on a width of the N vector register files of the identified one or more donor threads.
Type: Application
Filed: March 21, 2018
Publication date: September 26, 2019
Inventors: Mauricio J. Serrano, Giles B. Frazier, Silvia Melitta Mueller
-
Patent number: 10417304
Abstract: Methods and systems for multiplying a matrix and a vector are described. In an example, the vector may be partitioned into a plurality of vector partitions. The matrix may be partitioned into a plurality of matrix partitions. A plurality of threads may be scheduled to multiply each matrix partition with its corresponding vector partition to determine intermediate elements. Intermediate elements determined by each thread may be stored in a local buffer assigned to the corresponding thread. Intermediate elements may be copied from a particular buffer to a memory in response to the particular buffer being full. Upon completion of the plurality of threads, the intermediate elements copied to the memory may be aggregated to generate an output vector that may be a result of multiplication between the matrix and the vector.
Type: Grant
Filed: December 15, 2017
Date of Patent: September 17, 2019
Assignee: International Business Machines Corporation
Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
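The threaded scheme above can be sketched with Python's thread pool: the matrix is partitioned (by rows here, a simplification of the partitioning described in the abstract), each worker multiplies its partition into a thread-local buffer, and the buffers are aggregated into the output at the end. All names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_matvec(matrix, vector, num_threads=2):
    """Partition the matrix across threads; each worker multiplies its
    partition into a local buffer, and the buffers are aggregated into
    the output vector once all threads complete."""
    n = len(matrix)
    chunk = (n + num_threads - 1) // num_threads

    def worker(start):
        local = []                      # thread-local buffer
        for row in matrix[start:start + chunk]:
            local.append(sum(m * v for m, v in zip(row, vector)))
        return start, local

    output = [0] * n
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for start, local in pool.map(worker, range(0, n, chunk)):
            output[start:start + len(local)] = local   # aggregate buffers
    return output
```

Keeping each thread's intermediates in a private buffer avoids contended writes to the shared output until the single aggregation step.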
-
Patent number: 10379857
Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an instruction cache. In response to the CLA subsequently hitting in the instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the instruction cache.
Type: Grant
Filed: August 23, 2018
Date of Patent: August 13, 2019
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
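The PFQ lifecycle described above (allocate on miss, update on hit, age out into a backing array, consult the backing array at prefetch time) can be modeled in a few lines. FIFO aging and the queue capacity are simplifying assumptions; names are illustrative, not from the patent.

```python
from collections import OrderedDict

class PrefetchFilterQueue:
    """Toy PFQ: misses allocate an entry, later hits bump its prefetch
    value, aged-out entries move to a backing array, and the backing
    array is consulted when a prefetch is issued for that address."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pfq = OrderedDict()    # cache line address -> prefetch value
        self.backing = {}           # aged-out entries

    def on_miss(self, cla):
        if cla not in self.pfq:
            if len(self.pfq) >= self.capacity:
                old_cla, value = self.pfq.popitem(last=False)  # age out
                self.backing[old_cla] = value
            self.pfq[cla] = 0       # allocate a fresh entry

    def on_hit(self, cla):
        if cla in self.pfq:
            self.pfq[cla] += 1      # update the associated prefetch value

    def lines_to_prefetch(self, cla):
        count = self.backing.get(cla, 0)
        # the line itself plus the learned number of sequential lines
        return [cla + i for i in range(count + 1)]
```

The filter's point is visible in the model: a line that was never followed by sequential hits ages out with a value of zero, so no useless sequential prefetches are issued for it later.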
-
Publication number: 20190188239
Abstract: Methods and systems for multiplying a matrix and a vector are described. In an example, the vector may be partitioned into a plurality of vector partitions. The matrix may be partitioned into a plurality of matrix partitions. A plurality of threads may be scheduled to multiply each matrix partition with its corresponding vector partition to determine intermediate elements. Intermediate elements determined by each thread may be stored in a local buffer assigned to the corresponding thread. Intermediate elements may be copied from a particular buffer to a memory in response to the particular buffer being full. Upon completion of the plurality of threads, the intermediate elements copied to the memory may be aggregated to generate an output vector that may be a result of multiplication between the matrix and the vector.
Type: Application
Filed: December 15, 2017
Publication date: June 20, 2019
Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
-
Publication number: 20190138312
Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
Type: Application
Filed: January 4, 2019
Publication date: May 9, 2019
Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
-
Publication number: 20190042266
Abstract: Embodiments of the present invention include methods, systems, and computer program products for implementing wide vector execution in a single thread mode for an out-of-order processor. A non-limiting example of the computer-implemented method includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
Type: Application
Filed: November 8, 2017
Publication date: February 7, 2019
Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
-
Publication number: 20190042265
Abstract: Embodiments of the present invention include methods, systems, and computer program products for implementing wide vector execution in a single thread mode for an out-of-order processor. A non-limiting example of the computer-implemented method includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
Type: Application
Filed: August 1, 2017
Publication date: February 7, 2019
Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
-
Patent number: 10175987
Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
Type: Grant
Filed: March 17, 2016
Date of Patent: January 8, 2019
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
-
Publication number: 20180365012
Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an instruction cache. In response to the CLA subsequently hitting in the instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the instruction cache.
Type: Application
Filed: August 23, 2018
Publication date: December 20, 2018
Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
-
Patent number: 10078514
Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an upper level instruction cache. In response to the CLA subsequently hitting in the upper level instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the upper level instruction cache.
Type: Grant
Filed: May 11, 2016
Date of Patent: September 18, 2018
Assignee: International Business Machines Corporation
Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
-
Patent number: 9983878
Abstract: Branch prediction is provided by generating a first index from a previous instruction address and from a first branch history vector having a first length. A second index is generated from the previous instruction address and from a second branch history vector that is longer than the first vector. Using the first index, a first branch prediction is retrieved from a first branch prediction table. Using the second index, a second branch prediction is retrieved from a second branch prediction table. Based upon additional branch history data, the first branch history vector and the second branch history vector are updated. A first hash value is generated from a current instruction address and the updated first branch history vector. A second hash value is generated from the current instruction address and the updated second branch history vector. One of the branch predictions is selected based upon the hash values.
Type: Grant
Filed: May 15, 2014
Date of Patent: May 29, 2018
Assignee: International Business Machines Corporation
Inventors: David S. Levitan, Jose E. Moreira, Mauricio J. Serrano
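The core idea, two prediction tables indexed by histories of different lengths with a mechanism to choose between them, can be sketched as a simplified software model. The chooser below tracks per-branch accuracy rather than reproducing the patent's hash-based selection, and the counter widths and table sizes are assumptions for illustration.

```python
class TwoHistoryPredictor:
    """Simplified sketch: one table indexed with a short branch history,
    one with a long history, and a chooser that tracks which table has
    been more accurate for each branch."""
    def __init__(self, bits_short=4, bits_long=8, size=256):
        self.size = size
        self.short_hist = 0
        self.long_hist = 0
        self.mask_s = (1 << bits_short) - 1
        self.mask_l = (1 << bits_long) - 1
        self.table_s = [0] * size      # 2-bit counters, short history
        self.table_l = [0] * size      # 2-bit counters, long history
        self.chooser = [0] * size      # >0 favors the long-history table

    def _index(self, addr, hist):
        return (addr ^ hist) % self.size     # index from address + history

    def predict(self, addr):
        p_s = self.table_s[self._index(addr, self.short_hist)] >= 2
        p_l = self.table_l[self._index(addr, self.long_hist)] >= 2
        return p_l if self.chooser[addr % self.size] > 0 else p_s

    def update(self, addr, taken):
        i_s = self._index(addr, self.short_hist)
        i_l = self._index(addr, self.long_hist)
        p_s = self.table_s[i_s] >= 2
        p_l = self.table_l[i_l] >= 2
        if p_s != p_l:                 # reward whichever table was right
            c = addr % self.size
            delta = 1 if p_l == taken else -1
            self.chooser[c] = max(-2, min(2, self.chooser[c] + delta))
        for table, i in ((self.table_s, i_s), (self.table_l, i_l)):
            table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
        # update both branch history vectors with the outcome
        self.short_hist = ((self.short_hist << 1) | taken) & self.mask_s
        self.long_hist = ((self.long_hist << 1) | taken) & self.mask_l
```

The short-history table adapts quickly to simple branches, while the long-history table can separate occurrences of a branch that correlate with distant prior outcomes.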
-
Patent number: 9928158
Abstract: A method for detecting a software-race condition in a program includes copying a state of a transaction of the program from a first core of a multi-core processor to at least one additional core of the multi-core processor, running the transaction, redundantly, on the first core and the at least one additional core given the state, outputting a result of the first core and the at least one additional core, and detecting a difference in the results between the first core and the at least one additional core, wherein the difference indicates the software-race condition.
Type: Grant
Filed: January 30, 2016
Date of Patent: March 27, 2018
Assignee: International Business Machines Corporation
Inventors: Harold W. Cain, III, David M. Daly, Michael C. Huang, Kattamuri Ekanadham, Jose E. Moreira, Mauricio J. Serrano
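The detection principle, run the same transaction redundantly from copies of the same starting state and flag any divergence in the results, can be shown in a small single-process Python sketch. Redundant cores are stood in for by repeated runs, and the "racy" transaction below simulates an unsynchronized read through a shared mutable object; all names are illustrative.

```python
import copy

def detect_race(transaction, state, runs=2):
    """Run `transaction` redundantly from deep copies of the same
    starting state; differing results indicate that the transaction
    depends on something outside the copied state (a racy input)."""
    results = [transaction(copy.deepcopy(state)) for _ in range(runs)]
    return any(r != results[0] for r in results[1:])

# A transaction whose outcome depends only on its copied state is stable.
assert not detect_race(lambda s: sum(s), [1, 2, 3])

# One that reads shared mutable data outside the copied state can differ
# between redundant runs, which is what flags the race.
shared = {"ticks": 0}
def racy(state):
    shared["ticks"] += 1          # stands in for an unsynchronized read
    return sum(state) + shared["ticks"]
assert detect_race(racy, [1, 2, 3])
```

A matching result does not prove the absence of a race, only that none manifested in these runs, which mirrors the probabilistic nature of the hardware scheme.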