Patents by Inventor Mauricio J. Serrano

Mauricio J. Serrano has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10984073
    Abstract: A processor can scan a portion of a vector to identify first nonzero entries. The processor can scan another portion of the vector to identify second nonzero entries. The processor can scale a portion of a matrix using the first nonzero entries to generate first intermediate elements. The processor can scale another portion of the matrix using the second nonzero entries to generate second intermediate elements. The processor can store the first intermediate elements in a first buffer and store the second intermediate elements in a second buffer. The processor can copy a subset of the first intermediate elements from the first buffer to a memory and copy a subset of the second intermediate elements from the second buffer to the memory. The subsets of first and second intermediate elements can be aggregated to generate an output vector.
    Type: Grant
    Filed: August 8, 2019
    Date of Patent: April 20, 2021
    Assignee: International Business Machines Corporation
    Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
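    The buffered sparse matrix-vector product described in the abstract above can be illustrated with a short sketch. This is only a minimal model of the column-scanning idea, not the patented implementation; the column-wise matrix layout, the two-way split, and names such as scale_portion are assumptions made for illustration.

    ```python
    # Minimal sketch (not the patented implementation): each half of the input
    # vector is scanned for nonzeros, the matching matrix columns are scaled
    # into a per-portion buffer, and the buffers are aggregated into the output.

    def scale_portion(matrix_cols, vector, col_range):
        """Scale columns whose vector entry is nonzero; collect the results."""
        buffer = []
        for j in col_range:
            if vector[j] != 0:                            # scan for nonzero entries
                buffer.append([a_ij * vector[j] for a_ij in matrix_cols[j]])
        return buffer

    def spmv_buffered(matrix_cols, vector):
        rows = len(matrix_cols[0])
        mid = len(vector) // 2
        first_buffer = scale_portion(matrix_cols, vector, range(0, mid))
        second_buffer = scale_portion(matrix_cols, vector, range(mid, len(vector)))
        output = [0] * rows                               # aggregate the buffered columns
        for buf in (first_buffer, second_buffer):
            for scaled in buf:
                for i, value in enumerate(scaled):
                    output[i] += value
        return output

    # Example: a 2x3 matrix stored column-wise, multiplied by a sparse vector.
    cols = [[1, 4], [2, 5], [3, 6]]
    print(spmv_buffered(cols, [1, 0, 2]))                 # [7, 16]
    ```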
  • Patent number: 10884942
    Abstract: Various embodiments execute a program with improved cache efficiency. In one embodiment, a first subset of operations of a program is performed on a plurality of objects stored in one or more data structures. The first subset of operations has a regular memory access pattern. After each operation in the first subset of operations has been performed, results of the operation are stored in one of the plurality of queues. Each queue in the plurality of queues is associated with a different cacheable region of a memory. A second subset of operations in the program is performed utilizing at least one queue in the plurality of queues. The second subset of operations utilizes results of the operations in the first subset of operations stored in the queue. The second subset of operations has an irregular memory access pattern that is regularized by localizing memory locations accessed by the second subset of operations to the cacheable region of memory associated with the at least one queue.
    Type: Grant
    Filed: May 19, 2016
    Date of Patent: January 5, 2021
    Assignee: International Business Machines Corporation
    Inventors: William Pettit Horn, Joefon Jann, Manoj Kumar, Jose Eduardo Moreira, Pratap Chandra Pattnaik, Mauricio J. Serrano, Ilie Gabriel Tanase
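    A rough sketch of the two-phase, queue-based regularization described above follows. It is an illustration rather than the patented method; the edge-list workload, the region size, and names such as phase_one are assumed for the example.

    ```python
    from collections import defaultdict

    REGION_SIZE = 1024      # assumed size of one cacheable region, in elements

    def phase_one(edges):
        """Regular pass: stream over the work list and enqueue updates by region."""
        queues = defaultdict(list)
        for target_index, value in edges:                 # sequential, regular accesses
            queues[target_index // REGION_SIZE].append((target_index, value))
        return queues

    def phase_two(queues, result):
        """Irregular pass, regularized: apply updates one region at a time."""
        for region in sorted(queues):
            for target_index, value in queues[region]:
                result[target_index] += value             # accesses stay within one region
        return result

    edges = [(5000, 1.0), (12, 2.0), (5001, 3.0), (13, 4.0)]
    result = phase_two(phase_one(edges), [0.0] * 8192)
    print(result[12], result[13], result[5000], result[5001])   # 2.0 4.0 1.0 3.0
    ```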
  • Patent number: 10795683
    Abstract: Predicting indirect branch instructions may comprise predicting a target address for a fetched branch instruction. Accuracy of the target address may be tracked. The fetched branch instruction may be flagged as a problematic branch instruction based on the tracking. A pattern cache may be trained for predicting a more accurate target address for the fetched branch instruction, and the next time the fetched branch instruction is again fetched, a target address may be predicted from the pattern cache.
    Type: Grant
    Filed: June 11, 2014
    Date of Patent: October 6, 2020
    Assignee: International Business Machines Corporation
    Inventors: Richard J. Eickemeyer, Tejas Karkhanis, Brian R. Konigsburg, David S. Levitan, Douglas R. G. Logan, Mauricio J. Serrano
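    The two-level indirect-branch scheme in the abstract above can be modeled with a toy predictor. The misprediction threshold, the history length, and the structure names below are illustrative assumptions, not details taken from the patent.

    ```python
    PROBLEM_THRESHOLD = 4    # mispredictions before a branch is flagged as problematic
    HISTORY_LENGTH = 2       # number of recent targets used to index the pattern cache

    class IndirectPredictor:
        def __init__(self):
            self.last_target = {}      # branch PC -> last seen target (baseline predictor)
            self.miss_count = {}       # branch PC -> misprediction count
            self.problematic = set()
            self.pattern_cache = {}    # (branch PC, recent targets) -> predicted target
            self.history = ()

        def predict(self, pc):
            if pc in self.problematic and (pc, self.history) in self.pattern_cache:
                return self.pattern_cache[(pc, self.history)]
            return self.last_target.get(pc)

        def update(self, pc, actual_target):
            if self.predict(pc) != actual_target:         # track prediction accuracy
                self.miss_count[pc] = self.miss_count.get(pc, 0) + 1
                if self.miss_count[pc] >= PROBLEM_THRESHOLD:
                    self.problematic.add(pc)              # flag as problematic
            if pc in self.problematic:                    # train the pattern cache
                self.pattern_cache[(pc, self.history)] = actual_target
            self.last_target[pc] = actual_target
            self.history = (self.history + (actual_target,))[-HISTORY_LENGTH:]

    pred = IndirectPredictor()
    for target in [0x100, 0x200] * 16:                    # an alternating indirect target
        pred.update(0x40, target)                         # branch at PC 0x40
    print(hex(pred.predict(0x40)))                        # 0x100, the next target in the pattern
    ```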
  • Patent number: 10713056
    Abstract: A non-limiting example of a computer-implemented method for implementing wide vector execution for an out-of-order processor includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
    Type: Grant
    Filed: November 8, 2017
    Date of Patent: July 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
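    A behavioral sketch of the wide vector execution idea follows. It is my own illustration rather than the hardware design: the slice width, the number of vector execution units, and the lane-wise add are assumed for the example.

    ```python
    SLICE_BITS = 128
    NUM_UNITS = 4                                 # assumed number of vector execution units
    WIDE_BITS = SLICE_BITS * NUM_UNITS

    def split_into_slices(value):
        """Split a wide operand into per-unit 128-bit slices, lowest slice first."""
        mask = (1 << SLICE_BITS) - 1
        return [(value >> (i * SLICE_BITS)) & mask for i in range(NUM_UNITS)]

    def wide_vector_add(a, b):
        """Each vector execution unit adds its own slice; carries never cross slices,
        mimicking independent SIMD lanes rather than one wide integer addition."""
        mask = (1 << SLICE_BITS) - 1
        slices = [(sa + sb) & mask
                  for sa, sb in zip(split_into_slices(a), split_into_slices(b))]
        return sum(s << (i * SLICE_BITS) for i, s in enumerate(slices))

    wide_a = int("11" * (WIDE_BITS // 8), 16)     # a 512-bit operand
    wide_b = int("01" * (WIDE_BITS // 8), 16)
    print(hex(wide_vector_add(wide_a, wide_b)))   # 0x1212...12, computed as four 128-bit slices
    ```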
  • Patent number: 10705847
    Abstract: A non-limiting example of a computer-implemented method for implementing wide vector execution for an out-of-order processor includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
    Type: Grant
    Filed: August 1, 2017
    Date of Patent: July 7, 2020
    Assignee: International Business Machines Corporation
    Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
  • Patent number: 10664279
    Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
    Type: Grant
    Filed: January 4, 2019
    Date of Patent: May 26, 2020
    Assignee: International Business Machines Corporation
    Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
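    The prefetch prediction vector described above can be sketched as a small bit vector per instruction cache line. The vector width and the training helper below are assumptions made for illustration, not the patented structure.

    ```python
    VECTOR_WIDTH = 4             # contiguous lines tracked after each missing line

    prediction_table = {}        # instruction cache line address -> prefetch bit vector

    def record_usage(miss_line, lines_used_after):
        """Training: remember which of the following lines were actually executed."""
        vector = 0
        for offset in range(1, VECTOR_WIDTH + 1):
            if miss_line + offset in lines_used_after:
                vector |= 1 << (offset - 1)
        prediction_table[miss_line] = vector

    def lines_to_prefetch(miss_line):
        """On an instruction cache miss, expand the stored vector into line addresses."""
        vector = prediction_table.get(miss_line, 0)
        return [miss_line + offset
                for offset in range(1, VECTOR_WIDTH + 1)
                if vector & (1 << (offset - 1))]

    record_usage(100, {101, 103})          # lines 101 and 103 followed the miss on line 100
    print(lines_to_prefetch(100))          # [101, 103]
    ```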
  • Patent number: 10558429
    Abstract: Techniques for determining connected components of a graph via incremental graph analysis algorithms are provided. In one example, a computer-implemented method comprises analyzing, by a system operatively coupled to a processor, a first differential value representing an initial incremental difference of elements between selected initial elements of an initial vector and selected input elements of an input vector associated with a graph. The method further comprises recurringly analyzing, by the system, a second differential value representing a subsequent incremental difference of elements between selected updated elements of an updated initial vector and selected additional elements of another input vector associated with the graph until the second differential value is zero.
    Type: Grant
    Filed: August 24, 2016
    Date of Patent: February 11, 2020
    Assignee: International Business Machines Corporation
    Inventors: Mauricio J. Serrano, Ilie Gabriel Tanase
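    The fixed-point iteration in the abstract above can be illustrated with a label-propagation sketch, where the "differential" is the count of labels that changed in a round and iteration stops when it reaches zero. This is an illustration of the general idea, not the patented incremental algorithm.

    ```python
    def connected_components(num_vertices, edges):
        labels = list(range(num_vertices))     # initial vector: every vertex is its own label
        differential = None
        while differential != 0:               # iterate until the differential is zero
            differential = 0
            for u, v in edges:                 # propagate the smaller label along each edge
                low = min(labels[u], labels[v])
                if labels[u] != low:
                    labels[u] = low
                    differential += 1
                if labels[v] != low:
                    labels[v] = low
                    differential += 1
        return labels

    print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))   # [0, 0, 0, 3, 3]
    ```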
  • Publication number: 20190361955
    Abstract: A processor can scan a portion of a vector to identify first nonzero entries. The processor can scan another portion of the vector to identify second nonzero entries. The processor can scale a portion of a matrix using the first nonzero entries to generate first intermediate elements. The processor can scale another portion of the matrix using the second nonzero entries to generate second intermediate elements. The processor can store the first intermediate elements in a first buffer and store the second intermediate elements in a second buffer. The processor can copy a subset of the first intermediate elements from the first buffer to a memory and copy a subset of the second intermediate elements from the second buffer to the memory. The subsets of first and second intermediate elements can be aggregated to generate an output vector.
    Type: Application
    Filed: August 8, 2019
    Publication date: November 28, 2019
    Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
  • Publication number: 20190294585
    Abstract: A computing device and a method of allocating vector register files in a simultaneously-multithreaded (SMT) processor core are provided. A request for a first number (M) of vector register files is received from a borrower thread of the processor core. One or more available donor threads of the processor core are identified. A second number (N) of the vector register files, of the identified one or more available donor threads, are assigned to the borrower thread, where N is ≤ M. The borrower thread is parameterized to create a virtualized vector register file for the borrower thread, based on a width of the N vector register files of the identified one or more donor threads.
    Type: Application
    Filed: March 21, 2018
    Publication date: September 26, 2019
    Inventors: Mauricio J. Serrano, Giles B. Frazier, Silvia Melitta Mueller
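    A bookkeeping-level sketch of the borrow/donate idea follows. The thread count, the per-file width, and names such as borrow are illustrative assumptions; only the N ≤ M grant and the widened virtualized file mirror the abstract.

    ```python
    FILE_WIDTH_BITS = 128        # assumed width of one physical vector register file

    class VectorRegisterFiles:
        def __init__(self, num_threads):
            # Initially each SMT thread owns one physical vector register file.
            self.owner = {f: f for f in range(num_threads)}   # file id -> owning thread

        def donor_files(self, busy_threads):
            """Files owned by idle threads, which can act as donors."""
            return [f for f, t in self.owner.items() if t not in busy_threads]

        def borrow(self, borrower, requested_m, busy_threads):
            granted = self.donor_files(busy_threads)[:requested_m]   # N <= M files granted
            for f in granted:
                self.owner[f] = borrower
            virtual_width = FILE_WIDTH_BITS * (1 + len(granted))
            return granted, virtual_width

    rf = VectorRegisterFiles(num_threads=4)
    granted, width = rf.borrow(borrower=0, requested_m=3, busy_threads={0, 1})
    print(granted, width)        # files 2 and 3 are granted -> a 384-bit virtualized file
    ```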
  • Patent number: 10417304
    Abstract: Methods and systems for multiplying a matrix and a vector are described. In an example, the vector may be partitioned into a plurality of vector partitions. The matrix may be partitioned into a plurality of matrix partitions. A plurality of threads may be scheduled to multiply each matrix partition with corresponding vector partition to determine intermediate elements. Intermediate elements determined by each thread may be stored in a local buffer assigned to the corresponding thread. Intermediate elements may be copied from a particular buffer to a memory in response to the particular buffer being full. Upon completion of the plurality of threads, the intermediate elements copied to the memory may be aggregated to generate an output vector that may be a result of multiplication between the matrix and the vector.
    Type: Grant
    Filed: December 15, 2017
    Date of Patent: September 17, 2019
    Assignee: International Business Machines Corporation
    Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
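    The thread-per-partition organization described above differs from the earlier two-buffer sketch mainly in its scheduling and its flush-on-full local buffers, so a separate sketch may help. Python threads stand in for hardware threads, and the buffer capacity and helper names are assumptions made for illustration.

    ```python
    import threading

    BUFFER_CAPACITY = 2          # assumed local-buffer size, in scaled columns

    def worker(matrix_cols, vector, col_range, flushed, lock):
        local_buffer = []
        for j in col_range:
            if vector[j] != 0:
                local_buffer.append([a_ij * vector[j] for a_ij in matrix_cols[j]])
                if len(local_buffer) == BUFFER_CAPACITY:       # buffer full: copy to memory
                    with lock:
                        flushed.extend(local_buffer)
                    local_buffer = []
        with lock:                                             # flush whatever remains
            flushed.extend(local_buffer)

    def spmv_threaded(matrix_cols, vector, num_threads=2):
        flushed, lock = [], threading.Lock()
        step = (len(vector) + num_threads - 1) // num_threads
        threads = [threading.Thread(target=worker,
                                    args=(matrix_cols, vector,
                                          range(t * step, min((t + 1) * step, len(vector))),
                                          flushed, lock))
                   for t in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()                                           # wait for all threads to complete
        output = [0] * len(matrix_cols[0])                     # then aggregate the output vector
        for scaled in flushed:
            for i, value in enumerate(scaled):
                output[i] += value
        return output

    cols = [[1, 4], [2, 5], [3, 6], [0, 1]]
    print(spmv_threaded(cols, [1, 0, 2, 3]))                   # [7, 19]
    ```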
  • Patent number: 10379857
    Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an instruction cache. In response to the CLA subsequently hitting in the instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the instruction cache.
    Type: Grant
    Filed: August 23, 2018
    Date of Patent: August 13, 2019
    Assignee: International Business Machines Corporation
    Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
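    The prefetch filter queue and backing array flow can be sketched as two small tables. The queue depth and the way the prefetch value is accumulated below are assumptions on my part; only the miss-allocate, hit-update, age-out, and lookup steps follow the abstract.

    ```python
    from collections import OrderedDict

    PFQ_DEPTH = 4

    pfq = OrderedDict()          # cache line address -> prefetch value, aged out FIFO-style
    backing_array = {}           # cache line address -> prefetch value

    def on_icache_miss(cla):
        if len(pfq) >= PFQ_DEPTH:                    # age out the oldest PFQ entry
            old_cla, value = pfq.popitem(last=False)
            backing_array[old_cla] = value
        pfq[cla] = 0                                 # allocate a PFQ entry for the miss

    def on_icache_hit(cla):
        # Credit sequential hits to a tracked miss line just below this address.
        for tracked in pfq:
            if 0 < cla - tracked <= pfq[tracked] + 1:
                pfq[tracked] = max(pfq[tracked], cla - tracked)

    def lines_to_prefetch(cla):
        """When prefetching is required for cla, consult the backing array."""
        count = backing_array.get(cla, 0)
        return [cla + i for i in range(count + 1)]   # the line plus N sequential lines

    on_icache_miss(200)
    on_icache_hit(201)
    on_icache_hit(202)
    for filler in (300, 301, 302, 303):              # push line 200 out of the PFQ
        on_icache_miss(filler)
    print(lines_to_prefetch(200))                    # [200, 201, 202]
    ```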
  • Publication number: 20190188239
    Abstract: Methods and systems for multiplying a matrix and a vector are described. In an example, the vector may be partitioned into a plurality of vector partitions. The matrix may be partitioned into a plurality of matrix partitions. A plurality of threads may be scheduled to multiply each matrix partition with corresponding vector partition to determine intermediate elements. Intermediate elements determined by each thread may be stored in a local buffer assigned to the corresponding thread. Intermediate elements may be copied from a particular buffer to a memory in response to the particular buffer being full. Upon completion of the plurality of threads, the intermediate elements copied to the memory may be aggregated to generate an output vector that may be a result of multiplication between the matrix and the vector.
    Type: Application
    Filed: December 15, 2017
    Publication date: June 20, 2019
    Inventors: Mauricio J. Serrano, Manoj Kumar, Pratap Pattnaik
  • Publication number: 20190138312
    Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
    Type: Application
    Filed: January 4, 2019
    Publication date: May 9, 2019
    Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
  • Publication number: 20190042266
    Abstract: Embodiments of the present invention include methods, systems, and computer program products for implementing wide vector execution in a single thread mode for an out-of-order processor. A non-limiting example of the computer-implemented method includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
    Type: Application
    Filed: November 8, 2017
    Publication date: February 7, 2019
    Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
  • Publication number: 20190042265
    Abstract: Embodiments of the present invention include methods, systems, and computer program products for implementing wide vector execution in a single thread mode for an out-of-order processor. A non-limiting example of the computer-implemented method includes entering, by the out-of-order processor, a single thread mode. The method further includes partitioning, by the out-of-order processor, a vector register file into a plurality of register files, each of the plurality of register files being associated with a vector execution unit, the vector execution units forming a wide vector execution unit. The method further includes receiving, by a vector scalar register of the out-of-order processor, a wide vector instruction. The method further includes processing, by the wide vector execution unit, the wide vector instruction.
    Type: Application
    Filed: August 1, 2017
    Publication date: February 7, 2019
    Inventors: Silvia M. Mueller, Mauricio J. Serrano, Balaram Sinharoy
  • Patent number: 10175987
    Abstract: Instruction prefetching in a computer processor includes, upon a miss in an instruction cache for an instruction cache line: retrieving, for the instruction cache line, a prefetch prediction vector, the prefetch prediction vector representing one or more cache lines of a set of contiguous instruction cache lines following the instruction cache line to prefetch from backing memory; and prefetching, from backing memory into the instruction cache, the instruction cache lines indicated by the prefetch prediction vector.
    Type: Grant
    Filed: March 17, 2016
    Date of Patent: January 8, 2019
    Assignee: International Business Machines Corporation
    Inventors: Richard J. Eickemeyer, Sheldon Levenstein, David S. Levitan, Mauricio J. Serrano
  • Publication number: 20180365012
    Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an instruction cache. In response to the CLA subsequently hitting in the instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the instruction cache.
    Type: Application
    Filed: August 23, 2018
    Publication date: December 20, 2018
    Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
  • Patent number: 10078514
    Abstract: A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an upper level instruction cache. In response to the CLA subsequently hitting in the upper level instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the upper level instruction cache.
    Type: Grant
    Filed: May 11, 2016
    Date of Patent: September 18, 2018
    Assignee: International Business Machines Corporation
    Inventors: Richard J. Eickemeyer, Sheldon B. Levenstein, David S. Levitan, Mauricio J. Serrano, Brian W. Thompto
  • Patent number: 9983878
    Abstract: Branch prediction is provided by generating a first index from a previous instruction address and from a first branch history vector having a first length. A second index is generated from the previous instruction address and from a second branch history vector that is longer than the first vector. Using the first index, a first branch prediction is retrieved from a first branch prediction table. Using the second index, a second branch prediction is retrieved from a second branch prediction table. Based upon additional branch history data, the first branch history vector and the second branch history vector are updated. A first hash value is generated from a current instruction address and the updated first branch history vector. A second hash value is generated from the current instruction address and the updated second branch history vector. One of the branch predictions is selected based upon the hash values.
    Type: Grant
    Filed: May 15, 2014
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: David S. Levitan, Jose E. Moreira, Mauricio J. Serrano
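    A toy two-table predictor in the spirit of the abstract above is sketched below. The table sizes, the hash function, and the chooser mechanism are my assumptions; the abstract itself only specifies that one of the two predictions is selected using hashes of the address and the updated history vectors.

    ```python
    TABLE_SIZE = 256
    SHORT_BITS, LONG_BITS = 4, 12    # lengths of the two branch history vectors

    short_table = [0] * TABLE_SIZE   # 2-bit saturating counters; predict taken if >= 2
    long_table = [0] * TABLE_SIZE
    chooser = [0] * TABLE_SIZE       # >= 2 means "trust the long-history prediction"
    short_history = 0
    long_history = 0

    def index(address, history, bits):
        return (address ^ (history & ((1 << bits) - 1))) % TABLE_SIZE

    def predict(address):
        short_pred = short_table[index(address, short_history, SHORT_BITS)] >= 2
        long_pred = long_table[index(address, long_history, LONG_BITS)] >= 2
        use_long = chooser[index(address, long_history, LONG_BITS)] >= 2
        return long_pred if use_long else short_pred

    def update(address, taken):
        global short_history, long_history
        si = index(address, short_history, SHORT_BITS)
        li = index(address, long_history, LONG_BITS)
        short_correct = (short_table[si] >= 2) == taken
        long_correct = (long_table[li] >= 2) == taken
        if long_correct != short_correct:            # train the chooser only on disagreement
            chooser[li] = min(3, chooser[li] + 1) if long_correct else max(0, chooser[li] - 1)
        short_table[si] = min(3, short_table[si] + 1) if taken else max(0, short_table[si] - 1)
        long_table[li] = min(3, long_table[li] + 1) if taken else max(0, long_table[li] - 1)
        short_history = ((short_history << 1) | taken) & ((1 << SHORT_BITS) - 1)
        long_history = ((long_history << 1) | taken) & ((1 << LONG_BITS) - 1)

    for outcome in [True, True, False, True] * 8:    # train on a simple repeating pattern
        update(0x40, outcome)
    print(predict(0x40))                             # prediction for the branch's next occurrence
    ```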
  • Patent number: 9928158
    Abstract: A method for detecting a software-race condition in a program includes copying a state of a transaction of the program from a first core of a multi-core processor to at least one additional core of the multi-core processor, running the transaction, redundantly, on the first core and the at least one additional core given the state, outputting a result of the first core and the at least one additional core, and detecting a difference in the results between the first core and the at least one additional core, wherein the difference indicates the software-race condition.
    Type: Grant
    Filed: January 30, 2016
    Date of Patent: March 27, 2018
    Assignee: International Business Machines Corporation
    Inventors: Harold W. Cain, III, David M. Daly, Michael C. Huang, Kattamuri Ekanadham, Jose E. Moreira, Mauricio J. Serrano
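    The redundant-execution race check can be illustrated with a deterministic simulation. The generator-based core model and the injected racing write are my own devices for making the example reproducible; they are not the patented mechanism, which runs the copies on actual cores of a multi-core processor.

    ```python
    import copy

    def transaction(memory):
        """Transaction under test: read a shared counter twice and report the sum."""
        first = memory["counter"]
        yield                              # point where another thread could interleave
        second = memory["counter"]
        yield first + second               # the transaction's result

    def run_redundantly(shared_memory, racy_writer=None):
        state_a = copy.deepcopy(shared_memory)       # copy the starting state to core A
        state_b = copy.deepcopy(shared_memory)       # ... and to core B
        core_a, core_b = transaction(state_a), transaction(state_b)
        next(core_a)                                 # both cores perform their first read
        next(core_b)
        if racy_writer:
            racy_writer(state_a)                     # a racing write lands in core A's view only
        result_a, result_b = next(core_a), next(core_b)
        verdict = "race detected" if result_a != result_b else "no race"
        return verdict, result_a, result_b

    print(run_redundantly({"counter": 0}))                                      # ('no race', 0, 0)
    print(run_redundantly({"counter": 0}, lambda mem: mem.update(counter=7)))   # ('race detected', 7, 0)
    ```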