Patents Examined by Jyoti Mehta
  • Patent number: 10133573
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a multivalue reduction using a parallel processing device. One of the methods includes performing a parallel M-value reduction by parallel processing units of a parallel processing device. A plurality of initial reductions are performed in serial, each initial reduction operating on data in a different respective register space of at least M register spaces. Data is moved from the M register spaces so that all results from the plurality of initial reductions are in a same first register space. One or more subsequent reductions are performed in parallel to compute M final values, each subsequent reduction operating only on data in the first register space.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: November 20, 2018
    Assignee: Google LLC
    Inventors: Erich Konrad Elsen, Sander Etienne Lea Dieleman
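    A minimal Python simulation of the M-value reduction pattern described in the entry above: a plurality of initial reductions run serially, one per register space, the partial results are moved into the first register space, and the final reductions run over that space alone. The lane count, the value of M, and the use of addition as the reduction operator are illustrative assumptions, not details taken from the patent.
      import random

      P = 8          # parallel processing units (lanes); assumed to be a power of two
      M = 2          # number of values to reduce; assumed to divide P evenly
      G = P // M     # lanes per group after the data-movement step

      # Each "register space" holds one value per lane.
      spaces = [[random.randint(0, 9) for _ in range(P)] for _ in range(M)]
      expected = [sum(space) for space in spaces]

      # Initial reductions, performed in serial: each register space is reduced
      # from P values down to G partial sums held in its lanes 0..G-1.
      for m in range(M):
          stride = P // 2
          while stride >= G:
              for lane in range(stride):
                  spaces[m][lane] += spaces[m][lane + stride]
              stride //= 2

      # Data movement: all partial sums end up in the same first register space,
      # with the partials of space m occupying lanes m*G .. (m+1)*G - 1.
      for m in range(1, M):
          for lane in range(G):
              spaces[0][m * G + lane] = spaces[m][lane]

      # Subsequent reductions operate only on the first register space and can
      # proceed in parallel across the M groups, yielding the M final values.
      finals = []
      for m in range(M):
          base = m * G
          stride = G // 2
          while stride >= 1:
              for lane in range(stride):
                  spaces[0][base + lane] += spaces[0][base + lane + stride]
              stride //= 2
          finals.append(spaces[0][base])

      assert finals == expected
      print(finals)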
  • Patent number: 10127040
    Abstract: The present application discloses a processor and a method for executing an instruction on a processor. A specific implementation of the processor includes: a host interaction device, an instruction control device, an off-chip memory, an on-chip cache and an array processing device, wherein the host interaction device is configured to exchange data and instructions with a host connected with the processor, wherein the exchanged data has a granularity of a matrix; the off-chip memory is configured to store a matrix received from the host, on which a matrix operation is to be performed; and the instruction control device is configured to convert an external instruction received from the host to a series of memory access instructions and a series of computing instructions and execute the converted instructions. The implementation can improve the execution efficiency of a deep learning algorithm.
    Type: Grant
    Filed: September 28, 2016
    Date of Patent: November 13, 2018
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Wei Qi, Jian Ouyang, Yong Wang
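    A rough Python sketch of the instruction-conversion step in the entry above: a single host-level matrix instruction is expanded into a series of memory access instructions and a series of computing instructions. The opcode names, tuple format, and tiling scheme are invented for illustration and are not taken from the patent.
      def expand_matrix_instruction(op, m, n, k, tile=32):
          """Convert one external matrix instruction into memory-access and
          compute micro-instructions over fixed-size tiles."""
          if op != "MATMUL":
              raise ValueError("only MATMUL is sketched here")
          mem_insts, compute_insts = [], []
          for i in range(0, m, tile):
              for j in range(0, n, tile):
                  for p in range(0, k, tile):
                      # Stage operand tiles from off-chip memory into the on-chip cache.
                      mem_insts.append(("LOAD_TILE", "A", i, p, tile))
                      mem_insts.append(("LOAD_TILE", "B", p, j, tile))
                      # Multiply-accumulate the tiles on the array processing device.
                      compute_insts.append(("TILE_MATMUL_ACC", "C", i, j, tile))
                  # Write the finished output tile back to off-chip memory.
                  mem_insts.append(("STORE_TILE", "C", i, j, tile))
          return mem_insts, compute_insts

      mem, comp = expand_matrix_instruction("MATMUL", 64, 64, 64)
      print(len(mem), "memory access instructions,", len(comp), "computing instructions")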
  • Patent number: 10120693
    Abstract: Fast issuance and execution of a multi-width instruction across multiple slices in a parallel slice processor core is supported in part through the use of an early notification signal passed between the issue logic associated with the multiple slices handling that multi-width instruction, coupled with the issuance of a different instruction by the issue logic that originates the early notification signal.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Jeffrey C. Brownscheidle, Sundeep Chadha, Dung Q. Nguyen, Tu-An T. Nguyen, Salim A. Shah, Brian W. Thompto
  • Patent number: 10102001
    Abstract: Supplemental instruction dispatch may be used in some instances in a parallel slice processor to dispatch additional instructions, referred to as supplemental instructions, to supplemental instruction ports of execution slices, with primary instruction ports of one or more execution slices used to supply one or more source operands for such supplemental instructions. In addition, in some instances, in lieu of or in addition to supplemental instruction dispatch, selective slice partitioning may be used to selectively partition groups of execution slices in a parallel slice processor based upon a threading mode within which such execution slices are executing.
    Type: Grant
    Filed: March 28, 2018
    Date of Patent: October 16, 2018
    Assignee: International Business Machines Corporation
    Inventors: Kurt A. Feiste, Christopher M. Mueller, Dung Q. Nguyen, Eula A. Tolentino, Tien T. Tran, Jing Zhang
  • Patent number: 10083039
    Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands, and/or vector operations, according to one or more mode control signals that also serve as configuration control signals. The mode control signals are also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to the number of hardware threads that are active.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: September 25, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
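    A small Python sketch, loosely following the entry above, of how a mode control signal might combine execution slices into super-slices and partition them into clusters according to the number of active hardware threads; the specific grouping policy is an assumption made for illustration only.
      def configure_slices(num_slices, wide_mode, active_threads):
          """Group execution slices into super-slices for wide or vector work,
          then partition the super-slices into per-thread clusters."""
          width = 2 if wide_mode else 1                      # super-slice width
          super_slices = [list(range(i, i + width))
                          for i in range(0, num_slices, width)]
          clusters = max(1, min(active_threads, len(super_slices)))
          # Round-robin the super-slices across the clusters.
          return [super_slices[c::clusters] for c in range(clusters)]

      print(configure_slices(num_slices=8, wide_mode=True, active_threads=2))
      print(configure_slices(num_slices=8, wide_mode=False, active_threads=1))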
  • Patent number: 10078513
    Abstract: An array processor includes a managing element having a load streaming unit coupled to multiple processing elements. The load streaming unit provides input data portions to each of a first subset of the processing elements and also receives output data from each of a second subset of the processing elements based on a comparatively sorted combination of the input data portions provided to the first subset of processing elements. Furthermore, each of the processing elements is configurable by the managing element to compare input data portions received from either the load streaming unit or two or more of the other processing elements, wherein the input data portions are stored for processing in respective queues. Each processing element is further configurable to select an input data portion to be output data based on the comparison and, in response to selecting the input data portion, to remove a queue entry corresponding to the selected input data portion.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: September 18, 2018
    Assignee: International Business Machines Corporation
    Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
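    A compact Python sketch of the comparatively sorted combination described in the entry above: each processing element compares the heads of its two input queues, emits the smaller value, and removes the corresponding queue entry, while a second-tier element merges the first-tier outputs. The two-tier, four-stream arrangement and the use of generators are illustrative assumptions.
      from collections import deque

      class ComparePE:
          """One processing element: merges its two input queues by repeatedly
          selecting the smaller head and removing that queue entry."""
          def __init__(self, left, right):
              self.left, self.right = left, right

          def stream(self):
              while self.left or self.right:
                  if not self.right or (self.left and self.left[0] <= self.right[0]):
                      yield self.left.popleft()
                  else:
                      yield self.right.popleft()

      # First tier: two elements, each fed two sorted streams by the load streaming unit.
      inputs = [deque(s) for s in ([1, 5, 9], [2, 6], [3, 4, 8], [0, 7])]
      tier1 = [ComparePE(inputs[0], inputs[1]), ComparePE(inputs[2], inputs[3])]

      # Second tier: one element fed by the queued outputs of the first tier.
      tier2 = ComparePE(deque(tier1[0].stream()), deque(tier1[1].stream()))
      print(list(tier2.stream()))    # fully sorted combination of the four inputs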
  • Patent number: 10055251
    Abstract: Mechanisms for injecting code into embedded devices are provided. In some embodiments, once the code is injected into the embedded device, the injected code can analyze and modify the code of the embedded device (e.g., firmware) to create the execution environment for the injected code. For example, the injected code can identify program instruction locations in the code of the embedded device into which jump instructions can be placed. The injected code can also insert at least one jump instruction at an identified program instruction location in the code of the embedded device. In response to the execution of a jump instruction, the injected code can save a context of the code of the embedded device to memory and load a payload context into a processor of the embedded device. The payload context can then be executed by the processor of the embedded device.
    Type: Grant
    Filed: April 22, 2010
    Date of Patent: August 21, 2018
    Assignee: The Trustees of Columbia University in the City of New York
    Inventors: Ang Cui, Salvatore J. Stolfo
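    A toy Python simulation of the mechanism described in the entry above, with firmware modeled as a list of instruction tuples: hook sites are identified, jump instructions are inserted at those locations, and the firmware context is saved and restored around execution of the payload. The instruction set, the hook-site rule, and the context layout are invented for illustration.
      # "NOP" slots stand in for program instruction locations into which
      # jump instructions can be placed.
      firmware = [("SET", 1), ("NOP", None), ("ADD", 2), ("NOP", None), ("ADD", 3)]

      def identify_hook_sites(code):
          return [i for i, (op, _) in enumerate(code) if op == "NOP"]

      def inject(code, payload):
          for site in identify_hook_sites(code):
              code[site] = ("JMP", payload)             # insert a jump instruction

      def run(code):
          ctx = {"acc": 0}
          for op, arg in code:
              if op == "SET":
                  ctx["acc"] = arg
              elif op == "ADD":
                  ctx["acc"] += arg
              elif op == "JMP":
                  saved = dict(ctx)                     # save firmware context to memory
                  payload_ctx = {"observed_acc": ctx["acc"]}
                  arg(payload_ctx)                      # execute the payload context
                  ctx = saved                           # restore and resume the firmware
          return ctx["acc"]

      def payload(payload_ctx):
          print("payload ran, firmware acc =", payload_ctx["observed_acc"])

      inject(firmware, payload)
      print("firmware result:", run(firmware))          # still 1 + 2 + 3 = 6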
  • Patent number: 10026458
    Abstract: Memories and methods for performing an atomic memory operation are disclosed, including a memory having a memory store, operation logic, and a command decoder. Operation logic can be configured to receive data and perform operations thereon in accordance with internal control signals. A command decoder can be configured to receive command packets having at least a memory command portion, in which a memory command is provided, and a data configuration portion, in which configuration information related to data associated with a command packet is provided. The command decoder is further configured to generate a command control signal based at least in part on the memory command and to generate a control signal based at least in part on the configuration information.
    Type: Grant
    Filed: October 21, 2010
    Date of Patent: July 17, 2018
    Assignee: Micron Technology, Inc.
    Inventor: David Resnick
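    A brief Python sketch of a command decoder in the spirit of the entry above, splitting a command packet into a memory command portion and a data configuration portion and deriving a control signal from each; the 16-bit packet layout and the opcode values are assumptions, not the patent's encoding.
      # High byte: memory command portion; low byte: data configuration portion.
      MEMORY_COMMANDS = {0x01: "READ", 0x02: "WRITE", 0x03: "ATOMIC_ADD"}

      def decode_packet(packet):
          command = (packet >> 8) & 0xFF                 # memory command portion
          config = packet & 0xFF                         # data configuration portion
          command_control = MEMORY_COMMANDS[command]
          data_control = {
              "operand_bytes": 1 << (config & 0x03),     # 1, 2, 4, or 8 bytes
              "signed": bool(config & 0x04),
          }
          return command_control, data_control

      print(decode_packet(0x0306))   # ('ATOMIC_ADD', {'operand_bytes': 4, 'signed': True})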
  • Patent number: 10025592
    Abstract: Embodiments relate to selectively blocking branch instruction predictions. An aspect includes a computer system for performing selective branch prediction. The system includes memory and a processor, and the system is configured to perform a method. The method includes detecting a branch-prediction blocking instruction in a stream of instructions and blocking branch prediction of a predetermined number of branch instructions following the branch-prediction blocking instruction based on the detection of the branch-prediction blocking instruction.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: July 17, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Ulrich Mayer, Anthony Saporito, Chung-Lung K. Shum, Timothy J. Slegel
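    A short Python sketch of the blocking behavior described in the entry above (and in the related patent 10019265 below): once a branch-prediction blocking instruction is detected, prediction is skipped for a predetermined number of subsequent branch instructions. The mnemonic names and the block count are illustrative assumptions.
      def run_frontend(instruction_stream, block_count=2):
          """Walk an instruction stream, skipping branch prediction for the
          first block_count branches after each BPBLOCK marker."""
          blocked_remaining = 0
          for inst in instruction_stream:
              if inst == "BPBLOCK":
                  blocked_remaining = block_count
              elif inst == "BRANCH":
                  if blocked_remaining > 0:
                      blocked_remaining -= 1
                      print("branch: prediction blocked")
                  else:
                      print("branch: predicted")
              # other instructions pass through untouched

      run_frontend(["ADD", "BRANCH", "BPBLOCK", "ADD", "BRANCH", "BRANCH", "BRANCH"])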
  • Patent number: 10019265
    Abstract: Embodiments relate to selectively blocking branch instruction predictions. An aspect includes a computer-implemented method for performing selective branch prediction. The method includes detecting, by a processor, a branch-prediction blocking instruction in a stream of instructions and blocking, by the processor, branch prediction of a predetermined number of branch instructions following the branch-prediction blocking instruction based on the detection of the branch-prediction blocking instruction.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: July 10, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Ulrich Mayer, Anthony Saporito, Chung-Lung K. Shum, Timothy J. Slegel
  • Patent number: 10007590
    Abstract: Embodiments include methods, computing systems and computer program products for identifying and tracking frequently accessed registers in a processor of a computing system. Aspects include: creating a list of top accessed registers among certain registers in the processor, each register having a corresponding register usage counter; initializing each register usage counter; starting a register usage monitoring mode; examining each register usage counter and updating the list of top accessed registers; stopping the register usage monitoring mode; and updating a register file partition assignment when the list of top accessed registers is identified. Once the list of top accessed registers is identified, the program is stopped and its threads of execution are brought to quiescence, registers are moved between register file partitions until all registers on the list of top accessed registers are in the fully ported register file partition, and execution of the program and its threads is resumed.
    Type: Grant
    Filed: April 14, 2016
    Date of Patent: June 26, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pratap C. Pattnaik, Jessica H. Tseng
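    A minimal Python sketch of the monitoring-and-repartitioning flow in the entry above: per-register usage counters are examined during a monitoring window to build the list of top accessed registers, which are then assigned to the fully ported register file partition. The trace format and the partition capacity are assumptions made for illustration.
      from collections import Counter

      FULLY_PORTED_SLOTS = 2     # capacity of the fully ported register file partition

      def monitor_and_repartition(access_trace):
          """Count register accesses, then place the most-accessed registers
          in the fully ported partition."""
          usage = Counter(access_trace)                  # per-register usage counters
          top = [reg for reg, _ in usage.most_common(FULLY_PORTED_SLOTS)]
          # In hardware this is where execution would be quiesced while registers
          # are physically moved between partitions; here we only report the result.
          return {"fully_ported": top,
                  "limited_ported": sorted(set(access_trace) - set(top))}

      trace = ["r1", "r2", "r1", "r7", "r1", "r2", "r9", "r2", "r7", "r1"]
      print(monitor_and_repartition(trace))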
  • Patent number: 9996359
    Abstract: Fast issuance and execution of a multi-width instruction across multiple slices in a parallel slice processor core is supported in part through the use of an early notification signal passed between the issue logic associated with the multiple slices handling that multi-width instruction, coupled with the issuance of a different instruction by the issue logic that originates the early notification signal.
    Type: Grant
    Filed: April 7, 2016
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Jeffrey C. Brownscheidle, Sundeep Chadha, Dung Q. Nguyen, Tu-An T. Nguyen, Salim A. Shah, Brian W. Thompto
  • Patent number: 9977677
    Abstract: Supplemental instruction dispatch may be used in some instances in a parallel slice processor to dispatch additional instructions, referred to as supplemental instructions, to supplemental instruction ports of execution slices, with primary instruction ports of one or more execution slices used to supply one or more source operands for such supplemental instructions. In addition, in some instances, in lieu of or in addition to supplemental instruction dispatch, selective slice partitioning may be used to selectively partition groups of execution slices in a parallel slice processor based upon a threading mode within which such execution slices are executing.
    Type: Grant
    Filed: April 7, 2016
    Date of Patent: May 22, 2018
    Assignee: International Business Machines Corporation
    Inventors: Kurt A. Feiste, Christopher M. Mueller, Dung Q. Nguyen, Eula F. Tolentino, Tien T. Tran, Jing Zhang
  • Patent number: 9977678
    Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands, and/or vector operations, according to one or more mode control signals that also serve as configuration control signals. The mode control signals are also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to the number of hardware threads that are active.
    Type: Grant
    Filed: January 12, 2015
    Date of Patent: May 22, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 9971602
    Abstract: A method of operating a processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands, and/or vector operations, according to one or more mode control signals that also serve as configuration control signals. The mode control signals are also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to the number of hardware threads that are active.
    Type: Grant
    Filed: May 28, 2015
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 9971565
    Abstract: Random numbers within a processor may be scarce, especially when multiple hardware threads are consuming them. A local random number buffer can be used by an execution core to better manage allocation and consumption of random numbers. The buffer may operate in a number of modes, and allow any hardware thread to use a random number under some conditions. In other conditions, only certain hardware threads may be allowed to consume a random number. The local random number buffer may have a dynamic pool of entries usable by any hardware thread, as well as reserved entries usable by only particular hardware threads. Further, a user-level instruction is disclosed that can be stored in a wait queue in response to a random number being unavailable, rather than having the instruction's request for a random number simply be denied. The random number buffer may also boost performance and reduce latency.
    Type: Grant
    Filed: May 7, 2015
    Date of Patent: May 15, 2018
    Assignee: Oracle International Corporation
    Inventors: John Pape, Mark Luttrell, Paul Jordan, Michael Snyder
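    A small Python model of the local random number buffer described in the entry above, with reserved entries for particular hardware threads, a dynamic pool usable by any thread, and a wait queue for requests that would otherwise be denied. The buffer sizes and the refill policy are illustrative assumptions.
      import random
      from collections import deque

      class RandomNumberBuffer:
          def __init__(self, reserved_threads, reserved_per_thread=1, pool_size=4):
              self.reserved = {t: deque(random.random() for _ in range(reserved_per_thread))
                               for t in reserved_threads}
              self.pool = deque(random.random() for _ in range(pool_size))
              self.wait_queue = deque()

          def request(self, thread_id):
              if self.reserved.get(thread_id):
                  return self.reserved[thread_id].popleft()   # reserved entry
              if self.pool:
                  return self.pool.popleft()                  # dynamic pool entry
              self.wait_queue.append(thread_id)               # park the request
              return None

          def refill(self, count=4):
              for _ in range(count):
                  self.pool.append(random.random())
              while self.pool and self.wait_queue:
                  waiter = self.wait_queue.popleft()
                  print("thread", waiter, "resumed with", self.pool.popleft())

      buf = RandomNumberBuffer(reserved_threads=[0, 1], pool_size=1)
      print(buf.request(5))    # served from the shared pool
      print(buf.request(5))    # pool empty: the request is queued, returns None
      buf.refill()             # the queued request is satisfied on refill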
  • Patent number: 9971635
    Abstract: A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include providing, by each of a plurality of threads on a chip, an input bit signal to a respective bit in a register in response to reaching a barrier; determining whether all of the plurality of threads reached the barrier by electrically tying bits of the register together and “AND”ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip reached the barrier, notifying the plurality of threads on the chip if it is determined that only on-chip synchronization is needed; and, after all of the plurality of threads on the chip have reached the barrier, communicating the synchronization signal outside of the chip if it is determined that inter-node synchronization is needed.
    Type: Grant
    Filed: January 26, 2016
    Date of Patent: May 15, 2018
    Assignee: International Business Machines Corporation
    Inventors: Valentina Salapura, Robert W. Wisniewski
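    A tiny Python sketch of the hierarchical barrier check in the entry above: the per-thread bits are ANDed to detect that all on-chip threads have reached the barrier, after which the threads are either notified locally or the signal is forwarded off-chip. The four-thread register and the callback standing in for the node network are assumptions made for illustration.
      def barrier_reached(register_bits):
          """All threads have reached the barrier when the AND of their bits is 1."""
          result = 1
          for bit in register_bits:
              result &= bit
          return bool(result)

      def on_chip_barrier(register_bits, inter_node_needed, send_to_node_network):
          if not barrier_reached(register_bits):
              return "waiting"
          if inter_node_needed:
              send_to_node_network("chip-level synchronization complete")
              return "forwarded off-chip"
          return "threads notified"

      bits = [1, 1, 1, 1]                          # one bit per on-chip hardware thread
      print(on_chip_barrier(bits, False, print))   # threads notified
      print(on_chip_barrier(bits, True, print))    # forwarded off-chip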
  • Patent number: 9965274
    Abstract: A computer processor is provided with a plurality of functional units that perform operations specified by at least one instruction over multiple machine cycles, wherein the operations produce result operands. The processor also includes circuitry that generates result tags dynamically according to the number of operations that produce result operands in a given machine cycle. A bypass network is configured to provide data paths for the transfer of operand data between the plurality of functional units according to the result tags.
    Type: Grant
    Filed: October 15, 2014
    Date of Patent: May 8, 2018
    Assignee: Mill Computing, Inc.
    Inventors: Arthur David Kahlich, Roger Rawson Godard
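    A rough Python sketch of dynamic result-tag generation as described in the entry above: tags are handed out each machine cycle according to how many operations produce result operands, and a table maps each tag to its producing unit for bypass routing. The unit names and the tag numbering scheme are illustrative assumptions.
      def assign_result_tags(next_tag, producing_units):
          """Assign consecutive tags to the operations producing results this cycle."""
          tags = {unit: next_tag + i for i, unit in enumerate(producing_units)}
          return tags, next_tag + len(producing_units)

      bypass_routing = {}
      next_tag = 0
      for cycle, producers in enumerate([["alu0", "mul0"], ["alu1"], []]):
          tags, next_tag = assign_result_tags(next_tag, producers)
          bypass_routing.update({tag: unit for unit, tag in tags.items()})
          print("cycle", cycle, "tags:", tags)
      print("bypass routing by tag:", bypass_routing)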
  • Patent number: 9934030
    Abstract: A method for sorting data in an array processor. Each of a first tier of processing elements in the array processor receives data inputs from a load streaming unit. Each of the first tier processing elements compares input data portions received from the load streaming unit, wherein the input data portions are stored for processing in respective queues. The first tier processing elements select one of the input data portions to be an output data portion based on the comparison, and in response to the selection, remove a corresponding queue entry and request next input data from the load streaming unit. Each of the first tier processing elements further provides the output data portion as an input data portion to a second tier processing element that generates output data based on a comparison of output data received from at least two first tier processing elements.
    Type: Grant
    Filed: June 3, 2015
    Date of Patent: April 3, 2018
    Assignee: International Business Machines Corporation
    Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
  • Patent number: 9928073
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices coupled via a results bus includes: retrieving, from the results bus into an entry of a register file of an execution slice, speculative result data of a load instruction generated by a load/store slice; and determining, from the load/store slice after expiration of a predetermined period of time, whether the result data is valid.
    Type: Grant
    Filed: February 22, 2016
    Date of Patent: March 27, 2018
    Assignee: International Business Machines Corporation
    Inventors: Joshua W. Bowman, Sundeep Chadha, Michael J. Genden, Dhivya Jeganathan, Dung Q. Nguyen, David R. Terry, Eula A. Tolentino
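    A simple Python sketch of the speculative-result handling described in the entry above: a load's result data is written to a register file entry marked speculative, and after a predetermined number of cycles the load/store slice is consulted to determine whether the data is valid. The window length and the invalid-result handling shown are assumptions made for illustration.
      SPECULATION_WINDOW = 3   # cycles to wait before the load/store slice is consulted

      def write_speculative(register_file, reg, value):
          register_file[reg] = {"value": value, "speculative": True, "age": 0}

      def tick(register_file, lsu_confirms_valid):
          for reg, entry in register_file.items():
              if entry["speculative"]:
                  entry["age"] += 1
                  if entry["age"] >= SPECULATION_WINDOW:
                      if lsu_confirms_valid(reg):
                          entry["speculative"] = False   # the speculative result stands
                      else:
                          entry["value"] = None          # discard; the load must be redone
                          entry["speculative"] = False

      regs = {}
      write_speculative(regs, "r3", 42)
      for _ in range(SPECULATION_WINDOW):
          tick(regs, lambda reg: True)                   # load/store slice reports valid
      print(regs["r3"])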