Patents Examined by Jyoti Mehta
  • Patent number: 10133573
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a multivalue reduction using a parallel processing device. One of the methods includes performing a parallel M-value reduction by parallel processing units of a parallel processing device. A plurality of initial reductions are performed in serial, each initial reduction operating on data in a different respective register space of at least M register spaces. Data is moved from the M register spaces so that all results from the plurality of initial reductions are in a same first register space. One or more subsequent reductions are performed in parallel to compute M final values, each subsequent reduction operating only on data in the first register space.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: November 20, 2018
    Assignee: Google LLC
    Inventors: Erich Konrad Elsen, Sander Etienne Lea Dieleman
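    A minimal Python simulation of the M-value reduction pattern described in the entry above: a plurality of initial reductions run serially, one per register space, the partial results are moved into the first register space, and the final reductions run over that space alone. The lane count, the value of M, and the use of addition as the reduction operator are illustrative assumptions, not details taken from the patent.
      import random

      P = 8          # parallel processing units (lanes); assumed to be a power of two
      M = 2          # number of values to reduce; assumed to divide P evenly
      G = P // M     # lanes per group after the data-movement step

      # Each "register space" holds one value per lane.
      spaces = [[random.randint(0, 9) for _ in range(P)] for _ in range(M)]
      expected = [sum(space) for space in spaces]

      # Initial reductions, performed in serial: each register space is reduced
      # from P values down to G partial sums held in its lanes 0..G-1.
      for m in range(M):
          stride = P // 2
          while stride >= G:
              for lane in range(stride):
                  spaces[m][lane] += spaces[m][lane + stride]
              stride //= 2

      # Data movement: all partial sums end up in the same first register space,
      # with the partials of space m occupying lanes m*G .. (m+1)*G - 1.
      for m in range(1, M):
          for lane in range(G):
              spaces[0][m * G + lane] = spaces[m][lane]

      # Subsequent reductions operate only on the first register space and can
      # proceed in parallel across the M groups, yielding the M final values.
      finals = []
      for m in range(M):
          base = m * G
          stride = G // 2
          while stride >= 1:
              for lane in range(stride):
                  spaces[0][base + lane] += spaces[0][base + lane + stride]
              stride //= 2
          finals.append(spaces[0][base])

      assert finals == expected
      print(finals)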
  • Patent number: 10127040
    Abstract: The present application discloses a processor and a method for executing an instruction on a processor. A specific implementation of the processor includes: a host interaction device, an instruction control device, an off-chip memory, an on-chip cache and an array processing device, wherein the host interaction device is configured to exchange data and instructions with a host connected with the processor, wherein the exchanged data has a granularity of a matrix; the off-chip memory is configured to store a matrix received from the host, on which a matrix operation is to be performed; and the instruction control device is configured to convert an external instruction received from the host to a series of memory access instructions and a series of computing instructions and execute the converted instructions. The implementation can improve the execution efficiency of a deep learning algorithm.
    Type: Grant
    Filed: September 28, 2016
    Date of Patent: November 13, 2018
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Wei Qi, Jian Ouyang, Yong Wang
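    A rough Python sketch of the instruction-conversion step in the entry above: a single host-level matrix instruction is expanded into a series of memory access instructions and a series of computing instructions. The opcode names, tuple format, and tiling scheme are invented for illustration and are not taken from the patent.
      def expand_matrix_instruction(op, m, n, k, tile=32):
          """Convert one external matrix instruction into memory-access and
          compute micro-instructions over fixed-size tiles."""
          if op != "MATMUL":
              raise ValueError("only MATMUL is sketched here")
          mem_insts, compute_insts = [], []
          for i in range(0, m, tile):
              for j in range(0, n, tile):
                  for p in range(0, k, tile):
                      # Stage operand tiles from off-chip memory into the on-chip cache.
                      mem_insts.append(("LOAD_TILE", "A", i, p, tile))
                      mem_insts.append(("LOAD_TILE", "B", p, j, tile))
                      # Multiply-accumulate the tiles on the array processing device.
                      compute_insts.append(("TILE_MATMUL_ACC", "C", i, j, tile))
                  # Write the finished output tile back to off-chip memory.
                  mem_insts.append(("STORE_TILE", "C", i, j, tile))
          return mem_insts, compute_insts

      mem, comp = expand_matrix_instruction("MATMUL", 64, 64, 64)
      print(len(mem), "memory access instructions,", len(comp), "computing instructions")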
  • Patent number: 10120693
    Abstract: Fast issuance and execution of a multi-width instruction across multiple slices in a parallel slice processor core is supported in part through the use of an early notification signal passed between the issue logic associated with the multiple slices handling that multi-width instruction, coupled with the issuance of a different instruction by the issue logic that originates the early notification signal.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Jeffrey C. Brownscheidle, Sundeep Chadha, Dung Q. Nguyen, Tu-An T. Nguyen, Salim A. Shah, Brian W. Thompto
  • Patent number: 10102001
    Abstract: Supplemental instruction dispatch may be used in some instances in a parallel slice processor to dispatch additional instructions, referred to as supplemental instructions, to supplemental instruction ports of execution slices, with primary instruction ports of one or more execution slices used to supply one or more source operands for such supplemental instructions. In addition, in some instances, in lieu of or in addition to supplemental instruction dispatch, selective slice partitioning may be used to selectively partition groups of execution slices in a parallel slice processor based upon a threading mode within which such execution slices are executing.
    Type: Grant
    Filed: March 28, 2018
    Date of Patent: October 16, 2018
    Assignee: International Business Machines Corporation
    Inventors: Kurt A. Feiste, Christopher M. Mueller, Dung Q. Nguyen, Eula A. Tolentino, Tien T. Tran, Jing Zhang
  • Patent number: 10083039
    Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands, and/or vector operations, according to one or more mode control signals that also serve as configuration control signals. The mode control signals are also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to the number of hardware threads that are active.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: September 25, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
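    A small Python sketch, loosely following the entry above, of how a mode control signal might combine execution slices into super-slices and partition them into clusters according to the number of active hardware threads; the specific grouping policy is an assumption made for illustration only.
      def configure_slices(num_slices, wide_mode, active_threads):
          """Group execution slices into super-slices for wide or vector work,
          then partition the super-slices into per-thread clusters."""
          width = 2 if wide_mode else 1                      # super-slice width
          super_slices = [list(range(i, i + width))
                          for i in range(0, num_slices, width)]
          clusters = max(1, min(active_threads, len(super_slices)))
          # Round-robin the super-slices across the clusters.
          return [super_slices[c::clusters] for c in range(clusters)]

      print(configure_slices(num_slices=8, wide_mode=True, active_threads=2))
      print(configure_slices(num_slices=8, wide_mode=False, active_threads=1))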
  • Patent number: 10078513
    Abstract: An array processor includes a managing element having a load streaming unit coupled to multiple processing elements. The load streaming unit provides input data portions to each of a first subset of the processing elements and also receives output data from each of a second subset of the processing elements based on a comparatively sorted combination of the input data portions provided to the first subset of processing elements. Furthermore, each of the processing elements is configurable by the managing element to compare input data portions received from either the load streaming unit or two or more of the other processing elements, wherein the input data portions are stored for processing in respective queues. Each processing element is further configurable to select an input data portion to be output data based on the comparison and, in response to selecting the input data portion, to remove a queue entry corresponding to the selected input data portion.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: September 18, 2018
    Assignee: International Business Machines Corporation
    Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
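    A compact Python sketch of the comparatively sorted combination described in the entry above: each processing element compares the heads of its two input queues, emits the smaller value, and removes the corresponding queue entry, while a second-tier element merges the first-tier outputs. The two-tier, four-stream arrangement and the use of generators are illustrative assumptions.
      from collections import deque

      class ComparePE:
          """One processing element: merges its two input queues by repeatedly
          selecting the smaller head and removing that queue entry."""
          def __init__(self, left, right):
              self.left, self.right = left, right

          def stream(self):
              while self.left or self.right:
                  if not self.right or (self.left and self.left[0] <= self.right[0]):
                      yield self.left.popleft()
                  else:
                      yield self.right.popleft()

      # First tier: two elements, each fed two sorted streams by the load streaming unit.
      inputs = [deque(s) for s in ([1, 5, 9], [2, 6], [3, 4, 8], [0, 7])]
      tier1 = [ComparePE(inputs[0], inputs[1]), ComparePE(inputs[2], inputs[3])]

      # Second tier: one element fed by the queued outputs of the first tier.
      tier2 = ComparePE(deque(tier1[0].stream()), deque(tier1[1].stream()))
      print(list(tier2.stream()))    # fully sorted combination of the four inputs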
  • Patent number: 10055251
    Abstract: Mechanisms for injecting code into embedded devices are provided. In some embodiments, once the code is injected into the embedded device, the injected code can analyze and modify the code of the embedded device (e.g., firmware) to create the execution environment for the injected code. For example, the injected code can identify program instruction locations in the code of the embedded device into which jump instructions can be placed. The injected code can also insert at least one jump instruction at an identified program instruction location in the code of the embedded device. In response to the execution of a jump instruction, the injected code can save a context of the code of the embedded device to memory and load a payload context into a processor of the embedded device. The payload context can then be executed by the processor of the embedded device.
    Type: Grant
    Filed: April 22, 2010
    Date of Patent: August 21, 2018
    Assignee: The Trustees of Columbia University in the City of New York
    Inventors: Ang Cui, Salvatore J. Stolfo
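    A toy Python simulation of the mechanism described in the entry above, with firmware modeled as a list of instruction tuples: hook sites are identified, jump instructions are inserted at those locations, and the firmware context is saved and restored around execution of the payload. The instruction set, the hook-site rule, and the context layout are invented for illustration.
      # "NOP" slots stand in for program instruction locations into which
      # jump instructions can be placed.
      firmware = [("SET", 1), ("NOP", None), ("ADD", 2), ("NOP", None), ("ADD", 3)]

      def identify_hook_sites(code):
          return [i for i, (op, _) in enumerate(code) if op == "NOP"]

      def inject(code, payload):
          for site in identify_hook_sites(code):
              code[site] = ("JMP", payload)             # insert a jump instruction

      def run(code):
          ctx = {"acc": 0}
          for op, arg in code:
              if op == "SET":
                  ctx["acc"] = arg
              elif op == "ADD":
                  ctx["acc"] += arg
              elif op == "JMP":
                  saved = dict(ctx)                     # save firmware context to memory
                  payload_ctx = {"observed_acc": ctx["acc"]}
                  arg(payload_ctx)                      # execute the payload context
                  ctx = saved                           # restore and resume the firmware
          return ctx["acc"]

      def payload(payload_ctx):
          print("payload ran, firmware acc =", payload_ctx["observed_acc"])

      inject(firmware, payload)
      print("firmware result:", run(firmware))          # still 1 + 2 + 3 = 6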
  • Patent number: 10026458
    Abstract: Memories and methods for performing an atomic memory operation are disclosed, including a memory having a memory store, operation logic, and a command decoder. Operation logic can be configured to receive data and perform operations thereon in accordance with internal control signals. A command decoder can be configured to receive command packets having at least a memory command portion, in which a memory command is provided, and a data configuration portion, in which configuration information related to data associated with a command packet is provided. The command decoder is further configured to generate a command control signal based at least in part on the memory command and to generate a control signal based at least in part on the configuration information.
    Type: Grant
    Filed: October 21, 2010
    Date of Patent: July 17, 2018
    Assignee: Micron Technology, Inc.
    Inventor: David Resnick
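    A brief Python sketch of a command decoder in the spirit of the entry above, splitting a command packet into a memory command portion and a data configuration portion and deriving a control signal from each; the 16-bit packet layout and the opcode values are assumptions, not the patent's encoding.
      # High byte: memory command portion; low byte: data configuration portion.
      MEMORY_COMMANDS = {0x01: "READ", 0x02: "WRITE", 0x03: "ATOMIC_ADD"}

      def decode_packet(packet):
          command = (packet >> 8) & 0xFF                 # memory command portion
          config = packet & 0xFF                         # data configuration portion
          command_control = MEMORY_COMMANDS[command]
          data_control = {
              "operand_bytes": 1 << (config & 0x03),     # 1, 2, 4, or 8 bytes
              "signed": bool(config & 0x04),
          }
          return command_control, data_control

      print(decode_packet(0x0306))   # ('ATOMIC_ADD', {'operand_bytes': 4, 'signed': True})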
  • Patent number: 10025592
    Abstract: Embodiments relate to selectively blocking branch instruction predictions. An aspect includes a computer system for performing selective branch prediction. The system includes memory and a processor, and the system is configured to perform a method. The method includes detecting a branch-prediction blocking instruction in a stream of instructions and blocking branch prediction of a predetermined number of branch instructions following the branch-prediction blocking instruction based on the detection of the branch-prediction blocking instruction.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: July 17, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Ulrich Mayer, Anthony Saporito, Chung-Lung K. Shum, Timothy J. Slegel
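    A short Python sketch of the blocking behavior described in the entry above (and in the related patent 10019265 below): once a branch-prediction blocking instruction is detected, prediction is skipped for a predetermined number of subsequent branch instructions. The mnemonic names and the block count are illustrative assumptions.
      def run_frontend(instruction_stream, block_count=2):
          """Walk an instruction stream, skipping branch prediction for the
          first block_count branches after each BPBLOCK marker."""
          blocked_remaining = 0
          for inst in instruction_stream:
              if inst == "BPBLOCK":
                  blocked_remaining = block_count
              elif inst == "BRANCH":
                  if blocked_remaining > 0:
                      blocked_remaining -= 1
                      print("branch: prediction blocked")
                  else:
                      print("branch: predicted")
              # other instructions pass through untouched

      run_frontend(["ADD", "BRANCH", "BPBLOCK", "ADD", "BRANCH", "BRANCH", "BRANCH"])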
  • Patent number: 10019265
    Abstract: Embodiments relate to selectively blocking branch instruction predictions. An aspect includes a computer-implemented method for performing selective branch prediction. The method includes detecting, by a processor, a branch-prediction blocking instruction in a stream of instructions and blocking, by the processor, branch prediction of a predetermined number of branch instructions following the branch-prediction blocking instruction based on the detection of the branch-prediction blocking instruction.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: July 10, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Ulrich Mayer, Anthony Saporito, Chung-Lung K. Shum, Timothy J. Slegel
  • Patent number: 10007590
    Abstract: Embodiments include methods, computing systems and computer program products for identifying and tracking frequently accessed registers in a processor of a computing system. Aspects include: creating a list of top accessed registers among certain registers in the processor, each register having a corresponding register usage counter; initializing each register usage counter; starting a register usage monitoring mode; examining each register usage counter and updating the list of top accessed registers; stopping the register usage monitoring mode; and updating a register file partition assignment when the list of top accessed registers is identified. Once the list of top accessed registers is identified, the program is stopped and its threads of execution are brought to quiescence, registers are moved between register file partitions until all registers on the list of top accessed registers are in the fully ported register file partition, and execution of the program and its threads is resumed.
    Type: Grant
    Filed: April 14, 2016
    Date of Patent: June 26, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pratap C. Pattnaik, Jessica H. Tseng
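    A minimal Python sketch of the monitoring-and-repartitioning flow in the entry above: per-register usage counters are examined during a monitoring window to build the list of top accessed registers, which are then assigned to the fully ported register file partition. The trace format and the partition capacity are assumptions made for illustration.
      from collections import Counter

      FULLY_PORTED_SLOTS = 2     # capacity of the fully ported register file partition

      def monitor_and_repartition(access_trace):
          """Count register accesses, then place the most-accessed registers
          in the fully ported partition."""
          usage = Counter(access_trace)                  # per-register usage counters
          top = [reg for reg, _ in usage.most_common(FULLY_PORTED_SLOTS)]
          # In hardware this is where execution would be quiesced while registers
          # are physically moved between partitions; here we only report the result.
          return {"fully_ported": top,
                  "limited_ported": sorted(set(access_trace) - set(top))}

      trace = ["r1", "r2", "r1", "r7", "r1", "r2", "r9", "r2", "r7", "r1"]
      print(monitor_and_repartition(trace))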
  • Patent number: 9996359
    Abstract: Fast issuance and execution of a multi-width instruction across multiple slices in a parallel slice processor core is supported in part through the use of an early notification signal passed between the issue logic associated with the multiple slices handling that multi-width instruction, coupled with the issuance of a different instruction by the issue logic that originates the early notification signal.
    Type: Grant
    Filed: April 7, 2016
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Jeffrey C. Brownscheidle, Sundeep Chadha, Dung Q. Nguyen, Tu-An T. Nguyen, Salim A. Shah, Brian W. Thompto
  • Patent number: 9977677
    Abstract: Supplemental instruction dispatch may be used in some instances in a parallel slice processor to dispatch additional instructions, referred to as supplemental instructions, to supplemental instruction ports of execution slices, with primary instruction ports of one or more execution slices used to supply one or more source operands for such supplemental instructions. In addition, in some instances, in lieu of or in addition to supplemental instruction dispatch, selective slice partitioning may be used to selectively partition groups of execution slices in a parallel slice processor based upon a threading mode within which such execution slices are executing.
    Type: Grant
    Filed: April 7, 2016
    Date of Patent: May 22, 2018
    Assignee: International Business Machines Corporation
    Inventors: Kurt A. Feiste, Christopher M. Mueller, Dung Q. Nguyen, Eula F. Tolentino, Tien T. Tran, Jing Zhang
  • Patent number: 9977678
    Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands, and/or vector operations, according to one or more mode control signals that also serve as configuration control signals. The mode control signals are also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to the number of hardware threads that are active.
    Type: Grant
    Filed: January 12, 2015
    Date of Patent: May 22, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 9971602
    Abstract: A method of operating a processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands, and/or vector operations, according to one or more mode control signals that also serve as configuration control signals. The mode control signals are also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to the number of hardware threads that are active.
    Type: Grant
    Filed: May 28, 2015
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 9971565
    Abstract: Random numbers within a processor may be scarce, especially when multiple hardware threads are consuming them. A local random number buffer can be used by an execution core to better manage allocation and consumption of random numbers. The buffer may operate in a number of modes, and allow any hardware thread to use a random number under some conditions. In other conditions, only certain hardware threads may be allowed to consume a random number. The local random number buffer may have a dynamic pool of entries usable by any hardware thread, as well as reserved entries usable by only particular hardware threads. Further, a user-level instruction is disclosed that can be stored in a wait queue in response to a random number being unavailable, rather than having the instruction's request for a random number simply be denied. The random number buffer may also boost performance and reduce latency.
    Type: Grant
    Filed: May 7, 2015
    Date of Patent: May 15, 2018
    Assignee: Oracle International Corporation
    Inventors: John Pape, Mark Luttrell, Paul Jordan, Michael Snyder
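    A small Python model of the local random number buffer described in the entry above, with reserved entries for particular hardware threads, a dynamic pool usable by any thread, and a wait queue for requests that would otherwise be denied. The buffer sizes and the refill policy are illustrative assumptions.
      import random
      from collections import deque

      class RandomNumberBuffer:
          def __init__(self, reserved_threads, reserved_per_thread=1, pool_size=4):
              self.reserved = {t: deque(random.random() for _ in range(reserved_per_thread))
                               for t in reserved_threads}
              self.pool = deque(random.random() for _ in range(pool_size))
              self.wait_queue = deque()

          def request(self, thread_id):
              if self.reserved.get(thread_id):
                  return self.reserved[thread_id].popleft()   # reserved entry
              if self.pool:
                  return self.pool.popleft()                  # dynamic pool entry
              self.wait_queue.append(thread_id)               # park the request
              return None

          def refill(self, count=4):
              for _ in range(count):
                  self.pool.append(random.random())
              while self.pool and self.wait_queue:
                  waiter = self.wait_queue.popleft()
                  print("thread", waiter, "resumed with", self.pool.popleft())

      buf = RandomNumberBuffer(reserved_threads=[0, 1], pool_size=1)
      print(buf.request(5))    # served from the shared pool
      print(buf.request(5))    # pool empty: the request is queued, returns None
      buf.refill()             # the queued request is satisfied on refill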
  • Patent number: 9971635
    Abstract: A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include providing, by each of a plurality of threads on a chip, an input bit signal to a respective bit in a register in response to reaching a barrier; determining whether all of the plurality of threads reached the barrier by electrically tying bits of the register together and “AND”ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip reached the barrier, notifying the plurality of threads on the chip if it is determined that only on-chip synchronization is needed; and, after all of the plurality of threads on the chip have reached the barrier, communicating the synchronization signal outside of the chip if it is determined that inter-node synchronization is needed.
    Type: Grant
    Filed: January 26, 2016
    Date of Patent: May 15, 2018
    Assignee: International Business Machines Corporation
    Inventors: Valentina Salapura, Robert W. Wisniewski
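    A tiny Python sketch of the hierarchical barrier check in the entry above: the per-thread bits are ANDed to detect that all on-chip threads have reached the barrier, after which the threads are either notified locally or the signal is forwarded off-chip. The four-thread register and the callback standing in for the node network are assumptions made for illustration.
      def barrier_reached(register_bits):
          """All threads have reached the barrier when the AND of their bits is 1."""
          result = 1
          for bit in register_bits:
              result &= bit
          return bool(result)

      def on_chip_barrier(register_bits, inter_node_needed, send_to_node_network):
          if not barrier_reached(register_bits):
              return "waiting"
          if inter_node_needed:
              send_to_node_network("chip-level synchronization complete")
              return "forwarded off-chip"
          return "threads notified"

      bits = [1, 1, 1, 1]                          # one bit per on-chip hardware thread
      print(on_chip_barrier(bits, False, print))   # threads notified
      print(on_chip_barrier(bits, True, print))    # forwarded off-chip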
  • Patent number: 9965274
    Abstract: A computer processor is provided with a plurality of functional units that perform operations specified by at least one instruction over multiple machine cycles, wherein the operations produce result operands. The processor also includes circuitry that generates result tags dynamically according to the number of operations that produce result operands in a given machine cycle. A bypass network is configured to provide data paths for the transfer of operand data between the plurality of functional units according to the result tags.
    Type: Grant
    Filed: October 15, 2014
    Date of Patent: May 8, 2018
    Assignee: Mill Computing, Inc.
    Inventors: Arthur David Kahlich, Roger Rawson Godard
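    A rough Python sketch of dynamic result-tag generation as described in the entry above: tags are handed out each machine cycle according to how many operations produce result operands, and a table maps each tag to its producing unit for bypass routing. The unit names and the tag numbering scheme are illustrative assumptions.
      def assign_result_tags(next_tag, producing_units):
          """Assign consecutive tags to the operations producing results this cycle."""
          tags = {unit: next_tag + i for i, unit in enumerate(producing_units)}
          return tags, next_tag + len(producing_units)

      bypass_routing = {}
      next_tag = 0
      for cycle, producers in enumerate([["alu0", "mul0"], ["alu1"], []]):
          tags, next_tag = assign_result_tags(next_tag, producers)
          bypass_routing.update({tag: unit for unit, tag in tags.items()})
          print("cycle", cycle, "tags:", tags)
      print("bypass routing by tag:", bypass_routing)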
  • Patent number: 9934030
    Abstract: A method for sorting data in an array processor. Each of a first tier of processing elements in the array processor receives data inputs from a load streaming unit. Each of the first tier processing elements compares input data portions received from the load streaming unit, wherein the input data portions are stored for processing in respective queues. The first tier processing elements select one of the input data portions to be an output data portion based on the comparison, and in response to the selection, remove a corresponding queue entry and request next input data from the load streaming unit. Each of the first tier processing elements further provides the output data portion as an input data portion to a second tier processing element that generates output data based on a comparison of output data received from at least two first tier processing elements.
    Type: Grant
    Filed: June 3, 2015
    Date of Patent: April 3, 2018
    Assignee: International Business Machines Corporation
    Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
  • Patent number: 9928073
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices coupled via a results bus includes: retrieving, from the results bus into an entry of a register file of an execution slice, speculative result data of a load instruction generated by a load/store slice; and determining, from the load/store slice after expiration of a predetermined period of time, whether the result data is valid.
    Type: Grant
    Filed: February 22, 2016
    Date of Patent: March 27, 2018
    Assignee: International Business Machines Corporation
    Inventors: Joshua W. Bowman, Sundeep Chadha, Michael J. Genden, Dhivya Jeganathan, Dung Q. Nguyen, David R. Terry, Eula A. Tolentino
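    A simple Python sketch of the speculative-result handling described in the entry above: a load's result data is written to a register file entry marked speculative, and after a predetermined number of cycles the load/store slice is consulted to determine whether the data is valid. The window length and the invalid-result handling shown are assumptions made for illustration.
      SPECULATION_WINDOW = 3   # cycles to wait before the load/store slice is consulted

      def write_speculative(register_file, reg, value):
          register_file[reg] = {"value": value, "speculative": True, "age": 0}

      def tick(register_file, lsu_confirms_valid):
          for reg, entry in register_file.items():
              if entry["speculative"]:
                  entry["age"] += 1
                  if entry["age"] >= SPECULATION_WINDOW:
                      if lsu_confirms_valid(reg):
                          entry["speculative"] = False   # the speculative result stands
                      else:
                          entry["value"] = None          # discard; the load must be redone
                          entry["speculative"] = False

      regs = {}
      write_speculative(regs, "r3", 42)
      for _ in range(SPECULATION_WINDOW):
          tick(regs, lambda reg: True)                   # load/store slice reports valid
      print(regs["r3"])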