Patents Examined by Daniel H. Pan
  • Patent number: 10936323
    Abstract: There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running Single Program Multiple Data code on a Single Instruction Multiple Data machine. The machine runs an instruction stream over input data streams and machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation and updates the lane-PC of each active lane according to targets of the branch operation. An instruction of the instruction stream includes a barrier indicating a convergence point for all lanes to join. In response to a lane reaching a barrier: evaluating whether all lane-PCs are set to a same thread-PC; and if the lane-PCs are not set to the same thread-PC, selecting an active lane from the plurality of lanes; otherwise, incrementing the lane-PCs of all the lanes, and then selecting an active lane from the plurality of lanes.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: March 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Gheorghe Almasi, Jose Moreira, Jessica H. Tseng, Peng Wu
  • Patent number: 10929778
    Abstract: A system includes a memory, an interface engine, and a master. The memory is configured to store data. The inference engine is configured to receive the data and to perform one or more computation tasks of a machine learning (ML) operation associated with the data. The master is configured to interleave an address associated with memory access transaction for accessing the memory. The master is further configured to provide a content associated with the accessing to the inference engine.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: February 23, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ramacharan Sundararaman
  • Patent number: 10929779
    Abstract: A system to support a machine learning (ML) operation comprises a core configured to receive and interpret commands into a set of instructions for the ML operation and a memory unit configured to maintain data for the ML operation. The system further comprises an inference engine having a plurality of processing tiles, each comprising an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform tasks of the ML operation on the data in the OCM. The system also comprises an instruction streaming engine configured to distribute the instructions to the processing tiles to control their operations and to synchronize data communication between the core and the inference engine so that data transmitted between them correctly reaches the corresponding processing tiles while ensuring coherence of data shared and distributed among the core and the OCMs.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: February 23, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Gopal Nalamalapu
  • Patent number: 10915320
    Abstract: A processor includes an instruction fetch circuit to retrieve instructions from memory, and a decode unit circuit to decode retrieved instructions. The decode unit circuit identifies a shift instruction, accumulates a shift folded immediate value to track a number of bit positions shifted for a source register, and prevents the shift instruction from allocation to an execution unit of the processor.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: February 9, 2021
    Assignee: INTEL CORPORATION
    Inventors: Vineeth Mekkat, Xi Chen, Manjunath Shevgoor
  • Patent number: 10915318
    Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: February 9, 2021
    Assignee: Google LLC
    Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
  • Patent number: 10908913
    Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch the one branch.
    Type: Grant
    Filed: August 9, 2019
    Date of Patent: February 2, 2021
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 10908900
    Abstract: A system may comprise a processor integrated circuit (IC) and a vector mapping sub-system that is separate from the processor IC and includes one or more ICs. The system may receive input data for processing by a predictive model and generate at least one memory address from the input data. At least one memory address may be provided to the vector mapping sub-system. The vector mapping sub-system generates a resulting vector of numbers based on the at least one memory address. The resulting vector can be a fixed length vector representation of the input data. The resulting vector is provided from the vector mapping sub-system to the processor IC. The processor IC executes one or more instructions for the predictive model using the resulting vector to generate a prediction. A corresponding method also is disclosed.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: February 2, 2021
    Assignee: Groq, Inc.
    Inventor: Jonathan Alexander Ross
  • Patent number: 10891231
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: July 1, 2019
    Date of Patent: January 12, 2021
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
  • Patent number: 10853078
    Abstract: A processor includes a store buffer to store store instructions to be processed to store data in main memory, a load buffer to store load instructions to be processed to load data from main memory, and a loop invariant code motion (LICM) protection structure coupled to the store buffer and the load buffer. The LPT tracks information to compare an address of a store or snoop microoperation with entries in the LICM and re-loads a load microoperation of a matching entry.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: December 1, 2020
    Assignee: INTEL CORPORATION
    Inventors: Vineeth Mekkat, Mark Dechene, Zhongying Zhang, John Faistl, Janghaeng Lee, Hou-Jen Ko, Sebastian Winkel, Oleg Margulis
  • Patent number: 10838724
    Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.
    Type: Grant
    Filed: March 11, 2019
    Date of Patent: November 17, 2020
    Assignee: Google LLC
    Inventors: Dong Hyuk Woo, Andrew Everett Phelps
  • Patent number: 10831491
    Abstract: The present disclosure is directed to systems and methods for mitigating or eliminating the effectiveness of a side channel attack, such as a Spectre type attack, by limiting the ability of a user-level branch prediction inquiry to access system-level branch prediction data. The branch prediction data stored in the BTB may be apportioned into a plurality of BTB data portions. BTB control circuitry identifies the initiator of a received branch prediction inquiry. Based on the identity of the branch prediction inquiry initiator, the BTB control circuitry causes BTB look-up circuitry to selectively search one or more of the plurality of BTB data portions.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: November 10, 2020
    Assignee: Intel Corporation
    Inventors: Vadim Sukhomlinov, Kshitij Doshi
  • Patent number: 10824433
    Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: November 3, 2020
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
  • Patent number: 10810131
    Abstract: A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: October 20, 2020
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
  • Patent number: 10789070
    Abstract: Techniques for synchronizing a set of code branches are disclosed. A synchronization process is triggered by an event and/or a schedule. The synchronization process includes traversing each code branch, such that parent branches of a particular branch are “in sync” prior to being merged into the particular branch. In an embodiment, a hierarchical order for a set of branches is determined. The branch represented by the top node of the hierarchical order does not have any parents. A branch that is a child of the branch represented by the top node is in the second level of the hierarchical order. The branch in the second level is updated by incorporating the current state of the branch represented by the top node. Thereafter, each branch is iteratively updated by incorporating the current state of the branch's parent branch. Hence, changes to any parent branch are propagated through all its descendant branches.
    Type: Grant
    Filed: July 8, 2019
    Date of Patent: September 29, 2020
    Assignee: Oracle International Corporation
    Inventors: Maurizio Cimadamore, Brian Goetz
  • Patent number: 10783000
    Abstract: Associating working sets and threads is disclosed. An indication of a stalling event is received. In response to receiving the indication of the stalling event, a state of a processor associated with the stalling event is saved. At least one of an identifier of a guest thread running in the processor and a guest physical address referenced by the processor is obtained from the saved processor state.
    Type: Grant
    Filed: May 8, 2019
    Date of Patent: September 22, 2020
    Assignee: TidalScale, Inc.
    Inventors: Isaac R. Nassi, Kleoni Ioannidou, David P. Reed, I-Chun Fang, Michael Berman, Mark Hill, Brian Moffet
  • Patent number: 10776113
    Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit includes receiving, by a load-store unit (LSU) of the processing unit, an OoO window of instructions including a plurality of instructions to be executed OoO, and issuing, by the LSU, instructions from the OoO window. The issuing includes selecting an instruction from the OoO window, the instruction using an effective address. Further, in response to the instruction being a load instruction, it is determined whether the effective address is present in an effective address directory (EAD). In response to the effective address being present in the EAD, the load instruction is issued using the effective address. Further, in response to the instruction being a store instruction, a real address mapped to the effective address is determined from an effective-real translation (ERT) table, and the store instruction is issued using the real address.
    Type: Grant
    Filed: June 21, 2019
    Date of Patent: September 15, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10768930
    Abstract: A method provides for decoding, in a microprocessor, an instruction into data identifying a first register, a second register, an immediate value, and an opcode identifier. The opcode identifier is interpreted as indicating that an arithmetic operation is to be performed on the first register and the second register, and that the microprocessor is to perform a change of control operation in response to the addition of the first register and the second register causing overflow or underflow. The change of control operation is to a location in a program determined based on the immediate value. A processor can be provided with a decoder and other supporting circuitry to implement such method. Overflow/underflow can be checked on word boundaries of a double-word operation.
    Type: Grant
    Filed: February 2, 2015
    Date of Patent: September 8, 2020
    Assignee: MIPS Tech, LLC
    Inventor: Ranganathan Sudhakar
  • Patent number: 10768933
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.
    Type: Grant
    Filed: February 12, 2019
    Date of Patent: September 8, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Zbiciak, Timothy D. Anderson
  • Patent number: 10761850
    Abstract: Disclosed embodiments relate to look up table operations implemented in a digital data processor. A look up table read instruction recalls data elements of a specified data size from table(s) and stores recalled data elements in successive slots in a destination register. Disclosed embodiments promote data elements to a larger size with selected sign or zero extension. A source operand register stores vector offsets from a table start address. A destination operand stores the results of the look up table read. The look up table instruction implies a base address register and a configuration register. The base address register stores a table base address. The configuration register sets various look up table read operation parameters.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: September 1, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Duc Bui, Dheera Balasubramanian, Naveen Bhoria, Sahithi Krishna
  • Patent number: 10754657
    Abstract: An apparatus includes a memory and a processor. The memory may be configured to store a directed acyclic graph. The processor may be configured to (i) receive a command to run the directed acyclic graph, (ii) parse the directed acyclic graph into a data flow including one or more operators, (iii) schedule the operators in one or more data paths, and (iv) generate one or more output vectors by processing one or more input vectors in the data paths. The processor generally comprises a plurality of hardware engines. The data paths may be implemented with the hardware engines. The hardware engines may operate in parallel to each other.
    Type: Grant
    Filed: April 11, 2019
    Date of Patent: August 25, 2020
    Assignee: Ambarella International LP
    Inventors: Leslie D. Kohn, Robert C. Kunz