Patents Examined by Daniel H. Pan

Optimize control-flow convergence on SIMD engine using divergence depth

Patent number: 10936323

Abstract: There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running Single Program Multiple Data code on a Single Instruction Multiple Data machine. The machine runs an instruction stream over input data streams. When the thread-PC reaches a branch operation, the machine increments the lane depth counters of all active lanes and updates the lane-PC of each active lane according to the targets of the branch operation. An instruction of the instruction stream includes a barrier indicating a convergence point for all lanes to join. In response to a lane reaching a barrier, the method evaluates whether all lane-PCs are set to the same thread-PC. If they are not, an active lane is selected from the plurality of lanes; otherwise, the lane-PCs of all the lanes are incremented, and an active lane is then selected from the plurality of lanes.

Type: Grant

Filed: June 12, 2019

Date of Patent: March 2, 2021

Assignee: International Business Machines Corporation

Inventors: Gheorghe Almasi, Jose Moreira, Jessica H. Tseng, Peng Wu
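
To make the lane bookkeeping in the abstract of patent 10936323 concrete, here is a toy Python model of lane-PCs, a thread-PC and lane depth counters. It is a sketch under stated assumptions, not IBM's design: the abstract does not say how an active lane is chosen, so select_active_pc uses a hypothetical "deepest lane first, lowest PC on ties" policy and skips lanes already parked at the barrier. The function names, the fixed BARRIER address and the run_to stand-in for straight-line execution are likewise illustrative.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Lane:
    pc: int          # lane-PC
    depth: int = 0   # lane depth counter

def on_branch(lanes: List[Lane], thread_pc: int, taken: List[bool], target: int) -> None:
    """Active lanes (lane-PC == thread-PC) bump their depth counter and move
    to the branch target or to the fall-through instruction."""
    for lane, is_taken in zip(lanes, taken):
        if lane.pc == thread_pc:
            lane.depth += 1
            lane.pc = target if is_taken else thread_pc + 1

def select_active_pc(lanes: List[Lane], barrier_pc: int) -> int:
    """Hypothetical policy: lanes parked at the barrier wait; among the others,
    pick the deepest lane and break ties by the lowest PC."""
    runnable = [lane for lane in lanes if lane.pc != barrier_pc] or lanes
    chosen = max(runnable, key=lambda lane: (lane.depth, -lane.pc))
    return chosen.pc

def on_barrier(lanes: List[Lane], thread_pc: int, barrier_pc: int) -> int:
    """Called when the lanes at thread_pc reach the barrier; returns the new thread-PC."""
    if all(lane.pc == thread_pc for lane in lanes):
        for lane in lanes:
            lane.pc += 1  # every lane has joined, so step past the barrier
    return select_active_pc(lanes, barrier_pc)

def run_to(lanes: List[Lane], thread_pc: int, new_pc: int) -> int:
    """Stand-in for straight-line execution: move the active lanes to new_pc."""
    for lane in lanes:
        if lane.pc == thread_pc:
            lane.pc = new_pc
    return new_pc

BARRIER = 8
lanes = [Lane(pc=0) for _ in range(4)]
trace = []
on_branch(lanes, 0, [True, False, True, False], target=5)
pc = select_active_pc(lanes, BARRIER)
trace.append(pc)                  # 1: the not-taken lanes run first
pc = run_to(lanes, pc, BARRIER)
pc = on_barrier(lanes, pc, BARRIER)
trace.append(pc)                  # 5: the taken lanes still have to run
pc = run_to(lanes, pc, BARRIER)
pc = on_barrier(lanes, pc, BARRIER)
trace.append(pc)                  # 9: all lanes joined and moved past the barrier
print(trace)
```

Only when every lane-PC matches the thread-PC are all lane-PCs incremented past the barrier. The abstract does not say when depth counters are decremented, so this sketch never decrements them.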

Address interleaving for machine learning

Patent number: 10929778

Abstract: A system includes a memory, an inference engine, and a master. The memory is configured to store data. The inference engine is configured to receive the data and to perform one or more computation tasks of a machine learning (ML) operation associated with the data. The master is configured to interleave an address associated with a memory access transaction for accessing the memory. The master is further configured to provide content associated with the access to the inference engine.

Type: Grant

Filed: May 22, 2019

Date of Patent: February 23, 2021

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Ramacharan Sundararaman
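
The abstract of patent 10929778 does not say which interleaving function the master applies. A common textbook scheme, shown here purely as an assumed example, rotates fixed-size blocks across memory banks so that sequential accesses are spread over all of them; interleave, num_banks and granule are hypothetical names.

```python
from typing import Tuple

def interleave(addr: int, num_banks: int = 4, granule: int = 64) -> Tuple[int, int]:
    """Map a flat address to (bank, offset inside that bank).

    Consecutive granule-byte blocks rotate across the banks, so a sequential
    stream of accesses spreads its traffic over every bank."""
    block, within = divmod(addr, granule)
    bank = block % num_banks
    offset = (block // num_banks) * granule + within
    return bank, offset

# Sequential 64-byte reads starting at address 0 visit banks 0, 1, 2, 3, 0, 1, 2, 3.
print([interleave(a)[0] for a in range(0, 512, 64)])
```

The mapping is one-to-one, so no two flat addresses collide on the same (bank, offset) pair; real designs often hash higher address bits into the bank index as well.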

Architecture to support synchronization between core and inference engine for machine learning

Patent number: 10929779

Abstract: A system to support a machine learning (ML) operation comprises a core configured to receive and interpret commands into a set of instructions for the ML operation and a memory unit configured to maintain data for the ML operation. The system further comprises an inference engine having a plurality of processing tiles, each comprising an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform tasks of the ML operation on the data in the OCM. The system also comprises an instruction streaming engine configured to distribute the instructions to the processing tiles to control their operations and to synchronize data communication between the core and the inference engine so that data transmitted between them correctly reaches the corresponding processing tiles while ensuring coherence of data shared and distributed among the core and the OCMs.

Type: Grant

Filed: May 22, 2019

Date of Patent: February 23, 2021

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Gopal Nalamalapu

Shift-folding for efficient load coalescing in a binary translation based processor

Patent number: 10915320

Abstract: A processor includes an instruction fetch circuit to retrieve instructions from memory, and a decode unit circuit to decode retrieved instructions. The decode unit circuit identifies a shift instruction, accumulates a shift folded immediate value to track a number of bit positions shifted for a source register, and prevents the shift instruction from allocation to an execution unit of the processor.

Type: Grant

Filed: December 21, 2018

Date of Patent: February 9, 2021

Assignee: INTEL CORPORATION

Inventors: Vineeth Mekkat, Xi Chen, Manjunath Shevgoor
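
A rough Python model of the decode-time behaviour described for patent 10915320: in-place left shifts by an immediate are not sent to an execution unit; instead their amounts are accumulated per source register and handed to the next consumer alongside the register name. The instruction encoding (Insn, the shl mnemonic) and the rule that a write to a register discards its pending shift are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Insn:
    op: str
    dst: str
    srcs: Tuple[str, ...]
    imm: int = 0

def fold_shifts(stream: List[Insn]) -> List[tuple]:
    """Drop in-place shift-left-by-immediate instructions and attach the
    accumulated shift amount to each later use of the shifted register."""
    folded: Dict[str, int] = {}  # register -> shift amount not yet applied
    issued = []
    for insn in stream:
        if insn.op == "shl" and insn.srcs == (insn.dst,):
            folded[insn.dst] = folded.get(insn.dst, 0) + insn.imm
            continue  # never allocated to an execution unit
        srcs = tuple((reg, folded.get(reg, 0)) for reg in insn.srcs)
        issued.append((insn.op, insn.dst, srcs))
        folded.pop(insn.dst, None)  # dst now holds a fresh value
    return issued

program = [
    Insn("load", "r1", ("r0",)),
    Insn("shl", "r1", ("r1",), 8),
    Insn("shl", "r1", ("r1",), 8),
    Insn("or", "r2", ("r2", "r1")),
]
for uop in fold_shifts(program):
    print(uop)
# ('load', 'r1', (('r0', 0),))
# ('or', 'r2', (('r2', 0), ('r1', 16)))
```

Both shl instructions vanish from the issued stream and the or receives r1 with a pending 16-bit shift, the kind of pattern that lets shifted byte loads be recognized and coalesced. How the real decoder performs the coalescing itself is not described in the abstract.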

Vector processing unit

Patent number: 10915318

Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.

Type: Grant

Filed: March 4, 2019

Date of Patent: February 9, 2021

Assignee: Google LLC

Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps

Method for a delayed branch implementation by using a front end track table

Patent number: 10908913

Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch and the one branch.

Type: Grant

Filed: August 9, 2019

Date of Patent: February 2, 2021

Assignee: Intel Corporation

Inventor: Mohammad Abdallah

Efficient mapping of input data to vectors for a predictive model

Patent number: 10908900

Abstract: A system may comprise a processor integrated circuit (IC) and a vector mapping sub-system that is separate from the processor IC and includes one or more ICs. The system may receive input data for processing by a predictive model and generate at least one memory address from the input data. At least one memory address may be provided to the vector mapping sub-system. The vector mapping sub-system generates a resulting vector of numbers based on the at least one memory address. The resulting vector can be a fixed length vector representation of the input data. The resulting vector is provided from the vector mapping sub-system to the processor IC. The processor IC executes one or more instructions for the predictive model using the resulting vector to generate a prediction. A corresponding method also is disclosed.

Type: Grant

Filed: February 15, 2019

Date of Patent: February 2, 2021

Assignee: Groq, Inc.

Inventor: Jonathan Alexander Ross
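
The data flow in the abstract of patent 10908900 (input data, then a memory address, then a fixed-length vector, then a prediction) can be mimicked in a few lines of Python. This is only an illustrative sketch: the abstract does not say how the address is generated, so input_to_address hashes the input with CRC-32 as an assumption, and VectorMappingSubsystem, its placeholder table contents and the linear predict function are all hypothetical.

```python
import zlib
from typing import List

class VectorMappingSubsystem:
    """Stand-in for the separate vector-mapping ICs: a table of fixed-length
    vectors indexed by memory address. Contents are deterministic placeholders."""
    def __init__(self, num_rows: int, dim: int):
        self.table = [[float(r * dim + c) for c in range(dim)] for r in range(num_rows)]

    def lookup(self, address: int) -> List[float]:
        return self.table[address]

def input_to_address(data: bytes, num_rows: int) -> int:
    # Assumption: the processor hashes the raw input (CRC-32 here) into a row address.
    return zlib.crc32(data) % num_rows

def predict(data: bytes, mapper: VectorMappingSubsystem, weights: List[float]) -> float:
    vec = mapper.lookup(input_to_address(data, len(mapper.table)))  # fixed-length vector
    return sum(w * x for w, x in zip(weights, vec))                  # toy linear model

mapper = VectorMappingSubsystem(num_rows=16, dim=4)
print(predict(b"cat", mapper, [0.5, 0.25, 0.0, 1.0]))
```

Hashing keeps address generation cheap for arbitrary inputs, at the cost that distinct inputs may collide on the same row and therefore receive the same vector.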

Streaming engine with flexible streaming engine template supporting differing number of nested loops with corresponding loop counts and loop offsets

Patent number: 10891231

Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template specifies the loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.

Type: Grant

Filed: July 1, 2019

Date of Patent: January 12, 2021

Assignee: Texas Instruments Incorporated

Inventor: Joseph Zbiciak
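
The two ideas in the abstract of patent 10891231, nested-loop address generation and a format field that changes how the same template bits are read, can be sketched in Python. The bit layout in decode_template is hypothetical (a 64-bit budget split evenly among the loops, then evenly between count and stride) and is not TI's actual template encoding; the loop "dimension" is treated as a byte stride.

```python
from itertools import product
from typing import Iterator, List, Tuple

def stream_addresses(base: int, counts: List[int], strides: List[int]) -> Iterator[int]:
    """Yield element addresses for nested loops; index 0 is the innermost loop."""
    # product() advances its last range fastest, so pass the loops outermost-first.
    loops = [range(c) for c in reversed(counts)]
    outer_first_strides = list(reversed(strides))
    for idx in product(*loops):
        yield base + sum(i * s for i, s in zip(idx, outer_first_strides))

def decode_template(bits: int, num_loops: int, budget: int = 64) -> Tuple[List[int], List[int]]:
    """Hypothetical format: budget template bits are split evenly among the loops,
    and each loop's share is split evenly between its count and its stride."""
    per_loop = budget // num_loops
    half = per_loop // 2
    mask = (1 << half) - 1
    counts, strides = [], []
    for k in range(num_loops):
        chunk = bits >> (k * per_loop)
        counts.append(chunk & mask)
        strides.append((chunk >> half) & mask)
    return counts, strides

bits = 2 | (4 << 16) | (3 << 32) | (16 << 48)
counts, strides = decode_template(bits, num_loops=2)   # ([2, 3], [4, 16])
print([hex(a) for a in stream_addresses(0x100, counts, strides)])
# ['0x100', '0x104', '0x110', '0x114', '0x120', '0x124']
print(decode_template(bits, num_loops=4))
# ([2, 4, 3, 16], [0, 0, 0, 0])
```

Read as two loops, the 64 bits give 16-bit counts and strides; read as four loops, the same bits give four loops with only 8-bit fields each, which is the trade-off the abstract describes.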

Method and apparatus for supporting speculative memory optimizations

Patent number: 10853078

Abstract: A processor includes a store buffer to hold store instructions to be processed to store data in main memory, a load buffer to hold load instructions to be processed to load data from main memory, and a loop invariant code motion (LICM) protection structure (LPT) coupled to the store buffer and the load buffer. The LPT tracks information used to compare the address of a store or snoop microoperation with its entries, and re-loads the load microoperation of a matching entry.

Type: Grant

Filed: December 21, 2018

Date of Patent: December 1, 2020

Assignee: INTEL CORPORATION

Inventors: Vineeth Mekkat, Mark Dechene, Zhongying Zhang, John Faistl, Janghaeng Lee, Hou-Jen Ko, Sebastian Winkel, Oleg Margulis
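
A minimal Python sketch of the protection check described for patent 10853078: loads hoisted out of a loop are recorded, and every later store or snoop is compared against those entries; a match means the hoisted value may be stale and the load must be re-issued. Matching at 64-byte cache-line granularity, and the class and method names, are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List

class LicmProtectionTable:
    """Toy LPT: remembers loads hoisted out of a loop, keyed by cache line."""
    def __init__(self, line_size: int = 64):
        self.line_size = line_size
        self.entries: Dict[int, List[str]] = defaultdict(list)

    def record_hoisted_load(self, load_id: str, address: int) -> None:
        self.entries[address // self.line_size].append(load_id)

    def check(self, address: int) -> List[str]:
        """Run for every store or snoop micro-op; returns the hoisted loads
        whose values may now be stale and must be re-loaded."""
        return list(self.entries.get(address // self.line_size, []))

lpt = LicmProtectionTable()
lpt.record_hoisted_load("ld1", 0x1000)
lpt.record_hoisted_load("ld2", 0x2000)
print(lpt.check(0x1008))  # ['ld1']: store hits the same line as the hoisted load
print(lpt.check(0x3000))  # []: no conflict, the hoisted values stay valid
```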

Accessing data in multi-dimensional tensors

Patent number: 10838724

Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing an N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of an N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.

Type: Grant

Filed: March 11, 2019

Date of Patent: November 17, 2020

Assignee: Google LLC

Inventors: Dong Hyuk Woo, Andrew Everett Phelps
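
The address computation in the abstract of patent 10838724 pairs each tensor index with a dimension multiplier. A minimal Python sketch, assuming a row-major layout to derive the multipliers (the abstract does not fix a layout), is a sum of index-times-multiplier products added to a base address; the function names are illustrative.

```python
from typing import List, Sequence

def row_major_multipliers(shape: Sequence[int], elem_size: int) -> List[int]:
    """Byte distance between neighbours along each dimension (row-major assumed)."""
    mults, stride = [], elem_size
    for dim in reversed(shape):
        mults.append(stride)
        stride *= dim
    return mults[::-1]

def element_address(base: int, indices: Sequence[int], multipliers: Sequence[int]) -> int:
    """Pair each tensor index element with its dimension multiplier element."""
    if len(indices) != len(multipliers):
        raise ValueError("need one index per dimension")
    return base + sum(i * m for i, m in zip(indices, multipliers))

mults = row_major_multipliers((2, 3, 4), elem_size=4)   # [48, 16, 4]
print(element_address(0x1000, (1, 2, 3), mults))        # 4188, i.e. 0x1000 + 48 + 32 + 12
```

Because the address is linear in each index, stepping one position along dimension d only adds that dimension's multiplier, which is what makes dedicated index and multiplier registers cheap to update inside loops.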

Selective access to partitioned branch transfer buffer (BTB) content

Patent number: 10831491

Abstract: The present disclosure is directed to systems and methods for mitigating or eliminating the effectiveness of a side channel attack, such as a Spectre type attack, by limiting the ability of a user-level branch prediction inquiry to access system-level branch prediction data. The branch prediction data stored in the BTB may be apportioned into a plurality of BTB data portions. BTB control circuitry identifies the initiator of a received branch prediction inquiry. Based on the identity of the branch prediction inquiry initiator, the BTB control circuitry causes BTB look-up circuitry to selectively search one or more of the plurality of BTB data portions.

Type: Grant

Filed: June 29, 2018

Date of Patent: November 10, 2020

Assignee: Intel Corporation

Inventors: Vadim Sukhomlinov, Kshitij Doshi
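
The partitioning idea in the abstract of patent 10831491 can be shown with a small Python model: the BTB is split into portions, and the identity of the inquiry initiator decides which portions are searched, so user-level code never receives predictions trained by system-level code. The two-portion split and the rule that system-level inquiries may search both portions are assumptions of this sketch.

```python
from typing import Dict, Optional

class PartitionedBTB:
    def __init__(self) -> None:
        self.portions: Dict[str, Dict[int, int]] = {"user": {}, "system": {}}

    def insert(self, privilege: str, branch_pc: int, target: int) -> None:
        self.portions[privilege][branch_pc] = target

    def lookup(self, initiator: str, branch_pc: int) -> Optional[int]:
        # Assumed policy: user-level inquiries see only the user portion;
        # system-level inquiries may search both portions.
        searchable = ["user"] if initiator == "user" else ["system", "user"]
        for name in searchable:
            target = self.portions[name].get(branch_pc)
            if target is not None:
                return target
        return None  # miss: no prediction is returned

btb = PartitionedBTB()
btb.insert("system", 0x400, 0x800)
print(btb.lookup("user", 0x400))    # None: the system entry is invisible to user code
print(btb.lookup("system", 0x400))  # 2048
```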

Array-based inference engine for machine learning

Patent number: 10824433

Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile, and further configured to maintain and output the result of the machine learning (ML) operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.

Type: Grant

Filed: December 19, 2018

Date of Patent: November 3, 2020

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen

Streaming engine with multi dimensional circular addressing selectable at each dimension

Patent number: 10810131

Abstract: A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

Type: Grant

Filed: June 11, 2019

Date of Patent: October 20, 2020

Assignee: Texas Instruments Incorporated

Inventor: Joseph Zbiciak
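
The abstract of patent 10810131 says each nested loop can independently use linear or circular addressing, but it does not define the circular mode. In this simplified Python model each loop's offset contribution wraps modulo a per-loop buffer size; circ_sizes and the wrapping rule are assumptions, not the documented TI behaviour.

```python
from itertools import product
from typing import Iterator, List, Optional

def stream_addresses(base: int, counts: List[int], strides: List[int],
                     circ_sizes: List[Optional[int]]) -> Iterator[int]:
    """Index 0 is the innermost loop. circ_sizes[k] is None for linear
    addressing, or a buffer size in bytes for circular addressing of loop k."""
    n = len(counts)
    loop_ids = list(reversed(range(n)))  # outermost loop first for product()
    for idx in product(*(range(counts[k]) for k in loop_ids)):
        offset = 0
        for i, k in zip(idx, loop_ids):
            step = i * strides[k]
            if circ_sizes[k] is not None:
                step %= circ_sizes[k]  # wrap only this loop's contribution
            offset += step
        yield base + offset

print([hex(a) for a in stream_addresses(0x200, [6], [4], [16])])
# ['0x200', '0x204', '0x208', '0x20c', '0x200', '0x204']
print(list(stream_addresses(0, [2, 3], [4, 8], [None, 16])))   # outer loop circular
# [0, 4, 8, 12, 0, 4]
```

Selecting the mode per loop lets, for example, an inner loop walk linearly through a row while an outer loop cycles through a small ring buffer of rows.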

Synchronizing a set of code branches

Patent number: 10789070

Abstract: Techniques for synchronizing a set of code branches are disclosed. A synchronization process is triggered by an event and/or a schedule. The synchronization process includes traversing each code branch, such that parent branches of a particular branch are “in sync” prior to being merged into the particular branch. In an embodiment, a hierarchical order for a set of branches is determined. The branch represented by the top node of the hierarchical order does not have any parents. A branch that is a child of the branch represented by the top node is in the second level of the hierarchical order. The branch in the second level is updated by incorporating the current state of the branch represented by the top node. Thereafter, each branch is iteratively updated by incorporating the current state of the branch's parent branch. Hence, changes to any parent branch are propagated through all its descendant branches.

Type: Grant

Filed: July 8, 2019

Date of Patent: September 29, 2020

Assignee: Oracle International Corporation

Inventors: Maurizio Cimadamore, Brian Goetz
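
The top-down propagation in the abstract of patent 10789070 amounts to a breadth-first walk of the branch hierarchy in which each child absorbs its parent's current state only after that parent has itself been updated. In this Python sketch a branch's state is modelled as a set of commit IDs and "incorporating" as set union; both are simplifying assumptions, as are the branch names.

```python
from collections import deque
from typing import Dict, List, Optional, Set

def synchronize(parents: Dict[str, Optional[str]], state: Dict[str, Set[str]]) -> List[str]:
    """Visit branches top-down; each child absorbs its parent's current state.
    Returns the order in which branches were processed."""
    children: Dict[str, List[str]] = {}
    roots = []
    for branch, parent in parents.items():
        if parent is None:
            roots.append(branch)  # top node: no parents
        else:
            children.setdefault(parent, []).append(branch)
    order: List[str] = []
    queue = deque(roots)
    while queue:
        branch = queue.popleft()
        order.append(branch)
        for child in children.get(branch, []):
            state[child] |= state[branch]  # "merge" modelled as set union of commits
            queue.append(child)
    return order

parents = {"main": None, "release": "main", "feature": "release"}
state = {"main": {"a", "b"}, "release": {"a", "c"}, "feature": {"d"}}
print(synchronize(parents, state))   # ['main', 'release', 'feature']
print(sorted(state["feature"]))      # ['a', 'b', 'c', 'd']
```

Commit b, made only on main, reaches feature because release is brought in sync before feature merges from it.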

Associating working sets and threads

Patent number: 10783000

Abstract: Associating working sets and threads is disclosed. An indication of a stalling event is received. In response to receiving the indication of the stalling event, a state of a processor associated with the stalling event is saved. At least one of an identifier of a guest thread running in the processor and a guest physical address referenced by the processor is obtained from the saved processor state.

Type: Grant

Filed: May 8, 2019

Date of Patent: September 22, 2020

Assignee: TidalScale, Inc.

Inventors: Isaac R. Nassi, Kleoni Ioannidou, David P. Reed, I-Chun Fang, Michael Berman, Mark Hill, Brian Moffet
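
The sequence in the abstract of patent 10783000 (stall, save processor state, then read a guest thread identifier and a guest physical address from that saved state) can be sketched as a tracker that builds a per-thread working set. The saved-state fields, the 4 KiB page granularity and the WorkingSetTracker name are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Set, Tuple

@dataclass(frozen=True)
class SavedProcessorState:
    guest_thread_id: int  # hypothetical: e.g. derived from a saved thread-pointer register
    guest_phys_addr: int  # hypothetical: the guest physical address that caused the stall

class WorkingSetTracker:
    def __init__(self, page_size: int = 4096) -> None:
        self.page_size = page_size
        self.working_sets: DefaultDict[int, Set[int]] = defaultdict(set)

    def on_stall(self, saved: SavedProcessorState) -> Tuple[int, int]:
        """Handle a stalling event whose processor state has already been saved."""
        page = saved.guest_phys_addr // self.page_size
        self.working_sets[saved.guest_thread_id].add(page)
        return saved.guest_thread_id, page

tracker = WorkingSetTracker()
tracker.on_stall(SavedProcessorState(guest_thread_id=7, guest_phys_addr=0x5123))
tracker.on_stall(SavedProcessorState(guest_thread_id=7, guest_phys_addr=0x9000))
print(sorted(tracker.working_sets[7]))  # [5, 9]
```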

Executing load-store operations without address translation hardware per load-store unit port

Patent number: 10776113

Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. The execution includes receiving, by a load-store unit (LSU) of the processing unit, an OoO window of instructions including a plurality of instructions to be executed OoO, and issuing, by the LSU, instructions from the OoO window. The issuing includes selecting an instruction from the OoO window, the instruction using an effective address. Further, in response to the instruction being a load instruction, it is determined whether the effective address is present in an effective address directory (EAD). In response to the effective address being present in the EAD, the load instruction is issued using the effective address. Further, in response to the instruction being a store instruction, a real address mapped to the effective address is determined from an effective-real translation (ERT) table, and the store instruction is issued using the real address.

Type: Grant

Filed: June 21, 2019

Date of Patent: September 15, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
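
The issue rules in the abstract of patent 10776113 split by instruction type: loads are issued with the effective address when it is present in the EAD, while stores are translated through the ERT table first. This Python sketch assumes page-granular EAD and ERT entries with 4 KiB pages and simply returns None on an EAD miss, since the abstract does not describe miss handling; the class name is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

class LoadStoreUnit:
    def __init__(self, ead_pages: Set[int], ert: Dict[int, int], page_size: int = 4096):
        self.ead_pages = ead_pages   # effective pages present in the EAD
        self.ert = ert               # effective page -> real page
        self.page_size = page_size

    def issue(self, kind: str, ea: int) -> Optional[Tuple[str, int]]:
        page, off = divmod(ea, self.page_size)
        if kind == "load":
            # Loads go out with the untranslated effective address on an EAD hit.
            return ("load", ea) if page in self.ead_pages else None
        if kind == "store":
            # Stores are translated through the ERT table before issue.
            return ("store", self.ert[page] * self.page_size + off)
        raise ValueError(f"unknown instruction kind: {kind}")

lsu = LoadStoreUnit(ead_pages={0x10}, ert={0x10: 0x7A})
print(lsu.issue("load", 0x10040))    # ('load', 65600), i.e. EA 0x10040, untranslated
print(lsu.issue("store", 0x10040))   # ('store', 499776), i.e. RA 0x7A040
print(lsu.issue("load", 0x20000))    # None: EAD miss
```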

Processor supporting arithmetic instructions with branch on overflow and methods

Patent number: 10768930

Abstract: A method provides for decoding, in a microprocessor, an instruction into data identifying a first register, a second register, an immediate value, and an opcode identifier. The opcode identifier is interpreted as indicating that an arithmetic operation is to be performed on the first register and the second register, and that the microprocessor is to perform a change of control operation in response to the addition of the first register and the second register causing overflow or underflow. The change of control operation is to a location in a program determined based on the immediate value. A processor can be provided with a decoder and other supporting circuitry to implement such method. Overflow/underflow can be checked on word boundaries of a double-word operation.

Type: Grant

Filed: February 2, 2015

Date of Patent: September 8, 2020

Assignee: MIPS Tech, LLC

Inventor: Ranganathan Sudhakar
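
The combined add-and-branch described for patent 10768930 can be modelled in Python with 32-bit two's-complement arithmetic: the sum is computed, and control transfers only if the true result does not fit in the register width (overflow when too large, underflow when too negative). The abstract only says the target is based on the immediate, so the MIPS-style target PC + 4 + (imm << 2) used here is an assumption.

```python
from typing import Tuple

def to_signed(x: int, width: int = 32) -> int:
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x

def add_with_overflow(a: int, b: int, width: int = 32) -> Tuple[int, bool]:
    """Two's-complement add; flags results too large (overflow) or too negative (underflow)."""
    total = to_signed(a, width) + to_signed(b, width)
    out_of_range = not (-(1 << (width - 1)) <= total < (1 << (width - 1)))
    return total & ((1 << width) - 1), out_of_range

def add_branch_on_overflow(a: int, b: int, pc: int, imm: int) -> Tuple[int, int]:
    """Returns (result, next PC). Target encoding assumed: PC + 4 + (imm << 2)."""
    result, out_of_range = add_with_overflow(a, b)
    next_pc = pc + 4 + (imm << 2) if out_of_range else pc + 4
    return result, next_pc

print([hex(v) for v in add_branch_on_overflow(5, 7, pc=0x100, imm=3)])
# ['0xc', '0x104']
print([hex(v) for v in add_branch_on_overflow(0x7FFFFFFF, 1, pc=0x100, imm=3)])
# ['0x80000000', '0x110']
```

Folding the check into the add removes the separate compare-and-branch sequence that software would otherwise need after every checked addition.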

Streaming engine with stream metadata saving for context switching

Patent number: 10768933

Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores the data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the streaming engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state, discarding stored stream data. A return from interrupt changes a frozen stream to an active state.

Type: Grant

Filed: February 12, 2019

Date of Patent: September 8, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Joseph Zbiciak, Timothy D. Anderson
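
The state handling in the abstract of patent 10768933 can be illustrated with a small Python state machine: an interrupt freezes an open stream and discards data already fetched, the return from interrupt reactivates it, and save/restore copy the stream metadata. The two-field template (base, stride), the position counter and the two-element fetch-ahead are hypothetical simplifications of a real stream template.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class StreamMetadata:
    template: Tuple[int, int]  # hypothetical template: (base address, stride)
    position: int = 0          # elements already consumed

class StreamingEngine:
    def __init__(self, template: Tuple[int, int]):
        self.meta = StreamMetadata(template)
        self.state = "active"          # an open stream
        self.buffer: List[int] = []    # fetched but not yet consumed elements

    def _fetch(self, n: int) -> None:
        base, stride = self.meta.template
        start = self.meta.position + len(self.buffer)
        self.buffer.extend(base + (start + i) * stride for i in range(n))

    def read(self) -> int:
        if self.state != "active":
            raise RuntimeError("stream is not active")
        if not self.buffer:
            self._fetch(2)  # fetch ahead of use
        self.meta.position += 1
        return self.buffer.pop(0)

    def interrupt(self) -> None:
        if self.state == "active":
            self.state = "frozen"
            self.buffer.clear()  # buffered data is discarded; metadata survives

    def return_from_interrupt(self) -> None:
        if self.state == "frozen":
            self.state = "active"  # data is refetched from meta.position on demand

    def save(self) -> StreamMetadata:  # models the stream store instruction
        return StreamMetadata(self.meta.template, self.meta.position)

    def restore(self, saved: StreamMetadata) -> None:  # models the stream restore instruction
        self.meta = StreamMetadata(saved.template, saved.position)
        self.buffer.clear()
        self.state = "active"

eng = StreamingEngine((0x1000, 4))
first = eng.read()            # 0x1000; 0x1004 is now buffered
eng.interrupt()               # frozen, buffer discarded
eng.return_from_interrupt()
second = eng.read()           # 0x1004, refetched from the saved position
print(hex(first), hex(second))
```

Because only the metadata has to survive, the discarded buffer costs nothing but a refetch after the interrupt returns.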

Look up table with data element promotion

Patent number: 10761850

Abstract: Disclosed embodiments relate to look up table operations implemented in a digital data processor. A look up table read instruction recalls data elements of a specified data size from table(s) and stores recalled data elements in successive slots in a destination register. Disclosed embodiments promote data elements to a larger size with selected sign or zero extension. A source operand register stores vector offsets from a table start address. A destination operand stores the results of the look up table read. The look up table instruction implies a base address register and a configuration register. The base address register stores a table base address. The configuration register sets various look up table read operation parameters.

Type: Grant

Filed: March 29, 2018

Date of Patent: September 1, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Duc Bui, Dheera Balasubramanian, Naveen Bhoria, Sahithi Krishna
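
A Python sketch of the look-up-table read with promotion described for patent 10761850: one element is fetched per offset from the table base, and each element is widened with sign or zero extension. The parameters elem_size, promote_to and signed stand in for configuration-register fields, and little-endian storage is assumed; none of these names come from the patent.

```python
from typing import List, Sequence

def lut_read(table: bytes, base: int, offsets: Sequence[int],
             elem_size: int, promote_to: int, signed: bool) -> List[int]:
    """Fetch one element per offset and widen it to promote_to bytes.

    base: table base address (the implied base address register)
    offsets: per-slot offsets taken from the source operand register
    elem_size, promote_to, signed: stand-ins for configuration register fields
    """
    out = []
    wide_mask = (1 << (8 * promote_to)) - 1
    for off in offsets:
        start = base + off * elem_size
        value = int.from_bytes(table[start:start + elem_size], "little", signed=signed)
        out.append(value & wide_mask)  # sign- or zero-extended bit pattern in the wider slot
    return out

table = bytes([0x01, 0xFF, 0x80, 0x7F])
print([hex(v) for v in lut_read(table, 0, [1, 2, 3], 1, 2, signed=True)])
# ['0xffff', '0xff80', '0x7f']
print([hex(v) for v in lut_read(table, 0, [1, 2, 3], 1, 2, signed=False)])
# ['0xff', '0x80', '0x7f']
```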

Computer vision processing in hardware data paths

Patent number: 10754657

Abstract: An apparatus includes a memory and a processor. The memory may be configured to store a directed acyclic graph. The processor may be configured to (i) receive a command to run the directed acyclic graph, (ii) parse the directed acyclic graph into a data flow including one or more operators, (iii) schedule the operators in one or more data paths, and (iv) generate one or more output vectors by processing one or more input vectors in the data paths. The processor generally comprises a plurality of hardware engines. The data paths may be implemented with the hardware engines. The hardware engines may operate in parallel to each other.

Type: Grant

Filed: April 11, 2019

Date of Patent: August 25, 2020

Assignee: Ambarella International LP

Inventors: Leslie D. Kohn, Robert C. Kunz
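
Steps (ii) and (iii) in the abstract of patent 10754657, parsing the directed acyclic graph into operators and scheduling them on hardware engines that run in parallel, can be sketched with Python's standard graphlib module. The wave-based policy (at most one ready operator per engine per wave) and the operator and engine names are assumptions for illustration only.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

def schedule(dag: Dict[str, Set[str]], engines: List[str]) -> List[Dict[str, str]]:
    """Group operators into waves; operators in one wave run in parallel on
    different hardware engines. dag maps each operator to its inputs."""
    if not engines:
        raise ValueError("need at least one engine")
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves: List[Dict[str, str]] = []
    pending: List[str] = []
    while sorter.is_active():
        pending.extend(sorted(sorter.get_ready()))
        batch, pending = pending[:len(engines)], pending[len(engines):]
        waves.append(dict(zip(engines, batch)))
        sorter.done(*batch)
    return waves

dag = {"resize": set(), "blur": {"resize"}, "edges": {"resize"}, "merge": {"blur", "edges"}}
for wave in schedule(dag, ["engine0", "engine1"]):
    print(wave)
# {'engine0': 'resize'}
# {'engine0': 'blur', 'engine1': 'edges'}
# {'engine0': 'merge'}
```

Operators left over when more are ready than there are engines simply wait in pending for the next wave.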
