Systolic Array Processor Patents (Class 712/19)

Performance scaling for dataflow deep neural network hardware accelerators

Patent number: 12141683

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include a static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include a dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) units within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.

Type: Grant

Filed: April 30, 2021

Date of Patent: November 12, 2024

Assignee: Intel Corporation

Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
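As a rough illustration of why activation and weight sparsity matter in the abstract above, the sketch below counts how many multiply-and-accumulate operations a dot product actually needs once zero operands are skipped. The function name `sparse_dot` is hypothetical, and the code models only the arithmetic saving; it does not represent the patented hardware or how it enables or disables MAC units.

```python
def sparse_dot(activations, weights):
    """Dot product that skips any pair with a zero operand.

    Returns (result, macs_used). A zero activation or zero weight
    contributes nothing, so no MAC is counted for that pair.
    """
    total = 0
    macs_used = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            continue  # nothing to accumulate for this pair
        total += a * w
        macs_used += 1
    return total, macs_used


if __name__ == "__main__":
    result, macs = sparse_dot([1, 0, 3, 0], [2, 5, 0, 7])
    print(result, macs)
```

In this example only one of the four operand pairs has two nonzero values, so a single MAC produces the full result.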
Instruction for masking randomly selected values in a source vector for neural network processing

Patent number: 11900109

Abstract: The present invention relates to an execution unit for executing a computer program comprising a sequence of instructions, which include a masking instruction. The execution unit is configured to execute the masking instruction which, when executed by the execution unit, masks randomly selected values from a source operand of n values and retains other original values from the source operand to generate a result which includes original values from the source operand and symbols in place of the selected values.

Type: Grant

Filed: February 1, 2018

Date of Patent: February 13, 2024

Assignee: GRAPHCORE LIMITED

Inventors: Stephen Felix, Simon Christian Knowles, Godfrey Da Costa
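The masking behavior described in the abstract above can be sketched in software as follows. This is a minimal sketch, not the instruction itself: the function name `mask_random`, the per-element probability parameter `prob`, and the choice of `0.0` as the replacement symbol are all assumptions made for illustration.

```python
import random


def mask_random(source, prob, symbol=0.0, rng=None):
    """Return a copy of `source` with randomly selected values masked.

    Each value is independently replaced by `symbol` with probability
    `prob` (an assumed selection rule); all other values are kept.
    """
    if rng is None:
        rng = random.Random()
    return [symbol if rng.random() < prob else value for value in source]


if __name__ == "__main__":
    vector = [1.0, 2.0, 3.0, 4.0]
    print(mask_random(vector, 0.5, rng=random.Random(0)))
```

The result always has the same length as the source and contains only original values or the masking symbol, which matches the shape of result the abstract describes.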
Method of debugging a processor that executes vertices of an application, each vertex being assigned to a programming thread of the processor

Patent number: 11893390

Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.

Type: Grant

Filed: July 13, 2022

Date of Patent: February 6, 2024

Assignee: GRAPHCORE LIMITED

Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
Methods and systems for processing vehicle sensor data across multiple digital signal processing cores virtually arranged in segments based on a type of sensor

Patent number: 11734017

Abstract: Example embodiments relate to scheduling and processing sensor data across multiple digital signal processing (DSP) cores. A system may include DSP cores that are virtually arranged into a first segment and a second segment, each with DSP cores arranged in a linear order. The system may process sensor data using the DSP cores and a processing pipeline. DSP cores from the first segment are configured to initiate processing a first stage sequentially based on the linear order until all DSP cores are processing portions in parallel, collate outputs to produce a first stage output, and provide a first signal to the second segment based on producing the first stage output.

Type: Grant

Filed: December 7, 2020

Date of Patent: August 22, 2023

Assignee: Waymo LLC

Inventor: Peter Brinkmann
Handling exceptions in a multi-tile processing arrangement

Patent number: 11645081

Abstract: A multitile processing system has an execution unit on each tile, and an interconnect which conducts communications between the tiles according to a bulk synchronous parallel scheme. Each tile performs an on-tile compute phase followed by an intertile exchange phase, where the exchange phase is held back until all tiles in a particular group have completed the compute phase. On completion of the compute phase, each tile generates a synchronisation request and pauses an issue of instructions until it receives a synchronisation acknowledgement. If a tile attains an excepted state, it raises an exception signal and pauses instruction issue until the excepted state has been resolved. However, tiles which are not in the excepted state can continue to perform their on-tile compute phase, and will issue their own synchronisation request in their own normal time frame.

Type: Grant

Filed: May 22, 2019

Date of Patent: May 9, 2023

Assignee: Graphcore Limited

Inventors: Alan Graham Alexander, Matthew David Fyles
Medical signal processing apparatus and medical observation system

Patent number: 11607111

Abstract: A medical signal processing apparatus processes image signals input from an imaging device. The image signals correspond to a result of examining a subject, and the imaging device sequentially outputs the image signals from multiple pixels arrayed in a matrix according to a raster to the medical signal processing apparatus. The medical image signal processing apparatus includes: a signal divider configured to divide the image signals according to the raster sequentially output from the imaging device into first divided image signals each according to a pixel group consisting of multiple pixels arrayed in connected multiple columns; and a plurality of pre-processors configured to process, in parallel, sets of pixel information of the multiple first divided image signals divided by the signal divider.

Type: Grant

Filed: September 17, 2021

Date of Patent: March 21, 2023

Assignee: SONY OLYMPUS MEDICAL SOLUTIONS INC.

Inventor: Manabu Koiso
Network interface device

Patent number: 11570045

Abstract: A network interface device comprises a plurality of components configured to process a flow of data one after another. A control component is configured to provide one or more control messages in said flow, said one or more control message being provided to said plurality of components one after another such that a configuration of one or more of said components is changed.

Type: Grant

Filed: September 28, 2018

Date of Patent: January 31, 2023

Assignee: Xilinx, Inc.

Inventors: Steven Leslie Pope, David James Riddoch
Network system including match processing unit for table-based actions

Patent number: 11489773

Abstract: Methods and devices for processing packets with reduced data stalls are provided. The method comprises: (a) receiving a packet comprising a header portion and a payload portion, wherein the header portion is used to generate a packet header vector; (b) producing a table result by performing packet match operations, wherein the table result is generated based at least in part on the packet header vector and data stored in a match table; (c) receiving, at a match processing unit, the table result and an address of a set of instructions associated with the match table; and (d) performing, by the match processing unit, one or more actions in response to the set of instructions until completion of the instructions, wherein the one or more actions comprise modifying the header portion, updating memory based data structure or initiating an event.

Type: Grant

Filed: November 5, 2018

Date of Patent: November 1, 2022

Assignee: Pensando Systems Inc.

Inventors: Michael Brian Galles, David Clear
Method of debugging a processor that executes vertices of an application, each vertex being assigned to a programming thread of the processor

Patent number: 11416258

Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.

Type: Grant

Filed: May 22, 2019

Date of Patent: August 16, 2022

Assignee: Graphcore Limited

Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
Loading operands and outputting results from a multi-dimensional array using only a single side

Patent number: 11392535

Abstract: A computational array is implemented in which all operands and results are loaded or output from a single side of the array. The computational array comprises a plurality of cells arranged in n rows and m columns, each configured to produce a processed value based upon a weight value and an activation value. The cells receive weight and activation values via colinear weight and activation transmission channels that each extend across a first side edge of the computational array to provide weight values and activation values to the cells of the array. In addition, result values produced at a top cell of each of the m columns of the array are routed through the array to be output from the same first side edge of the array at a same relative timing at which the result values were produced.

Type: Grant

Filed: November 25, 2020

Date of Patent: July 19, 2022

Assignee: GROQ, INC.

Inventors: Jonathan Alexander Ross, Tom Hawkins, Dennis Charles Abts
Prefetching weights for use in a neural network processor

Patent number: 11281966

Abstract: A circuit for performing neural network computations for a neural network, the circuit comprising: a systolic array comprising a plurality of cells; a weight fetcher unit configured to, for each of the plurality of neural network layers: send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and a plurality of weight sequencer units, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, the plurality of weight sequencer units configured to, for each of the plurality of neural network layers: shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.

Type: Grant

Filed: August 2, 2018

Date of Patent: March 22, 2022

Assignee: Google LLC

Inventor: Jonathan Ross
Data formatter for convolution

Patent number: 11194490

Abstract: A circuit arrangement includes a memory circuit, data upload circuitry, data formatting circuitry, and a systolic array (SA). The data upload circuitry inputs a multi-dimensional data set and stores the multi-dimensional data set in the memory circuit. The data formatting circuitry reads subsets of the multi-dimensional data set from the memory circuit. The data formatting circuitry arranges data elements of the subsets into data streams, and outputs data elements in the data streams in parallel. The SA includes rows and columns of multiply-and-accumulate (MAC) circuits. The SA inputs data elements of the data streams to columns of MAC circuits in parallel, inputs filter values to rows of MAC circuits in parallel, and computes an output feature map from the data streams and the filter values.

Type: Grant

Filed: April 18, 2018

Date of Patent: December 7, 2021

Assignee: XILINX, INC.

Inventors: Ravi Sunkavalli, Victor J. Wu, Poching Sun
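One common way to arrange a multi-dimensional data set into streams that a MAC array can consume is the "im2col" layout, sketched below. Whether the patented data formatter uses this exact layout is not stated in the abstract, so treat it as an assumption; the function names `im2col` and `conv2d_via_streams` are hypothetical. The operation computed is the cross-correlation form of convolution (no kernel flip) that neural networks typically use.

```python
def im2col(image, kh, kw):
    """Flatten every kh x kw window of a 2-D image into one data stream."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - kh + 1):
        for c in range(cols - kw + 1):
            patches.append(
                [image[r + i][c + j] for i in range(kh) for j in range(kw)]
            )
    return patches


def conv2d_via_streams(image, kernel):
    """Compute an output feature map as dot products of streams and filter."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    out_cols = len(image[0]) - kw + 1
    sums = [
        sum(p * k for p, k in zip(patch, flat_kernel))
        for patch in im2col(image, kh, kw)
    ]
    # Regroup the flat list of results into output rows.
    return [sums[i:i + out_cols] for i in range(0, len(sums), out_cols)]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 1], [1, 1]]
    print(conv2d_via_streams(image, kernel))
```

After the reformatting step, each output value is a plain multiply-and-accumulate over one stream and the filter values, which is the kind of work a grid of MAC circuits performs in parallel.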
System, apparatus and method for barrier synchronization in a multi-threaded processor

Patent number: 11061742

Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.

Type: Grant

Filed: June 27, 2018

Date of Patent: July 13, 2021

Assignee: INTEL CORPORATION

Inventors: Robert Pawlowski, Ankit More, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman
Semiconductor device with a function of generating inherent information

Patent number: 10971236

Abstract: The present invention utilizes a new method to provide a semiconductor device having a function of generating inherent data. The NAND-type flash memory of the present invention has a memory cell array, a page buffer/sense circuit, and a differential sense amplifier that detects the potential difference of a bit line pair of a dummy array when the dummy array of the memory cell array is read out, wherein the NAND-type flash memory outputs the inherent data of the semiconductor device according to the detection result of the differential sense amplifier.

Type: Grant

Filed: May 8, 2019

Date of Patent: April 6, 2021

Assignee: Winbond Electronics Corp.

Inventor: Sho Okabe
Self-timed processors implemented with multi-rail null convention logic and unate gates

Patent number: 10951212

Abstract: There is disclosed a self-timed processor. The self-timed processor includes a plurality of functional blocks comprising null convention logic. Each of the functional blocks outputs one or more multi-rail data values. A global acknowledge tree generates a global acknowledge signal provided to all of the plurality of functional blocks. The global acknowledge signal switches to a first state when all of the multi-rail data values output from the plurality of functional blocks are in respective valid states, and the global acknowledge signal switches to a second state when all of the multi-rail data values output from the plurality of functional blocks are in a null state.

Type: Grant

Filed: February 7, 2019

Date of Patent: March 16, 2021

Assignee: Eta Compute, Inc.

Inventors: Chao Xu, Gopal Raghavan, Ben Wiley Melton, Vidura Manu Wijayasekara, Bryan Garnett Cope, David Cureton Baker, John Whitaker Havlicek
Hardware accelerator for outer-product matrix multiplication

Patent number: 10713214

Abstract: Computational apparatus includes a systolic array of processing elements, each including a multiplier and first and second accumulators. In each of a sequence of processing cycles, the processing elements perform the following steps concurrently: Each processing element, except in the first row and first column of the array, receives first and second operands from adjacent processing elements in a preceding row and column of the array, respectively, multiplies the first and second operands together to generate a product, and accumulates the product in the first accumulator. In addition, each processing element passes a stored output data value from the second accumulator to a succeeding processing element along a respective column of the array, receives a new output data value from a preceding processing element along the respective column, and stores the new output data value in the second accumulator.

Type: Grant

Filed: September 20, 2018

Date of Patent: July 14, 2020

Assignee: HABANA LABS LTD.

Inventors: Ron Shalev, Ran Halutz
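The arithmetic behind outer-product matrix multiplication can be sketched as below: the product is built up as a sum of rank-1 outer products, one per step, with every output position accumulating one product per step. The function name `outer_product_matmul` is hypothetical, and the sketch deliberately ignores the hardware aspects in the abstract above (operand passing between neighboring processing elements and the second accumulator used to shift results out).

```python
def outer_product_matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m) as a sum of outer products.

    At step `step`, column `step` of `a` and row `step` of `b` form an
    outer product that is accumulated into the whole result at once.
    """
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for step in range(inner):
        for i in range(n):
            for j in range(m):
                # Position (i, j) plays the role of one processing
                # element's first accumulator.
                c[i][j] += a[i][step] * b[step][j]
    return c


if __name__ == "__main__":
    print(outer_product_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The two inner loops are independent across `i` and `j`, which is why an array of processing elements can perform them concurrently within a single cycle.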
ISA extensions for synchronous coalesced accesses

Patent number: 10255070

Abstract: Global synchrony changes the way computers can be programmed. A new class of ISA level instructions (the globally-synchronous load-store) of the present invention is presented. In the context of multiple load-store machines, the globally synchronous load-store architecture allows the programmer to think about a collection of independent load-store machines as a single load-store machine. These ISA instructions may be applied to a distributed matrix transpose or other data that exhibit a high degree of data non-locality and difficulty in efficiently parallelizing on modern computer system architectures. Included in the new ISA instructions are a setup instruction and a synchronous coalescing access instruction (“sca”). The setup instruction configures a head processor to set up a global map that corresponds processor data contiguously to the memory. The “sca” instruction configures processors to block processor threads until respective times on a global clock, derived from the global map, to access the memory.

Type: Grant

Filed: September 4, 2014

Date of Patent: April 9, 2019

Assignee: Massachusetts Institute of Technology

Inventors: David Joseph Whelihan, Paul Stanton Keltcher
Self-timed processors implemented with multi-rail null convention logic and unate gates

Patent number: 10205453

Abstract: There is disclosed a self-timed processor. The self-timed processor includes a plurality of functional blocks comprising null convention logic. Each of the functional blocks outputs one or more multi-rail data values. A global acknowledge tree generates a global acknowledge signal provided to all of the plurality of functional blocks. The global acknowledge signal switches to a first state when all of the multi-rail data values output from the plurality of functional blocks are in respective valid states, and the global acknowledge signal switches to a second state when all of the multi-rail data values output from the plurality of functional blocks are in a null state.

Type: Grant

Filed: April 9, 2018

Date of Patent: February 12, 2019

Assignee: Eta Compute, Inc.

Inventors: Chao Xu, Gopal Raghavan, Ben Wiley Melton, Vidura Manu Wijayasekara, Bryan Garnett Cope, David Cureton Baker, John Whitaker Havlicek
Configurable mesh data bus in an island-based network flow processor

Patent number: 10031878

Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. A configurable mesh data bus includes a command mesh, a pull-id mesh, and two data meshes. The configurable mesh data bus extends through all the islands. For each mesh, each island includes a centrally located crossbar switch and eight half links. Two half links extend to ports on the top edge of the island, a half link extends to a port on a right edge of the island, two half links extend to ports on the bottom edge of the island, and a half link extends to a port on the left edge of the island. Two additional links extend to functional circuitry of the island. The configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit.

Type: Grant

Filed: March 20, 2017

Date of Patent: July 24, 2018

Assignee: Netronome Systems, Inc.

Inventor: Gavin J. Stark
Configurable mesh control bus in an island-based network flow processor

Patent number: 9621481

Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes islands organized in rows. A configurable mesh control bus extends through the islands. The configurable mesh control bus is configurable to have a unidirectional tree structure such that configuration information passes into the integrated circuit, through a root island, through the branches of the tree structure, and to each of the other islands. The functional circuits of the islands, as well as a configurable mesh data bus of the integrated circuit, are all configured with configuration information supplied via the tree structure. In one example, the configurable control mesh bus portion of each island includes a statically configured switch and multiple half links that radiate from the switch. The static configuration is determined by hardwired tie off connections associated with the island.

Type: Grant

Filed: February 17, 2012

Date of Patent: April 11, 2017

Assignee: Netronome Systems, Inc.

Inventor: Gavin J. Stark
Configurable mesh data bus in an island-based network flow processor

Patent number: 9612981

Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. A configurable mesh data bus includes a command mesh, a pull-id mesh, and two data meshes. The configurable mesh data bus extends through all the islands. For each mesh, each island includes a centrally located crossbar switch and eight half links. Two half links extend to ports on the top edge of the island, a half link extends to a port on a right edge of the island, two half links extend to ports on the bottom edge of the island, and a half link extends to a port on the left edge of the island. Two additional links extend to functional circuitry of the island. The configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit.

Type: Grant

Filed: February 17, 2012

Date of Patent: April 4, 2017

Assignee: Netronome Systems, Inc.

Inventor: Gavin J. Stark
Programmable micro-core processors for packet parsing

Patent number: 9455598

Abstract: Disclosed is an approach for implementing a flexible parser for a networking system. A micro-core parser is implemented to process packets in a networking system. The micro-cores of the parser read the packet headers, and perform any suitably programmed tasks upon those packets and packet headers. One or more caches may be associated with the micro-cores to hold the packet headers.

Type: Grant

Filed: June 20, 2011

Date of Patent: September 27, 2016

Assignee: Broadcom Corporation

Inventors: Kaushik Kuila, David T. Hass
Cooperative subspace demultiplexing in communication networks

Patent number: 9270421

Abstract: A source node selects a plurality of transmitting nodes to cooperatively encode a set of original packets to transfer to a destination node. Encoding produces a plurality of coded packets and a corresponding code matrix of coefficients. The coded packets and the corresponding code matrix comprise a set of independent equations of independent variables in a system of linear equations, wherein the independent variables comprise the original packets. A destination node may select a set of receiving nodes to cooperatively receive the transmissions. The destination node collects the coded packets and code matrix from the receiving nodes, which provide a sufficient number of independent equations for decoding the original packets. Decoding comprises calculating a solution for the system of linear equations.

Type: Grant

Filed: January 27, 2014

Date of Patent: February 23, 2016

Assignee: Genghiscomm Holdings, LLC

Inventor: Steve J Shattil
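The encode and decode steps in the abstract above amount to multiplying the original packets by a code matrix and then solving the resulting system of linear equations. The sketch below shows that idea under simplifying assumptions: arithmetic is done over exact rationals rather than whatever number system a real deployment would use, the code matrix is square, and the function names `encode` and `decode` are hypothetical.

```python
from fractions import Fraction


def encode(code_matrix, packets):
    """Each coded packet is a linear combination of the original packets."""
    width = len(packets[0])
    return [
        [sum(coef * pkt[j] for coef, pkt in zip(row, packets))
         for j in range(width)]
        for row in code_matrix
    ]


def decode(code_matrix, coded):
    """Recover the original packets by Gauss-Jordan elimination."""
    n = len(code_matrix)
    aug = [
        [Fraction(x) for x in row] + [Fraction(y) for y in rhs]
        for row, rhs in zip(code_matrix, coded)
    ]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("code matrix is singular; equations not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right-hand block of the reduced matrix holds the packets.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    packets = [[1, 2, 3], [4, 5, 6]]
    matrix = [[1, 1], [1, 2]]
    coded = encode(matrix, packets)
    print(coded, decode(matrix, coded))
```

Decoding fails with a `ValueError` when the code matrix is singular, which corresponds to the destination not yet having a sufficient number of independent equations.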
Cooperative subspace multiplexing in communication networks

Patent number: 9225471

Abstract: A source node selects a plurality of transmitting nodes to cooperatively encode a set of original packets to transfer to a destination node. Encoding produces a plurality of coded packets and a corresponding code matrix of coefficients. The coded packets and the corresponding code matrix comprise a set of independent equations of independent variables in a system of linear equations, wherein the independent variables comprise the original packets. A destination node may select a set of receiving nodes to cooperatively receive the transmissions. The destination node collects the coded packets and code matrix from the receiving nodes, which provide a sufficient number of independent equations for decoding the original packets. Decoding comprises calculating a solution for the system of linear equations.

Type: Grant

Filed: January 26, 2014

Date of Patent: December 29, 2015

Assignee: Genghiscomm Holdings, LLC

Inventor: Steve J Shattil
MAC processor architecture

Patent number: 8897293

Abstract: In a media access control (MAC) processor, a programmable controller is configured to execute machine readable instructions for implementing MAC functions corresponding to data received by a communication device. A tightly coupled memory is associated with the programmable controller. A system memory is coupled to the programmable controller via a system bus, and a hardware processor is coupled to the system bus and the tightly coupled memory. The hardware processor is configured to implement MAC functions on data received in a communication frame, store, in the tightly coupled memory, processed data corresponding to data in the communication frame that indicates a structure of downlink data in the communication frame, and store, in the system memory, processed data corresponding to other data in the communication frame.

Type: Grant

Filed: May 7, 2012

Date of Patent: November 25, 2014

Assignee: Marvell International Ltd.

Inventors: Bhaskar Chowdhuri, Srikanth Shubhakoti, Vinod Ananth, Hongyu Xie, Shui Cheong Lee
Packet draining from a scheduling hierarchy in a traffic manager of a network processor

Patent number: 8638805

Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.

Type: Grant

Filed: September 30, 2011

Date of Patent: January 28, 2014

Assignee: LSI Corporation

Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
Methods and apparatus for matrix decompositions in programmable logic devices

Patent number: 8555031

Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.

Type: Grant

Filed: January 4, 2013

Date of Patent: October 8, 2013

Assignee: Altera Corporation

Inventor: Michael Fitton
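A CORDIC block is often used in QR-decomposition hardware to compute plane rotations with shift-and-add style iterations. The sketch below shows CORDIC in vectoring mode: it rotates a vector onto the x-axis, yielding its magnitude and angle, which is the quantity a Givens-rotation-based QR step needs in order to zero one matrix element. The abstract does not say that the patented processor uses Givens rotations, so that connection is an assumption, as are the function name `cordic_vectoring` and the floating-point arithmetic (hardware would use fixed-point shifts).

```python
import math


def cordic_vectoring(x, y, iterations=24):
    """Return (magnitude, angle) of the vector (x, y), assuming x > 0.

    Each iteration rotates by +/- atan(2**-i) in whichever direction
    drives y toward zero, using only scaling by powers of two.
    """
    angle = 0.0
    for i in range(iterations):
        shift = 2.0 ** -i
        d = -1.0 if y > 0 else 1.0  # rotate clockwise when y is positive
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)
    # The micro-rotations stretch the vector by a known constant gain.
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    return x / gain, angle


if __name__ == "__main__":
    magnitude, angle = cordic_vectoring(3.0, 4.0)
    print(magnitude, angle)
```

For the input (3, 4) the returned values approximate a magnitude of 5 and an angle of atan2(4, 3), with accuracy limited by the number of iterations.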
Methods and apparatus for matrix decompositions in programmable logic devices

Patent number: 8359458

Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.

Type: Grant

Filed: July 11, 2011

Date of Patent: January 22, 2013

Assignee: Altera Corporation

Inventor: Michael Fitton
Synchronisation of execution threads on a multi-threaded processor

Patent number: 8286180

Abstract: Method and apparatus are provided for synchronizing execution of a plurality of threads on a multi-threaded processor. Each thread is provided with a number of synchronization points corresponding to points where it is advantageous or preferable that execution should be synchronized with another thread. Execution of a thread is paused when it reaches a synchronization point until at least one other thread with which it is intended to be synchronized reaches a corresponding synchronization point. Execution is subsequently resumed. Where an executing thread branches over a section of code which included a synchronization point then execution is paused at the end of the branch until the at least one other thread reaches the synchronization point of the end of the corresponding branch.

Type: Grant

Filed: August 24, 2007

Date of Patent: October 9, 2012

Assignee: Imagination Technologies Limited

Inventor: Yoong Chert Foo
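The pause-and-resume behavior at a synchronization point, as described in the abstract above, can be approximated in software with a barrier. The sketch below uses Python's standard `threading.Barrier`; it is a simplification in that every thread waits for all the others, whereas the abstract speaks of waiting for at least one other intended thread, and it does not model the branch-over case. The function name `run_synchronized` is hypothetical.

```python
import threading


def run_synchronized(num_threads=3):
    """Run threads that all pause at one shared synchronization point."""
    barrier = threading.Barrier(num_threads)
    log = []
    log_lock = threading.Lock()

    def worker(thread_id):
        with log_lock:
            log.append(("before", thread_id))
        barrier.wait()  # paused here until every thread has arrived
        with log_lock:
            log.append(("after", thread_id))

    threads = [
        threading.Thread(target=worker, args=(t,)) for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for entry in run_synchronized():
        print(entry)
```

Because no thread passes the barrier before all have reached it, every "before" entry appears in the log ahead of every "after" entry, regardless of how the threads are scheduled.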
WiMAX MAC

Patent number: 8175015

Abstract: A media access control (MAC) processor includes a programmable controller and a memory coupled to the programmable controller to store machine readable instructions for implementing MAC functions corresponding to data received by a communication device. A hardware processor is coupled to the programmable controller. The hardware processor includes a processing engine configured to implement MAC functions on the data received by the communication device. The hardware processor additionally includes a context memory coupled to the processing engine to store state information of the processing engine corresponding to one or more contexts, and context switch logic coupled to the processing engine to determine when the processing engine should switch contexts.

Type: Grant

Filed: December 12, 2008

Date of Patent: May 8, 2012

Assignee: Marvell International Ltd.

Inventors: Bhaskar Chowdhuri, Srikanth Shubhakoti, Vinod Ananth, Hongyu Xie, Shui Cheong Lee
Sequentially propagating instructions of thread through serially coupled PEs for concurrent processing respective thread on different data and synchronizing upon branch

Patent number: 8151090

Abstract: A systolic data processing apparatus includes a processing element (PE) array and control unit. The PE array comprises a plurality of PEs, each PE executing a thread with respect to different data according to an input instruction and pipelining the instruction at each cycle for executing a program. The control unit inputs a new instruction to a first PE of the PE array at each cycle.

Type: Grant

Filed: February 17, 2009

Date of Patent: April 3, 2012

Assignee: Samsung Electronics Co., Ltd.

Inventors: Gi-Ho Park, Shin-Dug Kim, Jung-Wook Park, Hoon-Mo Yang, Sung-Bae Park
Linking functional blocks for sequential operation by DONE and GO components of respective blocks pointing to same memory location to store completion indicator read as start indicator

Patent number: 8103855

Abstract: The present disclosure provides a methodology for reducing congestion of a processing unit, preferably by configuring a plurality of functional blocks to run in parallel or in series without the influence or input from the processing unit. In an embodiment, the present method chains a plurality of functional blocks together by software so that one functional block starts after the completion of another functional block. The configuration of the chain can be series, parallel, and any combination thereof, arranged to meet the circuit's objective. The chaining can be configured and re-configured, preferably by software input. The chaining can also be performed at design time or at run time. The chaining can also be modified, preferably at design time, but can also be modified at run time.

Type: Grant

Filed: June 29, 2008

Date of Patent: January 24, 2012

Assignee: Navosha Corporation

Inventors: Hirak Mitra, Raj Kulkarni, Richard Wicks, Michael Moon
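
The title of patent 8103855 above describes chaining in which one block's DONE component and the next block's GO component point to the same memory location. The following Python sketch illustrates that idea under stated assumptions: the class `FunctionalBlock`, the function `run_chain`, the dictionary used as memory, and the addresses are all hypothetical, and the value 1 is assumed to serve as both completion and start indicator.

```python
from typing import Dict, List, Optional


class FunctionalBlock:
    """Hypothetical functional block with GO and DONE address registers."""

    def __init__(self, name: str, go_addr: Optional[int], done_addr: Optional[int]):
        self.name = name
        self.go_addr = go_addr      # location polled for a start indicator
        self.done_addr = done_addr  # location that receives the completion indicator
        self.finished = False

    def step(self, memory: Dict[int, int], log: List[str]) -> bool:
        """Run once if the GO location holds a start indicator."""
        if self.finished:
            return False
        # A block without a GO address starts unconditionally (head of a chain).
        if self.go_addr is not None and memory.get(self.go_addr, 0) != 1:
            return False
        log.append(self.name)  # stands in for the block's real work
        if self.done_addr is not None:
            memory[self.done_addr] = 1  # completion indicator, read as GO downstream
        self.finished = True
        return True


def run_chain(blocks: List[FunctionalBlock], memory: Dict[int, int]) -> List[str]:
    """Poll all blocks until none can make progress; no central sequencing."""
    log: List[str] = []
    progressed = True
    while progressed:
        progressed = False
        for block in blocks:
            if block.step(memory, log):
                progressed = True
    return log


# Blocks are listed out of order on purpose: the shared addresses, not the
# list order, determine the execution sequence.
blocks = [
    FunctionalBlock("C", go_addr=0x20, done_addr=None),
    FunctionalBlock("B", go_addr=0x10, done_addr=0x20),
    FunctionalBlock("A", go_addr=None, done_addr=0x10),
]
order = run_chain(blocks, memory={})
print(order)
```

Re-pointing a GO or DONE address changes the chain without touching the blocks themselves, which is one way to read the abstract's statement that the chaining can be configured by software.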
METHODS AND APPARATUS FOR MATRIX DECOMPOSITIONS IN PROGRAMMABLE LOGIC DEVICES

Publication number: 20120011344

Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.

Type: Application

Filed: July 11, 2011

Publication date: January 12, 2012

Applicant: ALTERA CORPORATION

Inventor: Michael Fitton
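
The abstract above says only that the processing unit includes a CORDIC calculation block and performs a QR-decomposition. As a hedged illustration of how those two pieces can fit together, the Python sketch below computes the R factor with Givens rotations built from CORDIC micro-rotations. The choice of Givens rotations, the iteration count, and the names `cordic_vector`, `cordic_rotate` and `qr_r_factor` are assumptions, not details taken from the application.

```python
import math
from typing import List, Tuple

ITERATIONS = 32  # assumed precision; real hardware would use a fixed word length
# Accumulated gain of the shift-and-add micro-rotations.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_vector(x: float, y: float) -> Tuple[float, bool, List[int]]:
    """Vectoring mode: rotate (x, y) onto the positive x axis.

    Returns the magnitude, whether a half-turn pre-rotation was needed,
    and the direction chosen at each micro-rotation.
    """
    flip = x < 0
    if flip:
        x, y = -x, -y  # bring the vector into the convergence range
    dirs = []
    for i in range(ITERATIONS):
        d = -1 if y > 0 else 1  # always rotate toward y == 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return x / GAIN, flip, dirs


def cordic_rotate(x: float, y: float, flip: bool, dirs: List[int]) -> Tuple[float, float]:
    """Rotation mode: replay recorded micro-rotations on another pair."""
    if flip:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x / GAIN, y / GAIN


def qr_r_factor(a: List[List[float]]) -> List[List[float]]:
    """Return the upper-triangular R of a = QR using Givens rotations."""
    r = [list(map(float, row)) for row in a]
    rows, cols = len(r), len(r[0])
    for j in range(min(rows, cols)):
        for i in range(j + 1, rows):
            # Zero r[i][j] against the diagonal element r[j][j].
            mag, flip, dirs = cordic_vector(r[j][j], r[i][j])
            r[j][j], r[i][j] = mag, 0.0
            # Apply the same rotation to the rest of the two rows.
            for k in range(j + 1, cols):
                r[j][k], r[i][k] = cordic_rotate(r[j][k], r[i][k], flip, dirs)
    return r


r = qr_r_factor([[3.0, 1.0], [4.0, 2.0]])
print(r)
```

Recording the rotation directions in vectoring mode and replaying them avoids computing an explicit angle, so each step uses only shifts, additions and one final scaling by the constant gain.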
Processor architectures for enhanced computational capability

Patent number: 8078834

Abstract: A digital signal processor includes a control block configured to issue instructions based on a stored program, and a compute array including two or more compute engines configured such that each of the issued instructions executes in successive compute engines of at least a subset of the compute engines at successive times. The digital signal processor may be utilized with a control processor or as a stand-alone processor. The compute array may be configured such that each of the issued instructions flows through successive compute engines of at least a subset of the compute engines at successive times.

Type: Grant

Filed: January 9, 2008

Date of Patent: December 13, 2011

Assignee: Analog Devices, Inc.

Inventor: Douglas Garde
Quantum computer and quantum computing method

Patent number: 8054072

Abstract: A quantum computer includes a unit including thin films A, B and C each containing a physical-system group A, B and C formed of physical systems A, B and C, the films A, B and C being alternately stacked in an order of A, B, C, A, . . . , each of the systems A, B and C having three-different-energy states |0>x, |1>x, |e>x, a quantum bit being expressed by a quantum-mechanical-superposition state of |0>x and |1>x, a light source generating light beams having angular frequencies ωA(E), ye, g, ωA(E), ye, e, ωx, ye, gg, ωx, ye, ge, ωx, ye, eg and ωx, ye, ee, ωA(E), ye, g, a unit controlling frequencies and intensities of the beams, and a unit measuring intensity of light emitted from or transmitted through physical-system group A(E) contained in a lowest one of the thin films A to detect a quantum state of the group A(E).

Type: Grant

Filed: August 19, 2009

Date of Patent: November 8, 2011

Assignee: Kabushiki Kaisha Toshiba

Inventors: Kouichi Ichimura, Hayato Goto
Method and apparatus for matrix decompositions in programmable logic devices

Patent number: 7979673

Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.

Type: Grant

Filed: May 10, 2010

Date of Patent: July 12, 2011

Assignee: Altera Corporation

Inventor: Michael Fitton
Chained operation of functional components with DONE and GO registers storing memory address for writing and reading linking signal value

Patent number: 7870366

Abstract: The present disclosure provides an architecture that enables massive parallel processing on an IC while alleviating control congestion, memory access congestion and wiring congestion, together with high flexibility where the processing units are soft-arranged to perform different tasks. In an embodiment, the present architecture includes a functional block with a GO component to start the functional block, and a DONE component to identify the completion status. The GO and DONE components can be linked together, preferably by a linkage component, to chain the functional blocks. The linkage is preferably soft configurable. In another embodiment, the present architecture includes an integrated circuit comprising a plurality of functional blocks chained together for serial processing, parallel processing, or any combination thereof.

Type: Grant

Filed: June 29, 2008

Date of Patent: January 11, 2011

Inventors: Hirak Mitra, Raj Kulkarni, Richard Wicks, Michael Moon
Selective Execution Dependency Matrix

Publication number: 20100257341

Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of cells arranged in a plurality of columns and a plurality of rows. Each row represents an instruction in a processor execution queue and each cell represents a dependency relationship between two instructions in the processor execution queue. A first latch couples to the first array and comprises a first bit, the first bit indicating a first status. A second latch couples to the first array and comprises a second bit, the second bit indicating a second status. A first read port couples to the first array, comprising a first read wordline and a first read bitline. The first read wordline couples to the first latch and a first column and asserts a first available signal based on the first bit. The first read bitline couples to a first row and generates a first ready signal based on the first available signal and a first cell.

Type: Application

Filed: April 3, 2009

Publication date: October 7, 2010

Applicant: International Business Machines Corporation

Inventors: Mary D. Brown, James W. Bishop, William E. Burky, John B. Griswell, JR., Dung Q. Nguyen, Todd A. Venton
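
The abstract of publication 20100257341 above describes rows as queued instructions, cells as dependency relationships, per-column available signals, and per-row ready signals. One plausible reading of that logic is sketched below in Python. The function `ready_signals` is hypothetical, and the rule "a row is ready when every column it depends on is available" is an assumption about how the bitline combines the cells, not a statement of the claimed circuit.

```python
from typing import List


def ready_signals(dep: List[List[int]], available: List[bool]) -> List[bool]:
    """Compute one ready signal per row of a dependency matrix.

    dep[r][c] == 1 means the instruction in row r depends on the
    instruction tracked by column c. available[c] plays the role of the
    latch bit driving column c's wordline.
    """
    n = len(dep)
    ready = []
    for r in range(n):
        # Assumed bitline behaviour: any set cell whose column is not yet
        # available holds the row's ready signal low.
        blocked = any(dep[r][c] and not available[c] for c in range(n))
        ready.append(not blocked)
    return ready


# Instruction 1 depends on 0; instruction 2 depends on 0 and 1.
dep = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]
signals = ready_signals(dep, available=[True, False, False])
print(signals)
```

Because each row is evaluated independently, all ready signals can be produced in parallel, which is the usual motivation for a matrix layout.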
SYSTOLIC ARRAY AND CALCULATION METHOD

Publication number: 20100250640

Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.

Type: Application

Filed: November 21, 2008

Publication date: September 30, 2010

Inventor: Katsutoshi Seki
METHOD AND APPARATUS FOR MATRIX DECOMPOSITIONS IN PROGRAMMABLE LOGIC DEVICES

Publication number: 20100223445

Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.

Type: Application

Filed: May 10, 2010

Publication date: September 2, 2010

Applicant: Altera Corporation

Inventor: Michael Fitton
ARRAY PROCESSOR TYPE DATA PROCESSING APPARATUS

Publication number: 20100131738

Abstract: In an array processing section, using data strings entered from input ports, a plurality of data processor elements execute predetermined operations while transferring data to each other, and output data strings of results of the operations from a plurality of output ports. A first data string converter converts data strings stored in a plurality of data storages of a data storage group into a placement suitable for the operations in the array processing section, and enters the converted data strings into the input ports of the array processing section. A second data string converter converts the data strings output from output ports of the array processing section into a placement to be stored in the plurality of data storages of the data storage group.

Type: Application

Filed: February 22, 2008

Publication date: May 27, 2010

Inventors: Tomoyoshi Kobori, Katsutoshi Seki
Method and apparatus for matrix decomposition in programmable logic devices

Patent number: 7716454

Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.

Type: Grant

Filed: October 10, 2006

Date of Patent: May 11, 2010

Assignee: Altera Corporation

Inventor: Michael Fitton
Integrated circuit incorporating an array of interconnected processors executing a cycle-based program

Publication number: 20100100704

Abstract: An integrated circuit 4 is provided including an array 10 of processors 26 with interface circuitry 12 providing communication with further processing circuitry 14. The processors 26 within the array 10 execute individual programs which together provide the functionality of a cycle-based program. During each program-cycle of the cycle based program, each of the processors executes its respective program starting from a predetermined execution start point to evaluate a next state of at least some of the state variables of the cycle-based program. A boundary between program-cycles provides a synchronisation time (point) for processing operations performed by the array.

Type: Application

Filed: October 14, 2009

Publication date: April 22, 2010

Applicant: ARM Limited

Inventors: Stephen John Hill, Michael Peter Muller
Processing in pipelined computing units with data line and circuit configuration rule signal line

Patent number: 7653805

Abstract: A semiconductor device for performing data processing by performing a plurality of computations in cycles includes a pipeline formed by connecting a plurality of computing units in series, each of the computing units including: a data line for receiving data; a control line for receiving a rule signal; a circuit information control unit configured to store, before data processing, several circuit information items, and to output a first one of the several circuit information items according to the rule signal received via the control line in a first cycle of the data processing; a processing element configured to construct an execution circuit according to the first circuit information item, to perform a computation using data from the data line, and to output a computation result; a data register for storing the computation result, and for outputting the computation result in a second cycle; and a control register for storing the rule signal and for outputting the rule signal in the second cycle.

Type: Grant

Filed: March 23, 2007

Date of Patent: January 26, 2010

Assignee: Kabushiki Kaisha Toshiba

Inventors: Takashi Yoshikawa, Shigehiro Asano, Yutaka Yamada
Coupling data in a parallel processing environment

Patent number: 7636835

Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises one or more interface modules including circuitry to transfer data to and from a device external to the tiles; and a sub-port routing network including circuitry to route data between a port of a switch and a plurality of sub-ports coupled to one or more interface modules.

Type: Grant

Filed: April 14, 2006

Date of Patent: December 22, 2009

Assignee: Tilera Corporation

Inventors: Carl G. Ramey, David Wentzlaff, Anant Agarwal
Coupling integrated circuits in a parallel processing environment

Patent number: 7539845

Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises an interface coupled to a plurality of the tiles to transfer data between one or more switches of the tiles and one or more switches of tiles in an externally coupled integrated circuit.

Type: Grant

Filed: April 14, 2006

Date of Patent: May 26, 2009

Assignee: Tilera Corporation

Inventors: David Wentzlaff, Carl G. Ramey, Anant Agarwal
Processor having systolic array pipeline for processing data packets

Patent number: 7418536

Abstract: A processor for use in a router, the processor having a systolic array pipeline for processing data packets to determine to which output port of the router the data packet should be routed. In one embodiment, the systolic array pipeline includes a plurality of programmable functional units and register files arranged sequentially as stages, for processing packet contexts (which contain the packet's destination address) to perform operations, under programmatic control, to determine the destination port of the router for the packet. A single stage of the systolic array may contain a register file and one or more functional units such as adders, shifters, logical units, etc., for performing, in one example, very long instruction word (VLIW) operations. The processor may also include a forwarding table memory, on-chip, for storing routing information, and a cross bar selectively connecting the stages of the systolic array with the forwarding table memory.

Type: Grant

Filed: January 4, 2006

Date of Patent: August 26, 2008

Assignee: Cisco Technology, Inc.

Inventors: Arthur Tung-Tak Leung, Anthony Li, William Lynch, Sharad Mehrotra
Architecture for a processor complex of an arrayed pipelined processing engine

Patent number: 7380101

Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.

Type: Grant

Filed: December 27, 2004

Date of Patent: May 27, 2008

Assignee: Cisco Technology, Inc.

Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
Tiered sequential processing media data through multiple processor chains with longest path tier assignment of processors

Patent number: 7278009

Abstract: Tiered command distribution is described. In an embodiment, a pipeline architecture includes processor chains of data processors that process control events received from an application interface control. A tier assignment algorithm determines the longest path of data processors through the processor chains to determine a tier allocation for each data processor in the set of processor chains. Each tier includes a data processor from one or more of the processor chains where a first set of data processors in a first tier each receive a control event and process the control event and/or process the data according to the control event before a second set of data processors in a second tier each receive the control event.

Type: Grant

Filed: March 31, 2005

Date of Patent: October 2, 2007

Assignee: Microsoft Corporation

Inventors: Geoffrey R Smith, Hans-Martin Krober, Michael D. Dodd
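
The abstract of patent 7278009 above describes a tier assignment algorithm that uses the longest path of data processors through the processor chains. A minimal Python sketch of longest-path tiering follows. The function `assign_tiers`, the representation of a chain as a list of processor names, and the example names are hypothetical, and the sketch assumes the chains together form an acyclic graph.

```python
from collections import deque
from typing import Dict, List, Set


def assign_tiers(chains: List[List[str]]) -> Dict[str, int]:
    """Assign each processor the length of the longest path leading to it."""
    successors: Dict[str, Set[str]] = {}
    indegree: Dict[str, int] = {}
    for chain in chains:
        for proc in chain:
            successors.setdefault(proc, set())
            indegree.setdefault(proc, 0)
        # Consecutive processors in a chain form a directed edge.
        for upstream, downstream in zip(chain, chain[1:]):
            if downstream not in successors[upstream]:
                successors[upstream].add(downstream)
                indegree[downstream] += 1

    tiers = {proc: 0 for proc in successors}
    queue = deque(proc for proc, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:  # topological order guarantees tiers[proc] is final here
        proc = queue.popleft()
        visited += 1
        for nxt in successors[proc]:
            tiers[nxt] = max(tiers[nxt], tiers[proc] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if visited != len(successors):
        raise ValueError("processor chains contain a cycle")
    return tiers


tiers = assign_tiers([
    ["decode", "scale", "mix", "render"],
    ["capture", "mix", "render"],
])
print(tiers)
```

In this example `mix` lands in tier 2 even though it is second in the shorter chain, so a control event delivered tier by tier reaches `scale` before `mix` on every path.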
Processing method and apparatus for implementing systolic arrays

Patent number: 7260709

Abstract: The present invention relates to a processing method and apparatus for implementing a systolic-array-like structure. Input data are stored in a depth-configurable register means (DCF) in a predetermined sequence, and are supplied to a processing means (FU) for processing said input data based on control signals generated from instruction data, wherein the depth of the register means (DCF) is controlled in accordance with the instruction data. Thereby, systolic arrays can be mapped onto a programmable processor, e.g. a VLIW processor, without the need for explicitly issuing operations to implement the register moves that constitute the delay lines of the array.

Type: Grant

Filed: April 1, 2003

Date of Patent: August 21, 2007

Assignee: Koninklijke Philips Electronics N.V.

Inventor: Bernardo De Oliveira Kastrup Pereira
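
The abstract of patent 7260709 above describes a register whose depth is controlled by instruction data so that the delay lines of a systolic array need no explicit register moves. The Python sketch below models such a register as a shift line with a selectable tap. The class `DepthConfigurableRegister` and its methods are hypothetical names, and zero-initialized cells are an assumption.

```python
from collections import deque


class DepthConfigurableRegister:
    """Hypothetical delay line whose effective depth can be changed at run time."""

    def __init__(self, max_depth: int):
        if max_depth < 1:
            raise ValueError("max_depth must be at least 1")
        self.max_depth = max_depth
        self.depth = max_depth
        # Index 0 holds the most recently stored value.
        self._cells = deque([0] * max_depth, maxlen=max_depth)

    def set_depth(self, depth: int) -> None:
        """Select the tap, as the instruction data would in the abstract."""
        if not 1 <= depth <= self.max_depth:
            raise ValueError("depth out of range")
        self.depth = depth

    def clock(self, value: int) -> int:
        """Store `value` and return the value stored `depth` cycles earlier."""
        out = self._cells[self.depth - 1]
        self._cells.appendleft(value)  # the oldest cell falls off the far end
        return out


reg = DepthConfigurableRegister(max_depth=4)
reg.set_depth(2)
delayed = [reg.clock(v) for v in [1, 2, 3, 4, 5]]
print(delayed)
```

A single `clock` call both stores and reads, so the shift happens implicitly, which mirrors the abstract's point about avoiding explicitly issued register moves.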
