Systolic Array Processor Patents (Class 712/19)
-
Patent number: 12141683
Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
Type: Grant
Filed: April 30, 2021
Date of Patent: November 12, 2024
Assignee: Intel Corporation
Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick
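To illustrate why activation and weight sparsity affects how many MAC operations are actually needed, here is a minimal Python sketch. It is not taken from the patent: the function name `sparse_dot` and the skip-on-zero policy are assumptions chosen only to show the idea.

```python
def sparse_dot(activations, weights):
    """Dot product that skips any pair containing a zero (hypothetical policy).

    Returns (result, macs_performed) so the saving from sparsity is visible.
    """
    total = 0
    macs_performed = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            continue  # a zero operand contributes nothing, so no MAC is spent
        total += a * w
        macs_performed += 1
    return total, macs_performed


result, macs = sparse_dot([1, 0, 3, 0], [2, 5, 0, 4])
print(result, macs)
```

In this toy input only one of the four operand pairs has two non-zero values, so a single MAC is performed.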
-
Patent number: 11900109
Abstract: The present invention relates to an execution unit for executing a computer program comprising a sequence of instructions, which include a masking instruction. The execution unit is configured to execute the masking instruction which, when executed by the execution unit, masks randomly selected values from a source operand of n values and retains other original values from the source operand to generate a result which includes original values from the source operand and symbols in place of the selected values.
Type: Grant
Filed: February 1, 2018
Date of Patent: February 13, 2024
Assignee: GRAPHCORE LIMITED
Inventors: Stephen Felix, Simon Christian Knowles, Godfrey Da Costa
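The behaviour of such a masking operation can be sketched in software. The following Python example is only an illustration under stated assumptions: the name `mask_random`, the per-value masking probability `prob`, and the default symbol `0` are hypothetical and do not come from the patent.

```python
import random


def mask_random(source, prob, symbol=0, rng=None):
    """Replace each value with `symbol` with probability `prob`; keep the rest.

    `prob`, `symbol` and `rng` are illustrative parameters, not patent terms.
    """
    rng = rng or random.Random()
    # rng.random() is in [0.0, 1.0), so prob=0 masks nothing and prob=1 masks all
    return [symbol if rng.random() < prob else value for value in source]


print(mask_random([1.5, 2.5, 3.5, 4.5], prob=0.5, rng=random.Random(0)))
```

The result has the same length as the source operand and contains original values wherever no mask was applied.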
-
Patent number: 11893390
Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.
Type: Grant
Filed: July 13, 2022
Date of Patent: February 6, 2024
Assignee: GRAPHCORE LIMITED
Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
-
Patent number: 11734017
Abstract: Example embodiments relate to scheduling and processing sensor data across multiple digital signal processing (DSP) cores. A system may include DSP cores that are virtually arranged into a first segment and a second segment, each with DSP cores arranged in a linear order. The system may process sensor data using the DSP cores and a processing pipeline. DSP cores from the first segment are configured to initiate processing a first stage sequentially based on the linear order until all DSP cores are processing portions in parallel, collate outputs to produce a first stage output, and provide a first signal to the second segment based on producing the first stage output.
Type: Grant
Filed: December 7, 2020
Date of Patent: August 22, 2023
Assignee: Waymo LLC
Inventor: Peter Brinkmann
-
Patent number: 11645081
Abstract: A multitile processing system has an execution unit on each tile, and an interconnect which conducts communications between the tiles according to a bulk synchronous parallel scheme. Each tile performs an on-tile compute phase followed by an intertile exchange phase, where the exchange phase is held back until all tiles in a particular group have completed the compute phase. On completion of the compute phase, each tile generates a synchronisation request and pauses an issue of instructions until it receives a synchronisation acknowledgement. If a tile attains an excepted state, it raises an exception signal and pauses instruction issue until the excepted state has been resolved. However, tiles which are not in the excepted state can continue to perform their on-tile compute phase, and will issue their own synchronisation request in their own normal time frame.
Type: Grant
Filed: May 22, 2019
Date of Patent: May 9, 2023
Assignee: Graphcore Limited
Inventors: Alan Graham Alexander, Matthew David Fyles
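The compute/synchronise/exchange pattern of a bulk synchronous parallel scheme can be sketched with ordinary threads. The Python example below is a simplified illustration, not the patented mechanism: the function `run_bsp`, the ring-shaped exchange, and the "add one" compute step are assumptions, and the exception handling described in the abstract is left out.

```python
import threading


def run_bsp(num_tiles, supersteps):
    """Simulate tiles that compute locally, synchronise, then exchange in a ring."""
    barrier = threading.Barrier(num_tiles)
    values = list(range(num_tiles))  # each tile's private state
    inbox = [0] * num_tiles          # stands in for the inter-tile interconnect

    def tile(i):
        for _ in range(supersteps):
            local = values[i] + 1               # compute phase: on-tile data only
            barrier.wait()                      # held back until every tile is done
            inbox[(i + 1) % num_tiles] = local  # exchange phase: send to neighbour
            barrier.wait()                      # wait until all exchanges have landed
            values[i] = inbox[i]

    threads = [threading.Thread(target=tile, args=(i,)) for i in range(num_tiles)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values


print(run_bsp(3, 1))
```

The first `barrier.wait()` plays the role of the synchronisation request and acknowledgement: no tile begins exchanging until all tiles in the group have finished computing.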
-
Patent number: 11607111
Abstract: A medical signal processing apparatus processes image signals input from an imaging device. The image signals correspond to a result of examining a subject, and the imaging device sequentially outputs the image signals from multiple pixels arrayed in a matrix according to a raster to the medical signal processing apparatus. The medical image signal processing apparatus includes: a signal divider configured to divide the image signals according to the raster sequentially output from the imaging device into first divided image signals each according to a pixel group consisting of multiple pixels arrayed in connected multiple columns; and a plurality of pre-processors configured to process, in parallel, sets of pixel information of the multiple first divided image signals divided by the signal divider.
Type: Grant
Filed: September 17, 2021
Date of Patent: March 21, 2023
Assignee: SONY OLYMPUS MEDICAL SOLUTIONS INC.
Inventor: Manabu Koiso
-
Patent number: 11570045
Abstract: A network interface device comprises a plurality of components configured to process a flow of data one after another. A control component is configured to provide one or more control messages in said flow, said one or more control messages being provided to said plurality of components one after another such that a configuration of one or more of said components is changed.
Type: Grant
Filed: September 28, 2018
Date of Patent: January 31, 2023
Assignee: Xilinx, Inc.
Inventors: Steven Leslie Pope, David James Riddoch
-
Patent number: 11489773
Abstract: Methods and devices for processing packets with reduced data stalls are provided. The method comprises: (a) receiving a packet comprising a header portion and a payload portion, wherein the header portion is used to generate a packet header vector; (b) producing a table result by performing packet match operations, wherein the table result is generated based at least in part on the packet header vector and data stored in a match table; (c) receiving, at a match processing unit, the table result and an address of a set of instructions associated with the match table; and (d) performing, by the match processing unit, one or more actions in response to the set of instructions until completion of the instructions, wherein the one or more actions comprise modifying the header portion, updating memory based data structure or initiating an event.
Type: Grant
Filed: November 5, 2018
Date of Patent: November 1, 2022
Assignee: Pensando Systems Inc.
Inventors: Michael Brian Galles, David Clear
-
Patent number: 11416258
Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.
Type: Grant
Filed: May 22, 2019
Date of Patent: August 16, 2022
Assignee: Graphcore Limited
Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
-
Patent number: 11392535
Abstract: A computational array is implemented in which all operands and results are loaded or output from a single side of the array. The computational array comprises a plurality of cells arranged in n rows and m columns, each configured to produce a processed value based upon a weight value and an activation value. The cells receive weight and activation values via colinear weight and activation transmission channels that each extend across a first side edge of the computational array to provide weight values and activation values to the cells of the array. In addition, result values produced at a top cell of each of the m columns of the array are routed through the array to be output from the same first side edge of the array at a same relative timing at which the result values were produced.
Type: Grant
Filed: November 25, 2020
Date of Patent: July 19, 2022
Assignee: GROQ, INC.
Inventors: Jonathan Alexander Ross, Tom Hawkins, Dennis Charles Abts
-
Patent number: 11281966
Abstract: A circuit for performing neural network computations for a neural network, the circuit comprising: a systolic array comprising a plurality of cells; a weight fetcher unit configured to, for each of the plurality of neural network layers: send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and a plurality of weight sequencer units, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, the plurality of weight sequencer units configured to, for each of the plurality of neural network layers: shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
Type: Grant
Filed: August 2, 2018
Date of Patent: March 22, 2022
Assignee: Google LLC
Inventor: Jonathan Ross
-
Patent number: 11194490
Abstract: A circuit arrangement includes a memory circuit, data upload circuitry, data formatting circuitry, and a systolic array (SA). The data upload circuitry inputs a multi-dimensional data set and stores the multi-dimensional data set in the memory circuit. The data formatting circuitry reads subsets of the multi-dimensional data set from the memory circuit. The data formatting circuitry arranges data elements of the subsets into data streams, and outputs data elements in the data streams in parallel. The SA includes rows and columns of multiply-and-accumulate (MAC) circuits. The SA inputs data elements of the data streams to columns of MAC circuits in parallel, inputs filter values to rows of MAC circuits in parallel, and computes an output feature map from the data streams and the filter values.
Type: Grant
Filed: April 18, 2018
Date of Patent: December 7, 2021
Assignee: XILINX, INC.
Inventors: Ravi Sunkavalli, Victor J. Wu, Poching Sun
-
Patent number: 11061742
Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.
Type: Grant
Filed: June 27, 2018
Date of Patent: July 13, 2021
Assignee: INTEL CORPORATION
Inventors: Robert Pawlowski, Ankit More, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman
-
Patent number: 10971236
Abstract: The present invention utilizes a new method to provide a semiconductor device having a function of generating inherent data. The NAND-type flash memory of the present invention has a memory cell array, a page buffer/sense circuit, and a differential sense amplifier that detects the potential difference of a bit line pair of a dummy array when the dummy array of the memory cell array is read out, wherein the NAND-type flash memory outputs the inherent data of the semiconductor device according to the detection result of the differential sense amplifier.
Type: Grant
Filed: May 8, 2019
Date of Patent: April 6, 2021
Assignee: Winbond Electronics Corp.
Inventor: Sho Okabe
-
Patent number: 10951212
Abstract: There is disclosed a self-timed processor. The self-timed processor includes a plurality of functional blocks comprising null convention logic. Each of the functional blocks outputs one or more multi-rail data values. A global acknowledge tree generates a global acknowledge signal provided to all of the plurality of functional blocks. The global acknowledge signal switches to a first state when all of the multi-rail data values output from the plurality of functional blocks are in respective valid states, and the global acknowledge signal switches to a second state when all of the multi-rail data values output from the plurality of functional blocks are in a null state.
Type: Grant
Filed: February 7, 2019
Date of Patent: March 16, 2021
Assignee: Eta Compute, Inc.
Inventors: Chao Xu, Gopal Raghavan, Ben Wiley Melton, Vidura Manu Wijayasekara, Bryan Garnett Cope, David Cureton Baker, John Whitaker Havlicek
-
Patent number: 10713214
Abstract: Computational apparatus includes a systolic array of processing elements, each including a multiplier and first and second accumulators. In each of a sequence of processing cycles, the processing elements perform the following steps concurrently: Each processing element, except in the first row and first column of the array, receives first and second operands from adjacent processing elements in a preceding row and column of the array, respectively, multiplies the first and second operands together to generate a product, and accumulates the product in the first accumulator. In addition, each processing element passes a stored output data value from the second accumulator to a succeeding processing element along a respective column of the array, receives a new output data value from a preceding processing element along the respective column, and stores the new output data value in the second accumulator.
Type: Grant
Filed: September 20, 2018
Date of Patent: July 14, 2020
Assignee: HABANA LABS LTD.
Inventors: Ron Shalev, Ran Halutz
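The operand flow through a systolic array of this general kind can be simulated cycle by cycle. The Python sketch below is a generic output-stationary model written for illustration, not the patented design: the function `systolic_matmul`, the input skewing, and the register names are assumptions, and the second accumulator used for shifting results out is omitted.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an n x m grid of processing elements computing A @ B.

    Rows of A enter from the left edge and columns of B from the top edge,
    each skewed by one cycle per row/column so matching operands meet.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per processing element
    a_reg = [[0] * m for _ in range(n)]  # operand travelling left to right
    b_reg = [[0] * m for _ in range(n)]  # operand travelling top to bottom
    for t in range(n + m + k - 2):
        # Sweep from the bottom-right corner so every element still sees the
        # values its neighbours held at the end of the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    s = t - i  # skewed injection at the left edge
                    a = A[i][s] if 0 <= s < k else 0
                else:
                    a = a_reg[i][j - 1]
                if i == 0:
                    s = t - j  # skewed injection at the top edge
                    b = B[s][j] if 0 <= s < k else 0
                else:
                    b = b_reg[i - 1][j]
                a_reg[i][j], b_reg[i][j] = a, b
                acc[i][j] += a * b  # multiply and accumulate in place
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each processing element only ever reads from its left and upper neighbours, which is the locality property that makes the arrangement systolic.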
-
Patent number: 10255070
Abstract: Global synchrony changes the way computers can be programmed. A new class of ISA level instructions (the globally-synchronous load-store) of the present invention is presented. In the context of multiple load-store machines, the globally synchronous load-store architecture allows the programmer to think about a collection of independent load-store machines as a single load-store machine. These ISA instructions may be applied to a distributed matrix transpose or other data that exhibit a high degree of data non-locality and difficulty in efficiently parallelizing on modern computer system architectures. Included in the new ISA instructions are a setup instruction and a synchronous coalescing access instruction (“sca”). The setup instruction configures a head processor to set up a global map that corresponds processor data contiguously to the memory. The “sca” instruction configures processors to block processor threads until respective times on a global clock, derived from the global map, to access the memory.
Type: Grant
Filed: September 4, 2014
Date of Patent: April 9, 2019
Assignee: Massachusetts Institute of Technology
Inventors: David Joseph Whelihan, Paul Stanton Keltcher
-
Patent number: 10205453
Abstract: There is disclosed a self-timed processor. The self-timed processor includes a plurality of functional blocks comprising null convention logic. Each of the functional blocks outputs one or more multi-rail data values. A global acknowledge tree generates a global acknowledge signal provided to all of the plurality of functional blocks. The global acknowledge signal switches to a first state when all of the multi-rail data values output from the plurality of functional blocks are in respective valid states, and the global acknowledge signal switches to a second state when all of the multi-rail data values output from the plurality of functional blocks are in a null state.
Type: Grant
Filed: April 9, 2018
Date of Patent: February 12, 2019
Assignee: Eta Compute, Inc.
Inventors: Chao Xu, Gopal Raghavan, Ben Wiley Melton, Vidura Manu Wijayasekara, Bryan Garnett Cope, David Cureton Baker, John Whitaker Havlicek
-
Patent number: 10031878
Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. A configurable mesh data bus includes a command mesh, a pull-id mesh, and two data meshes. The configurable mesh data bus extends through all the islands. For each mesh, each island includes a centrally located crossbar switch and eight half links. Two half links extend to ports on the top edge of the island, a half link extends to a port on a right edge of the island, two half links extend to ports on the bottom edge of the island, and a half link extends to a port on the left edge of the island. Two additional links extend to functional circuitry of the island. The configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit.
Type: Grant
Filed: March 20, 2017
Date of Patent: July 24, 2018
Assignee: Netronome Systems, Inc.
Inventor: Gavin J. Stark
-
Patent number: 9621481
Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes islands organized in rows. A configurable mesh control bus extends through the islands. The configurable mesh control bus is configurable to have a unidirectional tree structure such that configuration information passes into the integrated circuit, through a root island, through the branches of the tree structure, and to each of the other islands. The functional circuits of the islands, as well as a configurable mesh data bus of the integrated circuit, are all configured with configuration information supplied via the tree structure. In one example, the configurable control mesh bus portion of each island includes a statically configured switch and multiple half links that radiate from the switch. The static configuration is determined by hardwired tie off connections associated with the island.
Type: Grant
Filed: February 17, 2012
Date of Patent: April 11, 2017
Assignee: Netronome Systems, Inc.
Inventor: Gavin J. Stark
-
Patent number: 9612981
Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. A configurable mesh data bus includes a command mesh, a pull-id mesh, and two data meshes. The configurable mesh data bus extends through all the islands. For each mesh, each island includes a centrally located crossbar switch and eight half links. Two half links extend to ports on the top edge of the island, a half link extends to a port on a right edge of the island, two half links extend to ports on the bottom edge of the island, and a half link extends to a port on the left edge of the island. Two additional links extend to functional circuitry of the island. The configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit.
Type: Grant
Filed: February 17, 2012
Date of Patent: April 4, 2017
Assignee: Netronome Systems, Inc.
Inventor: Gavin J. Stark
-
Patent number: 9455598
Abstract: Disclosed is an approach for implementing a flexible parser for a networking system. A micro-core parser is implemented to process packets in a networking system. The micro-cores of the parser read the packet headers, and perform any suitably programmed tasks upon those packets and packet headers. One or more caches may be associated with the micro-cores to hold the packet headers.
Type: Grant
Filed: June 20, 2011
Date of Patent: September 27, 2016
Assignee: Broadcom Corporation
Inventors: Kaushik Kuila, David T. Hass
-
Patent number: 9270421
Abstract: A source node selects a plurality of transmitting nodes to cooperatively encode a set of original packets to transfer to a destination node. Encoding produces a plurality of coded packets and a corresponding code matrix of coefficients. The coded packets and the corresponding code matrix comprise a set of independent equations of independent variables in a system of linear equations, wherein the independent variables comprise the original packets. A destination node may select a set of receiving nodes to cooperatively receive the transmissions. The destination node collects the coded packets and code matrix from the receiving nodes, which provide a sufficient number of independent equations for decoding the original packets. Decoding comprises calculating a solution for the system of linear equations.
Type: Grant
Filed: January 27, 2014
Date of Patent: February 23, 2016
Assignee: Genghiscomm Holdings, LLC
Inventor: Steve J Shattil
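The idea of treating coded packets as a system of linear equations can be shown with a small worked example. The Python sketch below is an assumption-laden illustration: it models packets as single integers, uses exact rational arithmetic rather than any particular finite field, and the names `encode` and `decode` are hypothetical.

```python
from fractions import Fraction


def encode(packets, code_matrix):
    """Each coded packet is one linear combination of the original packets."""
    return [sum(c * p for c, p in zip(row, packets)) for row in code_matrix]


def decode(code_matrix, coded):
    """Recover the originals by Gauss-Jordan elimination on [code_matrix | coded].

    Assumes the rows are linearly independent; otherwise next() raises StopIteration.
    """
    n = len(code_matrix)
    M = [[Fraction(c) for c in row] + [Fraction(y)]
         for row, y in zip(code_matrix, coded)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]  # scale pivot row to 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]


packets = [3, 5, 7]
code_matrix = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
coded = encode(packets, code_matrix)
print(coded, decode(code_matrix, coded))
```

The destination needs at least as many independent coded packets as there are original packets; with fewer, the system is underdetermined and cannot be solved uniquely.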
-
Patent number: 9225471
Abstract: A source node selects a plurality of transmitting nodes to cooperatively encode a set of original packets to transfer to a destination node. Encoding produces a plurality of coded packets and a corresponding code matrix of coefficients. The coded packets and the corresponding code matrix comprise a set of independent equations of independent variables in a system of linear equations, wherein the independent variables comprise the original packets. A destination node may select a set of receiving nodes to cooperatively receive the transmissions. The destination node collects the coded packets and code matrix from the receiving nodes, which provide a sufficient number of independent equations for decoding the original packets. Decoding comprises calculating a solution for the system of linear equations.
Type: Grant
Filed: January 26, 2014
Date of Patent: December 29, 2015
Assignee: Genghiscomm Holdings, LLC
Inventor: Steve J Shattil
-
Patent number: 8897293
Abstract: In a media access control (MAC) processor, a programmable controller is configured to execute machine readable instructions for implementing MAC functions corresponding to data received by a communication device. A tightly coupled memory is associated with the programmable controller. A system memory is coupled to the programmable controller via a system bus, and a hardware processor is coupled to the system bus and the tightly coupled memory. The hardware processor is configured to implement MAC functions on data received in a communication frame, store, in the tightly coupled memory, processed data corresponding to data in the communication frame that indicates a structure of downlink data in the communication frame, and store, in the system memory, processed data corresponding to other data in the communication frame.
Type: Grant
Filed: May 7, 2012
Date of Patent: November 25, 2014
Assignee: Marvell International Ltd.
Inventors: Bhaskar Chowdhuri, Srikanth Shubhakoti, Vinod Ananth, Hongyu Xie, Shui Cheong Lee
-
Patent number: 8638805
Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
Type: Grant
Filed: September 30, 2011
Date of Patent: January 28, 2014
Assignee: LSI Corporation
Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
-
Patent number: 8555031
Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
Type: Grant
Filed: January 4, 2013
Date of Patent: October 8, 2013
Assignee: Altera Corporation
Inventor: Michael Fitton
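QR decomposition is commonly built from Givens rotations, and CORDIC in vectoring mode is a standard way to find the rotation that zeroes one element of a vector. The abstract does not say how the CORDIC block is used, so the Python sketch below is only a generic illustration of vectoring-mode CORDIC; the function name, the floating-point arithmetic, and the restriction to x > 0 are assumptions.

```python
import math


def cordic_vectoring(x, y, iterations=32):
    """Rotate (x, y) onto the positive x axis with CORDIC micro-rotations.

    Returns (magnitude, angle). Assumes x > 0. Hardware would use shifts and a
    small arctangent table; floats are used here only for readability.
    """
    angle = 0.0
    for i in range(iterations):
        step = 2.0 ** -i
        d = -1 if y > 0 else 1  # always rotate toward y = 0
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)
    # Each micro-rotation stretches the vector; divide out the accumulated gain.
    gain = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(iterations))
    return x / gain, angle


magnitude, angle = cordic_vectoring(3.0, 4.0)
print(magnitude, angle)
```

For the input (3, 4) the magnitude converges toward 5 and the angle toward atan2(4, 3), which is the rotation a Givens step would apply to zero the second component.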
-
Patent number: 8359458
Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
Type: Grant
Filed: July 11, 2011
Date of Patent: January 22, 2013
Assignee: Altera Corporation
Inventor: Michael Fitton
-
Patent number: 8286180
Abstract: Method and apparatus are provided for synchronizing execution of a plurality of threads on a multi-threaded processor. Each thread is provided with a number of synchronization points corresponding to points where it is advantageous or preferable that execution should be synchronized with another thread. Execution of a thread is paused when it reaches a synchronization point until at least one other thread with which it is intended to be synchronized reaches a corresponding synchronization point. Execution is subsequently resumed. Where an executing thread branches over a section of code which included a synchronization point then execution is paused at the end of the branch until the at least one other thread reaches the synchronization point of the end of the corresponding branch.
Type: Grant
Filed: August 24, 2007
Date of Patent: October 9, 2012
Assignee: Imagination Technologies Limited
Inventor: Yoong Chert Foo
-
Patent number: 8175015
Abstract: A media access control (MAC) processor includes a programmable controller and a memory coupled to the programmable controller to store machine readable instructions for implementing MAC functions corresponding to data received by a communication device. A hardware processor is coupled to the programmable controller. The hardware processor includes a processing engine configured to implement MAC functions on the data received by the communication device. The hardware processor additionally includes a context memory coupled to the processing engine to store state information of the processing engine corresponding to one or more contexts, and context switch logic coupled to the processing engine to determine when the processing engine should switch contexts.
Type: Grant
Filed: December 12, 2008
Date of Patent: May 8, 2012
Assignee: Marvell International Ltd.
Inventors: Bhaskar Chowdhuri, Srikanth Shubhakoti, Vinod Ananth, Hongyu Xie, Shui Cheong Lee
-
Patent number: 8151090
Abstract: A systolic data processing apparatus includes a processing element (PE) array and a control unit. The PE array comprises a plurality of PEs, each PE executing a thread with respect to different data according to an input instruction and pipelining the instruction at each cycle for executing a program. The control unit inputs a new instruction to a first PE of the PE array at each cycle.
Type: Grant
Filed: February 17, 2009
Date of Patent: April 3, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventors: Gi-Ho Park, Shin-Dug Kim, Jung-Wook Park, Hoon-Mo Yang, Sung-Bae Park
-
Patent number: 8103855
Abstract: The present disclosure provides a methodology for reducing congestion of a processing unit, preferably by configuring a plurality of functional blocks to run in parallel or in series without the influence or input from the processing unit. In an embodiment, the present method chains a plurality of functional blocks together by software so that one functional block starts after the completion of another functional block. The configuration of the chain can be series, parallel, and any combination thereof, arranged to meet the circuit's objective. The chaining can be configured and re-configured, preferably by software input. The chaining can also be performed at design time or at run time. The chaining can also be modified, preferably at design time, but can also be modified at run time.
Type: Grant
Filed: June 29, 2008
Date of Patent: January 24, 2012
Assignee: Navosha Corporation
Inventors: Hirak Mitra, Raj Kulkarni, Richard Wicks, Michael Moon
-
Publication number: 20120011344
Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
Type: Application
Filed: July 11, 2011
Publication date: January 12, 2012
Applicant: ALTERA CORPORATION
Inventor: Michael Fitton
-
Patent number: 8078834
Abstract: A digital signal processor includes a control block configured to issue instructions based on a stored program, and a compute array including two or more compute engines configured such that each of the issued instructions executes in successive compute engines of at least a subset of the compute engines at successive times. The digital signal processor may be utilized with a control processor or as a stand-alone processor. The compute array may be configured such that each of the issued instructions flows through successive compute engines of at least a subset of the compute engines at successive times.
Type: Grant
Filed: January 9, 2008
Date of Patent: December 13, 2011
Assignee: Analog Devices, Inc.
Inventor: Douglas Garde
-
Patent number: 8054072
Abstract: A quantum computer includes a unit including thin films A, B and C each containing a physical-system group A, B and C formed of physical systems A, B and C, the films A, B and C being alternately stacked in an order of A, B, C, A, . . . , each of the systems A, B and C having three-different-energy states |0>x, |1>x, |e>x, a quantum bit being expressed by a quantum-mechanical-superposition state of |0>x and |1>x, a light source generating light beams having angular frequencies ωA(E), ye, g, ωA(E), ye, e, ωx, ye, gg, ωx, ye, ge, ωx, ye, eg and ωx, ye, ee, ωA(E), ye, g, a unit controlling frequencies and intensities of the beams, and a unit measuring intensity of light emitted from or transmitted through physical-system group A(E) contained in a lowest one of the thin films A to detect a quantum state of the group A(E).
Type: Grant
Filed: August 19, 2009
Date of Patent: November 8, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventors: Kouichi Ichimura, Hayato Goto
-
Patent number: 7979673
Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
Type: Grant
Filed: May 10, 2010
Date of Patent: July 12, 2011
Assignee: Altera Corporation
Inventor: Michael Fitton
-
Patent number: 7870366Abstract: The present disclosure provides an architecture that enables massive parallel processing on an IC while alleviating control congestion, memory access congestion and wiring congestion, together with high flexibility where the processing units are soft-arranged to perform different tasks. In an embodiment, the present architecture includes a functional block with a GO component to start the functional block, and a DONE component to identify the completion status. The GO and DONE components can be linked together, preferably by a linkage component, to chain the functional blocks. The linkage is preferably soft configurable. In another embodiment, the present architecture includes an integrated circuit comprising a plurality of functional blocks chained together for serial processing, parallel processing, or any combination thereof.Type: GrantFiled: June 29, 2008Date of Patent: January 11, 2011Inventors: Hirak Mitra, Raj Kulkarni, Richard Wicks, Michael Moon
-
Publication number: 20100257341Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of cells arranged in a plurality of columns and a plurality of rows. Each row represents an instruction in a processor execution queue and each cell represents a dependency relationship between two instructions in the processor execution queue. A first latch couples to the first array and comprises a first bit, the first bit indicating a first status. A second latch couples to the first array and comprises a second bit, the second bit indicating a second status. A first read port couples to the first array, comprising a first read wordline and a first read bitline. The first read wordline couples to the first latch and a first column and asserts a first available signal based on the first bit. The first read bitline couples to a first row and generates a first ready signal based on the first available signal and a first cell.Type: ApplicationFiled: April 3, 2009Publication date: October 7, 2010Applicant: International Business Machines CorporationInventors: Mary D. Brown, James W. Bishop, William E. Burky, John B. Griswell, JR., Dung Q. Nguyen, Todd A. Venton
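
The ready-signal logic in this abstract can be modelled in software as a Boolean matrix. In this minimal sketch, rows are queued instructions, a true cell marks a dependency on the instruction in that column, and each column carries an "available" bit. The names `ready_instructions`, `depends_on` and `available` are illustrative assumptions, and the model ignores the latch and port circuitry entirely.

```python
def ready_instructions(depends_on, available):
    """Return one ready flag per row of the dependency matrix.

    depends_on[r][c] is True when instruction r depends on instruction c.
    available[c] is True when instruction c's result can be consumed.
    A row is ready when every dependency it has is available.
    """
    n = len(depends_on)
    return [
        all(not depends_on[r][c] or available[c] for c in range(n))
        for r in range(n)
    ]


if __name__ == "__main__":
    deps = [
        [False, False, False],  # instruction 0 depends on nothing
        [True, False, False],   # instruction 1 depends on 0
        [True, True, False],    # instruction 2 depends on 0 and 1
    ]
    print(ready_instructions(deps, [True, False, False]))
```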
-
Publication number: 20100250640Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.Type: ApplicationFiled: November 21, 2008Publication date: September 30, 2010Inventor: Katsutoshi Seki
-
Publication number: 20100223445Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.Type: ApplicationFiled: May 10, 2010Publication date: September 2, 2010Applicant: Altera CorporationInventor: Michael Fitton
-
Publication number: 20100131738Abstract: In an array processing section, using data strings entered from input ports, a plurality of data processor elements execute predetermined operations while transferring data to each other, and output data strings of results of the operations from a plurality of output ports. A first data string converter converts data strings stored in a plurality of data storages of a data storage group into a placement suitable for the operations in the array processing section, and enters the converted data strings into the input ports of the array processing section. A second data string converter converts the data strings output from output ports of the array processing section into a placement to be stored in the plurality of data storages of the data storage group.Type: ApplicationFiled: February 22, 2008Publication date: May 27, 2010Inventors: Tomoyoshi Kobori, Katsutoshi Seki
-
Patent number: 7716454Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.Type: GrantFiled: October 10, 2006Date of Patent: May 11, 2010Assignee: Altera CorporationInventor: Michael Fitton
-
Publication number: 20100100704Abstract: An integrated circuit 4 is provided including an array 10 of processors 26 with interface circuitry 12 providing communication with further processing circuitry 14. The processors 26 within the array 10 execute individual programs which together provide the functionality of a cycle-based program. During each program-cycle of the cycle based program, each of the processors executes its respective program starting from a predetermined execution start point to evaluate a next state of at least some of the state variables of the cycle-based program. A boundary between program-cycles provides a synchronisation time (point) for processing operations performed by the array.Type: ApplicationFiled: October 14, 2009Publication date: April 22, 2010Applicant: ARM LimitedInventors: Stephen John Hill, Michael Peter Muller
-
Patent number: 7653805Abstract: A semiconductor device for performing data processing by performing a plurality of computations in cycles includes a pipeline formed by connecting a plurality of computing units in series, each of the computing units including: a data line for receiving data; a control line for receiving a rule signal; a circuit information control unit configured to store, before data processing, several circuit information items, and to output a first one of the several circuit information items according to the rule signal received via the control line in a first cycle of the data processing; a processing element configured to construct an execution circuit according to the first circuit information item, to perform a computation using data from the data line, and to output a computation result; a data register for storing the computation result, and for outputting the computation result in a second cycle; and a control register for storing the rule signal and for outputting the rule signal in the second cycle.Type: GrantFiled: March 23, 2007Date of Patent: January 26, 2010Assignee: Kabushiki Kaisha ToshibaInventors: Takashi Yoshikawa, Shigehiro Asano, Yutaka Yamada
-
Patent number: 7636835Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises one or more interface modules including circuitry to transfer data to and from a device external to the tiles; and a sub-port routing network including circuitry to route data between a port of a switch and a plurality of sub-ports coupled to one or more interface modules.Type: GrantFiled: April 14, 2006Date of Patent: December 22, 2009Assignee: Tilera CorporationInventors: Carl G. Ramey, David Wentzlaff, Anant Agarwal
-
Patent number: 7539845Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises an interface coupled to a plurality of the tiles to transfer data between one or more switches of the tiles and one or more switches of tiles in an externally coupled integrated circuit.Type: GrantFiled: April 14, 2006Date of Patent: May 26, 2009Assignee: Tilera CorporationInventors: David Wentzlaff, Carl G. Ramey, Anant Agarwal
-
Patent number: 7418536Abstract: A processor for use in a router, the processor having a systolic array pipeline for processing data packets to determine to which output port of the router the data packet should be routed. In one embodiment, the systolic array pipeline includes a plurality of programmable functional units and register files arranged sequentially as stages, for processing packet contexts (which contain the packet's destination address) to perform operations, under programmatic control, to determine the destination port of the router for the packet. A single stage of the systolic array may contain a register file and one or more functional units such as adders, shifters, logical units, etc., for performing, in one example, very long instruction word (VLIW) operations. The processor may also include a forwarding table memory, on-chip, for storing routing information, and a cross bar selectively connecting the stages of the systolic array with the forwarding table memory.Type: GrantFiled: January 4, 2006Date of Patent: August 26, 2008Assignee: Cisco Technology, Inc.Inventors: Arthur Tung-Tak Leung, Anthony Li, William Lynch, Sharad Mehrotra
-
Patent number: 7380101Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.Type: GrantFiled: December 27, 2004Date of Patent: May 27, 2008Assignee: Cisco Technology, Inc.Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
-
Patent number: 7278009Abstract: Tiered command distribution is described. In an embodiment, a pipeline architecture includes processor chains of data processors that process control events received from an application interface control. A tier assignment algorithm determines the longest path of data processors through the processor chains to determine a tier allocation for each data processor in the set of processor chains. Each tier includes a data processor from one or more of the processor chains where a first set of data processors in a first tier each receive a control event and process the control event and/or process the data according to the control event before a second set of data processors in a second tier each receive the control event.Type: GrantFiled: March 31, 2005Date of Patent: October 2, 2007Assignee: Microsoft CorporationInventors: Geoffrey R Smith, Hans-Martin Krober, Michael D. Dodd
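
The abstract describes the tier assignment only as finding the longest path of data processors through the processor chains. One plausible reading, sketched below, gives each processor a tier equal to the length of the longest chain of upstream processors feeding it, so that a control event can be delivered tier by tier. The `upstream` mapping and the function name `assign_tiers` are assumptions for illustration, and the chains are assumed to be acyclic.

```python
from functools import lru_cache


def assign_tiers(upstream):
    """Assign each processor a tier from its longest upstream path.

    upstream maps a processor name to the names of the processors that
    feed it. Processors with no upstream neighbours land in tier 0.
    Assumes the processor chains contain no cycles.
    """

    @lru_cache(maxsize=None)
    def tier(processor):
        feeders = upstream.get(processor, ())
        if not feeders:
            return 0
        return 1 + max(tier(f) for f in feeders)

    return {processor: tier(processor) for processor in upstream}


if __name__ == "__main__":
    # Two chains, a -> b -> d and c -> d, sharing the final processor d.
    chains = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"]}
    print(assign_tiers(chains))
```

Here `d` is placed after the longer of its two chains, so every processor in an earlier tier has handled the control event before `d` receives it.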
-
Patent number: 7260709Abstract: The present invention relates to a processing method and apparatus for implementing a systolic-array-like structure. Input data are stored in a depth-configurable register means (DCF) in a predetermined sequence, and are supplied to a processing means (FU) for processing said input data based on control signals generated from instruction data, wherein the depth of the register means (DCF) is controlled in accordance with the instruction data. Thereby, systolic arrays can be mapped onto a programmable processor, e.g. a VLIW processor, without the need for explicitly issuing operations to implement the register moves that constitute the delay lines of the array.Type: GrantFiled: April 1, 2003Date of Patent: August 21, 2007Assignee: Koninklijke Philips Electronics N.V.Inventor: Bernardo De Oliveira Kastrup Pereira
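
The delay lines mentioned in this abstract can be modelled as a FIFO whose depth is a parameter. This is a minimal sketch assuming one value enters and one leaves per step. The class name `DelayLine` is hypothetical, the depth is fixed at construction rather than driven by instruction data as in the patent, and the model says nothing about how the DCF register is actually built.

```python
from collections import deque


class DelayLine:
    """FIFO delay line: a value pushed in comes out `depth` steps later."""

    def __init__(self, depth, fill=0):
        # Pre-load the registers so the first `depth` outputs are `fill`.
        self.regs = deque([fill] * depth)

    def step(self, value):
        """Shift in one value and return the value leaving the line."""
        self.regs.append(value)
        return self.regs.popleft()


if __name__ == "__main__":
    line = DelayLine(depth=2)
    print([line.step(v) for v in [1, 2, 3, 4]])
```

In a systolic mapping, one such line per operand would stand in for the explicit register-to-register moves that the abstract says are no longer needed.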