Systolic Array Processor Patents (Class 712/19)
  • Patent number: 11900109
    Abstract: The present invention relates to an execution unit for executing a computer program comprising a sequence of instructions, which include a masking instruction. The execution unit is configured to execute the masking instruction which, when executed by the execution unit, masks randomly selected values from a source operand of n values and retains other original values from the source operand to generate a result which includes original values from the source operand and symbols in place of the selected values.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: February 13, 2024
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Simon Christian Knowles, Godfrey Da Costa
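As a rough software illustration of the masking behaviour described in the abstract above (not the patented hardware instruction), the sketch below masks randomly selected values of an n-value source operand and keeps the rest. The function name `mask_random`, the per-value masking probability `p`, and the use of `0` as the replacement symbol are assumptions made for illustration; the abstract does not specify them.

```python
import random

def mask_random(source, p=0.5, symbol=0, rng=None):
    """Return a copy of `source` with randomly selected values replaced by `symbol`.

    Each value is masked independently with probability `p` (an assumed policy);
    values that are not selected are retained unchanged.
    """
    rng = rng or random.Random()
    return [symbol if rng.random() < p else value for value in source]

# Example: a 4-value operand; which positions are masked depends on the seed.
operand = [1.5, -2.0, 3.25, 4.0]
result = mask_random(operand, p=0.5, rng=random.Random(0))
print(result)
```

The result has the same length as the source operand, and every position holds either the original value or the mask symbol.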
  • Patent number: 11893390
    Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier with a vertex break identifier, and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.
    Type: Grant
    Filed: July 13, 2022
    Date of Patent: February 6, 2024
    Assignee: GRAPHCORE LIMITED
    Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
  • Patent number: 11734017
    Abstract: Example embodiments relate to scheduling and processing sensor data across multiple digital signal processing (DSP) cores. A system may include DSP cores that are virtually arranged into a first segment and a second segment, each with DSP cores arranged in a linear order. The system may process sensor data using the DSP cores and a processing pipeline. DSP cores from the first segment are configured to initiate processing a first stage sequentially based on the linear order until all DSP cores are processing portions in parallel, collate outputs to produce a first stage output, and provide a first signal to the second segment based on producing the first stage output.
    Type: Grant
    Filed: December 7, 2020
    Date of Patent: August 22, 2023
    Assignee: Waymo LLC
    Inventor: Peter Brinkmann
  • Patent number: 11645081
    Abstract: A multitile processing system has an execution unit on each tile, and an interconnect which conducts communications between the tiles according to a bulk synchronous parallel scheme. Each tile performs an on-tile compute phase followed by an intertile exchange phase, where the exchange phase is held back until all tiles in a particular group have completed the compute phase. On completion of the compute phase, each tile generates a synchronisation request and pauses instruction issue until it receives a synchronisation acknowledgement. If a tile attains an excepted state, it raises an exception signal and pauses instruction issue until the excepted state has been resolved. However, tiles which are not in the excepted state can continue to perform their on-tile compute phase, and will issue their own synchronisation request in their own normal time frame.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: May 9, 2023
    Assignee: Graphcore Limited
    Inventors: Alan Graham Alexander, Matthew David Fyles
  • Patent number: 11607111
    Abstract: A medical signal processing apparatus processes image signals input from an imaging device. The image signals correspond to a result of examining a subject, and the imaging device sequentially outputs the image signals from multiple pixels arrayed in a matrix according to a raster to the medical signal processing apparatus. The medical image signal processing apparatus includes: a signal divider configured to divide the image signals according to the raster sequentially output from the imaging device into first divided image signals each according to a pixel group consisting of multiple pixels arrayed in connected multiple columns; and a plurality of pre-processors configured to process, in parallel, sets of pixel information of the multiple first divided image signals divided by the signal divider.
    Type: Grant
    Filed: September 17, 2021
    Date of Patent: March 21, 2023
    Assignee: SONY OLYMPUS MEDICAL SOLUTIONS INC.
    Inventor: Manabu Koiso
  • Patent number: 11570045
    Abstract: A network interface device comprises a plurality of components configured to process a flow of data one after another. A control component is configured to provide one or more control messages in said flow, said one or more control messages being provided to said plurality of components one after another such that a configuration of one or more of said components is changed.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: January 31, 2023
    Assignee: Xilinx, Inc.
    Inventors: Steven Leslie Pope, David James Riddoch
  • Patent number: 11489773
    Abstract: Methods and devices for processing packets with reduced data stalls are provided. The method comprises: (a) receiving a packet comprising a header portion and a payload portion, wherein the header portion is used to generate a packet header vector; (b) producing a table result by performing packet match operations, wherein the table result is generated based at least in part on the packet header vector and data stored in a match table; (c) receiving, at a match processing unit, the table result and an address of a set of instructions associated with the match table; and (d) performing, by the match processing unit, one or more actions in response to the set of instructions until completion of the instructions, wherein the one or more actions comprise modifying the header portion, updating a memory-based data structure, or initiating an event.
    Type: Grant
    Filed: November 5, 2018
    Date of Patent: November 1, 2022
    Assignee: Pensando Systems Inc.
    Inventors: Michael Brian Galles, David Clear
  • Patent number: 11416258
    Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier with a vertex break identifier, and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: August 16, 2022
    Assignee: Graphcore Limited
    Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
  • Patent number: 11392535
    Abstract: A computational array is implemented in which all operands and results are loaded or output from a single side of the array. The computational array comprises a plurality of cells arranged in n rows and m columns, each configured to produce a processed value based upon a weight value and an activation value. The cells receive weight and activation values via colinear weight and activation transmission channels that each extend across a first side edge of the computational array to provide weight values and activation values to the cells of the array. In addition, result values produced at a top cell of each of the m columns of the array are routed through the array to be output from the same first side edge of the array at a same relative timing at which the result values were produced.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: July 19, 2022
    Assignee: GROQ, INC.
    Inventors: Jonathan Alexander Ross, Tom Hawkins, Dennis Charles Abts
  • Patent number: 11281966
    Abstract: A circuit for performing neural network computations for a neural network, the circuit comprising: a systolic array comprising a plurality of cells; a weight fetcher unit configured to, for each of the plurality of neural network layers: send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and a plurality of weight sequencer units, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, the plurality of weight sequencer units configured to, for each of the plurality of neural network layers: shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: March 22, 2022
    Assignee: Google LLC
    Inventor: Jonathan Ross
  • Patent number: 11194490
    Abstract: A circuit arrangement includes a memory circuit, data upload circuitry, data formatting circuitry, and a systolic array (SA). The data upload circuitry inputs a multi-dimensional data set and stores the multi-dimensional data set in the memory circuit. The data formatting circuitry reads subsets of the multi-dimensional data set from the memory circuit. The data formatting circuitry arranges data elements of the subsets into data streams, and outputs data elements in the data streams in parallel. The SA includes rows and columns of multiply-and-accumulate (MAC) circuits. The SA inputs data elements of the data streams to columns of MAC circuits in parallel, inputs filter values to rows of MAC circuits in parallel, and computes an output feature map from the data streams and the filter values.
    Type: Grant
    Filed: April 18, 2018
    Date of Patent: December 7, 2021
    Assignee: XILINX, INC.
    Inventors: Ravi Sunkavalli, Victor J. Wu, Poching Sun
  • Patent number: 11061742
    Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 27, 2018
    Date of Patent: July 13, 2021
    Assignee: INTEL CORPORATION
    Inventors: Robert Pawlowski, Ankit More, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman
  • Patent number: 10971236
    Abstract: The present invention utilizes a new method to provide a semiconductor device having a function of generating inherent data. The NAND-type flash memory of the present invention has a memory cell array, a page buffer/sense circuit, and a differential sense amplifier that detects the potential difference of a bit line pair of a dummy array when the dummy array of the memory cell array is read out, wherein the NAND-type flash memory outputs the inherent data of the semiconductor device according to the detection result of the differential sense amplifier.
    Type: Grant
    Filed: May 8, 2019
    Date of Patent: April 6, 2021
    Assignee: Winbond Electronics Corp.
    Inventor: Sho Okabe
  • Patent number: 10951212
    Abstract: There is disclosed a self-timed processor. The self-timed processor includes a plurality of functional blocks comprising null convention logic. Each of the functional blocks outputs one or more multi-rail data values. A global acknowledge tree generates a global acknowledge signal provided to all of the plurality of functional blocks. The global acknowledge signal switches to a first state when all of the multi-rail data values output from the plurality of functional blocks are in respective valid states, and the global acknowledge signal switches to a second state when all of the multi-rail data values output from the plurality of functional blocks are in a null state.
    Type: Grant
    Filed: February 7, 2019
    Date of Patent: March 16, 2021
    Assignee: Eta Compute, Inc.
    Inventors: Chao Xu, Gopal Raghavan, Ben Wiley Melton, Vidura Manu Wijayasekara, Bryan Garnett Cope, David Cureton Baker, John Whitaker Havlicek
  • Patent number: 10713214
    Abstract: Computational apparatus includes a systolic array of processing elements, each including a multiplier and first and second accumulators. In each of a sequence of processing cycles, the processing elements perform the following steps concurrently: Each processing element, except in the first row and first column of the array, receives first and second operands from adjacent processing elements in a preceding row and column of the array, respectively, multiplies the first and second operands together to generate a product, and accumulates the product in the first accumulator. In addition, each processing element passes a stored output data value from the second accumulator to a succeeding processing element along a respective column of the array, receives a new output data value from a preceding processing element along the respective column, and stores the new output data value in the second accumulator.
    Type: Grant
    Filed: September 20, 2018
    Date of Patent: July 14, 2020
    Assignee: HABANA LABS LTD.
    Inventors: Ron Shalev, Ran Halutz
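The multiply-and-accumulate flow in the abstract above can be loosely illustrated with a cycle-by-cycle software model. This is only a sketch under stated assumptions: the function name `systolic_matmul` and the skewed-arrival timing (operand pair k reaches the element at row i, column j on cycle i + j + k) are illustrative choices, and the model covers only the first accumulator. It does not model the second accumulator or the column-wise shifting of output values.

```python
def systolic_matmul(a, b):
    """Cycle-by-cycle model of an output-stationary systolic array computing a x b."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # One accumulator per processing element (PE), all starting at zero.
    acc = [[0] * cols for _ in range(rows)]
    # Assumed skew: operand pair k arrives at PE (i, j) on cycle i + j + k,
    # so the last pair arrives on cycle rows + cols + inner - 3.
    total_cycles = rows + cols + inner - 2
    for cycle in range(total_cycles):
        for i in range(rows):
            for j in range(cols):
                k = cycle - i - j
                if 0 <= k < inner:
                    # Multiply the two arriving operands and accumulate the product.
                    acc[i][j] += a[i][k] * b[k][j]
    return acc

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Within a single cycle every processing element works on a different operand pair, which is the concurrency the abstract describes; the nested loops here merely serialize that work for simulation.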
  • Patent number: 10255070
    Abstract: Global synchrony changes the way computers can be programmed. A new class of ISA level instructions (the globally-synchronous load-store) of the present invention is presented. In the context of multiple load-store machines, the globally synchronous load-store architecture allows the programmer to think about a collection of independent load-store machines as a single load-store machine. These ISA instructions may be applied to a distributed matrix transpose or other data that exhibit a high degree of data non-locality and difficulty in efficiently parallelizing on modern computer system architectures. Included in the new ISA instructions are a setup instruction and a synchronous coalescing access instruction (“sca”). The setup instruction configures a head processor to set up a global map that corresponds processor data contiguously to the memory. The “sca” instruction configures processors to block processor threads until respective times on a global clock, derived from the global map, to access the memory.
    Type: Grant
    Filed: September 4, 2014
    Date of Patent: April 9, 2019
    Assignee: Massachusetts Institute of Technology
    Inventors: David Joseph Whelihan, Paul Stanton Keltcher
  • Patent number: 10205453
    Abstract: There is disclosed a self-timed processor. The self-timed processor includes a plurality of functional blocks comprising null convention logic. Each of the functional blocks outputs one or more multi-rail data values. A global acknowledge tree generates a global acknowledge signal provided to all of the plurality of functional blocks. The global acknowledge signal switches to a first state when all of the multi-rail data values output from the plurality of functional blocks are in respective valid states, and the global acknowledge signal switches to a second state when all of the multi-rail data values output from the plurality of functional blocks are in a null state.
    Type: Grant
    Filed: April 9, 2018
    Date of Patent: February 12, 2019
    Assignee: Eta Compute, Inc.
    Inventors: Chao Xu, Gopal Raghavan, Ben Wiley Melton, Vidura Manu Wijayasekara, Bryan Garnett Cope, David Cureton Baker, John Whitaker Havlicek
  • Patent number: 10031878
    Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. A configurable mesh data bus includes a command mesh, a pull-id mesh, and two data meshes. The configurable mesh data bus extends through all the islands. For each mesh, each island includes a centrally located crossbar switch and eight half links. Two half links extend to ports on the top edge of the island, a half link extends to a port on a right edge of the island, two half links extend to ports on the bottom edge of the island, and a half link extends to a port on the left edge of the island. Two additional links extend to functional circuitry of the island. The configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit.
    Type: Grant
    Filed: March 20, 2017
    Date of Patent: July 24, 2018
    Assignee: Netronome Systems, Inc.
    Inventor: Gavin J. Stark
  • Patent number: 9621481
    Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes islands organized in rows. A configurable mesh control bus extends through the islands. The configurable mesh control bus is configurable to have a unidirectional tree structure such that configuration information passes into the integrated circuit, through a root island, through the branches of the tree structure, and to each of the other islands. The functional circuits of the islands, as well as a configurable mesh data bus of the integrated circuit, are all configured with configuration information supplied via the tree structure. In one example, the configurable control mesh bus portion of each island includes a statically configured switch and multiple half links that radiate from the switch. The static configuration is determined by hardwired tie off connections associated with the island.
    Type: Grant
    Filed: February 17, 2012
    Date of Patent: April 11, 2017
    Assignee: Netronome Systems, Inc.
    Inventor: Gavin J. Stark
  • Patent number: 9612981
    Abstract: An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. A configurable mesh data bus includes a command mesh, a pull-id mesh, and two data meshes. The configurable mesh data bus extends through all the islands. For each mesh, each island includes a centrally located crossbar switch and eight half links. Two half links extend to ports on the top edge of the island, a half link extends to a port on a right edge of the island, two half links extend to ports on the bottom edge of the island, and a half link extends to a port on the left edge of the island. Two additional links extend to functional circuitry of the island. The configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit.
    Type: Grant
    Filed: February 17, 2012
    Date of Patent: April 4, 2017
    Assignee: Netronome Systems, Inc.
    Inventor: Gavin J. Stark
  • Patent number: 9455598
    Abstract: Disclosed is an approach for implementing a flexible parser for a networking system. A micro-core parser is implemented to process packets in a networking system. The micro-cores of the parser read the packet headers, and perform any suitably programmed tasks upon those packets and packet headers. One or more caches may be associated with the micro-cores to hold the packet headers.
    Type: Grant
    Filed: June 20, 2011
    Date of Patent: September 27, 2016
    Assignee: Broadcom Corporation
    Inventors: Kaushik Kuila, David T. Hass
  • Patent number: 9270421
    Abstract: A source node selects a plurality of transmitting nodes to cooperatively encode a set of original packets to transfer to a destination node. Encoding produces a plurality of coded packets and a corresponding code matrix of coefficients. The coded packets and the corresponding code matrix comprise a set of independent equations of independent variables in a system of linear equations, wherein the independent variables comprise the original packets. A destination node may select a set of receiving nodes to cooperatively receive the transmissions. The destination node collects the coded packets and code matrix from the receiving nodes, which provide a sufficient number of independent equations for decoding the original packets. Decoding comprises calculating a solution for the system of linear equations.
    Type: Grant
    Filed: January 27, 2014
    Date of Patent: February 23, 2016
    Assignee: Genghiscomm Holdings, LLC
    Inventor: Steve J Shattil
  • Patent number: 9225471
    Abstract: A source node selects a plurality of transmitting nodes to cooperatively encode a set of original packets to transfer to a destination node. Encoding produces a plurality of coded packets and a corresponding code matrix of coefficients. The coded packets and the corresponding code matrix comprise a set of independent equations of independent variables in a system of linear equations, wherein the independent variables comprise the original packets. A destination node may select a set of receiving nodes to cooperatively receive the transmissions. The destination node collects the coded packets and code matrix from the receiving nodes, which provide a sufficient number of independent equations for decoding the original packets. Decoding comprises calculating a solution for the system of linear equations.
    Type: Grant
    Filed: January 26, 2014
    Date of Patent: December 29, 2015
    Assignee: Genghiscomm Holdings, LLC
    Inventor: Steve J Shattil
  • Patent number: 8897293
    Abstract: In a media access control (MAC) processor, a programmable controller is configured to execute machine readable instructions for implementing MAC functions corresponding to data received by a communication device. A tightly coupled memory is associated with the programmable controller. A system memory is coupled to the programmable controller via a system bus, and a hardware processor is coupled to the system bus and the tightly coupled memory. The hardware processor is configured to implement MAC functions on data received in a communication frame, store, in the tightly coupled memory, processed data corresponding to data in the communication frame that indicates a structure of downlink data in the communication frame, and store, in the system memory, processed data corresponding to other data in the communication frame.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: November 25, 2014
    Assignee: Marvell International Ltd.
    Inventors: Bhaskar Chowdhuri, Srikanth Shubhakoti, Vinod Ananth, Hongyu Xie, Shui Cheong Lee
  • Patent number: 8638805
    Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: January 28, 2014
    Assignee: LSI Corporation
    Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
  • Patent number: 8555031
    Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
    Type: Grant
    Filed: January 4, 2013
    Date of Patent: October 8, 2013
    Assignee: Altera Corporation
    Inventor: Michael Fitton
  • Patent number: 8359458
    Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
    Type: Grant
    Filed: July 11, 2011
    Date of Patent: January 22, 2013
    Assignee: Altera Corporation
    Inventor: Michael Fitton
  • Patent number: 8286180
    Abstract: A method and apparatus are provided for synchronizing execution of a plurality of threads on a multi-threaded processor. Each thread is provided with a number of synchronization points corresponding to points where it is advantageous or preferable that execution should be synchronized with another thread. Execution of a thread is paused when it reaches a synchronization point until at least one other thread with which it is intended to be synchronized reaches a corresponding synchronization point. Execution is subsequently resumed. Where an executing thread branches over a section of code which includes a synchronization point, execution is paused at the end of the branch until the at least one other thread reaches the synchronization point of the end of the corresponding branch.
    Type: Grant
    Filed: August 24, 2007
    Date of Patent: October 9, 2012
    Assignee: Imagination Technologies Limited
    Inventor: Yoong Chert Foo
  • Patent number: 8175015
    Abstract: A media access control (MAC) processor includes a programmable controller and a memory coupled to the programmable controller to store machine readable instructions for implementing MAC functions corresponding to data received by a communication device. A hardware processor is coupled to the programmable controller. The hardware processor includes a processing engine configured to implement MAC functions on the data received by the communication device. The hardware processor additionally includes a context memory coupled to the processing engine to store state information of the processing engine corresponding to one or more contexts, and context switch logic coupled to the processing engine to determine when the processing engine should switch contexts.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: May 8, 2012
    Assignee: Marvell International Ltd.
    Inventors: Bhaskar Chowdhuri, Srikanth Shubhakoti, Vinod Ananth, Hongyu Xie, Shui Cheong Lee
  • Patent number: 8151090
    Abstract: A systolic data processing apparatus includes a processing element (PE) array and control unit. The PE array comprises a plurality of PEs, each PE executing a thread with respect to different data according to an input instruction and pipelining the instruction at each cycle for executing a program. The control unit inputs a new instruction to a first PE of the PE array at each cycle.
    Type: Grant
    Filed: February 17, 2009
    Date of Patent: April 3, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Gi-Ho Park, Shin-Dug Kim, Jung-Wook Park, Hoon-Mo Yang, Sung-Bae Park
  • Patent number: 8103855
    Abstract: The present disclosure provides a methodology for reducing congestion of a processing unit, preferably by configuring a plurality of functional blocks to run in parallel or in series without the influence or input from the processing unit. In an embodiment, the present method chains a plurality of functional blocks together by software so that one functional block starts after the completion of another functional block. The configuration of the chain can be series, parallel, and any combination thereof, arranged to meet the circuit's objective. The chaining can be configured and re-configured, preferably by software input. The chaining can also be performed at design time or at run time. The chaining can also be modified, preferably at design time, but can also be modified at run time.
    Type: Grant
    Filed: June 29, 2008
    Date of Patent: January 24, 2012
    Assignee: Navosha Corporation
    Inventors: Hirak Mitra, Raj Kulkarni, Richard Wicks, Michael Moon
  • Publication number: 20120011344
    Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
    Type: Application
    Filed: July 11, 2011
    Publication date: January 12, 2012
    Applicant: ALTERA CORPORATION
    Inventor: Michael Fitton
  • Patent number: 8078834
    Abstract: A digital signal processor includes a control block configured to issue instructions based on a stored program, and a compute array including two or more compute engines configured such that each of the issued instructions executes in successive compute engines of at least a subset of the compute engines at successive times. The digital signal processor may be utilized with a control processor or as a stand-alone processor. The compute array may be configured such that each of the issued instructions flows through successive compute engines of at least a subset of the compute engines at successive times.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: December 13, 2011
    Assignee: Analog Devices, Inc.
    Inventor: Douglas Garde
  • Patent number: 8054072
    Abstract: A quantum computer includes a unit including thin films A, B and C each containing a physical-system group A, B and C formed of physical systems A, B and C, the films A, B and C being alternately stacked in an order of A, B, C, A, . . . , each of the systems A, B and C having three-different-energy states |0>x, |1>x, |e>x, a quantum bit being expressed by a quantum-mechanical-superposition state of |0>x and |1>x, a light source generating light beams having angular frequencies ωA(E), ye, g, ωA(E), ye, e, ωx, ye, gg, ωx, ye, ge, ωx, ye, eg and ωx, ye, ee, ωA(E), ye, g, a unit controlling frequencies and intensities of the beams, and a unit measuring intensity of light emitted from or transmitted through physical-system group A(E) contained in a lowest one of the thin films A to detect a quantum state of the group A(E).
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: November 8, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kouichi Ichimura, Hayato Goto
  • Patent number: 7979673
    Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
    Type: Grant
    Filed: May 10, 2010
    Date of Patent: July 12, 2011
    Assignee: Altera Corporation
    Inventor: Michael Fitton
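    Example: The abstract names QR-decomposition and a CORDIC block but does not state the algorithm. A common pairing is QR by Givens rotations, since each rotation is the kind of operation CORDIC hardware evaluates. The Python sketch below makes that assumption and uses floating-point arithmetic in place of CORDIC; `givens_qr` is a hypothetical name.

```python
import math


def givens_qr(a):
    """Return the upper-triangular factor R of a, using Givens rotations.

    Assumption: plain floating-point rotations stand in for the CORDIC
    block; the orthogonal factor Q is not accumulated.
    """
    r = [list(map(float, row)) for row in a]
    rows, cols = len(r), len(r[0])
    for j in range(cols):
        # Zero the entries below the diagonal, from the bottom row upward.
        for i in range(rows - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # nothing to rotate
            c, s = x / h, y / h
            for k in range(cols):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
    return r


if __name__ == "__main__":
    for row in givens_qr([[3.0, 0.0], [4.0, 5.0]]):
        print(row)
```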
  • Patent number: 7870366
    Abstract: The present disclosure provides an architecture that enables massive parallel processing on an IC while alleviating control congestion, memory access congestion and wiring congestion, together with high flexibility where the processing units are soft-arranged to perform different tasks. In an embodiment, the present architecture includes a functional block with a GO component to start the functional block, and a DONE component to identify the completion status. The GO and DONE components can be linked together, preferably by a linkage component, to chain the functional blocks. The linkage is preferably soft configurable. In another embodiment, the present architecture includes an integrated circuit comprising a plurality of functional blocks chained together for serial processing, parallel processing, or any combination thereof.
    Type: Grant
    Filed: June 29, 2008
    Date of Patent: January 11, 2011
    Inventors: Hirak Mitra, Raj Kulkarni, Richard Wicks, Michael Moon
  • Publication number: 20100257341
    Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of cells arranged in a plurality of columns and a plurality of rows. Each row represents an instruction in a processor execution queue and each cell represents a dependency relationship between two instructions in the processor execution queue. A first latch couples to the first array and comprises a first bit, the first bit indicating a first status. A second latch couples to the first array and comprises a second bit, the second bit indicating a second status. A first read port couples to the first array, comprising a first read wordline and a first read bitline. The first read wordline couples to the first latch and a first column and asserts a first available signal based on the first bit. The first read bitline couples to a first row and generates a first ready signal based on the first available signal and a first cell.
    Type: Application
    Filed: April 3, 2009
    Publication date: October 7, 2010
    Applicant: International Business Machines Corporation
    Inventors: Mary D. Brown, James W. Bishop, William E. Burky, John B. Griswell, Jr., Dung Q. Nguyen, Todd A. Venton
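    Example: A simplified software model of the dependency matrix, assuming that an instruction is ready when every instruction it depends on is marked available. The latches, wordlines and bitlines of the abstract are not modeled, and `ready_signals` is a hypothetical name.

```python
def ready_signals(dep, available):
    """Compute one ready flag per queued instruction.

    dep[i][j] is True when instruction i depends on instruction j
    (row = consumer, column = producer). available[j] is True when the
    result of instruction j can be consumed.
    """
    n = len(dep)
    return [
        all(available[j] or not dep[i][j] for j in range(n))
        for i in range(n)
    ]


if __name__ == "__main__":
    dep = [
        [False, False, False],  # instruction 0 has no dependencies
        [True, False, False],   # instruction 1 depends on 0
        [True, True, False],    # instruction 2 depends on 0 and 1
    ]
    print(ready_signals(dep, [True, False, False]))
```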
  • Publication number: 20100250640
    Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.
    Type: Application
    Filed: November 21, 2008
    Publication date: September 30, 2010
    Inventor: Katsutoshi Seki
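    Example: The abstract says each cell is a CORDIC circuit handling vector angle computation and vector rotation. The Python sketch below shows the textbook CORDIC iteration for those two modes. It uses floating point and 32 iterations, both assumptions; the abstract does not give word lengths, iteration counts, or the constant-delay scheduling.

```python
import math

ITERATIONS = 32  # assumed; not taken from the abstract
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0
for _i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * _i))  # accumulated CORDIC gain


def cordic_vectoring(x, y):
    """Vector angle computation: return (magnitude, angle) for x > 0."""
    angle = 0.0
    for i in range(ITERATIONS):
        d = -1.0 if y > 0 else 1.0  # rotate toward the x axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * ANGLES[i]
    return x / GAIN, angle


def cordic_rotate(x, y, theta):
    """Vector rotation by theta radians (|theta| below about 1.74)."""
    z = theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0  # drive the residual angle to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN


if __name__ == "__main__":
    print(cordic_vectoring(3.0, 4.0))
    print(cordic_rotate(1.0, 0.0, math.pi / 2))
```

    In hardware the multiplications by powers of two become shifts, which is why the iteration suits fixed-latency array cells.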
  • Publication number: 20100223445
    Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
    Type: Application
    Filed: May 10, 2010
    Publication date: September 2, 2010
    Applicant: Altera Corporation
    Inventor: Michael Fitton
  • Publication number: 20100131738
    Abstract: In an array processing section, using data strings entered from input ports, a plurality of data processor elements execute predetermined operations while transferring data to each other, and output data strings of results of the operations from a plurality of output ports. A first data string converter converts data strings stored in a plurality of data storages of a data storage group into a placement suitable for the operations in the array processing section, and enters the converted data strings into the input ports of the array processing section. A second data string converter converts the data strings output from output ports of the array processing section into a placement to be stored in the plurality of data storages of the data storage group.
    Type: Application
    Filed: February 22, 2008
    Publication date: May 27, 2010
    Inventors: Tomoyoshi Kobori, Katsutoshi Seki
  • Patent number: 7716454
    Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: May 11, 2010
    Assignee: Altera Corporation
    Inventor: Michael Fitton
  • Publication number: 20100100704
    Abstract: An integrated circuit 4 is provided including an array 10 of processors 26 with interface circuitry 12 providing communication with further processing circuitry 14. The processors 26 within the array 10 execute individual programs which together provide the functionality of a cycle-based program. During each program-cycle of the cycle-based program, each of the processors executes its respective program starting from a predetermined execution start point to evaluate a next state of at least some of the state variables of the cycle-based program. A boundary between program-cycles provides a synchronisation time (point) for processing operations performed by the array.
    Type: Application
    Filed: October 14, 2009
    Publication date: April 22, 2010
    Applicant: ARM Limited
    Inventors: Stephen John Hill, Michael Peter Muller
  • Patent number: 7653805
    Abstract: A semiconductor device for performing data processing by performing a plurality of computations in cycles includes a pipeline formed by connecting a plurality of computing units in series, each of the computing units including: a data line for receiving data; a control line for receiving a rule signal; a circuit information control unit configured to store, before data processing, several circuit information items, and to output a first one of the several circuit information items according to the rule signal received via the control line in a first cycle of the data processing; a processing element configured to construct an execution circuit according to the first circuit information item, to perform a computation using data from the data line, and to output a computation result; a data register for storing the computation result, and for outputting the computation result in a second cycle; and a control register for storing the rule signal and for outputting the rule signal in the second cycle.
    Type: Grant
    Filed: March 23, 2007
    Date of Patent: January 26, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takashi Yoshikawa, Shigehiro Asano, Yutaka Yamada
  • Patent number: 7636835
    Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises one or more interface modules including circuitry to transfer data to and from a device external to the tiles; and a sub-port routing network including circuitry to route data between a port of a switch and a plurality of sub-ports coupled to one or more interface modules.
    Type: Grant
    Filed: April 14, 2006
    Date of Patent: December 22, 2009
    Assignee: Tilera Corporation
    Inventors: Carl G. Ramey, David Wentzlaff, Anant Agarwal
  • Patent number: 7539845
    Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises an interface coupled to a plurality of the tiles to transfer data between one or more switches of the tiles and one or more switches of tiles in an externally coupled integrated circuit.
    Type: Grant
    Filed: April 14, 2006
    Date of Patent: May 26, 2009
    Assignee: Tilera Corporation
    Inventors: David Wentzlaff, Carl G. Ramey, Anant Agarwal
  • Patent number: 7418536
    Abstract: A processor for use in a router, the processor having a systolic array pipeline for processing data packets to determine to which output port of the router the data packet should be routed. In one embodiment, the systolic array pipeline includes a plurality of programmable functional units and register files arranged sequentially as stages, for processing packet contexts (which contain the packet's destination address) to perform operations, under programmatic control, to determine the destination port of the router for the packet. A single stage of the systolic array may contain a register file and one or more functional units such as adders, shifters, logical units, etc., for performing, in one example, very long instruction word (VLIW) operations. The processor may also include a forwarding table memory, on-chip, for storing routing information, and a cross bar selectively connecting the stages of the systolic array with the forwarding table memory.
    Type: Grant
    Filed: January 4, 2006
    Date of Patent: August 26, 2008
    Assignee: Cisco Technology, Inc.
    Inventors: Arthur Tung-Tak Leung, Anthony Li, William Lynch, Sharad Mehrotra
  • Patent number: 7380101
    Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.
    Type: Grant
    Filed: December 27, 2004
    Date of Patent: May 27, 2008
    Assignee: Cisco Technology, Inc.
    Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
  • Patent number: 7278009
    Abstract: Tiered command distribution is described. In an embodiment, a pipeline architecture includes processor chains of data processors that process control events received from an application interface control. A tier assignment algorithm determines the longest path of data processors through the processor chains to determine a tier allocation for each data processor in the set of processor chains. Each tier includes a data processor from one or more of the processor chains where a first set of data processors in a first tier each receive a control event and process the control event and/or process the data according to the control event before a second set of data processors in a second tier each receive the control event.
    Type: Grant
    Filed: March 31, 2005
    Date of Patent: October 2, 2007
    Assignee: Microsoft Corporation
    Inventors: Geoffrey R Smith, Hans-Martin Krober, Michael D. Dodd
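    Example: A minimal Python sketch of a longest-path tier assignment, assuming each processor chain is an ordered list of processor names, that a processor may appear in several chains, and that the chains together form no cycle. The name `assign_tiers` and the sample processor names are hypothetical; the abstract does not describe the algorithm at this level.

```python
from collections import defaultdict, deque


def assign_tiers(chains):
    """Map each data processor to a tier equal to its longest-path depth."""
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for chain in chains:
        nodes.update(chain)
        for a, b in zip(chain, chain[1:]):
            if b not in successors[a]:
                successors[a].add(b)
                indegree[b] += 1

    tier = {n: 0 for n in nodes}
    # Process in topological order so every predecessor is final first.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    while queue:
        n = queue.popleft()
        for m in successors[n]:
            tier[m] = max(tier[m], tier[n] + 1)
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return tier


if __name__ == "__main__":
    chains = [["src", "decode", "mix", "render"], ["src", "mix"]]
    print(assign_tiers(chains))
```

    Delivering a control event tier by tier then guarantees that every processor in a lower tier sees the event before any processor downstream of it.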
  • Patent number: 7260709
    Abstract: The present invention relates to a processing method and apparatus for implementing a systolic-array-like structure. Input data are stored in a depth-configurable register means (DCF) in a predetermined sequence, and are supplied to a processing means (FU) for processing said input data based on control signals generated from instruction data, wherein the depth of the register means (DCF) is controlled in accordance with the instruction data. Thereby, systolic arrays can be mapped onto a programmable processor, e.g. a VLIW processor, without the need for explicitly issuing operations to implement the register moves that constitute the delay lines of the array.
    Type: Grant
    Filed: April 1, 2003
    Date of Patent: August 21, 2007
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Bernardo De Oliveira Kastrup Pereira
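    Example: A software analogy for a depth-configurable register acting as a systolic delay line, assuming the depth is set once per configuration and that one value enters and one leaves per step. The class name `DelayLine` and its methods are hypothetical; the actual DCF hardware is not described in this abstract.

```python
from collections import deque


class DelayLine:
    """FIFO whose depth sets how many steps a value is delayed."""

    def __init__(self, depth):
        self.set_depth(depth)

    def set_depth(self, depth):
        # Reconfiguring discards the current contents (an assumption).
        self.fifo = deque([None] * depth)

    def step(self, value):
        """Push one value in and return the value delayed by `depth` steps."""
        if not self.fifo:
            return value  # depth 0 passes the value straight through
        self.fifo.append(value)
        return self.fifo.popleft()


if __name__ == "__main__":
    line = DelayLine(2)
    print([line.step(v) for v in [1, 2, 3, 4]])
```

    The point of the analogy is that the delay comes from the register depth alone, with no explicit move operation issued per element.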
  • Patent number: 7237086
    Abstract: A customization program for use in customizing a baseboard management controller used for monitoring operation of various computer system components is disclosed. A user interacts with the customization program to customize the baseboard management controller based on a configuration of components specified for the baseboard of the computer system. The customization program provides a user interface having a repository of icons and a design page. The icons represent various components that may be connected, either directly or indirectly, to the baseboard. The design page is used for constructing a model representing the specified configuration of components. As a user drags icons onto the design page, the model is updated to reflect selection of the components corresponding to these icons. Further, the customization program creates a configuration file that identifies and describes each of the selected components.
    Type: Grant
    Filed: November 26, 2003
    Date of Patent: June 26, 2007
    Assignee: American Megatrends, Inc.
    Inventors: Govind A. Kothandapani, Bakka Ravinder Reddy