Array Processor Patents (Class 712/10)
  • Patent number: 7007155
    Abstract: A circuit employing an array of reconfigurable processing elements for wireless baseband processing. The circuit includes a first linear array of reconfigurable processing elements for processing signals from a first channel, and a second linear array of reconfigurable processing elements, coupled in parallel with the first linear array of reconfigurable processing elements, for processing signals from a second channel that is concurrent with the first channel. The circuit also includes a frame buffer array having a number of frame buffers that corresponds to a number of reconfigurable processing elements in the first and second linear arrays of processing elements. A point-to-point data bus is connected between each reconfigurable processor and an associated frame buffer. A shared data bus is connected between the first and second linear arrays of reconfigurable processing elements and the frame buffer array.
    Type: Grant
    Filed: September 17, 2002
    Date of Patent: February 28, 2006
    Assignee: Morpho Technologies
    Inventors: Behzad Barjesteh Mohebbi, Fadi Joseph Kurdahi
  • Patent number: 7007128
    Abstract: A data interconnect and routing mechanism reduces data communication latency, supports dynamic route determination based upon processor activity level/traffic, and implements an architecture that supports scalable improvements in communication frequencies. In one implementation, a data processing system includes at least first through third processing units, data storage coupled to the plurality of processing units, and an interconnect fabric. The interconnect fabric includes at least a first data bus coupling the first processing unit to the second processing unit and a second data bus coupling the third processing unit to the second processing unit so that the first and third processing units can transmit data traffic to the second processing unit. The data processing system further includes a control channel coupling the first and third processing units.
    Type: Grant
    Filed: January 7, 2004
    Date of Patent: February 28, 2006
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Jerry Don Lewis, Vicente Enrique Chung, Jody Bern Joyner
  • Patent number: 7000022
    Abstract: Frame-based streaming data flows through a graph of multiple interconnected processing modules. The modules have a set of performance parameters whose values specify the sensitivity of each module to the selection of certain resources of a system. A user specifies overall goals for an actual graph for processing a given type of data for a particular purpose. A flow manager constructs the graph as a sequence of module interconnections required for processing the data, in response to the parameter values of the individual modules in the graph in view of the goals for the overall graph as a whole, and divides it into pipes each having one or more modules and each assigned to a memory manager for handling data frames in the pipe.
    Type: Grant
    Filed: June 7, 2004
    Date of Patent: February 14, 2006
    Assignee: Microsoft Corporation
    Inventors: Rafael S. Lisitsa, George H. J. Shaw, Dale A. Sather, Bryan A. Woodruff
  • Patent number: 6993764
    Abstract: A computer-implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval.
    Type: Grant
    Filed: June 28, 2001
    Date of Patent: January 31, 2006
    Assignee: The Regents of the University of California
    Inventors: Fabrizio Petrini, Wu-chun Feng
  • Patent number: 6986020
    Abstract: Techniques for providing improved data distribution to and collection from multiple memories are described. Such memories are often associated with and local to processing elements (PEs) within an array processor. Improved data transfer control within a data processing system provides support for radix 2, 4 and 8 fast Fourier transform (FFT) algorithms through data reordering or bit-reversed addressing across multiple PEs, carried out concurrently with FFT computation on a digital signal processor (DSP) array by a DMA unit. Parallel data distribution and collection through forms of multicast and packet-gather operations are also supported.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: January 10, 2006
    Assignee: PTS Corporation
    Inventors: Edwin F. Barry, Nikos P. Pitsianis, Kevin Coopman
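    Illustration (not from the patent text): the bit-reversed addressing mentioned in the abstract above can be sketched for the radix-2 case as follows. The function names and the round-robin dealing of samples to PE-local memories are assumptions made for this example; the patent's DMA mechanism is not reproduced here.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the low `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(samples: list) -> list:
    """Reorder a power-of-two-length list into bit-reversed index order."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    return [samples[bit_reverse(i, bits)] for i in range(n)]


def scatter_to_pes(samples: list, num_pes: int) -> list:
    """Hypothetical distribution step: deal the reordered samples
    round-robin into one list per PE-local memory."""
    reordered = bit_reversed_order(samples)
    return [reordered[pe::num_pes] for pe in range(num_pes)]
```

    For eight samples numbered 0 to 7, the reordered sequence is 0, 4, 2, 6, 1, 5, 3, 7.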
  • Patent number: 6973508
    Abstract: A versatile controller that can be used as either a stand-alone controller in a relatively small process plant or as one of numerous controllers in a distributed process control system depending on the needs of the process plant includes a processor adapted to be programmed to execute one or more programming routines and a memory, such as a non-volatile memory, coupled to the processor and adapted to store the one or more programming routines to be executed on the processor. The versatile controller also includes a plurality of field device input/output ports communicatively connected to the processor, a configuration communication port connected to the processor and to the memory to enable the controller to be configured with the programming routines and a second communication port which enables a user interface to be intermittently connected to the controller to view information stored within the controller memory.
    Type: Grant
    Filed: February 12, 2002
    Date of Patent: December 6, 2005
    Assignee: Fisher-Rosemount Systems, Inc.
    Inventors: Rusty Shepard, Ken Krivoshein, Dan Christensen, Gary Law, Kent Burr, Mark Nixon
  • Patent number: 6971043
    Abstract: An apparatus and method for accessing a first local mass storage device or a second local mass storage device in a fault-tolerant server. In one embodiment, the fault-tolerant server establishes communication between a first computing element and a first local mass storage device. The fault-tolerant server also establishes communications between a second computing element and a second local mass storage device. In one embodiment, the first computing element and the second computing element issue substantially similar instruction streams to one of the local mass storage devices.
    Type: Grant
    Filed: April 11, 2001
    Date of Patent: November 29, 2005
    Assignee: Stratus Technologies Bermuda LTD
    Inventors: Michael McLoughlin, Gerry Griffin
  • Patent number: 6970196
    Abstract: A high-speed vision sensor includes: an analog-to-digital converter array 13, in which one analog-to-digital converter 210 is provided in correspondence with all the photodetector elements 120 that are located on each row in a photodetector array 11; a parallel processing system 14 that includes processing elements 400 and shift registers 410, both of which form a one-to-one correspondence with the photodetector elements 120; and data buses 17, 18 and data buffers 19 and 20 for data transfer to the processing elements 400. The processing elements 400 perform high-speed image processing between adjacent pixels in parallel. The data buses 17, 18 make it possible to perform, at high speed, calculations that require data supplied from outside.
    Type: Grant
    Filed: March 10, 2000
    Date of Patent: November 29, 2005
    Assignee: Hamamatsu Photonics K.K.
    Inventors: Masatoshi Ishikawa, Haruyoshi Toyoda
  • Patent number: 6959372
    Abstract: A parallel processing architecture comprising a cluster of embedded processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of some or all of these processors (typically all processors assigned to a particular task) over the code distribution bus, and are executed in parallel by these processors. A task control processor determines when all of the processors assigned to a particular task have finished executing the current code page, and then loads a new code page (e.g., the next sequential code page within a task) into the program memories of these processors for execution. The processors within the cluster preferably share a common memory (1 per cluster) that is used to receive data inputs from, and to provide data outputs to, a higher level processor. Multiple interconnected clusters may be integrated within a common integrated circuit device.
    Type: Grant
    Filed: February 18, 2003
    Date of Patent: October 25, 2005
    Assignee: Cogent Chipware Inc.
    Inventors: Richard F. Hobson, Bill Ressl, Allan R. Dyck
  • Patent number: 6944683
    Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.
    Type: Grant
    Filed: February 19, 2004
    Date of Patent: September 13, 2005
    Assignee: PTS Corporation
    Inventors: Edwin Frank Barry, Edward A. Wolff
  • Patent number: 6940496
    Abstract: A display module driving system wherein digital pixel data for an image to be displayed is provided to a plurality of column drivers on a row by row basis in serial format over a plurality of dedicated bus lines rather than a single parallel bus line. Digital pixel data for a complete image row is divided into segments, wherein the number of segments is equal to the number of column drivers. Each segment is then serialized and transmitted to a corresponding column driver such that the digital pixel data for an entire row is transferred to the plurality of column drivers at the same time. The column drivers receive the segments and rearrange the data into parallel form. The pixels are then transferred to a digital to analog converter, preferably two pixels at a time, where each pixel is converted into analog red, green and blue signals.
    Type: Grant
    Filed: June 4, 1999
    Date of Patent: September 6, 2005
    Assignee: Silicon Image, Inc.
    Inventor: Eun-Gu Kim
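    Illustration (not from the patent text): a minimal sketch of the row-splitting scheme described in the abstract above, assuming 8-bit pixel values and most-significant-bit-first serialization. Both of those choices, and all function names, are assumptions for this example.

```python
def split_row(pixels: list, num_drivers: int) -> list:
    """Divide one image row into equal segments, one per column driver."""
    if num_drivers <= 0 or len(pixels) % num_drivers:
        raise ValueError("row length must be a multiple of the driver count")
    size = len(pixels) // num_drivers
    return [pixels[i * size:(i + 1) * size] for i in range(num_drivers)]


def serialize(segment: list, bits_per_pixel: int = 8) -> list:
    """Turn a segment into a serial bit stream, most significant bit first."""
    return [(p >> b) & 1
            for p in segment
            for b in range(bits_per_pixel - 1, -1, -1)]


def deserialize(bits: list, bits_per_pixel: int = 8) -> list:
    """Driver side: regroup the serial bits into parallel pixel words."""
    pixels = []
    for i in range(0, len(bits), bits_per_pixel):
        word = 0
        for bit in bits[i:i + bits_per_pixel]:
            word = (word << 1) | bit
        pixels.append(word)
    return pixels
```

    Each segment would travel over its own dedicated line, so all drivers receive their part of the row during the same interval.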
  • Patent number: 6933942
    Abstract: In a display apparatus, a display instruction generating unit outputs a display instruction. A plurality of display processing units are arranged in parallel, and each of the plurality of display processing units generates display data in response to the display instruction from the display instruction generating unit. A display switching unit selects one of the plurality of display processing units and outputs the display data from the selected display processing unit to the display unit. Thus, a display unit displays the display data.
    Type: Grant
    Filed: July 16, 2002
    Date of Patent: August 23, 2005
    Assignee: NEC Corporation
    Inventor: Junichi Tamai
  • Patent number: 6912608
    Abstract: Techniques for a pipelined bus which provides a very high performance interface to computing elements, such as processing elements, host interfaces, memory controllers, and other application-specific coprocessors and external interface units. The pipelined bus is a robust interconnected bus employing a scalable, pipelined, multi-client topology, with a fully synchronous, packet-switched, split-transaction data transfer model. Multiple non-interfering transfers may occur concurrently since there is no single point of contention on the bus. An aggressive packet transfer model with local conflict resolution in each client and packet-level retries allows recovery from collisions and buffer backups. Clients are assigned unique IDs, based upon a mapping from the system address space allowing identification needed for quick routing of packets among clients.
    Type: Grant
    Filed: April 25, 2002
    Date of Patent: June 28, 2005
    Assignee: PTS Corporation
    Inventors: Edward A. Wolff, David Baker, Bryan Garnett Cope, Edwin Franklin Barry
  • Patent number: 6901359
    Abstract: A system and method for bulk transfer to and from the SRAMs in which a starting memory address is latched and is then incremented every clock cycle to generate a new memory address. The addresses are decoded and memory requests are pipelined to the SRAM memory, one every clock cycle. When the memory controller detects transfer of the boundary of a predetermined number of clock cycles or words (e.g. 64 words or four clock cycles) the burst mode of data transfer is stopped and the memory controller waits for a “done” signal before resuming another cycle of the burst transfer mode. The memory controller on detecting a request on this address boundary first does a memory refresh followed by a requested operation; e.g. a continuation of the transfer operation.
    Type: Grant
    Filed: September 6, 2000
    Date of Patent: May 31, 2005
    Assignee: Quickturn Design Systems, Inc.
    Inventors: William F. Beausoleil, R. Bryan Cook, Tak-kwong Ng, Helmut Roth, Peter Tannenbaum, Lawrence A. Thomas, Norton J. Tomassetti
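    Illustration (not from the patent text): the burst addressing in the abstract above can be modeled as an event stream. The starting address is latched and incremented once per cycle; when the address reaches a boundary, the burst pauses for the "done" signal and a refresh before it continues. The 64-word boundary and the event names are assumptions for this sketch.

```python
def burst_transfer(start: int, count: int, boundary: int = 64):
    """Yield one event per step of a burst of `count` word accesses.

    Events are tuples: ("access", addr), ("wait_done", addr),
    ("refresh", addr). The boundary handling is a simplified model.
    """
    addr = start  # latched starting address
    for _ in range(count):
        if addr != start and addr % boundary == 0:
            yield ("wait_done", addr)  # burst stops at the boundary
            yield ("refresh", addr)    # refresh precedes the next access
        yield ("access", addr)
        addr += 1                      # new address every clock cycle
```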
  • Patent number: 6901491
    Abstract: In one embodiment, a server is provided. The server includes multiple application processor chips. Each of the multiple application processor chips includes multiple processing cores. Multiple memories corresponding to the multiple processor chips are included. The multiple memories are configured such that one processor chip is associated with one memory. A plurality of fabric chips enabling each of the multiple application processor chips to access any of the multiple memories are included. The data associated with one of the multiple application processor chips is stored across each of the multiple memories. In one embodiment, the application processor chips include a remote direct memory access (RDMA) and striping engine. The RDMA and striping engine is configured to store data in a striped manner across the multiple memories. A method for allowing multiple processors to exchange information through horizontal scaling is also provided.
    Type: Grant
    Filed: October 16, 2002
    Date of Patent: May 31, 2005
    Assignee: Sun Microsystems, Inc.
    Inventors: Leslie D. Kohn, Michael K. Wong
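    Illustration (not from the patent text): storing data "in a striped manner across the multiple memories", as the abstract above puts it, can be sketched as round-robin placement of fixed-size stripes. The stripe size, the function names, and the use of byte strings as stand-ins for memories are assumptions for this example.

```python
def stripe(data: bytes, num_memories: int, stripe_size: int) -> list:
    """Distribute `data` across memories in round-robin stripes."""
    memories = [bytearray() for _ in range(num_memories)]
    for k, offset in enumerate(range(0, len(data), stripe_size)):
        memories[k % num_memories] += data[offset:offset + stripe_size]
    return [bytes(m) for m in memories]


def unstripe(memories: list, stripe_size: int, total_len: int) -> bytes:
    """Reassemble the original byte sequence from the striped memories."""
    out = bytearray()
    offsets = [0] * len(memories)
    k = 0
    while len(out) < total_len:
        m = k % len(memories)
        out += memories[m][offsets[m]:offsets[m] + stripe_size]
        offsets[m] += stripe_size
        k += 1
    return bytes(out)
```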
  • Patent number: 6898691
    Abstract: This invention discloses a group of instructions, block4 and block4v, in a matrix processor 16 that rearrange data between vector and matrix forms of an A×B matrix of data 120 where the data matrix includes one or more 4×4 sub-matrices of data 160-166. The instructions of this invention simultaneously swap rows or columns between the first 140, second 142, third 144, and fourth 146 matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that include one of the following group of operations: swapping rows between the different individual matrix registers, or swapping columns between the different individual matrix registers. Additionally, successive iterations or combinations of the block4 and/or block4v instructions perform standard tensor matrix operations from the following group of matrix operations: transpose, shuffle, and deal.
    Type: Grant
    Filed: June 6, 2002
    Date of Patent: May 24, 2005
    Assignee: Intrinsity, Inc.
    Inventors: James S. Blomgren, Timothy A. Olson, Christophe Harle
  • Patent number: 6895452
    Abstract: An architecture is shown where an execution unit is tightly coupled to a shared, reconfigurable memory system. Sequence control signals drive a DMA controller and address generator to control the transfer of data from the shared memory to a bus interface unit (BIU). The sequence control signals also drive a data controller and address generator which controls transfer of data from the shared memory to an execution unit interface (EUI). The EUI is connected to the execution unit and operates under control of the data controller and address generator to transfer vector data to and from the shared memory. The shared memory is configured to swap memory space in between the BIU and the execution unit so as to support continuous execution and I/O. A local fast memory is coupled to the execution unit. A local address generator controls the transfer of scalar data between the local fast memory and the execution unit.
    Type: Grant
    Filed: October 16, 1998
    Date of Patent: May 17, 2005
    Assignee: Marger Johnson & McCollom, P.C.
    Inventors: Ron Coleman, Brent LeBack, Stuart Hawkinson, Richard Rubinstein
  • Patent number: 6873287
    Abstract: The present invention relates to a method and an arrangement suitable for embedded signal processing, comprising a number of computational units (100), each computational unit comprising a number of processing elements (20) capable of working independently and transmitting data simultaneously. Said computational units are arranged in clusters, work independently, and transmit data simultaneously, and said processing elements (20) are globally and regularly interconnected optically in a hypercube topology and transformed into a planar waveguide.
    Type: Grant
    Filed: November 1, 2001
    Date of Patent: March 29, 2005
    Assignee: Telefonaktiebolaget LM Ericsson
    Inventor: Håkan Forsberg
  • Patent number: 6862548
    Abstract: Described are methods for accurately measuring the skew of clock distribution networks on programmable logic devices. Clock distribution networks are modeled using a sequence of oscillators formed on the device using configurable logic. Each oscillator includes a portion of the network, and consequently oscillates at a frequency that depends on the signal propagation delay associated with the included portion of the network. The various oscillator configurations are defined mathematically as the sum of a series of delays, with the period of each oscillator representing the sum. The respective equations of the oscillators are combined to solve for the delay contribution of the included portion of the clock network. The delay associated with the included portion of the clock network can be combined with similar measurements for other portions of the clock network to more completely describe the network.
    Type: Grant
    Filed: October 30, 2001
    Date of Patent: March 1, 2005
    Assignee: Xilinx, Inc.
    Inventor: Siuki Chan
  • Patent number: 6859869
    Abstract: A data processing system, wherein a data flow processor (DFP) integrated circuit chip is provided which comprises a plurality of orthogonally arranged homogeneously structured cells, each cell having a plurality of logically same and structurally identically arranged modules. The cells are combined and facultatively grouped using lines and columns and connected to the input/output ports of the DFP. A compiler programs and configures the cells, each by itself and facultatively-grouped, such that random logic functions and/or linkages among the cells can be realized. The manipulation of the DFP configuration is performed during DFP operation such that modification of function parts (MACROs) of the DFP can take place without requiring other function parts to be deactivated or being impaired.
    Type: Grant
    Filed: April 12, 1999
    Date of Patent: February 22, 2005
    Assignee: PACT XPP Technologies AG
    Inventor: Martin Vorbach
  • Patent number: 6834295
    Abstract: Techniques for providing improved data distribution to and collection from multiple memories are described. Such memories are often associated with and local to processing elements (PEs) within an array processor. Improved data transfer control within a data processing system provides support for radix 2, 4 and 8 fast Fourier transform (FFT) algorithms through data reordering or bit-reversed addressing across multiple PEs, carried out concurrently with FFT computation on a digital signal processor (DSP) array by a DMA unit. Parallel data distribution and collection through forms of multicast and packet-gather operations are also supported.
    Type: Grant
    Filed: February 23, 2001
    Date of Patent: December 21, 2004
    Assignee: PTS Corporation
    Inventors: Edwin F. Barry, Nikos P. Pitsianis, Kevin Coopman
  • Publication number: 20040255002
    Abstract: According to the present invention, methods and apparatus are provided for increasing the efficiency and effectiveness of communications between multiprocessor clusters. Mechanisms for improving the accuracy of information available to an interconnection controller are implemented in order to allow the interconnection controller to increase reliability and reduce latency in a multiple cluster system. Protocol extensions and link layer extensions are provided with packets to convey information between interconnection controllers of separate multiprocessor clusters.
    Type: Application
    Filed: June 12, 2003
    Publication date: December 16, 2004
    Applicant: Newisys, Inc., A Delaware corporation
    Inventors: Rajesh Kota, Shashank Newawarker, Guru Prasadh, Carl Zeitler, David B. Glasco
  • Publication number: 20040250045
    Abstract: A processing architecture includes a first CPU core portion coupled to a second embedded dynamic random access memory (DRAM) portion. These architectural components jointly implement a single processor and instruction set. Advantageously, the embedded logic on the DRAM chip implements the memory intensive processing tasks, thus reducing the amount of traffic that needs to be bussed back and forth between the CPU core and the embedded DRAM chips. The embedded DRAM logic monitors and manipulates the instruction stream into the CPU core. The architecture of the instruction set, data paths, addressing, control, caching, and interfaces are developed to allow the system to operate using a standard programming model. Specialized video and graphics processing systems are developed. Also, an extended very long instruction word (VLIW) architecture implemented as a primary VLIW processor coupled to an embedded DRAM VLIW extension processor efficiently deals with memory intensive tasks.
    Type: Application
    Filed: July 2, 2004
    Publication date: December 9, 2004
    Inventor: Eric M. Dowling
  • Patent number: 6826645
    Abstract: A method and apparatus in which an arbiter links to a processor having a flexible architecture, and the processor connects to a device through a point to point bus.
    Type: Grant
    Filed: December 13, 2000
    Date of Patent: November 30, 2004
    Assignee: Intel Corporation
    Inventor: Chakravarthy Kosaraju
  • Publication number: 20040236879
    Abstract: An interrupt controller is provided for processing interrupt requests in a system having a plurality of data processing units operable to service those interrupt requests, each interrupt request having an associated priority level. The interrupt controller comprises request logic operable to receive an indication of unserviced interrupt requests, to apply predetermined criteria to determine which of said plurality of data processing units are candidate data processing units for servicing at least one of said unserviced interrupt requests, and to issue a request signal to each said candidate data processing unit. Priority encoding logic is operable to determine a highest priority unserviced interrupt request based on the associated priority levels of the unserviced interrupt requests.
    Type: Application
    Filed: May 23, 2003
    Publication date: November 25, 2004
    Inventors: Daren Croxford, Man Cheung Joseph Yiu
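    Illustration (not from the patent text): a minimal sketch of the two steps named in the abstract above, priority encoding and candidate selection. The convention that a lower number means higher priority, and the criterion that a unit is a candidate when the interrupt would preempt what it is currently running, are assumptions for this example; the abstract only says "predetermined criteria".

```python
from typing import Optional


def highest_priority(pending: dict) -> Optional[int]:
    """Return the id of the highest-priority unserviced interrupt.

    `pending` maps interrupt id -> priority level (lower = more urgent,
    an assumption). Ties are broken by the lower interrupt id.
    """
    if not pending:
        return None
    return min(pending, key=lambda irq: (pending[irq], irq))


def candidate_units(units: dict, irq_priority: int) -> list:
    """Return the processing units that could service the interrupt.

    `units` maps unit id -> priority of the work it is running now.
    Hypothetical criterion: the interrupt must outrank that work.
    """
    return sorted(u for u, running in units.items() if irq_priority < running)
```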
  • Publication number: 20040230770
    Abstract: In a program processing procedure specially designed to perform compilation for parallel processing purposes, a method and system for increasing the program execution rate of a target machine is provided. A compiler front end translates source code into intermediate code that has been divided into basic blocks. A parallelizer converts the intermediate code, which has been generated by the compiler front end, into a parallelly executable form. An execution order determiner determines the order of the basic blocks to be executed. An expanded basic block parallelizer subdivides the intermediate code, which has already been divided into the basic blocks, into execution units, each of which is made up of parallelly executable instructions, following the order determined and on the basic block basis.
    Type: Application
    Filed: June 23, 2004
    Publication date: November 18, 2004
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventors: Kensuke Odani, Taketo Heishi
  • Publication number: 20040221135
    Abstract: A single chip active memory includes a plurality of memory stripes, each coupled to a full word interface and one of a plurality of processing element (PE) sub-arrays. The large number of couplings between a PE sub-array and its associated memory stripe are managed by placing the PE sub-arrays so that their data paths run at right angles to the data paths of the plurality of memory stripes. The data lines exiting the memory stripes are run across the PE sub-arrays on one metal layer. At the appropriate locations, the data lines are coupled to another orthogonally oriented metal layer to complete the coupling between the memory stripe and its associated PE sub-array. The plurality of PE sub-arrays are mapped to form a large logical array, in which each PE is coupled to four other PEs. Physically distant PEs are coupled using current mode differential logical couplings and drivers to ensure good signal integrity at high operational speeds. Each PE contains a small DRAM register array.
    Type: Application
    Filed: June 4, 2004
    Publication date: November 4, 2004
    Inventor: Graham Kirsch
  • Publication number: 20040216119
    Abstract: One aspect of the present invention relates to a method for balancing the load of an n-dimensional array of processing elements (PEs), wherein each dimension of the array includes the processing elements arranged in a plurality of lines and wherein each of the PEs has a local number of tasks associated therewith. The method comprises balancing at least one line of PEs in a first dimension, balancing at least one line of PEs in a next dimension, and repeating the balancing at least one line of PEs in a next dimension for each dimension of the n-dimensional array. The method may further comprise selecting one or more lines within said first dimension and shifting the number of tasks assigned to PEs in said selected one or more lines.
    Type: Application
    Filed: October 20, 2003
    Publication date: October 28, 2004
    Inventor: Mark Beaumont
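    Illustration (not from the patent text): the dimension-by-dimension balancing described in the abstract above, sketched for the two-dimensional case. Each line is balanced by computing an even target share per PE; how tasks are physically shifted between neighbors is not modeled, and the function names are assumptions.

```python
def balance_line(line: list) -> list:
    """Spread the tasks on one line of PEs as evenly as possible."""
    base, extra = divmod(sum(line), len(line))
    return [base + (1 if i < extra else 0) for i in range(len(line))]


def balance_2d(grid: list) -> list:
    """Balance every row (first dimension), then every column (next)."""
    rows = [balance_line(row) for row in grid]
    cols = [balance_line(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to rows
```

    Starting from a grid where one PE holds all eight tasks, [[8, 0], [0, 0]], the row pass gives [[4, 4], [0, 0]] and the column pass gives [[2, 2], [2, 2]].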
  • Publication number: 20040215925
    Abstract: A method for calculating a local mean number of tasks for each processing element (PEr) in a parallel processing system, wherein each processing element (PEr) has a local number of tasks associated therewith and wherein r represents the number for a selected processing element, the method comprising assigning a value (Er) to each processing element (PEr), summing a total number of tasks present on the parallel processing system and the value (Er) for each processing element (PEr), dividing the sum of the total number of tasks present on the parallel processing system and the value (Er) for each processing element (PEr) by a total number of processing elements in the parallel processing system and truncating a fractional portion of the divided sum for each processing element.
    Type: Application
    Filed: October 20, 2003
    Publication date: October 28, 2004
    Inventor: Mark Beaumont
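    Illustration (not from the patent text): the calculation in the abstract above amounts to truncating (V + Er) / N for each processing element, where V is the total number of tasks and N the number of PEs. The sketch below assumes Er = r for r = 0 .. N-1, which is one choice that makes the local means add up to V exactly; the abstract itself does not state how Er is chosen.

```python
def local_means(total_tasks: int, num_pes: int) -> list:
    """Local mean for each PE: truncate((V + E_r) / N), with E_r = r assumed."""
    if num_pes <= 0 or total_tasks < 0:
        raise ValueError("need a positive PE count and non-negative task count")
    # Integer division truncates the fractional part for non-negative values.
    return [(total_tasks + r) // num_pes for r in range(num_pes)]
```

    With 10 tasks on 4 PEs this gives 2, 2, 3, 3: no two PEs differ by more than one task, and no task is lost to rounding.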
  • Patent number: 6801202
    Abstract: A method and computer graphics system capable of implementing multiple pipelines for the parallel processing of graphics data. For certain data, a requirement may exist that the data be processed in order. The graphics system may use a set of tokens to reliably switch between ordered and unordered data modes. Furthermore, the graphics system may be capable of super-sampling and performing real-time convolution. In one embodiment, the computer graphics system may comprise a graphics processor, a sample buffer, and a sample-to-pixel calculation unit. The graphics processor may be configured to receive graphics data and to generate a plurality of samples for each of a plurality of frames. The sample buffer, which is coupled to the graphics processor, may be configured to store the samples. The sample-to-pixel calculation unit is programmable to generate a plurality of output pixels by filtering the rendered samples using a filter.
    Type: Grant
    Filed: June 28, 2001
    Date of Patent: October 5, 2004
    Assignee: Sun Microsystems, Inc.
    Inventors: Scott R. Nelson, Lisa Grenier, Michael F. Deering
  • Publication number: 20040193840
    Abstract: A command engine for an active memory receives high level tasks from a host and generates corresponding sets of either DCU commands to a DRAM control unit or ACU commands to a processing array control unit. The DCU commands include memory addresses, which are also generated by the command engine, and the ACU commands include instruction memory addresses corresponding to an address in an array control unit where processing array instructions are stored.
    Type: Application
    Filed: July 28, 2003
    Publication date: September 30, 2004
    Inventor: Graham Kirsch
  • Publication number: 20040193841
    Abstract: In the LU decomposition of a matrix composed of blocks, the blocks to be updated of the matrix are vertically divided in each SMP node connected through a network and each of the divided blocks is allocated to each node. This process is also repeatedly applied to new blocks to be updated later, and the newly divided blocks are also cyclically allocated to each node. Each node updates allocated divided blocks in the original order of blocks. Because sequentially updating blocks increases the number of processed blocks on each node equally, the load is distributed evenly.
    Type: Application
    Filed: March 12, 2004
    Publication date: September 30, 2004
    Applicant: FUJITSU LIMITED
    Inventor: Makoto Nakanishi
  • Publication number: 20040193784
    Abstract: A command engine for an active memory receives high level tasks from a host and generates corresponding sets of either DRAM control unit (“DCU”) commands to a DRAM control unit or array control unit (“ACU”) commands to a processing array control unit. The DCU commands include memory addresses, which are also generated by the command engine, and the ACU commands include instruction memory addresses corresponding to an address in the ACU where processing array instructions are stored. The processing array instructions are used to address a decode SRAM containing microinstructions that are used to control the operation of an array of processing elements. The number of bits in each of the microinstructions is substantially greater than the number of bits in the corresponding processing array instruction. The decode SRAM is preferably loaded prior to operation of the active memory based on the operations to be performed by the processing elements.
    Type: Application
    Filed: July 28, 2003
    Publication date: September 30, 2004
    Inventor: Graham Kirsch
  • Patent number: 6799194
    Abstract: In a preconditioning process for an iteration method to solve simultaneous linear equations through multilevel block incomplete factorization of a coefficient matrix, a set of variable numbers of variables to be removed is determined at each level of the factorization such that a block matrix comprising coefficients of the variables can be diagonal dominant. The approximate inverse matrix of the block matrix is obtained in iterative computation, and non-zero elements of a coefficient matrix at a coarse level are reduced.
    Type: Grant
    Filed: June 26, 2001
    Date of Patent: September 28, 2004
    Assignees: Fujitsu Limited, Australian National University
    Inventors: Lutz Grosz, Makoto Nakanishi
  • Patent number: 6791551
    Abstract: A system and method for synchronizing image display and buffer swapping in a multiple processor-multiple display environment. In a master-slave dichotomy, one processor or system is deemed the master and the others act as slaves. The master generates signals used to control vertical retrace and buffer swapping for itself and the slaves. In addition, a synchronization signal generator is provided to synchronize a timing signal between the master and slave systems.
    Type: Grant
    Filed: November 27, 2001
    Date of Patent: September 14, 2004
    Assignee: Silicon Graphics, Inc.
    Inventors: Shrijeet Mukherjee, Kanoj Sarcar, James Tornes
  • Publication number: 20040175057
    Abstract: An affine transformation analysis system and method is provided for matching two images. The novel systolic array image affine transformation analysis system comprising a linear rf-processing means, an affine parameter incremental updating means, and a least square error fitting means is based on a Lie transformation group model of cortical visual motion and stereo processing. Image data is provided to a plurality of component linear rf-processing means each comprising a Gabor receptive field, a dynamical Gabor receptive field, and six Lie germs. The Gabor coefficients of images and affine Lie derivatives are extracted from responses of linear receptive fields, respectively. The differences and affine Lie-derivatives of these Gabor coefficients obtained from each of the parallel pipelined linear rf-processing components are then input to a least square error fitting means, a systolic array comprising a QR decomposition means and a backward substitution means.
    Type: Application
    Filed: March 4, 2003
    Publication date: September 9, 2004
    Inventors: Thomas Tsao, Stanley Yuen
  • Patent number: 6782463
    Abstract: Disclosed is a device comprising a core processing circuit coupled to a single memory array which is partitioned into at least a first portion as a cache memory of the core processing circuit, and a second portion as a memory accessible by the one or more data transmission devices through a data bus independently of the core processing circuit.
    Type: Grant
    Filed: September 14, 2001
    Date of Patent: August 24, 2004
    Assignee: Intel Corporation
    Inventors: Mark A. Schmisseur, Jeff McCoskey, Timothy J. Jehl
  • Publication number: 20040156547
    Abstract: An image processing system includes, in part, an image processing engine adapted to perform object-independent processing corresponding to a first processing layer of the image processing system, a post processing engine adapted to perform object-dependent processing corresponding to a second processing layer of the image processing system, and a processing engine adapted to perform object composition, recognition and association corresponding to a third processing layer of the image processing system. The image processing engine includes a multitude of processors each associated with a different one of the pixels of the image. The post processing engine includes an N-way symmetric multi-processing system (SMP) having disposed therein N DFT engines and N matrix multiplication engines, where N is an integer greater than 1. The multitude of the processors of the image processing engine are formed on a semiconductor substrate different from the semiconductor substrate on which images are captured.
    Type: Application
    Filed: January 15, 2004
    Publication date: August 12, 2004
    Applicant: Parimics, Inc.
    Inventor: Axel K. Kloth
  • Publication number: 20040156546
    Abstract: An image processing system processes images via a first processing layer adapted to perform object-independent processing, a second processing layer adapted to perform object-dependent processing, and a third processing layer adapted to perform object composition, recognition and association. The image processing system performs object-independent processing using a plurality of processors each of which is associated with a different one of the pixels of the image. The image processing system performs object-independent processing using a symmetric multi-processor. The plurality of processors may form a massively parallel processor of a systolic array type and configured as a single-instruction multiple-data system. Each of the plurality of the processors is further configured to perform object-independent processing using a unified and symmetric processing of N dimensions in space and one dimension in time.
    Type: Application
    Filed: January 15, 2004
    Publication date: August 12, 2004
    Applicant: Parimics, Inc.
    Inventor: Axel K. Kloth
  • Patent number: 6769056
    Abstract: A manifold array topology includes processing elements, nodes, memories or the like arranged in clusters. Clusters are connected by cluster switch arrangements which advantageously allow changes of organization without physical rearrangement of processing elements. A significant reduction in the typical number of interconnections for preexisting arrays is also achieved. Fast, efficient and cost effective processing and communication result with the added benefit of ready scalability.
    Type: Grant
    Filed: September 24, 2002
    Date of Patent: July 27, 2004
    Assignee: PTS Corporation
    Inventors: Edwin F. Barry, Thomas L. Drabenstott, Gerald G. Pechanek, Nikos P. Pitsianis
  • Publication number: 20040128474
    Abstract: The invention concerns a cell array having an intercell structure and specifies how a favorable segmenting of the intercell structure may be executed in order to improve the interaction of the cells.
    Type: Application
    Filed: January 20, 2004
    Publication date: July 1, 2004
    Inventor: Martin Vorbach
  • Patent number: 6751721
    Abstract: A directory-based multiprocessor cache control scheme for distributing invalidate messages to change the state of shared data in a computer system. The plurality of processors are grouped into a plurality of clusters. A directory controller tracks copies of shared data sent to processors in the clusters. Upon receiving an exclusive request from a processor requesting permission to modify a shared copy of the data, the directory controller generates invalidate messages requesting that other processors sharing the same data invalidate that data. These invalidate messages are sent via a point-to-point transmission only to master processors in clusters actually containing a shared copy of the data. Upon receiving the invalidate message, the master processors broadcast the invalidate message in an ordered fan-in/fan-out process to each processor in the cluster.
    Type: Grant
    Filed: August 31, 2000
    Date of Patent: June 15, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: David A. J. Webb, Jr., Richard E. Kessler, Steve Lang, Aaron T. Spink
  • Patent number: 6751723
    Abstract: A system-on-a-chip integrated circuit has a field programmable gate array core having logic clusters, static random access memory modules, and routing resources, a field programmable gate array virtual component interface translator having inputs and outputs, wherein the inputs are connected to the field programmable gate array core, a microcontroller, a microcontroller virtual component interface translator having inputs and outputs, wherein the inputs are connected to the microcontroller, a system bus connected to the outputs of the field programmable gate array virtual component interface translator and also to the outputs of said microcontroller virtual component interface translator, and direct connections between the microcontroller and the routing resources of the field programmable gate array core.
    Type: Grant
    Filed: September 2, 2000
    Date of Patent: June 15, 2004
    Assignee: Actel Corporation
    Inventors: Arunangshu Kundu, Arnold Goldfein, William C. Plants, David Hightower
  • Publication number: 20040111586
    Abstract: Causality-based memory ordering in a multiprocessing environment. A disclosed embodiment includes a plurality of processors and arbitration logic coupled to the plurality of processors. The processors and arbitration logic maintain processor consistency yet allow stores generated in a first order by any two or more of the processors to be observed consistent with a different order of stores by at least one of the other processors. Causality monitoring logic coupled to the arbitration logic monitors any causal relationships with respect to observed stores.
    Type: Application
    Filed: December 2, 2003
    Publication date: June 10, 2004
    Inventor: Deborah T. Marr
  • Patent number: 6738891
    Abstract: To execute all processing in an array section of an array-type processor, each processor must execute processing of different types, i.e., processing of an operating unit and processing of a random logic circuit, which limits its size and processing performance. A data path section including processors arranged in an array are connected via programmable switches to primarily execute processing of operation and a state transition controller configured to easily implement a state transition function to control state transitions are independently disposed. These sections are configured in customized structure for respective processing purposes to efficiently implement and achieve the processing of operation and the control operation.
    Type: Grant
    Filed: February 23, 2001
    Date of Patent: May 18, 2004
    Assignee: NEC Corporation
    Inventors: Taro Fujii, Masato Motomura, Koichiro Furuta
  • Patent number: 6738840
    Abstract: A data processing arrangement comprises a plurality of processors and a memory interface via which the processors can access a collective memory. The memory interface comprises an interface memory (SRAM) for temporarily storing data belonging to different processors. The memory interface also comprises a control circuit for controlling the interface memory in such a manner that it forms a FIFO memory for each of the different processors. This makes it possible to realize implementations at a comparatively low cost in comparison with a memory interface comprising a separate FIFO memory for each processor.
    Type: Grant
    Filed: August 17, 2000
    Date of Patent: May 18, 2004
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Thierry Nouvet, Hugues De Perthuis, Stéphane Mutz
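
The idea in the abstract above, one interface memory controlled so that it behaves as a FIFO per processor, can be sketched in software as a single backing array split into ring buffers. This is only an illustration under assumptions: the class name `SharedFifoMemory`, the fixed equal-size regions, and the head/count bookkeeping are hypothetical and are not described in the patent.

```python
class SharedFifoMemory:
    """One backing array partitioned into fixed-size ring buffers.

    Hypothetical model: region i of `storage` serves processor i, and the
    head/count lists play the role of the control circuit's state.
    """

    def __init__(self, num_processors: int, depth: int) -> None:
        if num_processors <= 0 or depth <= 0:
            raise ValueError("num_processors and depth must be positive")
        self.num_processors = num_processors
        self.depth = depth
        self.storage = [None] * (num_processors * depth)  # the single memory
        self.head = [0] * num_processors   # offset of the oldest entry
        self.count = [0] * num_processors  # entries currently queued

    def _check(self, proc: int) -> None:
        if not 0 <= proc < self.num_processors:
            raise IndexError("unknown processor index")

    def push(self, proc: int, value) -> None:
        """Append a value to the FIFO of processor `proc`."""
        self._check(proc)
        if self.count[proc] == self.depth:
            raise OverflowError("FIFO full")
        offset = (self.head[proc] + self.count[proc]) % self.depth
        self.storage[proc * self.depth + offset] = value
        self.count[proc] += 1

    def pop(self, proc: int):
        """Remove and return the oldest value queued for processor `proc`."""
        self._check(proc)
        if self.count[proc] == 0:
            raise IndexError("FIFO empty")
        value = self.storage[proc * self.depth + self.head[proc]]
        self.head[proc] = (self.head[proc] + 1) % self.depth
        self.count[proc] -= 1
        return value


if __name__ == "__main__":
    mem = SharedFifoMemory(num_processors=2, depth=2)
    mem.push(0, "a")
    mem.push(1, "x")
    mem.push(0, "b")
    print(mem.pop(0), mem.pop(0), mem.pop(1))
```

All queues share one array, so only the pointer state is duplicated per processor; that mirrors the cost argument in the abstract, although a hardware control circuit would of course differ from this model.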
  • Patent number: 6735684
    Abstract: A parallel-processing apparatus includes a plurality of cells, variable-delay circuits, a signal output unit, a delay counter, and an accumulation unit. Each cell has a processing circuit for performing arbitrary processing. The variable-delay circuits change the signal propagation delay in accordance with the processing results of the processing circuits. The signal output unit outputs a measurement input signal to the first variable-delay circuit of a variable-delay circuit array. The delay counter receives the measurement input signal output from the signal output unit and a measurement output signal output from the variable-delay circuit array, and obtains the signal propagation delay time of the variable-delay circuit array upon the basis of the measurement input and output signals. The accumulation unit accumulates the processing results of the processing circuits. A parallel processing method is also disclosed.
    Type: Grant
    Filed: September 13, 2000
    Date of Patent: May 11, 2004
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Satoshi Shigematsu, Hiroki Morimura, Katsuyuki Machida
  • Patent number: 6718514
    Abstract: There is provided a parity checking device and method in a data communication system. In the parity checking device, a controller determines the number of loop iterations according to the length of the data and the number of bits to be shifted according to the data or XOR operation results and determines whether the data has an error based on a final XOR operation result. A first register and a second register store the data or the XOR operation results under the control of the controller. A shifter receives the output of the first register and shifts the received bits by the shift bit number received from the controller. An operation unit receives the outputs of the shifter and the second register, performs an XOR operation between the received data, and outputs an XOR operation result under the control of the controller.
    Type: Grant
    Filed: December 29, 2000
    Date of Patent: April 6, 2004
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Myung-Goo Kang
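
The shift-and-XOR loop in the abstract above resembles the well-known technique of computing parity by repeatedly folding a word onto itself. The sketch below shows that general technique, not the patented circuit: the function name `parity_by_folding`, the halving shift schedule, and the use of a single accumulator in place of two registers are all assumptions made for illustration.

```python
def parity_by_folding(data: int, width: int) -> int:
    """Return 1 if `data` has an odd number of 1 bits, otherwise 0.

    `width` is the data length in bits. The number of loop iterations
    depends only on `width`, and each pass XORs the word with a shifted
    copy of itself, halving the shift distance every time.
    """
    if width <= 0 or data < 0 or data >> width:
        raise ValueError("data must fit in `width` bits")
    # Round the width up to a power of two so the folds line up.
    span = 1
    while span < width:
        span *= 2
    acc = data
    shift = span // 2
    while shift > 0:
        acc ^= acc >> shift  # fold the upper part onto the lower part
        shift //= 2
    # After the last fold, bit 0 holds the XOR of every original bit.
    return acc & 1


if __name__ == "__main__":
    word = 0b10110010  # four 1 bits, so the parity is even
    print(parity_by_folding(word, 8))
```

For a word transmitted with an even-parity bit included, a result of 1 would indicate an error under this scheme; how the patented controller chooses its shift amounts is not specified here and is left as described in the abstract.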
  • Patent number: 6711724
    Abstract: Logic circuits are arranged to constitute a pipeline with a clock signal cycle period set longer than a target cycle period by a gain obtained when replacing a flip-flop circuit by latch circuits. Then, the clock signal cycle period is changed to the target cycle period, to detect a critical path, on which a setup condition error occurs in the pipeline. After replacing the flip-flop circuit related to this error path by complementarily operating latch circuits, related logic circuits are rearranged according to the replacing latch circuits, to meet various operating parameters. In this way, it becomes possible to readily design a pipeline that accurately operates synchronously with a high-speed clock signal.
    Type: Grant
    Filed: December 16, 2002
    Date of Patent: March 23, 2004
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Atsushi Yoshikawa
  • Publication number: 20040030872
    Abstract: The invention is a system and method for executing a program that comprises a plurality of basic blocks on a computer system that comprises a plurality of processing elements. The invention generates a branch instruction by one processing element of the plurality of processing elements and sends the branch instruction to the plurality of processing elements. The invention then independently branches to a target of the branch instruction by each of the processing elements of the plurality of processing elements when each processing element receives the sent branch instruction. At least one processing element of the plurality of processing elements receives the branch instruction at a time later than another processing element of the plurality of processing elements.
    Type: Application
    Filed: August 8, 2002
    Publication date: February 12, 2004
    Inventor: Michael S. Schlansker