Array Processor Operation Patents (Class 712/16)
-
Patent number: 7526630
Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instruction streams, each instruction stream having a plurality of instruction items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.
Type: Grant
Filed: January 4, 2007
Date of Patent: April 28, 2009
Assignee: Clearspeed Technology, PLC
Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russel David, Ray McConnell, Tim Day, Trey Greer
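The combining and distribution steps can be pictured with a minimal Python sketch. The abstract does not say how items from different streams are ordered, so round-robin interleaving is an assumption here, as is broadcasting the same serial stream to every processing element; the names `combine_streams` and `distribute` are hypothetical.

```python
from itertools import zip_longest

_EMPTY = object()  # sentinel marking a stream that has run out of items


def combine_streams(streams):
    """Interleave instruction items from several streams into one serial stream.

    Round-robin order is an assumption; the abstract does not fix a policy.
    """
    serial = []
    for group in zip_longest(*streams, fillvalue=_EMPTY):
        serial.extend(item for item in group if item is not _EMPTY)
    return serial


def distribute(serial, num_pes):
    """Give every processing element its own copy of the serial stream (broadcast)."""
    return [list(serial) for _ in range(num_pes)]


serial = combine_streams([["a1", "a2"], ["b1"], ["c1", "c2", "c3"]])
print(serial)  # ['a1', 'b1', 'c1', 'a2', 'c2', 'c3']
```

A sentinel is used instead of `None` so that a stream may legitimately contain `None` items.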
-
Patent number: 7515899
Abstract: Additional computing power is captured using the idle processing power of mobile phones incorporated into a grid computing system, wherein the system is capable of pushing projects out to available mobile phones for processing during idle operation times. To further efficiently utilize the unused processing cycles of mobile phones, a unique protocol is utilized to coordinate processing tasks which makes use of existing short message techniques to communicate projects. The unique protocol is a combination of bootstrapping using standard compression techniques along with an adaptive compression scheme.
Type: Grant
Filed: April 23, 2008
Date of Patent: April 7, 2009
Assignee: International Business Machines Corporation
Inventors: Hollie Carr, Peter Mattison, Christopher E. Sharp
-
Patent number: 7509442
Abstract: An informational-signal-processing apparatus has a plurality of functional blocks and a control block that controls operations of the functional blocks. Each of the functional blocks performs a series of items of processing. The control block or a predetermined block among the control block and the functional blocks distributes a global command. Each of the functional blocks receives the global command and operates adaptively based on the received global command. The functional blocks output a block-to-block synchronizing signal at an output timing of a processed informational signal that has been performed on the basis of the global command.
Type: Grant
Filed: November 30, 2006
Date of Patent: March 24, 2009
Assignee: Sony Corporation
Inventors: Seiji Wada, Tetsujiro Kondo, Yoshihiro Wakita, Takuya Oshima
-
Patent number: 7506134
Abstract: The present invention enables efficient matrix multiplication operations on parallel processing devices. One embodiment is a method for mapping CTAs to result matrix tiles for matrix multiplication operations. Another embodiment is a second method for mapping CTAs to result tiles. Yet other embodiments are methods for mapping the individual threads of a CTA to the elements of a tile for result tile computations, source tile copy operations, and source tile copy and transpose operations. The present invention advantageously enables result matrix elements to be computed on a tile-by-tile basis using multiple CTAs executing concurrently on different streaming multiprocessors, enables source tiles to be copied to local memory to reduce the number of accesses from the global memory when computing a result tile, and enables coalesced read operations from the global memory as well as write operations to the local memory without bank conflicts.
Type: Grant
Filed: June 16, 2006
Date of Patent: March 17, 2009
Assignee: NVIDIA Corporation
Inventors: Norbert Juffa, Radoslav Danilak
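The tile-by-tile idea can be illustrated with a pure-Python stand-in for a GPU kernel. This is a sketch under assumptions, not NVIDIA's mapping: the outer loops play the role of CTAs (one result tile each), the inner loops play the role of threads (one result element each), slicing a source tile stands in for copying it to local memory, and the tile size of 2 is arbitrary.

```python
def tiled_matmul(A, B, tile=2):
    """Compute C = A @ B one result tile at a time."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    # Each (ti, tj) pair plays the role of one CTA that owns one result tile.
    for ti in range(0, n, tile):
        for tj in range(0, m, tile):
            # Walk the shared dimension in tile-sized steps.
            for tk in range(0, k, tile):
                # "Copy" the two source tiles to local memory before using them.
                a_tile = [row[tk:tk + tile] for row in A[ti:ti + tile]]
                b_tile = [row[tj:tj + tile] for row in B[tk:tk + tile]]
                # Each (i, j) plays the role of one thread computing one element.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        C[ti + i][tj + j] += sum(
                            a_row[p] * b_tile[p][j] for p in range(len(b_tile))
                        )
    return C


print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Edge tiles are simply smaller slices, so matrix sizes need not be multiples of the tile size.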
-
Patent number: 7506297
Abstract: An automatically reconfigurable high performance FPGA system that includes a hybrid FPGA network and an automated scheduling, partitioning and mapping software tool adapted to configure the hybrid FPGA network in order to implement a functional task. The hybrid FPGA network includes a plurality of field programmable gate arrays, at least one processor, and at least one memory. The automated software tool is adapted to carry out the steps of scheduling portions of a functional task in a time sequence, partitioning a plurality of elements of the hybrid FPGA network by allocating or assigning network resources to the scheduled portions of the functional task, mapping the partitioned elements into a physical hardware design for implementing the functional task on the plurality of elements of the hybrid FPGA network, and iteratively repeating the scheduling, partitioning and mapping steps to reach an optimal physical hardware design.
Type: Grant
Filed: June 15, 2005
Date of Patent: March 17, 2009
Assignee: University of North Carolina at Charlotte
Inventors: Arindam Mukherjee, Arun Ravindran
-
Patent number: 7502915
Abstract: The present invention provides an adaptive computing engine (ACE) that includes processing nodes having different capabilities such as arithmetic nodes, bit-manipulation nodes, finite state machine nodes, input/output nodes and a programmable scalar node (PSN). In accordance with one embodiment of the present invention, a common architecture is adaptable to function either as a kernel node (k-node) or as a general-purpose RISC node. The k-node acts as a system controller responsible for adapting other nodes to perform selected functions. As a RISC node, the PSN is configured to perform computationally intensive applications such as signal processing.
Type: Grant
Filed: September 29, 2003
Date of Patent: March 10, 2009
Assignee: NVIDIA Corporation
Inventors: Rojit Jacob, Dan Minglun Chuang
-
Patent number: 7493475
Abstract: An improved superscalar processor. The processor includes multiple lanes, allowing multiple instructions in a bundle to be executed in parallel. In vector mode, the parallel lanes may be used to execute multiple instances of a bundle, representing multiple iterations of the bundle in a vector run. Scheduling logic determines whether, for each bundle, multiple instances can be executed in parallel. If multiple instances can be executed in parallel, coupling circuitry couples an instance of the bundle from one lane into one or more other lanes. In each lane, register addresses are renamed to ensure proper execution of the bundles in the vector run. Additionally, the processor may include a register bank separate from the architectural register file. Renaming logic can generate addresses to this separate register bank that are longer than those used to address architectural registers, allowing longer vectors and more efficient processor operation.
Type: Grant
Filed: November 15, 2006
Date of Patent: February 17, 2009
Assignee: STMicroelectronics, Inc.
Inventor: Osvaldo M. Colavin
-
Publication number: 20090031103
Abstract: A patch apparatus in a microprocessor is provided. The patch apparatus includes a plurality of fuse banks and an array controller. The plurality of fuse banks is configured to store associated patch records that are employed to patch microcode or circuits in the microprocessor. The array controller is coupled to the plurality of fuse banks, and is configured to read the associated patch records, and is configured to provide the associated patch records to a patch loader, where the patch loader provides patches corresponding to the associated patch records, as prescribed, to designated target patch mechanisms in the microprocessor. The patch loader provides the patches to the designated target patch mechanisms following transition of a microprocessor reset signal and prior to execution of instructions stored in a BIOS ROM.
Type: Application
Filed: July 24, 2007
Publication date: January 29, 2009
Applicant: VIA TECHNOLOGIES
Inventors: G. GLENN HENRY, TERRY PARKS
-
Patent number: 7483595
Abstract: An image processing method and device for processing multiple rows of pixels of an image simultaneously with a single instruction. The processing includes selecting a pixel window having a plurality of pixels of an image spanning across multiple rows and columns, building vertical and horizontal load registers to include the plurality of pixels of the selected pixel window, and simultaneously processing selected pixels of the plurality of pixels included in the vertical and horizontal load registers using a single instruction, wherein the vertical and horizontal load registers are shifted when the selected pixels are processed. Accordingly, a method and device for efficient processing of an image is provided.
Type: Grant
Filed: September 16, 2004
Date of Patent: January 27, 2009
Assignee: Marvell International Technology Ltd.
Inventors: Douglas Gene Keithley, Roy Gideon Moss
-
Patent number: 7480785
Abstract: A row decoding circuit (171) outputs a select signal to a row set in a row range setting unit (172) to select a select signal line (103), processing results from processing circuits (102) on this row are output to a data output line (104), and a row adder (106) adds processing results output to a data output line (104) of a column set in a column range selector (105).
Type: Grant
Filed: February 13, 2004
Date of Patent: January 20, 2009
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Toshishige Shimamura, Hiroki Morimura, Koji Fujii, Satoshi Shigematsu, Katsuyuki Machida
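The row-select and column-range summation can be modelled in a few lines of Python. This is a behavioural sketch only: it assumes inclusive (first, last) ranges and that the adder produces one sum per selected row; the function name `selected_region_sums` is hypothetical.

```python
def selected_region_sums(results, row_range, col_range):
    """Return one row-adder output per selected row.

    results[r][c] is the processing result of the circuit at row r, column c.
    Ranges are inclusive (first, last) pairs; that convention is an assumption.
    """
    r_first, r_last = row_range
    c_first, c_last = col_range
    sums = []
    for r in range(r_first, r_last + 1):
        # The row decoder asserts the select line for row r only, so the
        # data output lines carry that row's processing results ...
        data_lines = results[r]
        # ... and the row adder sums the data output lines in the column range.
        sums.append(sum(data_lines[c_first:c_last + 1]))
    return sums


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(selected_region_sums(grid, row_range=(1, 2), col_range=(0, 1)))  # [9, 15]
```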
-
Patent number: 7472392
Abstract: One aspect of the present invention relates to a method for balancing the load of an n-dimensional array of processing elements (PEs), wherein each dimension of the array includes the processing elements arranged in a plurality of lines and wherein each of the PEs has a local number of tasks associated therewith. The method comprises balancing at least one line of PEs in a first dimension, balancing at least one line of PEs in a next dimension, and repeating the balancing at least one line of PEs in a next dimension for each dimension of the n-dimensional array. The method may further comprise selecting one or more lines within said first dimension and shifting the number of tasks assigned to PEs in said selected one or more lines.
Type: Grant
Filed: October 20, 2003
Date of Patent: December 30, 2008
Assignee: Micron Technology, Inc.
Inventor: Mark Beaumont
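The dimension-by-dimension order of balancing can be shown for the two-dimensional case with a short Python sketch. It computes each line's target task counts directly instead of shifting tasks between neighbouring PEs as the abstract describes, and giving the remainder to the PEs at the front of a line is an assumption; `balance_line` and `balance_2d` are hypothetical names.

```python
def balance_line(counts):
    """Spread the tasks of one line of PEs as evenly as possible (remainder to the front)."""
    base, extra = divmod(sum(counts), len(counts))
    return [base + (1 if i < extra else 0) for i in range(len(counts))]


def balance_2d(grid):
    """Balance every line in the first dimension (rows), then in the next (columns)."""
    rows_balanced = [balance_line(row) for row in grid]
    cols_balanced = [balance_line(list(col)) for col in zip(*rows_balanced)]
    # Transpose back so the result is indexed [row][column] again.
    return [list(row) for row in zip(*cols_balanced)]


print(balance_2d([[8, 0, 0], [0, 0, 1], [3, 3, 3]]))  # [[3, 2, 2], [2, 2, 2], [2, 2, 1]]
```

The total number of tasks is preserved at every step; extending the sketch to n dimensions means repeating the line pass once per axis.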
-
Publication number: 20080307196
Abstract: A computer processor having an integrated instruction sequencer, array of processing engines, and I/O controller. The instruction sequencer sequences instructions from a host, and transfers these instructions to the processing engines, thus directing their operation. The I/O controller controls the transfer of I/O data to and from the processing engines in parallel with the processing controlled by the instruction sequencer. The processing engines themselves are constructed with an integer arithmetic and logic unit (ALU), a 1-bit ALU, a decision unit, and registers. Instructions from the instruction sequencer direct the integer ALU to perform integer operations according to logic states stored in the 1-bit ALU and data stored in the decision unit. The 1-bit ALU and the decision unit can modify their stored information in the same clock cycle as the integer ALU carries out its operation. The processing engines also contain a local memory for storing instructions and data.
Type: Application
Filed: May 28, 2008
Publication date: December 11, 2008
Inventors: Bogdan Mitu, Gheorghe Stefan, Dan Tomescu
-
Patent number: 7461234
Abstract: A heterogeneous array includes clusters of processing elements. The clusters include a combination of ALUs and multiplexers linked by direct connections and various general-purpose routing networks. The multiplexers are controlled by the ALUs in the same cluster, or alternatively by ALUs in other clusters, via a special purpose routing network. Components of applications configured onto the array are selectively implemented in either multiplexers or ALUs, as determined by the relative efficiency of implementing the component in one or the other type of processing element, and by the relative availability of the processing element types. Multiplexer control signals are generated from combinations of ALU status signals, and optionally routed to control multiplexers in different clusters.
Type: Grant
Filed: May 16, 2005
Date of Patent: December 2, 2008
Assignee: Panasonic Corporation
Inventors: Nicholas John Charles Ray, Andrea Olgiati, Anthony I. Stansfield, Alan D Marshall
-
Patent number: 7457939
Abstract: A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A processing system is provided for processing programs and data. The processing system has a processing unit and multiple sub-processing units. Each sub-processing unit includes a dedicated local memory for storing programs and data. The dedicated local memory of each respective sub-processing unit is not a cache memory. In an alternative, multiple computing devices may connect to one another via a communications network, and each computing device may include at least one processing element having the processing unit and sub-processing units.
Type: Grant
Filed: October 18, 2004
Date of Patent: November 25, 2008
Assignee: Sony Computer Entertainment Inc.
Inventors: Masakazu Suzuoki, Takeshi Yamazaki
-
Patent number: 7454593
Abstract: The present invention relates to the control of an array of processing elements in a parallel processor using row and column select lines. For each column in the array, a column select line connects to all of the processing elements in the column. For each row in the array, a row select line connects to all of the processing elements in the row. A processing element in the array may be selected by activation of its row and column select lines. The processing elements are connected to adjacent processing elements by respective segments of a row bus for each row and by respective segments of a column bus for each column. Each row of the array includes a respective column edge register coupled to a processing element at one end of the respective row and to a processing element at the other end of the respective row.
Type: Grant
Filed: April 11, 2003
Date of Patent: November 18, 2008
Assignee: Micron Technology, Inc.
Inventor: Graham Kirsch
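The selection rule (a PE is selected only when both its row line and its column line are active) amounts to a logical AND over the two sets of lines. The Python sketch below models the lines as lists of booleans, which is an assumption for illustration; `selected_pes` is a hypothetical name and the bus segments and edge registers are not modelled.

```python
def selected_pes(row_select, col_select):
    """Return (row, col) positions of PEs whose row AND column select lines are active."""
    return [
        (r, c)
        for r, row_active in enumerate(row_select)
        for c, col_active in enumerate(col_select)
        if row_active and col_active
    ]


# Activating one row line and two column lines selects exactly two PEs.
print(selected_pes([False, True, False], [True, False, True]))  # [(1, 0), (1, 2)]
```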
-
Publication number: 20080282061
Abstract: An array calculation device that includes a processor array composed of a plurality of processor elements having been assigned with orders, acquires an instruction in each cycle, generates, in each cycle, operation control information for controlling an operation of a processor element of a first order, and then generates an instruction to the processor element of the first order in accordance with the operation control information and the acquired instruction, and also generates, in each cycle, operation control information for controlling an operation of each processor element of a next order and onwards, in accordance with operation control information generated for controlling an operation of a processor element of an immediately preceding order, and then generates an instruction to each processor element of the next order and onwards, in accordance with the operation control information generated and the acquired instruction.
Type: Application
Filed: August 2, 2005
Publication date: November 13, 2008
Inventors: Hiroyuki Morishita, Takeshi Tanaka, Masaki Maeda, Yorihiko Wakayama
-
Patent number: 7451293
Abstract: A computer processor having an integrated instruction sequencer, array of processing engines, and I/O controller. The instruction sequencer sequences instructions from a host, and transfers these instructions to the processing engines, thus directing their operation. The I/O controller controls the transfer of I/O data to and from the processing engines in parallel with the processing controlled by the instruction sequencer. The processing engines themselves are constructed with an integer arithmetic and logic unit (ALU), a 1-bit ALU, a decision unit, and registers. Instructions from the instruction sequencer direct the integer ALU to perform integer operations according to logic states stored in the 1-bit ALU and data stored in the decision unit. The 1-bit ALU and the decision unit can modify their stored information in the same clock cycle as the integer ALU carries out its operation. The processing engines also contain a local memory for storing instructions and data.
Type: Grant
Filed: October 19, 2006
Date of Patent: November 11, 2008
Assignee: Brightscale Inc.
Inventors: Bogdan Mitu, Gheorghe Stefan, Dan Tomescu
-
Patent number: 7451292
Abstract: Quantum gaps exist between an origin and a destination that heretofore have prevented reliably utilizing the advantages of quantum computing. To predict the outcome of instructions with precision, the input data, preferably a qubit, is collapsed to a point value within the quantum gap based on a software instruction. After collapse the input data is restructured at the destination, wherein dynamics of restructuring are governed by a plurality of gap factors as follows: computational self-awareness; computational decision logic; computational processing logic; computational and network protocol and logic exchange; computational and network components, logic and processes; provides the basis for excitability of the Gap junction and its ability to transmit electronic and optical impulses, integrates them properly, and depends on feedback loop logic; computational and network component and system interoperability; and embodiment substrate and network computational physical topology.
Type: Grant
Filed: August 8, 2003
Date of Patent: November 11, 2008
Inventor: Thomas J Routt
-
Patent number: 7447872
Abstract: An inter-chip communication (ICC) mechanism enables any processor in a pipelined arrayed processing engine to communicate directly with any other processor of the engine over a low-latency communication path. The ICC mechanism includes a unidirectional control plane path that is separate from a data plane path of the engine and that accommodates control information flow among the processors. The mechanism thus enables inter-processor communication without sending messages over the data plane communication path extending through processors of each pipeline.
Type: Grant
Filed: May 30, 2002
Date of Patent: November 4, 2008
Assignee: Cisco Technology, Inc.
Inventors: Russell Schroter, John William Marshall, Kenneth H. Potter
-
Patent number: 7441100
Abstract: A method for synchronizing a plurality of processors of a multi-processor computer system on a synchronization point is disclosed. The method includes triggering a first set of processors, using a lead processor of the plurality of processors when the lead processor encounters the synchronization point, to enter an exit holding loop. The first set of processors consists of the plurality of processors except the lead processor. Triggering the first set of processors is performed without accessing a shared memory area of the multi-processor system. The method also includes triggering the plurality of processors, using a tail processor of the plurality of processors when the tail processor encounters the synchronization point, to leave the exit holding loop. Triggering the plurality of processors is performed without accessing the shared memory area of the multi-processor system.
Type: Grant
Filed: February 27, 2004
Date of Patent: October 21, 2008
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Chenghung Justin Chen, John W. Curry, Robert Seymour
-
Publication number: 20080229059
Abstract: Each processor node in an array of nodes has a respective local node address, and each local node address comprises a plurality of components having an order of addressing significance from most to least significant. Each node comprises: mapping means configured to map each component of the local node address onto a respective routing direction, and a switch arranged to receive a message having a destination node address identifying a destination node. The switch comprises: means for comparing the local node address to the destination node address to identify the most significant non-matching component; and means for routing the message to another node, on the condition that the local node address does not match the destination node address, in the direction mapped to the most significant non-matching component.
Type: Application
Filed: March 14, 2007
Publication date: September 18, 2008
Inventor: Michael David May
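The routing decision can be sketched in Python. The abstract leaves the component-to-direction mapping configurable, so the sketch takes it as a list argument; the hop-by-hop `route` helper further assumes binary address components on a hypercube-like array in which following the direction for component i flips that component. Both function names are hypothetical.

```python
def next_direction(local, dest, directions):
    """Pick the output direction for a message, or None if it has arrived.

    local, dest: node addresses as tuples of components, most significant first.
    directions[i]: the routing direction mapped to component i.
    """
    for i, (mine, theirs) in enumerate(zip(local, dest)):
        if mine != theirs:
            return directions[i]  # the most significant non-matching component wins
    return None  # every component matches: deliver locally


def route(src, dest, directions):
    """Follow a message hop by hop, assuming direction i flips binary component i."""
    node, hops = src, []
    while (d := next_direction(node, dest, directions)) is not None:
        i = directions.index(d)
        node = node[:i] + (1 - node[i],) + node[i + 1:]
        hops.append((d, node))
    return hops


print(route((1, 0, 1), (0, 1, 1), ["d0", "d1", "d2"]))
# [('d0', (0, 0, 1)), ('d1', (0, 1, 1))]
```

Because each hop fixes the most significant mismatch first, a message never revisits a component it has already corrected, so the route length equals the number of differing components.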
-
Patent number: 7426448
Abstract: A mechanism for diagnosing broken scan chains based on leakage light emission is provided. An image capture mechanism detects light emission from leakage current in complementary metal oxide semiconductor (CMOS) devices. The diagnosis mechanism identifies devices with unexpected light emission. An unexpected amount of light emission may indicate that a transistor is turned off when it should be turned on or vice versa. All possible inputs may be tested to determine whether a problem exists with transistors in latches or with transistors in clock buffers. Broken points in the scan chain may then be determined based on the locations of unexpected light emission.
Type: Grant
Filed: February 3, 2004
Date of Patent: September 16, 2008
Assignee: International Business Machines Corporation
Inventors: Peilin Song, Tian Xia, Alan J. Weger, Franco Stellari, Stanislav V. Polonsky
-
Patent number: 7418541
Abstract: A method and apparatus are provided for a support interface for memory-mapped resources. A support processor sends a sequence of commands over an FSI interface to a memory-mapped support interface on a processor chip. The memory-mapped support interface updates memory, memory-mapped registers or memory-mapped resources. The interface uses fabric packet generation logic to generate a single command packet in a protocol for the coherency fabric which consists of an address, command and/or data. Fabric commands are converted to FSI protocol and forwarded to attached support chips to access the memory-mapped resource, and responses from the support chips are converted back to fabric response packets. Fabric snoop logic monitors the coherency fabric and decodes responses for packets previously sent by fabric packet generation logic. The fabric snoop logic updates a status register and/or writes response data to a read data register. The system also reports any errors that are encountered.
Type: Grant
Filed: February 10, 2005
Date of Patent: August 26, 2008
Assignee: International Business Machines Corporation
Inventors: James Stephen Fields, Jr., Paul Frank Lecocq, Brian Chan Monwai, Thomas Pflueger, Kevin Franklin Reick, Timothy M. Skergan, Scott Barnett Swaney
-
Patent number: 7401333
Abstract: The present invention provides an array of parallel programmable processing engines interconnected by a switching network. At least some of the processing engines execute a thread, and at least some threads communicate with each other through communication objects either internally within one processing engine or through the network. A scheduling step of the parallel programmable processing engines is initiated by one or more events, an event being defined by a change of a state variable of a communication object. The array comprises: means for scheduling a scheduling step of the processing engines, the scheduling means comprising means for executing at least a first set of threads in parallel, means for updating state values of communications objects in response to the parallel executing step, and means for repeatedly and sequentially scheduling the executing means and the updating means until no more events occur. The present invention also provides a deterministic method of operating such an array.
Type: Grant
Filed: August 8, 2001
Date of Patent: July 15, 2008
Assignee: TranSwitch Corporation
Inventor: Ivo Vandeweerd
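The execute-then-update cycle that repeats until no more events occur can be sketched in Python. Assumptions: communication objects are entries in a dict, threads are callables that read a snapshot and return their writes, and if two threads write the same object the later one in the list wins. Reading from a snapshot is what makes the "parallel" execution deterministic here; `run_until_quiescent` is a hypothetical name.

```python
def run_until_quiescent(state, threads, max_steps=1000):
    """Repeat scheduling steps until no communication object changes (no more events).

    state: dict mapping communication-object names to values (updated in place).
    threads: callables that read a snapshot and return a dict of writes.
    Returns the final state and the number of steps that produced events.
    """
    for step in range(max_steps):
        snapshot = dict(state)  # every thread sees the same values: "parallel" execution
        writes = {}
        for thread in threads:
            writes.update(thread(snapshot))
        # An event is a change of a state variable; no events means scheduling stops.
        events = {name: value for name, value in writes.items() if state.get(name) != value}
        if not events:
            return state, step
        state.update(events)
    raise RuntimeError("no quiescent state within max_steps")


# Two threads forming a two-stage pipeline: in -> mid -> out.
threads = [lambda s: {"mid": s["in"]}, lambda s: {"out": s["mid"]}]
print(run_until_quiescent({"in": 1, "mid": 0, "out": 0}, threads))
# ({'in': 1, 'mid': 1, 'out': 1}, 2)
```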
-
Patent number: 7392350
Abstract: In a multiprocessor environment, by executing cache-inhibited reads or writes to registers, a scan communication is used to rapidly access registers inside and outside a chip originating the command. Cumbersome locking of the memory location may be thus avoided. Setting of busy latches at the outset virtually eliminates the chance of collisions, and status bits are set to inform the requesting core processor that a command is done and free of error, if that is the case.
Type: Grant
Filed: February 10, 2005
Date of Patent: June 24, 2008
Assignee: International Business Machines Corporation
Inventors: James Stephen Fields, Jr., Michael Stephen Floyd, Paul Frank Lecocq, Larry Scott Leitner, Kevin Franklin Reick
-
Publication number: 20080148010
Abstract: The system design is facilitated by eliminating the increase in data transfer volume of the whole system. In order to facilitate the system design, there are provided an operation unit array, a memory array, a data transfer circuit, and a switch circuit. There are also provided a configuration data management unit for managing the configuration data defining the logical behaviors of the operation unit array, the memory array, the data transfer circuit, and the switch circuit, as well as a state transition management unit capable of controlling the switching of the configuration data. The data transfer circuit includes a control circuit capable of autonomously sorting the data by determining the timing of the data sorting according to the setting included in the configuration data.
Type: Application
Filed: December 14, 2007
Publication date: June 19, 2008
Inventor: Tomoyuki KODAMA
-
Patent number: 7369683
Abstract: In an imaging device of the present invention, an imaging element 2 is driven in a thinning read-out mode for reading out signal charges from a subset of pixels, or in an all-pixels read-out mode for reading out signal charges from all pixels. When the imaging element 2 is driven in the thinning read-out mode, the imaging device processes and records a series of first image data that is obtained by reading out signal charges from the subset of pixels and that constitutes the moving images. When the imaging element 2 is driven in the all-pixels read-out mode, the imaging device processes and records a series of second image data constituting moving images after the number of pixels of the second image data is thinned, and processes and records a portion of the second image data as a still image without thinning when an instruction to pick up the still image is given while picking up the moving images.
Type: Grant
Filed: August 4, 2004
Date of Patent: May 6, 2008
Assignee: Sanyo Electric Co., Ltd.
Inventors: Akio Kobayashi, Shigeru Miki
-
Patent number: 7356819
Abstract: Methods, signals, devices and systems are provided for matching tasks with processing units. A region within a multi-faceted task space is allocated to a processing unit. A point in the multi-faceted task space is assigned to a task. The task is then associated with the processing unit if the region allocated to the processing unit is close to the point assigned to the task. The region allocated to a processing unit may be changed. If no assigned point for a task is sufficiently close to any allocated processing unit region, the task is suspended. Overlapping regions may be assigned to different processing units. In some implementations, the union of the allocated regions covers the task space, while in others it does not. Regions may also be allocated to wait conditions and one or more dimensions of a region may be allocated to conventional processor allocators.
Type: Grant
Filed: August 7, 2003
Date of Patent: April 8, 2008
Assignee: Novell, Inc.
Inventors: Glenn Ricart, Del Jensen, Stephen R. Carter
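One way to read "close to" is as a distance threshold between a task's point and each unit's region. The Python sketch below assumes regions are axis-aligned boxes, uses Euclidean distance, picks the nearest region, and returns `None` to mean "suspend the task"; none of these choices comes from the abstract, and the function names and unit labels are hypothetical.

```python
import math


def distance_to_region(point, region):
    """Euclidean distance from a task point to an axis-aligned box (0 if inside)."""
    low, high = region
    return math.sqrt(sum(max(lo - p, 0, p - hi) ** 2 for p, lo, hi in zip(point, low, high)))


def match_task(point, regions, max_distance):
    """Return the processing unit whose region is nearest the task, or None to suspend it."""
    best_unit, best_dist = None, math.inf
    for unit, region in regions.items():
        dist = distance_to_region(point, region)
        if dist < best_dist:
            best_unit, best_dist = unit, dist
    return best_unit if best_dist <= max_distance else None


regions = {"cpu0": ((0, 0), (1, 1)), "cpu1": ((5, 5), (6, 6))}
print(match_task((0.5, 0.5), regions, max_distance=0.0))  # cpu0
print(match_task((3, 3), regions, max_distance=0.5))  # None (task is suspended)
```

Overlapping regions are allowed by this sketch; a task inside several regions simply goes to whichever is listed first.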
-
Patent number: 7315933
Abstract: The present invention is a re-configurable circuit capable of reducing latency by selecting a route for skipping the FF of an operation unit and outputting data to a connection destination operation unit if an accumulated process time is below an operation cycle allocated to the operation unit. The operation unit comprises at least a selector, a flip-flop and an operator. In a program for generating configuration information for switching the configuration of the operation unit of the re-configurable circuit, the selector selects the use or non-use of the flip-flop based on the configuration information, and the selector switching condition is reflected in the configuration information for determining whether to take a route for transferring data inputted to the selector to the operator or a route for transferring the data to the operator skipping the flip-flop.
Type: Grant
Filed: October 6, 2005
Date of Patent: January 1, 2008
Assignee: Fujitsu Limited
Inventor: Seiichi Nishijima
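The latency-saving rule (skip a unit's flip-flop while the accumulated process time still fits in the operation cycle) can be sketched as a greedy pass over a chain of units. Assumptions: integer delays in arbitrary time units, each single delay no larger than the cycle, and the last unit always registering its output; `plan_flip_flops` is a hypothetical name.

```python
def plan_flip_flops(delays, cycle):
    """Decide, per operation unit, whether its flip-flop is used (True) or skipped (False).

    delays: processing time of each unit along a chain, in integer time units.
    cycle: the operation cycle allocated to the units, in the same units.
    """
    use_ff = []
    accumulated = 0
    for i, delay in enumerate(delays):
        accumulated += delay
        is_last = i == len(delays) - 1
        if not is_last and accumulated + delays[i + 1] <= cycle:
            use_ff.append(False)  # skip the FF: the next unit still fits in this cycle
        else:
            use_ff.append(True)  # register the result and start a new cycle
            accumulated = 0
    return use_ff


plan = plan_flip_flops([2, 3, 4, 1], cycle=5)
print(plan)  # [False, True, False, True]
print(sum(plan), "cycles of latency instead of", len(plan))  # 2 cycles of latency instead of 4
```

Each `True` marks a cycle boundary, so the number of flip-flops used equals the latency in cycles of the chain.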
-
Patent number: 7266255
Abstract: A multi-chip system is disclosed for distributing the convolution process. Rather than having multiple convolution chips working in parallel with each chip working on a different portion of the screen, a new design utilizes chips working in series. Each chip is responsible for a different interleaved region of screen space. Each chip performs part of the convolution process for a pixel and sends a partial result on to the next chip. The final chip completes the convolution and stores the filtered pixel. An alternate design interconnects chips in groups. The chips within a group operate in series, whereas the groups may operate in parallel.
Type: Grant
Filed: September 26, 2003
Date of Patent: September 4, 2007
Assignee: Sun Microsystems, Inc.
Inventors: Michael A. Wasserman, Paul R. Ramsey, Nathaniel David Naegle
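The series arrangement can be illustrated with a one-dimensional Python stand-in for screen space. Assumptions: an odd-length kernel, samples outside the array treated as zero, and chip c owning the interleaved samples whose index x satisfies x % num_chips == c; `serial_convolve` is a hypothetical name. Whatever the chip count, the chain produces the same value as a direct convolution.

```python
def serial_convolve(samples, kernel, center, num_chips):
    """Filter one output position by passing a partial sum through chips wired in series.

    Chip c owns the interleaved samples whose index x satisfies x % num_chips == c.
    """
    radius = len(kernel) // 2
    partial = 0  # the value travelling from chip to chip
    for chip in range(num_chips):
        for offset in range(-radius, radius + 1):
            x = center + offset
            if 0 <= x < len(samples) and x % num_chips == chip:
                partial += kernel[offset + radius] * samples[x]
        # At this point the chip forwards `partial` to the next chip in the chain.
    return partial  # the last chip holds the completed convolution


print([serial_convolve([1, 2, 3, 4, 5], [1, 2, 1], 2, n) for n in (1, 2, 3)])  # [12, 12, 12]
```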
-
Patent number: 7176914
Abstract: A system and method are provided for directing the flow of data and instructions into at least one functional unit. In one embodiment of a system of components defining a plurality of nodes, a queue network manager (QNM) forming a part of each node, is provided. In this embodiment, the QNM comprises an interface to a network that supports intercommunication among the plurality of nodes, an interface configured to pass messages with a functional unit within the node, a random access memory (RAM) configured to store at least one of a message and a programmable instruction, and logic configured to control an operational aspect of a functional unit based on contents of the programmable instruction.
Type: Grant
Filed: May 16, 2002
Date of Patent: February 13, 2007
Assignee: Hewlett-Packard Development Company, L.P.
Inventor: Darel N. Emmot
-
Patent number: 7155466
Abstract: An archive cluster application runs in a distributed manner across a redundant array of independent nodes. Each node preferably runs a complete archive cluster application instance. A given node provides a data repository, which stores up to a large amount (e.g., a terabyte) of data, while also acting as a portal that enables access to archive files. Each symmetric node has a set of software processes, e.g., a request manager, a storage manager, a metadata manager, and a policy manager. The request manager manages requests to the node for data (i.e., file data), the storage manager manages data read/write functions from a disk associated with the node, and the metadata manager facilitates metadata transactions and recovery across the distributed database. The policy manager implements one or more policies, which are operations that determine the behavior of an “archive object” within the cluster. The archive cluster application provides object-based storage.
Type: Grant
Filed: October 27, 2004
Date of Patent: December 26, 2006
Assignee: Archivas, Inc.
Inventors: Andres Rodriguez, Jack A. Orenstein, David M. Shaw, Benjamin K. D. Bernhard
-
Patent number: 7130934
Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.
Type: Grant
Filed: April 7, 2005
Date of Patent: October 31, 2006
Assignee: Altera Corporation
Inventors: Edwin Franklin Barry, Edward A. Wolff
-
Patent number: 7100020Abstract: An integrated circuit (203) for use in processing streams of data generally and streams of packets in particular. The integrated circuit (203) includes a number of packet processors (307, 313, 303), a table look up engine (301), a queue management engine (305) and a buffer management engine (315). The packet processors (307, 313, 303) include a receive processor (421), a transmit processor (427) and a RISC core processor (401), all of which are programmable. The receive processor (421) and the core processor (401) cooperate to receive and route packets being received and the core processor (401) and the transmit processor (427) cooperate to transmit packets. Routing is done by using information from the table look up engine (301) to determine a queue (215) in the queue management engine (305) which is to receive a descriptor (217) describing the received packet's payload.Type: GrantFiled: May 7, 1999Date of Patent: August 29, 2006Assignee: Freescale Semiconductor, Inc.Inventors: Thomas B. Brightman, Andrew T. Brown, John F. Brown, James A. Farrell, Andrew D. Funk, David J. Husak, Edward J. McLellan, Mark A. Sankey, Paul Schmitt, Donald A. Priore
-
Patent number: 7069416Abstract: A single chip active memory includes a plurality of memory stripes, each coupled to a full word interface and one of a plurality of processing element (PE) sub-arrays. The large number of couplings between a PE sub-array and its associated memory stripe are managed by placing the PE sub-arrays so that their data paths run at right angles to the data paths of the plurality of memory stripes. The data lines exiting the memory stripes are run across the PE sub-arrays on one metal layer. At the appropriate locations, the data lines are coupled to another orthogonally oriented metal layer to complete the coupling between the memory stripe and its associated PE sub-array. The plurality of PE sub-arrays are mapped to form a large logical array, in which each PE is coupled to four other PEs. Physically distant PEs are coupled using current mode differential logical couplings and drivers to ensure good signal integrity at high operational speeds. Each PE contains a small DRAM register array.Type: GrantFiled: June 4, 2004Date of Patent: June 27, 2006Assignee: Micron Technology, Inc.Inventor: Graham Kirsch
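As a rough illustration of mapping several PE sub-arrays into one large logical array where each PE has four neighbors, the sketch below assumes equally sized rectangular sub-arrays tiled row by row and wraps the array edges so every PE has exactly four neighbors. The tiling order, the wraparound, and the function names are hypothetical assumptions, not details taken from the patent.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

def logical_position(subarray: int, local_row: int, local_col: int,
                     sub_rows: int, sub_cols: int, subarrays_per_row: int) -> Coord:
    """Map (sub-array, local row, local col) to a global logical coordinate.

    Assumption: sub-arrays are tiled row-major across the logical array.
    """
    block_row, block_col = divmod(subarray, subarrays_per_row)
    return block_row * sub_rows + local_row, block_col * sub_cols + local_col

def neighbors(row: int, col: int, rows: int, cols: int) -> List[Coord]:
    """Return the north, south, west and east neighbors of a PE.

    Assumption: edges wrap around so that every PE has four neighbors.
    """
    return [((row - 1) % rows, col), ((row + 1) % rows, col),
            (row, (col - 1) % cols), (row, (col + 1) % cols)]

# Four 8x8 sub-arrays tiled 2x2 form a 16x16 logical array.
print(logical_position(3, 0, 0, 8, 8, 2))  # (8, 8)
print(neighbors(0, 0, 16, 16))             # [(15, 0), (1, 0), (0, 15), (0, 1)]
```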
-
Patent number: 7069557Abstract: A virtual path feature in which several virtual channels share an assigned amount of bandwidth is implemented in a network processor. The network processor maintains a schedule indicative of respective times at which a plurality of virtual channels are to be serviced. An entry is read from the schedule. The entry corresponds to a current transmit cycle and includes a pointer to a channel descriptor for a virtual channel to be serviced in the current transmit cycle. A data cell for the virtual channel to be serviced in the current cycle is transmitted. An entry is added to the schedule to point to a channel descriptor that is pointed to by the channel descriptor for the virtual channel serviced in the current transmit cycle.Type: GrantFiled: May 23, 2002Date of Patent: June 27, 2006Assignee: International Business Machines CorporationInventor: Merwin Herscher Alferness
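The scheduling loop described above can be sketched as follows, assuming the schedule is a map from transmit cycle to channel descriptor and that the channels of one virtual path form a circular list of descriptors. `ChannelDescriptor`, `run_schedule`, and the `interval` parameter (standing in for the bandwidth assigned to the path) are hypothetical names; the point is only that the path gets one cell per `interval` cycles no matter how many virtual channels share it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ChannelDescriptor:
    """Hypothetical descriptor for one virtual channel."""
    name: str
    # Next virtual channel on the same virtual path (circular list).
    next_channel: Optional["ChannelDescriptor"] = field(
        default=None, repr=False, compare=False)

def run_schedule(schedule: Dict[int, ChannelDescriptor], cycles: int,
                 interval: int = 1) -> List[str]:
    """Service the schedule for `cycles` transmit cycles.

    `interval` is the spacing between cells of the virtual path, i.e. the
    path's share of the link; all channels on the path split that share.
    """
    sent: List[str] = []
    for cycle in range(cycles):
        desc = schedule.pop(cycle, None)   # read the entry for this cycle
        if desc is None:
            continue                       # nothing scheduled: idle cycle
        sent.append(desc.name)             # "transmit" one cell for the channel
        nxt = desc.next_channel
        if nxt is not None:
            slot = cycle + interval
            while slot in schedule:        # skip slots already taken
                slot += 1
            schedule[slot] = nxt           # add entry for the pointed-to channel
    return sent

a, b, c = ChannelDescriptor("a"), ChannelDescriptor("b"), ChannelDescriptor("c")
a.next_channel, b.next_channel, c.next_channel = b, c, a
print(run_schedule({0: a}, cycles=6))  # ['a', 'b', 'c', 'a', 'b', 'c']
```

With `interval=2` the same three channels get only cycles 0, 2 and 4 out of six, so adding channels to the path divides its fixed share rather than consuming more of the link.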
-
Patent number: 7043562Abstract: Irregularities are provided in at least one dimension of a torus or mesh network for lower average path length and lower maximum channel load while increasing tolerance for omitted end-around connections. In preferred embodiments, all nodes supported on each backplane are connected in a single cycle which includes nodes on opposite sides of lower dimension tori. The cycles in adjacent backplanes hop different numbers of nodes.Type: GrantFiled: June 9, 2003Date of Patent: May 9, 2006Assignee: Avivi Systems, Inc.Inventors: William J. Dally, William F. Mann, Philip P. Carvey
-
Patent number: 7028107Abstract: A system for communication between a plurality of functional elements in a cell arrangement and a higher-level unit is described. The system may include, for example, a configuration memory arranged between the functional elements and the higher-level unit; and a control unit configured to move at least one position pointer to a configuration memory location in response to at least one event reported by a functional element. At run time, a configuration word in the configuration memory pointed to by at least one of the position pointers is transferred to the functional element in order to perform reconfiguration without the configuration word being managed by a central logic.Type: GrantFiled: October 7, 2002Date of Patent: April 11, 2006Assignee: Pact XPP Technologies AGInventors: Martin Vorbach, Robert Münch
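A minimal sketch of the event-driven position pointer, assuming a single pointer that advances one location per reported event and wraps at the end of the configuration memory. The class names, the one-step advance, and the wraparound are illustrative assumptions; the patent describes the mechanism only at the level of the abstract.

```python
from typing import List, Optional

class FunctionalElement:
    """A reconfigurable cell that holds its current configuration word."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.config: Optional[str] = None

    def load(self, word: str) -> None:
        self.config = word

class ConfigurationMemory:
    """Configuration words plus a position pointer moved by reported events."""
    def __init__(self, words: List[str]) -> None:
        self.words = words
        self.pointer = 0  # initially points at the first configuration word

    def on_event(self, element: FunctionalElement) -> str:
        # Move the pointer in response to the event (wrapping at the end)...
        self.pointer = (self.pointer + 1) % len(self.words)
        # ...then transfer the word it now points to directly to the element,
        # with no central logic managing the word.
        word = self.words[self.pointer]
        element.load(word)
        return word

mem = ConfigurationMemory(["cfg_a", "cfg_b", "cfg_c"])
pe = FunctionalElement("pe0")
print([mem.on_event(pe) for _ in range(3)])  # ['cfg_b', 'cfg_c', 'cfg_a']
```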
-
Patent number: 7020761Abstract: Processing restrictions of a computing environment are filtered and blocked, in certain circumstances, such that processing continues despite the restrictions. One restriction includes an indication that address translation is prohibited, in response to a buffer miss. When a processing unit of the computing environment is met with this restriction, it performs a comparison of page indices, which indicates whether the address translation can continue. If address translation can continue, the restriction is ignored. The processing unit includes a processor or a pageable entity, as examples.Type: GrantFiled: May 12, 2003Date of Patent: March 28, 2006Assignee: International Business Machines CorporationInventors: Timothy J. Siegel, Bruce A. Wagar, Ute Gaertner, Lisa C. Heller, Erwin F. Pfeffer
-
Patent number: 6996504Abstract: A scalable computer architecture capable of performing fully scalable simulations includes a plurality of processing elements (PEs) and a plurality of interconnections between the PEs. In this regard, the interconnections can interconnect each processing element to each neighboring processing element located adjacent the respective processing element, and further interconnect at least one processing element to at least one other processing element located remote from the respective at least one processing element. For example, the interconnections can interconnect the plurality of processing elements according to a fractal-type method or a quenched random method. Further, the plurality of interconnections can include at least one interconnection at each length scale of the plurality of processing elements.Type: GrantFiled: November 14, 2001Date of Patent: February 7, 2006Assignee: Mississippi State UniversityInventors: Mark A. Novotny, Gyorgy Korniss
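To see why adding remote interconnections helps, the sketch below builds a ring of PEs with nearest-neighbor links and, optionally, a fixed ("quenched") random pairing of PEs as extra long-range links, then measures the average shortest-path length with breadth-first search. The random-matching construction and all function names are assumptions chosen for illustration; they are not the patent's fractal-type or quenched random method as claimed.

```python
import random
from collections import deque
from typing import Dict, Set

def build_network(n: int, long_range: bool = True, seed: int = 0) -> Dict[int, Set[int]]:
    """Ring of n PEs; optionally add one random long-range link per PE pair.

    The random pairing is drawn once (quenched) from a seeded generator.
    """
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    if long_range:
        rng = random.Random(seed)
        nodes = list(range(n))
        rng.shuffle(nodes)
        for a, b in zip(nodes[::2], nodes[1::2]):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def average_path_length(adj: Dict[int, Set[int]]) -> float:
    """Mean shortest-path hop count over all ordered pairs of distinct PEs."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

print(average_path_length(build_network(8, long_range=False)))  # about 2.29 (= 16/7)
print(average_path_length(build_network(8)))                    # never larger
```

Adding links can only shorten shortest paths, so the long-range variant never has a larger average path length than the plain ring.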
-
Patent number: 6993764Abstract: A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval.Type: GrantFiled: June 28, 2001Date of Patent: January 31, 2006Assignee: The Regents of the University of CaliforniaInventors: Fabrizio Petrini, Wu-chun Feng
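The strobe-time exchange can be sketched as a sum of per-processor job counts. This is a sequential simulation under the assumption that each processor's buffer simply records the destination of every job it queued during the interval; in a real MPI program the same totals could be obtained with a collective such as `MPI_Allreduce` over per-processor count vectors. The function name is hypothetical.

```python
from typing import List

def strobe_exchange(outgoing: List[List[int]]) -> List[int]:
    """Simulate the global exchange performed during a strobe interval.

    outgoing[p] lists the destination processor of each job that processor p
    buffered during the last time interval. Returns, for every processor,
    the number of incoming jobs it should expect in the next interval.
    """
    incoming = [0] * len(outgoing)
    for dests in outgoing:
        for d in dests:
            incoming[d] += 1
    return incoming

# Processor 0 queued jobs for 1 and 2, processor 1 for 2, processor 2 for 0 and 2.
print(strobe_exchange([[1, 2], [2], [0, 2]]))  # [1, 1, 3]
```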
-
Patent number: 6990566Abstract: A method and an apparatus for configuration of multiple context processing elements (MCPEs) are described. The method and apparatus are capable of selectively transmitting data over a bidirectional shared bus network including a plurality of channels between pairs of MCPEs in the networked array. The method and apparatus then selectively transmit a sideband bit indicating the direction in which the data is transmitted in the shared bus network.Type: GrantFiled: April 20, 2004Date of Patent: January 24, 2006Assignee: Broadcom CorporationInventors: Ethan Mirsky, Robert French, Ian Eslick
-
Patent number: 6968442Abstract: A parallel computer of this invention includes a plurality of memory elements and a plurality of processing elements and each of the processing elements is connected to logically adjacent memory elements. For example, the processing element which corresponds to a logical position (i, j) is connected to the memory elements which correspond to a plurality of logical positions (i, j), (i, j+1), (i+1, j) and (i+1, j+1). It is preferable if each of the memory elements can be accessed from the exterior. According to this invention, efficient memory access can be made and the parallel processing can be performed at high speed without increasing the hardware amount and making the control operation complicated. Further, the operation speed of the image processing can be enhanced by constructing an image memory by use of a plurality of memory elements and causing the processing element to effect the image processing in a distributed and cooperative manner.Type: GrantFiled: June 19, 2002Date of Patent: November 22, 2005Assignee: Kabushiki Kaisha ToshibaInventors: Kenichi Maeda, Nobuyuki Takeda, Yasukazu Okamoto
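The connection rule in the example above, where PE (i, j) reaches memory elements (i, j), (i, j+1), (i+1, j) and (i+1, j+1), can be written as a small function. The sketch assumes a rows × cols grid of memory elements and simply drops out-of-range connections at the edges; that boundary handling and the function names are assumptions. The second function shows which memory elements two neighboring PEs both reach, i.e. where they could exchange data without extra interconnect.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]

def memory_elements_for_pe(i: int, j: int, rows: int, cols: int) -> List[Coord]:
    """Memory elements connected to PE (i, j): its own, right, below, diagonal.

    Assumption: PEs on the last row or column lose out-of-range connections.
    """
    candidates = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

def shared_elements(pe_a: Coord, pe_b: Coord, rows: int, cols: int) -> Set[Coord]:
    """Memory elements reachable from both PEs."""
    return (set(memory_elements_for_pe(*pe_a, rows, cols))
            & set(memory_elements_for_pe(*pe_b, rows, cols)))

print(memory_elements_for_pe(0, 0, 4, 4))             # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(shared_elements((0, 0), (0, 1), 4, 4)))  # [(0, 1), (1, 1)]
```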
-
Patent number: 6967950Abstract: In a network of digital signal processor nodes connected in a peer-to-peer relationship, a data packet sent to a node causes a return transmission from that node. The requester digital signal processor sends a data packet to a target digital signal processor. Upon arrival at the target digital signal processor, its receiver drives the arriving request packet into an I/O memory and triggers a transmitter interrupt. Next, the pull interrupt causes the transmitter to execute on a next packet boundary the pull request packet. Finally, the execution of the pull request causes the transmitter to pull a portion of the local I/O memory and send it back to the requester digital signal processor. The same physical portion of the I/O memory is overlaid with two logical uses, a receiver channel and a transmitter code block.Type: GrantFiled: July 13, 2001Date of Patent: November 22, 2005Assignee: Texas Instruments IncorporatedInventors: Peter Galicki, Cheryl S. Shepherd, Jonathan H. Thorn
-
Patent number: 6944747Abstract: A matrix data processor is implemented wherein data elements are stored in physical registers and mapped to logical registers. After being stored in the logical registers, the data elements are then treated as matrix elements. By using a series of variable matrix parameters to define the size and location of the various matrix source and destination elements, as well as the operation(s) to be performed on the matrices, the performance of digital signal processing operations can be significantly enhanced.Type: GrantFiled: December 9, 2002Date of Patent: September 13, 2005Assignee: GemTech Systems, LLCInventors: Gopalan N Nair, Gouri G. Nair
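A minimal sketch of treating register contents as matrices through variable parameters, assuming each matrix is described by a hypothetical tuple of base register, row count, column count and row stride. Here `regs` stands in for the logical register file; the physical-to-logical register mapping the abstract mentions is omitted, and the parameter layout is an illustrative assumption rather than the patent's encoding.

```python
from typing import List, Tuple

# Hypothetical matrix parameters: (base register, rows, cols, row stride).
MatrixSpec = Tuple[int, int, int, int]

def read_matrix(regs: List[float], spec: MatrixSpec) -> List[List[float]]:
    base, rows, cols, stride = spec
    return [[regs[base + r * stride + c] for c in range(cols)] for r in range(rows)]

def write_matrix(regs: List[float], spec: MatrixSpec, values: List[List[float]]) -> None:
    base, rows, cols, stride = spec
    for r in range(rows):
        for c in range(cols):
            regs[base + r * stride + c] = values[r][c]

def matrix_add(regs: List[float], a: MatrixSpec, b: MatrixSpec, dst: MatrixSpec) -> None:
    """Element-wise add two register-resident matrices of the same shape."""
    if a[1:3] != b[1:3] or a[1:3] != dst[1:3]:
        raise ValueError("matrix shapes must match")
    ma, mb = read_matrix(regs, a), read_matrix(regs, b)
    write_matrix(regs, dst, [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ma, mb)])

regs = [float(i) for i in range(16)]
# Registers 0-3 and 4-7 viewed as 2x2 matrices; the sum lands in 8-11.
matrix_add(regs, (0, 2, 2, 2), (4, 2, 2, 2), (8, 2, 2, 2))
print(regs[8:12])  # [4.0, 6.0, 8.0, 10.0]
```

Changing only the stride lets the same registers be viewed differently: `(0, 2, 2, 4)` selects registers 0, 1, 4 and 5 as a 2x2 matrix.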
-
Patent number: 6928539Abstract: A test monitor loaded into a multiprocessor machine comprises a program (31) designed to interpret a script language for writing tests, a program (29) that constitutes a kernel part for conducting the tests according to the scripts, and a library (30) of functions that constitutes an application program interface with the firmware of the machine (1). This monitor implements a method for executing instruction sequences simultaneously in several processors (3, 4, 5) of a multiprocessor machine (1). The method comprises a first step (8) in which a single-processor operating system is booted in a first processor (2) and a second step (9) in which the first processor (2) orders at least one other processor (3) of the machine, called an application processor, to execute one or more instruction sequences (17, 18, 19) under the control of said first processor.Type: GrantFiled: May 17, 2001Date of Patent: August 9, 2005Assignee: Bull S.A.Inventors: Claude Brassac, Alain Vigor
-
Patent number: 6919894Abstract: A system is described that is broadly directed to a system of integrated circuit components. The system comprises a plurality of nodes that are interconnected by communication links. A random access memory (RAM) is connected to each node. At least one functional unit is integrated into each node, and each functional unit is configured to carry out a predetermined processing function. Finally, each RAM includes a coherency mechanism configured to permit only read access to the RAM by other nodes, the coherency mechanism further configured to permit write access to the RAM only by functional units that are local to the node.Type: GrantFiled: July 21, 2003Date of Patent: July 19, 2005Assignee: Hewlett Packard Development Company, L.P.Inventors: Darel N. Emmot, Byron A. Alcorn
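The access rule enforced by each RAM's coherency mechanism (any node may read, only the owning node's functional units may write) can be sketched as below. Identifying a requester by its node number, the class name, and the use of `PermissionError` for a rejected write are illustrative assumptions.

```python
from typing import List

class NodeRAM:
    """RAM attached to one node, enforcing read-anywhere / write-locally."""

    def __init__(self, node_id: int, size: int) -> None:
        self.node_id = node_id
        self.data: List[int] = [0] * size

    def read(self, requester: int, addr: int) -> int:
        # Read access is permitted to local and remote nodes alike.
        return self.data[addr]

    def write(self, requester: int, addr: int, value: int) -> None:
        # Write access is permitted only to functional units on this node.
        if requester != self.node_id:
            raise PermissionError(
                f"node {requester} may not write the RAM of node {self.node_id}")
        self.data[addr] = value

ram = NodeRAM(node_id=0, size=4)
ram.write(requester=0, addr=1, value=42)  # local write succeeds
print(ram.read(requester=3, addr=1))      # remote read: 42
```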
-
Patent number: 6915388Abstract: A multiprocessor computer system includes a plurality of processor nodes, a memory, and an interconnect network connecting the plurality of processor nodes to the memory. The memory includes a plurality of lines and a cache coherence directory structure. The plurality of lines includes a first line. The cache coherence directory structure includes a plurality of directory structure entries. Each directory structure entry includes processor pointer information indicating the processor nodes that have cached copies of the first line. The processor pointer information includes a plurality n of bit vectors, where n is an integer greater than one. The n bit vectors define a matrix having a number of locations equal to the product of the number of bits in each of the n bit vectors.Type: GrantFiled: July 20, 2001Date of Patent: July 5, 2005Assignee: Silicon Graphics, Inc.Inventor: William A. Huffman
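A sketch of sharer tracking with n bit vectors, assuming each processor node number is split into one index per vector (mixed-radix digits) and that the directory can only report the cross product of set bits. With two 4-bit vectors the matrix has 4 × 4 = 16 locations, matching the product rule in the abstract. The node-numbering scheme and names are assumptions; note that this representation can report nodes that never cached the line, which is a general property of such coarse encodings rather than something the abstract states.

```python
from itertools import product
from typing import List, Sequence

class CoarseDirectoryEntry:
    """Sharer information held as n bit vectors instead of one bit per node."""

    def __init__(self, dims: Sequence[int]) -> None:
        self.dims = list(dims)              # bits in each vector, e.g. [4, 4]
        self.vectors = [0] * len(self.dims)

    def _coords(self, node: int) -> List[int]:
        # Split a node number into one index per vector (mixed-radix digits).
        coords = []
        for size in reversed(self.dims):
            node, digit = divmod(node, size)
            coords.append(digit)
        return coords[::-1]

    def add_sharer(self, node: int) -> None:
        for k, bit in enumerate(self._coords(node)):
            self.vectors[k] |= 1 << bit

    def possible_sharers(self) -> List[int]:
        # Every combination of set bits names a matrix location that may hold a copy.
        set_bits = [[b for b in range(size) if (vec >> b) & 1]
                    for size, vec in zip(self.dims, self.vectors)]
        nodes = []
        for coords in product(*set_bits):
            node = 0
            for size, digit in zip(self.dims, coords):
                node = node * size + digit
            nodes.append(node)
        return sorted(nodes)

entry = CoarseDirectoryEntry([4, 4])  # 4 x 4 = 16 matrix locations
entry.add_sharer(1)
entry.add_sharer(6)
print(entry.possible_sharers())       # [1, 2, 5, 6]
```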
-
Patent number: 6883084Abstract: A reconfigurable data path processor comprises a plurality of independent processing elements, each of which advantageously has an identical architecture. Each processing element comprises a plurality of data processing means for generating a potential output. Each processing element is also capable of passing an input through as a potential output with little or no processing. Each processing element comprises a conditional multiplexer having a first conditional multiplexer input, a second conditional multiplexer input and a conditional multiplexer output. A first potential output value is transmitted to the first conditional multiplexer input, and a second potential output value is transmitted to the second conditional multiplexer input. The conditional multiplexer couples either the first conditional multiplexer input or the second conditional multiplexer input to the conditional multiplexer output, according to an output control command.Type: GrantFiled: July 25, 2002Date of Patent: April 19, 2005Assignee: University of New MexicoInventor: Gregory Donohoe
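The conditional multiplexer can be modeled as a selector between two potential outputs. The sketch assumes the first potential output is a processed value and the second is the unprocessed input, and chains identical elements into a data path where each stage's control command either applies or bypasses its operation. The function names and the choice of operations are hypothetical.

```python
from typing import Callable, List, Tuple

def conditional_mux(first: int, second: int, select_second: bool) -> int:
    """Couple either input to the output according to the control command."""
    return second if select_second else first

def processing_element(x: int, op: Callable[[int], int], select_second: bool) -> int:
    processed = op(x)  # first potential output: result of the data processing
    passthrough = x    # second potential output: the input, unprocessed
    return conditional_mux(processed, passthrough, select_second)

def run_path(x: int, stages: List[Tuple[Callable[[int], int], bool]]) -> int:
    """Feed a value through a chain of identical processing elements."""
    for op, select_second in stages:
        x = processing_element(x, op, select_second)
    return x

stages = [(lambda v: v + 1, False),  # add one
          (lambda v: v * 2, True),   # bypassed: value passes through
          (lambda v: v * 2, False)]  # double
print(run_path(5, stages))  # 12
```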
-
Publication number: 20040215929Abstract: A method of communicating between processing units on different integrated circuit chips in a multi-processor computer system by issuing a command from a source processing unit to a destination processing unit, receiving the command at the destination processing unit while the destination processing unit is processing program instructions, and accessing registers in clock-controlled components of the destination processing unit without interrupting processing of the program instructions by the destination processing unit. The access may be a read from status or mode registers of the destination processing unit, or write to control or mode registers. Many processing units can be interconnected in a ring topology, and the access command can be passed from the source processing unit through several other processing units before reaching the destination processing unit.Type: ApplicationFiled: April 28, 2003Publication date: October 28, 2004Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael Stephen Floyd, Larry Scott Leitner, Kevin Franklin Reick, Kevin Dennis Woodling
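A minimal sketch of forwarding a register-access command around a unidirectional ring and serving it at the destination without disturbing the running program. It assumes three named registers, takes the read/write permissions from the abstract (read status or mode, write control or mode), and uses an instruction counter as a stand-in for program progress; all class, function and register names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

READABLE = {"status", "mode"}
WRITABLE = {"control", "mode"}

class ProcessingUnit:
    def __init__(self, uid: int) -> None:
        self.uid = uid
        self.registers: Dict[str, int] = {"status": 0, "mode": 0, "control": 0}
        self.retired = 0  # instructions completed by the running program

    def run(self, n: int) -> None:
        self.retired += n

    def access(self, op: str, reg: str, value: Optional[int] = None) -> Optional[int]:
        """Serve a register command without touching program state."""
        if op == "read" and reg in READABLE:
            return self.registers[reg]
        if op == "write" and reg in WRITABLE:
            self.registers[reg] = value
            return None
        raise ValueError(f"{op} not permitted on {reg}")

def send_command(ring: List[ProcessingUnit], source: int, dest: int, op: str,
                 reg: str, value: Optional[int] = None) -> Tuple[Optional[int], List[int]]:
    """Pass a command around the ring from source to dest; return (result, hops)."""
    path = []
    node = source
    while node != dest:
        node = (node + 1) % len(ring)  # forward to the next unit on the ring
        path.append(node)
    return ring[dest].access(op, reg, value), path

ring = [ProcessingUnit(i) for i in range(4)]
ring[2].registers["status"] = 7
ring[2].run(100)
print(send_command(ring, 0, 2, "read", "status"))  # (7, [1, 2])
print(ring[2].retired)                            # 100
```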