Array Processor Operation Patents (Class 712/16)
-
Patent number: 8145880Abstract: According to some embodiments, an integrated circuit comprises a microprocessor matrix of mesh-interconnected matrix processors. Each processor comprises a data switch including a data switch link register and matrix routing logic. The data switch link register includes one or more matrix link-enable register fields specifying a link enable status (e.g. a message-independent, p-to-p, and/or broadcast link enable status) for each inter-processor matrix link of the processor. The matrix routing logic routes inter-processor messages according to the matrix link-enable register field(s). A particular link may be selected by a current matrix processor by selecting an ordered list of matrix links according to a relationship between ?H and ?V, and choosing the first enabled link in the selected list for routing.Type: GrantFiled: July 7, 2008Date of Patent: March 27, 2012Assignee: OvicsInventors: Sorin C Cismas, Ilie Garbacea
-
Patent number: 8135229Abstract: An image processing method and device for processing multiple rows of pixels of an image simultaneously with a single instruction. The processing includes selecting a pixel window having a plurality of pixels of an image spanning across multiple rows and columns, building vertical and horizontal load registers to include the plurality of pixels of the selected pixel window, and simultaneously processing selected pixels of the plurality of pixels included in the vertical and horizontal load registers using a single instruction, wherein the vertical and horizontal load registers are shifted when the selected pixels are processed. Accordingly, a method and device for efficient processing of an image is provided.Type: GrantFiled: January 8, 2009Date of Patent: March 13, 2012Assignee: Marvell International Technology Ltd.Inventors: Douglas G. Keithley, Roy G. Moss
-
Patent number: 8131975Abstract: In some embodiments, an integrated circuit comprises a microprocessor matrix including a plurality of mesh-interconnected matrix processors, wherein each matrix processor comprises a data switch configured to direct inter-processor communications within the matrix, and a mapping unit configured to generate a data switch functionality map for a plurality of data switches in the microprocessor matrix. The data switch functionality map is generated by sending a first message through the matrix, and, setting a first functionality status designation for the first data switch in the data switch functionality map upon receiving a reply to the first message from a first data switch through the matrix.Type: GrantFiled: July 7, 2008Date of Patent: March 6, 2012Assignee: OvicsInventors: Sorin C Cismas, Ilie Garbacea
-
Patent number: 8112612Abstract: A processing system comprising processors and the dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.Type: GrantFiled: May 17, 2010Date of Patent: February 7, 2012Assignee: Coherent Logix, IncorporatedInventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
-
Patent number: 8108661Abstract: Provided are a data processing apparatus and a method of controlling the data processing apparatus. The data processing apparatus may select a single stream processor from a plurality of stream processors based on stream processor status information, and input data into the selected stream processor. The stream processor status information may include first status information of a processor core and second status information of at least one internal memory.Type: GrantFiled: April 2, 2009Date of Patent: January 31, 2012Assignee: Samsung Electronics Co., Ltd.Inventors: Won Jong Lee, Chan Min Park, Shi Hwa Lee
-
Patent number: 8103855Abstract: The present disclosure provides a methodology for reducing congestion of a processing unit, preferably by configuring a plurality of functional blocks to run in parallel or in series without the influence or input from the processing unit. In an embodiment, the present method chains a plurality of functional blocks together by software so that one functional block starts after the completion of another functional block. The configuration of the chain can be series, parallel, and any combination thereof, arranged to meet the circuit's objective. The chaining can be configured and re-configured, preferably by software input. The chaining can also be performed at design time or at run time. The chaining can also be modified, preferably at design time, but can also be modified at run time.Type: GrantFiled: June 29, 2008Date of Patent: January 24, 2012Assignee: Navosha CorporationInventors: Hirak Mitra, Raj Kulkarni, Richard Wicks, Michael Moon
-
Patent number: 8090913Abstract: A system has a first plurality of cores in a first coherency group. Each core transfers data in packets. The cores are directly coupled serially to form a serial path. The data packets are transferred along the serial path. The serial path is coupled at one end to a packet switch. The packet switch is coupled to a memory. The first plurality of cores and the packet switch are on an integrated circuit. The memory may or may not be on the integrated circuit. In another aspect a second plurality of cores in a second coherency group is coupled to the packet switch. The cores of the first and second pluralities may be reconfigured to form or become part of coherency groups different from the first and second coherency groups.Type: GrantFiled: December 20, 2010Date of Patent: January 3, 2012Assignee: Freescale Semiconductor, Inc.Inventors: Perry H. Pelley, III, George P. Hoekstra, Lucio F. Pessoa
-
Patent number: 8086824Abstract: A stream processing system includes a stream processing module coupled to a memory module and operable so as to fetch stream elements from the memory module, to process the stream elements fetched thereby, and to store processed stream elements in the memory module. The stream processing module includes a number (N) of stream processing units, and the memory module is configured with a number (N) of memory bank units each corresponding to a respective one of the stream processing units. The memory module is reconfigurable based on a desired inter-level configuration so that each of the memory bank units is configured to have a memory size sufficient to meet processing requirement of the respective one of the stream processing units.Type: GrantFiled: May 5, 2009Date of Patent: December 27, 2011Assignee: National Taiwan UniversityInventors: You-Ming Tsao, Liang-Gee Chen, Shao-Yi Chien
-
Publication number: 20110314255Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.Type: ApplicationFiled: June 17, 2010Publication date: December 22, 2011Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
-
Publication number: 20110307685Abstract: A multiprocessor system and method for performing matrix operations includes multiple processors cooperatively performing a sparse matrix operation. Distributed among the processors are non-zero matrix elements of first and second sparse matrices. Mapped across the processors are the matrix elements of a results matrix. Each processor receives, from the other processors, non-zero matrix elements of the first matrix that had been distributed to those other processors and generates partial results based on the received non-zero matrix elements of the first matrix and on the non-zero matrix elements of the second matrix distributed to that processor. Each processor receives those partial results generated by other processors and associated with the matrix elements of the results matrix mapped to that processor.Type: ApplicationFiled: June 6, 2011Publication date: December 15, 2011Inventor: William S. Song
-
Patent number: 8078835Abstract: A processor for performing floating-point operations includes an array of processing elements arranged to enable a floating-point operation. Each processing element includes an arithmetic logic unit to receive two input values and perform integer arithmetic on the received input values. The processing elements in the array are connected together in groups of two or more processing elements to enable floating-point operation.Type: GrantFiled: May 23, 2008Date of Patent: December 13, 2011Assignee: Core Logic, Inc.Inventors: Hoon Mo Yang, Man Hwee Jo, Il Hyun Park, Ki Young Choi
-
Patent number: 8078834Abstract: A digital signal processor includes a control block configured to issue instructions based on a stored program, and a compute array including two or more compute engines configured such that each of the issued instructions executes in successive compute engines of at least a subset of the compute engines at successive times. The digital signal processor may be utilized with a control processor or as a stand-alone processor. The compute array may be configured such that each of the issued instructions flows through successive compute engines of at least a subset of the compute engines at successive times.Type: GrantFiled: January 9, 2008Date of Patent: December 13, 2011Assignee: Analog Devices, Inc.Inventor: Douglas Garde
-
Patent number: 8078804Abstract: A data cache memory coupled to a processor including processor clusters are adapted to operate simultaneously on scalar and vectorial data by providing data locations in the data cache memory for storing data for processing. The data locations are accessed either in a scalar mode or in a vectorial mode. This is done by explicitly mapping the data locations that are scalar and the data locations that are vectorial.Type: GrantFiled: June 26, 2007Date of Patent: December 13, 2011Assignees: STMicroelectronics S.r.l., STMicroelectronics N.V.Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elena Salurso, Elio Guidetti
-
Publication number: 20110296137Abstract: A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, deterministic reduction operation include: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing, the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.Type: ApplicationFiled: May 28, 2010Publication date: December 1, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
-
Patent number: 8065337Abstract: Large-scale table data stored in a shared memory are sorted by a plurality of processors in parallel. According to the present invention, the records subjected to processing are first divided for allocation to the plurality of processors. Then, each processor counts the numbers of local occurrences of the field value sequence numbers associated with the records to be processed. The numbers of local occurrences of the field value sequence numbers counted by each processor is then converted into global cumulative numbers, i.e., the cumulative numbers used in common by the plurality of processors. Finally, each processor utilizes the global cumulative numbers as pointers to rearrange the order of the allocated records.Type: GrantFiled: August 13, 2010Date of Patent: November 22, 2011Assignee: Turbo Data Laboratories, Inc.Inventor: Shinji Furusho
-
Patent number: 8051277Abstract: A Programmable Arithmetic Logic Unit Cluster is claimed. The plurality of Programmable Logic Blocks (50) in the cluster are in a physically linear sequence; but, will process data in parallel when the data pathways permit. A physically linear and operationally parallel design is possible mainly due to an Internal Register Bus (30). A small subset of data from the Internal Register Bus (30) is modified by each Arithmetic Logic Unit (53). The greater chunk of information of the Internal Register Bus (30) passes, without changes made to the data, through a plurality of two-to-one-multiplexers (54) as data inputs, bypassing the Data Selection (52) and Arithmetic Logic Unit's (53) circuitry. Only the data specified as the Accumulator (41) and Carry History (31) are modified by the blocks (50). The Accumulator (41) and the Carry Output Line (44) are distributed back onto the Internal Register Bus (30) for subsequent blocks or for the Output Register File (79) by said two-to-one-multiplexers (54).Type: GrantFiled: October 12, 2010Date of Patent: November 1, 2011Inventor: Greg S. Callen
-
Publication number: 20110258413Abstract: An apparatus and method for executing media processing applications in a heterogeneous multicore system are provided. The media processing application executing apparatus includes a configuration deciding unit to decide a configuration for a combination of computational kernels and cores in which the computation kernels are to be executed. The computation kernels are media processing components included in a media processing application. The media processing application executing apparatus also includes an execution unit including multiple heterogeneous cores, to execute the media processing application based on the determined configuration.Type: ApplicationFiled: December 30, 2010Publication date: October 20, 2011Applicant: Samsung Electronics Co., Ltd.Inventors: Seung-Mo Cho, Hyo-Jung Song, Sung-Hak Lee, Dong-Woo Im, Oh-Young Jang, Sung-Jong Seo
-
Patent number: 8027962Abstract: Techniques for asynchronous command processing within a parallel processing environment are provided. A command is raised or received within a parallel processing data warehousing environment. A job or a component of the job is dynamically monitored, controlled, or modified in response to the real-time processing of the command. The job is actively processing within the parallel processing data warehousing environment when the command is received and processed against the job or the component of the job.Type: GrantFiled: September 14, 2007Date of Patent: September 27, 2011Assignee: Teradata US, Inc.Inventors: Alex P Yung, Clovis Franklin Lofton
-
Patent number: 8028150Abstract: A method and system for decoding and modifying processor instructions in runtime according to certain rules in order to separately control processing elements embedded within a multi-processor array using a single instruction. The present invention allows multiple processing elements and/or execution units in a multi-processor array to perform different operations, based upon a variable or variables such as their location in the multi-processor array, while accepting a single instruction as an input.Type: GrantFiled: November 16, 2007Date of Patent: September 27, 2011Inventors: Shlomo Selim Rakib, Yoram Zarai
-
Patent number: 8028290Abstract: Multiple instruction set architectures are supported in a system that provides a power-efficient and flexible platform for virtual machine environments requiring multiple support for multiple instruction set architectures (ISAs). A processor includes multiple cores having disparate native ISAs and that may be selectively enabled for operation, so that power is conserved when support for a particular ISA is not required of the processor. A hypervisor controls operation of the cores, locates a core and enables it if necessary when a request to instantiate a virtual machine having a specified ISA is received. The ISA may be specified by a particular operating system and/or application program requirements.Type: GrantFiled: August 30, 2006Date of Patent: September 27, 2011Assignee: International Business Machines CorporationInventors: James Walter Rymarczyk, Michael Ignatowski, Thomas J. Heller, Jr.
-
Publication number: 20110219209Abstract: Embodiments of the present invention provide techniques, including systems, methods, and computer readable medium, for dynamic atomic bitsets. A dynamic atomic bitset is a data structure that provides a bitset that can grow or shrink in size as required. The dynamic atomic bitset is non-blocking, wait-free, and thread-safe.Type: ApplicationFiled: March 4, 2010Publication date: September 8, 2011Applicant: Oracle International CorporationInventor: Nathan Reynolds
-
Publication number: 20110213947Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.Type: ApplicationFiled: May 25, 2010Publication date: September 1, 2011Inventors: John George Mathieson, Phil Carmack, Brian Smith
-
Publication number: 20110213998Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.Type: ApplicationFiled: May 25, 2010Publication date: September 1, 2011Inventors: John George Mathieson, Phil Carmack, Brian Smith
-
Publication number: 20110202587Abstract: A system and method for processing data utilizes a matrix of processing units using an array of commands stored in memory to process input data words to generate output data words, which can be used in various applications.Type: ApplicationFiled: October 9, 2009Publication date: August 18, 2011Applicant: NXP B.V.Inventor: Xavier Chabot
-
Publication number: 20110185151Abstract: A parallel processor is described which is operated in a SIMD manner. The processor comprises: a plurality of processing elements connected in a string and grouped into a plurality of processing units, wherein each processing unit comprises a plurality of processing elements which each have direct interconnections with all of the other processing elements within the respective processing unit, the interconnections enabling data transfer between any two elements within a unit to be effected in a single clock cycle.Type: ApplicationFiled: May 20, 2009Publication date: July 28, 2011Inventors: Martin Whitaker, John Lancaster
-
Patent number: 7987339Abstract: A processing system comprising processors and the dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.Type: GrantFiled: June 30, 2010Date of Patent: July 26, 2011Assignee: Coherent Logix, IncorporatedInventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
-
Patent number: 7987338Abstract: A processing system comprising processors and the dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.Type: GrantFiled: May 17, 2010Date of Patent: July 26, 2011Assignee: Coherent Logix, IncorporatedInventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
-
Patent number: 7987340Abstract: Data is transmitted from a sending processor over a network to one or more receiving processor in a forward direction during an allocated slot, and acknowledge signals are sent in a reverse direction during the same allocated slot, to indicate whether the receiving processor is able to receive data If one or more of the receiving processors indicates that it is unable to receive the data, the data is retransmitted during the next allocated slot. This means that the sending processor is able to determine within the slot period whether a retransmission is necessary, but that the slot period only needs to be long enough for one-way communication.Type: GrantFiled: February 19, 2004Date of Patent: July 26, 2011Inventors: Gajinder Panesar, Anthony Peter John Claydon, William Philip Robbins, Alex Orr, Andrew Duller
-
Publication number: 20110179251Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. The sleeping computer (12) can be awaiting data or instructions (12). In the case of instructions, the sleeping computer (12) can be waiting to store the instructions or to immediately execute the instructions. In the later case, the instructions are placed in an instruction register (30a) when they are received and executed therefrom, without first placing the instructions first into memory. The instructions can include a micro-loop (100) which is capable of performing a series of operations repeatedly.Type: ApplicationFiled: March 21, 2011Publication date: July 21, 2011Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
-
Patent number: 7984266Abstract: A computer array (10) has a plurality of computers (12) for accomplishing a larger task that is divided into smaller tasks, each of the smaller tasks being assigned to one or more of the computers (12). Each of the computers (12) may be configured for specific functions and individual input/output circuits (26) associated with exterior computers (12) are specifically adapted for particular input/output functions. An example of 25 computers (12) arranged in the computer array (10) has a centralized computational core (34) with the computers (12) nearer the edge of the die (14) being configured for input and/or output.Type: GrantFiled: June 5, 2007Date of Patent: July 19, 2011Assignee: VNS Portfolio LLCInventor: Charles H. Moore
-
Publication number: 20110167240Abstract: A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.Type: ApplicationFiled: March 15, 2011Publication date: July 7, 2011Inventor: Mark Beaumont
-
Publication number: 20110153982Abstract: Systems and methods are disclosed for collecting data from cores of a multi-core processor using collection packets. A collection packet can traverse through cores of the multi-core processor while accumulating requested data. Upon completing the accumulation of the requested data from all required cores, the collection packet can be transmitted to a system operator for system maintenance and/or monitoring.Type: ApplicationFiled: December 21, 2009Publication date: June 23, 2011Applicant: BBN TECHNOLOGIES CORP.Inventor: Craig Partridge
-
Publication number: 20110154345Abstract: Implementations and techniques for multicore processors having a domain interconnection network configured to associate a first collision domain network with a second collision domain network in communication are generally disclosed.Type: ApplicationFiled: December 21, 2009Publication date: June 23, 2011Inventor: Ezekiel Kruglick
-
Patent number: 7966475Abstract: A data processor comprises a plurality of processing elements arranged for parallel processing of data, and a controller for controlling the plurality of processing elements. The controller is operable to determine respective status information for a plurality of processing threads, and to control processing of the processing threads by the plurality of processors in dependence upon such status information.Type: GrantFiled: January 10, 2007Date of Patent: June 21, 2011Assignee: Rambus Inc.Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Patent number: 7962717Abstract: Each possessor node in an array of nodes has a respective local node address, and each local node address comprises a plurality of components having an order of addressing significance from most to least significant. Each node comprises: mapping means configured to map each component of the local node address onto a respective routing direction, and a switch arranged to receive a message having a destination node address identifying a destination node. The switch comprises: means for comparing the local node address to the destination node address to identify a the most significant non-matching component; and means for routing the message to another node, on the condition that the local node address does not match the destination node address, in the direction mapped to the most significant non-matching component.Type: GrantFiled: March 14, 2007Date of Patent: June 14, 2011Assignee: XMOS LimitedInventor: Michael David May
-
Patent number: 7958332Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instructions streams, each instruction stream having a plurality of instructions items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.Type: GrantFiled: March 13, 2009Date of Patent: June 7, 2011Assignee: Rambus Inc.Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Publication number: 20110131392Abstract: A processing space comprises an array of transistors empowered by forming connections through circuit pass transistors to power and data input/output means and connections therebetween through signal pass transistors. By structuring the needed circuits at the site(s) of the data the von Neumann bottleneck is eliminated, which increases the computing power of the apparatus substantially, thus to enable non-stop Information Processing on steady streams of data and code, with no repetitive instruction and data transfers required. That code will identify the physical locations of every transistor in the processing space, and will enable only the pass transistors therein needed to structure the circuits of any arithmetical/logical algorithm in a processing space of any size, speed, and level of computer power. By joining one processing space to another the apparatus also exhibits super-scalability.Type: ApplicationFiled: January 7, 2011Publication date: June 2, 2011Inventor: William Stuart Lovell
-
Patent number: 7941637Abstract: A system has a first plurality of cores in a first coherency group. Each core transfers data in packets. The cores are directly coupled serially to form a serial path. The data packets are transferred along the serial path. The serial path is coupled at one end to a packet switch. The packet switch is coupled to a memory. The first plurality of cores and the packet switch are on an integrated circuit. The memory may or may not be on the integrated circuit. In another aspect a second plurality of cores in a second coherency group is coupled to the packet switch. The cores of the first and second pluralities may be reconfigured to form or become part of coherency groups different from the first and second coherency groups.Type: GrantFiled: April 15, 2008Date of Patent: May 10, 2011Assignee: Freescale Semiconductor, Inc.Inventors: Perry H. Pelley, III, George P. Hoekstra, Lucio F. C. Pessoa
-
Publication number: 20110107058Abstract: A plurality of processing elements (PEs) include memory local to at least one of the processing elements in a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid to connect the PEs and their local memories to a common controller. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. The packet-switched network supports multiple concurrent transfers between PEs and memory. Memory accesses include block and/or broadcast read and write operations, in which data can be replicated within the nodes and, according to the operation, written into the shared memory or into the local PE memory.Type: ApplicationFiled: January 11, 2011Publication date: May 5, 2011Inventor: Ray MCCONNELL
-
Patent number: 7937557Abstract: A computer array (10) has a plurality of computers (12) for accomplishing a larger task that is divided into smaller tasks, each of the smaller tasks being assigned to one or more of the computers (12). Each of the computers (12) may be configured for specific functions and individual input/output circuits (26) associated with exterior computers (12) are specifically adapted for particular input/output functions. An example of 25 computers (12) arranged in the computer array (10) has a centralized computational core (34) with the computers (12) nearer the edge of the die (14) being configured for input and/or output.Type: GrantFiled: March 16, 2004Date of Patent: May 3, 2011Assignee: VNS Portfolio LLCInventor: Charles H. Moore
-
Patent number: 7937558Abstract: A processing system comprising processors and the dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.Type: GrantFiled: February 8, 2008Date of Patent: May 3, 2011Assignee: Coherent Logix, IncorporatedInventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
-
Patent number: 7934075Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously and operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. The instructions executed by the computers (12) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In one application, the sleeping computer (12) is awakened by an input such that it commences an action that would otherwise required an interrupt of an otherwise active computer. For example, one computer (12f) can be used to monitor an input/output port of the computer array (10).Type: GrantFiled: May 26, 2006Date of Patent: April 26, 2011Assignee: VNS Portfolio LLCInventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
-
Patent number: 7930517Abstract: An array of programmable data-processing cells configured as a plurality of cross-connected pipelines. An apparatus includes cells capable of performing data-processing functions selectable by a presented instruction. A first set of cells includes an input cell, an output cell, and a series of at least one interior cell providing an acyclic data processing path from the input cell to the output cell. Additional cells are similarly configured. Memory presents configuration instructions to cells in response to a configuration code. Data advances through ranks of the cells. The configuration code advances to memory associated with a rank in tandem with the data.Type: GrantFiled: January 9, 2009Date of Patent: April 19, 2011Assignee: Wave Semiconductor, Inc.Inventor: Karl M. Fant
-
Patent number: 7925861Abstract: A data processor comprises a plurality of processing elements arranged in a first plurality of single instruction multiple data (SIMD) processing arrays, and comprises a second plurality of controllers for transferring instructions to the processing arrays. Each controller is operable to retrieve a plurality of incoming instruction streams in parallel with one another and operable to supply incoming instruction streams to one of a plurality of processing arrays.Type: GrantFiled: January 31, 2007Date of Patent: April 12, 2011Assignee: Rambus Inc.Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Patent number: 7913069Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. Instruction words (48) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In a particular example, the series of operations are included in a single instruction word (48). The micro-loop (100) in combination with the ability of the computers (12) to send instruction words (48) to a neighboring computer (12) provides a powerful tool for allowing a computer (12) to utilize the resources of a neighboring computer (12).Type: GrantFiled: May 26, 2006Date of Patent: March 22, 2011Assignee: VNS Portfolio LLCInventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
-
Publication number: 20110066825Abstract: Each possessor node in an array of nodes has a respective local node address, and each local node address comprises a plurality of components having an order of addressing significance from most to least significant. Each node comprises: mapping means configured to map each component of the local node address onto a respective routing direction, and a switch arranged to receive a message having a destination node address identifying a destination node. The switch comprises: means for comparing the local node address to the destination node address to identify a the most significant non-matching component; and means for routing the message to another node, on the condition that the local node address does not match the destination node address, in the direction mapped to the most significant non-matching component.Type: ApplicationFiled: November 18, 2010Publication date: March 17, 2011Applicant: XMOS LTD.Inventor: Michael David MAY
-
Patent number: 7908462Abstract: The current invention provides a virtual world simulation system capable of hosting with massive amount of concurrent players by integrating commodity parallel co-processors into servers. The current invention proposes novel parallel processing algorithms to make use of commodity parallel co-processors like a graphic processing unit (GPU) or any specialized hardware with parallel architecture design like a field-programmable gate array (FPGA), to accelerate virtual world simulation.Type: GrantFiled: June 9, 2010Date of Patent: March 15, 2011Assignee: Zillians IncorporatedInventor: Mu Chi Sung
-
Patent number: 7908409Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.Type: GrantFiled: August 6, 2009Date of Patent: March 15, 2011Assignee: Altera CorporationInventors: Edwin Franklin Barry, Edward A. Wolff
-
Patent number: 7904615Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A plurality of read lines (18), write lines (20) and data lines (22) interconnect the computers (12). When one computer (12) sets a read line (18) high and the other computer sets a corresponding write line (20) then data is transferred on the data lines (22). When both the read line (18) and corresponding write line (20) go low this allows both communicating computers (12) to know that the communication is completed. An acknowledge line (72) goes high to restart the computers (12).Type: GrantFiled: February 16, 2006Date of Patent: March 8, 2011Assignee: VNS Portfolio LLCInventor: Charles H. Moore
-
Patent number: 7904695Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A slot sequencer (42) in each of the computers produces a timing pulse to cause the computer (12) to execute a next instruction. However, when the present instruction is a read or write type instruction, the slot sequencer does not produce the pulse until an acknowledge signal (86) starts it. The acknowledge signal (86) is produced when it is recognized that the communication has been completed by the other computer (12).Type: GrantFiled: February 16, 2006Date of Patent: March 8, 2011Assignee: VNS Portfolio LLCInventor: Charles H. Moore