Array Processor Operation Patents (Class 712/16)
  • Patent number: 8145880
    Abstract: According to some embodiments, an integrated circuit comprises a microprocessor matrix of mesh-interconnected matrix processors. Each processor comprises a data switch including a data switch link register and matrix routing logic. The data switch link register includes one or more matrix link-enable register fields specifying a link enable status (e.g. a message-independent, p-to-p, and/or broadcast link enable status) for each inter-processor matrix link of the processor. The matrix routing logic routes inter-processor messages according to the matrix link-enable register field(s). A particular link may be selected by a current matrix processor by selecting an ordered list of matrix links according to a relationship between ?H and ?V, and choosing the first enabled link in the selected list for routing.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: March 27, 2012
    Assignee: Ovics
    Inventors: Sorin C Cismas, Ilie Garbacea
  • Patent number: 8135229
    Abstract: An image processing method and device for processing multiple rows of pixels of an image simultaneously with a single instruction. The processing includes selecting a pixel window having a plurality of pixels of an image spanning across multiple rows and columns, building vertical and horizontal load registers to include the plurality of pixels of the selected pixel window, and simultaneously processing selected pixels of the plurality of pixels included in the vertical and horizontal load registers using a single instruction, wherein the vertical and horizontal load registers are shifted when the selected pixels are processed. Accordingly, a method and device for efficient processing of an image is provided.
    Type: Grant
    Filed: January 8, 2009
    Date of Patent: March 13, 2012
    Assignee: Marvell International Technology Ltd.
    Inventors: Douglas G. Keithley, Roy G. Moss
  • Patent number: 8131975
    Abstract: In some embodiments, an integrated circuit comprises a microprocessor matrix including a plurality of mesh-interconnected matrix processors, wherein each matrix processor comprises a data switch configured to direct inter-processor communications within the matrix, and a mapping unit configured to generate a data switch functionality map for a plurality of data switches in the microprocessor matrix. The data switch functionality map is generated by sending a first message through the matrix, and, setting a first functionality status designation for the first data switch in the data switch functionality map upon receiving a reply to the first message from a first data switch through the matrix.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: March 6, 2012
    Assignee: Ovics
    Inventors: Sorin C Cismas, Ilie Garbacea
  • Patent number: 8112612
    Abstract: A processing system comprising processors and the dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: February 7, 2012
    Assignee: Coherent Logix, Incorporated
    Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
  • Patent number: 8108661
    Abstract: Provided are a data processing apparatus and a method of controlling the data processing apparatus. The data processing apparatus may select a single stream processor from a plurality of stream processors based on stream processor status information, and input data into the selected stream processor. The stream processor status information may include first status information of a processor core and second status information of at least one internal memory.
    Type: Grant
    Filed: April 2, 2009
    Date of Patent: January 31, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Won Jong Lee, Chan Min Park, Shi Hwa Lee
  • Patent number: 8103855
    Abstract: The present disclosure provides a methodology for reducing congestion of a processing unit, preferably by configuring a plurality of functional blocks to run in parallel or in series without the influence or input from the processing unit. In an embodiment, the present method chains a plurality of functional blocks together by software so that one functional block starts after the completion of another functional block. The configuration of the chain can be series, parallel, and any combination thereof, arranged to meet the circuit's objective. The chaining can be configured and re-configured, preferably by software input. The chaining can also be performed at design time or at run time. The chaining can also be modified, preferably at design time, but can also be modified at run time.
    Type: Grant
    Filed: June 29, 2008
    Date of Patent: January 24, 2012
    Assignee: Navosha Corporation
    Inventors: Hirak Mitra, Raj Kulkarni, Richard Wicks, Michael Moon
  • Patent number: 8090913
    Abstract: A system has a first plurality of cores in a first coherency group. Each core transfers data in packets. The cores are directly coupled serially to form a serial path. The data packets are transferred along the serial path. The serial path is coupled at one end to a packet switch. The packet switch is coupled to a memory. The first plurality of cores and the packet switch are on an integrated circuit. The memory may or may not be on the integrated circuit. In another aspect a second plurality of cores in a second coherency group is coupled to the packet switch. The cores of the first and second pluralities may be reconfigured to form or become part of coherency groups different from the first and second coherency groups.
    Type: Grant
    Filed: December 20, 2010
    Date of Patent: January 3, 2012
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Perry H. Pelley, III, George P. Hoekstra, Lucio F. Pessoa
  • Patent number: 8086824
    Abstract: A stream processing system includes a stream processing module coupled to a memory module and operable so as to fetch stream elements from the memory module, to process the stream elements fetched thereby, and to store processed stream elements in the memory module. The stream processing module includes a number (N) of stream processing units, and the memory module is configured with a number (N) of memory bank units each corresponding to a respective one of the stream processing units. The memory module is reconfigurable based on a desired inter-level configuration so that each of the memory bank units is configured to have a memory size sufficient to meet processing requirement of the respective one of the stream processing units.
    Type: Grant
    Filed: May 5, 2009
    Date of Patent: December 27, 2011
    Assignee: National Taiwan University
    Inventors: You-Ming Tsao, Liang-Gee Chen, Shao-Yi Chien
  • Publication number: 20110314255
    Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.
    Type: Application
    Filed: June 17, 2010
    Publication date: December 22, 2011
    Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
  • Publication number: 20110307685
    Abstract: A multiprocessor system and method for performing matrix operations includes multiple processors cooperatively performing a sparse matrix operation. Distributed among the processors are non-zero matrix elements of first and second sparse matrices. Mapped across the processors are the matrix elements of a results matrix. Each processor receives, from the other processors, non-zero matrix elements of the first matrix that had been distributed to those other processors and generates partial results based on the received non-zero matrix elements of the first matrix and on the non-zero matrix elements of the second matrix distributed to that processor. Each processor receives those partial results generated by other processors and associated with the matrix elements of the results matrix mapped to that processor.
    Type: Application
    Filed: June 6, 2011
    Publication date: December 15, 2011
    Inventor: William S. Song
  • Patent number: 8078835
    Abstract: A processor for performing floating-point operations includes an array of processing elements arranged to enable a floating-point operation. Each processing element includes an arithmetic logic unit to receive two input values and perform integer arithmetic on the received input values. The processing elements in the array are connected together in groups of two or more processing elements to enable floating-point operation.
    Type: Grant
    Filed: May 23, 2008
    Date of Patent: December 13, 2011
    Assignee: Core Logic, Inc.
    Inventors: Hoon Mo Yang, Man Hwee Jo, Il Hyun Park, Ki Young Choi
  • Patent number: 8078834
    Abstract: A digital signal processor includes a control block configured to issue instructions based on a stored program, and a compute array including two or more compute engines configured such that each of the issued instructions executes in successive compute engines of at least a subset of the compute engines at successive times. The digital signal processor may be utilized with a control processor or as a stand-alone processor. The compute array may be configured such that each of the issued instructions flows through successive compute engines of at least a subset of the compute engines at successive times.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: December 13, 2011
    Assignee: Analog Devices, Inc.
    Inventor: Douglas Garde
  • Patent number: 8078804
    Abstract: A data cache memory coupled to a processor including processor clusters are adapted to operate simultaneously on scalar and vectorial data by providing data locations in the data cache memory for storing data for processing. The data locations are accessed either in a scalar mode or in a vectorial mode. This is done by explicitly mapping the data locations that are scalar and the data locations that are vectorial.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: December 13, 2011
    Assignees: STMicroelectronics S.r.l., STMicroelectronics N.V.
    Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elena Salurso, Elio Guidetti
  • Publication number: 20110296137
    Abstract: A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, deterministic reduction operation include: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing, the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Patent number: 8065337
    Abstract: Large-scale table data stored in a shared memory are sorted by a plurality of processors in parallel. According to the present invention, the records subjected to processing are first divided for allocation to the plurality of processors. Then, each processor counts the numbers of local occurrences of the field value sequence numbers associated with the records to be processed. The numbers of local occurrences of the field value sequence numbers counted by each processor is then converted into global cumulative numbers, i.e., the cumulative numbers used in common by the plurality of processors. Finally, each processor utilizes the global cumulative numbers as pointers to rearrange the order of the allocated records.
    Type: Grant
    Filed: August 13, 2010
    Date of Patent: November 22, 2011
    Assignee: Turbo Data Laboratories, Inc.
    Inventor: Shinji Furusho
  • Patent number: 8051277
    Abstract: A Programmable Arithmetic Logic Unit Cluster is claimed. The plurality of Programmable Logic Blocks (50) in the cluster are in a physically linear sequence; but, will process data in parallel when the data pathways permit. A physically linear and operationally parallel design is possible mainly due to an Internal Register Bus (30). A small subset of data from the Internal Register Bus (30) is modified by each Arithmetic Logic Unit (53). The greater chunk of information of the Internal Register Bus (30) passes, without changes made to the data, through a plurality of two-to-one-multiplexers (54) as data inputs, bypassing the Data Selection (52) and Arithmetic Logic Unit's (53) circuitry. Only the data specified as the Accumulator (41) and Carry History (31) are modified by the blocks (50). The Accumulator (41) and the Carry Output Line (44) are distributed back onto the Internal Register Bus (30) for subsequent blocks or for the Output Register File (79) by said two-to-one-multiplexers (54).
    Type: Grant
    Filed: October 12, 2010
    Date of Patent: November 1, 2011
    Inventor: Greg S. Callen
  • Publication number: 20110258413
    Abstract: An apparatus and method for executing media processing applications in a heterogeneous multicore system are provided. The media processing application executing apparatus includes a configuration deciding unit to decide a configuration for a combination of computational kernels and cores in which the computation kernels are to be executed. The computation kernels are media processing components included in a media processing application. The media processing application executing apparatus also includes an execution unit including multiple heterogeneous cores, to execute the media processing application based on the determined configuration.
    Type: Application
    Filed: December 30, 2010
    Publication date: October 20, 2011
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Seung-Mo Cho, Hyo-Jung Song, Sung-Hak Lee, Dong-Woo Im, Oh-Young Jang, Sung-Jong Seo
  • Patent number: 8027962
    Abstract: Techniques for asynchronous command processing within a parallel processing environment are provided. A command is raised or received within a parallel processing data warehousing environment. A job or a component of the job is dynamically monitored, controlled, or modified in response to the real-time processing of the command. The job is actively processing within the parallel processing data warehousing environment when the command is received and processed against the job or the component of the job.
    Type: Grant
    Filed: September 14, 2007
    Date of Patent: September 27, 2011
    Assignee: Teradata US, Inc.
    Inventors: Alex P Yung, Clovis Franklin Lofton
  • Patent number: 8028150
    Abstract: A method and system for decoding and modifying processor instructions in runtime according to certain rules in order to separately control processing elements embedded within a multi-processor array using a single instruction. The present invention allows multiple processing elements and/or execution units in a multi-processor array to perform different operations, based upon a variable or variables such as their location in the multi-processor array, while accepting a single instruction as an input.
    Type: Grant
    Filed: November 16, 2007
    Date of Patent: September 27, 2011
    Inventors: Shlomo Selim Rakib, Yoram Zarai
  • Patent number: 8028290
    Abstract: Multiple instruction set architectures are supported in a system that provides a power-efficient and flexible platform for virtual machine environments requiring multiple support for multiple instruction set architectures (ISAs). A processor includes multiple cores having disparate native ISAs and that may be selectively enabled for operation, so that power is conserved when support for a particular ISA is not required of the processor. A hypervisor controls operation of the cores, locates a core and enables it if necessary when a request to instantiate a virtual machine having a specified ISA is received. The ISA may be specified by a particular operating system and/or application program requirements.
    Type: Grant
    Filed: August 30, 2006
    Date of Patent: September 27, 2011
    Assignee: International Business Machines Corporation
    Inventors: James Walter Rymarczyk, Michael Ignatowski, Thomas J. Heller, Jr.
  • Publication number: 20110219209
    Abstract: Embodiments of the present invention provide techniques, including systems, methods, and computer readable medium, for dynamic atomic bitsets. A dynamic atomic bitset is a data structure that provides a bitset that can grow or shrink in size as required. The dynamic atomic bitset is non-blocking, wait-free, and thread-safe.
    Type: Application
    Filed: March 4, 2010
    Publication date: September 8, 2011
    Applicant: Oracle International Corporation
    Inventor: Nathan Reynolds
  • Publication number: 20110213947
    Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.
    Type: Application
    Filed: May 25, 2010
    Publication date: September 1, 2011
    Inventors: John George Mathieson, Phil Carmack, Brian Smith
  • Publication number: 20110213998
    Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.
    Type: Application
    Filed: May 25, 2010
    Publication date: September 1, 2011
    Inventors: John George Mathieson, Phil Carmack, Brian Smith
  • Publication number: 20110202587
    Abstract: A system and method for processing data utilizes a matrix of processing units using an array of commands stored in memory to process input data words to generate output data words, which can be used in various applications.
    Type: Application
    Filed: October 9, 2009
    Publication date: August 18, 2011
    Applicant: NXP B.V.
    Inventor: Xavier Chabot
  • Publication number: 20110185151
    Abstract: A parallel processor is described which is operated in a SIMD manner. The processor comprises: a plurality of processing elements connected in a string and grouped into a plurality of processing units, wherein each processing unit comprises a plurality of processing elements which each have direct interconnections with all of the other processing elements within the respective processing unit, the interconnections enabling data transfer between any two elements within a unit to be effected in a single clock cycle.
    Type: Application
    Filed: May 20, 2009
    Publication date: July 28, 2011
    Inventors: Martin Whitaker, John Lancaster
  • Patent number: 7987339
    Abstract: A processing system comprising processors and the dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.
    Type: Grant
    Filed: June 30, 2010
    Date of Patent: July 26, 2011
    Assignee: Coherent Logix, Incorporated
    Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
  • Patent number: 7987338
    Abstract: A processing system comprising processors and the dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: July 26, 2011
    Assignee: Coherent Logix, Incorporated
    Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
  • Patent number: 7987340
    Abstract: Data is transmitted from a sending processor over a network to one or more receiving processor in a forward direction during an allocated slot, and acknowledge signals are sent in a reverse direction during the same allocated slot, to indicate whether the receiving processor is able to receive data If one or more of the receiving processors indicates that it is unable to receive the data, the data is retransmitted during the next allocated slot. This means that the sending processor is able to determine within the slot period whether a retransmission is necessary, but that the slot period only needs to be long enough for one-way communication.
    Type: Grant
    Filed: February 19, 2004
    Date of Patent: July 26, 2011
    Inventors: Gajinder Panesar, Anthony Peter John Claydon, William Philip Robbins, Alex Orr, Andrew Duller
  • Publication number: 20110179251
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. The sleeping computer (12) can be awaiting data or instructions (12). In the case of instructions, the sleeping computer (12) can be waiting to store the instructions or to immediately execute the instructions. In the later case, the instructions are placed in an instruction register (30a) when they are received and executed therefrom, without first placing the instructions first into memory. The instructions can include a micro-loop (100) which is capable of performing a series of operations repeatedly.
    Type: Application
    Filed: March 21, 2011
    Publication date: July 21, 2011
    Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
  • Patent number: 7984266
    Abstract: A computer array (10) has a plurality of computers (12) for accomplishing a larger task that is divided into smaller tasks, each of the smaller tasks being assigned to one or more of the computers (12). Each of the computers (12) may be configured for specific functions and individual input/output circuits (26) associated with exterior computers (12) are specifically adapted for particular input/output functions. An example of 25 computers (12) arranged in the computer array (10) has a centralized computational core (34) with the computers (12) nearer the edge of the die (14) being configured for input and/or output.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: July 19, 2011
    Assignee: VNS Portfolio LLC
    Inventor: Charles H. Moore
  • Publication number: 20110167240
    Abstract: A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.
    Type: Application
    Filed: March 15, 2011
    Publication date: July 7, 2011
    Inventor: Mark Beaumont
  • Publication number: 20110153982
    Abstract: Systems and methods are disclosed for collecting data from cores of a multi-core processor using collection packets. A collection packet can traverse through cores of the multi-core processor while accumulating requested data. Upon completing the accumulation of the requested data from all required cores, the collection packet can be transmitted to a system operator for system maintenance and/or monitoring.
    Type: Application
    Filed: December 21, 2009
    Publication date: June 23, 2011
    Applicant: BBN TECHNOLOGIES CORP.
    Inventor: Craig Partridge
  • Publication number: 20110154345
    Abstract: Implementations and techniques for multicore processors having a domain interconnection network configured to associate a first collision domain network with a second collision domain network in communication are generally disclosed.
    Type: Application
    Filed: December 21, 2009
    Publication date: June 23, 2011
    Inventor: Ezekiel Kruglick
  • Patent number: 7966475
    Abstract: A data processor comprises a plurality of processing elements arranged for parallel processing of data, and a controller for controlling the plurality of processing elements. The controller is operable to determine respective status information for a plurality of processing threads, and to control processing of the processing threads by the plurality of processors in dependence upon such status information.
    Type: Grant
    Filed: January 10, 2007
    Date of Patent: June 21, 2011
    Assignee: Rambus Inc.
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Patent number: 7962717
    Abstract: Each possessor node in an array of nodes has a respective local node address, and each local node address comprises a plurality of components having an order of addressing significance from most to least significant. Each node comprises: mapping means configured to map each component of the local node address onto a respective routing direction, and a switch arranged to receive a message having a destination node address identifying a destination node. The switch comprises: means for comparing the local node address to the destination node address to identify a the most significant non-matching component; and means for routing the message to another node, on the condition that the local node address does not match the destination node address, in the direction mapped to the most significant non-matching component.
    Type: Grant
    Filed: March 14, 2007
    Date of Patent: June 14, 2011
    Assignee: XMOS Limited
    Inventor: Michael David May
  • Patent number: 7958332
    Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instructions streams, each instruction stream having a plurality of instructions items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.
    Type: Grant
    Filed: March 13, 2009
    Date of Patent: June 7, 2011
    Assignee: Rambus Inc.
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Publication number: 20110131392
    Abstract: A processing space comprises an array of transistors empowered by forming connections through circuit pass transistors to power and data input/output means and connections therebetween through signal pass transistors. By structuring the needed circuits at the site(s) of the data the von Neumann bottleneck is eliminated, which increases the computing power of the apparatus substantially, thus to enable non-stop Information Processing on steady streams of data and code, with no repetitive instruction and data transfers required. That code will identify the physical locations of every transistor in the processing space, and will enable only the pass transistors therein needed to structure the circuits of any arithmetical/logical algorithm in a processing space of any size, speed, and level of computer power. By joining one processing space to another the apparatus also exhibits super-scalability.
    Type: Application
    Filed: January 7, 2011
    Publication date: June 2, 2011
    Inventor: William Stuart Lovell
  • Patent number: 7941637
    Abstract: A system has a first plurality of cores in a first coherency group. Each core transfers data in packets. The cores are directly coupled serially to form a serial path. The data packets are transferred along the serial path. The serial path is coupled at one end to a packet switch. The packet switch is coupled to a memory. The first plurality of cores and the packet switch are on an integrated circuit. The memory may or may not be on the integrated circuit. In another aspect a second plurality of cores in a second coherency group is coupled to the packet switch. The cores of the first and second pluralities may be reconfigured to form or become part of coherency groups different from the first and second coherency groups.
    Type: Grant
    Filed: April 15, 2008
    Date of Patent: May 10, 2011
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Perry H. Pelley, III, George P. Hoekstra, Lucio F. C. Pessoa
  • Publication number: 20110107058
    Abstract: A plurality of processing elements (PEs) include memory local to at least one of the processing elements in a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid to connect the PEs and their local memories to a common controller. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. The packet-switched network supports multiple concurrent transfers between PEs and memory. Memory accesses include block and/or broadcast read and write operations, in which data can be replicated within the nodes and, according to the operation, written into the shared memory or into the local PE memory.
    Type: Application
    Filed: January 11, 2011
    Publication date: May 5, 2011
    Inventor: Ray MCCONNELL
  • Patent number: 7937557
    Abstract: A computer array (10) has a plurality of computers (12) for accomplishing a larger task that is divided into smaller tasks, each of the smaller tasks being assigned to one or more of the computers (12). Each of the computers (12) may be configured for specific functions and individual input/output circuits (26) associated with exterior computers (12) are specifically adapted for particular input/output functions. An example of 25 computers (12) arranged in the computer array (10) has a centralized computational core (34) with the computers (12) nearer the edge of the die (14) being configured for input and/or output.
    Type: Grant
    Filed: March 16, 2004
    Date of Patent: May 3, 2011
    Assignee: VNS Portfolio LLC
    Inventor: Charles H. Moore
  • Patent number: 7937558
    Abstract: A processing system comprising processors and the dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: May 3, 2011
    Assignee: Coherent Logix, Incorporated
    Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
  • Patent number: 7934075
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously and operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. The instructions executed by the computers (12) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In one application, the sleeping computer (12) is awakened by an input such that it commences an action that would otherwise required an interrupt of an otherwise active computer. For example, one computer (12f) can be used to monitor an input/output port of the computer array (10).
    Type: Grant
    Filed: May 26, 2006
    Date of Patent: April 26, 2011
    Assignee: VNS Portfolio LLC
    Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
  • Patent number: 7930517
    Abstract: An array of programmable data-processing cells configured as a plurality of cross-connected pipelines. An apparatus includes cells capable of performing data-processing functions selectable by a presented instruction. A first set of cells includes an input cell, an output cell, and a series of at least one interior cell providing an acyclic data processing path from the input cell to the output cell. Additional cells are similarly configured. Memory presents configuration instructions to cells in response to a configuration code. Data advances through ranks of the cells. The configuration code advances to memory associated with a rank in tandem with the data.
    Type: Grant
    Filed: January 9, 2009
    Date of Patent: April 19, 2011
    Assignee: Wave Semiconductor, Inc.
    Inventor: Karl M. Fant
  • Patent number: 7925861
    Abstract: A data processor comprises a plurality of processing elements arranged in a first plurality of single instruction multiple data (SIMD) processing arrays, and comprises a second plurality of controllers for transferring instructions to the processing arrays. Each controller is operable to retrieve a plurality of incoming instruction streams in parallel with one another and operable to supply incoming instruction streams to one of a plurality of processing arrays.
    Type: Grant
    Filed: January 31, 2007
    Date of Patent: April 12, 2011
    Assignee: Rambus Inc.
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Patent number: 7913069
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. Instruction words (48) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In a particular example, the series of operations are included in a single instruction word (48). The micro-loop (100) in combination with the ability of the computers (12) to send instruction words (48) to a neighboring computer (12) provides a powerful tool for allowing a computer (12) to utilize the resources of a neighboring computer (12).
    Type: Grant
    Filed: May 26, 2006
    Date of Patent: March 22, 2011
    Assignee: VNS Portfolio LLC
    Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
  • Publication number: 20110066825
    Abstract: Each possessor node in an array of nodes has a respective local node address, and each local node address comprises a plurality of components having an order of addressing significance from most to least significant. Each node comprises: mapping means configured to map each component of the local node address onto a respective routing direction, and a switch arranged to receive a message having a destination node address identifying a destination node. The switch comprises: means for comparing the local node address to the destination node address to identify a the most significant non-matching component; and means for routing the message to another node, on the condition that the local node address does not match the destination node address, in the direction mapped to the most significant non-matching component.
    Type: Application
    Filed: November 18, 2010
    Publication date: March 17, 2011
    Applicant: XMOS LTD.
    Inventor: Michael David MAY
  • Patent number: 7908462
    Abstract: The current invention provides a virtual world simulation system capable of hosting with massive amount of concurrent players by integrating commodity parallel co-processors into servers. The current invention proposes novel parallel processing algorithms to make use of commodity parallel co-processors like a graphic processing unit (GPU) or any specialized hardware with parallel architecture design like a field-programmable gate array (FPGA), to accelerate virtual world simulation.
    Type: Grant
    Filed: June 9, 2010
    Date of Patent: March 15, 2011
    Assignee: Zillians Incorporated
    Inventor: Mu Chi Sung
  • Patent number: 7908409
    Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.
    Type: Grant
    Filed: August 6, 2009
    Date of Patent: March 15, 2011
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Edward A. Wolff
  • Patent number: 7904615
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A plurality of read lines (18), write lines (20) and data lines (22) interconnect the computers (12). When one computer (12) sets a read line (18) high and the other computer sets a corresponding write line (20) then data is transferred on the data lines (22). When both the read line (18) and corresponding write line (20) go low this allows both communicating computers (12) to know that the communication is completed. An acknowledge line (72) goes high to restart the computers (12).
    Type: Grant
    Filed: February 16, 2006
    Date of Patent: March 8, 2011
    Assignee: VNS Portfolio LLC
    Inventor: Charles H. Moore
  • Patent number: 7904695
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A slot sequencer (42) in each of the computers produces a timing pulse to cause the computer (12) to execute a next instruction. However, when the present instruction is a read or write type instruction, the slot sequencer does not produce the pulse until an acknowledge signal (86) starts it. The acknowledge signal (86) is produced when it is recognized that the communication has been completed by the other computer (12).
    Type: Grant
    Filed: February 16, 2006
    Date of Patent: March 8, 2011
    Assignee: VNS Portfolio LLC
    Inventor: Charles H. Moore