Multimode (e.g., Mimd To Simd, Etc.) Patents (Class 712/20)
  • Publication number: 20130212356
    Abstract: A method comprises measuring the execution time T1 for a problem to be solved with a program being run by a single processor, measuring the execution time TM and TS of MIMD and SIMD program fragments being run by a single processor and a single accelerator correspondingly, determining the specific acceleration ? of the execution time for an SIMD program fragment being run by a single accelerator in comparison with the execution time for the fragment being run by a single processor, determining a portion of the execution time for an MIMD fragment being run by a single processor and a portion of the execution time for an SIMD fragment being run by a single processor and adjusting the quantity of processors or accelerators comprised in a hybrid computing system structure according to the data obtained.
    Type: Application
    Filed: October 13, 2011
    Publication date: August 15, 2013
    Applicant: FEDERAL STATE UNITARY ENTERPRISE
    Inventor: Sergey Alexandrovich Stepanenko
  • Patent number: 8493979
    Abstract: Executing a single instruction/multiple data (SIMD) instruction of a program to process a vector of data wherein each element of the packet vector corresponds to a different received packet.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: July 23, 2013
    Assignee: Intel Corporation
    Inventors: Bryan E. Veal, Travis T. Schluessler
  • Patent number: 8478965
    Abstract: Accelerator functions are cascaded, such that a result of one accelerator function is directly forwarded to another accelerator function, bypassing the processor requesting the functions to be performed. The cascading may be provided during compilation of a program specifying the functions to be performed, but can be dynamically reversed during runtime of the program.
    Type: Grant
    Filed: October 30, 2009
    Date of Patent: July 2, 2013
    Assignee: International Business Machines Corporation
    Inventors: Rajaram B. Krishnamurthy, Thomas A. Gregg
  • Publication number: 20130166877
    Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN
  • Patent number: 8438404
    Abstract: The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number MPEs control the behavior of a group of SPEs using program code embodied as a set of virtualized control threads. The arrangement also enables MPEs delegate functionality to one or more groups of SPEs such that those group(s) of SPEs will act as pseudo MPEs. The pseudo MPEs will utilize pseudo virtualized control threads to control the behavior of other groups of SPEs. In a typical embodiment, the apparatus includes a MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: May 7, 2013
    Assignee: International Business Machines Corporation
    Inventors: Karl J. Duvalsaint, Harm P. Hofstee, Daeik Kim, Moon J. Kim
  • Publication number: 20130091340
    Abstract: A matrix of execution blocks form a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks process a single block of instructions specifying parallel and dependent instructions.
    Type: Application
    Filed: November 30, 2012
    Publication date: April 11, 2013
    Applicant: SOFT MACHINES, INC.
    Inventor: Soft Machines, Inc.
  • Patent number: 8358695
    Abstract: An apparatus is described for attaching a motion search hardware assist unit to a processing element and its local memory. A current macro block storage unit is attached to a local memory interface unit for storage of a copy of a current macro block from the local memory. A search window reference storage unit having N rows is attached to a local memory interface unit for storage of a copy of N rows of pixels from a search window from the local memory. N independent arithmetic pipelines are attached to the current macro block storage unit and the search window reference storage. Each pipeline operates on one of the N rows of the search window reference storage unit and a corresponding row of the current macro block of the current macro block storage unit. An accumulator is attached to the N independent pipelines to accumulate results from the N arithmetic pipelines, to produce independent results for different organizations of macro blocks.
    Type: Grant
    Filed: April 18, 2007
    Date of Patent: January 22, 2013
    Assignee: Altera Corporation
    Inventors: Mihailo M. Stojancic, Gerald George Pechanek
  • Patent number: 8255886
    Abstract: A method for analyzing and presenting in a graphical manner single instruction, multiple data (SIMD) instructions involves disassembling a stream of machine instructions into a stream of assembly language instructions. Instruction objects “M” and “N” are created to represent SIMD instructions “M” and “N” from the stream of instructions. Instruction objects “M” and “N” include multiple data objects corresponding to the multiple data items of the respective SIMD instruction. Different colors are assigned to at least two of the multiple data objects of instruction object “M.” If a data item of SIMD instruction “N” is based on a data item of SIMD instruction “M,” the color from the source object is automatically assigned to the target object. Dependencies between data items of instruction “M” and “N” are annotated by arrows between corresponding data objects. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: August 28, 2012
    Assignee: Intel Corporation
    Inventor: Peter Lachner
  • Publication number: 20120166762
    Abstract: Provided are a computing apparatus and method based on SIMD architecture capable of supporting various SIMD widths without wasting resources. The computing apparatus includes a plurality of configurable execution cores (CECs) that have a plurality of execution modes, and a controller for detecting a loop region from a program, determining a Single Instruction Multiple Data (SIMD) width for the detected loop region, and determining an execution mode of the processor according to the determined SIMD width.
    Type: Application
    Filed: July 8, 2011
    Publication date: June 28, 2012
    Inventors: Jae Un Park, Suk-Jin Kim, Scott Mahlke, Yong-Jun Park
  • Patent number: 8190856
    Abstract: A processor of SIMD/MIMD dual mode architecture comprises common controlled first processing elements, self-controlled second processing elements and a pipelined (ring) network connecting the first PEs and the second PEs sequentially. An access controller has access control lines, each access control line being connected to each PE of the first and second PEs to control data access timing between each PE and the network. Each PE can be self-controlled or common controlled, such as dual mode SIMD/MIMD architectures, reducing the wiring area requirement.
    Type: Grant
    Filed: March 6, 2007
    Date of Patent: May 29, 2012
    Assignee: NEC Corporation
    Inventors: Hanno Lieske, Shorin Kyo
  • Patent number: 8180998
    Abstract: A system for performing data-parallel operations and task-parallel operations. A first switch fabric node (SFN) includes first and second lane processing engines (LPEs). The first LPE includes a first set of lane processing units (LPUs) configured to perform data-parallel operations, where each LPU performs a set of operations, and each LPU uses a different set of data for the set of operations, and each LPU within the first set of LPUs uses a different set of data for the set of operations. The second LPE includes a second set of LPUs configured to perform task-parallel operations, where each LPU performs a different set of operations. A processing control engine (PCE) is configured to distribute instructions and data to the first LPE and the second LPE. Advantageously, data parallel operations and task parallel operations are able to be performed on the same processor simultaneously.
    Type: Grant
    Filed: September 10, 2008
    Date of Patent: May 15, 2012
    Assignee: NVIDIA Corporation
    Inventors: Monier Maher, Christopher Lamb, Sanjay J. Patel, Peter Hsu
  • Publication number: 20120054468
    Abstract: An apparatus and method that includes a single memory as a VLIW instruction cache and CGA configuration memory is provided. Data is provided from a storage unit to a processing core that is capable of processing data in a first mode and a second mode. If the processing core is processing in the first mode, first data is output. If the processing core is processing in the second mode, second data is output.
    Type: Application
    Filed: August 24, 2011
    Publication date: March 1, 2012
    Inventors: Bernhard EGGER, Dong-Hoon YOO
  • Publication number: 20120036336
    Abstract: The present invention provides an information processing apparatus and an integrated circuit which realize parallel execution of different processing systems, and which do not require the provision of a dedicated memory storing instructions for common processing The information processing apparatus comprises: a plurality of processor elements; an instruction memory storing a first program and a second program; and an arbiter interposed between the processor elements and the instruction memory, the arbiter receiving, from each of the processor elements, a request for an instruction, from among instructions included in the first program and the second program, and controlling access to the instruction memory by the processor elements, wherein the arbiter arbitrates requests made by the processor elements when the requests are (i) simultaneous requests for different instructions included in one of the first program and the second program or (ii) simultaneous requests for an instruction included in the first prog
    Type: Application
    Filed: April 15, 2010
    Publication date: February 9, 2012
    Inventor: Hideshi Nishida
  • Patent number: 8112613
    Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S?1 sets, T set or a required number of sets, where T<S?1, are used as storage registers that store tags of the instruction cache.
    Type: Grant
    Filed: February 18, 2011
    Date of Patent: February 7, 2012
    Assignee: NEC Corporation
    Inventor: Shorin Kyo
  • Patent number: 8103854
    Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.
    Type: Grant
    Filed: April 12, 2010
    Date of Patent: January 24, 2012
    Assignee: Altera Corporation
    Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
  • Patent number: 8074224
    Abstract: Embodiments of the present invention facilitate dynamically adapting to state information changes in a graphics processing environment. In one embodiment, a master register holds state information corresponding to units of work (threads) to be performed. The state information in the master register is copied to a per-group state register when a group of threads is to be launched. The per-group state register is coupled to processing engines configured to process the threads, so that the processing engines read state information from the per-group state register rather than the master register. In another embodiment, a number of master registers may be used to store state information for different types of threads.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: December 6, 2011
    Assignee: NVIDIA Corporation
    Inventors: Bryon S. Nordquist, Brett W. Coon
  • Publication number: 20110271076
    Abstract: An electronic device includes a processing component and a task manager. The processing component is configurable for one of a single-core processing mode and a multi-core processing mode. The task manager determines a number of tasks running on the electronic device. The processor is configured to one of the single-core processing mode and the multi-core processing mode as a function of the number of tasks.
    Type: Application
    Filed: April 28, 2010
    Publication date: November 3, 2011
    Inventors: Maarten Koning, Stephen Li
  • Patent number: 8027962
    Abstract: Techniques for asynchronous command processing within a parallel processing environment are provided. A command is raised or received within a parallel processing data warehousing environment. A job or a component of the job is dynamically monitored, controlled, or modified in response to the real-time processing of the command. The job is actively processing within the parallel processing data warehousing environment when the command is received and processed against the job or the component of the job.
    Type: Grant
    Filed: September 14, 2007
    Date of Patent: September 27, 2011
    Assignee: Teradata US, Inc.
    Inventors: Alex P Yung, Clovis Franklin Lofton
  • Patent number: 8020168
    Abstract: A NOC for dynamic virtual software pipelining including IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including: a computer software application segmented into stages, each stage comprising a flexibly configurable module of computer program instructions identified by a stage ID, each stage assigned to a thread of execution on an IP block; and each stage executing on a thread of execution on an IP block, including a first stage executing on an IP block, producing output data and sending by the first stage the produced output data to a second stage, the output data including control information for the next stage and payload data; and the second stage consuming the produced output data in dependence upon the control information.
    Type: Grant
    Filed: May 9, 2008
    Date of Patent: September 13, 2011
    Assignee: International Business Machines Corporation
    Inventors: Russell D. Hoover, Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
  • Patent number: 7979674
    Abstract: Executing MIMD programs on a SIMD machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing one or more SIMD partitions, booting one or more SIMD partitions in MIMD mode; establishing a MIMD partition; executing by launcher programs a plurality of MIMD programs on two or more of the compute nodes of the MIMD partition; and re-executing a launcher program by an operating system on a compute node in the MIMD partition upon termination of the MIMD program executed by the launcher program.
    Type: Grant
    Filed: May 16, 2007
    Date of Patent: July 12, 2011
    Assignee: International Business Machines Corporation
    Inventors: Todd A. Inglett, Patrick J. McCarthy, Amanda Peters, Thomas A. Budnik, Michael B. Mundy, Gordon G. Stewart
  • Patent number: 7962906
    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
    Type: Grant
    Filed: March 15, 2007
    Date of Patent: June 14, 2011
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
  • Publication number: 20110138151
    Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S?1 sets, T set or a required number of sets, where T<S?1, are used as storage registers that store tags of the instruction cache.
    Type: Application
    Filed: February 18, 2011
    Publication date: June 9, 2011
    Applicant: NEC CORPORATION
    Inventor: Shorin KYO
  • Patent number: 7945433
    Abstract: A system and method for design verification and, more particularly, a hardware simulation accelerator design and method that exploits a parallel structure of user models to support a large user model size. The method includes a computer including N number of logic evaluation units (LEUs) that share a common pool of instruction memory (IM). The computer infrastructure is operable to: partition a number of parallel operations in a netlist; and send a same instruction stream of the partitioned number of parallel operations to N number of LEUs from a single IM. The system is a hardware simulation accelerator having a computer infrastructure operable to provide a stream of instructions to multiple LEUs from a single IM. The multiple LEUs are clustered together with multiple IMs such that each LEU is configured to use instructions from any of the multiple IMs thereby allowing a same instruction stream to drive the multiple LEUs.
    Type: Grant
    Filed: April 30, 2007
    Date of Patent: May 17, 2011
    Assignee: International Business Machines Corporation
    Inventors: Daniel R. Crouse, II, Gernot E. Guenther, Viktor Gyuris, Harrell Hoffman, Kevin A. Pasnik, Thomas J. Tryt, John H. Westermann, Jr.
  • Patent number: 7916864
    Abstract: A graphics processing unit is programmed to carry out cryptographic processing so that fast, effective cryptographic processing solutions can be provided without incurring additional hardware costs. The graphics processing unit can efficiently carry out cryptographic processing because it has an architecture that is configured to handle a large number of parallel processes. The cryptographic processing carried out on the graphics processing unit can be further improved by configuring the graphics processing unit to be capable of both floating point and integer operations.
    Type: Grant
    Filed: February 8, 2006
    Date of Patent: March 29, 2011
    Assignee: NVIDIA Corporation
    Inventor: Norbert Juffa
  • Publication number: 20110047348
    Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S?1 sets, T set or a required number of sets, where T<S?1, are used as storage registers that store tags of the instruction cache.
    Type: Application
    Filed: November 2, 2010
    Publication date: February 24, 2011
    Applicant: NEC CORPORATION
    Inventor: Shorin KYO
  • Patent number: 7890735
    Abstract: A multi-threaded microprocessor (1105) for processing instructions in threads. The microprocessor (1105) includes first and second decode pipelines (1730.0, 1730.1), first and second execute pipelines (1740, 1750), and coupling circuitry (1916) operable in a first mode to couple first and second threads from the first and second decode pipelines (1730.0, 1730.1) to the first and second execute pipelines (1740, 1750) respectively, and the coupling circuitry (1916) operable in a second mode to couple the first thread to both the first and second execute pipelines (1740, 1750). Various processes of manufacture, articles of manufacture, processes and methods of operation, circuits, devices, and systems are disclosed.
    Type: Grant
    Filed: August 23, 2006
    Date of Patent: February 15, 2011
    Assignee: Texas Instruments Incorporated
    Inventor: Thang Tran
  • Patent number: 7882312
    Abstract: A state engine receives multiple requests from a parallel processor for a shared state. The state engine includes at least one state element and the at least one state element is adapted to operate, atomically, on the shared state in response to a request made by the parallel processor. The request includes at least a command directing the at least one state element on how to perform an operation on the shared state. The state engine also includes a memory connected to the at least one state element and configured to store the shared state.
    Type: Grant
    Filed: November 11, 2003
    Date of Patent: February 1, 2011
    Assignee: Rambus Inc.
    Inventor: Anthony Spencer
  • Patent number: 7853775
    Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S?1 sets, T set or a required number of sets, where T<S?1, are used as storage registers that store tags of the instruction cache.
    Type: Grant
    Filed: August 9, 2007
    Date of Patent: December 14, 2010
    Assignee: NEC Corporation
    Inventor: Shorin Kyo
  • Publication number: 20100293356
    Abstract: The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The exemplary IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.
    Type: Application
    Filed: May 24, 2010
    Publication date: November 18, 2010
    Applicant: QST HOLDINGS, LLC
    Inventors: Robert T. PLUNKETT, Ghobad HEIDARI, Paul L. MASTER
  • Patent number: 7831802
    Abstract: Executing Multiple Instructions Multiple Data (‘MIMD’) programs on a Single Instruction Multiple Data (‘SIMD’) machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing a SIMD partition comprising a plurality of the compute nodes; booting the SIMD partition in MIMD mode; executing by launcher programs a plurality of MIMD programs on compute nodes in the SIMD partition; and re-executing a launcher program by an operating system on a compute node in the SIMD partition upon termination of the MIMD program executed by the launcher program.
    Type: Grant
    Filed: July 19, 2007
    Date of Patent: November 9, 2010
    Assignee: International Business Machines Corporation
    Inventors: Thomas A. Budnik, Alan J. King, Patrick J. McCarthy, Michael B. Mundy, Amanda Peters, James C. Sexton, Gordon G. Stewart
  • Patent number: 7831803
    Abstract: Executing MIMD programs on a SIMD machine, including establishing on the SIMD machine a plurality of SIMD partitions; booting a first SIMD partition in MIMD mode; executing, on a compute node of the first SIMD partition booted in MIMD mode, a MIMD accelerator program; executing a SIMD program in a second SIMD partition, one instance of the SIMD program executing on each compute node of the second SIMD partition, each instance of the SIMD program carrying out a portion of the data processing effected by the SIMD program; and accelerating, by an instance of the SIMD program through the MIMD accelerator program, a portion of the data processing of the instance of the SIMD program.
    Type: Grant
    Filed: July 19, 2007
    Date of Patent: November 9, 2010
    Assignee: International Business Machines Corporation
    Inventors: Todd A. Inglet, Alan J. King, Patrick J. McCarthy, Amanda Peters, James C. Sexton
  • Patent number: 7831804
    Abstract: A processor architecture includes a number of processing elements for treating input signals. The architecture is organized according to a matrix including rows and columns, the columns of which each include at least one microprocessor block having a computational part and a set of associated processing elements that are able to receive the same input signals. The number of associated processing elements is selectively variable in the direction of the column so as to exploit the parallelism of said signals. Additionally the processor architecture of the present invention enable dynamic switching between instruction parallelism and data parallel processing typical of vectorial functionality. The architecture can be scaled in various dimensions in an optimal configuration for the algorithm to be executed.
    Type: Grant
    Filed: May 30, 2008
    Date of Patent: November 9, 2010
    Assignee: ST Microelectronics S.R.L.
    Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elio Guidetti
  • Publication number: 20100268912
    Abstract: Techniques for thread mapping in multi-core processors are disclosed. An example computing system is disclosed having a multi-core processor with a plurality of processor cores. A performance counter may be configured to collect data relating to the performance of the multi-core processor. A core controller may be configured to map threads of execution to the processor cores based at least in part on the data collected by the performance counter.
    Type: Application
    Filed: April 21, 2009
    Publication date: October 21, 2010
    Inventors: Thomas Martin Conte, Andrew Wolfe
  • Patent number: 7818539
    Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
    Type: Grant
    Filed: August 28, 2006
    Date of Patent: October 19, 2010
    Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of Technology
    Inventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
  • Patent number: 7818541
    Abstract: A data processing architecture comprising: an input device for receiving an incoming stream of data packets; and a plurality of processing elements which are operable to process data received thereby; wherein the input device is operable to distribute data packets in whole or in part to the processing elements in dependence upon the data processing bandwidth of the processing elements.
    Type: Grant
    Filed: May 23, 2007
    Date of Patent: October 19, 2010
    Assignee: Clearspeed Technology Limited
    Inventors: John Rhoades, Ken Cameron, Paul Winser, Ray McConnell, Gordon Faulds, Simon McIntosh-Smith, Anthony Spencer, Jeff Bond, Matthias Dejaegher, Danny Halamish, Gajinder Panesar
  • Patent number: 7814462
    Abstract: In one embodiment, a process may be performed in parallel on a parallel server by defining a data type that may be used to reference data stored on the parallel server and overloading a previously-defined operation, such that when the overloaded operation is called, a command is sent to the parallel server to manipulate the data stored on the parallel server. In some embodiments, the previously-defined operation that is overloaded may be an operation of an operating system. Further, in some embodiments, when the data stored on the parallel server is no longer needed, a command may be sent to the parallel server to reallocate the memory used to store the data.
    Type: Grant
    Filed: August 31, 2005
    Date of Patent: October 12, 2010
    Assignees: Massachusetts Institute of Technology, The Regents of the University of California
    Inventors: Parry Jones Reginald Husbands, Long Yin Choy, Alan Edelman, Eckart Jansen, Viral B. Shah
  • Patent number: 7814295
    Abstract: Executing MIMD programs on a SIMD machine, including establishing SIMD partitions on the SIMD machine; booting SIMD partitions in MIMD mode; executing MIMD programs on the compute nodes of a first SIMD partition booted in MIMD mode; re-executing a launcher program by an operating system on a compute node in the first SIMD partition booted in MIMD mode upon termination of the MIMD program executed by the launcher program; determining by a scheduler that the first SIMD partition booted in MIMD mode is required to establish a new SIMD partition large enough to run a SIMD program that is scheduled for execution; moving by the scheduler data processing operations from the first SIMD partition booted in MIMD mode to the second SIMD partition booted in MIMD mode; and establishing by the scheduler the new SIMD partition.
    Type: Grant
    Filed: May 18, 2007
    Date of Patent: October 12, 2010
    Assignee: International Business Machines Corporation
    Inventors: Todd A. Inglett, Patrick J. McCarthy, Amanda Peters
  • Patent number: 7814296
    Abstract: Provided is a data processing circuit. A control unit outputs an operation control signal and a memory control signal. A plurality of program memories each outputs a command in response to the memory control signal. A plurality of arithmetic sections each selectively performs any one of the commands from the plurality of program memories in response to the operation control signal. Operation modes of the data processing circuit can be flexibly changed according to operation environments.
    Type: Grant
    Filed: September 5, 2008
    Date of Patent: October 12, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Chun-Gi Lyuh, Jung-Hee Suk, Ik-Jae Chun, Se-Wan Heo, Tae-Moon Roh, Jong-Dae Kim
  • Patent number: 7809925
    Abstract: A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution or a plurality of scalar execution units.
    Type: Grant
    Filed: December 7, 2007
    Date of Patent: October 5, 2010
    Assignee: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Patent number: 7778396
    Abstract: A telephone line status notification system including at least one telephone line having a status, a communications network, at least one communications terminal which is connectable to the communications network and which is employable by a seeking user to communicate via the communications network a status request concerning the status of the at least one telephone line, apparatus for processing the status request the apparatus for processing is connectable to the communications network for receiving the status request from the seeking user therethrough and communicating the request, and apparatus for acquiring the status of the at least one telephone line, the apparatus for acquiring is in communication with the apparatus for processing for receiving the status request therefrom, and the apparatus for acquiring is connectable to the communications network for communicating the status via the communications network.
    Type: Grant
    Filed: May 9, 2006
    Date of Patent: August 17, 2010
    Assignee: AOL Advertising Inc.
    Inventors: Joseph Vardi, Arie Vardi, Joseph Vigiser, Yair Goldfinger
  • Patent number: 7779180
    Abstract: A data processing module includes: a data converter having a TranslateData interface for receiving input data and sending output data, a Property interface for sending and receiving parameter data composed of a character string parameter for Property control, and Open/Close interface for initializing the environment and the state, a query interface for obtaining entries of the internal interfaces of the Open/Close interface, the TranslateData interface, and the Property interface, an API interface for dynamically obtaining by the query interface the four kinds of interfaces of the Open, Close, Property, and TranslateData, and a callback interface designated by the Property interface.
    Type: Grant
    Filed: April 26, 2006
    Date of Patent: August 17, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masayuki Hagiwara, Hirotomo Kobayashi
  • Patent number: 7752419
    Abstract: The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The exemplary IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.
    Type: Grant
    Filed: December 12, 2001
    Date of Patent: July 6, 2010
    Assignee: QST Holdings, LLC
    Inventors: Robert T. Plunkett, Ghobad Heidari, Paul L. Master
  • Patent number: 7730280
    Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.
    Type: Grant
    Filed: April 18, 2007
    Date of Patent: June 1, 2010
    Assignee: Vicore Technologies, Inc.
    Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
  • Patent number: 7730463
    Abstract: A computer implemented method, system and computer program product for automatically generating SIMD code. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, the targeted loop is simdized and combined with the non-simdized code.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: June 1, 2010
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
  • Publication number: 20100088489
    Abstract: A processor of SIMD/MIMD dual mode architecture comprises common controlled first processing elements, self-controlled second processing elements and a pipelined (ring) network connecting the first PEs and the second PEs sequentially. An access controller has access control lines, each access control line being connected to each PE of the first and second PEs to control data access timing between each PE and the network. Each PE can be self-controlled or common controlled, such as dual mode SIMD/MIMD architectures, reducing the wiring area requirement.
    Type: Application
    Filed: March 6, 2007
    Publication date: April 8, 2010
    Inventors: Hanno Lieske, Shorin Kyo
  • Patent number: 7657784
    Abstract: A self-reparable semiconductor comprises M functional units each including N sub-functional units. Corresponding ones of the N sub-functional units in each of the M functional units perform the same function. At least two of the N sub-functional units in one of the M functional units perform different functions. A first spare functional unit includes X sub-functional units, wherein X is greater than or equal to one and less than or equal to N and wherein the X sub-functional units of. the first spare functional unit are functionally interchangeable with corresponding sub-functional units of the M functional units and wherein the X sub-functional units are provided for the at least two of the N sub-functional units. A plurality of switching devices replace at least one of the N sub-functional units with at least one of the X sub-functional units when the at least one of the N sub-functional units is non-operable.
    Type: Grant
    Filed: November 8, 2006
    Date of Patent: February 2, 2010
    Assignee: Marvell World Trade Ltd.
    Inventors: Sehat Sutardja, Pantas Sutardja
  • Patent number: 7640155
    Abstract: A target interface system for interfacing selected components of a communication system and methods for manufacturing and using same. The target interface system includes target interface logic that is distributed among a plurality of reconfigurable logic devices. Being coupled via a serial link, the reconfigurable logic devices each have an input connection for receiving incoming data packets and an output connection for providing outgoing data packets. The serial link couples the input and output connections of successive reconfigurable logic devices to form a dataring structure for distributing the data packets among the reconfigurable logic devices. Thereby, the dataring structure maintains data synchronization among the reconfigurable logic devices such that the distribution of the target interface logic among the reconfigurable logic devices is transparent to software.
    Type: Grant
    Filed: May 31, 2005
    Date of Patent: December 29, 2009
    Assignee: QuickTurn Design Systems, Inc.
    Inventors: Mitchell G. Poplack, John A. Maher
  • Patent number: 7634637
    Abstract: In a processor, a SIMD group (a group of threads for which instructions are issued in parallel using single instruction, multiple data instruction issue techniques) is logically divided into two or more “SIMD subsets,” each containing one or more of the threads in the SIMD group. Each SIMD subset is associated with a different instance of a variable state parameter. The processor determines which of the instructions to be executed for the SIMD group rely on the state variable and serializes execution of such instructions so that the instruction is executed separately for each SIMD subset. Instructions that do not rely on the state variable are advantageously not serialized.
    Type: Grant
    Filed: December 16, 2005
    Date of Patent: December 15, 2009
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Stuart F. Oberman
  • Patent number: 7515899
    Abstract: Additional computing power is captured using the idle processing power of mobile phones incorporated into a grid computing system, wherein the system is capable of pushing projects out to available mobile phones for processing during idle operation times. To further efficiently utilize the unused processing cycles of mobile phones, a unique protocol is utilized to coordinate processing tasks which makes use of existing short messages techniques to communicate projects. The unique protocol is combination of bootstrapping using standard compression techniques along with an adaptive compression scheme.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: April 7, 2009
    Assignee: International Business Machines Corporation
    Inventors: Hollie Carr, Peter Mattison, Christopher E. Sharp
  • Publication number: 20090049275
    Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S?1 sets, T set or a required number of sets, where T<S?1, are used as storage registers that store tags of the instruction cache.
    Type: Application
    Filed: August 9, 2007
    Publication date: February 19, 2009
    Applicant: NEC CORPORATION
    Inventor: Shorin Kyo