Single Instruction, Multiple Data (SIMD) Patents (Class 712/22)
  • Patent number: 7900025
    Abstract: Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.
    Type: Grant
    Filed: October 14, 2008
    Date of Patent: March 1, 2011
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
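    The key idea above, keeping scalar floating-point values in the same vector registers used for SIMD data so that one FP-only instruction form serves both, can be sketched in plain C. The 4-element width and the choice of element 0 as the scalar slot are illustrative assumptions rather than details taken from the patent.

        #include <stdio.h>

        #define VLEN 4                      /* assumed vector width, for illustration */

        /* One "floating-point vector register": scalars live in element 0. */
        typedef struct { float e[VLEN]; } fvreg;

        /* A scalar value stored as a vector: element 0 holds the value. */
        static fvreg fv_scalar(float s) {
            fvreg r = {{0}};
            r.e[0] = s;
            return r;
        }

        /* SIMD add: the same instruction form serves vector and scalar data,
           because scalars are just vectors whose useful payload is element 0. */
        static fvreg fv_add(fvreg a, fvreg b) {
            fvreg r;
            for (int i = 0; i < VLEN; i++) r.e[i] = a.e[i] + b.e[i];
            return r;
        }

        int main(void) {
            fvreg v = {{1.0f, 2.0f, 3.0f, 4.0f}};
            fvreg s = fv_scalar(10.0f);
            fvreg sum = fv_add(v, s);       /* scalar add falls out of the vector add */
            printf("%f\n", sum.e[0]);       /* 11.0: the "scalar" result in element 0 */
            return 0;
        }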
  • Publication number: 20110047349
    Abstract: A processor includes a plurality of subfunctional units provided corresponding to respective slots of one or more pieces of operation result data, each piece including a plurality of slots for an SIMD operation, and an enable generating unit configured to, in each piece of operation result data, compare the value of a predetermined slot with the values of the other slots and disable any subfunctional unit to which a value equal to the value of the predetermined slot is inputted. The processor outputs the value of the predetermined slot as the value of the subfunctional units that have been disabled.
    Type: Application
    Filed: March 12, 2010
    Publication date: February 24, 2011
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Hiroo HAYASHI
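    A rough software model of the enable-generation scheme above: slot values are compared against a predetermined slot, matching subfunctional units are disabled, and the predetermined slot's result is forwarded in their place. The slot count, the use of slot 0 as the predetermined slot, and the example operation are assumptions made for illustration.

        #include <stdio.h>
        #include <stdint.h>

        #define SLOTS 4

        /* Model of the enable generating unit: slot 0 is treated as the
           predetermined slot; any other slot whose input equals slot 0's input
           has its subfunctional unit disabled and simply reuses slot 0's result. */
        static void simd_op_with_gating(const int32_t in[SLOTS], int32_t out[SLOTS]) {
            int32_t slot0_result = in[0] * in[0];          /* example operation: square */
            out[0] = slot0_result;
            for (int s = 1; s < SLOTS; s++) {
                if (in[s] == in[0]) {
                    out[s] = slot0_result;                 /* unit disabled, result forwarded */
                } else {
                    out[s] = in[s] * in[s];                /* unit enabled, computes normally */
                }
            }
        }

        int main(void) {
            int32_t in[SLOTS] = {3, 3, 5, 3};              /* slots 1 and 3 match slot 0 */
            int32_t out[SLOTS];
            simd_op_with_gating(in, out);
            for (int s = 0; s < SLOTS; s++) printf("%d ", out[s]);
            printf("\n");                                  /* 9 9 25 9 */
            return 0;
        }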
  • Patent number: 7895412
    Abstract: A programmable processing engine processes transient data within an intermediate network station of a computer network. The engine comprises an array of processing elements symmetrically arrayed as rows and columns, and embedded between input and output buffer units with a plurality of interfaces from the array to an external memory. The external memory stores non-transient data organized within data structures, such as forwarding and routing tables, for use in processing the transient data. Each processing element contains an instruction memory that allows programming of the array to process the transient data as processing element stages of baseline or extended pipelines operating in parallel.
    Type: Grant
    Filed: June 27, 2002
    Date of Patent: February 22, 2011
    Assignee: Cisco Technology, Inc.
    Inventors: Darren Kerr, Kenneth Michael Key, Michael L. Wright, William E. Jennings
  • Publication number: 20110040952
    Abstract: Leveling of the processing load is realized efficiently. Each processing element of an SIMD parallel computer system includes a data storage module that stores data to be processed or transferred, a number-of-data-sets storage device that stores the number of data sets, and a front data storage device that stores the front data. Each processing element further includes a control processor that compares the number of data sets stored in another processing element with the number of data sets stored in its own processing element, and issues a data distribution leveling instruction designating both an action for updating the contents of the data storage module, the number-of-data-sets storage device, and the front data storage device according to a rule determined from the comparison between its own processing element and the other processing elements, and an action for moving the data stored in the other processing element to its own processing element.
    Type: Application
    Filed: April 8, 2009
    Publication date: February 17, 2011
    Applicant: NEC CORPORATION
    Inventor: Shorin Kyo
  • Patent number: 7890733
    Abstract: A data processor comprises a plurality of processing elements (PEs), with memory local to at least one of the processing elements, and a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid, e.g., in a SIMD array, so as to connect the PEs and their local memories to a common controller. Transaction-enabled PEs and nodes set flags, which are maintained until the transaction is completed and signal status to the controller, e.g., over a series of OR-gates. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. External memory may also be connected to the “end” nodes interfacing with the network, e.g., to provide cache.
    Type: Grant
    Filed: August 11, 2005
    Date of Patent: February 15, 2011
    Assignee: Rambus Inc.
    Inventor: Ray McConnell
  • Publication number: 20110029756
    Abstract: A method for decoding a codeword in a data stream encoded according to a low density parity check (LDPC) code having an m×j parity check matrix H by initializing variable nodes with soft values based on symbols in the codeword, wherein a graph representation of H includes m check nodes and j variable nodes, and wherein a check node m provides a row value estimate to a variable node j and a variable node j provides a column value estimate to a check node m if H(m,j) contains a 1, computing row value estimates for each check node, wherein amplitudes of only a subset of column value estimates provided to the check node are computed, computing soft values for each variable node based on the computed row value estimates, determining whether the codeword is decoded based on the soft values, and terminating decoding when the codeword is decoded.
    Type: Application
    Filed: July 28, 2009
    Publication date: February 3, 2011
    Inventors: Eric Biscondi, David Hoyle, Tod David Wolf
  • Patent number: 7882325
    Abstract: A single micro-instruction to perform either an N-bit or a 2N-bit load is provided. A microprocessor having an N-bit load port performs either an N-bit load or a 2N-bit load in a single cycle with the same micro-instruction being used for both the N-bit and the 2N-bit load.
    Type: Grant
    Filed: December 21, 2007
    Date of Patent: February 1, 2011
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Ehud Cohen, Doron Orenstien, Benny Eitan
  • Patent number: 7882312
    Abstract: A state engine receives multiple requests from a parallel processor for a shared state. The state engine includes at least one state element and the at least one state element is adapted to operate, atomically, on the shared state in response to a request made by the parallel processor. The request includes at least a command directing the at least one state element on how to perform an operation on the shared state. The state engine also includes a memory connected to the at least one state element and configured to store the shared state.
    Type: Grant
    Filed: November 11, 2003
    Date of Patent: February 1, 2011
    Assignee: Rambus Inc.
    Inventor: Anthony Spencer
  • Patent number: 7882333
    Abstract: A method for loading microcode to a plurality of cores within a processor. The method includes loading the microcode to a first core of the plurality of cores within the processor system and generating a broadcast inter-processor interrupt (IPI) message via the first core. The IPI message causes other cores within the processor system to synchronize respective microcode with the microcode that is loaded into the first core. The synchronizing loads microcode to the plurality of cores without requiring independent loads of microcode to each core.
    Type: Grant
    Filed: November 5, 2007
    Date of Patent: February 1, 2011
    Assignee: Dell Products L.P.
    Inventor: Mukund Khatri
  • Patent number: 7877585
    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: January 25, 2011
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov
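    The divergence handling summarized above can be approximated in software with a per-thread disable mask that lets both sides of a conditional execute while only enabled threads commit results. The sketch below illustrates the general SIMT masking technique under assumed parameters (an 8-thread group, a simple if/else), not the patented hardware mechanism itself.

        #include <stdio.h>
        #include <stdint.h>

        #define WARP 8

        /* Execute "if (x > 0) y = x*2; else y = -x;" for a SIMD thread group,
           using a disable/active mask so both sides of the branch run but each
           thread only commits results while it is enabled. */
        static void warp_conditional(const int x[WARP], int y[WARP]) {
            uint8_t taken = 0;                              /* threads taking the 'if' side */
            for (int t = 0; t < WARP; t++)
                if (x[t] > 0) taken |= (uint8_t)(1u << t);

            for (int t = 0; t < WARP; t++)                  /* 'if' side, others disabled */
                if (taken & (1u << t)) y[t] = x[t] * 2;

            for (int t = 0; t < WARP; t++)                  /* 'else' side, mask inverted */
                if (!(taken & (1u << t))) y[t] = -x[t];
            /* re-convergence point: all threads enabled again */
        }

        int main(void) {
            int x[WARP] = {1, -2, 3, -4, 5, -6, 7, -8}, y[WARP];
            warp_conditional(x, y);
            for (int t = 0; t < WARP; t++) printf("%d ", y[t]);
            printf("\n");                                   /* 2 2 6 4 10 6 14 8 */
            return 0;
        }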
  • Patent number: 7873812
    Abstract: The new system provides for efficient implementation of matrix multiplication in a SIMD processor. The new system provides the ability to map any element of a source vector register to be paired with any element of a second source vector register for vector operations, specifically vector multiply and vector-multiply-accumulate operations, to implement a variety of matrix multiplications without additional permute or data re-ordering instructions. Operations such as DCT and color-space transformations for video processing can be implemented very efficiently using this system.
    Type: Grant
    Filed: April 5, 2004
    Date of Patent: January 18, 2011
    Inventor: Tibet Mimar
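    A scalar C model of the element-mapping multiply described above, used here to accumulate a 4x4 matrix-vector product without a separate permute step. The control-vector encoding (an index per lane selecting the paired element of the second source) is an assumed illustration of the idea, not the patent's actual instruction format.

        #include <stdio.h>

        #define VLEN 4

        /* Vector multiply in which a per-element map chooses which element of the
           second source is paired with each element of the first source, so no
           separate permute instruction is needed. */
        static void vmul_mapped(const float a[VLEN], const float b[VLEN],
                                const int map[VLEN], float out[VLEN]) {
            for (int i = 0; i < VLEN; i++)
                out[i] = a[i] * b[map[i]];
        }

        int main(void) {
            /* 4x4 matrix (row-major) times vector, accumulated column by column. */
            float m[4][4] = {{1,0,0,0},{0,2,0,0},{0,0,3,0},{0,0,0,4}};
            float x[VLEN] = {1, 2, 3, 4};
            float y[VLEN] = {0, 0, 0, 0};

            for (int col = 0; col < 4; col++) {
                /* broadcast-style map: every lane reads element 'col' of x */
                int map[VLEN] = {col, col, col, col};
                float colvals[VLEN] = {m[0][col], m[1][col], m[2][col], m[3][col]};
                float prod[VLEN];
                vmul_mapped(colvals, x, map, prod);
                for (int i = 0; i < VLEN; i++) y[i] += prod[i];  /* accumulate */
            }
            for (int i = 0; i < 4; i++) printf("%g ", y[i]);     /* 1 4 9 16 */
            printf("\n");
            return 0;
        }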
  • Patent number: 7873794
    Abstract: Disclosed is an apparatus, method, and program product that provides atomic, multi-word load support without incurring additional memory utilization. A double-word is atomically loaded without the use of one or more additional fields and without a lock. An invalidity marker is used in connection with a cache miss time to ascertain whether a loaded double-word has been stored and loaded atomically, and is thus, valid.
    Type: Grant
    Filed: August 21, 2007
    Date of Patent: January 18, 2011
    Assignee: International Business Machines Corporation
    Inventors: Michael Joseph Corrigan, Timothy Joseph Torzewski
  • Publication number: 20110010524
    Abstract: There is provided an SIMD processor array system in which data can be efficiently transferred between processor elements located at different distances. The SIMD processor array system includes a control processor (CP) that is capable of issuing a plurality of instructions at the same time, and a PE array that includes a plurality of mutually connected processing elements (PEs) controlled by the CP. The CP issues an inter-PE data shift instruction to each PE. According to the inter-PE data shift instruction, each PE performs a data sending operation of copying all the contents of the transfer data storing part of an adjoining PE to the transfer data storing part (MBF) of its own PE, and a data fetch operation of copying part or all of the contents of the MBF of the adjoining PE to the transfer data fetch and storing part (RBUF) of its own PE if part of the contents of the MBF of the adjoining PE coincides with the contents of the ID storing part (IDB) of its own PE.
    Type: Application
    Filed: March 4, 2009
    Publication date: January 13, 2011
    Applicant: NEC CORPORATION
    Inventor: Shorin Kyo
  • Publication number: 20110004743
    Abstract: A data processing apparatus 1 has a plurality of registers 10 of the same type of register and a plurality of processing pipelines 40, 50, each processing pipeline 40, 50 being arranged to process instructions. At least one instruction includes a destination register specifier specifying which of said registers is a destination register for storing a processing result of the at least one instruction. Instruction issuing circuitry 26 is configured to issue the at least one instruction for processing by one of the plurality of processing pipelines. The instruction issuing circuitry 26 selects the one of the plurality of processing pipelines to which the candidate instruction is issued in dependence upon the value of the destination register specifier of the candidate instruction.
    Type: Application
    Filed: July 1, 2009
    Publication date: January 6, 2011
    Applicant: ARM Limited
    Inventor: David Raymond Lutz
  • Publication number: 20100332794
    Abstract: An instruction is received indicating first and second operands, each having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each correspond to a first lane; a second subset of the data elements of the first operand and a second subset of the data elements of the second operand each correspond to a second lane. In response to the instruction, a result is stored that includes: (1) in the first lane, only the lowest-order data elements from the first subset of the first operand interleaved with the corresponding lowest-order data elements from the first subset of the second operand; and (2) in the second lane, only the highest-order data elements from the second subset of the first operand interleaved with the corresponding highest-order data elements from the second subset of the second operand.
    Type: Application
    Filed: June 30, 2009
    Publication date: December 30, 2010
    Inventors: Asaf Hargil, Doron Orenstein
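    One natural reading of the result layout described above, assuming 32-bit elements and two 128-bit lanes (eight elements per operand), modeled in plain C:

        #include <stdio.h>
        #include <stdint.h>

        /* Two operands of 8 x 32-bit elements, viewed as two 4-element lanes.
           Result lane 0: the two lowest elements of lane 0 of each operand,
           interleaved; result lane 1: the two highest elements of lane 1 of
           each operand, interleaved. */
        static void interleave_lanes(const uint32_t a[8], const uint32_t b[8],
                                     uint32_t r[8]) {
            /* lane 0: interleave a[0..1] with b[0..1] */
            r[0] = a[0]; r[1] = b[0]; r[2] = a[1]; r[3] = b[1];
            /* lane 1: interleave a[6..7] with b[6..7] */
            r[4] = a[6]; r[5] = b[6]; r[6] = a[7]; r[7] = b[7];
        }

        int main(void) {
            uint32_t a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
            uint32_t b[8] = {10, 11, 12, 13, 14, 15, 16, 17};
            uint32_t r[8];
            interleave_lanes(a, b, r);
            for (int i = 0; i < 8; i++) printf("%u ", r[i]);
            printf("\n");   /* 0 10 1 11 6 16 7 17 */
            return 0;
        }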
  • Patent number: 7861060
    Abstract: Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.
    Type: Grant
    Filed: December 15, 2005
    Date of Patent: December 28, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Stephen D. Lew
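    The role of the thread ID described above, selecting which slice of the input and output each thread handles, can be shown with a small sequential C model. The thread count, array size, and strided partitioning are illustrative assumptions; in the actual design the threads run concurrently and synchronize in hardware.

        #include <stdio.h>

        #define NUM_THREADS 4
        #define N 16

        /* Each simulated CTA thread uses its thread ID to pick a strided portion
           of the input, writes its portion of the output, and returns a partial
           sum that would be combined after synchronization in the real design. */
        static float thread_body(int tid, const float *in, float *out) {
            float partial = 0.0f;
            for (int i = tid; i < N; i += NUM_THREADS) {   /* portion selected by tid */
                out[i] = in[i] * 2.0f;
                partial += in[i];
            }
            return partial;
        }

        int main(void) {
            float in[N], out[N], total = 0.0f;
            for (int i = 0; i < N; i++) in[i] = (float)i;

            for (int tid = 0; tid < NUM_THREADS; tid++)    /* concurrent in hardware */
                total += thread_body(tid, in, out);

            printf("sum = %g, out[5] = %g\n", total, out[5]);  /* sum = 120, out[5] = 10 */
            return 0;
        }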
  • Patent number: 7861071
    Abstract: A method of conditionally executing branch instructions which comprise an opcode field defining a type of test to be applied to determine whether or not to execute a branch operation, a control field designating a control store holding a plurality of indicators and a destination field holding information on a branch target address. The method comprises determining from the opcode field whether or not the test will check the state of one indicator or a plurality of indicators in the designated control store, accessing the designated control store to check the state of said one or said plurality of indicators depending on the determination, and generating a branch target address using information in the destination field in dependence on the state of the or each indicator checked.
    Type: Grant
    Filed: May 30, 2002
    Date of Patent: December 28, 2010
    Assignee: Broadcom Corporation
    Inventor: Sophie Wilson
  • Patent number: 7856543
    Abstract: A data processing architecture comprising: an input device for receiving an incoming stream of data packets; and a plurality of processing elements which are operable to process data received thereby; wherein the input device is operable to distribute data packets in whole or in part to the processing elements in dependence upon the data processing bandwidth of the processing elements.
    Type: Grant
    Filed: February 14, 2002
    Date of Patent: December 21, 2010
    Assignee: Rambus Inc.
    Inventors: John Rhoades, Ken Cameron, Paul Winser, Ray McConnell, Gordon Faulds, Simon McIntosh-Smith, Anthony Spencer, Jeff Bond, Matthias Dejaegher, Danny Halamish, Gajinder Panesar
  • Publication number: 20100318766
    Abstract: A processor includes a processing unit capable of executing single-instruction multiple-data operations; a register file configured to store data that is to be supplied to the processing unit and subjected to operations; and a buffer provided separately from the register file, in which an integer “n” number of data columns, each having a plurality of data elements, are written on a column-by-column basis, and from which data elements at the same location are selected and read as “n” data elements from the respective “n” data columns, wherein the “n” data elements read from the buffer are supplied to the processing unit as data to be subjected to a single-instruction multiple-data operation.
    Type: Application
    Filed: June 7, 2010
    Publication date: December 16, 2010
    Applicant: FUJITSU SEMICONDUCTOR LIMITED
    Inventor: Masayuki TSUJI
  • Patent number: 7853778
    Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
    Type: Grant
    Filed: December 20, 2001
    Date of Patent: December 14, 2010
    Assignee: Intel Corporation
    Inventor: Patrice Roussel
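    A C model of the single load-and-duplicate step described above, assuming a 128-bit destination and a 64-bit first portion (an interpretation chosen for illustration):

        #include <stdio.h>
        #include <stdint.h>
        #include <string.h>

        /* 128-bit destination modeled as two 64-bit halves. */
        typedef struct { uint64_t lo, hi; } xreg;

        /* Load the low 64 bits of the source and duplicate them into the
           upper 64 bits of the destination, as a single "instruction". */
        static xreg load_dup64(const void *src) {
            xreg r;
            memcpy(&r.lo, src, sizeof r.lo);   /* first portion of bits */
            r.hi = r.lo;                       /* duplicated into the subsequent portion */
            return r;
        }

        int main(void) {
            double d = 3.5;
            xreg r = load_dup64(&d);
            double lo, hi;
            memcpy(&lo, &r.lo, sizeof lo);
            memcpy(&hi, &r.hi, sizeof hi);
            printf("%g %g\n", lo, hi);         /* 3.5 3.5 */
            return 0;
        }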
  • Publication number: 20100312989
    Abstract: A processor 2 supporting register renaming has a rename table 20 in which the flag register has multiple tag values associated therewith. These tag values indicate which virtual register corresponds to a destination flag register of the oldest instruction which wrote a still up-to-date value of a subset of the flags.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Inventor: James Nolan Hardage
  • Publication number: 20100293534
    Abstract: In one embodiment, the invention is a method and apparatus for use of vectorization instruction sets. One embodiment of a method for generating vector instructions includes receiving source code written in a high-level programming language, wherein the source code includes at least one high-level instruction that performs multiple operations on a plurality of vector operands, and compiling the high-level instruction(s) into one or more low-level instructions, wherein the low-level instructions are in an instruction set of a specific computer architecture.
    Type: Application
    Filed: May 15, 2009
    Publication date: November 18, 2010
    Inventors: Henrique Andrade, Bugra Gedik, Hua Yong Wang, Kun-Lung Wu
  • Patent number: 7831804
    Abstract: A processor architecture includes a number of processing elements for treating input signals. The architecture is organized as a matrix of rows and columns, where each column includes at least one microprocessor block having a computational part and a set of associated processing elements that are able to receive the same input signals. The number of associated processing elements is selectively variable in the direction of the column so as to exploit the parallelism of said signals. Additionally, the processor architecture enables dynamic switching between instruction parallelism and the data-parallel processing typical of vectorial functionality. The architecture can be scaled in various dimensions into an optimal configuration for the algorithm to be executed.
    Type: Grant
    Filed: May 30, 2008
    Date of Patent: November 9, 2010
    Assignee: ST Microelectronics S.R.L.
    Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elio Guidetti
  • Publication number: 20100281255
    Abstract: In one embodiment of the present invention, a method includes verifying a master processor of a system; validating a trusted agent with the master processor if the master processor is verified; and launching the trusted agent on a plurality of processors of the system if the trusted agent is validated. After execution of such a trusted agent, a secure kernel may then be launched, in certain embodiments. The system may be a multiprocessor server system having a partially or fully connected topology with arbitrary point-to-point interconnects, for example.
    Type: Application
    Filed: June 29, 2010
    Publication date: November 4, 2010
    Inventors: John H. Wilson, Ioannis T. Schoinas, Mazin S. Yousif, Linda J. Rankin, David W. Grawrock, Robert J. Greiner, James A. Sutton, Kushagra Vaid, Willard M. Wiseman
  • Publication number: 20100274989
    Abstract: A method executed by an instruction set on a processor is described. The method includes providing a tbbit instruction, inputting a first index for the tbbit instruction, loading a second value for the tbbit instruction, wherein the second value comprises at least 2^b bits, using selected b bits of the first index to select at least one target bit in the loaded second value, shifting the target bit into the bottom of the first index, and computing a second index based on the shifting of the target bit into the bottom of the first index. Other methods and variations are also described.
    Type: Application
    Filed: December 8, 2008
    Publication date: October 28, 2010
    Inventors: Mayan Moudgill, Sitij Agrawal
  • Publication number: 20100274990
    Abstract: An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD processing circuitry.
    Type: Application
    Filed: September 17, 2009
    Publication date: October 28, 2010
    Inventors: Mladen Wilder, Dominic Hugo Symes, Richard Edward Bruce
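    One plausible interpretation of the repeating multiply-accumulate instruction above, modeled in C as a small FIR-style accumulation driven by the scalar iteration count. The iteration semantics shown here are an assumption made for illustration, not taken from the publication.

        #include <stdio.h>

        #define VLEN 4

        /* Repeating multiply-accumulate: for 'iters' iterations, each lane
           accumulates input[lane + i] * coeff[i], i.e. a small FIR filter
           evaluated for VLEN output positions by one "instruction". */
        static void repeat_mac(const float *input, const float *coeff,
                               int iters, float acc[VLEN]) {
            for (int i = 0; i < iters; i++)              /* scalar iteration count */
                for (int lane = 0; lane < VLEN; lane++)  /* SIMD lanes in parallel */
                    acc[lane] += input[lane + i] * coeff[i];
        }

        int main(void) {
            float input[8] = {1, 2, 3, 4, 5, 6, 7, 8};
            float coeff[3] = {0.5f, 0.25f, 0.25f};
            float acc[VLEN] = {0};
            repeat_mac(input, coeff, 3, acc);
            for (int l = 0; l < VLEN; l++) printf("%g ", acc[l]);
            printf("\n");   /* 1.75 2.75 3.75 4.75 */
            return 0;
        }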
  • Patent number: 7818540
    Abstract: A vector processing system for executing vector instructions, each instruction defining multiple value pairs, an operation to be executed and a modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and, when selected, to implement an operation on said value pair to generate a result, each processing unit comprising at least one flag and being selectable in dependence on a condition defined by said at least one flag, wherein the modifier defines the condition under which the parallel processing unit is individually selected.
    Type: Grant
    Filed: May 19, 2006
    Date of Patent: October 19, 2010
    Assignee: Broadcom Corporation
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
  • Patent number: 7818539
    Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
    Type: Grant
    Filed: August 28, 2006
    Date of Patent: October 19, 2010
    Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of Technology
    Inventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
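    The stream-splitting form of the conditional vector operations described above can be sketched as follows; the element type and the specific condition are illustrative choices.

        #include <stdio.h>

        /* Split an input vector into two output vectors according to a condition
           vector, so that each output can later be processed branch-free. Returns
           the number of elements routed to 'pass'; the rest go to 'fail'. */
        static int cond_vector_split(const int *in, const int *cond, int n,
                                     int *pass, int *fail) {
            int np = 0, nf = 0;
            for (int i = 0; i < n; i++) {
                if (cond[i]) pass[np++] = in[i];
                else         fail[nf++] = in[i];
            }
            return np;
        }

        int main(void) {
            int in[8]   = {5, -1, 7, -3, 9, -2, 4, -8};
            int cond[8] = {1,  0, 1,  0, 1,  0, 1,  0};   /* e.g. "value is positive" */
            int pass[8], fail[8];
            int np = cond_vector_split(in, cond, 8, pass, fail);
            printf("pass:");
            for (int i = 0; i < np; i++) printf(" %d", pass[i]);
            printf("\nfail:");
            for (int i = 0; i < 8 - np; i++) printf(" %d", fail[i]);
            printf("\n");
            return 0;
        }

    Because each output vector is then homogeneous with respect to the condition, subsequent passes over pass[] and fail[] need no per-element branches, which is the efficiency point made in the abstract.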
  • Patent number: 7818541
    Abstract: A data processing architecture comprising: an input device for receiving an incoming stream of data packets; and a plurality of processing elements which are operable to process data received thereby; wherein the input device is operable to distribute data packets in whole or in part to the processing elements in dependence upon the data processing bandwidth of the processing elements.
    Type: Grant
    Filed: May 23, 2007
    Date of Patent: October 19, 2010
    Assignee: Clearspeed Technology Limited
    Inventors: John Rhoades, Ken Cameron, Paul Winser, Ray McConnell, Gordon Faulds, Simon McIntosh-Smith, Anthony Spencer, Jeff Bond, Matthias Dejaegher, Danny Halamish, Gajinder Panesar
  • Patent number: 7818548
    Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, and (c) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: October 19, 2010
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 7814297
    Abstract: A data processing apparatus comprises data processing logic operable to perform data processing operations specified by program instructions. The data processing logic (140) has a plurality of functional units (142, 144, 146) configured to execute in parallel on data received from a data source. A decoder (130) is responsive to a single program instruction to control the data processing logic (140) to concurrently execute the single program instruction on each of a plurality of vector elements of each of a respective plurality of vector input operands (310, 320) received from the data source using the plurality of functional units (142, 144, 146).
    Type: Grant
    Filed: July 26, 2005
    Date of Patent: October 12, 2010
    Assignee: ARM Limited
    Inventor: Martinus Cornelis Wezelenburg
  • Publication number: 20100250897
    Abstract: The invention relates to a parallel processor which comprises elementary processors (3) disposed according to a topology with a predetermined position within this topology and capable of simultaneously executing the same instruction on different data, the instruction relating to at least one operand and/or providing at least one result. The instruction comprises, for each operand and/or each result, information relating to the position of a field of action within a data structure of the table of dimension M type and the parallel processor comprises means (41, 42, 43) for calculating the address of each operand and/or each result within each elementary processor, as a function of the position of the field of action and of the position of the elementary processor within the topology.
    Type: Application
    Filed: June 26, 2008
    Publication date: September 30, 2010
    Applicant: Thales
    Inventor: Gérard Gaillat
  • Patent number: 7805561
    Abstract: A single instruction, multiple data (“SIMD”) computer system includes a central control unit coupled to 256 processing elements (“PEs”) and to 32 static random access memory (“SRAM”) devices. Each group of eight PEs can access respective groups of eight columns in a respective SRAM device. Each PE includes a local column address register that can be loaded through a data bus of the respective PE. A local column address stored in the local column address register is applied to an AND gate, which selects either the local column address or a column address applied to the AND gate by the central control unit. As a result, the central control unit can globally access the SRAM device, or a specific one of the eight columns that can be accessed by each PE can be selected locally by the PE.
    Type: Grant
    Filed: January 16, 2009
    Date of Patent: September 28, 2010
    Assignee: Micron Technology, Inc.
    Inventor: Jon Skull
  • Publication number: 20100241824
    Abstract: Techniques are disclosed for converting data into a format tailored for efficient multidimensional fast Fourier transforms (FFTs) on single instruction, multiple data (SIMD) multi-core processor architectures. The technique includes converting data from a multidimensional array stored in conventional row-major order into SIMD format. Converted data in SIMD format consists of a sequence of blocks, where each block interleaves s rows such that SIMD vector processors may operate on s rows simultaneously. As a result, the converted data in SIMD format enables smaller-sized 1D FFTs to be optimized on SIMD multi-core processor architectures.
    Type: Application
    Filed: March 18, 2009
    Publication date: September 23, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David G. Carlson, Travis M. Drucker, Timothy J. Mullins, Jeffrey S. McAllister, Nelson Ramirez
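    The row-interleaving conversion described above can be modeled in plain C. The element type, matrix shape, and SIMD width s used below are assumptions for illustration.

        #include <stdio.h>

        /* Convert a rows x cols row-major matrix into a sequence of blocks in
           which each block interleaves 's' consecutive rows element-by-element,
           so that an s-wide SIMD unit can process s rows at once.
           Assumes rows is a multiple of s. */
        static void to_simd_format(const float *in, float *out,
                                   int rows, int cols, int s) {
            for (int b = 0; b < rows / s; b++)            /* block of s rows */
                for (int c = 0; c < cols; c++)            /* column within the block */
                    for (int r = 0; r < s; r++)           /* row within the block */
                        out[(b * cols + c) * s + r] = in[(b * s + r) * cols + c];
        }

        int main(void) {
            /* 4 x 3 matrix, interleaved two rows at a time (s = 2). */
            float in[12] = {1, 2, 3,
                            4, 5, 6,
                            7, 8, 9,
                           10,11,12};
            float out[12];
            to_simd_format(in, out, 4, 3, 2);
            for (int i = 0; i < 12; i++) printf("%g ", out[i]);
            printf("\n");   /* 1 4 2 5 3 6 7 10 8 11 9 12 */
            return 0;
        }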
  • Patent number: 7802079
    Abstract: A parallel data processing apparatus using a SIMD array of processing elements is disclosed. The apparatus makes use of a register in order to control issuance of instructions to the processing elements in the array.
    Type: Grant
    Filed: June 29, 2007
    Date of Patent: September 21, 2010
    Assignee: Clearspeed Technology Limited
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Patent number: 7797517
    Abstract: Reference architecture instructions are translated into target architecture operations. Sequences of operations, in a predicted execution order in some embodiments, form traces. In some embodiments, a trace is based on a plurality of basic blocks. In some embodiments, a trace is committed or aborted as a single entity. Sequences of operations are optimized by fusing collections of operations; fused operations specify a same observable function as respective collections, but advantageously enable more efficient processing. In some embodiments, a collection comprises multiple register operations. Fusing a register operation with a branch operation in a trace forms a fused reg-op/branch operation. In some embodiments, branch instructions translate into assert operations. Fusing an assert operation with another operation forms a fused assert operation. In some embodiments, fused operations only set architectural state, such as high-order portions of registers, that is subsequently read before being written.
    Type: Grant
    Filed: November 17, 2006
    Date of Patent: September 14, 2010
    Assignee: Oracle America, Inc.
    Inventor: John Gregory Favor
  • Patent number: 7788468
    Abstract: A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Different threads of the CTA are advantageously synchronized at appropriate points during CTA execution using a barrier synchronization technique in which barrier instructions in the CTA program are detected and used to suspend execution of some threads until a specified number of other threads also reaches the barrier point.
    Type: Grant
    Filed: December 15, 2005
    Date of Patent: August 31, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Stephen D. Lew, Brett W. Coon, Peter C. Mills
  • Patent number: 7788471
    Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.
    Type: Grant
    Filed: September 18, 2006
    Date of Patent: August 31, 2010
    Assignee: Freescale Semiconductor, Inc.
    Inventor: Chengke Sheng
  • Patent number: 7783862
    Abstract: One embodiment of the present invention is a processor that processes inductive doubling SIMD instructions, which processor includes: an Instruction Fetch Unit that loads a SIMD instruction and applies it as input to a SIMD Instruction Decode Unit; wherein the SIMD Instruction Decode Unit decodes the applied SIMD instruction and produces output signals including SIMD field width identification signals and one or more SIMD half-operand modifier signals.
    Type: Grant
    Filed: August 6, 2007
    Date of Patent: August 24, 2010
    Assignee: International Characters, Inc.
    Inventor: Robert D. Cameron
  • Publication number: 20100211758
    Abstract: A microprocessor that can perform sequential processing in data array units includes: a load store unit that, when a fetched instruction is a load instruction for data, loads a data sequence including the designated data from a data memory in memory-width units and, based on an analysis result of the instruction, specifies data scheduled to be designated by a future load instruction; and a data temporary storage unit that stores the use-scheduled data specified by the load store unit.
    Type: Application
    Filed: December 29, 2009
    Publication date: August 19, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masato Sumiyoshi, Takashi Miyamori, Shunichi Ishiwata, Katsuyuki Kimura, Takahisa Wada, Keiri Nakanishi, Yasuki Tanabe, Ryuji Hada
  • Publication number: 20100211858
    Abstract: An application specific processor to implement a Viterbi decode algorithm for channel decoding functions of received symbols. The Viterbi decode algorithm is at least one of a bit-serial decode algorithm and a block-based decode algorithm. The application specific processor includes a Load-Store, Logical and De-puncturing (LLD) slot that performs a Load-Store function, a Logical function, a De-puncturing function, and a Trace-back Address generation function; a Branch Metric Compute (BMU) slot that performs Radix-2 branch metric computations, Radix-4 branch metric computations, and Squared Euclidean branch metric computations; and an Add-Compare-Select (ACS) slot that performs Radix-2 path metric computations, Radix-4 path metric computations, best state computations, and decision bit generation. The LLD slot, the BMU slot and the ACS slot operate in a software-pipelined manner to enable high-speed Viterbi decoding functions.
    Type: Application
    Filed: February 18, 2010
    Publication date: August 19, 2010
    Applicant: SAANKHYA LABS PVT LTD
    Inventors: Anindya Saha, Hemant Mallapur, Santhosh Billava, Smitha Bmv
  • Patent number: 7774189
    Abstract: A system and method for implementing a unified model for integration systems is presented. A user provides inputs to an integrated language engine for placing operator components and arc components onto a dataflow diagram. Operator components include data ports for expressing data flow, and also include meta-ports for expressing control flow. Arc components connect operator components together so that data and control information can flow between the operator components. The dataflow diagram is a directed acyclic graph that expresses an application without including artificial boundaries during the application design process. Once the integrated language engine generates the dataflow diagram, the integrated language engine compiles the dataflow diagram to generate application code.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: August 10, 2010
    Assignee: International Business Machines Corporation
    Inventors: Amir Bar-Or, Michael James Beckerle
  • Patent number: 7774600
    Abstract: In one embodiment of the present invention, a method includes verifying an initiating logical processor of a system; validating a trusted agent with the initiating logical processor if the initiating logical processor is verified; and launching the trusted agent on a plurality of processors of the system if the trusted agent is validated. After execution of such a trusted agent, a secure kernel may then be launched, in certain embodiments. The system may be a multiprocessor server system having a partially or fully connected topology with arbitrary point-to-point interconnects, for example.
    Type: Grant
    Filed: December 27, 2007
    Date of Patent: August 10, 2010
    Assignee: Intel Corporation
    Inventors: John H. Wilson, Ioannis T. Schoinas, Mazin S. Yousif, Linda J. Rankin, David W. Grawrock, Robert J. Greiner, James A. Sutton, Kushagra Vaid, Willard M. Wiseman
  • Patent number: 7769980
    Abstract: Arithmetic/logic units (ALUs) provided corresponding to entries each include an MIMD instruction decoder that generates a group of control signals in accordance with a Multiple Instruction-Multiple Data (MIMD) instruction, an MIMD register that stores data designating the MIMD instruction, and an inter-ALU communication circuit. The amount and direction of movement through the inter-ALU communication circuit are set by data bits stored in a movement data register. Data movement and arithmetic/logic operations can be executed with the amount of movement and the operation instruction set individually for each ALU unit. Therefore, in a Single Instruction-Multiple Data type processing device, Multiple Instruction-Multiple Data operation can be executed at high speed in a flexible manner.
    Type: Grant
    Filed: August 16, 2007
    Date of Patent: August 3, 2010
    Assignee: Renesas Technology Corp.
    Inventors: Toshinori Sueyoshi, Masahiro Iida, Mitsutaka Nakano, Fumiaki Senoue, Katsuya Mizumoto
  • Patent number: 7769989
    Abstract: A processor architecture, for example, a SIMD processor architecture, includes at least two arithmetic/logic units to implement data processing, a data memory arrangement or a memory device interface to a memory arrangement to store data of different data types, an addressing unit to generate access addresses for the data to be stored in the data memory arrangement, and an address memory arrangement to store access addresses. The access addresses are logically linked to the given data type of the data, and/or a distribution of the data to the arithmetic/logic units is dependent on the access addresses, and/or a storage of the output data as the data is dependent on the access addresses.
    Type: Grant
    Filed: September 1, 2006
    Date of Patent: August 3, 2010
    Assignee: Trident Microsystems (Far East) Ltd.
    Inventors: Carsten Noeske, Matthias Vierthaler
  • Patent number: 7770005
    Abstract: In one embodiment of the present invention, a method includes verifying an initiating logical processor of a system; validating a trusted agent with the initiating logical processor if the initiating logical processor is verified; and launching the trusted agent on a plurality of processors of the system if the trusted agent is validated. After execution of such a trusted agent, a secure kernel may then be launched, in certain embodiments. The system may be a multiprocessor server system having a partially or fully connected topology with arbitrary point-to-point interconnects, for example.
    Type: Grant
    Filed: December 27, 2007
    Date of Patent: August 3, 2010
    Assignee: Intel Corporation
    Inventors: John H. Wilson, Ioannis T. Schoinas, Mazin S. Yousif, Linda J. Rankin, David W. Grawrock, Robert J. Greiner, James A. Sutton, Kushagra Vaid, Willard M. Wiseman
  • Publication number: 20100180100
    Abstract: A microprocessor includes a direct memory access (DMA) engine which is responsive to pairs of block indices associated with one or more blocks in a first logical plane and transfers the one or more blocks between the first logical plane, a second logical plane, and a physical memory space according to the pairs of block indices. The logical planes represent two-dimensional fields of data such as those found in images and videos. The microprocessor further comprises a cache memory which updates its content with one or more cache-blocks in the neighborhood of the one or more blocks, improving the operation of the cache memory by increasing cache hits. The DMA engine may further operate on n-dimensional blocks in an n-dimensional logical space. The microprocessor further includes special-purpose instructions, operative on a single-instruction-multiple-data (SIMD) computation unit, especially tailored to perform matrix operations.
    Type: Application
    Filed: January 13, 2009
    Publication date: July 15, 2010
    Inventors: Tsung-Hsin Lu, Carl Alberola, Rajesh Chhabria, Zhenyu Zhou
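    The block-index addressing described above can be illustrated with a simple C model that copies one block out of a row-major 2D plane given its (block row, block column) pair; the plane and block dimensions are assumed for illustration.

        #include <stdio.h>
        #include <stdint.h>

        #define PLANE_W 16          /* width of the 2D logical plane, in pixels */
        #define BLOCK_W 4
        #define BLOCK_H 4

        /* Copy the block addressed by the pair (block_row, block_col) from a
           row-major 2D plane into a contiguous buffer, as a DMA engine driven
           by block indices would. */
        static void dma_read_block(const uint8_t *plane, int block_row, int block_col,
                                   uint8_t *dst) {
            int y0 = block_row * BLOCK_H;
            int x0 = block_col * BLOCK_W;
            for (int y = 0; y < BLOCK_H; y++)
                for (int x = 0; x < BLOCK_W; x++)
                    dst[y * BLOCK_W + x] = plane[(y0 + y) * PLANE_W + (x0 + x)];
        }

        int main(void) {
            uint8_t plane[PLANE_W * PLANE_W];
            for (int i = 0; i < PLANE_W * PLANE_W; i++) plane[i] = (uint8_t)i;

            uint8_t block[BLOCK_W * BLOCK_H];
            dma_read_block(plane, 1, 2, block);           /* block index pair (1, 2) */
            printf("%u %u\n", block[0], block[BLOCK_W]);  /* 72 88 */
            return 0;
        }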
  • Patent number: 7751557
    Abstract: A method and apparatus are disclosed for efficiently de-scrambling one or more bytes of data according to DSL standards on a processor. This is achieved by providing an instruction for de-scrambling one or more bytes of data according to the DSL standards. Accordingly, the invention advantageously provides a processor with the ability to de-scramble data with a single instruction thus allowing for more efficient and faster de-scrambling operations for subsequent processing.
    Type: Grant
    Filed: September 22, 2004
    Date of Patent: July 6, 2010
    Assignee: Broadcom Corporation
    Inventors: Mark Taunton, Timothy Martin Dobson
  • Patent number: 7739479
    Abstract: A method of providing physics data within a game program or simulation using a hardware-based physics processing unit having a unique architecture designed to efficiently calculate physics-related data.
    Type: Grant
    Filed: November 19, 2003
    Date of Patent: June 15, 2010
    Assignee: NVIDIA Corporation
    Inventors: Jean Pierre Bordes, Curtis Davis, Monier Maher, Manju Hegde, Otto A. Schmid
  • Publication number: 20100146241
    Abstract: An apparatus and method for processing data includes an array of processing elements to simultaneously perform operations on multiple data elements using a single instruction. A grouping module assigns each processing element within the array to one of several groups. A modification module designates how each group of processing elements should handle the single instruction. This enables each group of processing elements to handle the single instruction differently. Each processing element is configured to handle the single instruction based on the group the processing element belongs to.
    Type: Application
    Filed: December 9, 2008
    Publication date: June 10, 2010
    Applicant: Novafora, Inc.
    Inventors: Shlomo Selim Rakib, Yoram Zarai