Single Instruction, Multiple Data (SIMD) Patents (Class 712/22)
-
Patent number: 7900025
Abstract: Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.
Type: Grant
Filed: October 14, 2008
Date of Patent: March 1, 2011
Assignee: International Business Machines Corporation
Inventor: Michael K. Gschwind
-
Publication number: 20110047349
Abstract: A processor includes a plurality of subfunctional units provided corresponding to respective slots of one or more pieces of operation result data including a plurality of slots for an SIMD operation; and an enable generating unit configured to, in each of the one or more pieces of the operation result data, compare a value of a predetermined slot with a value of a slot other than the predetermined slot, and disable one or more subfunctional units to which the value equal to the value of the predetermined slot is inputted, and the processor outputs the value of the predetermined slot as the value of the one or more subfunctional units which have been disabled.
Type: Application
Filed: March 12, 2010
Publication date: February 24, 2011
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Hiroo HAYASHI
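One possible reading of the enable-generation idea in the abstract above can be sketched in Python. This is only an illustration under assumptions: the function name `apply_with_slot_reuse`, the `ref` parameter (index of the predetermined slot), and the use of a Python callable in place of a hardware subfunctional unit are all hypothetical and do not come from the application.

```python
def apply_with_slot_reuse(op, slots, ref=0):
    """Apply op to every slot, disabling units whose input equals slots[ref].

    Hypothetical model: each slot has its own "subfunctional unit" (op).
    A unit is disabled when its input matches the predetermined slot, and
    the predetermined slot's result is output in its place.
    """
    ref_result = op(slots[ref])  # the predetermined slot is always computed
    results, enabled = [], []
    for i, value in enumerate(slots):
        unit_enabled = (i == ref) or (value != slots[ref])
        enabled.append(unit_enabled)
        if unit_enabled and i != ref:
            results.append(op(value))   # unit runs normally
        else:
            results.append(ref_result)  # reuse the predetermined slot's value
    return results, enabled
```

For example, with the inputs `[3, 3, 5, 3]` and a squaring operation, only the units for slots 0 and 2 do any work in this model.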
-
Patent number: 7895412
Abstract: A programmable processing engine processes transient data within an intermediate network station of a computer network. The engine comprises an array of processing elements symmetrically arrayed as rows and columns, and embedded between input and output buffer units with a plurality of interfaces from the array to an external memory. The external memory stores non-transient data organized within data structures, such as forwarding and routing tables, for use in processing the transient data. Each processing element contains an instruction memory that allows programming of the array to process the transient data as processing element stages of baseline or extended pipelines operating in parallel.
Type: Grant
Filed: June 27, 2002
Date of Patent: February 22, 2011
Assignee: Cisco Technology, Inc.
Inventors: Darren Kerr, Kenneth Michael Key, Michael L. Wright, William E. Jennings
-
Publication number: 20110040952
Abstract: Uniforming of the processing load is efficiently realized. Each processing element configuring an SIMD parallel computer system includes a data storage module that stores data processed or transferred, a number-of-data-sets storage device that stores number of data sets, and a front data storage device that stores the front data. Each processing element further includes a control processor that compares the number of data sets stored in one processing element with the number of data sets stored in the own processing element, and issues a data distribution leveling instruction that designates an action for updating contents of the data storage module, the number-of-data-sets storage device, and the front data storage device according to a rule determined based on a comparison result of the own processing element and that of the other processing elements and an action for moving the data stored in the one processing element to the own processing element.
Type: Application
Filed: April 8, 2009
Publication date: February 17, 2011
Applicant: NEC CORPORATION
Inventor: Shorin Kyo
-
Patent number: 7890733
Abstract: A data processor comprises a plurality of processing elements (PEs), with memory local to at least one of the processing elements, and a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid, e.g., in a SIMD array, so as to connect the PEs and their local memories to a common controller. Transaction-enabled PEs and nodes set flags, which are maintained until the transaction is completed and signal status to the controller, e.g., over a series of OR-gates. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. External memory may also be connected to the “end” nodes interfacing with the network, e.g., to provide cache.
Type: Grant
Filed: August 11, 2005
Date of Patent: February 15, 2011
Assignee: Rambus Inc.
Inventor: Ray McConnell
-
Publication number: 20110029756
Abstract: A method for decoding a codeword in a data stream encoded according to a low density parity check (LDPC) code having an m×j parity check matrix H by initializing variable nodes with soft values based on symbols in the codeword, wherein a graph representation of H includes m check nodes and j variable nodes, and wherein a check node m provides a row value estimate to a variable node j and a variable node j provides a column value estimate to a check node m if H(m,j) contains a 1, computing row value estimates for each check node, wherein amplitudes of only a subset of column value estimates provided to the check node are computed, computing soft values for each variable node based on the computed row value estimates, determining whether the codeword is decoded based on the soft values, and terminating decoding when the codeword is decoded.
Type: Application
Filed: July 28, 2009
Publication date: February 3, 2011
Inventors: Eric Biscondi, David Hoyle, Tod David Wolf
-
Patent number: 7882325
Abstract: A single micro-instruction to perform either an N-bit or a 2N-bit load is provided. A microprocessor having an N-bit load port performs either an N-bit load or a 2N-bit load in a single cycle with the same micro-instruction being used for both the N-bit and the 2N-bit load.
Type: Grant
Filed: December 21, 2007
Date of Patent: February 1, 2011
Assignee: Intel Corporation
Inventors: Zeev Sperber, Robert Valentine, Ehud Cohen, Doron Orenstien, Benny Eitan
-
Patent number: 7882312
Abstract: A state engine receives multiple requests from a parallel processor for a shared state. The state engine includes at least one state element and the at least one state element is adapted to operate, atomically, on the shared state in response to a request made by the parallel processor. The request includes at least a command directing the at least one state element on how to perform an operation on the shared state. The state engine also includes a memory connected to the at least one state element and configured to store the shared state.
Type: Grant
Filed: November 11, 2003
Date of Patent: February 1, 2011
Assignee: Rambus Inc.
Inventor: Anthony Spencer
-
Patent number: 7882333
Abstract: A method for loading microcode to a plurality of cores within a processor. The method includes loading the microcode to a first core of the plurality of cores within the processor system and generating a broadcast inter process interrupt (IPI) message via the first core. The IPI message causes other cores within the processor system to synchronize respective microcode with the microcode that is loaded into the first core. The synchronizing loads microcode to the plurality of cores without requiring independent loads of microcode to each core.
Type: Grant
Filed: November 5, 2007
Date of Patent: February 1, 2011
Assignee: Dell Products L.P.
Inventor: Mukund Khatri
-
Patent number: 7877585
Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
Type: Grant
Filed: August 27, 2007
Date of Patent: January 25, 2011
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov
-
Patent number: 7873812
Abstract: The new system provides for efficient implementation of matrix multiplication in a SIMD processor. The new system provides the ability to map any element of a source vector register to be paired with any element of a second source vector register for vector operations, and specifically vector multiply and vector-multiply-accumulate operations to implement a variety of matrix multiplications without the additional permute or data re-ordering instructions. Operations such as DCT and Color-space transformations for video processing could be very efficiently implemented using this system.
Type: Grant
Filed: April 5, 2004
Date of Patent: January 18, 2011
Inventor: Tibet Mimar
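The element-mapping idea in the abstract above can be illustrated with a small Python model. This is a sketch under assumptions, not the patented design: the names `mapped_mac` and `matmul_2x2`, the index-list form of the mapping, and the row-major 4-lane layout are all invented here for illustration.

```python
def mapped_mac(acc, a, b, a_map, b_map):
    """One vector multiply-accumulate: acc[i] += a[a_map[i]] * b[b_map[i]].

    a_map and b_map are hypothetical per-lane element selectors that pair
    any element of a with any element of b, so no separate permute is needed.
    """
    return [acc_i + a[ai] * b[bi] for acc_i, ai, bi in zip(acc, a_map, b_map)]


def matmul_2x2(a, b):
    """Multiply two 2x2 matrices stored row-major in 4-lane vectors."""
    acc = [0, 0, 0, 0]
    # Lane 2*i + j accumulates A[i][k] * B[k][j] for k = 0, then k = 1.
    acc = mapped_mac(acc, a, b, a_map=[0, 0, 2, 2], b_map=[0, 1, 0, 1])
    acc = mapped_mac(acc, a, b, a_map=[1, 1, 3, 3], b_map=[2, 3, 2, 3])
    return acc
```

In this model the whole 2x2 product takes two multiply-accumulate steps, with the data re-ordering folded into the per-lane selectors.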
-
Patent number: 7873794
Abstract: Disclosed is an apparatus, method, and program product that provides atomic, multi-word load support without incurring additional memory utilization. A double-word is atomically loaded without the use of one or more additional fields and without a lock. An invalidity marker is used in connection with a cache miss time to ascertain whether a loaded double-word has been stored and loaded atomically, and is thus, valid.
Type: Grant
Filed: August 21, 2007
Date of Patent: January 18, 2011
Assignee: International Business Machines Corporation
Inventors: Michael Joseph Corrigan, Timothy Joseph Torzewski
-
Publication number: 20110010524
Abstract: There is provided an SIMD processor array system in which data can be efficiently transferred between processor elements located at different distances. The SIMD processor array system includes a control processor (CP) that is capable of issuing a plurality of instructions at the same time, and a PE array that includes a plurality of mutually-connected processing elements (PEs) to be controlled by the CP. The CP issues an inter-PE data shift instruction to each PE. According to the inter-PE data shift instruction, each PE performs a data sending operation of copying all the contents of a transfer data storing part of an adjoining PE to a transfer data storing part (MBF) of the own PE, and a data fetch operation of copying part or all of the contents of the MBF of the adjoining PE to a transfer data fetch and storing part (RBUF) of the own PE if part of the contents of the MBF of the adjoining PE coincides with the contents of an ID storing part (IDB) of the own PE.
Type: Application
Filed: March 4, 2009
Publication date: January 13, 2011
Applicant: NEC CORPORATION
Inventor: Shorin Kyo
-
Publication number: 20110004743
Abstract: A data processing apparatus 1 has a plurality of registers 10 of the same type of register and a plurality of processing pipelines 40, 50, each processing pipeline 40, 50 being arranged to process instructions. At least one instruction includes a destination register specifier specifying which of said registers is a destination register for storing a processing result of the at least one instruction. Instruction issuing circuitry 26 is configured to issue the at least one instruction for processing by one of the plurality of processing pipelines. The instruction issuing circuitry 26 selects the one of the plurality of processing pipelines to which the candidate instruction is issued in dependence upon the value of the destination register specifier of the candidate instruction.
Type: Application
Filed: July 1, 2009
Publication date: January 6, 2011
Applicant: ARM Limited
Inventor: David Raymond Lutz
-
Publication number: 20100332794
Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.
Type: Application
Filed: June 30, 2009
Publication date: December 30, 2010
Inventors: Asaf Hargil, Doron Orenstein
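The lane-wise interleave described in the abstract above can be modeled in a few lines of Python. The sketch assumes two lanes of equal width, lists in which index 0 is the lowest-order element, and a hypothetical function name `interleave_low_high`; none of these details are stated in the application.

```python
def interleave_low_high(a, b, lane_width=4):
    """Interleave the low half of lane 0 and the high half of lane 1.

    a and b are packed operands of 2 * lane_width elements each; index 0 is
    taken to be the lowest-order element (an assumption of this sketch).
    """
    assert len(a) == len(b) == 2 * lane_width
    half = lane_width // 2
    result = []
    # Lane 0: lowest-order elements of the first subsets, interleaved.
    for x, y in zip(a[:half], b[:half]):
        result += [x, y]
    # Lane 1: highest-order elements of the second subsets, interleaved.
    for x, y in zip(a[lane_width + half:], b[lane_width + half:]):
        result += [x, y]
    return result
```

With eight-element operands, the result holds elements 0 and 1 of each operand in the first lane and elements 6 and 7 of each operand in the second lane.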
-
Patent number: 7861060
Abstract: Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.
Type: Grant
Filed: December 15, 2005
Date of Patent: December 28, 2010
Assignee: NVIDIA Corporation
Inventors: John R. Nickolls, Stephen D. Lew
-
Patent number: 7861071
Abstract: A method of conditionally executing branch instructions which comprise an opcode field defining a type of test to be applied to determine whether or not to execute a branch operation, a control field designating a control store holding a plurality of indicators and a destination field holding information on a branch target address. The method comprises determining from the opcode field whether or not the test will check the state of one indicator or a plurality of indicators in the designated control store, accessing the designated control store to check the state of said one or said plurality of indicators depending on the determination, and generating a branch target address using information in the destination field in dependence on the state of the or each indicator checked.
Type: Grant
Filed: May 30, 2002
Date of Patent: December 28, 2010
Assignee: Broadcom Corporation
Inventor: Sophie Wilson
-
Patent number: 7856543
Abstract: A data processing architecture comprising: an input device for receiving an incoming stream of data packets; and a plurality of processing elements which are operable to process data received thereby; wherein the input device is operable to distribute data packets in whole or in part to the processing elements in dependence upon the data processing bandwidth of the processing elements.
Type: Grant
Filed: February 14, 2002
Date of Patent: December 21, 2010
Assignee: Rambus Inc.
Inventors: John Rhoades, Ken Cameron, Paul Winser, Ray McConnell, Gordon Faulds, Simon McIntosh-Smith, Anthony Spencer, Jeff Bond, Matthias Dejaegher, Danny Halamish, Gajinder Panesar
-
Publication number: 20100318766
Abstract: A processor includes a processing unit capable of executing single-instruction multiple-data operations; a register file configured to store data that is to be supplied to the processing unit and to be subjected to operations, and a buffer provided separately from the register file, the buffer being a buffer where an integer “n” number of data columns each having a plurality of data elements are written on a column-by-column basis, and data elements at the same location are selected and read as “n” data elements from the respective “n” data columns, wherein the “n” data elements read from the buffer are supplied to the processing unit as data to be subjected to a single-instruction multiple-data operation.
Type: Application
Filed: June 7, 2010
Publication date: December 16, 2010
Applicant: FUJITSU SEMICONDUCTOR LIMITED
Inventor: Masayuki TSUJI
-
Patent number: 7853778
Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
Type: Grant
Filed: December 20, 2001
Date of Patent: December 14, 2010
Assignee: Intel Corporation
Inventor: Patrice Roussel
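The load-and-duplicate operation in the abstract above can be sketched with plain integer arithmetic. The function name `load_and_duplicate` and the default 64-bit portion size are assumptions made for illustration; the abstract does not fix a width.

```python
def load_and_duplicate(source, portion_bits=64):
    """Copy the low portion of source into both halves of the destination.

    Models a destination register of 2 * portion_bits bits as a Python int.
    The portion width is a hypothetical default chosen for this sketch.
    """
    mask = (1 << portion_bits) - 1
    low = source & mask                  # first portion of the source
    return (low << portion_bits) | low   # same bits in both portions
```

Bits of the source above the first portion are ignored in this model, so the destination always holds two identical copies.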
-
Publication number: 20100312989
Abstract: A processor 2 supporting register renaming has a rename table 20 in which the flag register has multiple tag values associated therewith. These tag values indicate which virtual register corresponds to a destination flag register of the oldest instruction which wrote a still up-to-date value of a subset of the flags.
Type: Application
Filed: June 4, 2009
Publication date: December 9, 2010
Inventor: James Nolan Hardage
-
Publication number: 20100293534
Abstract: In one embodiment, the invention is a method and apparatus for use of vectorization instruction sets. One embodiment of a method for generating vector instructions includes receiving source code written in a high-level programming language, wherein the source code includes at least one high-level instruction that performs multiple operations on a plurality of vector operands, and compiling the high-level instruction(s) into one or more low-level instructions, wherein the low-level instructions are in an instruction set of a specific computer architecture.
Type: Application
Filed: May 15, 2009
Publication date: November 18, 2010
Inventors: Henrique Andrade, Bugra Gedik, Hua Yong Wang, Kun-Lung Wu
-
Patent number: 7831804
Abstract: A processor architecture includes a number of processing elements for treating input signals. The architecture is organized according to a matrix including rows and columns, the columns of which each include at least one microprocessor block having a computational part and a set of associated processing elements that are able to receive the same input signals. The number of associated processing elements is selectively variable in the direction of the column so as to exploit the parallelism of said signals. Additionally, the processor architecture of the present invention enables dynamic switching between instruction parallelism and data parallel processing typical of vectorial functionality. The architecture can be scaled in various dimensions in an optimal configuration for the algorithm to be executed.
Type: Grant
Filed: May 30, 2008
Date of Patent: November 9, 2010
Assignee: ST Microelectronics S.R.L.
Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elio Guidetti
-
Publication number: 20100281255
Abstract: In one embodiment of the present invention, a method includes verifying a master processor of a system; validating a trusted agent with the master processor if the master processor is verified; and launching the trusted agent on a plurality of processors of the system if the trusted agent is validated. After execution of such a trusted agent, a secure kernel may then be launched, in certain embodiments. The system may be a multiprocessor server system having a partially or fully connected topology with arbitrary point-to-point interconnects, for example.
Type: Application
Filed: June 29, 2010
Publication date: November 4, 2010
Inventors: John H. Wilson, Ioannis T. Schoinas, Mazin S. Yousif, Linda J. Rankin, David W. Grawrock, Robert J. Greiner, James A. Sutton, Kushagra Vaid, Willard M. Wiseman
-
Publication number: 20100274989
Abstract: A method executed by an instruction set on a processor is described. The method includes providing a tbbit instruction, inputting a first index for the tbbit instruction, loading a second value for the tbbit instruction, wherein the second value comprises at least 2^b bits, using selected b bits of the first index to select at least one target bit in the loaded second value, shifting the target bit into the bottom of the first index, and computing a second index based on the shifting of the target bit into the bottom of the first index. Other methods and variations are also described.
Type: Application
Filed: December 8, 2008
Publication date: October 28, 2010
Inventors: Mayan Moudgill, Sitij Agrawal
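The steps listed in the abstract above can be sketched as a small Python function. Several details are assumptions of this sketch rather than statements from the application: the selector is taken to be the low b bits of the index, the loaded value is assumed wide enough for a b-bit selector to address every bit (2**b bits), and the Python function name `tbbit` simply mirrors the instruction name.

```python
def tbbit(index, value, b):
    """Return the next index after one hypothetical tbbit step.

    Assumptions: the low b bits of index select the target bit, and value
    holds at least 2**b bits so every selector addresses a valid bit.
    """
    select = index & ((1 << b) - 1)   # b selected bits of the first index
    target = (value >> select) & 1    # the target bit in the loaded value
    return (index << 1) | target      # shift target into the bottom of the index
```

Calling the function repeatedly, feeding each result back in as the next index, walks a chain of single-bit decisions one step at a time.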
-
Publication number: 20100274990
Abstract: An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD processing circuitry.
Type: Application
Filed: September 17, 2009
Publication date: October 28, 2010
Inventors: Mladen Wilder, Dominic Hugo Symes, Richard Edward Bruce
-
Patent number: 7818540
Abstract: A vector processing system for executing vector instructions, each instruction defining multiple value pairs, an operation to be executed and a modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and, when selected, to implement an operation on said value pair to generate a result, each processing unit comprising at least one flag and being selectable in dependence on a condition defined by said at least one flag, wherein the modifier defines the condition under which the parallel processing unit is individually selected.
Type: Grant
Filed: May 19, 2006
Date of Patent: October 19, 2010
Assignee: Broadcom Corporation
Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Patent number: 7818539
Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by, e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
Type: Grant
Filed: August 28, 2006
Date of Patent: October 19, 2010
Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of Technology
Inventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
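The splitting step described in the abstract above, dividing one input vector into two output vectors according to a condition vector, can be sketched in Python. The function name `conditional_split` and the use of Python lists and booleans are illustrative assumptions only.

```python
def conditional_split(values, conditions):
    """Split values into two vectors according to a condition vector.

    Returns (taken, not_taken). Each output can then be processed with
    straight-line code, since every element in it shares the same outcome.
    """
    taken = [v for v, c in zip(values, conditions) if c]
    not_taken = [v for v, c in zip(values, conditions) if not c]
    return taken, not_taken
```

After the split, each output vector can be handed to its own unconditional routine, which is the property the abstract credits for avoiding branch latency.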
-
Patent number: 7818541
Abstract: A data processing architecture comprising: an input device for receiving an incoming stream of data packets; and a plurality of processing elements which are operable to process data received thereby; wherein the input device is operable to distribute data packets in whole or in part to the processing elements in dependence upon the data processing bandwidth of the processing elements.
Type: Grant
Filed: May 23, 2007
Date of Patent: October 19, 2010
Assignee: Clearspeed Technology Limited
Inventors: John Rhoades, Ken Cameron, Paul Winser, Ray McConnell, Gordon Faulds, Simon McIntosh-Smith, Anthony Spencer, Jeff Bond, Matthias Dejaegher, Danny Halamish, Gajinder Panesar
-
Patent number: 7818548
Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, and (c) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways.
Type: Grant
Filed: July 27, 2007
Date of Patent: October 19, 2010
Assignee: Microunity Systems Engineering, Inc.
Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 7814297
Abstract: A data processing apparatus comprises data processing logic operable to perform data processing operations specified by program instructions. The data processing logic (140) has a plurality of functional units (142, 144, 146) configured to execute in parallel on data received from a data source. A decoder (130) is responsive to a single program instruction to control the data processing logic (140) to concurrently execute the single program instruction on each of a plurality of vector elements of each of a respective plurality of vector input operands (310, 320) received from the data source using the plurality of functional units (142, 144, 146).
Type: Grant
Filed: July 26, 2005
Date of Patent: October 12, 2010
Assignee: ARM Limited
Inventor: Martinus Cornelis Wezelenburg
-
Publication number: 20100250897
Abstract: The invention relates to a parallel processor which comprises elementary processors (3) disposed according to a topology with a predetermined position within this topology and capable of simultaneously executing the same instruction on different data, the instruction relating to at least one operand and/or providing at least one result. The instruction comprises, for each operand and/or each result, information relating to the position of a field of action within a data structure of the table of dimension M type and the parallel processor comprises means (41, 42, 43) for calculating the address of each operand and/or each result within each elementary processor, as a function of the position of the field of action and of the position of the elementary processor within the topology.
Type: Application
Filed: June 26, 2008
Publication date: September 30, 2010
Applicant: Thales
Inventor: Gérard Gaillat
-
Patent number: 7805561
Abstract: A single instruction, multiple data (“SIMD”) computer system includes a central control unit coupled to 256 processing elements (“PEs”) and to 32 static random access memory (“SRAM”) devices. Each group of eight PEs can access respective groups of eight columns in a respective SRAM device. Each PE includes a local column address register that can be loaded through a data bus of the respective PE. A local column address stored in the local column address register is applied to an AND gate, which selects either the local column address or a column address applied to the AND gate by the central control unit. As a result, the central control unit can globally access the SRAM device, or a specific one of the eight columns that can be accessed by each PE can be selected locally by the PE.
Type: Grant
Filed: January 16, 2009
Date of Patent: September 28, 2010
Assignee: Micron Technology, Inc.
Inventor: Jon Skull
-
Publication number: 20100241824
Abstract: Techniques are disclosed for converting data into a format tailored for efficient multidimensional fast Fourier transforms (FFTs) on single instruction, multiple data (SIMD) multi-core processor architectures. The technique includes converting data from a multidimensional array stored in a conventional row-major order into SIMD format. Converted data in SIMD format consists of a sequence of blocks, where each block interleaves s rows such that SIMD vector processors may operate on s rows simultaneously. As a result, the converted data in SIMD format enables smaller-sized 1D FFTs to be optimized in SIMD multi-core processor architectures.
Type: Application
Filed: March 18, 2009
Publication date: September 23, 2010
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: David G. Carlson, Travis M. Drucker, Timothy J. Mullins, Jeffrey S. McAllister, Nelson Ramirez
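The block-interleaved layout described in the abstract above can be illustrated for a two-dimensional array. This sketch assumes that the row count is a multiple of s and that interleaving is element by element; the function name `to_simd_format` is hypothetical, and the application may define the format differently in detail.

```python
def to_simd_format(matrix, s):
    """Convert a row-major 2D list into blocks that interleave s rows.

    Within each block, element k of every row is stored contiguously, so an
    s-wide vector operation can touch the same column of s rows at once.
    """
    assert len(matrix) % s == 0, "sketch assumes the row count is a multiple of s"
    out = []
    for start in range(0, len(matrix), s):
        block = matrix[start:start + s]
        for col in range(len(block[0])):
            for row in block:
                out.append(row[col])
    return out
```

With s = 2, rows 0 and 1 form the first block and rows 2 and 3 the second, so each consecutive pair of output values belongs to the same column of two different rows.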
-
Patent number: 7802079
Abstract: A parallel data processing apparatus using a SIMD array of processing elements is disclosed. The apparatus makes use of a register in order to control issuance of instructions to the processing elements in the array.
Type: Grant
Filed: June 29, 2007
Date of Patent: September 21, 2010
Assignee: Clearspeed Technology Limited
Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Patent number: 7797517
Abstract: Reference architecture instructions are translated into target architecture operations. Sequences of operations, in a predicted execution order in some embodiments, form traces. In some embodiments, a trace is based on a plurality of basic blocks. In some embodiments, a trace is committed or aborted as a single entity. Sequences of operations are optimized by fusing collections of operations; fused operations specify a same observable function as respective collections, but advantageously enable more efficient processing. In some embodiments, a collection comprises multiple register operations. Fusing a register operation with a branch operation in a trace forms a fused reg-op/branch operation. In some embodiments, branch instructions translate into assert operations. Fusing an assert operation with another operation forms a fused assert operation. In some embodiments, fused operations only set architectural state, such as high-order portions of registers, that is subsequently read before being written.
Type: Grant
Filed: November 17, 2006
Date of Patent: September 14, 2010
Assignee: Oracle America, Inc.
Inventor: John Gregory Favor
-
Patent number: 7788468
Abstract: A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Different threads of the CTA are advantageously synchronized at appropriate points during CTA execution using a barrier synchronization technique in which barrier instructions in the CTA program are detected and used to suspend execution of some threads until a specified number of other threads also reaches the barrier point.
Type: Grant
Filed: December 15, 2005
Date of Patent: August 31, 2010
Assignee: NVIDIA Corporation
Inventors: John R. Nickolls, Stephen D. Lew, Brett W. Coon, Peter C. Mills
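The combination of thread IDs and barrier synchronization described in the abstract above can be imitated on a CPU with Python's standard `threading.Barrier`. This is a software analogy only, not the hardware mechanism of the patent: the function name `run_cta`, the doubling step, and the neighbor-sharing pattern are all invented for illustration.

```python
import threading


def run_cta(data):
    """Run one thread per input element, synchronizing once at a barrier."""
    n = len(data)
    partial = [0] * n                # intermediate results shared among threads
    output = [0] * n
    barrier = threading.Barrier(n)   # releases only when all n threads arrive

    def thread_body(tid):
        # The thread ID selects which input element this thread processes.
        partial[tid] = data[tid] * 2
        barrier.wait()               # suspend until every thread has written
        # Safe to read a neighbor's intermediate result after the barrier.
        output[tid] = partial[tid] + partial[(tid + 1) % n]

    threads = [threading.Thread(target=thread_body, args=(tid,)) for tid in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output
```

Without the barrier, a thread could read its neighbor's slot in `partial` before the neighbor had written it; the barrier makes the result independent of scheduling order.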
-
Patent number: 7788471
Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.
Type: Grant
Filed: September 18, 2006
Date of Patent: August 31, 2010
Assignee: Freescale Semiconductor, Inc.
Inventor: Chengke Sheng
-
Patent number: 7783862Abstract: One embodiment of the present invention is a processor that processes inductive doubling SIMD instructions, which processor includes: an Instruction Fetch Unit that loads a SIMD instruction and applies it as input to a SIMD Instruction Decode Unit; wherein the SIMD Instruction Decode Unit decodes the applied SIMD instruction and produces output signals including SIMD field width identification signals and one or more SIMD half-operand modifier signals.Type: GrantFiled: August 6, 2007Date of Patent: August 24, 2010Assignee: International Characters, Inc.Inventor: Robert D. Cameron
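The abstract does not spell out what an inductive doubling operation looks like, so the following is only a plausible illustration: a population count in which the field width doubles at each step (1, 2, 4, ... bits) and each step adds the high half of every field to its low half. The function name and the choice of population count are assumptions; the high/low half selection stands in loosely for the half-operand modifiers the abstract mentions.

```python
def popcount_inductive(x, width=32):
    """Count set bits in a width-bit word by repeatedly doubling the field width.

    width is assumed to be a power of two and x to fit in width bits.
    """
    field = 1
    while field < width:
        # Build a mask that keeps the low `field` bits of every 2*field-bit field.
        mask = 0
        for pos in range(0, width, 2 * field):
            mask |= ((1 << field) - 1) << pos
        low = x & mask               # low half of each doubled field
        high = (x >> field) & mask   # high half, shifted down into the low position
        x = low + high               # each 2*field-bit field now holds its own bit count
        field *= 2
    return x
```

After the final step the whole word is a single field holding the total count.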
-
Publication number: 20100211758Abstract: A microprocessor that can perform sequential processing in data array unit includes: a load store unit that loads, when a fetched instruction is a load instruction for data, a data sequence including designated data from a data memory in memory width unit and specifies, based on an analysis result of the instruction, data scheduled to be designated in a load instruction in future; and a data temporary storage unit that stores use-scheduled data as the data specified by the load store unit.Type: ApplicationFiled: December 29, 2009Publication date: August 19, 2010Applicant: KABUSHIKI KAISHA TOSHIBAInventors: Masato Sumiyoshi, Takashi Miyamori, Shunichi Ishiwata, Katsuyuki Kimura, Takahisa Wada, Keiri Nakanishi, Yasuki Tanabe, Ryuji Hada
-
Publication number: 20100211858Abstract: An application specific processor to implement a Viterbi decode algorithm for channel decoding functions of received symbols. The Viterbi decode algorithm is at least one of a Bit Serial decode algorithm and a block based decode algorithm. The application specific processor includes a Load-Store, Logical and De-puncturing (LLD) slot that performs a Load-Store function, a Logical function, a De-puncturing function, and a Trace-back Address generation function; a Branch Metric Compute (BMU) slot that performs Radix-2 branch metric computations, Radix-4 branch metric computations, and Squared Euclidean Branch Metric computations; and an Add-Compare-Select (ACS) slot that performs Radix-2 Path metric computations, Radix-4 Path metric computations, best state computations, and decision bit generation. The LLD slot, the BMU slot and the ACS slot perform in a software pipelined manner to enable high speed Viterbi decoding functions.Type: ApplicationFiled: February 18, 2010Publication date: August 19, 2010Applicant: SAANKHYA LABS PVT LTDInventors: Anindya Saha, Hemant Mallapur, Santhosh Billava, Smitha Bmv
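The branch metric and add-compare-select steps named in this abstract are standard parts of Viterbi decoding. A minimal Radix-2 sketch follows; the function names are hypothetical, and it assumes the common convention that a smaller metric means a better path. It says nothing about how the processor's slots actually implement these steps.

```python
def squared_euclidean(received, expected):
    """Branch metric: squared Euclidean distance between received and expected symbols."""
    return sum((r - e) ** 2 for r, e in zip(received, expected))

def acs_radix2(pm0, pm1, bm0, bm1):
    """Radix-2 add-compare-select for one trellis state.

    pm0, pm1: path metrics of the two predecessor states.
    bm0, bm1: branch metrics of the transitions from those predecessors.
    Returns (surviving path metric, decision bit naming the chosen predecessor).
    """
    cand0 = pm0 + bm0          # add
    cand1 = pm1 + bm1
    if cand1 < cand0:          # compare
        return cand1, 1        # select predecessor 1
    return cand0, 0            # select predecessor 0 (also on ties)
```

The decision bits are what a later trace-back pass would follow to recover the decoded sequence.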
-
Patent number: 7774189Abstract: A system and method for implementing a unified model for integration systems is presented. A user provides inputs to an integrated language engine for placing operator components and arc components onto a dataflow diagram. Operator components include data ports for expressing data flow, and also include meta-ports for expressing control flow. Arc components connect operator components together for data and control information to flow between the operator components. The dataflow diagram is a directed acyclic graph that expresses an application without including artificial boundaries during the application design process. Once the integrated language engine generates the dataflow diagram, the integrated language engine compiles the dataflow diagram to generate application code.Type: GrantFiled: December 1, 2006Date of Patent: August 10, 2010Assignee: International Business Machines CorporationInventors: Amir Bar-Or, Michael James Beckerle
-
Patent number: 7774600Abstract: In one embodiment of the present invention, a method includes verifying an initiating logical processor of a system; validating a trusted agent with the initiating logical processor if the initiating logical processor is verified; and launching the trusted agent on a plurality of processors of the system if the trusted agent is validated. After execution of such a trusted agent, a secure kernel may then be launched, in certain embodiments. The system may be a multiprocessor server system having a partially or fully connected topology with arbitrary point-to-point interconnects, for example.Type: GrantFiled: December 27, 2007Date of Patent: August 10, 2010Assignee: Intel CorporationInventors: John H. Wilson, Ioannis T. Schoinas, Mazin S. Yousif, Linda J. Rankin, David W. Grawrock, Robert J. Greiner, James A. Sutton, Kushagra Vaid, Willard M. Wiseman
-
Patent number: 7769980Abstract: In arithmetic/logic units (ALU) provided corresponding to entries, an MIMD instruction decoder generating a group of control signals in accordance with a Multiple Instruction-Multiple Data (MIMD) instruction and an MIMD register storing data designating the MIMD instruction are provided, and an inter-ALU communication circuit is provided. The amount and direction of movement of the inter-ALU communication circuit are set by data bits stored in a movement data register. It is possible to execute data movement and arithmetic/logic operation with the amount of movement and operation instruction set individually for each ALU unit. Therefore, in a Single Instruction-Multiple Data type processing device, Multiple Instruction-Multiple Data operation can be executed at high speed in a flexible manner.Type: GrantFiled: August 16, 2007Date of Patent: August 3, 2010Assignee: Renesas Technology Corp.Inventors: Toshinori Sueyoshi, Masahiro Iida, Mitsutaka Nakano, Fumiaki Senoue, Katsuya Mizumoto
-
Patent number: 7769989Abstract: A processor architecture, for example, a SIMD processor architecture, includes at least two arithmetic/logic units to implement data processing, a data memory arrangement or a memory device interface to a memory arrangement to store data of different data types, an addressing unit to generate access addresses for the data to be stored in the data memory arrangement, and an address memory arrangement to store access addresses. The access addresses are logically linked to the given data type of the data, and/or a distribution of the data to the arithmetic/logic units is dependent on the access addresses, and/or a storage of the output data as the data is dependent on the access addresses.Type: GrantFiled: September 1, 2006Date of Patent: August 3, 2010Assignee: Trident Microsystems (Far East) Ltd.Inventors: Carsten Noeske, Matthias Vierthaler
-
Patent number: 7770005Abstract: In one embodiment of the present invention, a method includes verifying an initiating logical processor of a system; validating a trusted agent with the initiating logical processor if the initiating logical processor is verified; and launching the trusted agent on a plurality of processors of the system if the trusted agent is validated. After execution of such a trusted agent, a secure kernel may then be launched, in certain embodiments. The system may be a multiprocessor server system having a partially or fully connected topology with arbitrary point-to-point interconnects, for example.Type: GrantFiled: December 27, 2007Date of Patent: August 3, 2010Assignee: Intel CorporationInventors: John H. Wilson, Ioannis T. Schoinas, Mazin S. Yousif, Linda J. Rankin, David W. Grawrock, Robert J. Greiner, James A. Sutton, Kushagra Vaid, Willard M. Wiseman
-
Publication number: 20100180100Abstract: A microprocessor includes a direct memory access (DMA) engine which is responsive to pairs of block indices associated with one or more blocks in a first logical plane and transfers the one or more blocks between the first logical plane, a second logical plane, and a physical memory space according to the pairs of block indices. The logical planes represent two dimensional fields of data such as those found in images and videos. The microprocessor further comprises cache memory which updates its content with one or more cache-blocks which are in the neighborhood of the one or more blocks, improving the operation of the cache memory by increasing cache hits. The DMA engine may further operate on n-dimensional blocks in an n-dimensional logical space. The microprocessor further includes special-purpose instructions, operative on a single-instruction-multiple-data (SIMD) computation unit, especially tailored to perform matrix operations.Type: ApplicationFiled: January 13, 2009Publication date: July 15, 2010Inventors: Tsung-Hsin Lu, Carl Alberola, Rajesh Chhabria, Zhenyu Zhou
-
Patent number: 7751557Abstract: A method and apparatus are disclosed for efficiently de-scrambling one or more bytes of data according to DSL standards on a processor. This is achieved by providing an instruction for de-scrambling one or more bytes of data according to the DSL standards. Accordingly, the invention advantageously provides a processor with the ability to de-scramble data with a single instruction thus allowing for more efficient and faster de-scrambling operations for subsequent processing.Type: GrantFiled: September 22, 2004Date of Patent: July 6, 2010Assignee: Broadcom CorporationInventors: Mark Taunton, Timothy Martin Dobson
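The abstract does not give the scrambling polynomial. As an illustration, the sketch below assumes the self-synchronizing form used in ADSL, where each scrambled bit is the data bit XORed with the scrambled bits 18 and 23 positions earlier. The function names, the least-significant-bit-first ordering within each byte, and the all-zero initial state are assumptions made for this example.

```python
MASK = (1 << 23) - 1  # the shift register remembers the last 23 scrambled bits

def _run(data, descrambling, state=0):
    out = bytearray()
    for byte in data:
        result = 0
        for k in range(8):  # LSB-first bit order: an assumption for this sketch
            bit_in = (byte >> k) & 1
            # XOR with the scrambled bits from 18 and 23 positions back.
            bit_out = bit_in ^ ((state >> 17) & 1) ^ ((state >> 22) & 1)
            # The register always holds scrambled bits: the output when
            # scrambling, the input when de-scrambling.
            scrambled = bit_in if descrambling else bit_out
            state = ((state << 1) | scrambled) & MASK
            result |= bit_out << k
        out.append(result)
    return bytes(out)

def scramble(data):
    return _run(data, descrambling=False)

def descramble(data):
    return _run(data, descrambling=True)
```

A dedicated instruction of the kind the abstract describes would perform the inner per-bit loop for one or more bytes in a single step; the software loop here only shows the arithmetic involved.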
-
Patent number: 7739479Abstract: A method of providing physics data within a game program or simulation using a hardware-based physics processing unit having unique architecture designed to efficiently calculate physics related data.Type: GrantFiled: November 19, 2003Date of Patent: June 15, 2010Assignee: NVIDIA CorporationInventors: Jean Pierre Bordes, Curtis Davis, Monier Maher, Manju Hegde, Otto A. Schmid
-
Publication number: 20100146241Abstract: An apparatus and method for processing data includes an array of processing elements to simultaneously perform operations on multiple data elements using a single instruction. A grouping module assigns each processing element within the array to one of several groups. A modification module designates how each group of processing elements should handle the single instruction. This enables each group of processing elements to handle the single instruction differently. Each processing element is configured to handle the single instruction based on the group the processing element belongs to.Type: ApplicationFiled: December 9, 2008Publication date: June 10, 2010Applicant: Novafora, Inc.Inventors: Shlomo Selim Rakib, Yoram Zarai
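The grouping and modification modules in this abstract can be modeled in a few lines. In this hypothetical sketch the single instruction is "add an operand", and the three modification names (`normal`, `negate`, `skip`) are invented for illustration; the abstract does not list the modifications the apparatus supports.

```python
def simd_execute(values, operand, group_of, modification):
    """Apply one 'add operand' instruction across all processing elements (PEs).

    group_of[pe] gives the group each PE belongs to (the grouping step).
    modification[group] names how that group handles the instruction.
    """
    handlers = {
        "normal": lambda v: v + operand,  # execute the instruction as issued
        "negate": lambda v: v - operand,  # execute it with the operand negated
        "skip": lambda v: v,              # ignore the instruction
    }
    return [handlers[modification[group_of[pe]]](v) for pe, v in enumerate(values)]
```

For instance, with `group_of = [0, 1, 0, 2]` and `modification = {0: "normal", 1: "negate", 2: "skip"}`, one issued instruction yields three different behaviors across the array.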