Single Instruction, Multiple Data (SIMD) Patents (Class 712/22)

Compiling scalar code for a single instruction multiple data (SIMD) execution engine

Patent number: 8108846

Abstract: A mechanism is provided for performing scalar operations using a SIMD data parallel execution unit. With the mechanisms of the illustrative embodiments, scalar operations in application code are identified that may be executed using vector operations in a SIMD data parallel execution unit. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined.

Type: Grant

Filed: May 28, 2008

Date of Patent: January 31, 2012

Assignee: International Business Machines Corporation

Inventor: Michael K. Gschwind
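The Python sketch below illustrates the general idea. It is not IBM's compiler output. It assumes a 16-byte vector register of four 32-bit words, aligned-only vector loads, and lane 0 as the slot from which the scalar result is read. All function names are hypothetical. The scalar's word offset within its aligned quadword serves as the alignment control. Here it is computed at run time, which is the dynamic case. A compiler that knows the address statically could fold it into a constant instead.

```python
LANES = 4   # assumed: four 32-bit words per 16-byte vector register
WORD = 4    # bytes per word


def vector_load_aligned(memory, addr):
    """Load the aligned 16-byte quadword containing byte address addr.

    memory is a list of words; the low address bits are ignored, as with an
    aligned-only vector load."""
    base = (addr // (LANES * WORD)) * LANES   # index of the first word in the quadword
    return memory[base:base + LANES]


def rotate_to_slot0(vec, addr):
    """Rotate the vector left so the word at addr lands in lane 0.

    The rotate amount plays the role of the alignment control word; here it is
    computed from the address at run time (the dynamic case)."""
    shift = (addr // WORD) % LANES
    return vec[shift:] + vec[:shift]


def vector_add(a, b):
    """Lane-wise add of two vector registers."""
    return [x + y for x, y in zip(a, b)]


def scalar_add_via_simd(memory, addr_a, addr_b):
    """Compute the scalar sum of two words using only vector operations."""
    va = rotate_to_slot0(vector_load_aligned(memory, addr_a), addr_a)
    vb = rotate_to_slot0(vector_load_aligned(memory, addr_b), addr_b)
    return vector_add(va, vb)[0]   # scalar result is in lane 0; other lanes are don't-care
```

For example, with `mem = list(range(100, 116))`, `scalar_add_via_simd(mem, 4, 36)` adds the words at byte addresses 4 and 36 (101 and 109) and returns 210.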
Processor architectures for enhanced computational capability and low latency

Patent number: 8108653

Abstract: A processor includes a compute array comprising a first plurality of compute engines serially connected along a data flow path such that data flows between successive compute engines at successive times. The first plurality of compute engines includes an initial compute engine and a final compute engine. The data flow path includes a recirculation path connecting the final compute engine to the initial compute engine with no compute engine therebetween.

Type: Grant

Filed: February 5, 2010

Date of Patent: January 31, 2012

Assignee: Analog Devices, Inc.

Inventors: Boris Lerner, Douglas Garde
Methods and apparatus for independent processor node operations in a SIMD array processor

Patent number: 8103854

Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.

Type: Grant

Filed: April 12, 2010

Date of Patent: January 24, 2012

Assignee: Altera Corporation

Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
Methods for scalably exploiting parallelism in a parallel processing system

Patent number: 8099584

Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the processing cores available in a particular system implementation.

Type: Grant

Filed: May 2, 2011

Date of Patent: January 17, 2012

Assignee: NVIDIA Corporation

Inventors: John R. Nickolls, Stephen D. Lew
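The three-level decomposition can be pictured with a sequential Python simulation of a sum-of-squares problem. `run_grid` and its parameters are hypothetical. Real hardware would run the thread arrays concurrently on whichever cores are free rather than in a loop. The global index `array_id * threads_per_array + thread_id` is the usual way a thread locates the element it is responsible for.

```python
def run_grid(data, threads_per_array):
    """Sequentially simulate a thread -> thread array -> grid decomposition."""
    num_arrays = (len(data) + threads_per_array - 1) // threads_per_array  # ceiling division
    partials = []
    for array_id in range(num_arrays):       # independent thread arrays: any core could take any of them
        shared = []                           # per-array scratch space
        for thread_id in range(threads_per_array):
            i = array_id * threads_per_array + thread_id   # global element index
            if i < len(data):                 # guard threads that fall past the end of the problem
                shared.append(data[i] * data[i])           # lowest-level sub-problem: one element
        partials.append(sum(shared))          # the thread array solves a higher-level sub-problem
    return sum(partials)                      # the grid combines the thread-array results
```

Because the thread arrays do not depend on one another, the outer loop could be split across any number of cores without changing the result. This independence is what makes the scheme scale.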
Tile output using multiple queue output buffering in a raster stage

Patent number: 8085264

Abstract: A method for multiple queue output buffering in a raster stage of a graphics processor. The method includes receiving a graphics primitive for rasterization in a raster stage of a graphics processor. The graphics primitive is rasterized at a first level to generate a plurality of tiles of pixels related to the graphics primitive. Each tile is then rasterized to determine related sub-portions of each tile. The related sub-portions are transferred to a plurality of output queues. The related sub-portions are subsequently output on a per queue basis and on a per clock cycle basis.

Type: Grant

Filed: July 26, 2006

Date of Patent: December 27, 2011

Assignee: NVIDIA Corporation

Inventors: Franklin C. Crow, Jeffrey R. Sewall
Arithmetic processing apparatus

Patent number: 8086830

Abstract: An arithmetic processing apparatus is provided that can generate, in as few steps as possible, a commonly referenced condition flag from the condition flags generated by the individual arithmetic operation units. The arithmetic processing apparatus, which processes multiple data in parallel based on a single instruction, includes: processing elements capable of performing a common arithmetic operation based on the evaluation result of the instruction stored in the instruction register; and a condition flag arithmetic operation unit capable of performing either a logical operation or a comparison operation on the condition flag retained in each processing element, transferring the operation result to each processing element, and updating the condition flag based on the operation result.

Type: Grant

Filed: August 24, 2005

Date of Patent: December 27, 2011

Assignee: Panasonic Corporation

Inventors: Takeshi Furuta, Hideshi Nishida, Takeshi Tanaka
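Below is a minimal sketch of the shared-flag idea. It assumes each processing element holds one Boolean flag and that only AND/OR combining is needed. The comparison variant mentioned in the abstract is omitted. `combine_flags` is a hypothetical name.

```python
def combine_flags(flags, op="and"):
    """Combine the per-PE condition flags with one logical operation and
    write the common result back to every processing element."""
    if op == "and":
        result = all(flags)
    elif op == "or":
        result = any(flags)
    else:
        raise ValueError(f"unsupported operation: {op}")
    return [result] * len(flags)   # every PE's flag is updated with the same value
```

For example, `combine_flags(flags, "or")` answers "does any element still need another iteration?" in one step instead of a serial scan over the processing elements.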
Residual addition for video software techniques

Patent number: 8082419

Abstract: According to some embodiments, a technique provides for the execution of an instruction that includes receiving residual data of a first image and decoded pixels of a second image, zero-extending a plurality of unsigned data operands of the decoded pixels producing a plurality of unpacked data operands, adding a plurality of signed data operands of the residual data to the plurality of unpacked data operands producing a plurality of signed results; and saturating the plurality of signed results producing a plurality of unsigned results.

Type: Grant

Filed: March 30, 2004

Date of Patent: December 20, 2011

Assignee: Intel Corporation

Inventors: Bradley C. Aldrich, Nigel C. Paver, Murli Ganeshan
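The per-element arithmetic described above can be sketched in Python. The sketch assumes 8-bit unsigned decoded pixels and signed residuals. A real instruction would process a whole packed register at once. `add_residual` is a hypothetical name.

```python
def add_residual(pixels, residuals):
    """Add signed residuals to unsigned 8-bit decoded pixels with unsigned saturation."""
    out = []
    for p, r in zip(pixels, residuals):
        widened = p & 0xFF                    # zero-extend the unsigned byte into a wider signed lane
        total = widened + r                   # signed add; the result may fall outside 0..255
        out.append(max(0, min(255, total)))   # saturate back to an unsigned byte
    return out
```

Saturation clamps out-of-range sums to the nearest representable byte. Plain wrap-around would instead turn a slightly too-bright pixel (250 + 10) into a nearly black one (4).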
Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a common set of per-lane control bits

Patent number: 8078836

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

Type: Grant

Filed: December 30, 2007

Date of Patent: December 13, 2011

Assignee: Intel Corporation

Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
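The sketch below models the single-source embodiment. It assumes two 128-bit lanes of four 32-bit elements and an 8-bit control field with 2 bits per destination element, and the same control field is reused for every lane. This layout is an illustrative assumption loosely modeled on x86 in-lane shuffles. `in_lane_shuffle` is a hypothetical name. The key property is that no element ever crosses a lane boundary.

```python
ELEMS_PER_LANE = 4   # assumed: four 32-bit elements per 128-bit lane


def in_lane_shuffle(src, imm8):
    """Apply one 8-bit control (2 bits per destination element) identically in every lane."""
    if len(src) % ELEMS_PER_LANE:
        raise ValueError("operand length must be a whole number of lanes")
    dst = []
    for lane_start in range(0, len(src), ELEMS_PER_LANE):
        lane = src[lane_start:lane_start + ELEMS_PER_LANE]
        for j in range(ELEMS_PER_LANE):
            sel = (imm8 >> (2 * j)) & 0b11   # index of the source element within the same lane
            dst.append(lane[sel])
    return dst
```

With control value 0x1B, each lane is reversed independently, so `[0, 1, 2, 3, 4, 5, 6, 7]` becomes `[3, 2, 1, 0, 7, 6, 5, 4]`.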
Providing extended precision in SIMD vector arithmetic operations

Patent number: 8074058

Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.

Type: Grant

Filed: June 8, 2009

Date of Patent: December 6, 2011

Assignee: MIPS Technologies, Inc.

Inventors: Timothy J. Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
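Below is a minimal sketch of the accumulate-then-narrow flow. It assumes signed 16-bit elements and an accumulator wide enough never to overflow, which Python integers provide. The transformation is assumed to be round-half-up followed by an arithmetic shift and saturation. The abstract lists rounding, clamping, and shifting only as examples. Function names are hypothetical.

```python
N = 16                                        # assumed element width in bits
LO, HI = -(1 << (N - 1)), (1 << (N - 1)) - 1  # signed N-bit range


def mac_into_accumulator(acc, a, b):
    """Multiply N-bit elements lane-wise and add the full-precision products to the accumulator."""
    return [c + x * y for c, x, y in zip(acc, a, b)]


def narrow(acc, shift):
    """Round, arithmetic-shift right, and clamp each accumulator lane back to N bits."""
    out = []
    for v in acc:
        if shift > 0:
            v = (v + (1 << (shift - 1))) >> shift   # round half up, then shift
        out.append(max(LO, min(HI, v)))             # saturate to the signed N-bit range
    return out
```

Multiplying 30000 by 30000 gives 900000000, which no 16-bit lane can hold. Keeping that value in the accumulator and narrowing once at the end with `narrow(acc, 14)` saturates it to 32767 instead of letting it wrap around.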
Parallel histogram generation in SIMD processor by indexing LUTs with vector data element values

Patent number: 8069334

Abstract: The present invention provides histogram calculation for image and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This speeds up histogram calculation by a factor of N over a scalar processor, where the SIMD processor can perform N LUT operations per instruction. The histogram operation is partitioned into a vector LUT operation, followed by a vector increment, a vector LUT update, and finally a reduction of the vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other types of multi-component images.

Type: Grant

Filed: March 12, 2009

Date of Patent: November 29, 2011

Inventor: Tibet Mimar
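The four phases can be sketched in Python. The sketch assumes one private copy of the histogram per lane. With that layout, two lanes holding the same pixel value in one vector never update the same table entry. This conflict-avoidance scheme is an assumption, and `simd_histogram` is a hypothetical name. The final line performs the reduction of the per-lane histogram components.

```python
def simd_histogram(pixels, bins=256, lanes=8):
    """Histogram via vector LUT read, increment, LUT write, then a cross-lane reduction."""
    luts = [[0] * bins for _ in range(lanes)]              # one histogram copy per lane
    for start in range(0, len(pixels), lanes):
        chunk = pixels[start:start + lanes]                 # one vector of up to `lanes` pixels
        counts = [luts[l][v] for l, v in enumerate(chunk)]  # vector LUT read indexed by pixel value
        counts = [c + 1 for c in counts]                    # vector increment
        for l, (v, c) in enumerate(zip(chunk, counts)):     # vector LUT update
            luts[l][v] = c
    return [sum(lut[b] for lut in luts) for b in range(bins)]  # reduce per-lane histograms
```

Without the per-lane copies, a vector such as `[2, 2, 2, 2]` would read the same count four times and write back the same incremented value four times. That bin would then gain 1 instead of 4.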
Processor architecture with processing clusters providing vector and scalar data processing capability

Patent number: 8060725

Abstract: A processor architecture for multimedia applications includes processor clusters providing vectorial data processing capability. Processing elements in the processor clusters process both data with a bit length N and data with bit lengths N/2, N/4, and so on according to a Single Instruction Multiple Data (SIMD) function. A load unit loads into the processor clusters data to be processed according to a same instruction. An intercluster data path exchanges data between the processor clusters. The intercluster data path is scalable to activate selected processor clusters. The processor operates simultaneously on SIMD, scalar and vectorial data.

Type: Grant

Filed: June 26, 2007

Date of Patent: November 15, 2011

Assignees: STMicroelectronics S.R.L., STMicroelectronics N.V.

Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elena Salurso, Elio Guidetti
SIMD microprocessor, image processing apparatus including same, and image processing method used therein

Patent number: 8060726

Abstract: A SIMD microprocessor, which can be included in an image processing apparatus using an image processing method used therein, includes a global processor and multiple processor elements controlled by the global processor. Each single processor element of the multiple processor elements includes multiple operation units. The global processor is configured to control the multiple processing elements to uniformly change a configuration of the multiple operation units in the single processor element to determine a number of data units of operation according to the multiple operation units either operated individually or in cooperation with each other in the single processor element and a width of data processed per data unit of operation performed in the single processor element. A processor element number is assigned per data unit of operation to the single processor element to use for executing an operation.

Type: Grant

Filed: February 27, 2008

Date of Patent: November 15, 2011

Assignee: Ricoh Company, Ltd.

Inventor: Tomoaki Ozaki
Provision of extended addressing modes in a single instruction multiple data (SIMD) data processor

Patent number: 8060724

Abstract: Executing a first memory access instruction with update by an N-bit processor includes accessing at least one source register of a plurality of registers, wherein the accessing includes accessing a first register, wherein each register of the plurality of registers includes a main portion of N bits and an extension portion of M bits, wherein the main portion of the first register includes a first address operand. The execution of the first instruction further includes forming a memory access address using the first address operand; using the memory access address as an address for a memory access; producing an updated address operand; and writing the updated address operand to the main portion of the first register. The producing includes accessing an extension portion of a source register of the at least one source register to obtain modifying information and using the modifying information in the producing an updated address operand.

Type: Grant

Filed: August 15, 2008

Date of Patent: November 15, 2011

Assignee: Freescale Semiconductor, Inc.

Inventor: William C. Moyer
Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization

Patent number: 8056069

Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.

Type: Grant

Filed: September 17, 2007

Date of Patent: November 8, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
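Below is a sketch of the transformation on a hypothetical loop. Two isomorphic statements each touch every other element (stride 2), but together they cover a contiguous stream. The sketch treats the pair as a virtual vector of length 2 and assumes a hardware vector length of 4 elements. Under those assumptions, two loop iterations are aggregated into one vector operation, and a scalar epilogue handles any leftover iterations. The function names and `VECTOR_LEN` are assumptions. The sketch only demonstrates that the aggregated loop computes the same result as the original.

```python
VECTOR_LEN = 4   # assumed hardware vector length, in elements


def scalar_loop(b, c, n):
    """Original loop: two isomorphic stride-2 statements per iteration."""
    a = [0] * (2 * n)
    for i in range(n):
        a[2 * i] = b[2 * i] + c[2 * i]
        a[2 * i + 1] = b[2 * i + 1] + c[2 * i + 1]
    return a


def aggregated_loop(b, c, n):
    """Same computation with several virtual-length vectors packed into one vector op."""
    a = [0] * (2 * n)
    group = 2                         # the two statements form a length-2 virtual vector
    per_op = VECTOR_LEN // group      # loop iterations aggregated into one hardware vector op
    i = 0
    while i + per_op <= n:
        lo = group * i                # first element of this contiguous chunk
        a[lo:lo + VECTOR_LEN] = [x + y for x, y in zip(b[lo:lo + VECTOR_LEN], c[lo:lo + VECTOR_LEN])]
        i += per_op
    for j in range(i, n):             # scalar epilogue for leftover iterations
        a[2 * j] = b[2 * j] + c[2 * j]
        a[2 * j + 1] = b[2 * j + 1] + c[2 * j + 1]
    return a
```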
Processor apparatus and method of processing multiple data by single instructions

Patent number: 8041927

Abstract: A processor (and method) of processing multiple data by a single instruction includes first and second register sets each of which includes a plurality of registers, and an arithmetic unit to rearrange data being registered in the first and second register sets according to a relative size of an absolute value of the data between the first and second register sets so that the relative size is defined before executing an instruction considering the relative size.

Type: Grant

Filed: April 7, 2009

Date of Patent: October 18, 2011

Assignee: NEC Corporation

Inventor: Yusuke Kobayashi
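Below is a minimal sketch of the rearrangement. It assumes element-wise pairing between the two register sets and that ties leave a pair in place. `order_by_magnitude` is a hypothetical name. The downstream instruction that relies on the ordering is not modeled.

```python
def order_by_magnitude(a, b):
    """Swap element pairs so that |first[i]| >= |second[i]| in every lane."""
    first, second = [], []
    for x, y in zip(a, b):
        if abs(x) >= abs(y):
            first.append(x)
            second.append(y)
        else:                  # swap so the larger magnitude goes to the first register set
            first.append(y)
            second.append(x)
    return first, second
```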
Load/move duplicate instructions for a processor

Patent number: 8032735

Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.

Type: Grant

Filed: November 5, 2010

Date of Patent: October 4, 2011

Assignee: Intel Corporation

Inventor: Patrice Roussel
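The duplicate operation amounts to copying the first portion of the source and repeating it across the destination. The sketch below assumes the destination has as many elements as the source and that the portion size divides that count evenly. With a two-element register and a one-element portion it behaves like x86 SSE3 `MOVDDUP`, though the patent may cover other variants. `move_duplicate` is a hypothetical name.

```python
def move_duplicate(src, portion):
    """Copy src[:portion] into the destination's first portion and repeat it to fill the register."""
    if portion <= 0 or len(src) % portion:
        raise ValueError("portion must evenly divide the register width")
    first = src[:portion]
    return first * (len(src) // portion)   # list repetition fills the subsequent portions
```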
Runtime instruction decoding modification in a multi-processing array

Patent number: 8028150

Abstract: A method and system for decoding and modifying processor instructions in runtime according to certain rules in order to separately control processing elements embedded within a multi-processor array using a single instruction. The present invention allows multiple processing elements and/or execution units in a multi-processor array to perform different operations, based upon a variable or variables such as their location in the multi-processor array, while accepting a single instruction as an input.

Type: Grant

Filed: November 16, 2007

Date of Patent: September 27, 2011

Inventors: Shlomo Selim Rakib, Yoram Zarai
SIMD processor with each processing element receiving buffered control signal from clocked register positioned in the middle of the group

Patent number: 8024550

Abstract: Disclosed is an SIMD-type microprocessor comprising a processor element group, plural processor elements with an operation part and a register file being arranged therein and a processor element control signal generator configured to output a processor element control signal controlling an operation of the processor element, wherein a feed part configured to feed a processor element control signal output from the processor element control signal generator to the processor element is provided at a center of the processor element group.

Type: Grant

Filed: January 21, 2009

Date of Patent: September 20, 2011

Assignee: Ricoh Company, Ltd.

Inventor: Hidehito Kitamura
Information processing system and information processing method

Patent number: 8024531

Abstract: An ascending ordered list without duplication is generated based on a value list divided and held by multiple memory modules. An information processing system has multiple PMMs (Processor Memory Modules), and the PMMs are interconnected via a data transmission path. The memory in each PMM holds a list of values ordered in ascending or descending order without duplication. For a storage value in the value list (LOCAL_LIST) held by the PMM, the PMM determines whether its memory module is a representative module representing the one or more memory modules holding that storage value, based on rankings determined for the individual PMMs and on the value lists received from the other PMMs. If the memory module is determined to be the representative module (RV-0 . . . RV-7), the PMM stores information indicating this in association with the storage value.

Type: Grant

Filed: April 17, 2006

Date of Patent: September 20, 2011

Assignee: Turbo Data Laboratories, Inc.

Inventor: Shinji Furusho
Two-dimensional processor array of processing elements

Patent number: 8024549

Abstract: A data processor apparatus comprises a plurality of data receiving means each for receiving data from a data source; a computational element coupleable to each of said data receiving means for performing an operation on said data; and a controller for controlling the flow of data from each data receiving means to the computational element.

Type: Grant

Filed: March 3, 2006

Date of Patent: September 20, 2011

Assignee: Mtekvision Co., Ltd.

Inventor: Malcolm Stewart
System and Method for Processing Image Data Relative to a Focus of Attention Within the Overall Image

Publication number: 20110211726

Abstract: This invention provides a system and method for processing discrete image data within an overall set of acquired image data based upon a focus of attention within that image. The result of such processing is to operate upon a more limited subset of the overall image data to generate output values required by the vision system process. Such an output value can be a decoded ID or other alphanumeric data. The system and method is performed in a vision system having two processor groups, along with a data memory that is smaller in capacity than the amount of image data to be read out from the sensor array. The first processor group is a plurality of SIMD processors and at least one general purpose processor, co-located on the same die with the data memory. A data reduction function operates within the same clock cycle as data readout from the sensor to generate a reduced data set that is stored in the on-die data memory.

Type: Application

Filed: May 17, 2010

Publication date: September 1, 2011

Applicant: COGNEX CORPORATION

Inventors: Michael C. Moed, E. John McGarry
Method for compiling scalar code for a single instruction multiple data (SIMD) execution engine

Patent number: 8010953

Abstract: Performing scalar operations using a SIMD data parallel execution unit is provided. With the mechanisms of the illustrative embodiments, scalar operations in application code are identified that may be executed using vector operations in a SIMD data parallel execution unit. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined.

Type: Grant

Filed: April 4, 2006

Date of Patent: August 30, 2011

Assignee: International Business Machines Corporation

Inventor: Michael K. Gschwind
Dual Mode Floating Point Multiply Accumulate Unit

Publication number: 20110208946

Abstract: Disclosed are various embodiments of a stream processing unit for single instruction multiple data (SIMD) processing, wherein the stream processing unit executes a stage of a Multiply-Accumulate calculation. In one embodiment, the stream processing unit comprises a plurality of scalar arithmetic logic units (ALUs) configured to receive data having a plurality of data types. The number and type of scalar ALUs corresponds to an SIMD factor. In one embodiment, the scalar ALUs are executed sequentially with a delay being introduced in between execution of each of the scalar ALUs, wherein the delay corresponds to the SIMD factor.

Type: Application

Filed: May 4, 2011

Publication date: August 25, 2011

Applicant: VIA TECHNOLOGIES, INC.

Inventors: Boris Prokopenko, Timour Paltashev, Derek Gladding
SIMD image forming apparatus for minimizing wiring distance between registers and processing devices

Patent number: 8001506

Abstract: A disclosed image processing apparatus includes a SIMD microprocessor in which multiple processor elements are arranged in one dimension, each of the processor elements including multiple access registers arranged in stages for storing image data; and multiple data processing devices corresponding one-to-one with the stages of the access registers, arranged in one dimension in the same direction as the processor elements, and configured to read and write image data from/to the access registers. The access registers of each of the stages, each of which access registers is included in a different one of the processor elements, are connected with a common line. Wiring outlets, each of which connects the common line of a different one of the stages to a corresponding data processing device, are individually disposed within the SIMD microprocessor in such a manner that each wiring outlet has a shortest possible distance to the corresponding data processing device.

Type: Grant

Filed: January 21, 2009

Date of Patent: August 16, 2011

Assignee: Ricoh Company, Ltd.

Inventor: Tomoaki Ozaki
Multi-node chipset lock flow with peer-to-peer non-posted I/O requests

Patent number: 7996572

Abstract: Systems and methods of managing transactions provide for receiving a first flush command at a first I/O hub, wherein the first flush command is dedicated to non-posted transactions. One embodiment further provides for halting an inbound ordering queue of the first I/O hub with regard to non-posted transactions in response to the first flush command and flushing a non-posted transaction from an outgoing buffer of the first I/O hub to a second I/O hub while the inbound ordering queue is halted with regard to non-posted transactions.

Type: Grant

Filed: June 2, 2004

Date of Patent: August 9, 2011

Assignee: Intel Corporation

Inventors: Robert G. Blankenship, Robert J. Greiner, Herbert H. J. Hum, Kenneth C. Creta, Buderya S. Acharya
Relating to Single Instruction Multiple Data (SIMD) Architectures

Publication number: 20110191567

Abstract: A parallel processor for processing a plurality of different processing instruction streams in parallel is described. The processor comprises a plurality of data processing units; and a plurality of SIMD (Single Instruction Multiple Data) controllers, each connectable to a group of data processing units of the plurality of data processing units, and each SIMD controller arranged to handle an individual processing task with a subgroup of actively connected data processing units selected from the group of data processing units. The parallel processor is arranged to vary dynamically the size of the subgroup of data processing units to which each SIMD controller is actively connected under control of received processing instruction streams, thereby permitting each SIMD controller to be actively connected to a different number of processing units for different processing tasks.

Type: Application

Filed: May 20, 2009

Publication date: August 4, 2011

Inventors: John Lancaster, Martin Whitaker
Low-Overhead Misalignment and Reformatting Support for SIMD

Publication number: 20110185150

Abstract: Systems and methods for performing single instruction multiple data (SIMD) operations on a data set. The methods may include examining a structure of the data set to determine what reorganization may be necessary to facilitate SIMD processing. The method may include selecting a stored bit mask corresponding to the organization of the data set and loading the bit mask into an application specific register (ASR). Subsequently, the data may be reorganized inline according to the ASR as the data is loaded into the SIMD functional unit such that the SIMD functional unit may operate on the data set. The results of the SIMD operation may be written to a results register.

Type: Application

Filed: January 26, 2010

Publication date: July 28, 2011

Applicant: SUN MICROSYSTEMS, INC.

Inventor: Lawrence A. Spracklen
Data Processing Architecture

Publication number: 20110185151

Abstract: A parallel processor is described which is operated in a SIMD manner. The processor comprises: a plurality of processing elements connected in a string and grouped into a plurality of processing units, wherein each processing unit comprises a plurality of processing elements which each have direct interconnections with all of the other processing elements within the respective processing unit, the interconnections enabling data transfer between any two elements within a unit to be effected in a single clock cycle.

Type: Application

Filed: May 20, 2009

Publication date: July 28, 2011

Inventors: Martin Whitaker, John Lancaster
Maximized Memory Throughput on Parallel Processing Devices

Publication number: 20110173414

Abstract: In parallel processing devices, for streaming computations, processing each data element of the stream may not be computationally intensive, so processing may take relatively little time compared with the memory access times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, provided are methods for achieving improved, optimized, or ultimately, maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.

Type: Application

Filed: March 23, 2011

Publication date: July 14, 2011

Applicant: NVIDIA Corporation

Inventors: Norbert Juffa, Brett W. Coon
Efficient Multi-Level Software Cache Using SIMD Vector Permute Functionality

Publication number: 20110161548

Abstract: A cache manager receives a request for data, which includes a requested effective address. The cache manager determines whether the requested effective address matches a most recently used effective address stored in a mapped tag vector. When the most recently used effective address matches the requested effective address, the cache manager identifies a corresponding cache location and retrieves the data from the identified cache location. However, when the most recently used effective address fails to match the requested effective address, the cache manager determines whether the requested effective address matches a subsequent effective address stored in the mapped tag vector. When the cache manager determines a match to a subsequent effective address, the cache manager identifies a different cache location corresponding to the subsequent effective address and retrieves the data from the different cache location.

Type: Application

Filed: December 29, 2009

Publication date: June 30, 2011

Applicant: International Business Machines Corporation

Inventors: Brian Flachs, Barry L. Minor, Mark Richard Nutter
Floating Point Collect and Operate

Publication number: 20110161624

Abstract: Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a single stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register.

Type: Application

Filed: December 29, 2009

Publication date: June 30, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Brian K. Flachs, Seiji Maeda, Steven Osman
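Below is a sketch of the summation. It assumes a power-of-two vector length and two-input adders. `collect_and_sum` and `dot_product` are hypothetical names. Each pass of the loop stands for one cycle: the routing network feeds pairs of elements into the adder stage, and the outputs are written back to the results register. A 4-element vector therefore needs the two cycles the abstract describes.

```python
def collect_and_sum(v):
    """Sum across a vector by routing partial sums back through one stage of adders.

    Returns (sum, cycles). Assumes len(v) is a power of two."""
    if not v or len(v) & (len(v) - 1):
        raise ValueError("vector length must be a power of two")
    v = list(v)
    cycles = 0
    while len(v) > 1:
        # routing network pairs adjacent elements into the adders;
        # the adder outputs are stored back into the results vector register
        v = [v[i] + v[i + 1] for i in range(0, len(v), 2)]
        cycles += 1
    return v[0], cycles


def dot_product(a, b):
    """Dot product: an ordinary lane-wise multiply followed by the across-vector summation."""
    products = [x * y for x, y in zip(a, b)]
    return collect_and_sum(products)
```

For instance, `dot_product([1, 2, 3, 4], [5, 6, 7, 8])` returns `(70, 2)`: the sum 70 after two adder cycles.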
Gathering and Scattering Multiple Data Elements

Publication number: 20110153983

Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding, by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.

Type: Application

Filed: December 22, 2009

Publication date: June 23, 2011

Inventors: Christopher J. Hughes, Yen-Kuang (Y.K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
Parallel data processing apparatus

Patent number: 7966475

Abstract: A data processor comprises a plurality of processing elements arranged for parallel processing of data, and a controller for controlling the plurality of processing elements. The controller is operable to determine respective status information for a plurality of processing threads, and to control processing of the processing threads by the plurality of processors in dependence upon such status information.

Type: Grant

Filed: January 10, 2007

Date of Patent: June 21, 2011

Assignee: Rambus Inc.

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Methods for performing extended table lookups using SIMD vector permutation instructions that support out-of-range index values

Patent number: 7962718

Abstract: A permutation instruction generates vector elements for a destination register using identified source and destination registers. A plurality of partial table lookups corresponding to an extended table produces a plurality of intermediate results. At least one source register stores a plurality of index values corresponding to the extended table. Out-of-range index values are values that are not contained in at least one additional source register and result in a predetermined constant value being stored into a predetermined vector element of the destination register. The index values are adjusted between the partial table lookups. A final result is formed by performing a logic function with the plurality of intermediate results. The final result is thereby formed without a full table lookup of each element of the final result.

Type: Grant

Filed: October 12, 2007

Date of Patent: June 14, 2011

Assignee: Freescale Semiconductor, Inc.

Inventor: William C. Moyer
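
The extended-lookup idea can be sketched by splitting a large table into 16-entry pieces. Each piece is searched with a permute-style lookup that returns a fixed constant for out-of-range indices. The indices are rebased between lookups, and the intermediate results are combined. This is a hedged illustration rather than the patented instruction sequence. It assumes that the out-of-range constant is 0 and that the combining logic function is bitwise OR. The chunk size of 16 and the helper names are hypothetical.

```python
from typing import List, Sequence


def partial_lookup(table: Sequence[int], indices: Sequence[int], fill: int = 0) -> List[int]:
    # Out-of-range indices yield a fixed constant instead of a table entry.
    return [table[i] if 0 <= i < len(table) else fill for i in indices]


def extended_lookup(table: Sequence[int], indices: Sequence[int], chunk: int = 16) -> List[int]:
    result = [0] * len(indices)
    adjusted = list(indices)
    for start in range(0, len(table), chunk):
        part = table[start:start + chunk]
        intermediate = partial_lookup(part, adjusted)
        # Combine with OR: an index is in range for at most one chunk, and every
        # other chunk contributes the fill value 0 for that lane.
        result = [r | v for r, v in zip(result, intermediate)]
        # Rebase indices so the next chunk sees them as 0-based.
        adjusted = [i - len(part) for i in adjusted]
    return result
```

With `table = list(range(100, 132))`, the call `extended_lookup(table, [0, 15, 16, 31, 40])` returns `[100, 115, 116, 131, 0]`. Index 40 falls outside both chunks and therefore keeps the constant 0.
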
PROCESSING ELEMENTS, MIXED MODE PARALLEL PROCESSOR SYSTEM, PROCESSING METHOD BY PROCESSING ELEMENTS, MIXED MODE PARALLEL PROCESSOR METHOD, PROCESSING PROGRAM BY PROCESSING ELEMENTS AND MIXED MODE PARALLEL PROCESSING PROGRAM

Publication number: 20110138151

Abstract: Disclosed is a mixed mode parallel processor system in which N processing elements (PEs), capable of performing SIMD operation, are grouped into M (=N÷S) processing units (PUs) performing MIMD operation. In MIMD operation, P of the S memories in each PU (memories that inherently belong to the PEs), where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One of the S sets of general-purpose registers, which inherently belong to the PEs, operates directly as the general register group for the PU. Of the remaining S−1 sets, T sets (or as many sets as required), where T<S−1, are used as storage registers that hold the tags of the instruction cache.

Type: Application

Filed: February 18, 2011

Publication date: June 9, 2011

Applicant: NEC CORPORATION

Inventor: Shorin KYO
Parallel data processing apparatus

Patent number: 7958332

Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instruction streams, each instruction stream having a plurality of instruction items; a combining unit operable to combine the plurality of instruction streams into a serial instruction stream; and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.

Type: Grant

Filed: March 13, 2009

Date of Patent: June 7, 2011

Assignee: Rambus Inc.

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
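
The retrieve, combine, and distribute pipeline can be illustrated with a toy model. The abstract does not say how the streams are merged. This sketch assumes simple round-robin interleaving with each item tagged by its stream number. It also assumes that distribution means broadcasting the serial stream to every processing element. Both policies and the function names are hypothetical.

```python
from itertools import zip_longest
from typing import List, Sequence, Tuple


def combine_streams(streams: Sequence[Sequence[str]]) -> List[Tuple[int, str]]:
    # Round-robin: take one instruction item from each stream in turn, tagged
    # with its stream number, skipping streams that have run out of items.
    serial: List[Tuple[int, str]] = []
    for items in zip_longest(*streams):
        for stream_id, item in enumerate(items):
            if item is not None:
                serial.append((stream_id, item))
    return serial


def distribute(serial: List[Tuple[int, str]], num_pes: int) -> List[List[Tuple[int, str]]]:
    # Assumed SIMD-style broadcast: every PE receives the same serial stream.
    return [list(serial) for _ in range(num_pes)]
```

For example, combining `[["a0", "a1", "a2"], ["b0"]]` yields `[(0, "a0"), (1, "b0"), (0, "a1"), (0, "a2")]`. A real controller might instead use stream priorities or status information to choose the order.
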
Bi-directional data transfer within a single I/O operation

Patent number: 7941570

Abstract: An article of manufacture, apparatus, and a method for facilitating input/output (I/O) processing for an I/O operation at a host computer system configured for communication with a control unit. The method includes the host computer system obtaining a transport command word (TCW) for an I/O operation having both input and output data. The TCW specifies a location of the output data and a location for storing the input data. The host computer system forwards the I/O operation to the control unit for execution. The host computer system gathers the output data responsive to the location of the output data specified by the TCW, and then forwards the output data to the control unit for use in the execution of the I/O operation. The host computer system receives the input data from the control unit and stores the input data at the location specified by the TCW.

Type: Grant

Filed: February 14, 2008

Date of Patent: May 10, 2011

Assignee: International Business Machines Corporation

Inventors: John R. Flanagan, Daniel F. Casper, Catherine C. Huang, Matthew J. Kalos, Ugochukwu C. Njoku, Dale F. Riedy, Gustav E. Sittmann
TRANSPOSING ARRAY DATA ON SIMD MULTI-CORE PROCESSOR ARCHITECTURES

Publication number: 20110107060

Abstract: Systems, methods and articles of manufacture are disclosed for transposing array data on a SIMD multi-core processor architecture. A matrix in a SIMD format may be received. The matrix may comprise a SIMD conversion of a matrix M in a conventional data format. A mapping may be defined from each element of the matrix to an element of a SIMD conversion of a transpose of matrix M. A SIMD-transposed matrix T may be generated based on matrix M and the defined mapping. A row-wise algorithm may be applied to T, without modification, to operate on columns of matrix M.

Type: Application

Filed: November 4, 2009

Publication date: May 5, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jeffrey S. McAllister, Mark A. Bransford, Timothy J. Mullins, Nelson Ramirez
Methods for scalably exploiting parallelism in a parallel processing system

Patent number: 7937567

Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the processing cores available in a particular system implementation.

Type: Grant

Filed: November 1, 2006

Date of Patent: May 3, 2011

Assignee: Nvidia Corporation

Inventors: John R. Nickolls, Stephen D. Lew
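
The thread, thread array, and grid hierarchy can be simulated sequentially. This sketch is not the patented hardware. It uses a vector addition as the problem. Each thread derives a global element index from its array ID and thread ID. The independent thread arrays are dealt out round-robin to a configurable number of cores, which is one possible scheduling policy among many. The names `launch_grid` and `add_kernel` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def launch_grid(num_arrays: int, threads_per_array: int, num_cores: int,
                thread_fn: Callable[[int, int, int], None]) -> Dict[int, List[int]]:
    # Thread arrays in a grid are independent, so any array may run on any core;
    # here they are assigned round-robin (an assumed policy).
    core_of: Dict[int, List[int]] = defaultdict(list)
    for array_id in range(num_arrays):
        core_of[array_id % num_cores].append(array_id)
    for core in sorted(core_of):
        for array_id in core_of[core]:
            for thread_id in range(threads_per_array):  # threads of one array
                thread_fn(array_id, thread_id, threads_per_array)
    return dict(core_of)


# Usage: a 10-element vector add solved by a grid of 3 arrays x 4 threads on 2 cores.
a = list(range(10))
b = [10 * x for x in a]
c = [0] * len(a)


def add_kernel(array_id: int, thread_id: int, threads_per_array: int) -> None:
    i = array_id * threads_per_array + thread_id  # global index from the hierarchy
    if i < len(a):                                # surplus threads do nothing
        c[i] = a[i] + b[i]


schedule = launch_grid(3, 4, 2, add_kernel)
```

After the launch, `c` holds `[0, 11, 22, ..., 99]`, and arrays 0 and 2 ran on core 0 while array 1 ran on core 1. Because the arrays do not depend on one another, changing `num_cores` alters the schedule but not the result, which is the scalability property the abstract describes.
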
Automatic control of multiple arithmetic/logic SIMD units

Publication number: 20110099352

Abstract: There is provided a method of performing single instruction multiple data (SIMD) operations. The method comprises storing a plurality of arrays in memory for performing SIMD operations thereon; determining a total number of SIMD operations to be performed on the plurality of arrays; loading a counter with the total number of SIMD operations to be performed on the plurality of arrays; enabling a plurality of arithmetic logic units (ALUs) to perform a first number of operations on first elements of the plurality of arrays; performing the first number of operations on first elements of the plurality of arrays using the plurality of ALUs; decrementing the counter by the first number of operations to provide a remaining number of operations; and enabling a number of the plurality of ALUs to perform the remaining number of operations on second elements of the plurality of arrays.

Type: Application

Filed: November 23, 2009

Publication date: April 28, 2011

Applicant: Mindspeed Technologies, Inc.

Inventor: Patrick D. Ryan
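
The counter-driven control loop in the abstract can be sketched as follows. The counter is loaded with the total operation count. Each step enables as many ALUs as remain needed and then decrements the counter by the number of operations performed. Element-wise addition is an assumed example operation, and `run_simd` is a hypothetical name. The function also returns how many ALUs were enabled at each step, so the partial final step is visible.

```python
from typing import List, Tuple


def run_simd(a: List[int], b: List[int], num_alus: int) -> Tuple[List[int], List[int]]:
    assert len(a) == len(b)
    counter = len(a)                  # load counter with total number of operations
    out: List[int] = []
    enabled_per_step: List[int] = []
    base = 0
    while counter > 0:
        enabled = min(num_alus, counter)  # enable only as many ALUs as operations remain
        # Each enabled ALU adds one pair of elements (addition is an assumed example op).
        out.extend(a[base + k] + b[base + k] for k in range(enabled))
        enabled_per_step.append(enabled)
        base += enabled
        counter -= enabled                # decrement by the operations just performed
    return out, enabled_per_step
```

With 10 element pairs and 4 ALUs, the steps enable 4, 4, and then 2 ALUs, so no ALU operates on data past the end of the arrays.
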
METHOD AND APPARATUS FOR PACKING DATA

Publication number: 20110093682

Abstract: An apparatus includes an instruction decoder, first and second source registers, and a circuit coupled to the decoder to receive packed data from the source registers and to pack the packed data responsive to a pack instruction received by the decoder. A first packed data element and a second packed data element are received from the first source register. A third packed data element and a fourth packed data element are received from the second source register. The circuit packs a portion of each of the packed data elements into a destination register, such that the portion from the second packed data element is adjacent to the portion from the first packed data element, and the portion from the fourth packed data element is adjacent to the portion from the third packed data element.

Type: Application

Filed: December 22, 2010

Publication date: April 21, 2011

Inventors: ALEXANDER PELEG, YAAKOV YAARI, MILLIND MITTAL, LARRY M. MENNEMEIER, BENNY EITAN
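
A bit-level model helps show the resulting register layout. This minimal sketch assumes two 32-bit elements per source register and assumes that the "portion" kept from each element is its low 16 bits. The elements are modeled as integers and the destination as one 64-bit integer. The sketch truncates each element. Shipping pack instructions such as x86 PACKSSDW and PACKUSWB saturate instead, so this is a simplified variant, and `pack_low_halves` is a hypothetical name.

```python
from typing import Sequence


def pack_low_halves(src1: Sequence[int], src2: Sequence[int], elem_bits: int = 32) -> int:
    half = elem_bits // 2
    mask = (1 << half) - 1
    # Keep the low half of each element: src1's elements fill the low lanes and
    # src2's the high lanes, so each portion sits next to its neighbor's portion.
    parts = [e & mask for e in list(src1) + list(src2)]
    result = 0
    for lane, part in enumerate(parts):
        result |= part << (lane * half)
    return result
```

For example, packing `[0x11112222, 0x33334444]` with `[0x55556666, 0x77778888]` gives `0x8888666644442222`. The portion from the second element (`0x4444`) sits directly above the portion from the first element (`0x2222`), and the same holds for the third and fourth elements.
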
PARALLEL DATA PROCESSING SYSTEMS AND METHODS USING COOPERATIVE THREAD ARRAYS

Publication number: 20110087860

Abstract: Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.

Type: Application

Filed: December 17, 2010

Publication date: April 14, 2011

Applicant: NVIDIA Corporation

Inventors: John R. Nickolls, Stephen D. Lew
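
Thread IDs in a cooperative thread array can be modeled with ordinary threads. In this sketch, each thread uses its ID to select a strided slice of the input and writes a partial sum to a shared array. The threads then synchronize, after which thread 0 combines the shared intermediate results. Python's `threading.Barrier` stands in for the CTA synchronization mechanism. The strided partitioning, the reduction task, and the name `run_cta` are assumptions chosen for illustration.

```python
import threading
from typing import Dict, List, Sequence


def run_cta(data: Sequence[int], num_threads: int) -> int:
    shared: List[int] = [0] * num_threads       # per-CTA shared storage (modeled)
    barrier = threading.Barrier(num_threads)    # stands in for CTA synchronization
    result: Dict[str, int] = {}

    def thread_program(tid: int) -> None:
        # Every thread runs the same program; its thread ID picks its input slice.
        shared[tid] = sum(data[tid::num_threads])
        barrier.wait()                          # wait until all partial sums are written
        if tid == 0:                            # one thread combines the shared results
            result["sum"] = sum(shared)

    threads = [threading.Thread(target=thread_program, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["sum"]
```

With four threads, `run_cta(list(range(100)), 4)` returns 4950. Without the barrier, thread 0 could read entries of `shared` that other threads had not yet written.
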
Plural SIMD arrays processing threads fetched in parallel and prioritized by thread manager sequentially transferring instructions to array controller for distribution

Patent number: 7925861

Abstract: A data processor comprises a plurality of processing elements arranged in a first plurality of single instruction multiple data (SIMD) processing arrays, and comprises a second plurality of controllers for transferring instructions to the processing arrays. Each controller is operable to retrieve a plurality of incoming instruction streams in parallel with one another and operable to supply incoming instruction streams to one of a plurality of processing arrays.

Type: Grant

Filed: January 31, 2007

Date of Patent: April 12, 2011

Assignee: Rambus Inc.

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
DATA PROCESSING ARCHITECTURES FOR PACKET HANDLING

Publication number: 20110083000

Abstract: A data processing architecture includes an input device that receives an incoming stream of data packets. A plurality of processing elements are operable to process data received from the input device. The input device is operable to distribute data packets in whole or in part to the processing elements in dependence upon the data processing bandwidth of the processing elements.

Type: Application

Filed: December 10, 2010

Publication date: April 7, 2011

Inventors: John Rhoades, Ken Cameron, Paul Winser, Ray McConnell, Gordon Faulds, Simon McIntosh-Smith, Anthony Spencer, Jeff Bond, Matthias Dejaegher, Danny Halamish, Gajinder Panesar
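
The abstract does not specify how the input device weighs processing bandwidth. One plausible policy is to send each packet to the processing element that would finish it soonest, given the bytes already queued there and that element's bandwidth. The sketch below assumes this policy and, for simplicity, assigns only whole packets, so partial-packet distribution is not modeled. The name `distribute_packets` is hypothetical.

```python
from typing import List, Sequence


def distribute_packets(packet_sizes: Sequence[int], bandwidths: Sequence[float]) -> List[List[int]]:
    queued = [0] * len(bandwidths)                       # bytes assigned to each PE so far
    assignment: List[List[int]] = [[] for _ in bandwidths]  # packet indices per PE
    for idx, size in enumerate(packet_sizes):
        # Choose the PE with the earliest estimated finish time for this packet;
        # min() breaks ties in favor of the lowest-numbered PE.
        pe = min(range(len(bandwidths)), key=lambda p: (queued[p] + size) / bandwidths[p])
        queued[pe] += size
        assignment[pe].append(idx)
    return assignment
```

With six equal-sized packets and two PEs whose bandwidths are in a 2:1 ratio, the faster PE receives four packets and the slower PE receives two.
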
Graphics processing unit used for cryptographic processing

Patent number: 7916864

Abstract: A graphics processing unit is programmed to carry out cryptographic processing so that fast, effective cryptographic processing solutions can be provided without incurring additional hardware costs. The graphics processing unit can efficiently carry out cryptographic processing because it has an architecture that is configured to handle a large number of parallel processes. The cryptographic processing carried out on the graphics processing unit can be further improved by configuring the graphics processing unit to be capable of both floating point and integer operations.

Type: Grant

Filed: February 8, 2006

Date of Patent: March 29, 2011

Assignee: NVIDIA Corporation

Inventor: Norbert Juffa
Data processing architectures for packet handling using a SIMD array

Patent number: 7917727

Abstract: An input/output system transfers data packets to and from a SIMD array of processing elements (PEs) such that different sizes of data packets are transferred to respective ones of the PEs. The packets are transferred in batches to respective different addresses in the array under the control of the PEs. Transfer to or from the array may be carried out when either a batch or part of a batch is ready for transfer. The decision to transfer either full or part batches is made in dependence upon the speed of the PEs and the speed and intermittency of the data packets.

Type: Grant

Filed: May 23, 2007

Date of Patent: March 29, 2011

Assignee: Rambus, Inc.

Inventors: John Rhoades, Ken Cameron, Paul Winser, Ray McConnell, Gordon Faulds, Simon McIntosh-Smith, Anthony Spencer, Jeff Bond, Matthias Dejaegher, Danny Halamish, Gajinder Panesar
Method for variable length opcode mapping in a VLIW processor

Publication number: 20110072238

Abstract: The present invention provides a method for reducing the program memory size required for a dual-issue processor with a scalar processor plus a SIMD vector processor. Coding the map of the next group of instruction pairs in a no-operation (NOP) instruction of the scalar and vector processors reduces the cases in which either the scalar or the vector opcode is a NOP opcode. A NOP for either the scalar or the vector processor defines the next 13 instructions as scalar-plus-vector, scalar-followed-by-scalar, or vector-followed-by-vector, and the execution unit executes accordingly until the next NOP or branch instruction.

Type: Application

Filed: September 20, 2009

Publication date: March 24, 2011

Inventor: Tibet Mimar
Cellular engine for a data processing system

Patent number: 7908461

Abstract: A data processing system includes an associative memory device containing n cells, each of which includes a processing circuit. A controller is utilized for issuing one of a plurality of instructions to the associative memory device, while a clock device is utilized for outputting a synchronizing clock signal comprised of a predetermined number of clock cycles per second. The clock device outputs the synchronizing clock signal to the associative memory device and to the controller, which globally communicates one of the plurality of instructions to all of the n cells simultaneously, within one of the clock cycles.

Type: Grant

Filed: December 19, 2007

Date of Patent: March 15, 2011

Assignee: Allsearch Semi, LLC

Inventors: Gheorghe Stefan, Dan Tomescu
Single instruction for data scrambling

Patent number: 7903810

Abstract: A method and apparatus are disclosed for efficiently scrambling one or more bytes of data according to DSL standards on a processor. This is achieved by providing an instruction for scrambling one or more bytes of data according to the DSL standards. Accordingly, the invention advantageously provides a processor with the ability to scramble data with a single instruction thus allowing for more efficient and faster scrambling operations for subsequent modulation and transmission.

Type: Grant

Filed: September 22, 2004

Date of Patent: March 8, 2011

Assignee: Broadcom Corporation

Inventors: Mark Taunton, Timothy Martin Dobson
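
The abstract does not name the scrambling polynomial. As an assumption, the sketch below uses the self-synchronizing scrambler defined for ADSL in ITU-T G.992.1, d'[n] = d[n] XOR d'[n-18] XOR d'[n-23]. It processes each byte LSB first, and the bit order is also an assumption. In software, each byte costs an 8-iteration bit loop. The patent proposes replacing work of this kind with a single instruction. The function names are hypothetical. The matching descrambler is included to show that the transform is invertible.

```python
from typing import Tuple

MASK23 = (1 << 23) - 1  # the scrambler remembers the last 23 scrambled bits


def scramble(data: bytes, state: int = 0) -> Tuple[bytes, int]:
    # state bit k holds scrambled bit d'[n-1-k], so taps 18 and 23 are bits 17 and 22.
    out = bytearray()
    for byte in data:
        o = 0
        for i in range(8):  # LSB-first bit order (assumed)
            d = (byte >> i) & 1
            s = d ^ ((state >> 17) & 1) ^ ((state >> 22) & 1)
            state = ((state << 1) | s) & MASK23
            o |= s << i
        out.append(o)
    return bytes(out), state


def descramble(data: bytes, state: int = 0) -> Tuple[bytes, int]:
    # The descrambler taps the received (scrambled) bits, so it self-synchronizes.
    out = bytearray()
    for byte in data:
        o = 0
        for i in range(8):
            s = (byte >> i) & 1
            d = s ^ ((state >> 17) & 1) ^ ((state >> 22) & 1)
            state = ((state << 1) | s) & MASK23
            o |= d << i
        out.append(o)
    return bytes(out), state
```

Descrambling the output of `scramble` with the same initial state recovers the original bytes. The returned state lets a caller continue scrambling across buffer boundaries.
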
System and method for efficiently executing single program multiple data (SPMD) programs

Patent number: 7904905

Abstract: A system and method is disclosed for efficiently executing single program multiple data (SPMD) programs in a microprocessor. A micro single instruction multiple data (SIMD) unit is located within the microprocessor. A job buffer that is coupled to the micro SIMD unit dynamically allocates tasks to the micro SIMD unit. The SPMD programs each comprise a plurality of input data streams having moderate diversification of control flows. The system executes each SPMD program once for each input data stream of the plurality of input data streams.

Type: Grant

Filed: November 14, 2003

Date of Patent: March 8, 2011

Assignee: STMicroelectronics, Inc.

Inventor: Stefano Cervini
