Single Instruction, Multiple Data (SIMD) Patents (Class 712/22)

Apparatus and method for performing SIMD multiply-accumulate operations

Patent number: 8443170

Abstract: An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD processing circuitry.

Type: Grant

Filed: September 17, 2009

Date of Patent: May 14, 2013

Assignee: ARM Limited

Inventors: Mladen Wilder, Dominic Hugo Symes, Richard Edward Bruce
Processing module with millimeter wave transceiver interconnection

Patent number: 8438322

Abstract: A processing module includes a fetch and decode module, an instruction register, a data register, an execution module, and a millimeter wave (MMW) transceiver section. The fetch and decode module is operable to fetch and decode an instruction of a program and to identify data associated with the instruction. The execution module is operable to execute the instruction upon the data associated with the instruction. The MMW transceiver section is operable to wirelessly receive at least one of the instruction and the data associated with the instruction from memory.

Type: Grant

Filed: August 30, 2008

Date of Patent: May 7, 2013

Assignee: Broadcom Corporation

Inventors: Ahmadreza (Reza) Rofougaran, Timothy W. Markison
Allocating rename register from separate register sets for each result data of multiple data processing instruction

Patent number: 8438366

Abstract: Multiple data processing instructions instruct a computing device to process multiple data including first data and second data. When a multiple data processing instruction is decoded, two allocatable registers are selected. One is used to store the result of a processing operation performed on first data by one processing unit, and the other is used to store the result of a processing operation performed on second data by another processing unit. Those stored processing results are then transferred to result registers. Normal data processing instructions, on the other hand, instruct a processing operation on third data. When a normal data processing instruction is decoded, one allocatable register is selected and used to store the result of processing that a processing unit performs on the third data. The stored processing result is then transferred to a result register.

Type: Grant

Filed: August 2, 2010

Date of Patent: May 7, 2013

Assignee: Fujitsu Limited

Inventors: Yasunobu Akizuki, Toshio Yoshida
Thread count throttling for efficient resource utilization

Patent number: 8429656

Abstract: Methods and apparatuses are presented for graphics operations with thread count throttling, involving operating a processor to carry out multiple threads of execution, wherein the processor comprises at least one execution unit capable of supporting up to a maximum number of threads, obtaining a defined memory allocation size for allocating, in at least one memory device, a thread-specific memory space for the multiple threads, obtaining a per thread memory requirement corresponding to the thread-specific memory space, determining a thread count limit based on the defined memory allocation size and the per thread memory requirement, and sending a command to the processor to cause the processor to limit the number of threads carried out by the at least one execution unit to a reduced number of threads, the reduced number of threads being less than the maximum number of threads.

Type: Grant

Filed: November 2, 2006

Date of Patent: April 23, 2013

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Bryon S. Nordquist
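
Illustrative sketch (not part of the patent record): the limit computation described in the abstract of patent 8429656 might look like the following. It assumes the limit is the whole number of per-thread regions that fit in the allocation, capped at the hardware maximum; the function and parameter names are hypothetical.

```python
def thread_count_limit(alloc_bytes, per_thread_bytes, max_threads):
    """Return how many threads fit in the thread-specific memory allocation."""
    if per_thread_bytes <= 0:
        raise ValueError("per-thread memory requirement must be positive")
    # Whole threads that fit in the allocation, never above the hardware maximum.
    return min(max_threads, alloc_bytes // per_thread_bytes)
```

For example, a 1024-byte allocation with a 96-byte per-thread requirement would throttle a 32-thread execution unit down to 10 threads under these assumptions.
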
Disabling redundant subfunctional units receiving same input value and outputting same output value for the disabled units in SIMD processor

Patent number: 8429380

Abstract: A processor includes a plurality of subfunctional units provided corresponding to respective slots of one or more pieces of operation result data including a plurality of slots for an SIMD operation; and an enable generating unit configured to, in each of the one or more pieces of the operation result data, compare a value of a predetermined slot with a value of a slot other than the predetermined slot, and disable one or more subfunctional units to which the value equal to the value of the predetermined slot is inputted, and the processor outputs the value of the predetermined slot as the value of the one or more subfunctional units which have been disabled.

Type: Grant

Filed: March 12, 2010

Date of Patent: April 23, 2013

Assignee: Kabushiki Kaisha Toshiba

Inventor: Hiroo Hayashi
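
Illustrative sketch (not part of the patent record): a software model of the slot comparison described in the abstract of patent 8429380. It assumes slot 0 is the predetermined slot and counts how many per-slot evaluations actually run; `simd_with_redundancy_gating` and its parameters are hypothetical names.

```python
def simd_with_redundancy_gating(inputs, op, ref_slot=0):
    """Apply op to every slot, skipping slots whose input equals the reference slot's."""
    ref_value = inputs[ref_slot]
    ref_result = op(ref_value)  # the reference slot's unit always runs
    evaluations = 1
    results = []
    for slot, value in enumerate(inputs):
        if slot == ref_slot or value == ref_value:
            # Unit disabled (or the reference itself): reuse the reference output.
            results.append(ref_result)
        else:
            results.append(op(value))
            evaluations += 1
    return results, evaluations
```

With inputs `[3, 3, 5, 3]`, only two of the four modeled units do any work; in hardware the skipped evaluations would correspond to disabled subfunctional units.
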
Storage system and method for controlling the same

Patent number: 8429667

Abstract: Optimum load distribution processing is selected and executed based on settings made by a user in consideration of load changes caused by load distribution in a plurality of asymmetric cores, by using: a controller having a plurality of cores, and configured to extract, for each LU, a pattern showing the relationship between a core having an LU ownership and a candidate core as an LU ownership change destination based on LU ownership management information; to measure, for each LU, the usage of a plurality of resources; to predict, for each LU based on the measurement results, a change in the usage of the plurality of resources and overhead to be generated by transfer processing itself; to select, based on the respective prediction results, a pattern that matches the user's setting information; and to transfer the LU ownership to the core belonging to the selected pattern.

Type: Grant

Filed: February 29, 2012

Date of Patent: April 23, 2013

Assignee: Hitachi, Ltd.

Inventors: Junji Ogawa, Yusuke Nonaka, Yuko Matsui
Selective thread spawning within a multi-threaded processing system

Patent number: 8413151

Abstract: One embodiment of the present invention sets forth a technique for selectively spawning threads within a multiprocessing system. A computation work distributor (CWD), within the system, is responsible for performing the detailed work needed to spawn a thread grid. A request to the CWD to spawn a thread grid includes a predicate table, which includes an array of flags used to indicate which thread indices should have an associated thread block spawned and which should not. Greater efficiency is achieved by only spawning thread blocks that should perform useful computation.

Type: Grant

Filed: December 19, 2007

Date of Patent: April 2, 2013

Assignee: NVIDIA Corporation

Inventors: John A. Stratton, David Luebke
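
Illustrative sketch (not part of the patent record): the predicate table described in the abstract of patent 8413151 can be modeled as a list of flags, one per thread block index. The names `spawn_thread_grid` and `launch_block` are hypothetical.

```python
def spawn_thread_grid(grid_size, predicate_table, launch_block):
    """Launch only the thread blocks whose predicate flag is set."""
    if len(predicate_table) != grid_size:
        raise ValueError("predicate table must have one flag per block index")
    spawned = []
    for block_index, flag in enumerate(predicate_table):
        if flag:
            launch_block(block_index)  # blocks with a cleared flag are never spawned
            spawned.append(block_index)
    return spawned
```
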
Method and system for decoding low density parity check codes

Patent number: 8392789

Abstract: A method for decoding a codeword in a data stream encoded according to a low density parity check (LDPC) code having an m×j parity check matrix H by initializing variable nodes with soft values based on symbols in the codeword, wherein a graph representation of H includes m check nodes and j variable nodes, and wherein a check node m provides a row value estimate to a variable node j and a variable node j provides a column value estimate to a check node m if H(m,j) contains a 1, computing row value estimates for each check node, wherein amplitudes of only a subset of column value estimates provided to the check node are computed, computing soft values for each variable node based on the computed row value estimates, determining whether the codeword is decoded based on the soft values, and terminating decoding when the codeword is decoded.

Type: Grant

Filed: July 28, 2009

Date of Patent: March 5, 2013

Assignee: Texas Instruments Incorporated

Inventors: Eric Biscondi, David Hoyle, Tod David Wolf
Optimizing scalar code executed on a SIMD engine by alignment of SIMD slots

Patent number: 8370817

Abstract: A mechanism is provided for optimizing scalar code executed on a single instruction multiple data (SIMD) engine by aligning the slots of SIMD registers. With the mechanism, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated.

Type: Grant

Filed: May 27, 2008

Date of Patent: February 5, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien
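
Illustrative sketch (not part of the patent record): the alignment propagation described in the abstract of patent 8370817, modeled on a tiny expression tree. The node classes are hypothetical, input trees are assumed to contain only `Leaf` and `BinOp` nodes, and shifting the right operand to the left operand's slot is an arbitrary choice made here.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    name: str
    slot: int  # SIMD slot in which the scalar value sits


@dataclass
class Shift:
    operand: object
    from_slot: int
    to_slot: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def align(node):
    """Return (rewritten tree, slot of its result), inserting shifts on mismatch."""
    if isinstance(node, Leaf):
        return node, node.slot
    left, left_slot = align(node.left)
    right, right_slot = align(node.right)
    if left_slot != right_slot:
        # Alignments differ: shift one operand (here, the right one).
        right = Shift(right, right_slot, left_slot)
    # Alignments now agree, so the result inherits the shared slot.
    return BinOp(node.op, left, right), left_slot
```

For `a + (b * c)` with `a` in slot 0 and `b`, `c` in slot 2, the inner product needs no shift, and a single shift from slot 2 to slot 0 is inserted above it.
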
Method and system for local memory addressing in single instruction, multiple data computer system

Patent number: 8341328

Abstract: A single instruction, multiple data (“SIMD”) computer system includes a central control unit coupled to 256 processing elements (“PEs”) and to 32 static random access memory (“SRAM”) devices. Each group of eight PEs can access respective groups of eight columns in a respective SRAM device. Each PE includes a local column address register that can be loaded through a data bus of the respective PE. A local column address stored in the local column address register is applied to an AND gate, which selects either the local column address or a column address applied to the AND gate by the central control unit. As a result, the central control unit can globally access the SRAM device, or a specific one of the eight columns that can be accessed by each PE can be selected locally by the PE.

Type: Grant

Filed: September 27, 2010

Date of Patent: December 25, 2012

Assignee: Micron Technology, Inc.

Inventor: Jon Skull
AUTOMATIC KERNEL MIGRATION FOR HETEROGENEOUS CORES

Publication number: 20120297163

Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.

Type: Application

Filed: May 16, 2011

Publication date: November 22, 2012

Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
Computer architecture and method of operation for multi-computer distributed processing having redundant array of independent systems with replicated memory and code striping

Patent number: 8316190

Abstract: Computers and other computing machines and information appliances having a modified computer architecture and program structure which enables the operation of an application program concurrently or simultaneously on a plurality of computers interconnected via a communications link or network using a special distributed runtime (DRT), and that provides for a redundant array of independent computing systems that include computer code distribution using code-striping onto the plurality of the computers or computing machines. A redundant array of independent computing systems operating in concert and code-striping features.

Type: Grant

Filed: March 19, 2008

Date of Patent: November 20, 2012

Assignee: Waratek Pty. Ltd.

Inventor: John M. Holt
DATA PROCESSING DEVICE AND DATA PROCESSING METHOD THEREOF

Publication number: 20120265964

Abstract: Disclosed is a data processing device capable of efficiently performing an arithmetic process on variable-length data and an arithmetic process on fixed-length data. The data processing device includes first PEs of SIMD type, SRAMs provided respectively for the first PEs, and second PEs. The first PEs each perform an arithmetic operation on data stored in a corresponding one of the SRAMs. The second PEs each perform an arithmetic operation on data stored in corresponding ones of the SRAMs. Therefore, the SRAMs can be shared so as to efficiently perform the arithmetic process on variable-length data and the arithmetic process on fixed-length data.

Type: Application

Filed: February 3, 2012

Publication date: October 18, 2012

Inventors: Kan Murata, Hideyuki Noda, Masaru Haraguchi
Synchronisation of execution threads on a multi-threaded processor

Patent number: 8286180

Abstract: Method and apparatus are provided for synchronizing execution of a plurality of threads on a multi-threaded processor. Each thread is provided with a number of synchronization points corresponding to points where it is advantageous or preferable that execution should be synchronized with another thread. Execution of a thread is paused when it reaches a synchronization point until at least one other thread with which it is intended to be synchronized reaches a corresponding synchronization point. Execution is subsequently resumed. Where an executing thread branches over a section of code which includes a synchronization point then execution is paused at the end of the branch until the at least one other thread reaches the synchronization point at the end of the corresponding branch.

Type: Grant

Filed: August 24, 2007

Date of Patent: October 9, 2012

Assignee: Imagination Technologies Limited

Inventor: Yoong Chert Foo
Vectorizing Combinations of Program Operations

Publication number: 20120254845

Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor.

Type: Application

Filed: March 30, 2011

Publication date: October 4, 2012

Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
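
Illustrative sketch (not part of the patent record): the transformation described in the abstract of publication 20120254845 resembles composing per-element functions so that one pass over the array replaces several. The helper `fuse` is a hypothetical name, and plain Python lists stand in for the array inputs and outputs.

```python
def fuse(*elementwise_fns):
    """Compose per-element functions into a single pass over the input array."""
    def fused(xs):
        out = []
        for x in xs:
            # Apply the whole chain to one element before moving to the next,
            # so no intermediate array is materialized.
            for fn in elementwise_fns:
                x = fn(x)
            out.append(x)
        return out
    return fused
```

A single fused loop of this shape is what a vectorizer would then map onto SIMD units, instead of vectorizing each portion separately and storing intermediate arrays.
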
METHOD AND APPARATUS FOR FAST BRANCH-FREE VECTOR DIVISION COMPUTATION

Publication number: 20120254585

Abstract: Methods and apparatus for double precision division/inversion vector computations on Single Instruction Multiple Data (SIMD) computing platforms are described. In one embodiment, an input argument is represented by an exponent portion and a fraction portion. These portions are scaled, inverted, and multiplied to generate an inverse version of the input argument. In an embodiment, the inversion of the exponent portion may be done by changing the sign of the exponent. Other embodiments are also described.

Type: Application

Filed: December 25, 2009

Publication date: October 4, 2012

Inventors: Andrey Kolesov, Valery Kuriakin, Maria Guseva
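
Illustrative sketch (not part of the patent record): one way to invert a value by treating its exponent and fraction separately, in the spirit of the abstract of publication 20120254585. The Newton-Raphson refinement and its linear seed are standard techniques swapped in here, not taken from the publication, and the input is assumed to be finite and nonzero.

```python
import math


def reciprocal(x, iterations=5):
    """Approximate 1/x for finite nonzero x by inverting fraction and exponent separately."""
    frac, exp = math.frexp(x)  # x == frac * 2**exp with 0.5 <= |frac| < 1
    sign = math.copysign(1.0, frac)
    m = abs(frac)
    # Newton-Raphson for 1/m; this linear seed is a common choice for m in [0.5, 1).
    y = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):  # fixed trip count, no data-dependent branch
        y = y * (2.0 - m * y)
    # Inverting 2**exp only requires changing the sign of the exponent.
    return sign * math.ldexp(y, -exp)
```

Because every element follows the same instruction sequence, a loop of this kind is a natural candidate for applying across all lanes of a vector at once.
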
SIMD merge-sort and duplicate removal operations for data arrays

Patent number: 8261043

Abstract: A method and apparatus are provided to perform efficient merging operations of two or more streams of data by using SIMD instructions. Streams of data are merged together in parallel and with mitigated or removed conditional branching. The merge operations of the streams of data include Merge AND and Merge OR operations.

Type: Grant

Filed: May 12, 2009

Date of Patent: September 4, 2012

Assignee: SAP AG

Inventors: Hiroshi Inoue, Moriyoshi Ohara, Hideaki Komatsu
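
Illustrative sketch (not part of the patent record): the abstract of patent 8261043 does not define Merge AND and Merge OR, so this scalar model assumes they mean the intersection and the duplicate-free union of two sorted, duplicate-free streams. Cursor advances are computed from comparison results rather than from if/else branches; the function names are hypothetical.

```python
def merge_and(a, b):
    """Intersection of two sorted, duplicate-free lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        out.extend([x] * (x == y))  # emit only when both streams agree
        i += x <= y                 # comparison results drive the cursors
        j += y <= x
    return out


def merge_or(a, b):
    """Duplicate-free union of two sorted, duplicate-free lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        out.append(min(x, y))       # equal heads are emitted once
        i += x <= y
        j += y <= x
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```
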
Methods and apparatus for analyzing SIMD code

Patent number: 8255886

Abstract: A method for analyzing and presenting in a graphical manner single instruction, multiple data (SIMD) instructions involves disassembling a stream of machine instructions into a stream of assembly language instructions. Instruction objects “M” and “N” are created to represent SIMD instructions “M” and “N” from the stream of instructions. Instruction objects “M” and “N” include multiple data objects corresponding to the multiple data items of the respective SIMD instruction. Different colors are assigned to at least two of the multiple data objects of instruction object “M.” If a data item of SIMD instruction “N” is based on a data item of SIMD instruction “M,” the color from the source object is automatically assigned to the target object. Dependencies between data items of instruction “M” and “N” are annotated by arrows between corresponding data objects. Other embodiments are described and claimed.

Type: Grant

Filed: June 30, 2008

Date of Patent: August 28, 2012

Assignee: Intel Corporation

Inventor: Peter Lachner
ENABLING VIRTUAL CALLS IN A SIMD ENVIRONMENT

Publication number: 20120210098

Abstract: Systems and methods of enabling virtual calls in a single instruction multiple data (SIMD) environment may involve detecting a virtual call of a function and using a single dispatch of the function to invoke the virtual call for two or more channels of the virtual call. In one example, it is determined that the two or more channels share a common target address and a single dispatch of the function is conducted with respect to the common target address. The process may be iterated for additional channels of the virtual call that share a common target address.

Type: Application

Filed: February 16, 2011

Publication date: August 16, 2012

Inventors: Wei-Yu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
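
Illustrative sketch (not part of the patent record): the grouping described in the abstract of publication 20120210098, modeled with Python callables standing in for target addresses. Each target is assumed to accept a list of per-channel arguments and return a list of results; `dispatch_virtual_call` is a hypothetical name.

```python
def dispatch_virtual_call(targets, args):
    """Invoke each distinct target once for all channels that resolve to it."""
    results = [None] * len(targets)
    pending = list(range(len(targets)))
    dispatches = 0
    while pending:
        target = targets[pending[0]]
        # Channels sharing this target address are served by one dispatch.
        group = [ch for ch in pending if targets[ch] is target]
        outputs = target([args[ch] for ch in group])
        for ch, out in zip(group, outputs):
            results[ch] = out
        dispatches += 1
        # Iterate for the remaining channels, which have other targets.
        pending = [ch for ch in pending if targets[ch] is not target]
    return results, dispatches
```

Four channels that resolve to only two distinct functions would take two dispatches here rather than four.
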
Vector completion mask handling

Patent number: 8239659

Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.

Type: Grant

Filed: September 29, 2006

Date of Patent: August 7, 2012

Assignee: Intel Corporation

Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton
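
Illustrative sketch (not part of the patent record): a software model of the mask handling described in the abstract of patent 8239659. A set mask entry means the operand still needs handling; `fetch` is a hypothetical callback that returns `None` when an operand cannot be retrieved on the current pass, and the pass limit is an added safeguard.

```python
def execute_with_completion_mask(operands, fetch, op, max_passes=10):
    """Retry an operation until every operand's mask field has been cleared."""
    mask = [True] * len(operands)  # set = operand not yet handled
    results = [None] * len(operands)
    passes = 0
    while any(mask):
        if passes == max_passes:
            raise RuntimeError("operands still pending after max_passes")
        passes += 1
        for i, operand in enumerate(operands):
            if not mask[i]:
                continue  # already handled on an earlier pass
            value = fetch(operand)
            if value is not None:
                results[i] = op(value)
                mask[i] = False  # clear the field on success
    return results, passes
```

Only operands whose mask field is still set are touched on a re-processing pass, so completed work is not repeated.
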
General distributed reduction for data parallel computing

Patent number: 8239847

Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.

Type: Grant

Filed: March 18, 2009

Date of Patent: August 7, 2012

Assignee: Microsoft Corporation

Inventors: Yuan Yu, Pradeep Kumar Gunda, Michael A Isard
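
Illustrative sketch (not part of the patent record): pushing part of a reduction into the map stage, as the abstract of patent 8239847 describes, can be modeled as a per-partition partial reduction followed by a final combine. It assumes `combine` is associative and commutative; all names are hypothetical.

```python
def partial_reduce(partition, key_fn, value_fn, combine):
    """Map-side step: reduce one partition locally before any data is exchanged."""
    acc = {}
    for record in partition:
        k, v = key_fn(record), value_fn(record)
        acc[k] = combine(acc[k], v) if k in acc else v
    return acc


def final_reduce(partials, combine):
    """Reduce-side step: merge the per-partition partial results."""
    total = {}
    for part in partials:
        for k, v in part.items():
            total[k] = combine(total[k], v) if k in total else v
    return total
```

A word count over two partitions then exchanges one partial count per distinct word per partition instead of one record per word occurrence.
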
Processing data of a file using multiple threads during a deduplication gathering phase

Patent number: 8234250

Abstract: A method and apparatus for deduplication of files of a storage system is described. During a gathering phase, a file may be simultaneously processed by two or more threads to produce and store content identifiers for data blocks of the file. Each file may be sub-divided into multiple file sub-portions, each file sub-portion comprising a predetermined number of data blocks. A thread may be assigned to each sub-portion of a file for processing the data blocks. The currently assigned sub-portion for each thread may be recorded and used upon a system crash to restart each scanner thread at the currently assigned sub-portion to minimize the data blocks that are re-processed. The size of a file sub-portion may be predetermined based on the organization of inode data structures representing the files (e.g., based on the maximum number of pointers that an indirect block in the inode data structure may contain).

Type: Grant

Filed: September 17, 2009

Date of Patent: July 31, 2012

Assignee: NetApp, Inc.

Inventors: Alok Sharma, Praveen Killamsetti, Bipul Raj
Method and apparatus for shuffling data

Patent number: 8225075

Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is set.

Type: Grant

Filed: October 8, 2010

Date of Patent: July 17, 2012

Assignee: Intel Corporation

Inventors: William W. Macy, Jr., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
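
Illustrative sketch (not part of the patent record): a model of the shuffle in patent 8225075, reading the abstract as "copy the designated element unless the flush-to-zero field is set, in which case write zero". The control layout used here (top bit of a byte as the flush flag, low bits as the source index) is an assumption modeled on the x86 PSHUFB instruction, and L is assumed to be a power of two.

```python
FLUSH_TO_ZERO = 0x80  # assumed position of the flush-to-zero field


def shuffle(data, controls):
    """Build a result of L elements selected, or zeroed, by L control elements."""
    if len(data) != len(controls):
        raise ValueError("need one control element per data element")
    if len(data) == 0 or len(data) & (len(data) - 1):
        raise ValueError("this sketch assumes L is a power of two")
    index_mask = len(data) - 1
    result = []
    for ctrl in controls:
        if ctrl & FLUSH_TO_ZERO:
            result.append(0)  # flush-to-zero field set
        else:
            result.append(data[ctrl & index_mask])  # copy the designated element
    return result
```
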
SIMD type microprocessor having processing elements that have plural determining units

Patent number: 8219783

Abstract: An SIMD type microprocessor is disclosed. The SIMD type microprocessor includes plural PEs (processor elements) each of which provides an ALU (arithmetic and logic unit) for lower-order bits, an ALU for upper-order bits, a control circuit for lower-order bits, a control circuit for upper-order bits, a range determining circuit for lower-order bits, and a range determining circuit for upper-order bits. The SIMD type microprocessor further includes a global processor, a range designation bus for lower-order bits which connects the global processor to the range determining circuit for lower-order bits, and a range designation bus for upper-order bits which connects the global processor to the range determining circuit for upper-order bits.

Type: Grant

Filed: July 3, 2008

Date of Patent: July 10, 2012

Assignee: Ricoh Company, Ltd.

Inventor: Kazuhiko Hara
System and method for speculative thread assist in a heterogeneous processing environment

Patent number: 8214808

Abstract: A system and method for speculative assistance to a thread in a heterogeneous processing environment is provided. A first set of instructions is identified in a source code representation (e.g., a source code file) that is suitable for speculative execution. The identified set of instructions is analyzed to determine the processing requirements. Based on the analysis, a processor type is identified that will be used to execute the identified first set of instructions. The processor type is selected from more than one processor type included in the heterogeneous processing environment. The heterogeneous processing environment includes more than one heterogeneous processing core in a single silicon substrate. The various processing cores can utilize different instruction set architectures (ISAs). An object code representation is then generated for the identified first set of instructions with the object code representation being adapted to execute on the determined type of processor.

Type: Grant

Filed: May 7, 2007

Date of Patent: July 3, 2012

Assignee: International Business Machines Corporation

Inventors: Michael Norman Day, Michael Karl Gschwind, John Kevin Patrick O'Brien, Kathryn O'Brien
COMPUTING APPARATUS AND METHOD BASED ON A RECONFIGURABLE SINGLE INSTRUCTION MULTIPLE DATA (SIMD) ARCHITECTURE

Publication number: 20120166762

Abstract: Provided are a computing apparatus and method based on SIMD architecture capable of supporting various SIMD widths without wasting resources. The computing apparatus includes a plurality of configurable execution cores (CECs) that have a plurality of execution modes, and a controller for detecting a loop region from a program, determining a Single Instruction Multiple Data (SIMD) width for the detected loop region, and determining an execution mode of the processor according to the determined SIMD width.

Type: Application

Filed: July 8, 2011

Publication date: June 28, 2012

Inventors: Jae Un Park, Suk-Jin Kim, Scott Mahlke, Yong-Jun Park
Method and apparatus for scheduling the issue of instructions in a microprocessor using multiple phases of execution

Publication number: 20120159120

Abstract: A microprocessor configured to execute programs divided into discrete phases, comprising: a scheduler for scheduling program instructions to be executed on the processor; a plurality of resources for executing programming instructions issued by the scheduler; wherein the scheduler is configured to schedule each phase of the program only after receiving an indication that execution of the preceding phase of the program has been completed. By splitting programs into multiple phases and providing a scheduler that is able to determine whether execution of a phase has been completed, each phase can be separately scheduled and the results of preceding phases can be used to inform the scheduling of subsequent phases.

Type: Application

Filed: May 19, 2011

Publication date: June 21, 2012

Inventor: Yoong Chert Foo
ENHANCING PERFORMANCE BY INSTRUCTION INTERLEAVING AND/OR CONCURRENT PROCESSING OF MULTIPLE BUFFERS

Publication number: 20120151183

Abstract: An embodiment may include circuitry to execute, at least in part, a first list of instructions and/or to concurrently process, at least in part, first and second buffers. The execution of the first list of instructions may result, at least in part, from invocation of a first function call. The first list of instructions may include at least one portion of a second list of instructions interleaved, at least in part, with at least one other portion of a third list of instructions. The portions may be concurrently carried out, at least in part, by one or more sets of execution units of the circuitry. The second and third lists of instructions may implement, at least in part, respective algorithms that are amenable to being invoked by separate respective function calls. The concurrent processing may involve, at least in part, complementary algorithms.

Type: Application

Filed: December 8, 2010

Publication date: June 14, 2012

Inventors: James D. Guilford, Wajdi K. Feghali, Vinodh Gopal, Gilbert M. Wolrich, Erdinc Ozturk, Martin G. Dixon, Deniz Karakoyunlu, Kahraman D. Akdemir
Data Driven Micro-Scheduling of the Individual Processing Elements of a Wide Vector SIMD Processing Unit

Publication number: 20120151145

Abstract: A method for optimizing processing in a SIMD core. The method comprises processing units of data within a working domain, wherein the processing includes one or more working items executing in parallel within a persistent thread. The method further comprises retrieving a unit of data from within a working domain, processing the unit of data, retrieving other units of data when processing of the unit of data has finished, processing the other units of data, and terminating the execution of the working items when processing of the working domain has finished.

Type: Application

Filed: December 13, 2010

Publication date: June 14, 2012

Applicant: Advanced Micro Devices, Inc.

Inventor: Alexander M. Lyashevsky
Performing Function Calls Using Single Instruction Multiple Data (SIMD) Registers

Publication number: 20120151182

Abstract: In one embodiment, a processor can perform a function call from a main program to a function that is to operate on at least one vector-type operand, in which only scalar values are passed to the function, and input values to the function including the at least one vector-type operand are to be renamed from virtual registers identified in the function to physical registers of a vector register file, and output values from the function including the at least one vector-type operand are to be renamed from virtual registers identified in the function to physical registers of the vector register file. Other embodiments are described and claimed.

Type: Application

Filed: December 9, 2010

Publication date: June 14, 2012

Inventor: Tomasz Madajczak
Load/move duplicate instructions for a processor

Patent number: 8200941

Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.

Type: Grant

Filed: April 15, 2011

Date of Patent: June 12, 2012

Assignee: Intel Corporation

Inventor: Patrice Roussel
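
Illustrative sketch (not part of the patent record): the load/move-and-duplicate behavior in the abstract of patent 8200941, modeled on Python integers. The default 64-bit portion (giving a 128-bit result) is an assumption chosen for the example; `move_duplicate` is a hypothetical name.

```python
def move_duplicate(source, portion_bits=64):
    """Copy the low portion of source and duplicate it into the next portion."""
    low = source & ((1 << portion_bits) - 1)  # first portion of bits of the source
    return (low << portion_bits) | low        # same bits in both destination portions
```
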
Distributed memory type information processing system

Patent number: 8200913

Abstract: An information processing system includes a plurality of PMMs and data transmission paths for connecting the PMMs and transmitting a value of one PMM to another PMM. A memory of each PMM holds a list of values of first items arranged in ascending or descending order without overlap and/or a list of values of the second item to be shared. A memory module of each PMM transmits a value contained in the value list to another PMM, receives a value contained in the value list from the other PMM, references the value list of the first item and the value list of the second item of the other PMM, and generates a list of common values considering the values contained in the value lists of the first item and the second item of all the other PMMs.

Type: Grant

Filed: January 25, 2005

Date of Patent: June 12, 2012

Assignee: Turbo Data Laboratories, Inc.

Inventor: Shinji Furusho
Apparatus and method for performing re-arrangement operations on data

Patent number: 8200948

Abstract: An apparatus and method are provided for performing re-arrangement operations on data. The data processing apparatus has a register data store with a plurality of registers for storing data, and processing logic for performing a sequence of operations on data including at least one re-arrangement operation. The processing logic has scalar processing logic for performing scalar operations and SIMD processing logic for performing SIMD operations. The SIMD processing logic is responsive to a re-arrangement instruction specifying a family of re-arrangement operations to perform a selected re-arrangement operation from that family on a plurality of data elements constituted by data in one or more registers identified by the re-arrangement instruction. The selected re-arrangement operation is dependent on at least one parameter provided by the scalar processing logic, that parameter identifying a data element width for the data elements on which the selected re-arrangement operation is performed.

Type: Grant

Filed: December 4, 2007

Date of Patent: June 12, 2012

Assignee: ARM Limited

Inventors: Daniel Kershaw, Dominic Hugo Symes, Alastair Reid
Reduction operations in a synchronous parallel thread processing system with disabled execution threads

Patent number: 8200940

Abstract: A system and method for successfully performing reduction operations in a multi-threaded SIMD (single-instruction multiple-data) system while one or more threads are disabled allows for the reduction operations to be performed without a performance penalty compared with performing the same operation with all of the threads enabled. The source data for each intermediate computation of the reduction operation is remapped by a configurable crossbar as needed to avoid using invalid data from the disabled threads. The remapping function is transparent to the user and enables correct execution of order invariant reduction operations and order dependent prefix-reduction operations.

Type: Grant

Filed: June 30, 2008

Date of Patent: June 12, 2012

Assignee: NVIDIA Corporation

Inventor: John Erik Lindholm
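
The remapping idea in the abstract above can be illustrated in software. This is a minimal sketch under stated assumptions, not the patented hardware: the function name `reduce_active` and the list-based "lanes" are hypothetical, and a simple gather of enabled lanes stands in for the configurable crossbar. Pairing adjacent survivors keeps the original lane order, which matters for order-dependent operations.

```python
def reduce_active(values, active, op):
    """Tree-reduce only the lanes whose thread is enabled.

    values -- one value per thread (lane)
    active -- one bool per thread; False marks a disabled thread
    op     -- associative binary operation, e.g. addition
    """
    # Stand-in for the crossbar: route only valid data into the reduction.
    lanes = [v for v, enabled in zip(values, active) if enabled]
    if not lanes:
        raise ValueError("no enabled threads to reduce")

    # Each pass combines adjacent pairs, preserving left-to-right order.
    while len(lanes) > 1:
        combined = [op(lanes[i], lanes[i + 1]) for i in range(0, len(lanes) - 1, 2)]
        if len(lanes) % 2:
            combined.append(lanes[-1])  # odd lane carries over unchanged
        lanes = combined
    return lanes[0]
```

With eight lanes and two of them disabled, the number of passes is the same as it would be for six enabled lanes, and no value from a disabled lane ever enters the result.
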
Packing two packed signed data in registers with saturation

Patent number: 8190867

Abstract: A processor comprising a register file, and a decoder to decode an instruction to specify a first source register having a first packed signed 16-bit integers, and to specify a second source register having a second packed signed 16-bit integers. A functional unit to generate a result to be stored in a specified destination. The result including a third packed 8-bit integers including an integer for each integer in the first packed integers, and an integer for each integer in the second packed integers. The integers corresponding to the first packed integers next to one another in the result. The integers corresponding to the second packed integers next to one another. A highest order integer of the result corresponding to a highest order integer of the first packed integers. A lowest order integer of the result corresponding to a lowest order integer of the second packed integers.

Type: Grant

Filed: May 16, 2011

Date of Patent: May 29, 2012

Assignee: Intel Corporation

Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
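
The packing behavior described in the abstract above can be sketched as follows. This is an illustrative model only, assuming Python lists in which index 0 is the lowest-order element; the names `saturate_int8` and `pack_signed_saturate` are hypothetical and do not come from the patent. Following the abstract, the elements derived from the first source occupy the high-order half of the result and those from the second source occupy the low-order half.

```python
def saturate_int8(value):
    """Clamp a signed integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, value))


def pack_signed_saturate(first, second):
    """Pack two vectors of signed 16-bit integers into one vector of
    signed 8-bit integers with saturation.

    Index 0 is the lowest-order element. Elements from `second` fill the
    low-order half and elements from `first` fill the high-order half, so
    each source's elements stay next to one another in the result.
    """
    low_half = [saturate_int8(v) for v in second]
    high_half = [saturate_int8(v) for v in first]
    return low_half + high_half
```

Values outside the 8-bit range are clamped rather than wrapped, so 300 becomes 127 and -200 becomes -128.
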
System of lanes of processing units receiving instructions via shared memory units for data-parallel or task-parallel operations

Patent number: 8180998

Abstract: A system for performing data-parallel operations and task-parallel operations. A first switch fabric node (SFN) includes first and second lane processing engines (LPEs). The first LPE includes a first set of lane processing units (LPUs) configured to perform data-parallel operations, where each LPU performs a set of operations, and each LPU within the first set of LPUs uses a different set of data for the set of operations. The second LPE includes a second set of LPUs configured to perform task-parallel operations, where each LPU performs a different set of operations. A processing control engine (PCE) is configured to distribute instructions and data to the first LPE and the second LPE. Advantageously, data-parallel operations and task-parallel operations are able to be performed on the same processor simultaneously.

Type: Grant

Filed: September 10, 2008

Date of Patent: May 15, 2012

Assignee: NVIDIA Corporation

Inventors: Monier Maher, Christopher Lamb, Sanjay J. Patel, Peter Hsu
Memory controller

Patent number: 8176290

Abstract: A memory controller, on receiving a write request to write write-data into an address of a second memory region issued by a processor, determines whether read-data requested to be read from an address of a first memory region by the processor is matched with the write-data requested to be written into the address of the second memory region, and if the read-data is matched with the write-data, prevents the write-data from being written into the address of the second memory region.

Type: Grant

Filed: June 11, 2009

Date of Patent: May 8, 2012

Assignee: Kabushiki Kaisha Toshiba

Inventors: Takahisa Wada, Katsuyuki Kimura, Shunichi Ishiwata, Takashi Miyamori, Ryuji Hada, Keiri Nakanishi, Yasuki Tanabe, Masato Sumiyoshi
Data processing apparatus comprising an array controller for separating an instruction stream processing instructions and data transfer instructions

Patent number: 8171263

Abstract: A parallel data processing apparatus using a SIMD array of processing elements is disclosed. The apparatus makes use of a register in order to control issuance of instructions to the processing elements in the array.

Type: Grant

Filed: June 29, 2007

Date of Patent: May 1, 2012

Assignee: Rambus Inc.

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Efficient code generation using loop peeling for SIMD loop code with multile misaligned statements

Patent number: 8171464

Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.

Type: Grant

Filed: May 16, 2008

Date of Patent: May 1, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
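
The loop-peeling step mentioned in the abstract above can be pictured as splitting an iteration range into a prologue, an aligned steady-state body, and an epilogue. The sketch below shows only that partition, assuming a vector width measured in elements and alignment at multiples of that width; the function name `peel_loop` is hypothetical, and the vector-splicing instructions the abstract applies to the peeled iterations are not modeled.

```python
def peel_loop(start, end, width):
    """Split the iteration range [start, end) around aligned boundaries.

    Returns three ranges:
      prologue -- scalar iterations before the first aligned index
      body     -- starting indices of full, aligned vectors of `width` elements
      epilogue -- scalar iterations left over after the last full vector
    """
    # Round start up to the next multiple of width (capped at end).
    first_aligned = min(end, -(-start // width) * width)
    # Round end down to a multiple of width (never before first_aligned).
    last_aligned = max(first_aligned, (end // width) * width)

    prologue = range(start, first_aligned)
    body = range(first_aligned, last_aligned, width)
    epilogue = range(last_aligned, end)
    return prologue, body, epilogue
```

For the range 3 to 22 with a width of 4, one iteration is peeled at the front, four full vectors run in the steady state, and two iterations remain at the end.
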
Launching a secure kernel in a multiprocessor system

Patent number: 8161280

Abstract: In one embodiment of the present invention, a method includes verifying a master processor of a system; validating a trusted agent with the master processor if the master processor is verified; and launching the trusted agent on a plurality of processors of the system if the trusted agent is validated. After execution of such a trusted agent, a secure kernel may then be launched, in certain embodiments. The system may be a multiprocessor server system having a partially or fully connected topology with arbitrary point-to-point interconnects, for example.

Type: Grant

Filed: June 29, 2010

Date of Patent: April 17, 2012

Assignee: Intel Corporation

Inventors: John H. Wilson, Ioannis T. Schoinas, Mazin S. Yousif, Linda J. Rankin, David W. Grawrock, Robert J. Greiner, James A. Sutton, Kushagra Vaid, Willard M. Wiseman
Store misaligned vector with permute

Patent number: 8161271

Abstract: Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized.

Type: Grant

Filed: July 11, 2007

Date of Patent: April 17, 2012

Assignee: International Business Machines Corporation

Inventors: David Arnold Luick, Eric Oliver Mejdrich, Adam James Muff
Methods and apparatus for scalable array processor interrupt detection and response

Patent number: 8161267

Abstract: Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processor containing multiple processing elements and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE dependent upon the local PE instruction sequence prior to the interrupt. Processing element exception interrupts are supported and low latency interrupt processing is also provided for embedded systems where real time signal processing is required. Further, a hierarchical interrupt structure is used, allowing a generalized debug approach using debug interrupts and a dynamic debug monitor mechanism.

Type: Grant

Filed: November 30, 2010

Date of Patent: April 17, 2012

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Patrick R. Marchand, Gerald George Pechanek, Larry D. Larsen
System and method for selecting and executing an optimal load distribution processing in a storage system

Patent number: 8146092

Abstract: A controller having a plurality of cores extracts, for each logical unit (LU), a pattern showing the relationship between a core having an LU ownership and a candidate core as an LU ownership change destination based on LU ownership management information, measures, for each LU, the usage of a plurality of resources, predicts, for each LU based on the measurement results, a change in the usage of the plurality of resources and overhead to be generated by transfer processing itself, selects, based on the respective prediction results, a pattern that matches the user's setting information, and transfers the LU ownership to the core belonging to the selected pattern.

Type: Grant

Filed: March 2, 2009

Date of Patent: March 27, 2012

Assignee: Hitachi, Ltd.

Inventors: Junji Ogawa, Yusuke Nonaka, Yuko Matsui
Security management in multi-node, multi-processor platforms

Patent number: 8146150

Abstract: Multi-node and multi-processor security management is described in this application. Data may be secured in a TPM of any one of a plurality of nodes, each node including one or more processors. The secured data may be protected using hardware hooks to prevent unauthorized access to the secured information. Security hierarchy may be put in place to protect certain memory addresses from access by requiring permission by VMM, OS, ACM or processor hardware. The presence of secured data may be communicated to each of the nodes to ensure that data is protected. Other embodiments are described.

Type: Grant

Filed: December 31, 2007

Date of Patent: March 27, 2012

Assignee: Intel Corporation

Inventors: Mahesh S. Natu, Sham Datta
SIMD processor performing fractional multiply operation with saturation history data processing to generate condition code flags

Patent number: 8131981

Abstract: A data processing system, apparatus and method for performing fractional multiply operations is disclosed. The system includes a memory that stores instructions for SIMD operations and a processing core. The processing core includes registers that store operands for the fractional multiply operations. A coprocessor included in the processing core performs the fractional multiply operations on the operands and stores the result in a destination register that is also included in the processing core.

Type: Grant

Filed: August 12, 2009

Date of Patent: March 6, 2012

Assignee: Marvell International Ltd.

Inventors: Nigel C. Paver, Bradley C. Aldrich
SIMD array operable to process different respective packet protocols simultaneously while executing a single common instruction stream

Patent number: 8127112

Abstract: A data processing architecture includes an input device that receives an incoming stream of data packets. A plurality of processing elements are operable to process data received from the input device. The input device is operable to distribute data packets in whole or in part to the processing elements in dependence upon the data processing bandwidth of the processing elements.

Type: Grant

Filed: December 10, 2010

Date of Patent: February 28, 2012

Assignee: Rambus Inc.

Inventors: John Rhoades, Ken Cameron, Paul Winser, Ray McConnell, Gordon Faulds, Simon McIntosh-Smith, Anthony Spencer, Jeff Bond, Matthias Dejaegher, Danny Halamish, Gajinder Panesar
SIMD processor for performing data filtering and/or interpolation

Patent number: 8122227

Abstract: A data processing circuit contains an instruction execution circuit that has an instruction set that comprises a SIMD instruction. The instruction execution circuit comprises a plurality of arithmetic circuits, arranged to perform N respective identical operations in parallel in response to the SIMD instruction. The SIMD instruction selects a first one and a second one of the registers. The SIMD instruction defines a first and second series of N respective SIMD instruction operands of the SIMD instruction from the addressed registers. Each arithmetic circuit receives a respective first operand and a respective second operand from the first and second series respectively, when executing the SIMD instruction. The instruction execution circuit is arranged for selecting the first and second series so that they partially overlap. Preferably, the position of the operands of at least one of the series is under program control, preferably under control of operand data.

Type: Grant

Filed: November 2, 2005

Date of Patent: February 21, 2012

Assignee: Silicon Hive B.V.

Inventor: Antonius A. M. Van Wel
Method for efficient generation of a Fletcher checksum using a single SIMD pipeline

Patent number: 8112691

Abstract: The generation of Fletcher/Adler partial checksums is transformed from a space that requires integer multiplications and additions to a space that requires only integer additions and shifts on a single SIMD pipeline capable processor. This transformation permits the use of Fletcher/Adler checksums on processors where the performance of SIMD instructions is sub-optimal, on CMT processors that support a single SIMD pipeline as well as other processors that can be configured by executing software to implement SIMD operations for a single SIMD pipeline. The implementation of the process with this transformation on a general-purpose computer system transforms that general-purpose computer system into a special-purpose computer system that uses a single SIMD pipeline to generate a Fletcher/Adler checksum. The elimination of integer multiplications in the generation of the partial checksums results in a significant improvement in performance.

Type: Grant

Filed: March 25, 2008

Date of Patent: February 7, 2012

Assignee: Oracle America, Inc.

Inventor: Lawrence A. Spracklen
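
To illustrate why a Fletcher checksum can be computed with or without multiplications, the sketch below compares two equivalent formulations of the standard Fletcher-16 checksum. It is not the SIMD transformation claimed in the patent, which the abstract does not spell out; the function names are hypothetical, and the modulus 255 and zero initial sums are the conventional Fletcher-16 choices.

```python
def fletcher16_multiply(data):
    """Fletcher-16 using the closed form with integer multiplications.

    The second sum weights each byte by how many running totals it
    appears in: the first byte n times, the last byte once.
    """
    n = len(data)
    sum1 = sum(data) % 255
    sum2 = sum((n - i) * byte for i, byte in enumerate(data)) % 255
    return (sum2 << 8) | sum1


def fletcher16_additive(data):
    """Fletcher-16 using only additions: a running sum of running sums."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1
```

Both functions return the same value for any byte string; for the input `b"abcde"` the checksum is `0xC8F0`.
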
Parallel data processing systems and methods using cooperative thread arrays with unique thread identifiers as an input to compute an identifier of a location in a shared memory

Patent number: 8112614

Abstract: Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.

Type: Grant

Filed: December 17, 2010

Date of Patent: February 7, 2012

Assignee: Nvidia Corporation

Inventors: John R. Nickolls, Stephen D. Lew
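
The role of the thread ID described in the abstract above can be sketched with a sequential simulation. This is a simplified model under stated assumptions: the name `run_cta` is hypothetical, a plain loop stands in for concurrently executing threads, and a Python list stands in for shared memory. Each simulated thread uses its ID both to select its portion of the input and to compute the shared-memory location where it stores its result.

```python
def run_cta(input_data, num_threads):
    """Simulate a cooperative thread array summing slices of the input.

    Every thread runs the same program; only its thread ID (tid) differs.
    """
    shared = [0] * num_threads                    # stand-in for shared memory
    chunk = -(-len(input_data) // num_threads)    # ceiling division

    for tid in range(num_threads):                # sequential stand-in for parallel threads
        # The thread ID selects which portion of the input this thread processes.
        portion = input_data[tid * chunk:(tid + 1) * chunk]
        # The thread ID is also the shared-memory location for the result.
        shared[tid] = sum(portion)
    return shared
```

Because each thread writes to a distinct location derived from its ID, no two threads in this sketch touch the same shared-memory slot.
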
System and method for data forwarding in a programmable multiple network processor environment

Patent number: RE43825

Abstract: A system and method forward data between processing elements. A first processing element includes an address register that stores a first memory address. A forwarding storage element is coupled to the first processing element. A second processing element, coupled to the forwarding storage element, transmits a second memory address to the forwarding storage element. The forwarding storage transmits the second memory address to the first processing element, and the first processing element compares the second memory address with the first memory address.

Type: Grant

Filed: November 19, 2007

Date of Patent: November 20, 2012

Assignee: The United States of America as Represented by the Secretary of the Navy

Inventors: Joel Zvi Apisdorf, Sam Brandon Sandbote, Michael Daniel Poole
