Single Instruction, Multiple Data (simd) Patents (Class 712/22)

SYSTEM, APPARATUS AND METHOD FOR LOOP REMAINDER MASK INSTRUCTION

Publication number: 20140189296

Abstract: A loop remainder mask instruction indicates a current iteration count of a loop as a first operand, an iteration limit of a loop as a second operand, and a destination. The loop contains iterations and each iteration includes a data element of the array. A processor receives the loop remainder mask instruction, decodes the instruction for execution, and stores a result of the execution in the destination. The result indicates a number of data elements of the array past an end of a preceding portion of the array that are to be handled separately from the preceding portion, the end of the preceding portion being where the current iteration count is recorded.

Type: Application

Filed: December 14, 2011

Publication date: July 3, 2014

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Andrey Naraikin, Suleyman Sair, Asaf Hargil, Miland B. Girkar, Bret T. Toll, Mark J. Charney
SIMD parallel computer system, SIMD parallel computing method, and control program

Patent number: 8769244

Abstract: Uniforming of the processing load is efficiently realized. Each processing element configuring an SIMD parallel computer system includes a data storage module that stores data processed or transferred, a number-of-data-sets storage device that stores number of data sets, and a front data storage device that stores the front data. Each processing element further includes a control processor that compares the number of data sets stored in one processing element with the number of data sets stored in the own processing element, and issues a data distribution leveling instruction that designates an action for updating contents of the data storage module, the number-of-data-sets storage device, and the front data storage device according to a rule determined based on a comparison result of the own processing element and that of the other processing elements and an action for moving the data stored in the one processing element to the own processing element.

Type: Grant

Filed: April 8, 2009

Date of Patent: July 1, 2014

Assignee: Nec Corporation

Inventor: Shorin Kyo
HIGH LEVEL SOFTWARE EXECUTION MASK OVERRIDE

Publication number: 20140181467

Abstract: Methods, and media, and computer systems are provided. The method includes, the media includes control logic for, and the computer system includes a processor with control logic for overriding an execution mask of SIMD hardware to enable at least one of a plurality of lanes of the SIMD hardware. Overriding the execution mask is responsive to a data parallel computation and a diverged control flow of a workgroup.

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Timothy G. Rogers, Bradford M. Beckmann, James M. O'Connor
Memory access consolidation for SIMD processing elements using transaction identifiers

Patent number: 8762691

Abstract: A data processing apparatus includes a plurality of processing elements arranged in a single instruction multiple data array. The apparatus includes an instruction controller operable to receive instructions from a plurality of instructions streams, and to transfer instructions from those instructions streams to the processing elements in the array, such that the data processing apparatus is operable to process a plurality of processing threads substantially in parallel with one another. A data transfer controller is provided which is operable to control transfer of data between the internal memory units associated with the processing elements, and memory external to the array.

Type: Grant

Filed: June 29, 2007

Date of Patent: June 24, 2014

Assignee: Rambus Inc.

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Collective acceleration unit tree structure

Patent number: 8756270

Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.

Type: Grant

Filed: April 24, 2012

Date of Patent: June 17, 2014

Assignee: International Business Machines Corporation

Inventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
Collective acceleration unit tree structure

Patent number: 8751655

Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.

Type: Grant

Filed: March 29, 2010

Date of Patent: June 10, 2014

Assignee: International Business Machines Corporation

Inventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
Processor to execute shift right merge instructions

Patent number: 8745358

Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.

Type: Grant

Filed: September 4, 2012

Date of Patent: June 3, 2014

Assignee: Intel Corporation

Inventors: Julien Sebot, William W. Macy, Eric Debes, Huy V. Nguyen
Low-overhead misalignment and reformatting support for SIMD

Patent number: 8732437

Abstract: Systems and methods for performing single instruction multiple data (SIMD) operations on a data set. The methods may include examining a structure of the data set to determine what reorganization may be necessary to facilitate SIMD processing. The method may include selecting a stored bit mask corresponding to the organization of the data set and loading the bit mask into an application specific register (ASR). Subsequently, the data may be reorganized inline according to the ASR as the data is loaded into the SIMD functional unit such that the SIMD functional unit may operate on the data set. The results of the SIMD operation may be written to a results register.

Type: Grant

Filed: January 26, 2010

Date of Patent: May 20, 2014

Assignee: Oracle America, Inc.

Inventor: Lawrence A. Spracklen
Configurable SIMD engine with high, low and mixed precision modes

Patent number: 8725990

Abstract: A configurable SIMD engine in a video processor for executing video processing operations. The engine includes a SIMD component having a plurality of inputs for receiving input data and a plurality of outputs for providing output data. A plurality of execution units are included in the SIMD component. Each of the execution units comprise a first and a second data path, and are configured for selectively implementing arithmetic operations on a set of low precision or high precision inputs. Each of the execution units have a first configuration and a second configuration, such that the first data path and the second data path are combined to produce a single high precision output in the first configuration, and such that the first data path and the second data path are partitioned to produce a respective first low precision output and second low precision output in the second configuration.

Type: Grant

Filed: November 4, 2005

Date of Patent: May 13, 2014

Assignee: Nvidia Corporation

Inventors: Ashish Karandikar, Pooja Agarwal
Management of conditional branches within a data parallel system

Patent number: 8726252

Abstract: A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements. The processor continues processing next statements inline after all SIMD lanes are complete, while providing speculative and parallel processing capability for multiple data operations of the executable program.

Type: Grant

Filed: January 28, 2011

Date of Patent: May 13, 2014

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
Register files for a digital signal processor operating in an interleaved multi-threaded environment

Patent number: 8713286

Abstract: A processor device is disclosed and includes a memory and a sequencer that is responsive to the memory. The sequencer supports very long instruction word (VLIW) type instructions and at least one VLIW instruction packet uses a number of operands during execution. The processor device further includes a plurality of instruction execution units responsive to the sequencer and a plurality of register files. Each of the plurality of register files includes a plurality of registers and the plurality of register files are coupled to the plurality of instruction execution units. Further, each of the plurality of register files includes a number of data read ports and the number of data read ports of each of the plurality of register files is less than the number of operands used by the at least one VLIW instruction packet.

Type: Grant

Filed: April 26, 2005

Date of Patent: April 29, 2014

Assignee: QUALCOMM Incorporated

Inventors: Muhammad Ahmed, Erich Plondke, Lucian Codrescu, William C. Anderson
Single-instruction multiple-data vector permutation instruction and method for performing table lookups for in-range index values and determining constant values for out-of-range index values

Patent number: 8700884

Abstract: A processor in a data processing system executes a permutation instruction which identifies a first source register, at least one other source register, and a destination register. The first source register stores at least one in-range index value for the at least one other source register and at least one out-of-range index value for the at least one other source register. The at least one other source register stores a plurality of vector element values, wherein each in-range index value indicates which vector element value of the at least one other source register is to be stored into a corresponding vector element of the destination register. Each out-of-range index value is used to indicate which one of at least two predetermined constant values is to be stored into a corresponding vector element of the destination register. Partial table lookups using a permutation instruction shortens the time required to retrieve data.

Type: Grant

Filed: October 12, 2007

Date of Patent: April 15, 2014

Assignee: Freescale Semiconductor, Inc.

Inventors: William C. Moyer, Imran Ahmed, Dan E. Tamir
Method and apparatus for shuffling data

Patent number: 8688959

Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set.

Type: Grant

Filed: September 10, 2012

Date of Patent: April 1, 2014

Assignee: Intel Corporation

Inventors: William W. Macy, Jr., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
Mechanism for conflict detection using SIMD

Patent number: 8688957

Abstract: A system and method are configured to detect conflicts when converting scalar processes to parallel processes (“SIMDifying”). Conflicts may be detected for an unordered single index, an ordered single index and/or ordered pairs of indices. Conflicts may be further detected for read-after-write dependencies. Conflict detection is configured to identify operations (i.e., iterations) in a sequence of iterations that may not be done in parallel.

Type: Grant

Filed: December 21, 2010

Date of Patent: April 1, 2014

Assignee: Intel Corporation

Inventors: Mikhail Smelyanskiy, Yen-Kuang Chen, Daehyun Kim, Christopher J. Hughes, Victor W. Lee
System for data collection from processing elements in a SIMD processor

Patent number: 8688958

Abstract: A processor has a plurality of PEs (processing elements) that operate in parallel based on operation commands and an information collection unit that collects the data of the plurality of PEs, wherein each of the plurality of PEs holds data and a condition flag, supplies the data and the condition flag to the information collection unit upon receiving an operation command, and upon receiving an update request for updating the condition flag, updates the condition flag in accordance with the update request that was received; and the information collection unit, upon receiving the data and the condition flags, selects one PE based on a predetermined order of priority from among the PEs for which the received condition flags are active and both supplies the data of the selected PE as collection result data and supplies an update request for updating the condition flag of the PE that was selected.

Type: Grant

Filed: January 14, 2010

Date of Patent: April 1, 2014

Assignee: NEC Corporation

Inventor: Shohei Nomoto
Data processing apparatus and method for handling vector instructions

Patent number: 8661225

Abstract: A data processing apparatus and method and provided for handling vector instructions. The data processing apparatus has a register data store with a plurality of registers arranged to store data elements. A vector processing unit is then used to execute a sequence of vector instructions, with the vector processing unit having a plurality of lanes of parallel processing and having access to the register data store in order to read data elements from, and write data elements to, the register data store during the execution of the sequence of vector instructions. A skip indication storage maintains a skip indicator for each of the lanes of parallel processing. The vector processing unit is responsive to a vector skip instruction to perform an update operation to set within the skip indication storage the skip indicator for a determined one or more lanes.

Type: Grant

Filed: January 19, 2010

Date of Patent: February 25, 2014

Assignee: ARM Limited

Inventors: Andreas Björklund, Erik Persson, Ola Hugosson
Compiler for providing intrinsic supports for VLIW PAC processors with distributed register files and method thereof

Patent number: 8656376

Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.

Type: Grant

Filed: September 1, 2011

Date of Patent: February 18, 2014

Assignee: National Tsing Hua University

Inventors: Jenq Kuen Lee, Chi Bang Kuan
CIRCUIT AND METHOD FOR SEARCHING A DATA ARRAY AND SINGLE-INSTRUCTION, MULTIPLE-DATA PROCESSING UNIT INCORPORATING THE SAME

Publication number: 20140032879

Abstract: Search circuitry responsive to a single instruction for undertaking a step of a search of a data array for an extreme value therein, a method of searching a data array to identify an extreme value therein and a location thereof and a single-instruction, multiple-data (SIMD) processing unit incorporating the search circuitry or the method. In one embodiment, the search circuitry includes: a comparison element configured to compare two values in the data array, (2) multiplexers coupled to the comparison element and configured to select a more extreme value of the two values and a location in the data array of the more extreme value and (3) an incrementer configured to increment a counter associated with the search.

Type: Application

Filed: July 26, 2012

Publication date: January 30, 2014

Applicant: VeriSilicon Holdings Co., Ltd

Inventor: Stephen E. Jarboe
Packet draining from a scheduling hierarchy in a traffic manager of a network processor

Patent number: 8638805

Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.

Type: Grant

Filed: September 30, 2011

Date of Patent: January 28, 2014

Assignee: LSI Corporation

Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
Packing signed word elements from two source registers to saturated signed byte elements in destination register

Patent number: 8639914

Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.

Type: Grant

Filed: December 29, 2012

Date of Patent: January 28, 2014

Assignee: Intel Corporation

Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
SIMD processor array system and data transfer method thereof

Patent number: 8635432

Abstract: There is provided an SIMD processor array system in which data can be efficiently transferred between processor elements located at different distances. The SIMD processor array system includes a control processor (CP) that is capable of issuing a plurality of instructions at the same time, and a PE array that includes a plurality of mutually-connected processing elements (PEs) to be controlled by the CP. The CP issues an inter-PE data shift instruction to each PE. According to the inter-PE data shift instruction, each PE performs a data sending operation of copying all the contents of a transfer data storing part of an adjoining PE to a transfer data storing part (MBF) of the own PE, and a data fetch operation of copying part or all of the contents of the MBF of the adjoining PE to a transfer data fetch and storing part (RBUF) of the own PE if part of the contents the MBF of the adjoining PE coincide with the contents of an ID storing part (IDB) of the own PE.

Type: Grant

Filed: March 4, 2009

Date of Patent: January 21, 2014

Assignee: NEC Corporation

Inventor: Shorin Kyo
EFFICIENT HARDWARE INSTRUCTIONS FOR SINGLE INSTRUCTION MULTIPLE DATA PROCESSORS

Publication number: 20140013077

Abstract: A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack a fixed-width bit values in a bit stream to a fixed width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.

Type: Application

Filed: September 10, 2013

Publication date: January 9, 2014

Applicant: Oracle International Corporation

Inventors: Amit Ganesh, Shasank K. Chavan, Vineet Marwah, Jesse Kamp, Anindya C. Patthak, Michael J. Gleeson, Allison L. Holloway, Roger Macnicol
EFFICIENT HARDWARE INSTRUCTIONS FOR SINGLE INSTRUCTION MULTIPLE DATA PROCESSORS

Publication number: 20140013078

Abstract: A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack a fixed-width bit values in a bit stream to a fixed width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.

Type: Application

Filed: September 10, 2013

Publication date: January 9, 2014

Applicant: Oracle International Corporation

Inventors: Amit Ganesh, Shasank K. Chavan, Vineet Marwah, Jesse Kamp, Anindya C. Patthak, Michael J. Gleeson, Allison L. Holloway, Roger Macnicol
Retargetting an application program for execution by a general purpose processor

Patent number: 8612732

Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

Type: Grant

Filed: March 19, 2009

Date of Patent: December 17, 2013

Assignee: NVIDIA Corporation

Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Boris Beylin, Jayant B. Kolhe, Douglas Saylor
Computing device, calculating method, and program product

Patent number: 8612507

Abstract: A computing device includes: a deciding unit which, in computation of values of nodes on a lattice in a direction where a value of m representing a horizontal axis coordinate of the lattice increases, decides dummy nodes to be added to m=n?1, so as to enable values of nodes on m=n to be calculated by adding the dummy nodes to m=n?1 and executing a vector operation through the use of the SIMD function by using values of nodes on m=n?1 and values of the added dummy nodes; an adding unit adding the dummy nodes decided by the deciding unit to m=n?1; and a calculating unit calculating the values of the nodes present on m=n by executing the vector operation through the use of the SIMD function by using the values of the nodes on m=n?1 and the values of the dummy nodes added by the adding unit.

Type: Grant

Filed: April 16, 2010

Date of Patent: December 17, 2013

Assignee: NS Solutions Corporation

Inventor: Hiroki Takeshita
Partition-free multi-socket memory system architecture

Patent number: 8605099

Abstract: A technique to increase memory bandwidth for throughput applications. In one embodiment, memory bandwidth can be increased, particularly for throughput applications, without increasing interconnect trace or pin count by pipelining pages between one or more memory storage areas on half cycles of a memory access clock.

Type: Grant

Filed: March 31, 2008

Date of Patent: December 10, 2013

Assignee: Intel Corporation

Inventor: Eric Sprangle
Execution of instruction with element size control bit to interleavingly store half packed data elements of source registers in same size destination register

Patent number: 8601246

Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.

Type: Grant

Filed: June 27, 2002

Date of Patent: December 3, 2013

Assignee: Intel Corporation

Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
MINICORE-BASED RECONFIGURABLE PROCESSOR AND METHOD OF FLEXIBLY PROCESSING MULTIPLE DATA USING THE SAME

Publication number: 20130318324

Abstract: A minicore-based reconfigurable processor and a method of flexibly processing multiple data using the same are provided. The reconfigurable processor includes minicores, each of the minicores including function units configured to perform different operations, respectively. The reconfigurable processor further includes a processing unit configured to activate two or more function units of two or more respective minicores, among the minicores, that are configured to perform an operation of a single instruction multiple data (SIMD) instruction, the processing unit further configured to execute the SIMD instruction using the activated two or more function units.

Type: Application

Filed: February 13, 2013

Publication date: November 28, 2013

Applicant: Samsung Electronics Co., Ltd.

Inventor: Dong-Kwan SUH
Floating point collect and operate

Patent number: 8595467

Abstract: Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a singe stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register.

Type: Grant

Filed: December 29, 2009

Date of Patent: November 26, 2013

Assignee: International Business Machines Corporation

Inventors: Brian K. Flachs, Seiji Maeda, Steven Osman
Integrated circuit device, electronic apparatus and method for manufacturing of electronic apparatus

Patent number: 8587568

Abstract: An integrated circuit device includes a host I/F, an information register, and a control section. The information register stores wave selection information for selecting waveform information which defines a waveform of a drive signal of the electro-optical device. Waveform information selected by the wave selection information stored in the information register from among a plurality of pieces of waveform information is loaded to an information memory at the time of manufacturing an electronic apparatus including the electro-optical device. The control section controls the display of the electro-optical device on the basis of the waveform information read from the information memory at the time of an actual operation of the electronic apparatus.

Type: Grant

Filed: March 24, 2010

Date of Patent: November 19, 2013

Assignee: Seiko Epson Corporation

Inventor: Hideki Ogawa
Dynamic load balancing of instructions for execution by heterogeneous processing engines

Patent number: 8578387

Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.

Type: Grant

Filed: July 31, 2007

Date of Patent: November 5, 2013

Assignee: Nvidia Corporation

Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
Support for non-local returns in parallel thread SIMD engine

Patent number: 8572355

Abstract: One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.

Type: Grant

Filed: September 13, 2010

Date of Patent: October 29, 2013

Assignee: Nvidia Corporation

Inventors: Guillermo Juan Rozas, Brett W. Coon
Residual addition for video software techniques

Patent number: 8560809

Abstract: According to some embodiments, a technique provides for the execution of an instruction that includes receiving residual data of a first image and decoded pixels of a second image, zero-extending a plurality of unsigned data operands of the decoded pixels producing a plurality of unpacked data operands, adding a plurality of signed data operands of the residual data to the plurality of unpacked data operands producing a plurality of signed results; and saturating the plurality of signed results producing a plurality of unsigned results.

Type: Grant

Filed: November 15, 2011

Date of Patent: October 15, 2013

Assignee: Intel Corporation

Inventors: Bradley C. Aldrich, Nigel C. Paver, Murli Ganeshan
Load/move duplicate instructions for a processor

Patent number: 8539202

Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.

Type: Grant

Filed: June 12, 2012

Date of Patent: September 17, 2013

Assignee: Intel Corporation

Inventor: Patrice Roussel
Transposing array data on SIMD multi-core processor architectures

Patent number: 8539201

Abstract: Systems, methods and articles of manufacture are disclosed for transposing array data on a SIMD multi-core processor architecture. A matrix in a SIMD format may be received. The matrix may comprise a SIMD conversion of a matrix M in a conventional data format. A mapping may be defined from each element of the matrix to an element of a SIMD conversion of a transpose of matrix M. A SIMD-transposed matrix T may be generated based on matrix M and the defined mapping. A row-wise algorithm may be applied to T, without modification, to operate on columns of matrix M.

Type: Grant

Filed: November 4, 2009

Date of Patent: September 17, 2013

Assignee: International Business Machines Corporation

Inventors: Jeffrey S. McAllister, Timothy J. Mullins, Nelson Ramirez, Mark A. Bransford
Selectively isolating processor elements into subsets of processor elements

Patent number: 8532288

Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.

Type: Grant

Filed: December 1, 2006

Date of Patent: September 10, 2013

Assignee: International Business Machines Corporation

Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
Three-Dimensional Permute Unit for a Single-Instruction Multiple-Data Processor

Publication number: 20130227249

Abstract: A three-dimensional (3D) permute unit for a single-instruction-multiple-data stacked processor includes a first vector permute subunit and a second vector permute subunit. The first and second vector permute subunits are arranged in different layers of a 3D chip package. The vector permute subunits are each configured to process a portion of at least two input vectors. A first contact sub-field of the first vector permute subunit is configured to connect output ports of a first crossbar of the first vector permute subunit, holding an intermediate result of the first vector permute subunit, to a second contact sub-field of the second vector permute subunit. A first contact sub-field of the second vector permute subunit is configured to connect output ports of a first crossbar of the second vector permute subunit, holding an intermediate result of the second vector permute subunit, to a second contact sub-field of the first vector permute subunit.

Type: Application

Filed: February 25, 2013

Publication date: August 29, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
Interleaving corresponding data elements from part of two source registers to destination register in processor operable to perform saturation

Patent number: 8521994

Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to pack the packed data responsive to a pack instruction received by the decoder. A first packed data element and a second packed data element are received from the first source register. A third packed data element and a fourth packed data element are received from the second source register. The circuit packs packing a portion of each of the packed data elements into a destination register resulting with the portion from second packed data element adjacent to the portion from the first packed data element, and the portion from the fourth packed data element adjacent to the portion from the third packed data element.

Type: Grant

Filed: December 22, 2010

Date of Patent: August 27, 2013

Assignee: Intel Corporation

Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
Vector completion mask handling

Patent number: 8510536

Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.

Type: Grant

Filed: June 28, 2012

Date of Patent: August 13, 2013

Assignee: Intel Corporation

Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton
Single instruction processing of network packets

Patent number: 8493979

Abstract: Executing a single instruction/multiple data (SIMD) instruction of a program to process a vector of data wherein each element of the packet vector corresponds to a different received packet.

Type: Grant

Filed: December 30, 2008

Date of Patent: July 23, 2013

Assignee: Intel Corporation

Inventors: Bryan E. Veal, Travis T. Schluessler
Bi-directional data transfer within a single I/O operation

Patent number: 8495253

Abstract: An article of manufacture, apparatus, and a method for facilitating input/output (I/O) processing for an I/O operation at a host computer system configured for communication with a control unit. The method includes the host computer system obtaining a transport command word (TCW) for an I/O operation having both input and output data. The TCW specifies a location of the output data and a location for storing the input data. The host computer system forwards the I/O operation to the control unit for execution. The host computer system gathers the output data responsive to the location of the output data specified by the TCW, and then forwards the output data to the control unit for use in the execution of the I/O operation. The host computer system receives the input data from the control unit and stores the input data at the location specified by the TCW.

Type: Grant

Filed: March 30, 2011

Date of Patent: July 23, 2013

Assignee: International Business Machines Corporation

Inventors: John R. Flanagan, Daniel F. Casper, Catherine C. Huang, Matthew J. Kalos, Ugochukwu C. Njoku, Dale F. Riedy, Gustav E. Sittmann
PROCESSOR WITH INTER-PROCESSING PATH COMMUNICATION

Publication number: 20130185538

Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage. The instruction stream includes scalar instructions executable by the scalar processor core and vector instructions executable by the vector coprocessor core. The scalar processor core is configured to pass the vector instructions to the vector coprocessor core. The vector coprocessor core configured to process a plurality of data values in parallel while executing each vector instruction passed by the scalar processor core. The vector coprocessor core includes a plurality of processing paths arranged in parallel to process the data values. Each of the processing paths includes an execution unit. Each of the execution units is configured to communicate a result of processing to each other of the execution units.

Type: Application

Filed: July 13, 2012

Publication date: July 18, 2013

Applicant: TEXAS INSTRUMENTS INCORPORATED

Inventors: Ching-Yu Hung, Shinri Inamori, Jagadeesh Sankaran, Peter Chang
Performing a multiply-multiply-accumulate instruction

Patent number: 8478969

Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.

Type: Grant

Filed: September 24, 2010

Date of Patent: July 2, 2013

Assignee: Intel Corporation

Inventor: Eric S. Sprangle
VECTOR SIMD PROCESSOR

Publication number: 20130166878

Abstract: Operation parallelism of a data processor is enhanced by floating-point inner product execution units compatible with single instruction multiple data (SIMD). An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining efficiency of floating-point length-4 vector inner product execution units is implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhances the level of operation parallelism dramatically.

Type: Application

Filed: November 28, 2012

Publication date: June 27, 2013

Applicant: RENESAS ELECTRONICS CORPORATION

Inventor: Renesas Electronics Corporation
Signal processing apparatus with signal control units and processor units operating based on different threads

Patent number: 8464025

Abstract: A signal processing apparatus able to raise a processing capability in processing accompanying access to a storing means is provided. Stream control units (SCU) 203—0 to 203—3 access data at an external memory system or local memories 204—0 to 204—3 according to a thread under control from a host processor. Processor units (PU) arrays 202—0 to 202—3 perform image processing by a different thread from the thread of the SCUs 203—0 to 203—3.

Type: Grant

Filed: May 22, 2006

Date of Patent: June 11, 2013

Assignee: Sony Corporation

Inventors: Yuji Yamaguchi, Masatoshi Imai, Toshiharu Noda, Naosuke Asari, Tomoo Mitsunaga, Mitsuharu Ohki, Kazumasa Ito, Hidetoshi Nagano, Sumito Arakawa, Kei Ito
Bitstream Buffer Manipulation With A SIMD Merge Instruction

Publication number: 20130145120

Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.

Type: Application

Filed: January 29, 2013

Publication date: June 6, 2013

Inventors: Yen-Kueng Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung
Method and structure of using SIMD vector architectures to implement matrix multiplication

Patent number: 8458442

Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=?1.

Type: Grant

Filed: August 26, 2009

Date of Patent: June 4, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson
Bitstream Buffer Manipulation With A SIMD Merge Instruction

Publication number: 20130138917

Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.

Type: Application

Filed: January 29, 2013

Publication date: May 30, 2013

Inventors: Yen-Kueng Chen, William W. Macy, JR., Matthew Holliman, Eric L. Debes, Minerva M. Yeung
Instruction controller to distribute serial and SIMD instructions to serial and SIMD processors

Patent number: 8447953

Abstract: A microprocessor architecture comprises a plurality of processing elements arranged in a single instruction multiple data SIMD array, wherein each processing element includes a plurality of execution units, each of which is operable to process an instruction of a particular instruction type, a serial processor which includes a plurality of execution units, each of which is operable to process an instruction of a particular instruction type, and an instruction controller operable to receive a plurality of instructions, and to distribute received instructions to the execution units in dependence upon the instruction types of the received instruction. The execution units of the serial processor are operable to process respective instructions in parallel.

Type: Grant

Filed: February 7, 2006

Date of Patent: May 21, 2013

Assignee: Rambus Inc.

Inventor: Leon David Wildman
Bitstream Buffer Manipulation With A SIMD Merge Instruction

Publication number: 20130124824

Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.

Type: Application

Filed: December 27, 2012

Publication date: May 16, 2013

Inventors: Yen-Kueng Chen, William W. Macy, Matthew Holliman, Eric L. Debes, Minerva M. Yeung

prev 1 2 3 4 5 6 7 8 … next