Vector Processor Operation Patents (Class 712/7)


Vector execution unit with prenormalization of denormal values

Patent number: 9092257

Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.

Type: Grant

Filed: March 11, 2013

Date of Patent: July 28, 2015

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
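
The abstract does not describe the lane's datapath. The following is a minimal software model of what "prenormalizing" a denormal operand means for IEEE 754 single precision. A denormal fraction is shifted left until the implicit leading bit is set, and the exponent is lowered once per shift. The model then feeds the normalized significands to a dot product. It assumes float32 operands and double-precision accumulation. The names decode_f32 and dot_prenormalized are hypothetical, and this is not the patented circuit.

```python
import math
import struct

def decode_f32(x):
    """Split an IEEE 754 single-precision value into (sign, exponent, significand).

    The result satisfies value == (-1)**sign * significand * 2**(exponent - 23).
    Denormal inputs are prenormalized: the fraction is shifted left until the
    implicit leading bit (bit 23) is set, decrementing the exponent once per
    shift, so downstream arithmetic only sees normalized significands.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp_field == 0xFF:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if exp_field != 0:                       # normal: restore the implicit 1
        return sign, exp_field - 127, frac | (1 << 23)
    if frac == 0:                            # signed zero
        return sign, 0, 0
    exponent = -126                          # denormal: fixed exponent, no implicit 1
    while not frac & (1 << 23):              # prenormalize
        frac <<= 1
        exponent -= 1
    return sign, exponent, frac

def dot_prenormalized(a, b):
    """Dot product over prenormalized operands, accumulated in double precision."""
    total = 0.0
    for x, y in zip(a, b):
        sx, ex, mx = decode_f32(x)
        sy, ey, my = decode_f32(y)
        # A 24-bit x 24-bit significand product fits exactly in a double.
        product = math.ldexp(float(mx * my), ex + ey - 46)
        total += -product if sx ^ sy else product
    return total
```

For example, the float32 denormal 2**-140 multiplied by 2**100 yields exactly 2**-40, because the denormal is normalized before it reaches the multiply.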
Apparatus and method for separate asymmetric control processing and data path processing in a dual path processor

Patent number: 9047094

Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment according to the invention, there is provided a computer processor, the processor comprising: a decode unit for decoding instruction packets fetched from a memory holding a sequence of instruction packets; and first and second processing channels, each channel comprising a plurality of functional units, wherein the first processing channel is capable of performing control operations and comprises a control register file having a relatively narrower bit width, and the second processing channel is capable of performing data processing operations at least one input of which is a vector and comprises a data register file having a relatively wider bit width.

Type: Grant

Filed: March 31, 2004

Date of Patent: June 2, 2015

Assignee: Icera Inc.

Inventor: Simon Knowles
DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING VECTOR PROCESSING

Publication number: 20150149744

Abstract: A data processing apparatus and method are provided for processing execution threads, where each execution thread specifies at least one instruction. The data processing apparatus has a vector processing unit providing a plurality M of lanes of parallel processing, within each lane the vector processing unit being configured to perform a processing operation on a data element input to that lane for each of one or more input operands. A vector instruction is received that is specified by a group of the execution threads, that vector instruction identifying an associated processing operation and also providing an indication of the data elements of each input operand that are to be subjected to that associated processing operation. Vector merge circuitry then determines, based on that information, a required number of lanes of parallel processing for performing the associated processing operation.

Type: Application

Filed: October 2, 2014

Publication date: May 28, 2015

Inventor: Ronny PEDERSEN
VECTOR PROCESSING ENGINES (VPEs) EMPLOYING MERGING CIRCUITRY IN DATA FLOW PATHS BETWEEN EXECUTION UNITS AND VECTOR DATA MEMORY TO PROVIDE IN-FLIGHT MERGING OF OUTPUT VECTOR DATA STORED TO VECTOR DATA MEMORY, AND RELATED VECTOR PROCESSING INSTRUCTIONS, SYSTEMS, AND METHODS

Publication number: 20150143077

Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.

Type: Application

Filed: November 15, 2013

Publication date: May 21, 2015

Applicant: QUALCOMM Incorporated

Inventor: Raheel Khan
VECTOR PROCESSING ENGINES (VPEs) EMPLOYING DESPREADING CIRCUITRY IN DATA FLOW PATHS BETWEEN EXECUTION UNITS AND VECTOR DATA MEMORY TO PROVIDE IN-FLIGHT DESPREADING OF SPREAD-SPECTRUM SEQUENCES, AND RELATED VECTOR PROCESSING INSTRUCTIONS, SYSTEMS, AND METHODS

Publication number: 20150143076

Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.

Type: Application

Filed: November 15, 2013

Publication date: May 21, 2015

Applicant: QUALCOMM Incorporated

Inventor: Raheel Khan
VECTOR PROCESSING ENGINES (VPEs) EMPLOYING TAPPED-DELAY LINE(S) FOR PROVIDING PRECISION CORRELATION / COVARIANCE VECTOR PROCESSING OPERATIONS WITH REDUCED SAMPLE RE-FETCHING AND POWER CONSUMPTION, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS

Publication number: 20150143079

Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision correlation/covariance vector processing operations with reduced sample re-fetching and/or power consumption are disclosed. The VPEs disclosed herein are configured to provide correlation/covariance vector processing operations, such as code division multiple access (CDMA) correlation/covariance vector processing operations as a non-limiting example. A tapped-delay line(s) is included in the data flow paths between memory and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide an input vector data sample set to execution units for performing correlation/covariance vector processing operations.

Type: Application

Filed: November 15, 2013

Publication date: May 21, 2015

Applicant: QUALCOMM Incorporated

Inventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
VECTOR CHECKSUM INSTRUCTION

Publication number: 20150143080

Abstract: Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.

Type: Application

Filed: December 9, 2014

Publication date: May 21, 2015

Inventors: Jonathan D. Bradbury, Eric M. Schwarz
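
An end-around carry add is ones' complement addition. The carry out of the top bit is folded back into the low bit after each add. The following is a minimal sketch of the one-by-one accumulation described above. It assumes a single 32-bit accumulator element, with the "chosen position" taken as bit 31 and the "selected position" as bit 0. The element width and the function name vector_checksum are assumptions, not details from the abstract.

```python
MASK32 = 0xFFFFFFFF

def vector_checksum(elements, initial=0):
    """Ones' complement (end-around carry) sum of 32-bit elements."""
    acc = initial & MASK32
    for e in elements:
        s = acc + (e & MASK32)
        # End-around carry: add the carry out of bit 31 back into bit 0.
        # One fold suffices: the largest sum, 2 * MASK32, folds to MASK32.
        acc = (s & MASK32) + (s >> 32)
    return acc
```

For instance, adding 0xFFFFFFFF and 1 produces a carry out of bit 31 that is folded back in, giving 1 rather than 0.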
VECTOR PROCESSING ENGINES (VPEs) EMPLOYING A TAPPED-DELAY LINE(S) FOR PROVIDING PRECISION FILTER VECTOR PROCESSING OPERATIONS WITH REDUCED SAMPLE RE-FETCHING AND POWER CONSUMPTION, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS

Publication number: 20150143078

Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption are disclosed. Related vector processor systems and methods are also disclosed. The VPEs are configured to provide filter vector processing operations. To minimize re-fetching of input vector data samples from memory to reduce power consumption, a tapped-delay line(s) is included in the data flow paths between a vector data file and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide input vector data sample sets to execution units for performing filter vector processing operations. The tapped-delay line(s) is also configured to shift the input vector data sample set for filter delay taps and provide the shifted input vector data sample set to execution units, so the shifted input vector data sample set does not have to be re-fetched during filter vector processing operations.

Type: Application

Filed: November 15, 2013

Publication date: May 21, 2015

Applicant: QUALCOMM Incorporated

Inventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
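
The benefit of a tapped-delay line is that each input sample is fetched once and then shifted past every filter tap. A naive loop instead re-reads len(coeffs) samples for every output. The following is a minimal software analogy for a causal FIR filter, y[n] = sum_k c[k] * x[n - k], with a counter that shows one fetch per input sample. The function name and the fetch counter are illustrative assumptions, not the VPE's datapath.

```python
from collections import deque

def fir_tapped_delay(samples, coeffs):
    """Causal FIR filter using a shift-register delay line.

    Returns (outputs, fetches). Each sample is fetched exactly once; the
    delay line holds x[n], x[n-1], ..., x[n-len(coeffs)+1].
    """
    taps = deque([0] * len(coeffs), maxlen=len(coeffs))
    out, fetches = [], 0
    for x in samples:
        fetches += 1            # single fetch of the new sample
        taps.appendleft(x)      # shift: the oldest sample drops off the far end
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out, fetches
```

With samples [1, 2, 3, 4] and coefficients [1, 1], the outputs are [1, 3, 5, 7] after only four fetches.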
CRYPTOGRAPHIC SUPPORT INSTRUCTIONS

Publication number: 20150121036

Abstract: A data processing system 2 includes a single instruction multiple data register file 12 and single instruction multiple data processing circuitry 14. The single instruction multiple data processing circuitry 14 supports execution of cryptographic processing instructions for performing parts of a hash algorithm. The operands are stored within the single instruction multiple data register file 12. The cryptographic support instructions do not follow normal lane-based processing and generate output operands in which the different portions of the output operand depend upon multiple different elements within the input operand.

Type: Application

Filed: December 30, 2014

Publication date: April 30, 2015

Inventors: Matthew James HORSNELL, Richard Roy GRISENTHWAITE, Stuart David BILES, Daniel KERSHAW
INSTRUCTION AND LOGIC TO PROVIDE CONVERSIONS BETWEEN A MASK REGISTER AND A GENERAL PURPOSE REGISTER OR MEMORY

Publication number: 20150113246

Abstract: Instructions and logic provide conversions between a mask register and a general purpose register or memory. In some embodiments, responsive to an instruction specifying a destination operand, a mask length corresponding to a number of mask data fields, and a source operand, values are read from data fields in the source operand corresponding to the specified mask length and stored to corresponding data fields in the destination operand specified by the instruction, wherein one of the source or the destination operands is a mask register. Values indicative of masked vector elements may be stored to any data fields in the destination operand other than the number of data fields corresponding to the specified mask length. For some embodiments, the other one of the source or the destination operands may be a general purpose register or a memory location.

Type: Application

Filed: November 25, 2011

Publication date: April 23, 2015

Applicant: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Robert Valentine, Bret L. Toll, Mark J. Charney
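
The following is a minimal sketch of the two conversion directions. It assumes a mask register modeled as a list of per-element bits and a general purpose register modeled as an integer. It also assumes that the "value indicative of a masked element" written to fields beyond the mask length is 0. Both function names and the 64-bit register width are hypothetical choices for illustration.

```python
def mask_to_gpr(mask_bits, mask_length):
    """Pack the first mask_length mask fields into an integer; higher bits stay 0."""
    value = 0
    for i in range(mask_length):
        if mask_bits[i]:
            value |= 1 << i
    return value

def gpr_to_mask(value, mask_length, register_width=64):
    """Unpack mask_length bits into a mask register; remaining fields read as masked (False)."""
    return [bool((value >> i) & 1) if i < mask_length else False
            for i in range(register_width)]
```

Only the fields covered by the specified mask length are transferred, so a mask of length 4 ignores any bits above position 3.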
Acceleration of string comparisons using vector instructions

Patent number: 9009447

Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.

Type: Grant

Filed: July 18, 2011

Date of Patent: April 14, 2015

Assignee: Oracle International Corporation

Inventor: Darryl J. Gove
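
The following is a minimal model of the loop described above. Each iteration compares one vector-width chunk of both strings and flags any position that is either unequal or NUL, which stands in for the single combined instruction. When a flag is found, the follow-up check at that position decides the result. The 16-byte vector width and the name vector_strcmp are assumptions. Bytes past the end of the buffer are treated as NUL, and a real implementation must also avoid reading across page boundaries, which this sketch ignores.

```python
VLEN = 16  # assumed vector width in bytes

def vector_strcmp(a: bytes, b: bytes) -> int:
    """Compare NUL-terminated byte strings chunk by chunk; returns -1, 0, or 1."""
    i = 0
    while True:
        va = a[i:i + VLEN].ljust(VLEN, b"\0")
        vb = b[i:i + VLEN].ljust(VLEN, b"\0")
        # One "vector instruction": per-element inequality OR null-character test.
        stop = [x != y or x == 0 for x, y in zip(va, vb)]
        if any(stop):
            k = stop.index(True)          # further check at the first flagged element
            return (va[k] > vb[k]) - (va[k] < vb[k])
        i += VLEN                         # all equal, no NUL: load the next portions
```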
DATA PROCESSING APPARATUS AND METHOD FOR CONTROLLING PERFORMANCE OF SPECULATIVE VECTOR OPERATIONS

Publication number: 20150100755

Abstract: A data processing apparatus and a method of controlling performance of speculative vector operations are provided. The apparatus comprises processing circuitry for performing a sequence of speculative vector operations on vector operands, each vector operand comprising a plurality of vector elements, and speculation control circuitry for maintaining a speculation width indication indicating the number of vector elements of each vector operand to be subjected to the speculative vector operations. The speculation width indication is set to an initial value prior to performance of the sequence of speculative vector operations. The processing circuitry generates progress indications during performance of the sequence of speculative vector operations, and the speculation control circuitry detects, with reference to the progress indications and speculation reduction criteria, presence of a speculation reduction condition.

Type: Application

Filed: August 18, 2014

Publication date: April 9, 2015

Inventors: Alastair David REID, Daniel KERSHAW
Vector compare-and-exchange operation

Patent number: 8996845

Abstract: A vector compare-and-exchange operation is performed by: decoding by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.

Type: Grant

Filed: December 22, 2009

Date of Patent: March 31, 2015

Assignee: Intel Corporation

Inventors: Ravi Rajwar, Andrew T. Forsyth
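
The abstract leaves open whether a "match" is decided per element or across the whole vector, so the following sketch supports both readings through a flag. It models only the data movement between the first (destination), second (expected) and third (new) storage locations. The atomicity a real compare-and-exchange would provide is not modeled, and the function name is hypothetical.

```python
def vector_cmpxchg(dest, expected, new, all_or_nothing=False):
    """Return (updated dest, per-element match flags).

    Per-element mode replaces each matching element of dest with the
    corresponding element of new; all-or-nothing mode replaces dest only
    when every element matches.
    """
    matches = [d == e for d, e in zip(dest, expected)]
    if all_or_nothing:
        return (list(new) if all(matches) else list(dest)), matches
    return [n if m else d for d, n, m in zip(dest, new, matches)], matches
```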
Predicate Attribute Tracker

Publication number: 20150089190

Abstract: In an embodiment, a processor includes a register attribute tracker configured to track one or more attributes corresponding to registers. The register attribute tracker may track the attributes associated with the registers when those registers are used as output registers of instructions that explicitly define the attributes and, if the register attribute tracker has a tracked attribute associated with an input register of an instruction that does not explicitly define the attribute, the register attribute tracker may annotate the instruction with an attribute and/or associate an attribute with the output register of the instruction in the register attribute tracker.

Type: Application

Filed: September 24, 2013

Publication date: March 26, 2015

Applicant: Apple Inc.

Inventor: Jeffry E. Gonion
Early Issue of Null-Predicated Operations

Publication number: 20150089191

Abstract: In an embodiment, a processor includes an issue circuit configured to issue instruction operations for execution. The issue circuit may be configured to monitor the source operands of the instruction operations, and to issue instruction operations for which the source operands (including predicate operands, as appropriate) are resolved. Additionally, the issue circuit may be configured to detect a null predicate that indicates that none of the vector elements will be modified by a corresponding instruction operation. The issue circuit may be configured to issue the corresponding instruction operation with the null predicate even if other source operands are not yet resolved.

Type: Application

Filed: September 24, 2013

Publication date: March 26, 2015

Applicant: Apple Inc.

Inventor: Jeffry E. Gonion
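
The following sketch models the issue rule for null predicates. An operation normally waits for all of its sources, but once its predicate is known to be all zeros, no vector element will be modified, so it may issue without waiting. The Op record, the integer bitmask encoding of the predicate, and the use of None for an unresolved predicate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Op:
    name: str
    sources_ready: bool          # all non-predicate source operands resolved
    predicate: Optional[int]     # element bitmask; None while still unresolved

def can_issue(op: Op) -> bool:
    if op.predicate is None:
        return False             # must at least know the predicate
    if op.predicate == 0:
        return True              # null predicate: issue early, nothing is modified
    return op.sources_ready      # otherwise wait for the remaining sources
```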
Dynamic Attribute Inference

Publication number: 20150089192

Abstract: In an embodiment, a processor may be configured to dynamically infer one or more attributes of input and/or output registers of an instruction, given the attributes corresponding to at least one of the input registers. The inference may be made at the issue circuit/stage of the processor, for those registers that do not have attribute information at the issue circuit/stage. In an embodiment, the processor may also include a register attribute tracker configured to track attributes of registers prior to the issue stage of the processor pipeline. The processor may feed back, to the register attribute tracker, inferred attributes and the register addresses of the registers to which the inferred attributes apply. The register attribute tracker may be configured to associate the inferred attribute with the identified register. The register attribute tracker may also be configured to infer input register attributes from other input register attributes.

Type: Application

Filed: September 24, 2013

Publication date: March 26, 2015

Applicant: Apple Inc.

Inventor: Jeffry E. Gonion
Predecode logic autovectorizing a group of scalar instructions including result summing add instruction to a vector instruction for execution in vector unit with dot product adder

Patent number: 8984260

Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.

Type: Grant

Filed: December 20, 2011

Date of Patent: March 17, 2015

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
SCATTER USING INDEX ARRAY AND FINITE STATE MACHINE

Publication number: 20150074373

Abstract: Methods and apparatus are disclosed that use an index array and a finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each address being generated. Data elements corresponding to the generated addresses are copied to the buffer. Addresses from the set are accessed to store data elements if the corresponding mask element has the first value, and each mask element is changed to a second value upon completion of its respective store.

Type: Application

Filed: June 2, 2012

Publication date: March 12, 2015

Inventors: Zeev Sperber, Robert Valentine, Shlomo Raikin, Stanislav Shwartsman, Gal Ofir, Igor Yanover, Guy Patkin, Levy Ofer
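
The following is a minimal model of the masked scatter flow described above. It generates addresses only for active elements (mask value 1), buffers the address and data pairs, and then drains the buffer one store at a time, clearing each mask element to 0 as its store completes. A simple loop stands in for the finite state machine. The function name, the base/scale addressing and the list-as-memory model are assumptions. If two active elements share an index, the later element's store wins in this sketch.

```python
def scatter(memory, base, indices, data, mask, scale=1):
    """Store data[i] at base + indices[i] * scale for each active element.

    Returns the updated mask; an all-zero mask means every store completed.
    """
    mask = list(mask)
    # Address generation for active elements; allocate one buffer entry each.
    buffer = [(i, base + indices[i] * scale, data[i])
              for i in range(len(indices)) if mask[i] == 1]
    # Drain the buffer one store per step (the role of the state machine).
    for i, addr, value in buffer:
        memory[addr] = value
        mask[i] = 0               # second value: this element's store is done
    return mask
```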
METHOD AND APPARATUS FOR ASYNCHRONOUS PROCESSOR WITH AUXILIARY ASYNCHRONOUS VECTOR PROCESSOR

Publication number: 20150074374

Abstract: An asynchronous processing system comprising an asynchronous scalar processor and an asynchronous vector processor coupled to the scalar processor. The asynchronous scalar processor is configured to perform processing functions on input data and to output instructions. The asynchronous vector processor is configured to perform processing functions in response to a very long instruction word (VLIW) received from the scalar processor. The VLIW comprises a first portion and a second portion, at least the first portion comprising a vector instruction.

Type: Application

Filed: September 8, 2014

Publication date: March 12, 2015

Inventors: Qifan Zhang, Wuxian Shi, Yiqun Ge, Tao Huang, Wen Tong
Reversing processing order in half-pumped SIMD execution units to achieve K cycle issue-to-issue latency

Patent number: 8977835

Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment, a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit, and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.

Type: Grant

Filed: November 14, 2013

Date of Patent: March 10, 2015

Assignee: International Business Machines Corporation

Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
Vector conflict instructions

Patent number: 8972698

Abstract: A processing core implemented on a semiconductor chip is described having first execution unit logic circuitry that includes first comparison circuitry to compare each element in a first input vector against every element of a second input vector. The processing core also has second execution logic circuitry that includes second comparison circuitry to compare a first input value against every data element of an input vector.

Type: Grant

Filed: December 22, 2010

Date of Patent: March 3, 2015

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Mark J. Charney, Yen-Kuang Chen, Jesus Corbal, Andrew T. Forsyth, Milind B. Girkar, Jonathan C. Hall, Hideki Ido, Robert Valentine, Jeffrey Wiedemeier
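
One concrete instance of the all-pairs element comparison described above is the conflict-detection semantics of x86 AVX-512 VPCONFLICTD. For each element i, that instruction produces a bitmask of the earlier elements j < i that hold the same value. The sketch below models that behavior. Whether the patent's first comparison circuitry behaves exactly this way is an assumption, and the function name is hypothetical.

```python
def vector_conflict(v):
    """For each element i, set bit j for every earlier element j < i equal to v[i]."""
    out = []
    for i, x in enumerate(v):
        bits = 0
        for j in range(i):
            if v[j] == x:
                bits |= 1 << j
        out.append(bits)
    return out
```

A typical use is a vectorized histogram update. A nonzero result flags duplicate indices within the vector that would otherwise cause a lost update.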
VECTOR PROCESSOR

Publication number: 20150046673

Abstract: A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.

Type: Application

Filed: August 12, 2014

Publication date: February 12, 2015

Inventors: Brendan BARRY, Fergal CONNOR, Martin O'RIORDAN, David MOLONEY, Sean POWER
APPARATUS, SYSTEMS, AND METHODS FOR LOW POWER COMPUTATIONAL IMAGING

Publication number: 20150046675

Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.

Type: Application

Filed: August 12, 2014

Publication date: February 12, 2015

Inventors: Brendan BARRY, Richard RICHMOND, Fergal CONNOR, David MOLONEY
LOW POWER COMPUTATIONAL IMAGING

Publication number: 20150046674

Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.

Type: Application

Filed: August 12, 2014

Publication date: February 12, 2015

Inventors: Brendan BARRY, Richard RICHMOND, Fergal CONNOR, David MOLONEY
Reversing processing order in half-pumped SIMD execution units to achieve K cycle issue-to-issue latency

Patent number: 8949575

Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment, a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit, and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.

Type: Grant

Filed: December 14, 2011

Date of Patent: February 3, 2015

Assignee: International Business Machines Corporation

Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
VECTOR EXECUTION UNIT FOR DIGITAL SIGNAL PROCESSOR

Publication number: 20140372728

Abstract: A vector execution unit for use in a digital signal processor enables a new set of instructions. The unit comprises a first input port for receiving at least a first input data vector, an instruction decoder, a vector output port, and at least one data-path. The instruction decoding unit is arranged to control the data-path to perform a comparison related to the first input data vector, and the processor comprises an integer port arranged to output the result of the comparison in the form of a decision vector to a memory unit or a functional unit in the digital signal processor. Alternatively or in addition, the integer port is also arranged to receive a decision vector of integer data, and the instruction decoding unit is arranged to control the data-path to process the first input data depending on the value of the integer data.

Type: Application

Filed: November 28, 2012

Publication date: December 18, 2014

Applicant: Media Tek Sweden AB

Inventors: Anders Nilsson, Eric Tell
Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a same set of per-lane control bits

Patent number: 8914613

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

Type: Grant

Filed: August 26, 2011

Date of Patent: December 16, 2014

Assignee: Intel Corporation

Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
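
The same-control-bits-for-every-lane idea matches the behavior of x86 VPSHUFD on a 256-bit register, which applies one 8-bit immediate to each 128-bit lane. Each 2-bit field of that immediate selects which of the lane's four 32-bit elements lands in the corresponding destination position. The following sketch models that single-source form, using lists of four-element lanes. The function name is hypothetical, and the two-source embodiment is not modeled.

```python
def shuffle_in_lane(src, imm8):
    """Apply the same 2-bit-per-element selectors to every 4-element lane."""
    if len(src) % 4:
        raise ValueError("source length must be a multiple of the 4-element lane size")
    dst = []
    for base in range(0, len(src), 4):
        lane = src[base:base + 4]
        for k in range(4):
            sel = (imm8 >> (2 * k)) & 0b11   # per-lane control bits for position k
            dst.append(lane[sel])
    return dst
```

For example, the control value 0x1B reverses the element order within each lane, producing [3, 2, 1, 0, 7, 6, 5, 4] from the input 0..7.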
DIGITAL SIGNAL PROCESSOR

Publication number: 20140359252

Abstract: A multicore processor is achieved by a processor assembly, comprising a first processor having a first core and at least a first and a second unit, each being selected from the group of vector execution units, memory units and accelerators, said first core and first and second units being interconnected by a first network, and a second processor having a second core wherein the first core is arranged to enable the second core to control at least one of the units in the first processor. Each processor generally comprises a combination of execution units, memory units and accelerators, which may be controlled and/or accessed by units in the other processor.

Type: Application

Filed: November 28, 2012

Publication date: December 4, 2014

Applicant: Media Tek Sweden AB

Inventors: Anders Nilsson, Eric Tell
DIGITAL SIGNAL PROCESSOR AND BASEBAND COMMUNICATION DEVICE

Publication number: 20140344549

Abstract: The invention relates to a digital signal processor comprising a processor core, an integer execution unit and a number of vector execution units, said digital signal processor comprising a program memory arranged to hold instructions for the execution units and issue logic for issuing instructions. The digital signal processor comprises an issue control unit for selecting at least two execution units that are to receive and execute the same instruction at the same time, and logic for sending the instruction to said at least two execution units.

Type: Application

Filed: November 28, 2012

Publication date: November 20, 2014

Applicant: Media Tek Sweden AB

Inventors: Anders Nilsson, Eric Tell
Multithreaded processor with multiple concurrent pipelines per thread

Patent number: 8892849

Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.

Type: Grant

Filed: October 15, 2009

Date of Patent: November 18, 2014

Assignee: QUALCOMM Incorporated

Inventors: Erdem Hokenek, Mayan Moudgill, Michael J. Schulte, C. John Glossner
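
The following sketch models the issuance rule in the abstract. On each clock cycle exactly one designated thread may issue, and the designated thread rotates according to an issuance sequence. It assumes a repeating sequence and lets the designated thread issue up to width instructions per turn, standing in for one thread feeding several concurrent pipelines. The function name, the width parameter and the queue representation are assumptions. The function consumes the per-thread queues it is given.

```python
from itertools import cycle

def schedule(threads, sequence, n_cycles, width=2):
    """threads: dict mapping thread id to a list of pending instructions.

    Returns a log of (cycle, designated thread, instructions issued).
    """
    log = []
    order = cycle(sequence)                  # repeating instruction issuance sequence
    for c in range(n_cycles):
        tid = next(order)                    # only this thread may issue this cycle
        queue = threads[tid]
        issued, threads[tid] = queue[:width], queue[width:]
        log.append((c, tid, issued))
    return log
```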
Hardware to support looping code in an image processing system

Patent number: 8892853

Abstract: An image processing system including a vector processor and a memory adapted for attaching to the vector processor. The memory is adapted to store multiple image frames. The vector processor includes an address generator operatively attached to the memory to access the memory. The address generator is adapted for calculating addresses of the memory over the multiple image frames. The addresses may be calculated over the image frames based upon an image parameter. The image parameter may specify which of the image frames are processed simultaneously. A scalar processor may be attached to the vector processor. The scalar processor provides the image parameter(s) to the address generator for address calculation over the multiple image frames. An input register may be attached to the vector processor. The input register may be adapted to receive a very long instruction word (VLIW) instruction.

Type: Grant

Filed: June 10, 2010

Date of Patent: November 18, 2014

Assignee: Mobileye Technologies Limited

Inventors: Yosef Kreinin, Gil Dogon, Emmanuel Sixsou, Yosi Arbeli, Mois Navon, Roman Sajman
Vector processing circuit, command issuance control method, and processor system

Patent number: 8874879

Abstract: A vector processing circuit includes a vector register file including a plurality of array elements, a command issuance control circuit, and a plurality of pipeline arithmetic units. Each pipeline arithmetic unit performs arithmetic processing of data stored in the array elements indicated as a source by one command in parts over a plurality of cycles and stores the result in the array elements indicated as a destination by the one command over a plurality of cycles. When the data word length of a preceding command is longer than that of a subsequent command, the command issuance control circuit changes the data sizes of the array elements in accordance with the data word length of the command and determines whether there is register interference between the array element to be processed at a non-head cycle of the preceding command and the array element to be processed at a head cycle of the subsequent command.

Type: Grant

Filed: October 24, 2011

Date of Patent: October 28, 2014

Assignee: Fujitsu Limited

Inventors: Yi Ge, Yoshimasa Takebe, Hiromasa Takahashi
On-the-fly permutation of vector elements for executing successive elemental instructions

Patent number: 8868885

Abstract: A device, system, and method for processing program instructions, for example, to execute intra vector operations. A fetch unit may receive a program instruction defining different operations on data elements stored at the same vector memory address. A processor may include different types of execution units each executing a different one of a predetermined plurality of elemental instructions. Each program instruction may be a combination of one or more of the elemental instructions. The processor may receive a vector of data elements stored non-consecutively at the same vector memory address to be processed by a same one of the elemental instructions and a vector of configuration values independently associated with executing the same elemental instruction on the non-consecutive data elements. At least two configuration values may be different to implement different operations by executing the same elemental instruction using the different configuration values on the vector of non-consecutive data elements.

Type: Grant

Filed: November 18, 2010

Date of Patent: October 21, 2014

Assignee: Ceva D.S.P. Ltd.

Inventors: Yaakov Dekter, Michael Boukaya, Shai Shpigelblat, Moshe Steinberg
Method for vector processing

Patent number: 8856492

Abstract: The present application relates to a method for processing data in a vector processor. It also relates to a vector processor for performing said method and to a cellular communication device comprising said vector processor. The method for processing data in a vector processor comprises executing segmented operations on a segment of a vector to generate results, collecting the results of the segmented operations, and delivering the results in a result vector in such a way that subsequent operations can continue processing in vector mode.

Type: Grant

Filed: May 29, 2009

Date of Patent: October 7, 2014

Assignee: NXP B.V.

Inventors: Mahima Smriti, Jean-Paul Charles Francois Hubert Smeets, Willem Egbert Hendrik Kloosterhuis
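The segmented-operation idea above (operate on segments, collect the per-segment results, and hand back a full vector) can be sketched in Python. This is a minimal illustration that assumes the segmented operation is a sum reduction and that unused result lanes are zero-padded; both choices are assumptions, and `segmented_sum` is a hypothetical name.

```python
from typing import List


def segmented_sum(vector: List[int], segment_len: int) -> List[int]:
    """Reduce each segment of `vector` and return a full-width result vector."""
    if segment_len <= 0 or len(vector) % segment_len:
        raise ValueError("vector length must be a multiple of a positive segment_len")
    # One result per segment, computed as if each segment ran independently.
    results = [sum(vector[i:i + segment_len])
               for i in range(0, len(vector), segment_len)]
    # Collect the results into a vector of the original length (zero padding
    # is an assumption) so the next vector instruction can consume it directly.
    return results + [0] * (len(vector) - len(results))


print(segmented_sum([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [10, 26, 0, 0, 0, 0, 0, 0]
```

Keeping the output at full vector width is what lets a following operation stay in vector mode instead of dropping to scalar code to reassemble the partial results.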
ENHANCED PREDICATE REGISTERS

Publication number: 20140289495

Abstract: Systems, apparatuses and methods for utilizing enhanced predicate registers which specify the element width and which elements are to be processed. The predicate size is dynamic, depending on the contents of the enhanced predicate register used for an instruction rather than being a static quality of a specific instruction. Specifying the element size in the enhanced predicate registers results in fewer instructions in an instruction set.

Type: Application

Filed: March 18, 2014

Publication date: September 25, 2014

Applicant: Apple Inc.

Inventor: Jeffry E. Gonion
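The key point of this abstract is that the element width comes from the predicate register rather than from the opcode. The Python sketch below shows that effect: one add routine reinterprets the same packed operands at different widths depending only on the predicate it receives. The `EnhancedPredicate` fields, the merging behaviour for inactive elements, and the 128-bit default vector size are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnhancedPredicate:
    element_bits: int   # element width carried in the predicate (assumed field)
    active: List[bool]  # which elements to process; index 0 = least significant element


def predicated_add(a: int, b: int, pred: EnhancedPredicate, vector_bits: int = 128) -> int:
    """Add two packed vectors whose element width comes from the predicate.

    Inactive elements keep the value from `a` (assumed merging behaviour).
    """
    w = pred.element_bits
    n = vector_bits // w
    if len(pred.active) != n:
        raise ValueError("predicate must have one flag per element")
    mask = (1 << w) - 1
    result = 0
    for i in range(n):
        x = (a >> (i * w)) & mask
        y = (b >> (i * w)) & mask
        r = (x + y) & mask if pred.active[i] else x  # wrap within the element
        result |= r << (i * w)
    return result


# The same add, reinterpreted at two widths purely by changing the predicate.
print(hex(predicated_add(0x01020304, 0x10101010,
                         EnhancedPredicate(8, [True, False, True, False]), 32)))  # 0x1120314
print(hex(predicated_add(0x01020304, 0x10101010,
                         EnhancedPredicate(16, [True, True]), 32)))               # 0x11121314
```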
VECTOR INDIRECT ELEMENT VERTICAL ADDRESSING MODE WITH HORIZONTAL PERMUTE

Publication number: 20140281372

Abstract: An example method for placing one or more element data values into an output vector includes identifying a vertical permute control vector including a plurality of elements, each element of the plurality of elements including a register address. The method also includes for each element of the plurality of elements, reading a register address from the vertical permute control vector. The method further includes retrieving a plurality of element data values based on the register address. The method also includes identifying a horizontal permute control vector including a set of addresses corresponding to an output vector. The method further includes placing at least some of the retrieved element data values of the plurality of element data values into the output vector based on the set of addresses in the horizontal permute control vector.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: QUALCOMM INCORPORATED

Inventors: Ajay Anant Ingle, David J. Hoyle, Marc M. Hoffman
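The two-stage permute described above can be modelled in Python. This sketch assumes two interpretations the abstract leaves open. First, lane i of the vertical step reads the element at the same lane index i from the register named by the vertical control vector. Second, each horizontal control entry gives the output lane for the corresponding retrieved value, with a negative entry meaning "not placed", which is one way to realize "at least some" values. The register file contents and function name are hypothetical.

```python
from typing import List, Sequence


def vertical_horizontal_permute(regfile: Sequence[Sequence[int]],
                                vertical_ctrl: Sequence[int],
                                horizontal_ctrl: Sequence[int],
                                fill: int = 0) -> List[int]:
    n = len(vertical_ctrl)
    # Vertical step: lane i reads from register vertical_ctrl[i], same lane i.
    gathered = [regfile[vertical_ctrl[i]][i] for i in range(n)]
    # Horizontal step: horizontal_ctrl[i] is the output lane that receives
    # gathered[i]; a negative entry (assumed convention) leaves it unplaced.
    out = [fill] * n
    for i, dest in enumerate(horizontal_ctrl):
        if dest >= 0:
            out[dest] = gathered[i]
    return out


regs = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
print(vertical_horizontal_permute(regs, [2, 0, 1, 2], [3, 2, 1, 0]))  # [33, 22, 11, 30]
```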
TECHNIQUES FOR ENABLING BIT-PARALLEL WIDE STRING MATCHING WITH A SIMD REGISTER

Publication number: 20140281371

Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.

Type: Application

Filed: March 13, 2013

Publication date: September 18, 2014

Inventors: HARIHARAN THANTRY, MANI AZIMI
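The lane-crossing steps in this abstract amount to a one-bit left shift of a bitmask that is wider than a single lane, which is the core update of the classic Shift-And (bit-parallel) string matcher. The Python model below follows those steps in order: collect the lane MSBs into a mask, shift the mask as a scalar, shift the lanes, fill the LSBs from the mask, and OR. It then uses that shift in Shift-And. The 8-bit lane width is an assumption chosen so that a short pattern already spans two lanes; lanes are modelled as a Python list.

```python
from typing import Dict, List

LANE_BITS = 8                      # assumed lane width, small so patterns span lanes
LANE_MASK = (1 << LANE_BITS) - 1


def wide_shift_left_1(lanes: List[int]) -> List[int]:
    """Shift a multi-lane bitmask left by one bit; lane 0 holds the low bits."""
    # Copy each lane's MSB into a scalar mask (bit k <- MSB of lane k).
    msb_mask = 0
    for k, lane in enumerate(lanes):
        msb_mask |= ((lane >> (LANE_BITS - 1)) & 1) << k
    # Shift the mask as a scalar so each carry lines up with the next lane.
    msb_mask <<= 1
    # Shift every lane independently; bits crossing lane boundaries are lost here.
    shifted = [(lane << 1) & LANE_MASK for lane in lanes]
    # Fill the lanes' LSBs from the mask (the "second register") and OR it in.
    carries = [(msb_mask >> k) & 1 for k in range(len(lanes))]
    return [s | c for s, c in zip(shifted, carries)]


def shift_and_search(text: str, pattern: str) -> List[int]:
    """Shift-And matching with a state wider than a single lane."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    m = len(pattern)
    n_lanes = (m + LANE_BITS - 1) // LANE_BITS
    masks: Dict[str, List[int]] = {}
    for j, ch in enumerate(pattern):  # bit j set where pattern[j] == ch
        masks.setdefault(ch, [0] * n_lanes)[j // LANE_BITS] |= 1 << (j % LANE_BITS)
    zero = [0] * n_lanes
    lane_hit, bit_hit = divmod(m - 1, LANE_BITS)
    state = [0] * n_lanes
    hits: List[int] = []
    for i, ch in enumerate(text):
        state = wide_shift_left_1(state)
        state[0] |= 1                 # a new match attempt starts at every position
        state = [s & b for s, b in zip(state, masks.get(ch, zero))]
        if (state[lane_hit] >> bit_hit) & 1:
            hits.append(i - m + 1)
    return hits


print(shift_and_search("xxabcdefghijyyabcdefghij", "abcdefghij"))  # [2, 14]
```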
DIGITAL SIGNAL PROCESSOR AND BASEBAND COMMUNICATION DEVICE

Publication number: 20140281373

Abstract: A digital signal processor has a vector execution unit arranged to execute instructions on multiple data in the form of a vector, comprising a local queue arranged to receive instructions from a program memory and to hold them in the local queue until a predefined condition is fulfilled. The local queue is arranged to receive a sequence of instructions at a time from the program memory and to store the last N instructions, N being an integer. A vector controller in the vector execution unit comprises queue control means arranged to make the local queue repeat a sequence of M instructions stored in the local queue a number K of times, M being an integer less than or equal to N. This reduces the time the vector execution unit is kept waiting because of IDLE commands in the program memory.

Type: Application

Filed: September 17, 2012

Publication date: September 18, 2014

Applicant: MediaTek Sweden AB

Inventor: Anders Nilsson
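The queue behaviour in this abstract (keep the last N instructions, replay M of them K times) can be modelled in a few lines of Python. This is a sketch under assumptions: the repeated M-instruction sequence is taken to be the tail of the queue, the instruction mnemonics are placeholders, and the `LocalQueue` class is a hypothetical name.

```python
from collections import deque
from typing import Iterable, List


class LocalQueue:
    """Holds the last `capacity` (N) instructions fetched from program memory."""

    def __init__(self, capacity: int) -> None:
        self.buf: deque = deque(maxlen=capacity)  # older entries fall out automatically

    def receive(self, instructions: Iterable[str]) -> None:
        self.buf.extend(instructions)

    def repeat(self, m: int, k: int) -> List[str]:
        """Issue the most recent M queued instructions K times (assumed: the
        repeated sequence is the tail of the queue)."""
        if not 0 < m <= len(self.buf):
            raise ValueError("M must satisfy 0 < M <= number of queued instructions")
        body = list(self.buf)[-m:]
        return body * k


q = LocalQueue(capacity=4)
q.receive(["ld", "mac", "mac", "st", "add"])  # "ld" is evicted: only N = 4 are kept
print(q.repeat(m=2, k=3))  # ['st', 'add', 'st', 'add', 'st', 'add']
```

Replaying from the local queue means the vector unit does not have to wait for the program memory to re-issue the loop body each time.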
VECTOR PROCESSING ENGINES HAVING PROGRAMMABLE DATA PATH CONFIGURATIONS FOR PROVIDING MULTI-MODE VECTOR PROCESSING, AND RELATED VECTOR PROCESSORS, SYSTEMS, AND METHODS

Publication number: 20140281370

Abstract: Embodiments disclosed herein include vector processing engines (VPEs) having programmable data path configurations for providing multi-mode vector processing. Related vector processors, systems, and methods are also disclosed. The VPEs include one or more vector processing stages configured to process vector data according to a vector instruction executed in the vector processing stage. Each vector processing stage includes vector processing blocks each configured to process vector data based on the vector instruction being executed. The vector processing blocks are capable of providing different vector operations for different types of vector instructions based on data path configurations. The data paths of the vector processing blocks are reprogrammable, so that vector data can be processed differently according to the particular vector instruction being executed.

Type: Application

Filed: March 13, 2013

Publication date: September 18, 2014

Applicant: QUALCOMM Incorporated

Inventor: Raheel Khan
ANALYZING POTENTIAL BENEFITS OF VECTORIZATION

Publication number: 20140258677

Abstract: Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media (transitory and non-transitory) are described herein for analyzing execution of a plurality of executable instructions and, based on the analysis, providing an indication of a benefit to be obtained by vectorization of at least a subset of the plurality of executable instructions. In various embodiments, the analysis may include identification of the subset of the plurality of executable instructions suitable for conversion to one or more single-instruction multiple-data (“SIMD”) instructions.

Type: Application

Filed: March 5, 2013

Publication date: September 11, 2014

Inventors: Ruchira Sasanka, Jeffrey J. Cook, Abhinav Das, Jayaram Bobba, Michael R. Greenfield, Suresh Srinivas
List Vector Processing Apparatus, List Vector Processing Method, Storage Medium, Compiler, and Information Processing Apparatus

Publication number: 20140244969

Abstract: Disclosed is, among other things, a list vector processing apparatus (LVPA) that can process indirect references at high speed. The LVPA includes: a gather processing unit that processes a first gather instruction, which reads into a register the values of storage areas accessed only by the self information processing apparatus (SelfIPA), out of a plurality of information processing apparatuses, according to a list vector holding the addresses of storage areas in a storage apparatus, and that generates reference access information indicating whether a storage area is accessed by both the SelfIPA and another information processing apparatus; a communication unit for related information; an access information operating unit that calculates the area accessed by the information processing apparatus; and a scatter processing unit that processes a first scatter instruction, which stores a value held in the register into the storage area accessed only by the SelfIPA.

Type: Application

Filed: February 24, 2014

Publication date: August 28, 2014

Applicant: NEC CORPORATION

Inventor: Satoru TAGAYA
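The list-vector gather and scatter at the heart of this abstract can be sketched in Python. This sketch assumes that each information processing apparatus owns a contiguous address range. Addresses outside that range are reported rather than accessed, which is a simplified stand-in for the reference access information; how the patent actually communicates those entries between apparatuses is not modelled. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def gather(memory: Dict[int, float], list_vector: Sequence[int],
           owned: range) -> Tuple[List[Optional[float]], List[int]]:
    """Load memory[addr] for each address in the list vector.

    Only addresses in `owned` (this apparatus's assumed share of memory) are
    read; the others are reported so they can be fetched from elsewhere.
    """
    register: List[Optional[float]] = []
    remote: List[int] = []
    for addr in list_vector:
        if addr in owned:
            register.append(memory[addr])
        else:
            register.append(None)  # placeholder until another apparatus supplies it
            remote.append(addr)
    return register, remote


def scatter(memory: Dict[int, float], list_vector: Sequence[int],
            register: Sequence[Optional[float]], owned: range) -> None:
    """Store register[i] to memory[list_vector[i]] for owned addresses only."""
    for addr, value in zip(list_vector, register):
        if addr in owned and value is not None:
            memory[addr] = value


mem = {i: float(10 * i) for i in range(8)}
print(gather(mem, [3, 5, 1], owned=range(0, 4)))  # ([30.0, None, 10.0], [5])
```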
DIGITAL SIGNAL PROCESSOR AND BASEBAND COMMUNICATION DEVICE

Publication number: 20140244970

Abstract: For increased efficiency, a digital signal processor comprises a vector execution unit arranged to execute instructions that are to be performed on multiple data in the form of a vector, comprising a vector controller arranged to determine if an instruction is a vector instruction and, if it is, to inform a count register arranged to hold the vector length, said vector controller being further arranged to receive an issue signal and to control the execution of instructions based on this issue signal, said vector execution unit being characterized in that it comprises a local queue arranged to receive instructions from a program memory and to hold them in the local queue until a predefined condition is fulfilled, and in that the vector controller comprises queue control means arranged to control the local queue.

Type: Application

Filed: September 17, 2012

Publication date: August 28, 2014

Applicant: MEDIATEK SWEDEN AB

Inventor: Anders Nilsson
Persistent Relocatable Reset Vector for Processor

Publication number: 20140215182

Abstract: In an embodiment, an integrated circuit includes at least one processor. The processor may include a reset vector base address register configured to store a reset vector address for the processor. Responsive to a reset, the processor may be configured to capture a reset vector address on an input, updating the reset vector base address register. Upon release from reset, the processor may initiate instruction execution at the reset vector address. The integrated circuit may further include a logic circuit that is coupled to provide the reset vector address. The logic circuit may include a register that is programmable with the reset vector address. More particularly, in an embodiment, the register may be programmable via a write operation issued by the processor (e.g. a memory-mapped write operation). Accordingly, the reset vector address may be programmable in the integrated circuit, and may be changed from time to time.

Type: Application

Filed: January 25, 2013

Publication date: July 31, 2014

Applicant: Apple Inc.

Inventors: Josh P. de Cesare, Gerard R. Williams, III, Michael J. Smith, Wei-Han Lien
Vector index instruction for generating a result vector with incremental values based on a start value and an increment value

Patent number: 8793472

Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally a predicate vector with N elements, as inputs. The processor then executes the vector instruction, which causes it to generate a result vector. When generating the result vector, the processor sets an element equal to the start value plus the product of the increment value and a specified number of elements to the left of that element. If the predicate vector is received, this is done for each element of the result vector whose corresponding predicate element is active; otherwise, it is done for every element of the result vector.

Type: Grant

Filed: November 8, 2011

Date of Patent: July 29, 2014

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
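The result-vector rule in this abstract can be written as a short Python reference model. This sketch assumes that element 0 is the leftmost element, so that "the number of elements to the left" of element i is simply i. It also assumes that inactive elements keep the value of a destination vector `dest` (merging predication). Both are assumptions, as is the function name.

```python
from typing import List, Optional, Sequence


def vector_index(start: int, increment: int, n: int,
                 predicate: Optional[Sequence[bool]] = None,
                 dest: Optional[Sequence[int]] = None) -> List[int]:
    """Element i becomes start + increment * i, where i is the number of
    elements to its left (element 0 is leftmost). With a predicate, only
    active elements are written; inactive ones keep `dest` (assumed)."""
    result = list(dest) if dest is not None else [0] * n
    for i in range(n):
        if predicate is None or predicate[i]:
            result[i] = start + increment * i
    return result


print(vector_index(5, 3, 4))                                            # [5, 8, 11, 14]
print(vector_index(5, 3, 4, [True, False, True, False], [9, 9, 9, 9]))  # [5, 9, 11, 9]
```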
VECTOR ELEMENT ROTATE AND INSERT UNDER MASK INSTRUCTION

Publication number: 20140208067

Abstract: A Vector Element Rotate and Insert Under Mask instruction. Each element of a second operand of the instruction is rotated in a specified direction by a specified number of bits. For each bit in a third operand of the instruction that is set to one, the corresponding bit of the rotated elements in the second operand replaces the corresponding bit in a first operand of the instruction.

Type: Application

Filed: January 23, 2013

Publication date: July 24, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
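The per-element semantics in this abstract (rotate the second operand, then take its bits wherever the third operand has ones, and keep the first operand's bits elsewhere) can be checked with a Python reference model. This sketch rotates left only and defaults to 32-bit elements. The abstract says the direction and element size are specified by the instruction, so both choices here are assumptions, and `verim` is an illustrative name.

```python
from typing import List, Sequence


def rotate_left(x: int, r: int, width: int) -> int:
    """Rotate a `width`-bit value left by r bits."""
    r %= width
    mask = (1 << width) - 1
    return ((x << r) | (x >> (width - r))) & mask


def verim(v1: Sequence[int], v2: Sequence[int], v3: Sequence[int],
          rotate: int, width: int = 32) -> List[int]:
    """Rotate each element of v2, then copy its bits into v1 where v3 has ones."""
    result = []
    for a, b, m in zip(v1, v2, v3):
        rotated = rotate_left(b, rotate, width)
        # Keep v1's bits where the mask is 0; take the rotated bits where it is 1.
        result.append((a & ~m) | (rotated & m))
    return result


print(hex(verim([0x12345678], [0x000000FF], [0xFF000000], rotate=24)[0]))  # 0xff345678
```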
VECTOR GENERATE MASK INSTRUCTION

Publication number: 20140208066

Abstract: A Vector Generate Mask instruction. For each element in the first operand, a bit mask is generated. The mask includes bits set to a selected value starting at a position specified by a first field of the instruction and ending at a position specified by a second field of the instruction.

Type: Application

Filed: January 23, 2013

Publication date: July 24, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
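The mask described in this abstract (bits set from a start position to an end position given by two instruction fields) can be sketched in Python. Two conventions are assumed here because the abstract does not state them: bit 0 is the leftmost (most significant) bit, and when the start position is greater than the end position the range wraps around. The function names are hypothetical.

```python
from typing import List


def generate_mask(start: int, end: int, width: int) -> int:
    """Set bits start..end (inclusive) of a `width`-bit element.

    Assumed convention: bit 0 is the most significant (leftmost) bit, and
    if start > end the range wraps around from the last bit back to bit 0.
    """
    if not (0 <= start < width and 0 <= end < width):
        raise ValueError("bit positions must lie within the element")
    mask, pos = 0, start
    while True:
        mask |= 1 << (width - 1 - pos)  # convert left-to-right numbering to a shift
        if pos == end:
            return mask
        pos = (pos + 1) % width


def vector_generate_mask(start: int, end: int, width: int, n_elements: int) -> List[int]:
    """Every element of the result receives the same generated mask."""
    return [generate_mask(start, end, width)] * n_elements


print(hex(generate_mask(6, 1, 8)))  # 0xc3
```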
THREE INPUT OPERAND VECTOR ADD INSTRUCTION THAT DOES NOT RAISE ARITHMETIC FLAGS FOR CRYPTOGRAPHIC APPLICATIONS

Publication number: 20140195817

Abstract: A method is described that includes performing the following within an instruction execution pipeline implemented on a semiconductor chip: summing three input vector operands through execution of a single instruction; and, not raising any arithmetic flags even though a result of the summing creates more bits than circuitry designed to transport the summation is able to transport.

Type: Application

Filed: December 23, 2011

Publication date: July 10, 2014

Applicant: INTEL CORPORATION

Inventors: Wajdi K. Feghali, Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Gilbert M. Wolrich, Kirk S. Yap, Sean M. Gulley, Martin G. Dixon
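The arithmetic behind this abstract is lane-wise addition of three operands modulo 2^w, with any carry out of a lane simply discarded. That is the modular addition many hash and cipher rounds rely on, for example SHA-256's 32-bit additions. The Python model below is a sketch that assumes 32-bit lanes by default. It does not model how the instruction suppresses flag updates in hardware.

```python
from typing import List, Sequence


def vadd3(a: Sequence[int], b: Sequence[int], c: Sequence[int],
          width: int = 32) -> List[int]:
    """Lane-wise a + b + c modulo 2**width; carries out of a lane are dropped
    and no carry or overflow flag is produced."""
    mask = (1 << width) - 1
    return [(x + y + z) & mask for x, y, z in zip(a, b, c)]


print(vadd3([0xFFFFFFFF, 1], [1, 2], [1, 3]))  # [1, 6]
```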
MEMORY ACCESS FOR A VECTOR PROCESSOR

Publication number: 20140195776

Abstract: A method and device for memory access in processors is provided. A processor, comprising a plurality of computational units, is capable of executing a single instruction on multiple pieces of data simultaneously (SIMD). A read operation is initiated to load data from memory into the plurality of computational units (CUs) arranged into a plurality of CU groups. The memory is arranged into a plurality of memory macro-blocks each associated with a respective CU group of the plurality of CU groups. For each CU group a respective first memory address is determined and for each CU group, the data in the associated memory macro-block is accessed at the respective first memory address.

Type: Application

Filed: January 9, 2013

Publication date: July 10, 2014

Applicant: COGNIVUE CORPORATION

Inventors: Malcolm STEWART, Ali Osman ORS, Daniel LAROCHE
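The access pattern in this abstract gives each CU group its own memory macro-block and its own first address. The Python sketch below models that pattern. It assumes that CU j within a group reads the word at offset j from the group's first address. The block contents and the `grouped_load` name are hypothetical.

```python
from typing import List, Sequence


def grouped_load(macro_blocks: Sequence[Sequence[int]],
                 first_addresses: Sequence[int],
                 group_size: int) -> List[int]:
    """Load one word per computational unit (CU).

    CU group g reads only from macro_blocks[g], starting at its own
    first_addresses[g]; CU j within the group (assumed) reads offset j.
    """
    if len(first_addresses) != len(macro_blocks):
        raise ValueError("need exactly one first address per CU group")
    cu_values: List[int] = []
    for block, base in zip(macro_blocks, first_addresses):
        cu_values.extend(block[base + j] for j in range(group_size))
    return cu_values


blocks = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
print(grouped_load(blocks, [1, 3], group_size=2))  # [1, 2, 13, 14]
```

Because each group has its own first address, the groups can read from different offsets in the same SIMD load instead of being forced onto a single shared address.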
INSTRUCTION FOR FAST ZUC ALGORITHM PROCESSING

Publication number: 20140189290

Abstract: Vector instructions for performing ZUC stream cipher operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first vector instruction to perform an update to a linear feedback shift register (LFSR), and receives a second vector instruction to perform an update to a state of a finite state machine (FSM), where the FSM receives inputs from re-ordered bits of the LFSR. The execution circuitry executes the first vector instruction and the second vector instruction in a single-instruction multiple data (SIMD) pipeline.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Inventors: Gilbert M. Wolrich, Vinodh Gopal, Kirk S. Yap, Wajdi K. Feghali
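The LFSR update that the first instruction targets is defined by the ZUC specification. It has sixteen 31-bit cells s0..s15 over GF(2^31 - 1), and each work-mode step computes s16 = 2^15·s15 + 2^17·s13 + 2^21·s10 + 2^20·s4 + (1 + 2^8)·s0 mod (2^31 - 1) and then shifts the register. The Python code below is a minimal scalar reference model of one such step. It is not the patented vector instruction, whose internals the abstract does not describe. It uses the identity that multiplying by 2^k modulo 2^31 - 1 equals a 31-bit left rotation by k.

```python
from typing import List

P = (1 << 31) - 1  # ZUC's LFSR works over GF(2^31 - 1)


def mul_pow2(x: int, k: int) -> int:
    """x * 2^k mod (2^31 - 1), computed as a 31-bit left rotation by k."""
    return ((x << k) | (x >> (31 - k))) & P


def lfsr_work_step(s: List[int]) -> List[int]:
    """One work-mode LFSR update; `s` holds the 16 cells s0..s15."""
    if len(s) != 16:
        raise ValueError("ZUC's LFSR has exactly 16 cells")
    v = (mul_pow2(s[15], 15) + mul_pow2(s[13], 17) + mul_pow2(s[10], 21)
         + mul_pow2(s[4], 20) + mul_pow2(s[0], 8) + s[0]) % P
    if v == 0:
        v = P  # ZUC represents zero as 2^31 - 1
    return s[1:] + [v]  # (s1, ..., s16) -> (s0, ..., s15)


print(lfsr_work_step([1] + [0] * 15)[-1])  # 257
```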
INSTRUCTION FOR ACCELERATING SNOW 3G WIRELESS SECURITY ALGORITHM

Publication number: 20140189289

Abstract: Vector instructions for performing SNOW 3G wireless security operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first operand of the first instruction specifying a first vector register that stores a current state of a finite state machine (FSM). The execution circuitry also receives a second operand of the first instruction specifying a second vector register that stores data elements of a linear feedback shift register (LFSR) that are needed for updating the FSM. The execution circuitry executes the first instruction to produce an updated state of the FSM and an output of the FSM in a destination operand of the first instruction.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Inventors: Gilbert M. Wolrich, Vinodh Gopal, Erdinc Ozturk, Kirk S. Yap, Wajdi K. Feghali
