Vector Processor Operation Patents (Class 712/7)
  • Patent number: 10235398
    Abstract: An object of the present invention is to efficiently perform a data load process or a data store process between a memory and a storage unit in a processor. The processor includes: a plurality of storage units associated with a plurality of data elements included in a data set; and a control unit that reads the plurality of data elements stored in adjacent storage areas from a memory, in which a plurality of the data sets is stored, collectively for respective data sets, sorts the respective read data elements to a storage unit corresponding to the data element among the plurality of storage units, and writes the data elements to the respective data sets.
    Type: Grant
    Filed: April 23, 2015
    Date of Patent: March 19, 2019
    Assignee: Renesas Electronics Corporation
    Inventor: Masayuki Kimura
  • Patent number: 10026458
    Abstract: Memories and methods for performing an atomic memory operation are disclosed, including a memory having a memory store, operation logic, and a command decoder. Operation logic can be configured to receive data and perform operations thereon in accordance with internal control signals. A command decoder can be configured to receive command packets having at least a memory command portion in which a memory command is provided and a data configuration portion in which configuration information related to data associated with a command packet is provided. The command decoder is further configured to generate a command control signal based at least in part on the memory command and further configured to generate a control signal based at least in part on the configuration information.
    Type: Grant
    Filed: October 21, 2010
    Date of Patent: July 17, 2018
    Assignee: Micron Technology, Inc.
    Inventor: David Resnick
  • Patent number: 9766858
    Abstract: A data processing system supports vector operands with components representing different bit significance portions of an integer number. Processing circuitry performs a processing operation specified by a program instruction in dependence upon a number of components comprising the vector as specified by metadata for the vector.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: September 19, 2017
    Assignee: ARM Limited
    Inventors: David Raymond Lutz, Neil Burgess, Christopher Neal Hinds
  • Patent number: 9696994
    Abstract: A data processing apparatus includes a comparison unit configured to perform an element comparison process performing a comparison of a first data element at a first index in the first vector with a second data element at a second index in the second vector. A hazard vector generation unit is configured to populate a hazard vector at an index determined by the first index with a value determined by the second index. The comparison unit performs the element comparison process by iteratively comparing data elements of the first vector with each element of a subset of the second vector. It then determines the subset of the second vector as those data elements at indices in the second vector which are less than a current index of the first vector and which are greater than previously determined values of the second index for which the match condition was true.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: July 4, 2017
    Assignee: ARM LIMITED
    Inventor: Alastair David Reid
  • Patent number: 9411842
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Grant
    Filed: August 1, 2013
    Date of Patent: August 9, 2016
    Assignee: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Patent number: 9335997
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping rotate previous operation dependent upon the input vectors.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: May 10, 2016
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 9218182
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor a data element shuffle and an operation on the shuffled data elements in response to a single data element shuffle and an operation instruction that includes a destination vector register operand, a first and second source vector register operands, an immediate value, and an opcode are described.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: December 22, 2015
    Assignee: Intel Corporation
    Inventors: Igor Ermolaev, Elmoustapha Ould-Ahmed-Vall, Bret Toll, Jesus Corbal, Andrey Naraikin
  • Patent number: 9210480
    Abstract: A video processing system includes a video encoder that encodes a video stream into an independent video layer stream and a first dependent video layer stream based on motion vector data or grayscale and color data.
    Type: Grant
    Filed: December 20, 2007
    Date of Patent: December 8, 2015
    Assignee: BROADCOM CORPORATION
    Inventors: Stephen E. Gordon, Sherman Chen, Michael Dove, David Rosmann, Thomas J. Quigley, Jeyhan Karaoguz
  • Patent number: 9164767
    Abstract: In a vector processing device, a data dependence detecting unit detects a data dependence relation between a preceding instruction and a succeeding instruction which are inputted from an instruction buffer, and an instruction issuance control unit controls issuance of an instruction based on a detection result thereof. When there is a data dependence relation between the preceding instruction and the succeeding instruction, the instruction issuance control unit generates a new instruction equivalent to processing related to a vector register including the data dependence relation with the succeeding instruction in processing executed by the preceding instruction and issues the new instruction between the preceding instruction and the succeeding instruction, and thereby a data hazard can be avoided between the preceding instruction and the succeeding instruction without making a stall occur.
    Type: Grant
    Filed: December 13, 2012
    Date of Patent: October 20, 2015
    Assignee: SOCIONEXT INC.
    Inventor: Takashi Nishikawa
  • Patent number: 9160338
    Abstract: A method for implementing an adaptive interface between at least one FPGA with at least one FPGA application and at least one I/O module, which are designed as the corresponding sender side or receiver side, for connection to the FPGA, whereby a serial interface is formed between the at least one FPGA and the at least one I/O module, comprising the steps of configuring a maximum number of registers to be transmitted for each FPGA application, configuring a shared, fixed register width for all registers, setting an enable signal on the sender side for the registers to be transmitted out of the maximum number of registers to be transmitted, transmitting the enable signal from the sender side to the receiver side, and transmitting the registers, for which the enable signal is set, from the sender side to the receiver side.
    Type: Grant
    Filed: May 12, 2014
    Date of Patent: October 13, 2015
    Assignee: dSPACE digital signal processing and control engineering GmbH
    Inventors: Dirk Hasse, Robert Polnau
  • Patent number: 9092256
    Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: July 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9092257
    Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: July 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9047094
    Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment according to the invention, there is provided a computer processor, the processor comprising: a decode unit for decoding instruction packets fetched from a memory holding a sequence of instruction packets; and first and second processing channels, each channel comprising a plurality of functional units, wherein the first processing channel is capable of performing control operations and comprises a control register file having a relatively narrower bit width, and the second processing channel is capable of performing data processing operations at least one input of which is a vector and comprises a data register file having a relatively wider bit width.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: June 2, 2015
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Publication number: 20150149744
    Abstract: A data processing apparatus and method are provided for processing execution threads, where each execution thread specifies at least one instruction. The data processing apparatus has a vector processing unit providing a plurality M of lanes of parallel processing, within each lane the vector processing unit being configured to perform a processing operation on a data element input to that lane for each of one or more input operands. A vector instruction is received that is specified by a group of the execution threads, that vector instruction identifying an associated processing operation and also providing an indication of the data elements of each input operand that are to be subjected to that associated processing operation. Vector merge circuitry then determines, based on that information, a required number of lanes of parallel processing for performing the associated processing operation.
    Type: Application
    Filed: October 2, 2014
    Publication date: May 28, 2015
    Inventor: Ronny PEDERSEN
  • Publication number: 20150143079
    Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision correlation/covariance vector processing operations with reduced sample re-fetching and/or power consumption are disclosed. The VPEs disclosed herein are configured to provide correlation/covariance vector processing operations, such as code division multiple access (CDMA) correlation/covariance vector processing operations as a non-limiting example. A tapped-delay line(s) is included in the data flow paths between memory and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide an input vector data sample set to execution units for performing correlation/covariance vector processing operations.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
  • Publication number: 20150143076
    Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Publication number: 20150143077
    Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Publication number: 20150143080
    Abstract: Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.
    Type: Application
    Filed: December 9, 2014
    Publication date: May 21, 2015
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz
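
The element-by-element summation with end-around carry described in publication 20150143080 can be illustrated with a minimal sketch. The function name, the 32-bit element width, and the choice to fold the carry back into the lowest bit position are assumptions made for illustration; the abstract does not fix any of them.

```python
def end_around_carry_sum(elements, width=32):
    """Add elements one by one, folding any carry out of bit `width` back in.

    Hypothetical helper: `width` and the fold-into-bit-0 behavior are assumed.
    """
    mask = (1 << width) - 1
    total = 0
    for element in elements:
        total += element & mask
        # End-around carry: add the carry out of the top bit to the low end.
        total = (total & mask) + (total >> width)
    return total


if __name__ == "__main__":
    print(hex(end_around_carry_sum([0xFFFFFFFF, 0x00000001])))
```

Each step leaves the running total within `width` bits, so the loop never needs a second fold.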
  • Publication number: 20150143078
    Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption are disclosed. Related vector processor systems and methods are also disclosed. The VPEs are configured to provide filter vector processing operations. To minimize re-fetching of input vector data samples from memory to reduce power consumption, a tapped-delay line(s) is included in the data flow paths between a vector data file and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide input vector data sample sets to execution units for performing filter vector processing operations. The tapped-delay line(s) is also configured to shift the input vector data sample set for filter delay taps and provide the shifted input vector data sample set to execution units, so the shifted input vector data sample set does not have to be re-fetched during filter vector processing operations.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
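
As a rough software model of the tapped-delay line idea in publication 20150143078, the sketch below fetches the input samples once, then shifts the delay line by one position per filter tap instead of re-fetching. The function name, the lane count parameter, and the specific sum computed (each output lane accumulates `taps[k] * samples[lane + k]`) are assumptions for illustration, not the circuit described in the application.

```python
from collections import deque


def filter_with_delay_line(samples, taps, lanes):
    """Accumulate taps[k] * samples[lane + k] for each lane, fetching samples once.

    Hypothetical model: `lanes` stands in for the number of execution units.
    """
    needed = lanes + len(taps) - 1
    if len(samples) < needed:
        raise ValueError("not enough input samples for the requested lanes")
    line = deque(samples[:needed])  # the only fetch from "memory"
    acc = [0] * lanes
    for coeff in taps:
        for lane in range(lanes):
            acc[lane] += coeff * line[lane]
        # Shift the delay line by one sample for the next tap.
        line.popleft()
        line.append(0)
    return acc


if __name__ == "__main__":
    print(filter_with_delay_line([1, 2, 3, 4, 5], [1, 1, 1], lanes=3))
```

The point of the model is that the inner loop always reads from `line`, never from `samples`, after the initial load.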
  • Publication number: 20150121036
    Abstract: A data processing system 2 includes a single instruction multiple data register file 12 and single instruction multiple data processing circuitry 14. The single instruction multiple data processing circuitry 14 supports execution of cryptographic processing instructions for performing parts of a hash algorithm. The operands are stored within the single instruction multiple data register file 12. The cryptographic support instructions do not follow normal lane-based processing and generate output operands in which the different portions of the output operand depend upon multiple different elements within the input operand.
    Type: Application
    Filed: December 30, 2014
    Publication date: April 30, 2015
    Inventors: Matthew James HORSNELL, Richard Roy GRISENTHWAITE, Stuart David BILES, Daniel KERSHAW
  • Publication number: 20150113246
    Abstract: Instructions and logic provide conversions between a mask register and a general purpose register or memory. In some embodiments, responsive to an instruction specifying a destination operand, a mask length corresponding to a number of mask data fields, and a source operand, values are read from data fields in the source operand, corresponding to the specified mask length, and stored to corresponding data fields in the destination operand specified by the instruction, wherein one of the source or the destination operands is a mask register. Values indicative of masked vector elements may be stored to any data fields in the destination operand other than the number of data fields corresponding to the specified mask length. For some embodiments, the other one of the source or the destination operands may be a general purpose register or a memory location.
    Type: Application
    Filed: November 25, 2011
    Publication date: April 23, 2015
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Robert Valentine, Bret L. Toll, Mark J. Charney
  • Patent number: 9009447
    Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.
    Type: Grant
    Filed: July 18, 2011
    Date of Patent: April 14, 2015
    Assignee: Oracle International Corporation
    Inventor: Darryl J. Gove
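
The loop structure described in patent 9009447 can be sketched in software as follows. This is a minimal model, assuming an 8-byte vector width and NUL-terminated byte strings; `VLEN`, `load_chunk`, `compare_and_check_null`, and `strings_equal` are hypothetical names, and the inner Python loop merely stands in for what the patent describes as a single instruction.

```python
VLEN = 8  # assumed vector width in bytes (hypothetical)


def load_chunk(data: bytes, offset: int) -> bytes:
    """Model a vector load of VLEN bytes, padding past the end with NULs."""
    chunk = data[offset:offset + VLEN]
    return chunk + b"\x00" * (VLEN - len(chunk))


def compare_and_check_null(va: bytes, vb: bytes):
    """Return the first lane that differs or holds NUL in va, else None."""
    for lane in range(VLEN):
        if va[lane] != vb[lane] or va[lane] == 0:
            return lane
    return None


def strings_equal(a: bytes, b: bytes) -> bool:
    offset = 0
    while True:
        va, vb = load_chunk(a, offset), load_chunk(b, offset)
        lane = compare_and_check_null(va, vb)
        if lane is not None:
            # Further check after the loop stops: the strings are equal
            # only if the stopping lane holds the same byte (NUL) in both.
            return va[lane] == vb[lane]
        offset += VLEN  # load the next portions of the strings


if __name__ == "__main__":
    print(strings_equal(b"0123456789\x00", b"0123456789\x00"))
```

Padding short loads with NUL bytes is a simplification of this sketch; it guarantees termination even if an input lacks a terminator.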
  • Publication number: 20150100755
    Abstract: A data processing apparatus and a method of controlling performance of speculative vector operations are provided. The apparatus comprises processing circuitry for performing a sequence of speculative vector operations on vector operands, each vector operand comprising a plurality of vector elements, and speculation control circuitry for maintaining a speculation width indication indicating the number of vector elements of each vector operand to be subjected to the speculative vector operations. The speculation width indication is set to an initial value prior to performance of the sequence of speculative vector operations. The processing circuitry generates progress indications during performance of the sequence of speculative vector operations, and the speculation control circuitry detects, with reference to the progress indications and speculation reduction criteria, presence of a speculation reduction condition.
    Type: Application
    Filed: August 18, 2014
    Publication date: April 9, 2015
    Inventors: Alastair David REID, Daniel KERSHAW
  • Patent number: 8996845
    Abstract: A vector compare-and-exchange operation is performed by: decoding by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.
    Type: Grant
    Filed: December 22, 2009
    Date of Patent: March 31, 2015
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Andrew T. Forsyth
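
The compare-and-exchange behavior summarized for patent 8996845 can be modeled with a short sketch. It assumes the exchange is decided independently per lane and that the operation reports which lanes matched; the abstract leaves both points open, and the function name is hypothetical.

```python
def vector_compare_exchange(first, second, third):
    """Per lane: if first[i] == second[i], replace first[i] with third[i].

    Assumption: lanes are handled independently. Returns a list of booleans
    marking the lanes where the exchange took place.
    """
    if not (len(first) == len(second) == len(third)):
        raise ValueError("all three operands must have the same length")
    matched = []
    for lane in range(len(first)):
        hit = first[lane] == second[lane]
        if hit:
            first[lane] = third[lane]
        matched.append(hit)
    return matched


if __name__ == "__main__":
    dest = [1, 2, 3, 4]
    print(vector_compare_exchange(dest, [1, 0, 3, 0], [9, 9, 9, 9]), dest)
```

This sequential model says nothing about atomicity, which a hardware implementation would have to provide.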
  • Publication number: 20150089190
    Abstract: In an embodiment, a processor includes a register attribute tracker configured to track one or more attributes corresponding to registers. The register attribute tracker may track the attributes associated with the registers when those registers are used as output registers of instructions that explicitly define the attributes and, if the register attribute tracker has a tracked attribute associated with an input register of an instruction that does not explicitly define the attribute, the register attribute tracker may annotate the instruction with an attribute and/or associate an attribute with the output register of the instruction in the register attribute tracker.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20150089191
    Abstract: In an embodiment, a processor includes an issue circuit configured to issue instruction operations for execution. The issue circuit may be configured to monitor the source operands of the instruction operations, and to issue instruction operations for which the source operands (including predicate operands, as appropriate) are resolved. Additionally, the issue circuit may be configured to detect a null predicate that indicates that none of the vector elements will be modified by a corresponding instruction operation. The issue circuit may be configured to issue the corresponding instruction operation with the null predicate even if other source operands are not yet resolved.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20150089192
    Abstract: In an embodiment, a processor may be configured to dynamically infer one or more attributes of input and/or output registers of an instruction, given the attributes corresponding to at least one input register. The inference may be made at the issue circuit/stage of the processor, for those registers that do not have attribute information at the issue circuit/stage. In an embodiment, the processor may also include a register attribute tracker configured to track attributes of registers prior to the issue stage of the processor pipeline. The processor may feed back, to the register attribute tracker, inferred attributes and the register addresses of the registers to which the inferred attributes apply. The register attribute tracker may be configured to associate the inferred attribute with the identified register. The register attribute tracker may also be configured to infer input register attributes from other input register attributes.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8984260
    Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20150074374
    Abstract: An asynchronous processing system comprising an asynchronous scalar processor and an asynchronous vector processor coupled to the scalar processor. The asynchronous scalar processor is configured to perform processing functions on input data and to output instructions. The asynchronous vector processor is configured to perform processing functions in response to a very long instruction word (VLIW) received from the scalar processor. The VLIW comprises a first portion and a second portion, at least the first portion comprising a vector instruction.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Qifan Zhang, Wuxian Shi, Yiqun Ge, Tao Huang, Wen Tong
  • Publication number: 20150074373
    Abstract: Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.
    Type: Application
    Filed: June 2, 2012
    Publication date: March 12, 2015
    Inventors: Zeev Sperber, Robert Valentine, Shlomo Raikin, Stanislav Shwartsman, Gal Ofir, Igor Yanover, Guy Patkin, Levy Ofer
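
The mask-driven scatter described in publication 20150074373 can be approximated by the sketch below. It models only the architectural effect: an address is generated for each index whose mask element has the first value (taken here to be 1), the data element is stored, and the mask element is changed to the second value (taken here to be 0) once the store completes. The buffer allocation and finite state machine are omitted, and the function name, the `base` and `scale` parameters, and the list-based memory are assumptions.

```python
def masked_scatter(memory, base, indices, mask, data, scale=1):
    """Store data[i] at base + indices[i] * scale for every lane with mask[i] == 1.

    Hypothetical model: clears each mask element after its store completes,
    so a restarted operation would skip lanes that are already done.
    """
    for lane in range(len(indices)):
        if mask[lane] == 1:
            address = base + indices[lane] * scale
            memory[address] = data[lane]
            mask[lane] = 0  # mark this lane's store as complete
    return memory


if __name__ == "__main__":
    mem = [0] * 8
    lane_mask = [1, 0, 1, 1]
    masked_scatter(mem, 1, [0, 2, 4, 6], lane_mask, [10, 20, 30, 40])
    print(mem, lane_mask)
```

Clearing the mask lane by lane is what allows a partially completed scatter to resume without repeating stores.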
  • Patent number: 8977835
    Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.
    Type: Grant
    Filed: November 14, 2013
    Date of Patent: March 10, 2015
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
  • Patent number: 8972698
    Abstract: A processing core implemented on a semiconductor chip is described having first execution unit logic circuitry that includes first comparison circuitry to compare each element in a first input vector against every element of a second input vector. The processing core also has second execution logic circuitry that includes second comparison circuitry to compare a first input value against every data element of an input vector.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: March 3, 2015
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mark J. Charney, Yen-Kuang Chen, Jesus Corbal, Andrew T. Forsyth, Milind B. Girkar, Jonathan C. Hall, Hideki Ido, Robert Valentine, Jeffrey Wiedemeier
  • Publication number: 20150046673
    Abstract: A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.
    Type: Application
    Filed: August 12, 2014
    Publication date: February 12, 2015
    Inventors: Brendan BARRY, Fergal CONNOR, Martin O'RIORDAN, David MOLONEY, Sean POWER
  • Publication number: 20150046675
    Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
    Type: Application
    Filed: August 12, 2014
    Publication date: February 12, 2015
    Inventors: Brendan BARRY, Richard RICHMOND, Fergal CONNOR, David MOLONEY
  • Publication number: 20150046674
    Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
    Type: Application
    Filed: August 12, 2014
    Publication date: February 12, 2015
    Inventors: Brendan BARRY, Richard RICHMOND, Fergal CONNOR, David MOLONEY
  • Patent number: 8949575
    Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.
    Type: Grant
    Filed: December 14, 2011
    Date of Patent: February 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
  • Publication number: 20140372728
    Abstract: A vector execution unit for use in a digital signal processor enables a new set of instructions. The unit comprises a first input port for receiving at least a first input data vector, an instruction decoder, a vector output port, and at least one data-path. The instruction decoding unit is arranged to control the data-path to perform a comparison related to the first input data vector, and the processor comprises an integer port arranged to output the result of the comparison in the form of a decision vector to a memory unit or a functional unit in the digital signal processor. Alternatively or in addition, the integer port is also arranged to receive a decision vector of integer data, and the instruction decoding unit is arranged to control the data-path to process the first input data in dependence of the value of the integer data.
    Type: Application
    Filed: November 28, 2012
    Publication date: December 18, 2014
    Applicant: Media Tek Sweden AB
    Inventors: Anders Nilsson, Eric Tell
  • Patent number: 8914613
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: August 26, 2011
    Date of Patent: December 16, 2014
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
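    The in-lane shuffle described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `in_lane_shuffle`, the four-element lane, and the 8-bit control word holding four 2-bit selectors applied identically to every lane are illustrative choices, not details taken from the patent.

```python
LANE_ELEMS = 4  # assumed lane size: four data elements per lane


def in_lane_shuffle(src, control):
    """Shuffle elements inside each lane; no element crosses a lane boundary.

    `control` is an assumed 8-bit value holding four 2-bit selectors.
    Selector i chooses which element of a source lane is copied to
    position i of the corresponding destination lane.
    """
    if len(src) % LANE_ELEMS != 0:
        raise ValueError("source length must be a whole number of lanes")
    selectors = [(control >> (2 * i)) & 0b11 for i in range(LANE_ELEMS)]
    dst = []
    for base in range(0, len(src), LANE_ELEMS):
        lane = src[base:base + LANE_ELEMS]
        # Copy the selected elements into the destination lane.
        dst.extend(lane[s] for s in selectors)
    return dst
```

    With two lanes and a control word of `0b00011011`, each lane is reversed independently, so elements of the first lane never appear in the second.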
  • Publication number: 20140359252
    Abstract: A multicore processor is achieved by a processor assembly, comprising a first processor having a first core and at least a first and a second unit, each being selected from the group of vector execution units, memory units and accelerators, said first core and first and second units being interconnected by a first network, and a second processor having a second core wherein the first core is arranged to enable the second core to control at least one of the units in the first processor. Each processor generally comprises a combination of execution units, memory units and accelerators, which may be controlled and/or accessed by units in the other processor.
    Type: Application
    Filed: November 28, 2012
    Publication date: December 4, 2014
    Applicant: Media Tek Sweden AB
    Inventors: Anders Nilsson, Eric Tell
  • Publication number: 20140344549
    Abstract: The invention relates to a digital signal processor comprising a processor core, an integer execution unit and a number of vector execution units, said digital signal processor comprising a program memory arranged to hold instructions for the execution units and issue logic for issuing instructions. The digital signal processor comprises an issue control unit for selecting at least two execution units that are to receive and execute the same instruction at the same time, and logic for sending the instruction to said at least two execution units.
    Type: Application
    Filed: November 28, 2012
    Publication date: November 20, 2014
    Applicant: Media Tek Sweden AB
    Inventors: Anders Nilsson, Eric Tell
  • Patent number: 8892853
    Abstract: An image processing system including a vector processor and a memory adapted for attaching to the vector processor. The memory is adapted to store multiple image frames. The vector processor includes an address generator operatively attached to the memory to access the memory. The address generator is adapted for calculating addresses of the memory over the multiple image frames. The addresses may be calculated over the image frames based upon an image parameter. The image parameter may specify which of the image frames are processed simultaneously. A scalar processor may be attached to the vector processor. The scalar processor provides the image parameter(s) to the address generator for address calculation over the multiple image frames. An input register may be attached to the vector processor. The input register may be adapted to receive a very long instruction word (VLIW) instruction.
    Type: Grant
    Filed: June 10, 2010
    Date of Patent: November 18, 2014
    Assignee: Mobileye Technologies Limited
    Inventors: Yosef Kreinin, Gil Dogon, Emmanuel Sixsou, Yosi Arbeli, Mois Navon, Roman Sajman
  • Patent number: 8892849
    Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.
    Type: Grant
    Filed: October 15, 2009
    Date of Patent: November 18, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Erdem Hokenek, Mayan Moudgill, Michael J. Schulte, C. John Glossner
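    The issuance rule in this abstract (one designated thread per clock cycle, with the designated thread varying according to a sequence) can be modelled in a few lines. This is a sketch only: `simulate_issue`, the per-thread instruction queues, and the choice to issue a single instruction per turn are assumptions made for illustration, and pipelining is not modelled.

```python
from collections import deque


def simulate_issue(sequence, pending, num_cycles):
    """Return a trace of (cycle, thread, instruction) tuples.

    `sequence` is the assumed instruction issuance sequence of thread ids.
    `pending` maps each thread id to its instructions in program order.
    On each cycle only the designated thread may issue; if it has nothing
    left to issue, the slot is recorded as None.
    """
    queues = {thread: deque(instrs) for thread, instrs in pending.items()}
    trace = []
    for cycle in range(num_cycles):
        thread = sequence[cycle % len(sequence)]
        queue = queues.get(thread)
        instr = queue.popleft() if queue else None
        trace.append((cycle, thread, instr))
    return trace
```

    For example, `simulate_issue([0, 1], {0: ["a", "b"], 1: ["x"]}, 4)` alternates between threads 0 and 1 and leaves the last slot empty once thread 1 has run out of instructions.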
  • Patent number: 8874879
    Abstract: A vector processing circuit includes a vector register file including a plurality of array elements, a command issuance control circuit, and a plurality of pipeline arithmetic units. Each pipeline arithmetic unit performs arithmetic processing of data stored in the array elements indicated as a source by one command in parts through a plurality of cycles and stores the result in the array elements indicated as a destination by the one command through a plurality of cycles. When data word length of a preceding command is longer than that of a subsequent command, the command issuance control circuit changes data sizes of the array elements in accordance with data word length of the command and determines whether there is register interference between the array element to be processed at a non-head cycle of the preceding command, and the array element to be processed at a head cycle of the subsequent command.
    Type: Grant
    Filed: October 24, 2011
    Date of Patent: October 28, 2014
    Assignee: Fujitsu Limited
    Inventors: Yi Ge, Yoshimasa Takebe, Hiromasa Takahashi
  • Patent number: 8868885
    Abstract: A device, system and method for processing program instructions, for example, to execute intra vector operations. A fetch unit may receive a program instruction defining different operations on data elements stored at the same vector memory address. A processor may include different types of execution units each executing a different one of a predetermined plurality of elemental instructions. Each program instruction may be a combination of one or more of the elemental instructions. The processor may receive a vector of data elements stored non-consecutively at the same vector memory address to be processed by a same one of the elemental instructions and a vector of configuration values independently associated with executing the same elemental instruction on the non-consecutive data elements. At least two configuration values may be different to implement different operations by executing the same elemental instruction using the different configuration values on the vector of non-consecutive data elements.
    Type: Grant
    Filed: November 18, 2010
    Date of Patent: October 21, 2014
    Assignee: Ceva D.S.P. Ltd.
    Inventors: Yaakov Dekter, Michael Boukaya, Shai Shpigelblat, Moshe Steinberg
  • Patent number: 8856492
    Abstract: The present application relates to a method for processing data in a vector processor. The present application relates also to a vector processor for performing said method and a cellular communication device comprising said vector processor. The method for processing data in a vector processor comprises executing segmented operations on a segment of a vector for generating results, collecting the results of the segmented operations, and delivering the results in a result vector in such a way that subsequent operations remain processing in vector mode.
    Type: Grant
    Filed: May 29, 2009
    Date of Patent: October 7, 2014
    Assignee: NXP B.V.
    Inventors: Mahima Smriti, Jean-Paul Charles Francois Hubert Smeets, Willem Egbert Hendrik Kloosterhuis
  • Publication number: 20140289495
    Abstract: Systems, apparatuses and methods for utilizing enhanced predicate registers which specify the element width and which elements are to be processed. The predicate size is dynamic, depending on the contents of the enhanced predicate register used for an instruction rather than being a static quality of a specific instruction. Specifying the element size in the enhanced predicate registers results in fewer instructions in an instruction set.
    Type: Application
    Filed: March 18, 2014
    Publication date: September 25, 2014
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
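    The idea that element width comes from the predicate rather than from the instruction can be sketched as a single add routine whose behaviour changes with the predicate contents. The `Predicate` class, its field names, the little-endian layout, and the choice to leave inactive elements unchanged in the destination are all assumptions for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Predicate:
    elem_bytes: int     # element width in bytes, carried by the predicate
    active: List[bool]  # one flag per element: True means "process it"


def predicated_add(a: bytes, b: bytes, pred: Predicate, dest: bytes) -> bytes:
    """Add a and b element-wise using the width and mask held in `pred`.

    One routine serves every element width because the width is read from
    the predicate. Inactive elements keep the old destination contents
    (an assumed merging policy).
    """
    width = pred.elem_bytes
    out = bytearray(dest)
    for i, enabled in enumerate(pred.active):
        if not enabled:
            continue
        lo, hi = i * width, (i + 1) * width
        x = int.from_bytes(a[lo:hi], "little")
        y = int.from_bytes(b[lo:hi], "little")
        # Wrap around at the element width, as fixed-width hardware would.
        out[lo:hi] = ((x + y) % (1 << (8 * width))).to_bytes(width, "little")
    return bytes(out)
```

    Calling the same routine on the same register contents with a 1-byte predicate and then a 2-byte predicate gives different results, because a carry out of the low byte is kept only in the 2-byte case.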
  • Publication number: 20140281371
    Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Inventors: HARIHARAN THANTRY, MANI AZIMI
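    The sequence of steps in this abstract amounts to a one-bit left shift of a multi-lane register in which the bit leaving each lane is carried into the next lane. The sketch below follows those steps on plain integers. The function name, the 8-bit lane width, and the convention that `lanes[0]` is the least significant lane are assumptions for illustration.

```python
def shift_left_across_lanes(lanes, lane_bits=8):
    """Shift a multi-lane value left by one bit, carrying between lanes.

    `lanes[0]` is assumed to be the least significant lane.
    """
    lane_mask = (1 << lane_bits) - 1
    msb = 1 << (lane_bits - 1)

    # Step 1: copy each lane's MSB into a mask, one mask bit per lane.
    mask = 0
    for i, lane in enumerate(lanes):
        if lane & msb:
            mask |= 1 << i

    # Step 2: shift the mask as a scalar so lane i's MSB lines up with lane i+1.
    mask = (mask << 1) & ((1 << len(lanes)) - 1)

    # Step 3: shift every lane of the first register; each lane's MSB is lost.
    shifted = [(lane << 1) & lane_mask for lane in lanes]

    # Step 4: use the mask to fill the LSB of each lane of a second register.
    carries = [(mask >> i) & 1 for i in range(len(lanes))]

    # Step 5: OR the second register into the first.
    return [s | c for s, c in zip(shifted, carries)]
```

    The result matches shifting the concatenated lanes as one wide integer and truncating to the register width, which is the property a bit-parallel matcher such as Shift-And relies on when the pattern is longer than one lane.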
  • Publication number: 20140281373
    Abstract: A digital signal processor has a vector execution unit arranged to execute instructions on multiple data in the form of a vector, comprising a local queue arranged to receive instructions from a program memory and to hold them in the local queue until a predefined condition is fulfilled. The local queue is arranged to receive a sequence of instructions at a time from the program memory and to store the last N instructions, N being an integer. A vector controller in the vector execution unit comprises queue control means arranged to make the local queue repeat a sequence of M instructions stored in the local queue, M being an integer less than or equal to N, a number K of times. This reduces the time the vector execution unit is kept waiting because of IDLE commands in the program memory.
    Type: Application
    Filed: September 17, 2012
    Publication date: September 18, 2014
    Applicant: MediaTek Sweden AB
    Inventor: Anders Nilsson
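    The local queue behaviour (keep the last N instructions, then replay M of them K times) can be sketched as follows. The class and method names are hypothetical, and the assumption that the repeated sequence is the most recent M instructions is an illustrative choice, since the abstract only says the sequence is stored in the queue.

```python
from collections import deque


class LocalQueue:
    """Software model of a queue that retains the last N instructions."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def receive(self, instruction):
        """Store one instruction arriving from program memory."""
        self.buffer.append(instruction)

    def repeat(self, m, k):
        """Return the most recent m instructions replayed k times."""
        if not 1 <= m <= len(self.buffer):
            raise ValueError("m must be between 1 and the number of stored instructions")
        window = list(self.buffer)[-m:]
        return window * k
```

    With a capacity of 4, receiving five instructions `a` to `e` leaves `b, c, d, e` in the queue, and `repeat(2, 3)` replays `d, e` three times without fetching from program memory again.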
  • Publication number: 20140281372
    Abstract: An example method for placing one or more element data values into an output vector includes identifying a vertical permute control vector including a plurality of elements, each element of the plurality of elements including a register address. The method also includes for each element of the plurality of elements, reading a register address from the vertical permute control vector. The method further includes retrieving a plurality of element data values based on the register address. The method also includes identifying a horizontal permute control vector including a set of addresses corresponding to an output vector. The method further includes placing at least some of the retrieved element data values of the plurality of element data values into the output vector based on the set of addresses in the horizontal permute control vector.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Ajay Anant Ingle, David J. Hoyle, Marc M. Hoffman
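    The two-stage permute in this abstract can be modelled as a vertical selection across registers followed by a horizontal placement into the output vector. This is a sketch under assumptions: the function name, the dictionary used as a register file, the reading that element i of the vertical control picks element i of the addressed register, and the reading that the horizontal control holds destination positions are all illustrative interpretations.

```python
def two_stage_permute(register_file, vertical_ctrl, horizontal_ctrl):
    """Apply a vertical permute, then a horizontal permute.

    `register_file` maps a register address to a list of element values.
    `vertical_ctrl[i]` names the register that supplies element position i.
    `horizontal_ctrl[i]` is the output position that receives gathered[i].
    """
    # Vertical stage: for each element position, read from the chosen register.
    gathered = [register_file[reg][pos] for pos, reg in enumerate(vertical_ctrl)]

    # Horizontal stage: place each gathered value at its output address.
    output = [None] * len(gathered)
    for i, dest in enumerate(horizontal_ctrl):
        output[dest] = gathered[i]
    return output
```

    For instance, with registers `{0: [10, 11, 12, 13], 1: [20, 21, 22, 23]}`, a vertical control of `[0, 1, 1, 0]` gathers `[10, 21, 22, 13]`, and a horizontal control of `[3, 2, 1, 0]` then reverses that into the output vector.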
  • Publication number: 20140281370
    Abstract: Embodiments disclosed herein include vector processing engines (VPEs) having programmable data path configurations for providing multi-mode vector processing. Related vector processors, systems, and methods are also disclosed. The VPEs include a vector processing stage(s) configured to process vector data according to a vector instruction executed in the vector processing stage. Each vector processing stage includes vector processing blocks each configured to process vector data based on the vector instruction being executed. The vector processing blocks are capable of providing different vector operations for different types of vector instructions based on data path configurations. Data paths of the vector processing blocks are programmable to be reprogrammable to process vector data differently according to the particular vector instruction being executed.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM Incorporated
    Inventor: Raheel Khan