Vector Processor Operation Patents (Class 712/7)
  • Patent number: 9092257
    Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: July 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9047094
    Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment according to the invention, there is provided a computer processor, the processor comprising: a decode unit for decoding instruction packets fetched from a memory holding a sequence of instruction packets; and first and second processing channels, each channel comprising a plurality of functional units, wherein the first processing channel is capable of performing control operations and comprises a control register file having a relatively narrower bit width, and the second processing channel is capable of performing data processing operations at least one input of which is a vector and comprises a data register file having a relatively wider bit width.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: June 2, 2015
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Publication number: 20150149744
    Abstract: A data processing apparatus and method are provided for processing execution threads, where each execution thread specifies at least one instruction. The data processing apparatus has a vector processing unit providing a plurality M of lanes of parallel processing, within each lane the vector processing unit being configured to perform a processing operation on a data element input to that lane for each of one or more input operands. A vector instruction is received that is specified by a group of the execution threads, that vector instruction identifying an associated processing operation and also providing an indication of the data elements of each input operand that are to be subjected to that associated processing operation. Vector merge circuitry then determines, based on that information, a required number of lanes of parallel processing for performing the associated processing operation.
    Type: Application
    Filed: October 2, 2014
    Publication date: May 28, 2015
    Inventor: Ronny PEDERSEN
  • Publication number: 20150143077
    Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Publication number: 20150143076
    Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Publication number: 20150143079
    Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision correlation/covariance vector processing operations with reduced sample re-fetching and/or power consumption are disclosed. The VPEs disclosed herein are configured to provide correlation/covariance vector processing operations, such as code division multiple access (CDMA) correlation/covariance vector processing operations as a non-limiting example. A tapped-delay line(s) is included in the data flow paths between memory and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide an input vector data sample set to execution units for performing correlation/covariance vector processing operations.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
  • Publication number: 20150143080
    Abstract: Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.
    Type: Application
    Filed: December 9, 2014
    Publication date: May 21, 2015
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz
  • Publication number: 20150143078
    Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption are disclosed. Related vector processor systems and methods are also disclosed. The VPEs are configured to provide filter vector processing operations. To minimize re-fetching of input vector data samples from memory to reduce power consumption, a tapped-delay line(s) is included in the data flow paths between a vector data file and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide input vector data sample sets to execution units for performing filter vector processing operations. The tapped-delay line(s) is also configured to shift the input vector data sample set for filter delay taps and provide the shifted input vector data sample set to execution units, so the shifted input vector data sample set does not have to be re-fetched during filter vector processing operations.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
  • Publication number: 20150121036
    Abstract: A data processing system 2 includes a single instruction multiple data register file 12 and single instruction multiple data processing circuitry 14. The single instruction multiple data processing circuitry 14 supports execution of cryptographic processing instructions for performing parts of a hash algorithm. The operands are stored within the single instruction multiple data register file 12. The cryptographic support instructions do not follow normal lane-based processing and generate output operands in which the different portions of the output operand depend upon multiple different elements within the input operand.
    Type: Application
    Filed: December 30, 2014
    Publication date: April 30, 2015
    Inventors: Matthew James HORSNELL, Richard Roy GRISENTHWAITE, Stuart David BILES, Daniel KERSHAW
  • Publication number: 20150113246
    Abstract: Instructions and logic provide conversions between a mask register and a general purpose register or memory. In some embodiments, responsive to an instruction specifying a destination operand, a mask length corresponding to a number of mask data fields, and a source operand, values are read from data fields in the source operand, corresponding to the specified mask length, and stored to corresponding data fields in the destination operand specified by the instruction, wherein one of the source or the destination operands is a mask register. Values indicative of masked vector elements may be stored to any data fields in the destination operand other than the number of data fields corresponding to the specified mask length. For some embodiments, the other one of the source or the destination operands may be a general purpose register or a memory location.
    Type: Application
    Filed: November 25, 2011
    Publication date: April 23, 2015
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Robert Valentine, Bret L. Toll, Mark J. Charney
  • Patent number: 9009447
    Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.
    Type: Grant
    Filed: July 18, 2011
    Date of Patent: April 14, 2015
    Assignee: Oracle International Corporation
    Inventor: Darryl J. Gove
  • Publication number: 20150100755
    Abstract: A data processing apparatus and a method of controlling performance of speculative vector operations are provided. The apparatus comprises processing circuitry for performing a sequence of speculative vector operations on vector operands, each vector operand comprising a plurality of vector elements, and speculation control circuitry for maintaining a speculation width indication indicating the number of vector elements of each vector operand to be subjected to the speculative vector operations. The speculation width indication is set to an initial value prior to performance of the sequence of speculative vector operations. The processing circuitry generates progress indications during performance of the sequence of speculative vector operations, and the speculation control circuitry detects, with reference to the progress indications and speculation reduction criteria, presence of a speculation reduction condition.
    Type: Application
    Filed: August 18, 2014
    Publication date: April 9, 2015
    Inventors: Alastair David REID, Daniel KERSHAW
  • Patent number: 8996845
    Abstract: A vector compare-and-exchange operation is performed by: decoding by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.
    Type: Grant
    Filed: December 22, 2009
    Date of Patent: March 31, 2015
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Andrew T. Forsyth
  • Publication number: 20150089190
    Abstract: In an embodiment, a processor includes a register attribute tracker configured to track one or more attributes corresponding to registers. The register attribute tracker may track the attributes associated with the registers when those registers are used as output registers of instructions that explicitly define the attributes and, if the register attribute tracker has a tracked attribute associated with an input register of an instruction that does not explicitly define the attribute, the register attribute tracker may annotate the instruction with an attribute and/or associate an attribute with the output register of the instruction in the register attribute tracker.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20150089191
    Abstract: In an embodiment, a processor includes an issue circuit configured to issue instruction operations for execution. The issue circuit may be configured to monitor the source operands of the instruction operations, and to issue instruction operations for which the source operands (including predicate operands, as appropriate) are resolved. Additionally, the issue circuit may be configured to detect a null predicate that indicates that none of the vector elements will be modified by a corresponding instruction operation. The issue circuit may be configured to issue the corresponding instruction operation with the null predicate even if other source operands are not yet resolved.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20150089192
    Abstract: In an embodiment, a processor may be configured to dynamically infer one or more attributes of input and/or output registers of an instruction, given the attributes corresponding to at least one input register. The inference may be made at the issue circuit/stage of the processor, for those registers that do not have attribute information at the issue circuit/stage. In an embodiment, the processor may also include a register attribute tracker configured to track attributes of registers prior to the issue stage of the processor pipeline. The processor may feed back, to the register attribute tracker, inferred attributes and the register addresses of the registers to which the inferred attributes apply. The register attribute tracker may be configured to associate the inferred attribute with the identified register. The register attribute tracker may also be configured to infer input register attributes from other input register attributes.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8984260
    Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20150074373
    Abstract: Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiments of an apparatus may comprise decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.
    Type: Application
    Filed: June 2, 2012
    Publication date: March 12, 2015
    Inventors: Zeev Sperber, Robert Valentine, Shlomo Raikin, Stanislav Shwartsman, Gal Ofir, Igor Yanover, Guy Patkin, Levy Ofer
  • Publication number: 20150074374
    Abstract: An asynchronous processing system comprising an asynchronous scalar processor and an asynchronous vector processor coupled to the scalar processor. The asynchronous scalar processor is configured to perform processing functions on input data and to output instructions. The asynchronous vector processor is configured to perform processing functions in response to a very long instruction word (VLIW) received from the scalar processor. The VLIW comprises a first portion and a second portion, at least the first portion comprising a vector instruction.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Qifan Zhang, Wuxian Shi, Yiqun Ge, Tao Huang, Wen Tong
  • Patent number: 8977835
    Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.
    Type: Grant
    Filed: November 14, 2013
    Date of Patent: March 10, 2015
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
  • Patent number: 8972698
    Abstract: A processing core implemented on a semiconductor chip is described having first execution unit logic circuitry that includes first comparison circuitry to compare each element in a first input vector against every element of a second input vector. The processing core also has second execution logic circuitry that includes second comparison circuitry to compare a first input value against every data element of an input vector.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: March 3, 2015
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mark J. Charney, Yen-Kuang Chen, Jesus Corbal, Andrew T. Forsyth, Milind B. Girkar, Jonathan C. Hall, Hideki Ido, Robert Valentine, Jeffrey Wiedemeier
  • Publication number: 20150046673
    Abstract: A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.
    Type: Application
    Filed: August 12, 2014
    Publication date: February 12, 2015
    Inventors: Brendan BARRY, Fergal CONNOR, Martin O'RIORDAN, David MOLONEY, Sean POWER
  • Publication number: 20150046675
    Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
    Type: Application
    Filed: August 12, 2014
    Publication date: February 12, 2015
    Inventors: Brendan BARRY, Richard RICHMOND, Fergal CONNOR, David MOLONEY
  • Publication number: 20150046674
    Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
    Type: Application
    Filed: August 12, 2014
    Publication date: February 12, 2015
    Inventors: Brendan BARRY, Richard RICHMOND, Fergal CONNOR, David MOLONEY
  • Patent number: 8949575
    Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.
    Type: Grant
    Filed: December 14, 2011
    Date of Patent: February 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
  • Publication number: 20140372728
    Abstract: A vector execution unit for use in a digital signal processor enables a new set of instructions. The unit comprises a first input port for receiving at least a first input data vector, an instruction decoder, a vector output port, and at least one data-path. The instruction decoding unit is arranged to control the data-path to perform a comparison related to the first input data vector, and the processor comprises an integer port arranged to output the result of the comparison in the form of a decision vector to a memory unit or a functional unit in the digital signal processor. Alternatively or in addition, the integer port is also arranged to receive a decision vector of integer data, and the instruction decoding unit is arranged to control the data-path to process the first input data in dependence of the value of the integer data.
    Type: Application
    Filed: November 28, 2012
    Publication date: December 18, 2014
    Applicant: Media Tek Sweden AB
    Inventors: Anders Nilsson, Eric Tell
  • Patent number: 8914613
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: August 26, 2011
    Date of Patent: December 16, 2014
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
  • Publication number: 20140359252
    Abstract: A multicore processor is achieved by a processor assembly, comprising a first processor having a first core and at least a first and a second unit, each being selected from the group of vector execution units, memory units and accelerators, said first core and first and second units being interconnected by a first network, and a second processor having a second core wherein the first core is arranged to enable the second core to control at least one of the units in the first processor. Each processor generally comprises a combination of execution units, memory units and accelerators, which may be controlled and/or accessed by units in the other processor.
    Type: Application
    Filed: November 28, 2012
    Publication date: December 4, 2014
    Applicant: Media Tek Sweden AB
    Inventors: Anders Nilsson, Eric Tell
  • Publication number: 20140344549
    Abstract: The invention relates to a digital signal processor comprising a processor core, an integer execution unit and a number of vector execution units, said digital signal processor comprising a program memory arranged to hold instructions for the execution units and issue logic for issuing instructions. The digital signal processor comprises an issue control unit for selecting at least two execution units that are to receive and execute the same instruction at the same time, and logic for sending the instruction to said at least two execution units.
    Type: Application
    Filed: November 28, 2012
    Publication date: November 20, 2014
    Applicant: Media Tek Sweden AB
    Inventors: Anders Nilsson, Eric Tell
  • Patent number: 8892849
    Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.
    Type: Grant
    Filed: October 15, 2009
    Date of Patent: November 18, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Erdem Hokenek, Mayan Moudgill, Michael J. Schulte, C. John Glossner
  • Patent number: 8892853
    Abstract: An image processing system including a vector processor and a memory adapted for attaching to the vector processor. The memory is adapted to store multiple image frames. The vector processor includes an address generator operatively attached to the memory to access the memory. The address generator is adapted for calculating addresses of the memory over the multiple image frames. The addresses may be calculated over the image frames based upon an image parameter. The image parameter may specify which of the image frames are processed simultaneously. A scalar processor may be attached to the vector processor. The scalar processor provides the image parameter(s) to the address generator for address calculation over the multiple image frames. An input register may be attached to the vector processor. The input register may be adapted to receive a very long instruction word (VLIW) instruction.
    Type: Grant
    Filed: June 10, 2010
    Date of Patent: November 18, 2014
    Assignee: Mobileye Technologies Limited
    Inventors: Yosef Kreinin, Gil Dogon, Emmanuel Sixsou, Yosi Arbeli, Mois Navon, Roman Sajman
  • Patent number: 8874879
    Abstract: A vector processing circuit includes a vector register file including a plurality of array elements, a command issuance control circuit, and a plurality of pipeline arithmetic units. Each pipeline arithmetic unit performs arithmetic processing of data stored in the array elements indicated as a source by one command in parts through a plurality of cycles and stores the result in the array elements indicated as a destination by the one command through a plurality of cycles. When data word length of a preceding command is longer than that of a subsequent command, the command issuance control circuit changes data sizes of the array elements in accordance with data word length of the command and determines whether there is register interference between the array element to be processed at a non-head cycle of the preceding command, and the array element to be processed at a head cycle of the subsequent command.
    Type: Grant
    Filed: October 24, 2011
    Date of Patent: October 28, 2014
    Assignee: Fujitsu Limited
    Inventors: Yi Ge, Yoshimasa Takebe, Hiromasa Takahashi
  • Patent number: 8868885
    Abstract: A device, system, and method for processing program instructions, for example, to execute intra vector operations. A fetch unit may receive a program instruction defining different operations on data elements stored at the same vector memory address. A processor may include different types of execution units each executing a different one of a predetermined plurality of elemental instructions. Each program instruction may be a combination of one or more of the elemental instructions. The processor may receive a vector of data elements stored non-consecutively at the same vector memory address to be processed by a same one of the elemental instructions and a vector of configuration values independently associated with executing the same elemental instruction on the non-consecutive data elements. At least two configuration values may be different to implement different operations by executing the same elemental instruction using the different configuration values on the vector of non-consecutive data elements.
    Type: Grant
    Filed: November 18, 2010
    Date of Patent: October 21, 2014
    Assignee: Ceva D.S.P. Ltd.
    Inventors: Yaakov Dekter, Michael Boukaya, Shai Shpigelblat, Moshe Steinberg
  • Patent number: 8856492
    Abstract: The present application relates to a method for processing data in a vector processor. The present application relates also to a vector processor for performing said method and a cellular communication device comprising said vector processor. The method for processing data in a vector processor comprises executing segmented operations on a segment of a vector for generating results, collecting the results of the segmented operations, and delivering the results in a result vector in such a way that subsequent operations remain processing in vector mode.
    Type: Grant
    Filed: May 29, 2009
    Date of Patent: October 7, 2014
    Assignee: NXP B.V.
    Inventors: Mahima Smriti, Jean-Paul Charles Francois Hubert Smeets, Willem Egbert Hendrik Kloosterhuis
  • Publication number: 20140289495
    Abstract: Systems, apparatuses and methods for utilizing enhanced predicate registers which specify the element width and which elements are to be processed. The predicate size is dynamic, depending on the contents of the enhanced predicate register used for an instruction rather than being a static quality of a specific instruction. Specifying the element size in the enhanced predicate registers results in fewer instructions in an instruction set.
    Type: Application
    Filed: March 18, 2014
    Publication date: September 25, 2014
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20140281372
    Abstract: An example method for placing one or more element data values into an output vector includes identifying a vertical permute control vector including a plurality of elements, each element of the plurality of elements including a register address. The method also includes for each element of the plurality of elements, reading a register address from the vertical permute control vector. The method further includes retrieving a plurality of element data values based on the register address. The method also includes identifying a horizontal permute control vector including a set of addresses corresponding to an output vector. The method further includes placing at least some of the retrieved element data values of the plurality of element data values into the output vector based on the set of addresses in the horizontal permute control vector.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Ajay Anant Ingle, David J. Hoyle, Marc M. Hoffman
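    Illustration (not part of the patent record): the two-stage permute described in the abstract above might be sketched as follows. The names `two_stage_permute`, `register_file`, `vertical_ctrl`, and `horizontal_ctrl` are hypothetical, and the sketch assumes that lane i reads lane i of the register named by the vertical control vector and that the horizontal control vector lists destination positions.

```python
def two_stage_permute(register_file, vertical_ctrl, horizontal_ctrl):
    """Gather one value per lane from selected registers, then reorder them."""
    # Vertical stage (assumed semantics): lane i takes element i of the
    # register whose address is stored in vertical_ctrl[i].
    gathered = [register_file[reg][lane] for lane, reg in enumerate(vertical_ctrl)]

    # Horizontal stage (assumed semantics): the value gathered in lane i is
    # placed at output position horizontal_ctrl[i].
    output = [None] * len(horizontal_ctrl)
    for lane, dest in enumerate(horizontal_ctrl):
        output[dest] = gathered[lane]
    return output


regs = [[10, 11, 12, 13], [20, 21, 22, 23]]
result = two_stage_permute(regs, [0, 1, 1, 0], [3, 2, 1, 0])
```

    Here `result` holds the values picked from registers 0, 1, 1, 0 in reversed lane order.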
  • Publication number: 20140281371
    Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Inventors: HARIHARAN THANTRY, MANI AZIMI
  • Publication number: 20140281373
    Abstract: A digital signal processor has a vector execution unit arranged to execute instructions on multiple data in the form of a vector, comprising a local queue arranged to receive instructions from a program memory and to hold them in the local queue until a predefined condition is fulfilled. The local queue is arranged to receive a sequence of instructions at a time from the program memory and to store the last N instructions, N being an integer. A vector controller in the vector execution unit comprises queue control means arranged to make the local queue repeat a sequence of M instructions stored in the local queue, M being an integer less than or equal to N, a number K of times. This reduces the time the vector execution unit is kept waiting because of IDLE commands in the program memory.
    Type: Application
    Filed: September 17, 2012
    Publication date: September 18, 2014
    Applicant: MediaTek Sweden AB
    Inventor: Anders Nilsson
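    Illustration (not part of the patent record): a rough software model of the queue behaviour in the abstract above. The function name `replay_from_local_queue` is hypothetical, and the sketch assumes that the repeated sequence is the most recent M of the N stored instructions; the abstract does not say which M are chosen.

```python
from collections import deque


def replay_from_local_queue(instructions, n, m, k):
    """Keep the last n instructions, then repeat the newest m of them k times."""
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")

    # A bounded deque drops the oldest entry once n instructions are stored.
    queue = deque(maxlen=n)
    for instruction in instructions:
        queue.append(instruction)

    if m > len(queue):
        raise ValueError("fewer than m instructions are stored")

    # Assumption: the repeated sequence is the newest m stored instructions.
    loop_body = list(queue)[-m:]
    return loop_body * k


issued = replay_from_local_queue(["ld", "mul", "add", "st", "idle"], n=4, m=2, k=3)
```

    The point of the model is that the repeated instructions come from the local queue rather than being fetched again from program memory.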
  • Publication number: 20140281370
    Abstract: Embodiments disclosed herein include vector processing engines (VPEs) having programmable data path configurations for providing multi-mode vector processing. Related vector processors, systems, and methods are also disclosed. The VPEs include a vector processing stage(s) configured to process vector data according to a vector instruction executed in the vector processing stage. Each vector processing stage includes vector processing blocks each configured to process vector data based on the vector instruction being executed. The vector processing blocks are capable of providing different vector operations for different types of vector instructions based on data path configurations. Data paths of the vector processing blocks are programmable to be reprogrammable to process vector data differently according to the particular vector instruction being executed.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Publication number: 20140258677
    Abstract: Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media (transitory and non-transitory) are described herein for analyzing execution of a plurality of executable instructions and, based on the analysis, providing an indication of a benefit to be obtained by vectorization of at least a subset of the plurality of executable instructions. In various embodiments, the analysis may include identification of the subset of the plurality of executable instructions suitable for conversion to one or more single-instruction multiple-data (“SIMD”) instructions.
    Type: Application
    Filed: March 5, 2013
    Publication date: September 11, 2014
    Inventors: Ruchira Sasanka, Jeffrey J. Cook, Abhinav Das, Jayaram Bobba, Michael R. Greenfield, Suresh Srinivas
  • Publication number: 20140244969
    Abstract: Disclosed is a list vector processing apparatus (LVPA) or the like which can process the indirect reference at a high speed. The LVPA includes: a gather processing unit processing a first gather instruction to store a value of a storage area accessed by only a self information processing apparatus (SelfIPA) in a plurality of information processing apparatuses according to a list vector storing an address representing a storage area read from a storage apparatus into a register, and a process of generating reference access information indicating whether being a storage area accessed by both of the SelfIPA and another information processing apparatus; a communication unit for related information; an access information operating unit to calculate an area accessed by the information processing apparatus; and a scatter processing unit processing a first scatter instruction to store a value stored in the register into the storage area accessed by only the SelfIPA.
    Type: Application
    Filed: February 24, 2014
    Publication date: August 28, 2014
    Applicant: NEC CORPORATION
    Inventor: Satoru TAGAYA
  • Publication number: 20140244970
    Abstract: For increased efficiency, a digital signal processor comprises a vector execution unit arranged to execute instructions that are to be performed on multiple data in the form of a vector, comprising a vector controller arranged to determine if an instruction is a vector instruction and, if it is, inform a count register arranged to hold the vector length, said vector controller being further arranged to receive an issue signal and control the execution of instructions based on this issue signal, said vector execution unit being characterized in that it comprises a local queue arranged to receive instructions from a program memory and to hold them in the local queue until a predefined condition is fulfilled, and that the vector controller comprises queue control means arranged to control the local queue.
    Type: Application
    Filed: September 17, 2012
    Publication date: August 28, 2014
    Applicant: MEDIATEK SWEDEN AB
    Inventor: Anders Nilsson
  • Publication number: 20140215182
    Abstract: In an embodiment, an integrated circuit includes at least one processor. The processor may include a reset vector base address register configured to store a reset vector address for the processor. Responsive to a reset, the processor may be configured to capture a reset vector address on an input, updating the reset vector base address register. Upon release from reset, the processor may initiate instruction execution at the reset vector address. The integrated circuit may further include a logic circuit that is coupled to provide the reset vector address. The logic circuit may include a register that is programmable with the reset vector address. More particularly, in an embodiment, the register may be programmable via a write operation issued by the processor (e.g. a memory-mapped write operation). Accordingly, the reset vector address may be programmable in the integrated circuit, and may be changed from time to time.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: Apple Inc.
    Inventors: Josh P. de Cesare, Gerard R. Williams, III, Michael J. Smith, Wei-Han Lien
  • Patent number: 8793472
    Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally receiving a predicate vector with N elements as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element in the result vector, the processor sets the element in the result vector equal to the start value plus a product of the increment value multiplied by a specified number of elements to the left of the element in the result vector.
    Type: Grant
    Filed: November 8, 2011
    Date of Patent: July 29, 2014
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
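    Illustration (not part of the patent record): the index-generating instruction in the abstract above can be approximated in software. The name `vector_index` and the `dest` parameter are hypothetical. The sketch assumes that "elements to the left" counts every element position to the left, and that inactive elements keep their previous destination value (zero here if none is supplied); the abstract does not specify either point.

```python
def vector_index(start, increment, n, predicate=None, dest=None):
    """Set each active element i to start + increment * i."""
    # Assumption: inactive elements keep the prior destination contents.
    result = list(dest) if dest is not None else [0] * n
    for i in range(n):
        if predicate is None or predicate[i]:
            # i equals the number of elements to the left of position i.
            result[i] = start + increment * i
    return result


unpredicated = vector_index(10, 3, 4)
predicated = vector_index(10, 3, 4, predicate=[1, 0, 1, 0])
```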
  • Publication number: 20140208067
    Abstract: A Vector Element Rotate and Insert Under Mask instruction. Each element of a second operand of the instruction is rotated in a specified direction by a specified number of bits. For each bit in a third operand of the instruction that is set to one, the corresponding bit of the rotated elements in the second operand replaces the corresponding bit in a first operand of the instruction.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
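    Illustration (not part of the patent record): a minimal software model of the rotate-and-insert-under-mask operation in the abstract above. The function names and the 8-bit default element width are hypothetical, and the sketch assumes a left rotation; the abstract only says the direction is specified.

```python
def rotate_left(value, amount, width):
    """Rotate a width-bit value left by amount bits."""
    full = (1 << width) - 1
    value &= full
    amount %= width
    return ((value << amount) | (value >> (width - amount))) & full


def rotate_insert_under_mask(first, second, third, amount, width=8):
    """Per element: rotate second, then copy its bits into first where third has a 1."""
    full = (1 << width) - 1
    result = []
    for target, source, mask in zip(first, second, third):
        rotated = rotate_left(source, amount, width)
        # Bits selected by the mask come from the rotated source;
        # all other bits keep the first operand's value.
        result.append(((rotated & mask) | (target & ~mask)) & full)
    return result


merged = rotate_insert_under_mask([0x00, 0xFF], [0x81, 0x00], [0x0F, 0xF0], amount=1)
```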
  • Publication number: 20140208066
    Abstract: A Vector Generate Mask instruction. For each element in the first operand, a bit mask is generated. The mask includes bits set to a selected value starting at a position specified by a first field of the instruction and ending at a position specified by a second field of the instruction.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
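    Illustration (not part of the patent record): one way to model the mask generation in the abstract above. The names are hypothetical. The sketch assumes that bit positions are counted from the most significant bit (position 0 is leftmost), that the selected value is one, and that the start position does not exceed the end position.

```python
def generate_mask(start, end, width=8):
    """Return a width-bit mask with ones at positions start..end (MSB = position 0)."""
    if not 0 <= start <= end < width:
        raise ValueError("expected 0 <= start <= end < width")
    mask = 0
    for position in range(start, end + 1):
        # Position 0 is the leftmost bit under the numbering assumed here.
        mask |= 1 << (width - 1 - position)
    return mask


def vector_generate_mask(element_count, start, end, width=8):
    """Produce the same mask for every element of the result vector."""
    return [generate_mask(start, end, width) for _ in range(element_count)]


masks = vector_generate_mask(4, 2, 5)
```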
  • Publication number: 20140195817
    Abstract: A method is described that includes performing the following within an instruction execution pipeline implemented on a semiconductor chip: summing three input vector operands through execution of a single instruction; and, not raising any arithmetic flags even though a result of the summing creates more bits than circuitry designed to transport the summation is able to transport.
    Type: Application
    Filed: December 23, 2011
    Publication date: July 10, 2014
    Applicant: INTEL CORPORATION
    Inventors: Wajdi K. Feghali, Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Gilbert M. Wolrich, Kirk S. Yap, Sean M. Gulley, Martin G. Dixon
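    Illustration (not part of the patent record): the three-operand sum in the abstract above, modelled element by element. The name `vector_add3` and the 32-bit default width are hypothetical, and the sketch assumes that bits beyond the element width are simply discarded; the abstract states only that no arithmetic flags are raised.

```python
def vector_add3(a, b, c, width=32):
    """Add three vectors element-wise, silently truncating to width bits."""
    full = (1 << width) - 1
    # No carry or overflow indication is produced; excess bits are dropped.
    return [(x + y + z) & full for x, y, z in zip(a, b, c)]


sums = vector_add3([1, 0xFFFFFFFF], [3, 1], [5, 1])
```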
  • Publication number: 20140195776
    Abstract: A method and device for memory access in processors is provided. A processor, comprising a plurality of computational units, is capable of executing a single instruction on multiple pieces of data simultaneously (SIMD). A read operation is initiated to load data from memory into the plurality of computational units (CUs) arranged into a plurality of CU groups. The memory is arranged into a plurality of memory macro-blocks each associated with a respective CU group of the plurality of CU groups. For each CU group a respective first memory address is determined and for each CU group, the data in the associated memory macro-block is accessed at the respective first memory address.
    Type: Application
    Filed: January 9, 2013
    Publication date: July 10, 2014
    Applicant: COGNIVUE CORPORATION
    Inventors: Malcolm STEWART, Ali Osman ORS, Daniel LAROCHE
  • Publication number: 20140189290
    Abstract: Vector instructions for performing ZUC stream cipher operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first vector instruction to perform an update to a linear feedback shift register (LFSR), and receives a second vector instruction to perform an update to a state of a finite state machine (FSM), where the FSM receives inputs from re-ordered bits of the LFSR. The execution circuitry executes the first vector instruction and the second vector instruction in a single-instruction multiple data (SIMD) pipeline.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Gilbert M. Wolrich, Vinodh Gopal, Kirk S. Yap, Wajdi K. Feghali
  • Publication number: 20140189289
    Abstract: Vector instructions for performing SNOW 3G wireless security operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first operand of the first instruction specifying a first vector register that stores a current state of a finite state machine (FSM). The execution circuitry also receives a second operand of the first instruction specifying a second vector register that stores data elements of a linear feedback shift register (LFSR) that are needed for updating the FSM. The execution circuitry executes the first instruction to produce an updated state of the FSM and an output of the FSM in a destination operand of the first instruction.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Gilbert M. Wolrich, Vinodh Gopal, Erdinc Ozturk, Kirk S. Yap, Wajdi K. Feghali