Distributing of Vector Data to Vector Registers Patents (Class 712/4)
  • Publication number: 20150089189
    Abstract: In an embodiment, a processor may implement a vector instruction set including predicate vectors and multiple vector element sizes. The vector instruction set may include predicate vector pack and unpack instructions. Responsive to the predicate vector pack instruction, the processor may pack predicates from multiple predicate vector source registers into a destination predicate vector register. Responsive to the predicate vector unpack instruction, the processor may select a portion of a source predicate vector register and write the result to a destination predicate vector register. Additionally, the predicate vector register may store one or more vector attributes associated with the corresponding vector. The processor may modify the attribute as part of the pack/unpack operation (e.g. based on a pack/unpack factor). Additionally, vector pack/unpack instructions that are controlled by the attribute in a corresponding predicate vector register may be implemented.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
  • Publication number: 20150088926
    Abstract: Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (“SIMD”) architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: July 22, 2014
    Publication date: March 26, 2015
    Inventors: SHASANK K. CHAVAN, PHUMPONG WATANAPRAKORNKUL
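    Illustrative sketch: a minimal scalar model in C of the kind of set-membership test described above, assuming dictionary codes are small non-negative integers; the byte-table approach and the names build_member_table and count_matches are illustrative assumptions, not the patented method, and a SIMD implementation would test many codes per instruction.
      #include <stdint.h>
      #include <stddef.h>
      #include <string.h>

      /* Build a byte-per-code membership table for the second set of values.
         table[c] == 1 means dictionary code c is a member of the set. */
      static void build_member_table(uint8_t *table, size_t table_size,
                                     const uint16_t *set, size_t set_len) {
          memset(table, 0, table_size);
          for (size_t i = 0; i < set_len; i++)
              if (set[i] < table_size)
                  table[set[i]] = 1;
      }

      /* Test each dictionary-encoded column value for membership; a SIMD
         version performs many of these lookups per instruction. */
      static size_t count_matches(const uint16_t *codes, size_t n,
                                  const uint8_t *table, size_t table_size) {
          size_t hits = 0;
          for (size_t i = 0; i < n; i++)
              if (codes[i] < table_size && table[codes[i]])
                  hits++;
          return hits;
      }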
  • Publication number: 20150046672
    Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store, in each data field of a portion of n data fields of a vector register or memory vector, at least two bits of data. In a processor, a SIMD instruction for a vector population count is executed such that, for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values are counted, and the counted occurrences are stored in a portion of a destination register corresponding to that portion of the n data fields, as a first one or more counts corresponding to the first one or more predetermined binary values. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: August 6, 2013
    Publication date: February 12, 2015
    Inventors: Terence Sych, Elmoustapha Ould-Ahmed-Vall
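    Illustrative sketch: a scalar C model of the per-field population count described above, under the assumption that each data field is two bits wide and the predetermined binary values are the four possible 2-bit patterns; the function name and the 64-bit lane width are illustrative.
      #include <stdint.h>

      /* For a 64-bit lane holding 32 two-bit fields, count how many fields
         equal each of the four possible 2-bit values; counts[v] receives the
         number of fields whose value is v. */
      static void vpopcnt_2bit_fields(uint64_t lane, uint32_t counts[4]) {
          for (int v = 0; v < 4; v++)
              counts[v] = 0;
          for (int f = 0; f < 32; f++) {
              uint32_t value = (uint32_t)((lane >> (2 * f)) & 0x3u);
              counts[value]++;
          }
      }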
  • Publication number: 20150046671
    Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store, in each data field of a portion of n data fields of a vector register or memory vector, a plurality of bits of data. In a processor, a SIMD instruction for a vector population count is executed such that, for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values are counted, and the counted occurrences are stored in a portion of a destination register corresponding to that portion of the n data fields, as a first one or more counts corresponding to the first one or more predetermined binary values.
    Type: Application
    Filed: August 6, 2013
    Publication date: February 12, 2015
    Inventor: Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20150039851
    Abstract: Methods, apparatus, instructions and logic provide SIMD vector sub-byte decompression functionality. Embodiments include shuffling a first and second byte into the least significant portion of a first vector element, and a third and fourth byte into the most significant portion. Processing continues by shuffling a fifth and sixth byte into the least significant portion of a second vector element, and a seventh and eighth byte into the most significant portion. Then, by shifting the first vector element by a first shift count and the second vector element by a second shift count, sub-byte elements are aligned to the least significant bits of their respective bytes. Processors then shuffle a byte from each of the shifted vector elements' least significant portions into byte positions of a destination vector element, and from each of the shifted vector elements' most significant portions into byte positions of another destination vector element. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: July 31, 2013
    Publication date: February 5, 2015
    Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Robert Valentine
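    Illustrative sketch: a scalar C model of sub-byte decompression, assuming elements of width bits (1 to 7) are packed LSB-first in a byte stream and each is expanded to one output byte; the shuffle-and-shift staging in the abstract is how a SIMD implementation reaches the same result, and unpack_subbyte is an illustrative name.
      #include <stdint.h>
      #include <stddef.h>

      /* Unpack n sub-byte elements, each 'bits' wide (1..7), packed LSB-first
         in src, into one byte per element in dst. */
      static void unpack_subbyte(const uint8_t *src, uint8_t *dst,
                                 size_t n, unsigned bits) {
          uint32_t acc = 0;                   /* bit accumulator */
          unsigned avail = 0;                 /* bits currently in the accumulator */
          size_t in = 0;
          uint32_t mask = (1u << bits) - 1u;
          for (size_t i = 0; i < n; i++) {
              while (avail < bits) {          /* refill from the byte stream */
                  acc |= (uint32_t)src[in++] << avail;
                  avail += 8;
              }
              dst[i] = (uint8_t)(acc & mask); /* element lands in a full byte */
              acc >>= bits;
              avail -= bits;
          }
      }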
  • Publication number: 20150039853
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Application
    Filed: August 1, 2013
    Publication date: February 5, 2015
    Applicant: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Publication number: 20150039852
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, data compaction is performed using vectorized instructions to identify a shuffle mask based on matching bits and update an output array based on the shuffle mask and an input array. In a related technique, a hash table probe involves using vectorized instructions to determine whether each key in one or more hash buckets matches a particular input key. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: August 1, 2013
    Publication date: February 5, 2015
    Applicant: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
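    Illustrative sketch: a scalar C model of the data-compaction step, assuming a bit mask marks which input elements satisfy the predicate; matching elements are packed contiguously into the output array, which is what a shuffle-mask-driven SIMD compaction does a register at a time. compact_by_mask is an illustrative name.
      #include <stdint.h>
      #include <stddef.h>

      /* Append to out[] every element of in[] whose bit is set in mask
         (bit i corresponds to in[i]); returns the new output length.  A
         vectorized version derives a shuffle control from the mask and
         compacts a whole register of elements at once. */
      static size_t compact_by_mask(const uint32_t *in, size_t n, uint64_t mask,
                                    uint32_t *out, size_t out_len) {
          for (size_t i = 0; i < n && i < 64; i++)
              if (mask & (1ull << i))
                  out[out_len++] = in[i];
          return out_len;
      }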
  • Publication number: 20150039854
    Abstract: Systems and techniques disclosed herein include methods for de-quantization of feature vectors used in automatic speech recognition. A SIMD vector processor is used in one embodiment for efficient vectorized lookup of floating point values in conjunction with fMPE processing for increasing the discriminative power of input signals. These techniques exploit parallelism to effectively reduce the latency of speech recognition in a system operating in a high dimensional feature space. In one embodiment, a bytewise integer lookup operation effectively performs a floating point or a multiple byte lookup.
    Type: Application
    Filed: August 1, 2013
    Publication date: February 5, 2015
    Applicant: Nuance Communications, Inc.
    Inventor: Justin Vaughn Wick
  • Publication number: 20140365747
    Abstract: Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, a vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: December 23, 2011
    Publication date: December 11, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Moustapha Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Boris Ginzburg, Ziv Aviv
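    Illustrative sketch: a scalar C reading of "horizontal partial sum" in which each destination element receives the running sum of the source elements up to and including its position; the 8-element width and this interpretation of the instruction's semantics are assumptions.
      #include <stdint.h>

      /* dst[i] = src[0] + src[1] + ... + src[i] for an 8-element packed source;
         a single SIMD instruction would produce all eight partial sums at once. */
      static void horizontal_partial_sum_epi32(const int32_t src[8], int32_t dst[8]) {
          int32_t running = 0;
          for (int i = 0; i < 8; i++) {
              running += src[i];
              dst[i] = running;
          }
      }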
  • Patent number: 8909901
    Abstract: In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values so that selected portions of the first and second source operands or a predetermined value can be stored into elements of a destination. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 28, 2007
    Date of Patent: December 9, 2014
    Assignee: Intel Corporation
    Inventors: Cristina Anderson, Mark Buxton, Doron Orenstien, Bob Valentine
  • Patent number: 8904153
    Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.
    Type: Grant
    Filed: September 7, 2010
    Date of Patent: December 2, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
  • Publication number: 20140317377
    Abstract: A processor core includes a hardware decode unit to decode a vector frequency compress instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes a plurality of source data elements, including one or more runs of identical data elements that are each to be compressed in a destination vector register as a value and run length pair. The destination operand identifies the destination vector register. The processor core also includes an execution engine unit to execute the decoded vector frequency compress instruction, which causes, for each source data element, a value to be copied into the destination vector register to indicate that source data element's value. One or more runs of equal source data elements are encoded in the destination vector register as a predetermined compression value followed by a run length for that run. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: December 30, 2011
    Publication date: October 23, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
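    Illustrative sketch: a scalar C model of run-length ("frequency") compression roughly matching the abstract; the reserved marker value, the value/marker/length ordering, and the name freq_compress are assumptions, not the patented encoding.
      #include <stdint.h>
      #include <stddef.h>

      #define RLE_MARKER 0xFFFFFFFFu   /* assumed predetermined compression value */

      /* Compress src[0..n) into dst: each run of identical values is emitted as
         value, RLE_MARKER, run_length; isolated values are emitted as-is.
         Returns the number of destination elements written. */
      static size_t freq_compress(const uint32_t *src, size_t n, uint32_t *dst) {
          size_t out = 0, i = 0;
          while (i < n) {
              size_t run = 1;
              while (i + run < n && src[i + run] == src[i])
                  run++;
              dst[out++] = src[i];
              if (run > 1) {               /* encode the run as marker + length */
                  dst[out++] = RLE_MARKER;
                  dst[out++] = (uint32_t)run;
              }
              i += run;
          }
          return out;
      }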
  • Publication number: 20140281368
    Abstract: An example method for executing multiple instructions in one or more slots includes receiving a packet including multiple instructions and executing the multiple instructions in one or more slots in a time shared manner. Each slot is associated with an execution data path or a memory data path. An example method for executing at least one instruction in a plurality of phases includes receiving a packet including an instruction, splitting the instruction into a plurality of phases, and executing the instruction in the plurality of phases.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Ajay Anant Ingle, Lucian Codrescu, David J. Hoyle, Jose Fridman, Marc M. Hoffman, Deepak Mathew
  • Publication number: 20140281369
    Abstract: An apparatus and method are described for fetching and storing a plurality of portions of a data stream into a plurality of registers. For example, a method according to one embodiment includes the following operations: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining the system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses; and storing the N designated portions of the data stream into the N vector registers.
    Type: Application
    Filed: December 23, 2011
    Publication date: September 18, 2014
    Inventor: Ashish Jha
  • Publication number: 20140223138
    Abstract: Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, conversion of a mask register into a vector register in response to a single instruction for vector packed conversion of a mask register to a vector register, where the instruction includes a destination vector register operand, a source writemask register operand, and an opcode. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: December 23, 2011
    Publication date: August 7, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Amit Gradstein, Zeev Sperber
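    Illustrative sketch: a scalar C model of expanding a writemask into a packed vector, assuming each mask bit becomes an all-ones or all-zeros 32-bit destination element; the abstract does not fix the produced element values, so this convention is an assumption.
      #include <stdint.h>

      /* Expand the low 8 bits of a writemask into eight 32-bit elements:
         set bits become 0xFFFFFFFF, clear bits become 0. */
      static void mask_to_vector_epi32(uint8_t mask, uint32_t dst[8]) {
          for (int i = 0; i < 8; i++)
              dst[i] = ((mask >> i) & 1u) ? 0xFFFFFFFFu : 0u;
      }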
  • Publication number: 20140201499
    Abstract: Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, conversion of a list of index values into a mask value in response to a single instruction for vector packed conversion of a list of index values into a mask value, where the instruction includes a destination writemask register operand, a source vector register operand, and an opcode. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: December 23, 2011
    Publication date: July 17, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Garrett T. Drysdale
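    Illustrative sketch: a scalar C model of the conversion in the other direction from the preceding entry, building a mask value with one bit set per index listed in the source; the 64-bit mask width and the handling of out-of-range indices are assumptions.
      #include <stdint.h>
      #include <stddef.h>

      /* Set bit idx[i] in the result for every index in the source list;
         indices of 64 or more are ignored in this sketch. */
      static uint64_t indices_to_mask(const uint8_t *idx, size_t n) {
          uint64_t mask = 0;
          for (size_t i = 0; i < n; i++)
              if (idx[i] < 64)
                  mask |= 1ull << idx[i];
          return mask;
      }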
  • Publication number: 20140201498
    Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying a gather and a second operation, a destination register, an operand register, and a memory address, execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in an indices register for data elements in memory. A first mask value indicates that an element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been, gathered. For each field having the first value, the corresponding data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: September 26, 2011
    Publication date: July 17, 2014
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
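    Illustrative sketch: a scalar C model of the gather-op behavior described above, assuming 32-bit elements, a byte-per-field mask (1 = not yet gathered, 0 = done), and addition as the example second operation; names and widths are illustrative.
      #include <stdint.h>

      #define VLEN 8

      /* Gather the elements whose mask field holds the first value, flip those
         fields to the second value, and, once every field holds the second
         value, apply the second operation (here: add) with the operand register. */
      static void gather_then_add(const int32_t *base, const int32_t indices[VLEN],
                                  uint8_t mask[VLEN], int32_t dest[VLEN],
                                  const int32_t operand[VLEN]) {
          for (int i = 0; i < VLEN; i++) {
              if (mask[i]) {                  /* first value: not yet gathered */
                  dest[i] = base[indices[i]];
                  mask[i] = 0;                /* now marked as gathered */
              }
          }
          for (int i = 0; i < VLEN; i++)      /* all fields hold the second value */
              dest[i] += operand[i];          /* the specified second operation */
      }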
  • Publication number: 20140201497
    Abstract: An apparatus is described having functional unit logic circuitry. The functional unit logic circuitry has a first register to store a first input vector operand having an element for each dimension of a multi-dimensional data structure, where each element of the first vector operand specifies the size of its respective dimension. The functional unit has a second register to store a second input vector operand specifying coordinates of a particular segment of the multi-dimensional structure. The functional unit also has logic circuitry to calculate an address offset for the particular segment relative to an address of an origin segment of the multi-dimensional structure. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: December 23, 2011
    Publication date: July 17, 2014
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
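    Illustrative sketch: a scalar C model of the offset calculation, assuming a row-major layout in which the first operand supplies each dimension's size and the second supplies a segment's coordinates; md_offset is an illustrative name.
      #include <stdint.h>
      #include <stddef.h>

      /* Row-major offset of the segment at 'coords' within a structure whose
         per-dimension sizes are in 'sizes':
         offset = ((c0 * s1 + c1) * s2 + c2) * ... + c(n-1). */
      static uint64_t md_offset(const uint32_t *sizes, const uint32_t *coords,
                                size_t ndims) {
          uint64_t offset = 0;
          for (size_t d = 0; d < ndims; d++)
              offset = offset * sizes[d] + coords[d];
          return offset;
      }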
  • Patent number: 8782376
    Abstract: A processor including: a first and at least a second data processing channel with enable logic for selectively enabling the second channel; logic for generating first and second storage addresses having a variable offset therebetween based on the same one or more address operands of the same storage access instruction; and circuitry for transferring data between the first address and a register of the first data processing channel and between the second address and a corresponding register of the second channel based on a same one or more register specifier operands of the access instruction. The first data processing channel performs an operation using one or more registers of the first data processing channel, and on condition of being enabled the second channel performs the same operation using a corresponding one or more of its own registers based on the same one or more operands of the data processing instruction.
    Type: Grant
    Filed: August 26, 2011
    Date of Patent: July 15, 2014
    Assignee: Icera Inc.
    Inventors: Simon Knowles, Edward Andrews, Stephen Felix, Simon Huckett, Colman Hegarty
  • Publication number: 20140164733
    Abstract: A transpose instruction is described. The transpose instruction is fetched, where the transpose instruction includes an operand that specifies a vector register or a location in memory. The transpose instruction is decoded. The decoded transpose instruction is executed, causing each data element in the specified vector register or location in memory to be stored in that specified vector register or location in memory in reverse order. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: December 30, 2011
    Publication date: June 12, 2014
    Inventor: Ashish Jha
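    Illustrative sketch: a scalar C model of the described behavior, writing the elements of the operand back to the same location in reverse order; the 32-bit element width is an assumption.
      #include <stdint.h>
      #include <stddef.h>

      /* Reverse n 32-bit data elements in place, as the abstract describes for
         the specified vector register or memory location. */
      static void reverse_elements(uint32_t *v, size_t n) {
          if (n < 2)
              return;
          for (size_t i = 0, j = n - 1; i < j; i++, j--) {
              uint32_t tmp = v[i];
              v[i] = v[j];
              v[j] = tmp;
          }
      }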
  • Publication number: 20140156969
    Abstract: A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.
    Type: Application
    Filed: December 17, 2013
    Publication date: June 5, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: MAARTEN J. BOERSMA, UDO KRAUTZ, ULRIKE SCHMIDT
  • Publication number: 20140149713
    Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
    Type: Application
    Filed: December 23, 2011
    Publication date: May 29, 2014
    Inventor: Ashish Jha
  • Publication number: 20140136815
    Abstract: A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.
    Type: Application
    Filed: November 12, 2012
    Publication date: May 15, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Maarten J. Boersma, Udo Krautz, Ulrike Schmidt
  • Publication number: 20140129801
    Abstract: Embodiments of systems, apparatuses, and methods are described for performing delta encoding on packed data elements of a source and storing the results in packed data elements of a destination using a single vector packed delta encode instruction. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: December 28, 2011
    Publication date: May 8, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale
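    Illustrative sketch: a scalar C model of delta encoding of packed elements; whether the instruction copies or zeroes the first element, and the 32-bit element width, are assumptions.
      #include <stdint.h>
      #include <stddef.h>

      /* dst[0] = src[0]; dst[i] = src[i] - src[i-1] for i > 0.  A single packed
         delta-encode instruction produces all n results in one step. */
      static void delta_encode(const int32_t *src, int32_t *dst, size_t n) {
          if (n == 0)
              return;
          dst[0] = src[0];
          for (size_t i = 1; i < n; i++)
              dst[i] = src[i] - src[i - 1];
      }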
  • Publication number: 20140122831
    Abstract: Instructions and logic provide vector compress and rotate functionality. Some embodiments, responsive to an instruction specifying a vector source, a mask, a vector destination, and a destination offset, read the mask and copy corresponding unmasked vector elements from the vector source to adjacent sequential locations in the vector destination, starting at the vector destination offset location. In some embodiments, the unmasked vector elements from the vector source are copied to adjacent sequential element locations modulo the total number of element locations in the vector destination. In some alternative embodiments, copying stops whenever the vector destination is full, and upon copying an unmasked vector element from the vector source to an adjacent sequential element location in the vector destination, the value of the corresponding field in the mask is changed to a masked value. Alternative embodiments zero the elements of the vector destination into which no element from the vector source is copied.
    Type: Application
    Filed: October 30, 2012
    Publication date: May 1, 2014
    Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Robert Valentine
  • Patent number: 8688962
    Abstract: Apparatuses and methods to perform gather instructions are presented. In one embodiment, an apparatus comprises a gather logic module which includes a gather logic unit to identify locality of data elements in response to a gather instruction. The apparatus includes memory comprising a plurality of memory rows including a memory row associated with the gather instruction. The apparatus further includes memory structure to store data element addresses accessed in response to the gather instruction.
    Type: Grant
    Filed: April 1, 2011
    Date of Patent: April 1, 2014
    Assignee: Intel Corporation
    Inventors: Shlomo Raikin, Robert Valentine
  • Patent number: 8683178
    Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition.
    Type: Grant
    Filed: April 20, 2011
    Date of Patent: March 25, 2014
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8667250
    Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 26, 2007
    Date of Patent: March 4, 2014
    Assignee: Intel Corporation
    Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
  • Publication number: 20140047211
    Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
    Type: Application
    Filed: August 13, 2012
    Publication date: February 13, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
  • Patent number: 8649508
    Abstract: A system and method for implementing elliptic curve scalar multiplication in cryptography, in which the Double Base Number System is expressed in decreasing order of exponents and is then used to determine elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Patent number: 8650382
    Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: February 11, 2014
    Assignee: Intel Corporation
    Inventor: Patrice Roussel
  • Patent number: 8635431
    Abstract: A dedicated vector gather buffer (VGB) that stores multiple cache lines read from a memory hierarchy in one or more Logical Units (LUs) each having multiple buffer entries and performs parallel operations on vector registers. Once loaded with data, an LU is read using a single port. The VGB initiates prefetch events that keep it full in response to the demand created by ‘gather’ instructions. The VGB includes one or more write ports for receiving data from the memory hierarchy and a read port capable of reading data from the columns of the LU to be loaded into a vector register. Data is extracted from the VGB by (1) using a separate port for each item read, (2) implementing each VGB entry as a shift register and shifting an appropriate amount until all entries are aligned, or (3) enforcing a uniform offset for all items.
    Type: Grant
    Filed: December 8, 2010
    Date of Patent: January 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Daniel Citron, Dorit Nuzman
  • Publication number: 20140019712
    Abstract: Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, vector packed compression and repeat in response to a single vector packed compression and repeat instruction that includes first and second source vector register operands, a destination vector register operand, and an opcode.
    Type: Application
    Filed: December 23, 2011
    Publication date: January 16, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
  • Publication number: 20140019714
    Abstract: A processor core includes a hardware decode unit and an execution engine unit. The hardware decode unit decodes a vector frequency expand instruction that includes a source operand and a destination operand, wherein the source operand specifies a source vector register that includes one or more pairs of a value and a run length that are to be expanded into a run of that value based on the run length. The execution engine unit executes the decoded vector frequency expand instruction, which causes a set of one or more source data elements in the source vector register to be expanded into a set of destination data elements comprising more elements than the set of source data elements and including at least one run of identical values that were run-length encoded in the source vector register.
    Type: Application
    Filed: December 30, 2011
    Publication date: January 16, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles Yount, Bret L. Toll
  • Publication number: 20140019713
    Abstract: Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, a vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: December 23, 2011
    Publication date: January 16, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
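    Illustrative sketch: a scalar C model of the sum-of-absolute-differences building block over byte blocks; the 8-byte block size is an assumption, and the role of the instruction's immediate (selecting which block pairs are compared) is not modeled here.
      #include <stdint.h>

      /* Sum of absolute differences between two 8-byte blocks; a double block
         packed SAD instruction computes several such sums in one step. */
      static uint16_t sad8(const uint8_t a[8], const uint8_t b[8]) {
          uint16_t sum = 0;
          for (int i = 0; i < 8; i++)
              sum += (uint16_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
          return sum;
      }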
  • Publication number: 20140013076
    Abstract: A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, methods are presented to: unpack fixed-width bit values in a bit stream into a fixed-width byte stream; unpack variable-length byte-packed values in a byte stream; decompress a run-length-encoded compressed bit-vector; return the offset of each bit set to one in a bit-vector; fetch bits from a bit-vector at specified offsets relative to a base; and compare values stored in two SIMD registers, all in a SIMD architecture. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: September 10, 2013
    Publication date: January 9, 2014
    Applicant: Oracle International Corporation
    Inventors: Amit Ganesh, Shasank K. Chavan, Vineet Marwah, Jesse Kamp, Anindya C. Patthak, Michael J. Gleeson, Allison L. Holloway, Roger Macnicol
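    Illustrative sketch: a scalar C model of one of the listed primitives, returning the offset of each bit set to one in a bit-vector; the SIMD methods in the application produce these offsets a register width at a time, and the output format here is an assumption.
      #include <stdint.h>
      #include <stddef.h>

      /* Write the offset of every 1-bit in bits[0..nbytes) into offsets[],
         least significant bit of byte 0 first; returns the number of offsets. */
      static size_t set_bit_offsets(const uint8_t *bits, size_t nbytes,
                                    uint32_t *offsets) {
          size_t count = 0;
          for (size_t byte = 0; byte < nbytes; byte++)
              for (unsigned bit = 0; bit < 8; bit++)
                  if (bits[byte] & (1u << bit))
                      offsets[count++] = (uint32_t)(byte * 8 + bit);
          return count;
      }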
  • Publication number: 20140013075
    Abstract: Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, a vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode.
    Type: Application
    Filed: December 23, 2011
    Publication date: January 9, 2014
    Inventors: Mostafa Hagog, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
  • Publication number: 20130238877
    Abstract: Provided is a technique for improving the transfer latency of vector register file data when an interrupt is generated. According to an aspect, when an interrupt occurs, a core determines whether to store the vector register file data currently being executed in a first memory or in a second memory, based on whether or not the first memory can store the vector register file data. In response to not being able to store the vector register file data in the first memory, a data transfer unit, which is implemented as hardware, is provided to store the vector register file data in the second memory.
    Type: Application
    Filed: November 9, 2012
    Publication date: September 12, 2013
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jin-Seok Lee, Dong-Hoon Yoo, Won-Sub Kim, Tai-Song Jin, Hae-Woo Park, Min-Wook Ahn, Hee-Jin Ahn
  • Publication number: 20130232318
    Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 5, 2013
    Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
  • Publication number: 20130212354
    Abstract: The present invention provides a method for performing data array sorting of vector elements in an N-wide SIMD that is accelerated by a factor of about N/2 over a scalar implementation, excluding scalar load/store instructions. A vector compare instruction with the ability to compare any two vector elements in accordance with optimized data array sorting algorithms, followed by a vector-multiplex instruction that performs exchanges of vector elements in accordance with condition flags generated by the vector compare instruction, provides an efficient but programmable method of performing data sorting with a factor of about N/2 acceleration. A mask bit prevents changes to elements that are not involved in a certain stage of sorting. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: September 20, 2009
    Publication date: August 15, 2013
    Inventor: Tibet Mimar
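    Illustrative sketch: a scalar C model of one compare-exchange stage of the kind the abstract implements with a vector compare followed by a vector multiplex; the pair-index arrays and the byte-per-pair mask convention are illustrative assumptions.
      #include <stdint.h>
      #include <stddef.h>

      /* One sorting-network stage: for each pair (lo[i], hi[i]) whose mask entry
         is set, exchange v[lo[i]] and v[hi[i]] when they are out of order.  A
         SIMD version compares all pairs at once and uses the resulting condition
         flags to drive a vector multiplex that performs the exchanges. */
      static void compare_exchange_stage(int32_t *v, const uint8_t *lo,
                                         const uint8_t *hi, const uint8_t *mask,
                                         size_t npairs) {
          for (size_t i = 0; i < npairs; i++) {
              if (!mask[i])
                  continue;                /* element not involved in this stage */
              if (v[lo[i]] > v[hi[i]]) {   /* compare generates the condition flag */
                  int32_t tmp = v[lo[i]];  /* multiplex performs the exchange */
                  v[lo[i]] = v[hi[i]];
                  v[hi[i]] = tmp;
              }
          }
      }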
  • Publication number: 20130212353
    Abstract: The present invention incorporates a system for vector Look-Up Table (LUT) operations into a single-instruction multiple-data (SIMD) processor in order to implement a plurality of LUT operations simultaneously, where the contents of each LUT could be the same or different. Elements of one or two vector registers are used to form LUT indexes, and the output of the vector LUT operation is written into a vector register. No dedicated LUT memory is required; rather, data memory is organized as multiple separate data memory banks, where a portion of each data memory bank is used for LUT operations. For a single-input vector LUT operation, the address input of each LUT is operably coupled to any of the input vector register's elements using input vector element mapping logic in one embodiment. Thus, one input vector element can produce N output elements using N different LUTs, or K input vector elements can produce N output elements, where N and K are positive integers and K ranges from one to N.
    Type: Application
    Filed: February 3, 2003
    Publication date: August 15, 2013
    Inventor: Tibet Mimar
  • Publication number: 20130091339
    Abstract: An apparatus and method for creation of reordered vectors from sequential input data for block-based decimation, filtering, interpolation and matrix transposition using a memory circuit for a Single Instruction, Multiple Data (SIMD) Digital Signal Processor (DSP). This memory circuit includes a two-dimensional storage array, a rotate-and-distribute unit, a read controller and a write controller, to map input vectors containing sequential data elements into columns of the two-dimensional array and to extract reordered target vectors from this array. The data elements and memory configuration are received from the SIMD DSP.
    Type: Application
    Filed: October 5, 2011
    Publication date: April 11, 2013
    Applicant: ST-Ericsson SA
    Inventors: David Van Kampen, Kees Van Berkel, Sven Goossens, Wim Kloosterhuis, Claudiu Zissulescu-Ianculescu
  • Publication number: 20130080737
    Abstract: A vector data access unit includes data access ordering circuitry for issuing the data access requests indicated by the elements to the data store, configured to act in response to receipt of at least two decoded vector data access instructions, one of the instructions being a write instruction. Data accesses are performed in the instructed order to determine an element indicating the next data access for each of said vector data access instructions. One of the next data accesses is selected to be issued to the data store in dependence upon: the order in which the at least two vector data access instructions were received; the position of the elements indicating the next data accesses relative to each other within their respective plurality of elements; and whether a numerical position of the element indicating the next data access within the plurality of elements of an earlier instruction is less than a predetermined value.
    Type: Application
    Filed: September 28, 2011
    Publication date: March 28, 2013
    Applicant: ARM Limited
    Inventor: Alastair David Reid
  • Patent number: 8375196
    Abstract: A data processing apparatus includes a vector register bank having a plurality of vector registers, each register including a plurality of storage cells, each cell storing a data element. A vector processing unit is provided for executing a sequence of vector instructions. The processing unit is arranged to issue a set rearrangement enable signal to the vector register bank. The write interface of the vector register bank is modified to provide not only a first input for receiving the data elements generated by the vector processing unit during normal execution, but also a second input coupled via a data rearrangement path to the matrix of storage cells, via which the data elements currently stored in the matrix of storage cells are provided to the write interface in a rearranged form representing the arrangement of data elements that would be obtained by performance of the predetermined rearrangement operation.
    Type: Grant
    Filed: January 19, 2010
    Date of Patent: February 12, 2013
    Assignee: ARM Limited
    Inventors: Andreas Björklund, Erik Persson, Ola Hugosson
  • Publication number: 20130024653
    Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found. (An illustrative scalar sketch in C follows this entry.)
    Type: Application
    Filed: July 18, 2011
    Publication date: January 24, 2013
    Inventor: Darryl J. Gove
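    Illustrative sketch: a scalar C model of the loop structure, examining fixed-size chunks of both strings while simultaneously checking for the terminating null; the 8-byte chunk stands in for a vector register, and chunked_strcmp is an illustrative name.
      /* Compare two NUL-terminated strings 8 bytes per outer iteration,
         checking each position for inequality and for the null terminator,
         as the vectorized loop does with full vector registers. */
      static int chunked_strcmp(const char *a, const char *b) {
          for (;;) {
              for (int i = 0; i < 8; i++) {
                  unsigned char ca = (unsigned char)a[i];
                  unsigned char cb = (unsigned char)b[i];
                  if (ca != cb)
                      return ca < cb ? -1 : 1;  /* inequality found */
                  if (ca == '\0')
                      return 0;                 /* both strings ended: equal */
              }
              a += 8;                           /* load the next portions */
              b += 8;
          }
      }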
  • Patent number: 8356159
    Abstract: The described embodiments provide a system that sets elements in a result vector based on an input vector. During operation, the system determines a location of a key element within the input vector. Next, the system generates a result vector. When generating the result vector, the system sets one or more elements of the result vector based on the location of the key element in the input vector.
    Type: Grant
    Filed: April 7, 2009
    Date of Patent: January 15, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
  • Patent number: 8316215
    Abstract: It is an object to speed up a vector store instruction on a memory that is divided into banks, with a plurality of elements treated as a unit, while minimizing the increase in physical resources. A vector processing apparatus has a plurality of register banks and processes a data string including a plurality of data elements retained in the plurality of register banks, wherein: the plurality of register banks each have a read pointer 113 that points to a read position for reading the data elements; and the start position of the read pointer 113 is changed from one register bank to another. For example, consecutive numbers assigned to the register banks may be used as the read start positions of the respective register banks.
    Type: Grant
    Filed: March 7, 2008
    Date of Patent: November 20, 2012
    Assignee: NEC Corporation
    Inventor: Noritaka Hoshi
  • Publication number: 20120260062
    Abstract: A method and system for providing dynamic addressability of data elements in a vector register file with subword parallelism. The method includes the steps of: determining a plurality of data elements required for an instruction; storing an address for each of the data elements into a pointer register where the addresses are stored as a number of offsets from the vector register file's origin; reading the addresses from the pointer register; extracting the data elements located at the addresses from the vector register file; and placing the data elements in a subword slot of the vector register file so that the data elements are located on a single vector within the vector register file; where at least one of the steps is carried out using a computer device so that data elements in a vector register file with subword parallelism are dynamically addressable.
    Type: Application
    Filed: April 7, 2011
    Publication date: October 11, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey H. Derby, Robert K. Montoye
  • Patent number: 8255884
    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
  • Publication number: 20120185670
    Abstract: A processing core implemented on a semiconductor chip is described. The processing core includes logic circuitry to identify whether vector instructions and integer scalar instructions are to be executed with two registers or with three registers, where, in the two-register case, input operand information is destroyed in one of the two registers and, in the three-register case, the input operands are not destroyed. The processing core also includes steering circuitry coupled to the logic circuitry. The steering circuitry controls first data paths between scalar integer execution units and a scalar integer register bank such that two registers are accessed from the scalar integer register bank if two-register execution is identified for the scalar integer instructions, or three registers are accessed from the scalar integer register bank if three-register execution is identified for the scalar integer instructions.
    Type: Application
    Filed: January 14, 2011
    Publication date: July 19, 2012
    Inventors: Bret L. Toll, Robert Valentine, Maxim Locktyukhin, Elmoustapha Ould-Ahmed-Vall