Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
-
Publication number: 20150089189
Abstract: In an embodiment, a processor may implement a vector instruction set including predicate vectors and multiple vector element sizes. The vector instruction set may include predicate vector pack and unpack instructions. Responsive to the predicate vector pack instruction, the processor may pack predicates from multiple predicate vector source registers into a destination predicate vector register. Responsive to the predicate vector unpack instruction, the processor may select a portion of a source predicate vector register and write the result to a destination predicate vector register. Additionally, the predicate vector register may store one or more vector attributes associated with the corresponding vector. The processor may modify the attribute as part of the pack/unpack operation (e.g. based on a pack/unpack factor). Additionally, vector pack/unpack instructions that are controlled by the attribute in a corresponding predicate vector register may be implemented.
Type: Application
Filed: September 24, 2013
Publication date: March 26, 2015
Applicant: APPLE INC.
Inventor: Jeffry E. Gonion
-
Publication number: 20150088926
Abstract: Methods and apparatuses for determining set-membership using Single Instruction Multiple Data ("SIMD") architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests.
Type: Application
Filed: July 22, 2014
Publication date: March 26, 2015
Inventors: SHASANK K. CHAVAN, PHUMPONG WATANAPRAKORNKUL
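As a rough scalar model of the parallel membership test this abstract describes (the function name and shapes are illustrative, not from the filing — real hardware would test all lanes in one instruction):

```python
def simd_set_membership(values, member_set):
    """Test, lane by lane, whether each value belongs to `member_set`,
    producing one boolean result per SIMD lane."""
    members = frozenset(member_set)
    return [v in members for v in values]

# Dictionary-encoded column: codes reference dictionary entries, and the
# predicate "row satisfies condition" reduces to code-set membership.
codes = [3, 1, 4, 1, 5]
matching_codes = {1, 5}
print(simd_set_membership(codes, matching_codes))
# [False, True, False, True, True]
```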
-
Publication number: 20150046672
Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, at least two bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values, are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
Type: Application
Filed: August 6, 2013
Publication date: February 12, 2015
Inventors: Terence Sych, Elmoustapha Ould-Ahmed-Vall
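A scalar sketch of the per-field counting described here, under the assumption that each field is scanned in fixed-width chunks and occurrences of a predetermined binary value are counted (all names and widths are illustrative):

```python
def vector_field_popcount(fields, field_bits, chunk_bits, target):
    """For each data field, count how many chunk_bits-wide binary
    values within the field equal `target`."""
    counts = []
    chunk_mask = (1 << chunk_bits) - 1
    for field in fields:
        n = 0
        for shift in range(0, field_bits, chunk_bits):
            if (field >> shift) & chunk_mask == target:
                n += 1
        counts.append(n)
    return counts

# Two 8-bit fields, counting 2-bit chunks equal to 0b11:
print(vector_field_popcount([0b11001111, 0b00110000], 8, 2, 0b11))
# [3, 1]
```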
-
Publication number: 20150046671
Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, a plurality of bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values, are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
Type: Application
Filed: August 6, 2013
Publication date: February 12, 2015
Inventor: Elmoustapha Ould-Ahmed-Vall
-
Publication number: 20150039851
Abstract: Methods, apparatus, instructions and logic provide SIMD vector sub-byte decompression functionality. Embodiments include shuffling a first and second byte into the least significant portion of a first vector element, and a third and fourth byte into the most significant portion. Processing continues shuffling a fifth and sixth byte into the least significant portion of a second vector element, and a seventh and eighth byte into the most significant portion. Then by shifting the first vector element by a first shift count and the second vector element by a second shift count, sub-byte elements are aligned to the least significant bits of their respective bytes. Processors then shuffle a byte from each of the shifted vector elements' least significant portions into byte positions of a destination vector element, and from each of the shifted vector elements' most significant portions into byte positions of another destination vector element.
Type: Application
Filed: July 31, 2013
Publication date: February 5, 2015
Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Robert Valentine
-
Publication number: 20150039853
Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, the number of vectorized instructions in the vectorized approach, the number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, the size of a cache, and the projected size of a hash table.
Type: Application
Filed: August 1, 2013
Publication date: February 5, 2015
Applicant: Oracle International Corporation
Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
-
Publication number: 20150039852
Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, data compaction is performed using vectorized instructions to identify a shuffle mask based on matching bits and update an output array based on the shuffle mask and an input array. In a related technique, a hash table probe involves using vectorized instructions to determine whether each key in one or more hash buckets matches a particular input key.
Type: Application
Filed: August 1, 2013
Publication date: February 5, 2015
Applicant: Oracle International Corporation
Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
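The shuffle-mask compaction technique can be sketched in scalar Python as follows (a model of the idea only, not the patented implementation — a real SIMD version would look the shuffle mask up from the match bits and apply it with one shuffle instruction):

```python
def compact(input_array, match_bits):
    """Derive a shuffle of the matching lane positions from the match
    bits, then gather those lanes to the front of the output array."""
    shuffle = [i for i, bit in enumerate(match_bits) if bit]
    return [input_array[i] for i in shuffle]

print(compact([10, 20, 30, 40], [1, 0, 1, 1]))
# [10, 30, 40]
```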
-
Publication number: 20150039854
Abstract: Systems and techniques disclosed herein include methods for de-quantization of feature vectors used in automatic speech recognition. A SIMD vector processor is used in one embodiment for efficient vectorized lookup of floating point values in conjunction with fMPE processing for increasing the discriminative power of input signals. These techniques exploit parallelism to effectively reduce the latency of speech recognition in a system operating in a high dimensional feature space. In one embodiment, a bytewise integer lookup operation effectively performs a floating point or a multiple byte lookup.
Type: Application
Filed: August 1, 2013
Publication date: February 5, 2015
Applicant: Nuance Communications, Inc.
Inventor: Justin Vaughn Wick
-
Publication number: 20140365747
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: December 11, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Moustapha Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Boris Ginzburg, Ziv Aviv
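One plausible reading of "horizontal partial sum" is a running (prefix) sum across the packed elements; under that assumption, a scalar sketch:

```python
def horizontal_partial_sum(src):
    """Each destination element holds the sum of the source element and
    all elements before it -- a prefix-sum reading of the instruction."""
    out, acc = [], 0
    for x in src:
        acc += x
        out.append(acc)
    return out

print(horizontal_partial_sum([1, 2, 3, 4]))
# [1, 3, 6, 10]
```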
-
Patent number: 8909901
Abstract: In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values so that selected portions of the first and second source operands or a predetermined value can be stored into elements of a destination. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed.
Type: Grant
Filed: December 28, 2007
Date of Patent: December 9, 2014
Assignee: Intel Corporation
Inventors: Cristina Anderson, Mark Buxton, Doron Orenstien, Bob Valentine
-
Patent number: 8904153
Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.
Type: Grant
Filed: September 7, 2010
Date of Patent: December 2, 2014
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
-
Publication number: 20140317377
Abstract: A processor core that includes a hardware decode unit to decode a vector frequency compress instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes a plurality of source data elements including one or more runs of identical data elements that are each to be compressed in a destination vector register as a value and run length pair. The destination operand identifies the destination vector register. The processor core also includes an execution engine unit to execute the decoded vector frequency compress instruction which causes, for each source data element, a value to be copied into the destination vector register to indicate that source data element's value. One or more runs of equal source data elements are encoded in the destination vector register as a predetermined compression value followed by a run length for that run.
Type: Application
Filed: December 30, 2011
Publication date: October 23, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
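The compression described above is run-length encoding of the register's elements; a minimal scalar sketch (the pair-based output layout here is illustrative, not the exact destination encoding the filing specifies):

```python
def vector_frequency_compress(src):
    """Run-length encode the source elements as (value, run_length) pairs."""
    pairs = []
    for v in src:
        if pairs and pairs[-1][0] == v:
            pairs[-1][1] += 1          # extend the current run
        else:
            pairs.append([v, 1])       # start a new run
    return [tuple(p) for p in pairs]

print(vector_frequency_compress([7, 7, 7, 2, 5, 5]))
# [(7, 3), (2, 1), (5, 2)]
```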
-
Publication number: 20140281368
Abstract: An example method for executing multiple instructions in one or more slots includes receiving a packet including multiple instructions and executing the multiple instructions in one or more slots in a time shared manner. Each slot is associated with an execution data path or a memory data path. An example method for executing at least one instruction in a plurality of phases includes receiving a packet including an instruction, splitting the instruction into a plurality of phases, and executing the instruction in the plurality of phases.
Type: Application
Filed: March 14, 2013
Publication date: September 18, 2014
Applicant: QUALCOMM INCORPORATED
Inventors: Ajay Anant Ingle, Lucian Codrescu, David J. Hoyle, Jose Fridman, Marc M. Hoffman, Deepak Mathew
-
Publication number: 20140281369
Abstract: An apparatus and method are described for fetching and storing a plurality of portions of a data stream into a plurality of registers. For example, a method according to one embodiment includes the following operations: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining the system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses; and storing the N designated portions of the data stream into the N vector registers.
Type: Application
Filed: December 23, 2011
Publication date: September 18, 2014
Inventor: Ashish Jha
-
Publication number: 20140223138
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a mask register into a vector register in response to a single vector packed convert a mask register to a vector register instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: August 7, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Amit Gradstein, Zeev Sperber
-
Publication number: 20140201499
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a list of index values into a mask value in response to a single vector packed conversion of a list of index values into a mask value instruction that includes a destination writemask register operand, a source vector register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: July 17, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Garrett T. Drysdale
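The index-list-to-mask conversion can be modeled in a few lines of scalar Python (names and the out-of-range policy are assumptions for illustration):

```python
def indices_to_mask(indices, mask_width):
    """Set one mask bit per listed index; indices outside the mask
    width are ignored in this sketch."""
    mask = 0
    for i in indices:
        if 0 <= i < mask_width:
            mask |= 1 << i
    return mask

print(bin(indices_to_mask([0, 2, 5], 8)))
# 0b100101
```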
-
Publication number: 20140201498
Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been, gathered. For each field having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
Type: Application
Filed: September 26, 2011
Publication date: July 17, 2014
Applicant: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
-
Publication number: 20140201497
Abstract: An apparatus is described having functional unit logic circuitry. The functional unit logic circuitry has a first register to store a first input vector operand having an element for each dimension of a multi-dimensional data structure. Each element of the first vector operand specifies the size of its respective dimension. The functional unit has a second register to store a second input vector operand specifying coordinates of a particular segment of the multi-dimensional structure. The functional unit also has logic circuitry to calculate an address offset for the particular segment relative to an address of an origin segment of the multi-dimensional structure.
Type: Application
Filed: December 23, 2011
Publication date: July 17, 2014
Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 8782376
Abstract: A processor including: a first and at least a second data processing channel with enable logic for selectively enabling the second channel; logic for generating first and second storage addresses having a variable offset therebetween based on the same one or more address operands of the same storage access instruction; and circuitry for transferring data between the first address and a register of the first data processing channel and between the second address and a corresponding register of the second channel based on a same one or more register specifier operands of the access instruction. The first data processing channel performs an operation using one or more registers of the first data processing channel, and on condition of being enabled the second channel performs the same operation using a corresponding one or more of its own registers based on the same one or more operands of the data processing instruction.
Type: Grant
Filed: August 26, 2011
Date of Patent: July 15, 2014
Assignee: Icera Inc.
Inventors: Simon Knowles, Edward Andrews, Stephen Felix, Simon Huckett, Colman Hegarty
-
Publication number: 20140164733
Abstract: A transpose instruction is described. A transpose instruction is fetched, where the transpose instruction includes an operand that specifies a vector register or a location in memory. The transpose instruction is decoded. The decoded transpose instruction is executed causing each data element in the specified vector register or location in memory to be stored in that specified vector register or location in memory in reverse order.
Type: Application
Filed: December 30, 2011
Publication date: June 12, 2014
Inventor: Ashish Jha
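Despite the name, the effect described is an in-register element reversal; the semantics fit in one line of scalar Python:

```python
def transpose_reverse(vec):
    """Store the data elements back in reverse order, as the described
    transpose instruction does within a single register."""
    return vec[::-1]

print(transpose_reverse([1, 2, 3, 4]))
# [4, 3, 2, 1]
```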
-
Publication number: 20140156969
Abstract: A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.
Type: Application
Filed: December 17, 2013
Publication date: June 5, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: MAARTEN J. BOERSMA, UDO KRAUTZ, ULRIKE SCHMIDT
-
Publication number: 20140149713
Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
Type: Application
Filed: December 23, 2011
Publication date: May 29, 2014
Inventor: Ashish Jha
-
Publication number: 20140136815
Abstract: A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.
Type: Application
Filed: November 12, 2012
Publication date: May 15, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Maarten J. Boersma, Udo Krautz, Ulrike Schmidt
-
Publication number: 20140129801
Abstract: Embodiments of systems, apparatuses, and methods for performing delta encoding on packed data elements of a source and storing the results in packed data elements of a destination using a single vector packed delta encode instruction are described.
Type: Application
Filed: December 28, 2011
Publication date: May 8, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale
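Delta encoding replaces each packed element with its difference from the preceding element; a scalar sketch (the pass-through handling of the first element is an assumption for illustration):

```python
def delta_encode(src):
    """Each output element is the difference between a source element
    and its predecessor; the first element passes through unchanged."""
    if not src:
        return []
    return [src[0]] + [b - a for a, b in zip(src, src[1:])]

print(delta_encode([10, 12, 15, 15]))
# [10, 2, 3, 0]
```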
-
Publication number: 20140122831
Abstract: Instructions and logic provide vector compress and rotate functionality. Some embodiments, responsive to an instruction specifying: a vector source, a mask, a vector destination and destination offset, read the mask, and copy corresponding unmasked vector elements from the vector source to adjacent sequential locations in the vector destination, starting at the vector destination offset location. In some embodiments, the unmasked vector elements from the vector source are copied to adjacent sequential element locations modulo the total number of element locations in the vector destination. In some alternative embodiments, copying stops whenever the vector destination is full, and upon copying an unmasked vector element from the vector source to an adjacent sequential element location in the vector destination, the value of a corresponding field in the mask is changed to a masked value. Alternative embodiments zero elements of the vector destination, in which no element from the vector source is copied.
Type: Application
Filed: October 30, 2012
Publication date: May 1, 2014
Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Robert Valentine
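The modulo-wrapping variant of this compress-and-rotate behaviour can be sketched in scalar Python (the mask-field update described in the alternative embodiments is omitted here for brevity):

```python
def compress_rotate(src, mask, dest, offset):
    """Copy unmasked source elements to sequential destination slots
    starting at `offset`, wrapping modulo the destination length."""
    out = list(dest)
    pos = offset
    for v, m in zip(src, mask):
        if m:                          # unmasked element: copy and advance
            out[pos % len(out)] = v
            pos += 1
    return out

print(compress_rotate([1, 2, 3, 4], [1, 0, 1, 1], [0, 0, 0, 0], 2))
# [4, 0, 1, 3]
```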
-
Patent number: 8688962
Abstract: Apparatuses and methods to perform gather instructions are presented. In one embodiment, an apparatus comprises a gather logic module which includes a gather logic unit to identify locality of data elements in response to a gather instruction. The apparatus includes memory comprising a plurality of memory rows including a memory row associated with the gather instruction. The apparatus further includes memory structure to store data element addresses accessed in response to the gather instruction.
Type: Grant
Filed: April 1, 2011
Date of Patent: April 1, 2014
Assignee: Intel Corporation
Inventors: Shlomo Raikin, Robert Valentine
-
Patent number: 8683178
Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition.
Type: Grant
Filed: April 20, 2011
Date of Patent: March 25, 2014
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Patent number: 8667250
Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.
Type: Grant
Filed: December 26, 2007
Date of Patent: March 4, 2014
Assignee: Intel Corporation
Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
-
Publication number: 20140047211
Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
Type: Application
Filed: August 13, 2012
Publication date: February 13, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Patent number: 8649508
Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further used to determine Elliptic Curve scalar multiplication over a finite elliptic curve.
Type: Grant
Filed: September 29, 2008
Date of Patent: February 11, 2014
Assignee: Tata Consultancy Services Ltd.
Inventor: Natarajan Vijayarangan
-
Patent number: 8650382
Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
Type: Grant
Filed: September 14, 2012
Date of Patent: February 11, 2014
Assignee: Intel Corporation
Inventor: Patrice Roussel
-
Patent number: 8635431
Abstract: A dedicated vector gather buffer (VGB) that stores multiple cache lines read from a memory hierarchy in one or more Logical Units (LUs) each having multiple buffer entries and performs parallel operations on vector registers. Once loaded with data, an LU is read using a single port. The VGB initiates prefetch events that keep it full in response to the demand created by 'gather' instructions. The VGB includes one or more write ports for receiving data from the memory hierarchy and a read port capable of reading data from the columns of the LU to be loaded into a vector register. Data is extracted from the VGB by (1) using a separate port for each item read, (2) implementing each VGB entry as a shift register and shifting an appropriate amount until all entries are aligned, or (3) enforcing a uniform offset for all items.
Type: Grant
Filed: December 8, 2010
Date of Patent: January 21, 2014
Assignee: International Business Machines Corporation
Inventors: Daniel Citron, Dorit Nuzman
-
Publication number: 20140019712
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed compression and repeat in response to a single vector packed compression and repeat instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: January 16, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
-
Publication number: 20140019714
Abstract: A processor core that includes a hardware decode unit and an execution engine unit. The hardware decode unit to decode a vector frequency expand instruction, wherein the vector frequency expand instruction includes a source operand and a destination operand, wherein the source operand specifies a source vector register that includes one or more pairs of a value and run length that are to be expanded into a run of that value based on the run length. The execution engine unit to execute the decoded vector frequency expand instruction which causes a set of one or more source data elements in the source vector register to be expanded into a set of destination data elements comprising more elements than the set of source data elements and including at least one run of identical values which were run length encoded in the source vector register.
Type: Application
Filed: December 30, 2011
Publication date: January 16, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles Yount, Bret L. Toll
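This expand operation is the inverse of the vector frequency compress described earlier in this listing; a minimal run-length decode sketch (the pair-based input layout is illustrative, not the exact source encoding the filing specifies):

```python
def vector_frequency_expand(pairs):
    """Expand (value, run_length) pairs back into runs of identical
    values, undoing the run-length encoding."""
    out = []
    for value, run_length in pairs:
        out.extend([value] * run_length)
    return out

print(vector_frequency_expand([(7, 3), (2, 1), (5, 2)]))
# [7, 7, 7, 2, 5, 5]
```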
-
Publication number: 20140019713
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: January 16, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
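The kernel a SAD instruction accelerates, common in video motion estimation, reduces to a few lines of scalar Python (this sketches one block pair; the patented instruction operates on two packed blocks at once):

```python
def block_sad(block_a, block_b):
    """Sum of absolute differences between two equal-length packed blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

print(block_sad([1, 5, 9], [2, 3, 9]))
# 3
```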
-
Publication number: 20140013076
Abstract: A method and apparatus for efficiently processing data in various formats in a single instruction multiple data ("SIMD") architecture is presented. Specifically, a method to unpack fixed-width bit values in a bit stream to a fixed-width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.
Type: Application
Filed: September 10, 2013
Publication date: January 9, 2014
Applicant: Oracle International Corporation
Inventors: Amit Ganesh, Shasank K. Chavan, Vineet Marwah, Jesse Kamp, Anindya C. Patthak, Michael J. Gleeson, Allison L. Holloway, Roger Macnicol
-
Publication number: 20140013075
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: January 9, 2014
Inventors: Mostafa Hagog, Elmoustapha Ould-Aumed-Vall, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
-
Publication number: 20130238877
Abstract: Provided is a technique for improving the transfer latency of vector register file data when an interrupt is generated. According to an aspect, when an interrupt occurs, a core determines whether to store the vector register file data currently being executed in a first memory or in a second memory based on whether or not the first memory can store the vector register file data. In response to not being able to store the vector register file data in the first memory, a data transfer unit, which is implemented as hardware, is provided to store the vector register file data in the second memory.
Type: Application
Filed: November 9, 2012
Publication date: September 12, 2013
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jin-Seok Lee, Dong-Hoon Yoo, Won-Sub Kim, Tai-Song Jin, Hae-Woo Park, Min-Wook Ahn, Hee-Jin Ahn
-
Publication number: 20130232318Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.Type: ApplicationFiled: March 15, 2013Publication date: September 5, 2013Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
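The load-with-conversion behavior can be modeled in a few lines; here a conversion from unsigned bytes to floats stands in for whatever first-to-second-format conversion the indicator selects (the function name is illustrative, not from the patent):

```python
def vload_convert(memory: bytes, offset: int, n: int) -> list[float]:
    """Sketch of a load-convert-and-write: read n unsigned bytes from
    memory and convert each to float before the vector-register write.
    The uint8-to-float conversion is one illustrative choice of the
    instruction's format-conversion indicator."""
    return [float(b) for b in memory[offset:offset + n]]
```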
-
Publication number: 20130212354Abstract: The present invention provides a method for performing data array sorting of vector elements in an N-wide SIMD that is accelerated by a factor of about N/2 over a scalar implementation, excluding scalar load/store instructions. A vector compare instruction with the ability to compare any two vector elements in accordance with optimized data array sorting algorithms, followed by a vector-multiplex instruction which performs exchanges of vector elements in accordance with condition flags generated by the vector compare instruction, provides an efficient but programmable method of performing data sorting with a factor of about N/2 acceleration. A mask bit prevents changes to elements that are not involved in a certain stage of sorting.Type: ApplicationFiled: September 20, 2009Publication date: August 15, 2013Inventor: Tibet Mimar
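The compare-then-multiplex pair can be modeled as follows (function names are illustrative); one call per sorting-network stage exchanges all flagged, unmasked pairs at once, which is where the roughly N/2 speedup over a scalar compare-and-swap loop comes from:

```python
def vector_compare(vec: list[int], pairs: list[tuple[int, int]]) -> list[bool]:
    """Model of a vector compare that can compare any two element
    positions: flag k is True when vec[pairs[k][0]] > vec[pairs[k][1]]."""
    return [vec[a] > vec[b] for a, b in pairs]

def vector_multiplex(vec, pairs, flags, mask):
    """Exchange the flagged pairs; masked-off pairs are left unchanged."""
    out = list(vec)
    for (a, b), flagged, enabled in zip(pairs, flags, mask):
        if flagged and enabled:
            out[a], out[b] = out[b], out[a]
    return out
```

For a 4-element vector, the stage comparing pairs (0,1) and (2,3) performs both compare-exchanges with a single compare/multiplex pair.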
-
Publication number: 20130212353Abstract: The present invention incorporates a system for vector Look-Up Table (LUT) operations into a single-instruction multiple-data (SIMD) processor in order to implement a plurality of LUT operations simultaneously, where each of the LUT contents could be the same or different. Elements of one or two vector registers are used to form LUT indexes, and the output of the vector LUT operation is written into a vector register. No dedicated LUT memory is required; rather, data memory is organized as multiple separate data memory banks, where a portion of each data memory bank is used for LUT operations. For a single-input vector LUT operation, the address input of each LUT is operably coupled to any of the input vector register's elements using input vector element mapping logic in one embodiment. Thus, one input vector element can produce N output elements using N different LUTs, or K input vector elements can produce N output elements, where N is a positive integer and K is an integer from one to N.Type: ApplicationFiled: February 3, 2003Publication date: August 15, 2013Inventor: Tibet Mimar
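A functional model of the vector LUT operation (illustrative names, with Python lists standing in for the per-bank LUT regions of data memory):

```python
def vector_lut(tables: list[list[int]], indexes: list[int]) -> list[int]:
    """N parallel table lookups: lane i reads tables[i][indexes[i]].
    The per-lane tables may hold identical or different contents."""
    return [tables[i][idx] for i, idx in enumerate(indexes)]

def vector_lut_broadcast(tables: list[list[int]], index: int) -> list[int]:
    """One input element producing N output elements via N different LUTs."""
    return [t[index] for t in tables]
```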
-
Publication number: 20130091339Abstract: An apparatus and method for creation of reordered vectors from sequential input data for block-based decimation, filtering, interpolation and matrix transposition using a memory circuit for a Single Instruction, Multiple Data (SIMD) Digital Signal Processor (DSP). This memory circuit includes a two-dimensional storage array, a rotate-and-distribute unit, a read-controller and a write-controller, to map input vectors containing sequential data elements into columns of the two-dimensional array and extract reordered target vectors from this array. The data elements and memory configuration are received from the SIMD DSP.Type: ApplicationFiled: October 5, 2011Publication date: April 11, 2013Applicant: ST-Ericsson SAInventors: David Van Kampen, Kees Van Berkel, Sven Goossens, Wim Kloosterhuis, Claudiu Zissulescu-Ianculescu
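The column-write/row-read pattern behind this kind of memory circuit can be sketched directly (a plain Python model of the two-dimensional storage array; the rotate-and-distribute logic that avoids bank conflicts in hardware is omitted):

```python
def transpose_via_columns(vectors: list[list[int]]) -> list[list[int]]:
    """Write each input vector into a column of a 2-D array, then
    extract target vectors from the rows: a matrix transposition."""
    rows = len(vectors[0])
    cols = len(vectors)
    array = [[0] * cols for _ in range(rows)]
    for c, vec in enumerate(vectors):
        for r, elem in enumerate(vec):
            array[r][c] = elem
    return array
```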
-
Publication number: 20130080737Abstract: A vector data access unit includes data access ordering circuitry for issuing data access requests, indicated by the elements, to the data store, configured in response to receipt of at least two decoded vector data access instructions, at least one of which is a write instruction. Data accesses are performed in the instructed order to determine an element indicating the next data access for each of said vector data access instructions. One of the next data accesses is selected to be issued to the data store in dependence upon the order in which the at least two vector data access instructions were received. The positions of the elements indicate the next data accesses relative to each other within their respective pluralities of elements. A numerical position of the element indicating the next data access within the plurality of elements of an earlier instruction is less than a predetermined value.Type: ApplicationFiled: September 28, 2011Publication date: March 28, 2013Applicant: ARM LimitedInventor: Alastair David Reid
-
Patent number: 8375196Abstract: A data processing apparatus includes a vector register bank having a plurality of vector registers, each register including a plurality of storage cells, each cell storing a data element. A vector processing unit is provided for executing a sequence of vector instructions. The processing unit is arranged to issue a set rearrangement enable signal to the vector register bank. The write interface of the vector register bank is modified to provide not only a first input for receiving the data elements generated by the vector processing unit during normal execution, but also a second input coupled via a data rearrangement path to the matrix of storage cells, via which the data elements currently stored in the matrix of storage cells are provided to the write interface in a rearranged form representing the arrangement of data elements that would be obtained by performance of the predetermined rearrangement operation.Type: GrantFiled: January 19, 2010Date of Patent: February 12, 2013Assignee: ARM LimitedInventors: Andreas Björklund, Erik Persson, Ola Hugosson
-
Publication number: 20130024653Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.Type: ApplicationFiled: July 18, 2011Publication date: January 24, 2013Inventor: Darryl J. Gove
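The loop's per-iteration contract can be modeled as a single function over two vectors of character codes (an illustrative model; the hardware performs the comparison and null check simultaneously across all lanes rather than sequentially):

```python
def vector_strcmp_step(a: list[int], b: list[int]) -> tuple[bool, bool]:
    """One iteration of the vectorized string comparison: compare two
    vectors of character codes while checking for the null character.
    Returns (terminate, equal_so_far)."""
    for x, y in zip(a, b):
        if x != y:
            return True, False   # inequality found: strings differ
        if x == 0:
            return True, True    # null reached with all elements equal
    return False, True           # load the next portions and iterate
```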
-
Patent number: 8356159Abstract: The described embodiments provide a system that sets elements in a result vector based on an input vector. During operation, the system determines a location of a key element within the input vector. Next, the system generates a result vector. When generating the result vector, the system sets one or more elements of the result vector based on the location of the key element in the input vector.Type: GrantFiled: April 7, 2009Date of Patent: January 15, 2013Assignee: Apple Inc.Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
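The abstract leaves open which result elements get set; one plausible variant (hypothetical, chosen only for illustration and not necessarily the claimed selection) sets every element after the first occurrence of the key:

```python
def result_after_key(input_vec: list[int], key: int) -> list[bool]:
    """Locate the first occurrence of `key` in the input vector and set
    the result elements after that location. This is one illustrative
    selection; the patent covers setting elements based on the key
    location in other ways as well."""
    try:
        loc = input_vec.index(key)
    except ValueError:
        return [False] * len(input_vec)
    return [i > loc for i in range(len(input_vec))]
```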
-
Patent number: 8316215Abstract: An object is to speed up a vector store instruction on a memory that is divided into banks, with a plurality of elements as a unit, while minimizing the increase in physical resources. A vector processing apparatus has a plurality of register banks and processes a data string including a plurality of data elements retained in the plurality of register banks, wherein: the plurality of register banks each have a read pointer 113 that points to a read position for reading the data elements; and the start position of the read pointer 113 is changed from one register bank to another. For example, consecutive numbers assigned to the register banks may be used as the read start positions of the respective register banks.Type: GrantFiled: March 7, 2008Date of Patent: November 20, 2012Assignee: NEC CorporationInventor: Noritaka Hoshi
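Using consecutive bank numbers as read start positions can be modeled as follows (illustrative names): bank b starts reading at position b, so on every beat the banks access different relative positions rather than all hitting the same slot at once:

```python
def staggered_reads(banks: list[list[int]], beats: int) -> list[list[int]]:
    """Model of staggered read pointers: each register bank's pointer
    starts at its own bank number and advances once per beat."""
    n = len(banks)
    out = []
    for t in range(beats):
        out.append([banks[b][(b + t) % len(banks[b])] for b in range(n)])
    return out
```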
-
Publication number: 20120260062Abstract: A method and system for providing dynamic addressability of data elements in a vector register file with subword parallelism. The method includes the steps of: determining a plurality of data elements required for an instruction; storing an address for each of the data elements into a pointer register where the addresses are stored as a number of offsets from the vector register file's origin; reading the addresses from the pointer register; extracting the data elements located at the addresses from the vector register file; and placing the data elements in a subword slot of the vector register file so that the data elements are located on a single vector within the vector register file; where at least one of the steps is carried out using a computer device so that data elements in a vector register file with subword parallelism are dynamically addressable.Type: ApplicationFiled: April 7, 2011Publication date: October 11, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jeffrey H. Derby, Robert K. Montoye
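The gather step can be modeled minimally (illustrative names; `vrf` stands in for a flat view of the vector register file, and the pointer register holds offsets from its origin):

```python
def gather_to_vector(vrf: list[int], pointer_reg: list[int]) -> list[int]:
    """Read offsets from the pointer register and place the addressed
    data elements into consecutive subword slots of a single result
    vector, making arbitrary elements dynamically addressable."""
    return [vrf[offset] for offset in pointer_reg]
```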
-
Patent number: 8255884Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.Type: GrantFiled: June 6, 2008Date of Patent: August 28, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
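A splat replicates a scalar into every lane so that scalar computation can ride the SIMD datapath; the compiler pass described above decides where those replications are cheapest to place. A minimal model (illustrative names):

```python
def splat(scalar: int, width: int) -> list[int]:
    """Replicate a scalar value across all lanes of a vector."""
    return [scalar] * width

def simd_add(a: list[int], b: list[int]) -> list[int]:
    """Element-wise SIMD add over two equal-width vectors."""
    return [x + y for x, y in zip(a, b)]
```

For example, adding the scalar 5 to a vector requires splatting it first: `simd_add(splat(5, 4), v)`.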
-
Publication number: 20120185670Abstract: A processing core implemented on a semiconductor chip is described. The processing core includes logic circuitry to identify whether vector instructions and integer scalar instructions are to be executed with two registers or three registers, where, in the two-register case, input operand information is destroyed in one of the two registers, and, in the three-register case, the input operands are not destroyed. The processing core also includes steering circuitry coupled to the logic circuitry. The steering circuitry is to control first data paths between scalar integer execution units and a scalar integer register bank such that two registers are accessed from the scalar register bank if two-register execution is identified for the scalar integer instructions, or three registers are accessed from the scalar integer register bank if three-register execution is identified for the scalar integer instructions.Type: ApplicationFiled: January 14, 2011Publication date: July 19, 2012Inventors: Bret L. Toll, Robert Valentine, Maxim Locktyukhin, Elmoustapha Ould-Ahmed-Vall