Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
-
Publication number: 20150089189
Abstract: In an embodiment, a processor may implement a vector instruction set including predicate vectors and multiple vector element sizes. The vector instruction set may include predicate vector pack and unpack instructions. Responsive to the predicate vector pack instruction, the processor may pack predicates from multiple predicate vector source registers into a destination predicate vector register. Responsive to the predicate vector unpack instruction, the processor may select a portion of a source predicate vector register and write the result to a destination predicate vector register. Additionally, the predicate vector register may store one or more vector attributes associated with the corresponding vector. The processor may modify the attribute as part of the pack/unpack operation (e.g. based on a pack/unpack factor). Additionally, vector pack/unpack instructions that are controlled by the attribute in a corresponding predicate vector register may be implemented.
Type: Application
Filed: September 24, 2013
Publication date: March 26, 2015
Applicant: APPLE INC.
Inventor: Jeffry E. Gonion
-
Publication number: 20150088926
Abstract: Methods and apparatuses for determining set-membership using Single Instruction Multiple Data ("SIMD") architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests.
Type: Application
Filed: July 22, 2014
Publication date: March 26, 2015
Inventors: SHASANK K. CHAVAN, PHUMPONG WATANAPRAKORNKUL
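As a rough scalar model of the parallel membership test this abstract describes (the function name and shapes are illustrative, not from the filing — real hardware would test all lanes in one instruction):

```python
def simd_set_membership(values, member_set):
    """Test, lane by lane, whether each value belongs to `member_set`,
    producing one boolean result per SIMD lane."""
    members = frozenset(member_set)
    return [v in members for v in values]

# Dictionary-encoded column: codes reference dictionary entries, and the
# predicate "row satisfies condition" reduces to code-set membership.
codes = [3, 1, 4, 1, 5]
matching_codes = {1, 5}
print(simd_set_membership(codes, matching_codes))
# [False, True, False, True, True]
```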
-
Publication number: 20150046672
Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, at least two bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values, are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
Type: Application
Filed: August 6, 2013
Publication date: February 12, 2015
Inventors: Terence Sych, Elmoustapha Ould-Ahmed-Vall
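A scalar sketch of the per-field counting described here, under the assumption that each field is scanned in fixed-width chunks and occurrences of a predetermined binary value are counted (all names and widths are illustrative):

```python
def vector_field_popcount(fields, field_bits, chunk_bits, target):
    """For each data field, count how many chunk_bits-wide binary
    values within the field equal `target`."""
    counts = []
    chunk_mask = (1 << chunk_bits) - 1
    for field in fields:
        n = 0
        for shift in range(0, field_bits, chunk_bits):
            if (field >> shift) & chunk_mask == target:
                n += 1
        counts.append(n)
    return counts

# Two 8-bit fields, counting 2-bit chunks equal to 0b11:
print(vector_field_popcount([0b11001111, 0b00110000], 8, 2, 0b11))
# [3, 1]
```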
-
Publication number: 20150046671
Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, a plurality of bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values, are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
Type: Application
Filed: August 6, 2013
Publication date: February 12, 2015
Inventor: Elmoustapha Ould-Ahmed-Vall
-
Publication number: 20150039851
Abstract: Methods, apparatus, instructions and logic provide SIMD vector sub-byte decompression functionality. Embodiments include shuffling a first and second byte into the least significant portion of a first vector element, and a third and fourth byte into the most significant portion. Processing continues shuffling a fifth and sixth byte into the least significant portion of a second vector element, and a seventh and eighth byte into the most significant portion. Then by shifting the first vector element by a first shift count and the second vector element by a second shift count, sub-byte elements are aligned to the least significant bits of their respective bytes. Processors then shuffle a byte from each of the shifted vector elements' least significant portions into byte positions of a destination vector element, and from each of the shifted vector elements' most significant portions into byte positions of another destination vector element.
Type: Application
Filed: July 31, 2013
Publication date: February 5, 2015
Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Robert Valentine
-
Publication number: 20150039853
Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, the number of vectorized instructions in the vectorized approach, the number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, the size of a cache, and the projected size of a hash table.
Type: Application
Filed: August 1, 2013
Publication date: February 5, 2015
Applicant: Oracle International Corporation
Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
-
Publication number: 20150039852
Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, data compaction is performed using vectorized instructions to identify a shuffle mask based on matching bits and update an output array based on the shuffle mask and an input array. In a related technique, a hash table probe involves using vectorized instructions to determine whether each key in one or more hash buckets matches a particular input key.
Type: Application
Filed: August 1, 2013
Publication date: February 5, 2015
Applicant: Oracle International Corporation
Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
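The shuffle-mask compaction technique can be sketched in scalar Python as follows (a model of the idea only, not the patented implementation — a real SIMD version would look the shuffle mask up from the match bits and apply it with one shuffle instruction):

```python
def compact(input_array, match_bits):
    """Derive a shuffle of the matching lane positions from the match
    bits, then gather those lanes to the front of the output array."""
    shuffle = [i for i, bit in enumerate(match_bits) if bit]
    return [input_array[i] for i in shuffle]

print(compact([10, 20, 30, 40], [1, 0, 1, 1]))
# [10, 30, 40]
```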
-
Publication number: 20150039854
Abstract: Systems and techniques disclosed herein include methods for de-quantization of feature vectors used in automatic speech recognition. A SIMD vector processor is used in one embodiment for efficient vectorized lookup of floating point values in conjunction with fMPE processing for increasing the discriminative power of input signals. These techniques exploit parallelism to effectively reduce the latency of speech recognition in a system operating in a high dimensional feature space. In one embodiment, a bytewise integer lookup operation effectively performs a floating point or a multiple byte lookup.
Type: Application
Filed: August 1, 2013
Publication date: February 5, 2015
Applicant: Nuance Communications, Inc.
Inventor: Justin Vaughn Wick
-
Publication number: 20140365747
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: December 11, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Moustapha Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Boris Ginzburg, Ziv Aviv
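One plausible reading of "horizontal partial sum" is a running (prefix) sum across the packed elements; under that assumption, a scalar sketch:

```python
def horizontal_partial_sum(src):
    """Each destination element holds the sum of the source element and
    all elements before it -- a prefix-sum reading of the instruction."""
    out, acc = [], 0
    for x in src:
        acc += x
        out.append(acc)
    return out

print(horizontal_partial_sum([1, 2, 3, 4]))
# [1, 3, 6, 10]
```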
-
Patent number: 8909901
Abstract: In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values so that selected portions of the first and second source operands or a predetermined value can be stored into elements of a destination. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed.
Type: Grant
Filed: December 28, 2007
Date of Patent: December 9, 2014
Assignee: Intel Corporation
Inventors: Cristina Anderson, Mark Buxton, Doron Orenstien, Bob Valentine
-
Patent number: 8904153
Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.
Type: Grant
Filed: September 7, 2010
Date of Patent: December 2, 2014
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
-
Publication number: 20140317377
Abstract: A processor core that includes a hardware decode unit to decode a vector frequency compress instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes a plurality of source data elements including one or more runs of identical data elements that are each to be compressed in a destination vector register as a value and run length pair. The destination operand identifies the destination vector register. The processor core also includes an execution engine unit to execute the decoded vector frequency compress instruction which causes, for each source data element, a value to be copied into the destination vector register to indicate that source data element's value. One or more runs of equal source data elements are encoded in the destination vector register as a predetermined compression value followed by a run length for that run.
Type: Application
Filed: December 30, 2011
Publication date: October 23, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
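The compression described above is run-length encoding of the register's elements; a minimal scalar sketch (the pair-based output layout here is illustrative, not the exact destination encoding the filing specifies):

```python
def vector_frequency_compress(src):
    """Run-length encode the source elements as (value, run_length) pairs."""
    pairs = []
    for v in src:
        if pairs and pairs[-1][0] == v:
            pairs[-1][1] += 1          # extend the current run
        else:
            pairs.append([v, 1])       # start a new run
    return [tuple(p) for p in pairs]

print(vector_frequency_compress([7, 7, 7, 2, 5, 5]))
# [(7, 3), (2, 1), (5, 2)]
```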
-
Publication number: 20140281368
Abstract: An example method for executing multiple instructions in one or more slots includes receiving a packet including multiple instructions and executing the multiple instructions in one or more slots in a time shared manner. Each slot is associated with an execution data path or a memory data path. An example method for executing at least one instruction in a plurality of phases includes receiving a packet including an instruction, splitting the instruction into a plurality of phases, and executing the instruction in the plurality of phases.
Type: Application
Filed: March 14, 2013
Publication date: September 18, 2014
Applicant: QUALCOMM INCORPORATED
Inventors: Ajay Anant Ingle, Lucian Codrescu, David J. Hoyle, Jose Fridman, Marc M. Hoffman, Deepak Mathew
-
Publication number: 20140281369
Abstract: An apparatus and method are described for fetching and storing a plurality of portions of a data stream into a plurality of registers. For example, a method according to one embodiment includes the following operations: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining the system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses; and storing the N designated portions of the data stream into the N vector registers.
Type: Application
Filed: December 23, 2011
Publication date: September 18, 2014
Inventor: Ashish Jha
-
Publication number: 20140223138
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a mask register into a vector register in response to a single vector packed convert a mask register to a vector register instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: August 7, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Amit Gradstein, Zeev Sperber
-
Publication number: 20140201499
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a list of index values into a mask value in response to a single vector packed conversion of a list of index values into a mask value instruction that includes a destination writemask register operand, a source vector register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: July 17, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Garrett T. Drysdale
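The index-list-to-mask conversion can be modeled in a few lines of scalar Python (names and the out-of-range policy are assumptions for illustration):

```python
def indices_to_mask(indices, mask_width):
    """Set one mask bit per listed index; indices outside the mask
    width are ignored in this sketch."""
    mask = 0
    for i in indices:
        if 0 <= i < mask_width:
            mask |= 1 << i
    return mask

print(bin(indices_to_mask([0, 2, 5], 8)))
# 0b100101
```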
-
Publication number: 20140201498
Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been, gathered. For each field having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
Type: Application
Filed: September 26, 2011
Publication date: July 17, 2014
Applicant: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
-
Publication number: 20140201497
Abstract: An apparatus is described having functional unit logic circuitry. The functional unit logic circuitry has a first register to store a first input vector operand having an element for each dimension of a multi-dimensional data structure. Each element of the first vector operand specifies the size of its respective dimension. The functional unit has a second register to store a second input vector operand specifying coordinates of a particular segment of the multi-dimensional structure. The functional unit also has logic circuitry to calculate an address offset for the particular segment relative to an address of an origin segment of the multi-dimensional structure.
Type: Application
Filed: December 23, 2011
Publication date: July 17, 2014
Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 8782376
Abstract: A processor including: a first and at least a second data processing channel with enable logic for selectively enabling the second channel; logic for generating first and second storage addresses having a variable offset therebetween based on the same one or more address operands of the same storage access instruction; and circuitry for transferring data between the first address and a register of the first data processing channel and between the second address and a corresponding register of the second channel based on a same one or more register specifier operands of the access instruction. The first data processing channel performs an operation using one or more registers of the first data processing channel, and on condition of being enabled the second channel performs the same operation using a corresponding one or more of its own registers based on the same one or more operands of the data processing instruction.
Type: Grant
Filed: August 26, 2011
Date of Patent: July 15, 2014
Assignee: Icera Inc.
Inventors: Simon Knowles, Edward Andrews, Stephen Felix, Simon Huckett, Colman Hegarty
-
Publication number: 20140164733
Abstract: A transpose instruction is described. A transpose instruction is fetched, where the transpose instruction includes an operand that specifies a vector register or a location in memory. The transpose instruction is decoded. The decoded transpose instruction is executed causing each data element in the specified vector register or location in memory to be stored in that specified vector register or location in memory in reverse order.
Type: Application
Filed: December 30, 2011
Publication date: June 12, 2014
Inventor: Ashish Jha
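Despite the name, the effect described is an in-register element reversal; the semantics fit in one line of scalar Python:

```python
def transpose_reverse(vec):
    """Store the data elements back in reverse order, as the described
    transpose instruction does within a single register."""
    return vec[::-1]

print(transpose_reverse([1, 2, 3, 4]))
# [4, 3, 2, 1]
```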
-
Publication number: 20140156969
Abstract: A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.
Type: Application
Filed: December 17, 2013
Publication date: June 5, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: MAARTEN J. BOERSMA, UDO KRAUTZ, ULRIKE SCHMIDT
-
Publication number: 20140149713
Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
Type: Application
Filed: December 23, 2011
Publication date: May 29, 2014
Inventor: Ashish Jha
-
Publication number: 20140136815
Abstract: A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.
Type: Application
Filed: November 12, 2012
Publication date: May 15, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Maarten J. Boersma, Udo Krautz, Ulrike Schmidt
-
Publication number: 20140129801
Abstract: Embodiments of systems, apparatuses, and methods for performing delta encoding on packed data elements of a source and storing the results in packed data elements of a destination using a single vector packed delta encode instruction are described.
Type: Application
Filed: December 28, 2011
Publication date: May 8, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale
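Delta encoding replaces each packed element with its difference from the preceding element; a scalar sketch (the pass-through handling of the first element is an assumption for illustration):

```python
def delta_encode(src):
    """Each output element is the difference between a source element
    and its predecessor; the first element passes through unchanged."""
    if not src:
        return []
    return [src[0]] + [b - a for a, b in zip(src, src[1:])]

print(delta_encode([10, 12, 15, 15]))
# [10, 2, 3, 0]
```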
-
Publication number: 20140122831
Abstract: Instructions and logic provide vector compress and rotate functionality. Some embodiments, responsive to an instruction specifying: a vector source, a mask, a vector destination and destination offset, read the mask, and copy corresponding unmasked vector elements from the vector source to adjacent sequential locations in the vector destination, starting at the vector destination offset location. In some embodiments, the unmasked vector elements from the vector source are copied to adjacent sequential element locations modulo the total number of element locations in the vector destination. In some alternative embodiments, copying stops whenever the vector destination is full, and upon copying an unmasked vector element from the vector source to an adjacent sequential element location in the vector destination, the value of a corresponding field in the mask is changed to a masked value. Alternative embodiments zero elements of the vector destination, in which no element from the vector source is copied.
Type: Application
Filed: October 30, 2012
Publication date: May 1, 2014
Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Robert Valentine
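The modulo-wrapping variant of this compress-and-rotate behaviour can be sketched in scalar Python (the mask-field update described in the alternative embodiments is omitted here for brevity):

```python
def compress_rotate(src, mask, dest, offset):
    """Copy unmasked source elements to sequential destination slots
    starting at `offset`, wrapping modulo the destination length."""
    out = list(dest)
    pos = offset
    for v, m in zip(src, mask):
        if m:                          # unmasked element: copy and advance
            out[pos % len(out)] = v
            pos += 1
    return out

print(compress_rotate([1, 2, 3, 4], [1, 0, 1, 1], [0, 0, 0, 0], 2))
# [4, 0, 1, 3]
```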
-
Patent number: 8688962
Abstract: Apparatuses and methods to perform gather instructions are presented. In one embodiment, an apparatus comprises a gather logic module which includes a gather logic unit to identify locality of data elements in response to a gather instruction. The apparatus includes memory comprising a plurality of memory rows including a memory row associated with the gather instruction. The apparatus further includes memory structure to store data element addresses accessed in response to the gather instruction.
Type: Grant
Filed: April 1, 2011
Date of Patent: April 1, 2014
Assignee: Intel Corporation
Inventors: Shlomo Raikin, Robert Valentine
-
Patent number: 8683178
Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition.
Type: Grant
Filed: April 20, 2011
Date of Patent: March 25, 2014
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Patent number: 8667250
Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.
Type: Grant
Filed: December 26, 2007
Date of Patent: March 4, 2014
Assignee: Intel Corporation
Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
-
Publication number: 20140047211
Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
Type: Application
Filed: August 13, 2012
Publication date: February 13, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Patent number: 8649508
Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further used to determine Elliptic Curve scalar multiplication over a finite elliptic curve.
Type: Grant
Filed: September 29, 2008
Date of Patent: February 11, 2014
Assignee: Tata Consultancy Services Ltd.
Inventor: Natarajan Vijayarangan
-
Patent number: 8650382
Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
Type: Grant
Filed: September 14, 2012
Date of Patent: February 11, 2014
Assignee: Intel Corporation
Inventor: Patrice Roussel
-
Patent number: 8635431
Abstract: A dedicated vector gather buffer (VGB) that stores multiple cache lines read from a memory hierarchy in one or more Logical Units (LUs) each having multiple buffer entries and performs parallel operations on vector registers. Once loaded with data, an LU is read using a single port. The VGB initiates prefetch events that keep it full in response to the demand created by 'gather' instructions. The VGB includes one or more write ports for receiving data from the memory hierarchy and a read port capable of reading data from the columns of the LU to be loaded into a vector register. Data is extracted from the VGB by (1) using a separate port for each item read, (2) implementing each VGB entry as a shift register and shifting an appropriate amount until all entries are aligned, or (3) enforcing a uniform offset for all items.
Type: Grant
Filed: December 8, 2010
Date of Patent: January 21, 2014
Assignee: International Business Machines Corporation
Inventors: Daniel Citron, Dorit Nuzman
-
Publication number: 20140019712
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed compression and repeat in response to a single vector packed compression and repeat instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: January 16, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
-
Publication number: 20140019714
Abstract: A processor core that includes a hardware decode unit and an execution engine unit. The hardware decode unit to decode a vector frequency expand instruction, wherein the vector frequency expand instruction includes a source operand and a destination operand, wherein the source operand specifies a source vector register that includes one or more pairs of a value and run length that are to be expanded into a run of that value based on the run length. The execution engine unit to execute the decoded vector frequency expand instruction which causes a set of one or more source data elements in the source vector register to be expanded into a set of destination data elements comprising more elements than the set of source data elements and including at least one run of identical values which were run length encoded in the source vector register.
Type: Application
Filed: December 30, 2011
Publication date: January 16, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles Yount, Bret L. Toll
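This expand operation is the inverse of the vector frequency compress described earlier in this listing; a minimal run-length decode sketch (the pair-based input layout is illustrative, not the exact source encoding the filing specifies):

```python
def vector_frequency_expand(pairs):
    """Expand (value, run_length) pairs back into runs of identical
    values, undoing the run-length encoding."""
    out = []
    for value, run_length in pairs:
        out.extend([value] * run_length)
    return out

print(vector_frequency_expand([(7, 3), (2, 1), (5, 2)]))
# [7, 7, 7, 2, 5, 5]
```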
-
Publication number: 20140019713
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: January 16, 2014
Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
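The kernel a SAD instruction accelerates, common in video motion estimation, reduces to a few lines of scalar Python (this sketches one block pair; the patented instruction operates on two packed blocks at once):

```python
def block_sad(block_a, block_b):
    """Sum of absolute differences between two equal-length packed blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

print(block_sad([1, 5, 9], [2, 3, 9]))
# 3
```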
-
Publication number: 20140013076
Abstract: A method and apparatus for efficiently processing data in various formats in a single instruction multiple data ("SIMD") architecture is presented. Specifically, a method to unpack fixed-width bit values in a bit stream to a fixed-width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.
Type: Application
Filed: September 10, 2013
Publication date: January 9, 2014
Applicant: Oracle International Corporation
Inventors: Amit Ganesh, Shasank K. Chavan, Vineet Marwah, Jesse Kamp, Anindya C. Patthak, Michael J. Gleeson, Allison L. Holloway, Roger Macnicol
-
Publication number: 20140013075
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
Type: Application
Filed: December 23, 2011
Publication date: January 9, 2014
Inventors: Mostafa Hagog, Elmoustapha Ould-Aumed-Vall, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
-
Publication number: 20130238877
Abstract: Provided is a technique for improving the transfer latency of vector register file data when an interrupt is generated. According to an aspect, when an interrupt occurs, a core determines whether to store the vector register file data currently being executed in a first memory or in a second memory based on whether or not the first memory can store the vector register file data. In response to not being able to store the vector register file data in the first memory, a data transfer unit, which is implemented as hardware, is provided to store the vector register file data in the second memory.
Type: Application
Filed: November 9, 2012
Publication date: September 12, 2013
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jin-Seok Lee, Dong-Hoon Yoo, Won-Sub Kim, Tai-Song Jin, Hae-Woo Park, Min-Wook Ahn, Hee-Jin Ahn
-
Publication number: 20130232318Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.Type: ApplicationFiled: March 15, 2013Publication date: September 5, 2013Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
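The load-with-conversion behavior can be modeled in a few lines; here a conversion from unsigned bytes to floats stands in for whatever first-to-second-format conversion the indicator selects (the function name is illustrative, not from the patent):

```python
def vload_convert(memory: bytes, offset: int, n: int) -> list[float]:
    """Sketch of a load-convert-and-write: read n unsigned bytes from
    memory and convert each to float before the vector-register write.
    The uint8-to-float conversion is one illustrative choice of the
    instruction's format-conversion indicator."""
    return [float(b) for b in memory[offset:offset + n]]
```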
-
Publication number: 20130212354Abstract: The present invention provides a method for performing data array sorting of vector elements in an N-wide SIMD that is accelerated by a factor of about N/2 over a scalar implementation, excluding scalar load/store instructions. A vector compare instruction with the ability to compare any two vector elements in accordance with optimized data array sorting algorithms, followed by a vector-multiplex instruction which performs exchanges of vector elements in accordance with condition flags generated by the vector compare instruction, provides an efficient but programmable method of performing data sorting with a factor of about N/2 acceleration. A mask bit prevents changes to elements that are not involved in a certain stage of sorting.Type: ApplicationFiled: September 20, 2009Publication date: August 15, 2013Inventor: Tibet Mimar
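The compare-then-multiplex pair can be modeled as follows (function names are illustrative); one call per sorting-network stage exchanges all flagged, unmasked pairs at once, which is where the roughly N/2 speedup over a scalar compare-and-swap loop comes from:

```python
def vector_compare(vec: list[int], pairs: list[tuple[int, int]]) -> list[bool]:
    """Model of a vector compare that can compare any two element
    positions: flag k is True when vec[pairs[k][0]] > vec[pairs[k][1]]."""
    return [vec[a] > vec[b] for a, b in pairs]

def vector_multiplex(vec, pairs, flags, mask):
    """Exchange the flagged pairs; masked-off pairs are left unchanged."""
    out = list(vec)
    for (a, b), flagged, enabled in zip(pairs, flags, mask):
        if flagged and enabled:
            out[a], out[b] = out[b], out[a]
    return out
```

For a 4-element vector, the stage comparing pairs (0,1) and (2,3) performs both compare-exchanges with a single compare/multiplex pair.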
-
Publication number: 20130212353Abstract: The present invention incorporates a system for vector Look-Up Table (LUT) operations into a single-instruction multiple-data (SIMD) processor in order to implement a plurality of LUT operations simultaneously, where each of the LUT contents could be the same or different. Elements of one or two vector registers are used to form LUT indexes, and the output of the vector LUT operation is written into a vector register. No dedicated LUT memory is required; rather, data memory is organized as multiple separate data memory banks, where a portion of each data memory bank is used for LUT operations. For a single-input vector LUT operation, the address input of each LUT is operably coupled to any of the input vector register's elements using input vector element mapping logic in one embodiment. Thus, one input vector element can produce N output elements using N different LUTs, or K input vector elements can produce N output elements, where N is a positive integer and K is an integer from one to N.Type: ApplicationFiled: February 3, 2003Publication date: August 15, 2013Inventor: Tibet Mimar
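A functional model of the vector LUT operation (illustrative names, with Python lists standing in for the per-bank LUT regions of data memory):

```python
def vector_lut(tables: list[list[int]], indexes: list[int]) -> list[int]:
    """N parallel table lookups: lane i reads tables[i][indexes[i]].
    The per-lane tables may hold identical or different contents."""
    return [tables[i][idx] for i, idx in enumerate(indexes)]

def vector_lut_broadcast(tables: list[list[int]], index: int) -> list[int]:
    """One input element producing N output elements via N different LUTs."""
    return [t[index] for t in tables]
```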
-
Publication number: 20130091339Abstract: An apparatus and method for creation of reordered vectors from sequential input data for block-based decimation, filtering, interpolation and matrix transposition using a memory circuit for a Single Instruction, Multiple Data (SIMD) Digital Signal Processor (DSP). This memory circuit includes a two-dimensional storage array, a rotate-and-distribute unit, a read-controller and a write-controller, to map input vectors containing sequential data elements into columns of the two-dimensional array and extract reordered target vectors from this array. The data elements and memory configuration are received from the SIMD DSP.Type: ApplicationFiled: October 5, 2011Publication date: April 11, 2013Applicant: ST-Ericsson SAInventors: David Van Kampen, Kees Van Berkel, Sven Goossens, Wim Kloosterhuis, Claudiu Zissulescu-Ianculescu
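The column-write/row-read pattern behind this kind of memory circuit can be sketched directly (a plain Python model of the two-dimensional storage array; the rotate-and-distribute logic that avoids bank conflicts in hardware is omitted):

```python
def transpose_via_columns(vectors: list[list[int]]) -> list[list[int]]:
    """Write each input vector into a column of a 2-D array, then
    extract target vectors from the rows: a matrix transposition."""
    rows = len(vectors[0])
    cols = len(vectors)
    array = [[0] * cols for _ in range(rows)]
    for c, vec in enumerate(vectors):
        for r, elem in enumerate(vec):
            array[r][c] = elem
    return array
```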
-
Publication number: 20130080737Abstract: A vector data access unit includes data access ordering circuitry for issuing data access requests, indicated by the elements, to the data store, configured in response to receipt of at least two decoded vector data access instructions, at least one of which is a write instruction. Data accesses are performed in the instructed order to determine an element indicating the next data access for each of said vector data access instructions. One of the next data accesses is selected to be issued to the data store in dependence upon the order in which the at least two vector data access instructions were received. The positions of the elements indicate the next data accesses relative to each other within their respective pluralities of elements. A numerical position of the element indicating the next data access within the plurality of elements of an earlier instruction is less than a predetermined value.Type: ApplicationFiled: September 28, 2011Publication date: March 28, 2013Applicant: ARM LimitedInventor: Alastair David Reid
-
Patent number: 8375196Abstract: A data processing apparatus includes a vector register bank having a plurality of vector registers, each register including a plurality of storage cells, each cell storing a data element. A vector processing unit is provided for executing a sequence of vector instructions. The processing unit is arranged to issue a set rearrangement enable signal to the vector register bank. The write interface of the vector register bank is modified to provide not only a first input for receiving the data elements generated by the vector processing unit during normal execution, but also a second input coupled via a data rearrangement path to the matrix of storage cells, via which the data elements currently stored in the matrix of storage cells are provided to the write interface in a rearranged form representing the arrangement of data elements that would be obtained by performance of the predetermined rearrangement operation.Type: GrantFiled: January 19, 2010Date of Patent: February 12, 2013Assignee: ARM LimitedInventors: Andreas Björklund, Erik Persson, Ola Hugosson
-
Publication number: 20130024653Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.Type: ApplicationFiled: July 18, 2011Publication date: January 24, 2013Inventor: Darryl J. Gove
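The loop's per-iteration contract can be modeled as a single function over two vectors of character codes (an illustrative model; the hardware performs the comparison and null check simultaneously across all lanes rather than sequentially):

```python
def vector_strcmp_step(a: list[int], b: list[int]) -> tuple[bool, bool]:
    """One iteration of the vectorized string comparison: compare two
    vectors of character codes while checking for the null character.
    Returns (terminate, equal_so_far)."""
    for x, y in zip(a, b):
        if x != y:
            return True, False   # inequality found: strings differ
        if x == 0:
            return True, True    # null reached with all elements equal
    return False, True           # load the next portions and iterate
```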
-
Patent number: 8356159Abstract: The described embodiments provide a system that sets elements in a result vector based on an input vector. During operation, the system determines a location of a key element within the input vector. Next, the system generates a result vector. When generating the result vector, the system sets one or more elements of the result vector based on the location of the key element in the input vector.Type: GrantFiled: April 7, 2009Date of Patent: January 15, 2013Assignee: Apple Inc.Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
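The abstract leaves open which result elements get set; one plausible variant (hypothetical, chosen only for illustration and not necessarily the claimed selection) sets every element after the first occurrence of the key:

```python
def result_after_key(input_vec: list[int], key: int) -> list[bool]:
    """Locate the first occurrence of `key` in the input vector and set
    the result elements after that location. This is one illustrative
    selection; the patent covers setting elements based on the key
    location in other ways as well."""
    try:
        loc = input_vec.index(key)
    except ValueError:
        return [False] * len(input_vec)
    return [i > loc for i in range(len(input_vec))]
```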
-
Patent number: 8316215Abstract: An object is to speed up a vector store instruction on a memory that is divided into banks, with a plurality of elements as a unit, while minimizing the increase in physical resources. A vector processing apparatus has a plurality of register banks and processes a data string including a plurality of data elements retained in the plurality of register banks, wherein: the plurality of register banks each have a read pointer 113 that points to a read position for reading the data elements; and the start position of the read pointer 113 is changed from one register bank to another. For example, consecutive numbers assigned to the register banks may be used as the read start positions of the respective register banks.Type: GrantFiled: March 7, 2008Date of Patent: November 20, 2012Assignee: NEC CorporationInventor: Noritaka Hoshi
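Using consecutive bank numbers as read start positions can be modeled as follows (illustrative names): bank b starts reading at position b, so on every beat the banks access different relative positions rather than all hitting the same slot at once:

```python
def staggered_reads(banks: list[list[int]], beats: int) -> list[list[int]]:
    """Model of staggered read pointers: each register bank's pointer
    starts at its own bank number and advances once per beat."""
    n = len(banks)
    out = []
    for t in range(beats):
        out.append([banks[b][(b + t) % len(banks[b])] for b in range(n)])
    return out
```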
-
Publication number: 20120260062Abstract: A method and system for providing dynamic addressability of data elements in a vector register file with subword parallelism. The method includes the steps of: determining a plurality of data elements required for an instruction; storing an address for each of the data elements into a pointer register where the addresses are stored as a number of offsets from the vector register file's origin; reading the addresses from the pointer register; extracting the data elements located at the addresses from the vector register file; and placing the data elements in a subword slot of the vector register file so that the data elements are located on a single vector within the vector register file; where at least one of the steps is carried out using a computer device so that data elements in a vector register file with subword parallelism are dynamically addressable.Type: ApplicationFiled: April 7, 2011Publication date: October 11, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jeffrey H. Derby, Robert K. Montoye
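The gather step can be modeled minimally (illustrative names; `vrf` stands in for a flat view of the vector register file, and the pointer register holds offsets from its origin):

```python
def gather_to_vector(vrf: list[int], pointer_reg: list[int]) -> list[int]:
    """Read offsets from the pointer register and place the addressed
    data elements into consecutive subword slots of a single result
    vector, making arbitrary elements dynamically addressable."""
    return [vrf[offset] for offset in pointer_reg]
```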
-
Patent number: 8255884Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.Type: GrantFiled: June 6, 2008Date of Patent: August 28, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
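A splat replicates a scalar into every lane so that scalar computation can ride the SIMD datapath; the compiler pass described above decides where those replications are cheapest to place. A minimal model (illustrative names):

```python
def splat(scalar: int, width: int) -> list[int]:
    """Replicate a scalar value across all lanes of a vector."""
    return [scalar] * width

def simd_add(a: list[int], b: list[int]) -> list[int]:
    """Element-wise SIMD add over two equal-width vectors."""
    return [x + y for x, y in zip(a, b)]
```

For example, adding the scalar 5 to a vector requires splatting it first: `simd_add(splat(5, 4), v)`.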
-
Publication number: 20120185670Abstract: A processing core implemented on a semiconductor chip is described. The processing core includes logic circuitry to identify whether vector instructions and integer scalar instructions are to be executed with two registers or three registers, where, in the two-register case, input operand information is destroyed in one of the two registers, and, in the three-register case, the input operands are not destroyed. The processing core also includes steering circuitry coupled to the logic circuitry. The steering circuitry is to control first data paths between scalar integer execution units and a scalar integer register bank such that two registers are accessed from the scalar register bank if two-register execution is identified for the scalar integer instructions, or three registers are accessed from the scalar integer register bank if three-register execution is identified for the scalar integer instructions.Type: ApplicationFiled: January 14, 2011Publication date: July 19, 2012Inventors: Bret L. Toll, Robert Valentine, Maxim Locktyukhin, Elmoustapha Ould-Ahmed-Vall