Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
  • Patent number: 10268580
    Abstract: Processors and methods implementing a machine instruction to perform cache line demotion on multiple cache lines to enable efficient sharing of cache lines between processor cores. One general aspect includes a processor comprising: a plurality of hardware processor cores, where each of the hardware processor cores to include a first cache. The processor also includes a second cache, communicatively coupled to and shared by the plurality of hardware processor cores. The processor to support a first machine instruction, the first machine instruction to include a vector register operand identifying a vector register which contains a plurality of data elements each used to identify a cache line. An execution of the first machine instruction by one of the plurality of hardware processor cores to cause a plurality of identified cache lines to be demoted, such that the demoted cache lines are moved from the first cache to the second cache.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: April 23, 2019
    Assignee: Intel Corporation
    Inventors: Kshitij A. Doshi, Namakkal N. Venkatesan, Ren Wang, Andrew J. Herdrich
  • Patent number: 10180838
    Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
    Type: Grant
    Filed: September 19, 2017
    Date of Patent: January 15, 2019
    Assignee: Intel Corporation
    Inventor: Ashish Jha
  • Patent number: 10152323
    Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set.
    Type: Grant
    Filed: October 21, 2016
    Date of Patent: December 11, 2018
    Assignee: Intel Corporation
    Inventors: Patrice L. Roussel, William W. Macy, Jr., Huy V. Nguyen, Eric L. Debes
  • Patent number: 10111049
    Abstract: A method, an apparatus, and a computer program product for wireless communication are provided. The apparatus may be a UE. The UE receives at least one of a unicast or a broadcast/multicast communication on a first frequency from a first cell of a serving eNB through a first receive chain. In addition, the UE receives at least one of broadcast/multicast signal, synchronization signal, or reference signal communication on a second frequency from a second cell of the serving eNB through a second receive chain without having received instruction from the serving eNB to receive the at least one of the broadcast/multicast signal, the synchronization signal, or the reference signal communication.
    Type: Grant
    Filed: July 8, 2013
    Date of Patent: October 23, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Jack Shyh-Hurng Shauh, Srinivasan Balasubramanian, Kuo-Chun Lee, Sivaramakrishna Veerepalli, Jun Wang
  • Patent number: 10055225
    Abstract: A processor fetches a multi-register scatter instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes multiple source data elements. The destination operand identifies multiple destination data elements that each specify a destination vector register and an index into that destination vector register. The instruction is decoded and executed, causing, for each of those identified destination data elements, the one of the source data elements that is in a position in the source vector register that corresponds with a position of that destination data element to be stored in the destination vector register at the index specified by that destination data element.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: August 21, 2018
    Assignee: Intel Corporation
    Inventor: Ashish Jha
  • Patent number: 10019262
    Abstract: A processor comprises a plurality of vector registers, and an execution unit, operatively coupled to the plurality of vector registers, the execution unit comprising a logic circuit implementing a load instruction for loading, into two or more vector registers, two or more data items associated with a data structure stored in a memory, wherein each one of the two or more vector registers is to store a data item associated with a certain position number within the data structure.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: July 10, 2018
    Assignee: Intel Corporation
    Inventors: Ashish Jha, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney, Milind B. Girkar
  • Patent number: 9893880
    Abstract: A method for secure comparison of encrypted symbols. According to one embodiment, a user may encrypt two symbols, share the encrypted symbols with an untrusted third party that can compute algorithms on these symbols without access the original data or encryption keys such that the result of running the algorithm on the encrypted data can be decrypted to a result which is equivalent to the result of running the algorithm on the original unencrypted data. In one embodiment the untrusted third party may perform a sequence of operations on the encrypted symbols to produce an encrypted result which, when decrypted by a trusted party, indicates whether the two symbols are the same.
    Type: Grant
    Filed: November 15, 2013
    Date of Patent: February 13, 2018
    Assignee: RAYTHEON BBN TECHNOLOGIES CORP.
    Inventors: Kurt Rohloff, David Bruce Cousins, Richard Schantz
  • Patent number: 9894371
    Abstract: A method, system and computer program for decompressing video data, storing the compressed video data in such a way that random access is possible and the data can be mapped efficiently into existing memory systems and interface protocols. The compression is accomplished via lossless compression using an algorithm optimized for video data. Due to the compressed format, data transmission consumes less bandwidth than using uncompressed data and prevents degradation in the decoded video.
    Type: Grant
    Filed: October 19, 2016
    Date of Patent: February 13, 2018
    Assignee: Entropic Communications, LLC
    Inventors: Matthew Damian Bates, Mark Alan Baur
  • Patent number: 9843551
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing and transferring messages. An example method includes providing a queue having an ordered plurality of storage blocks. Each storage block stores one or more respective messages and is associated with a respective time. The times increase from a block designating a head of the queue to a block designating a tail of the queue. The method also includes reading, by each of a plurality of first sender processes, messages from one or more blocks in the queue beginning at the head of the queue. The read messages are sent, by each of the plurality of first sender processes, to a respective recipient. One or more of the blocks are designated as old when they have associated times that are earlier than a first time. A block is designated as a new head of the queue when the block is associated with a time later than or equal to the first time.
    Type: Grant
    Filed: October 12, 2016
    Date of Patent: December 12, 2017
    Assignee: Machine Zone, Inc.
    Inventor: Igor Milyakov
  • Patent number: 9798541
    Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: October 24, 2017
    Assignee: INTEL CORPORATION
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 9792115
    Abstract: A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to a store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further include a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector operand))+c.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: October 17, 2017
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Andrew T. Forsyth, Thomas D. Fletcher, Lisa K. Wu, Eric Sprangle
  • Patent number: 9766887
    Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: September 19, 2017
    Assignee: Intel Corporation
    Inventor: Ashish Jha
  • Patent number: 9760372
    Abstract: A method for combining data values through associative operations. The method includes, with a processor, arranging any number of data values into a plurality of columns according to natural parallelism of the associative operations and reading each column to a register of an individual processor. The processors are directed to combine the data values in the columns in parallel using a first associative operation. The results of the first associative operation for each column are stored in a register of each processor.
    Type: Grant
    Filed: September 1, 2011
    Date of Patent: September 12, 2017
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Ren Wu, Bin Zhang, Meichun Hsu, Qiming Chen
  • Patent number: 9665368
    Abstract: Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: May 30, 2017
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Elmoustapha Ould-Ahmed_Vall, Bret L. Toll, Robert Valentine
  • Patent number: 9594724
    Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
    Type: Grant
    Filed: August 9, 2012
    Date of Patent: March 14, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
  • Patent number: 9594557
    Abstract: A method provides support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.
    Type: Grant
    Filed: March 18, 2016
    Date of Patent: March 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9594556
    Abstract: A circuit arrangement and program product provide support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.
    Type: Grant
    Filed: March 18, 2016
    Date of Patent: March 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9582466
    Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
    Type: Grant
    Filed: August 13, 2012
    Date of Patent: February 28, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
  • Patent number: 9507601
    Abstract: An apparatus for processing a plurality of data sets is disclosed, wherein one data set of the plurality of data sets includes N components and has a data type of one of a scalar type and a vector type, wherein N is a positive integer number. The apparatus includes a memory module and a data accessing module. The memory module comprises N memory units configured to store the plurality of data sets. The data accessing module is configured to write the data set into the memory module according to a write data index corresponding to the data set and one of a first writing mapping information and a second writing mapping information, wherein the first writing mapping information is employed when the data type is one of the scalar and the vector type and the second writing mapping information is employed when the data type is the other of the scalar and the vector type.
    Type: Grant
    Filed: February 19, 2014
    Date of Patent: November 29, 2016
    Assignee: MEDIATEK INC.
    Inventors: Pei-Kuei Tsung, Mu-Fan Murphy Chang, Po-Chun Fan
  • Patent number: 9471289
    Abstract: Systems and methods for source-to-source transformation for compiler optimization for many integrated core (MIC) coprocessors, including identifying data dependencies in candidate loops and data elements used in each iteration for arrays, profiling candidate loops to find a proper number m, wherein data transfer and computation for m iterations take an equal amount of time, and creating an outer loop outside the candidate loop, with each iteration of the outer loop executing m iterations of the candidate loop. Data streaming is performed by determining optimum buffer size for one or more arrays and inserting code before the outer loop to create optimum sized buffers, overlapping data transfer between central processing units (CPUs) and MICs with the computation; reusing buffers to reduce memory employed on the MICs, and reusing threads on MICs to repeatedly launch kernels on the MICs for asynchronous data transfer.
    Type: Grant
    Filed: March 25, 2015
    Date of Patent: October 18, 2016
    Assignee: NEC Corporation
    Inventors: Min Feng, Srimat Chakradhar, Linhai Song
  • Patent number: 9436465
    Abstract: A processor, which executes m number of arithmetic operations in parallel, executes a partial sum instruction which takes an i-th to (i+m?1)-th elements of an input data series as input elements, so as to obtain first vector data, executes the partial sum instruction which takes a (i+x)-th to (i+x+m?1)-th elements of the input data series as the input elements, so as to obtain second vector data, and performs operations to subtract the p-th element of the first vector data and add the p-th element of the second vector data from and to a sum of the i-th to (i+x?1)-th elements of the input data series in parallel for each of the 0-th to (m?1)-th elements, so as to calculate sums of elements for m sections different from each other in parallel, and moving average processing to calculate a moving average from the sums of elements of the sections.
    Type: Grant
    Filed: April 7, 2014
    Date of Patent: September 6, 2016
    Assignee: FUJITSU LIMITED
    Inventors: Makiko Ito, Manabu Kubota, Kazuhiro Nomoto
  • Patent number: 9424031
    Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: August 23, 2016
    Assignee: INTEL CORPORATION
    Inventors: Hariharan Thantry, Mani Azimi
  • Patent number: 9350584
    Abstract: An element selection unit (200) and a method therein for vector element selection. The element selection unit comprises a selector control circuit (404) and a selector data path circuit (406), which data path circuit comprises a plurality of layers of multiplexers. The element selection unit further comprises a receiving circuit (401) configured to receive an instruction to perform a selection of elements from an input vector. The selector control circuit (404) is configured to generate a multiplexer control signal for each multiplexer based on a bit map and on a plurality of relative offset values. The data path circuit is configured to propagate the elements comprised in the input vector through the plurality of layers of multiplexers towards an output vector based on the generated multiplexer control signals. The data path circuit is further configured to write the propagated elements to enabled elements of the output vector.
    Type: Grant
    Filed: June 10, 2013
    Date of Patent: May 24, 2016
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventor: David Van Kampen
  • Patent number: 9268626
    Abstract: An apparatus and method are described for detecting and responding to fault conditions in a processor. For example, one embodiment of a method comprises: reading each active element in succession from a first vector register, each active element specifying an address for a gather or load operation; detecting one or more fault conditions associated with one or more of the active elements; for each active element read in succession prior to a detected fault condition on an element other than the first active element, storing the data loaded from an address associated with the active element in a first output vector register; and for each active element associated with the detected fault condition and following the detected fault condition, setting a bit in an output mask register to indicate the detected fault condition.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: February 23, 2016
    Assignee: INTEL CORPORATION
    Inventors: Jayashankar Bharadwaj, Victor W. Lee, Kim Daehyun, Nalini Vasudevan, Tin-Fook Ngai, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 9268566
    Abstract: Multiple sets of character data having termination characters are compared using parallel processing and without causing unwarranted exceptions. Each set of character data to be compared is loaded within one or more vector registers. In particular, in one embodiment, for each set of character data to be compared, an instruction is used that loads data in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. Further, an instruction is used to find the index of the first delimiter character, i.e., the first zero or null character, or the index of unequal characters. Using these instructions, a location of the end of one of the sets of data or a location of an unequal character is efficiently provided.
    Type: Grant
    Filed: March 15, 2012
    Date of Patent: February 23, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Timothy J. Slegel
  • Publication number: 20150143074
    Abstract: Vector exception handling is facilitated. A vector instruction is executed that operates on one or more elements of a vector register. When an exception is encountered during execution of the instruction, a vector exception code is provided that indicates a position within the vector register that caused the exception. The vector exception code also includes a reason for the exception.
    Type: Application
    Filed: December 5, 2014
    Publication date: May 21, 2015
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz, Timothy J. Slegel
  • Publication number: 20150089189
    Abstract: In an embodiment, a processor may implement a vector instruction set including predicate vectors and multiple vector element sizes. The vector instruction set may include predicate vector pack and unpack instructions. Responsive to the predicate vector pack instruction, the processor may pack predicates from multiple predicate vector source registers into a destination predicate vector register. Responsive to the predicate vector unpack instruction, the processor may select a portion of a source predicate vector register and write the result to a destination predicate vector register. Additionally, the predicate vector register may store one or more vector attributes associated with the corresponding vector. The processor may modify the attribute as part of the pack/unpack operation (e.g. based on a pack/unpack factor). Additionally, vector pack/unpack instructions that are controlled by the attribute in a corresponding predicate vector register may be implemented.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
  • Publication number: 20150088926
    Abstract: Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (“SIMD”) architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests.
    Type: Application
    Filed: July 22, 2014
    Publication date: March 26, 2015
    Inventors: SHASANK K. CHAVAN, PHUMPONG WATANAPRAKORNKUL
  • Publication number: 20150046671
    Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, a plurality of bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values, are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
    Type: Application
    Filed: August 6, 2013
    Publication date: February 12, 2015
    Inventor: Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20150046672
    Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, at least two bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values, are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
    Type: Application
    Filed: August 6, 2013
    Publication date: February 12, 2015
    Inventors: Terence Sych, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20150039851
    Abstract: Methods, apparatus, instructions and logic provide SIMD vector sub-byte decompression functionality. Embodiments include shuffling a first and second byte into the least significant portion of a first vector element, and a third and fourth byte into the most significant portion. Processing continues shuffling a fifth and sixth byte into the least significant portion of a second vector element, and a seventh and eighth byte into the most significant portion. Then by shifting the first vector element by a first shift count and the second vector element by a second shift count, sub-byte elements are aligned to the least significant bits of their respective bytes. Processors then shuffle a byte from each of the shifted vector elements' least significant portions into byte positions of a destination vector element, and from each of the shifted vector elements' most significant portions into byte positions of another destination vector element.
    Type: Application
    Filed: July 31, 2013
    Publication date: February 5, 2015
    Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Robert Valentine
  • Publication number: 20150039854
    Abstract: Systems and techniques disclosed herein include methods for de-quantization of feature vectors used in automatic speech recognition. A SIMD vector processor is used in one embodiment for efficient vectorized lookup of floating point values in conjunction with fMPE processing for increasing the discriminative power of input signals. These techniques exploit parallelism to effectively reduce the latency of speech recognition in a system operating in a high dimensional feature space. In one embodiment, a bytewise integer lookup operation effectively performs a floating point or a multiple byte lookup.
    Type: Application
    Filed: August 1, 2013
    Publication date: February 5, 2015
    Applicant: Nuance Communications, Inc.
    Inventor: Justin Vaughn Wick
  • Publication number: 20150039852
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, data compaction is performed using vectorized instructions to identify a shuffle mask based on matching bits and update an output array based on the shuffle mask and an input array. In a related technique, a hash table probe involves using vectorized instructions to determine whether each key in one or more hash buckets matches a particular input key.
    Type: Application
    Filed: August 1, 2013
    Publication date: February 5, 2015
    Applicant: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Publication number: 20150039853
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors that may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Application
    Filed: August 1, 2013
    Publication date: February 5, 2015
    Applicant: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Publication number: 20140365747
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
    Type: Application
    Filed: December 23, 2011
    Publication date: December 11, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Moustapha Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Boris Ginzburg, Ziv Aviv
  • Patent number: 8909901
    Abstract: In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values so that selected portions of the first and second source operands or a predetermined value can be stored into elements of a destination. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 28, 2007
    Date of Patent: December 9, 2014
    Assignee: Intel Corporation
    Inventors: Cristina Anderson, Mark Buxton, Doron Orenstien, Bob Valentine
  • Patent number: 8904153
    Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.
    Type: Grant
    Filed: September 7, 2010
    Date of Patent: December 2, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
  • Publication number: 20140317377
    Abstract: A processor core that includes a hardware decode unit to decode a vector frequency compress instruction that includes a source operand and a destination operand. The source operand specifying a source vector register that includes a plurality of source data elements including one or more runs of identical data elements that are each to be compressed in a destination vector register as a value and run length pair. The destination operand identifies the destination vector register. The processor core also includes an execution engine unit to execute the decoded vector frequency compress instruction which causes, for each source data element, a value to be copied into the destination vector register to indicate that source data element's value. One or more runs of the source data elements equal are encoded in the destination vector register as the predetermined compression value followed by a run length for that run.
    Type: Application
    Filed: December 30, 2011
    Publication date: October 23, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
  • Publication number: 20140281368
    Abstract: An example method for executing multiple instructions in one or more slots includes receiving a packet including multiple instructions and executing the multiple instructions in one or more slots in a time shared manner. Each slot is associated with an execution data path or a memory data path. An example method for executing at least one instruction in a plurality of phases includes receiving a packet including an instruction, splitting the instruction into a plurality of phases, and executing the instruction in the plurality of phases.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Ajay Anant Ingle, Lucian Codrescu, David J. Hoyle, Jose Fridman, Marc M. Hoffman, Deepak Mathew
  • Publication number: 20140281369
    Abstract: An apparatus and method are described for fetching and storing a plurality of portions of a data stream into a plurality of registers. For example, a method according to one embodiment includes the following operations: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining the system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses; and storing the N designated portions of the data stream into the N vector registers.
    Type: Application
    Filed: December 23, 2011
    Publication date: September 18, 2014
    Inventor: Ashish Jha
  • Publication number: 20140223138
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a mask register into a vector register in response to a single vector packed convert a mask register to a vector register instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described.
    Type: Application
    Filed: December 23, 2011
    Publication date: August 7, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Amit Gradstein, Zeev Sperber
  • Publication number: 20140201497
    Abstract: An apparatus is described having functional unit logic circuitry. The functional unit logic circuitry has a first register to store a first input vector operand having an element for each dimension of a multi-dimensional data structure. Each element of the first vector operand specifying the size of its respective dimension. The functional unit has a second register to store a second input vector operand specifying coordinates of a particular segment of the multi-dimensional structure. The functional unit also has logic circuitry to calculate an address offset for the particular segment relative to an address of an origin segment of the multi-dimensional structure.
    Type: Application
    Filed: December 23, 2011
    Publication date: July 17, 2014
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20140201498
    Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
    Type: Application
    Filed: September 26, 2011
    Publication date: July 17, 2014
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
  • Publication number: 20140201499
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a list of index values into a mask value in response to a single vector packed conversion of a list of index values into a mask value instruction that includes a destination writemask register operand, a source vector register operand, and an opcode are described.
    Type: Application
    Filed: December 23, 2011
    Publication date: July 17, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Garrett T. Drysdale
  • Patent number: 8782376
    Abstract: A processor including: a first and at least a second data processing channel with enable logic for selectively enabling the second channel; logic for generating first and second storage addresses having a variable offset therebetween based on the same one or more address operands of the same storage access instruction; and circuitry for transferring data between the first address and a register of the first data processing channel and between the second address and a corresponding register of the second channel based on a same one or more register specifier operands of the access instruction. The first data processing channel performs an operation using one or more registers of the first data processing channel, and on condition of being enabled the second channel performs the same operation using a corresponding one or more of its own registers based on the same one or more operands of the data processing instruction.
    Type: Grant
    Filed: August 26, 2011
    Date of Patent: July 15, 2014
    Assignee: Icera Inc.
    Inventors: Simon Knowles, Edward Andrews, Stephen Felix, Simon Huckett, Colman Hegarty
  • Publication number: 20140164733
    Abstract: A transpose instruction is described. A transpose instruction is fetched, where the transpose instruction includes an operand that specifies a vector register or a location in memory. The transpose instruction is decoded. The decoded transpose instruction is executed causing each data element in the specified vector register or location in memory to be stored in that specified vector register or location in memory in reverse order.
    Type: Application
    Filed: December 30, 2011
    Publication date: June 12, 2014
    Inventor: Ashish Jha
  • Publication number: 20140156969
    Abstract: A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.
    Type: Application
    Filed: December 17, 2013
    Publication date: June 5, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: MAARTEN J. BOERSMA, UDO KRAUTZ, ULRIKE SCHMIDT
  • Publication number: 20140149713
    Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
    Type: Application
    Filed: December 23, 2011
    Publication date: May 29, 2014
    Inventor: Ashish Jha
  • Publication number: 20140136815
    Abstract: A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.
    Type: Application
    Filed: November 12, 2012
    Publication date: May 15, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Maarten J. Boersma, Udo Krautz, Ulrike Schmidt
  • Publication number: 20140129801
    Abstract: Embodiments of systems, apparatuses, and methods for performing delta encoding on packed data elements of a source and storing the results in packed data elements of a destination using a single vector packed delta encode instruction are described.
    Type: Application
    Filed: December 28, 2011
    Publication date: May 8, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale