Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
  • Patent number: 11550584
    Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: January 10, 2023
    Assignee: NVIDIA CORPORATION
    Inventors: Maciej Piotr Tyrlik, Ajay Sudarshan Tirumala, Shirish Gadre
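    The recurrence being accelerated above is the standard Smith-Waterman local alignment scoring. Below is a minimal scalar Python sketch (not the GPU, thread-interleaved formulation of the patent) showing the recurrence and the tracking of the maximum sub-alignment score and its position; the scoring parameters are illustrative assumptions.
    ```python
    # Minimal scalar reference for Smith-Waterman local alignment scoring.
    # The patent describes a GPU formulation (interleaved register layout,
    # specialized instructions, pipelined threads); this only illustrates the
    # underlying recurrence and max-score/position tracking.
    def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
        rows, cols = len(query) + 1, len(target) + 1
        h = [[0] * cols for _ in range(rows)]
        best_score, best_pos = 0, (0, 0)
        for i in range(1, rows):
            for j in range(1, cols):
                diag = h[i-1][j-1] + (match if query[i-1] == target[j-1] else mismatch)
                score = max(0, diag, h[i-1][j] + gap, h[i][j-1] + gap)
                h[i][j] = score
                if score > best_score:          # keep the maximum sub-alignment score and its position
                    best_score, best_pos = score, (i, j)
        return best_score, best_pos

    print(smith_waterman_score("GATTACA", "GCATGCU"))
    ```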
  • Patent number: 11544214
    Abstract: A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.
    Type: Grant
    Filed: May 12, 2015
    Date of Patent: January 3, 2023
    Assignee: Optimum Semiconductor Technologies, Inc.
    Inventors: Mayan Moudgill, Gary J. Nacer, C. John Glossner, Arthur Joseph Hoane, Paul Hurtley, Murugappan Senthilvelan, Pablo Balzola, Vitaly Kalashnikov, Sitij Agrawal
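    As a rough illustration of the register arrangement described above, here is a small Python model (names such as VectorUnit and vadd are invented for this sketch, not taken from the patent): a vector register holds a varying number of elements, and a separate vector length register controls how many operations a vector instruction performs.
    ```python
    class VectorUnit:
        def __init__(self, num_vregs=8, num_vlregs=8):
            self.vreg = [[] for _ in range(num_vregs)]   # vector register file, variable-length registers
            self.vlreg = [0] * num_vlregs                # vector length register file

        def vadd(self, dst, src1, src2, vl_idx):
            n = self.vlreg[vl_idx]                       # number of operations to perform
            self.vreg[dst] = [self.vreg[src1][i] + self.vreg[src2][i] for i in range(n)]

    vu = VectorUnit()
    vu.vreg[1] = [1, 2, 3, 4, 5]
    vu.vreg[2] = [10, 20, 30, 40, 50]
    vu.vlreg[0] = 3                                      # operate on only the first 3 elements
    vu.vadd(0, 1, 2, 0)
    print(vu.vreg[0])                                    # [11, 22, 33]
    ```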
  • Patent number: 11537399
    Abstract: In an embodiment, a processor supports one or more compression assist instructions which may be employed in compression software to improve the performance of the processor when performing compression/decompression. That is, the compression/decompression task may be performed more rapidly and consume less power when the compression assist instructions are employed than when they are not. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex compression algorithm.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: December 27, 2022
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Ali Sazegari
  • Patent number: 11468003
    Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: October 11, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Ching-Yu Hung, Shinri Inamori, Jagadeesh Sankaran, Peter Chang
  • Patent number: 11442726
    Abstract: Vector pack and unpack instructions are described. An instruction to perform a conversion between one decimal format and another decimal format is executed, in which the one decimal format or the other decimal format is a zoned decimal format. The executing includes obtaining a value from at least one register specified using the instruction. At least a portion of the value is converted from the one decimal format to the other decimal format different from the one decimal format to provide a converted result. A result obtained from the converted result is written into a single register specified using the instruction.
    Type: Grant
    Filed: February 26, 2021
    Date of Patent: September 13, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eric Mark Schwarz, Timothy Slegel, Jonathan D. Bradbury, Michael Klein, Reid Copeland, Xin Guo
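    For readers unfamiliar with zoned decimal, here is a hedged sketch of the kind of conversion such a pack instruction performs: zoned decimal (one digit per byte, sign carried in the zone of the last byte) to packed decimal (two digits per byte, sign in the final nibble). The 0xF zone and 0xC/0xD sign codes follow common EBCDIC conventions and are assumptions, not details taken from the patent text.
    ```python
    def zoned_to_packed(zoned: bytes) -> bytes:
        digits = [b & 0x0F for b in zoned]              # extract the digit nibbles
        sign = 0x0D if (zoned[-1] >> 4) == 0x0D else 0x0C
        nibbles = digits + [sign]                       # digits followed by the sign nibble
        if len(nibbles) % 2:                            # pad with a leading zero digit if needed
            nibbles = [0] + nibbles
        return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))

    # "123", positive, in zoned form: F1 F2 C3  ->  packed form: 12 3C
    print(zoned_to_packed(bytes([0xF1, 0xF2, 0xC3])).hex())   # '123c'
    ```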
  • Patent number: 11436010
    Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, and a source vector identifier, wherein the execution circuit is to, for each data element position of a source vector identified by the source vector identifier, determine a nearest matching data element position in the source vector storing a same data value as stored at the data element position, the nearest matching data element position located between the data element position and a least significant data element position of the source vector, and store, in a corresponding data element position of a destination vector identified by the destination vector identifier, a value identifying the determined nearest data element position.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: September 6, 2022
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Christopher J. Hughes, Andrey Naraikin
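    A scalar model of the described behavior follows: for every element position of the source vector, find the nearest lower position holding the same value and record an identifier for it in the destination. Returning -1 when no earlier match exists is an assumption made for illustration.
    ```python
    def detect_conflicts(src):
        dst = []
        for i, value in enumerate(src):
            nearest = -1
            for j in range(i - 1, -1, -1):      # scan toward the least significant position
                if src[j] == value:
                    nearest = j
                    break
            dst.append(nearest)
        return dst

    print(detect_conflicts([3, 7, 3, 3, 7]))    # [-1, -1, 0, 2, 1]
    ```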
  • Patent number: 11182561
    Abstract: A similar expression collecting unit (103) collects a word/phrase of an expression similar to an analysis viewpoint word/phrase, and a word embedding corresponding to the word/phrase, from distributed word representation data. A dimension selecting unit (104) selects a dimension of a word embedding depending on the analysis viewpoint word/phrase, and compresses a word embedding corresponding to the word/phrase collected by the similar expression collecting unit (103) in the selected dimension.
    Type: Grant
    Filed: February 14, 2017
    Date of Patent: November 23, 2021
    Assignee: MITSUBISHI ELECTRIC CORPORATION
    Inventors: Takeyuki Aikawa, Hiroyasu Itsui
  • Patent number: 11126588
    Abstract: A processor circuit is disclosed. The processor circuit includes a data path block circuit configured to perform a data path operation to generate one or more results. The processor circuit also includes a data register files circuit, having a first register file, where the first register file has a first quantity of read and write ports. The data register files circuit also includes a second register file, where the second register file has a second different quantity of read and write ports. The processor circuit also includes an instruction decoder circuit configured to provide an operation signal to the data path block circuit, where the operation signal identifies a particular data path operation to be performed by the data path block circuit and identifies one or more read ports of the data register files circuit for retrieving data encoding the first and second operands.
    Type: Grant
    Filed: July 28, 2020
    Date of Patent: September 21, 2021
    Assignee: SHENZHEN GOODIX TECHNOLOGY CO., LTD.
    Inventor: Jaehoon Heo
  • Patent number: 11113054
    Abstract: Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (“SIMD”) architecture are presented herein. Specifically, methods and apparatuses are discussed for compressing or packing, in parallel, multiple fixed-length values into a stream of multiple variable-length values using SIMD architecture.
    Type: Grant
    Filed: July 15, 2016
    Date of Patent: September 7, 2021
    Assignee: Oracle International Corporation
    Inventors: Shasank K. Chavan, Phumpong Watanaprakornkul, Victor Chen
  • Patent number: 11080049
    Abstract: Aspects for matrix multiplication in neural network are described herein. The aspects may include a controller unit configured to receive a matrix-multiply-matrix (MM) instruction that includes a first starting address of a first matrix, a first size of the first matrix, a second starting address of a second matrix, and a second size of the second matrix; multiple computation modules configured to respectively multiply, in response to the MM instruction, row vectors of the first matrix with column vectors of the second matrix to generate one or more result elements; and an interconnection unit configured to combine the result elements to generate one or more row vectors of a result matrix.
    Type: Grant
    Filed: October 17, 2019
    Date of Patent: August 3, 2021
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
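    The dataflow in the abstract can be restated in plain Python: each "computation module" multiplies a row vector of the first matrix with a column vector of the second, and the results are combined into rows of the result matrix. This is only an illustrative sketch of the arithmetic, not the hardware.
    ```python
    def matmul(a, b):
        cols_b = list(zip(*b))                            # column vectors of the second matrix
        return [[sum(x * y for x, y in zip(row, col))     # one result element per row/column pair
                 for col in cols_b]
                for row in a]

    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul(a, b))    # [[19, 22], [43, 50]]
    ```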
  • Patent number: 11003450
    Abstract: A vector data transfer instruction is provided for triggering a data transfer between storage locations corresponding to a contiguous block of addresses and multiple data elements of at least one vector register. The instruction specifies a start address of the contiguous block using a base register and an immediate offset value specified as a multiple of the size of the contiguous block of addresses. This is useful for loop unrolling, which can help to improve performance of vectorised code by combining multiple iterations of a loop into a single iteration of an unrolled loop, to reduce the loop control overhead.
    Type: Grant
    Filed: September 14, 2016
    Date of Patent: May 11, 2021
    Assignee: ARM Limited
    Inventor: Nigel John Stephens
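    The addressing described above can be sketched as follows: the start address of each contiguous block is base + (immediate × block size), so an unrolled loop can touch consecutive blocks while reusing the same base register. The element size, block layout, and toy memory below are illustrative assumptions.
    ```python
    def vector_load(memory, base, imm, num_elements, element_size=4):
        block_size = num_elements * element_size
        start = base + imm * block_size              # immediate is a multiple of the block size
        return [memory[start + k * element_size] for k in range(num_elements)]

    memory = {addr: addr for addr in range(0, 1024, 4)}   # toy memory: each word holds its own address
    base = 256
    # Unrolled loop: four loop iterations combined, each loading the next contiguous block.
    for imm in range(4):
        print(vector_load(memory, base, imm, num_elements=4))
    ```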
  • Patent number: 10963251
    Abstract: There is provided an apparatus that includes a set of vector registers, each of the vector registers being arranged to store a vector comprising a plurality of portions. The set of vector registers is logically divided into a plurality of columns, each of the columns being arranged to store a same portion of each vector. The apparatus also includes register access circuitry that comprises a plurality of access blocks. Each access block is arranged to access a portion in a different column when accessing one of the vector registers than when accessing at least one other of the vector registers. The register access circuitry is arranged to simultaneously access portions in any one of: the vector registers and the columns.
    Type: Grant
    Filed: June 15, 2017
    Date of Patent: March 30, 2021
    Assignee: ARM Limited
    Inventor: Thomas Christopher Grocutt
  • Patent number: 10795678
    Abstract: Neural network processors including a vector register file (VRF) having a multi-port memory and related methods are provided. The processor may include tiles to process an N by N matrix of data elements and an N by 1 vector of data elements. The VRF may, in response to a write instruction, store N data elements in a multi-port memory and during each one of P clock cycles provide N data elements to each one of P input interface circuits of the multi-port memory comprising an input lane configured to carry L data elements in parallel. During each one of the P clock cycles the multi-port memory may be configured to receive N data elements via a selected at least one of the P input interface circuits. The VRF may include output interface circuits for providing N data elements in response to a read instruction.
    Type: Grant
    Filed: April 21, 2018
    Date of Patent: October 6, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
  • Patent number: 10713046
    Abstract: Methods and systems for use on a memory controller are disclosed which provide atomic compute operations of any size using an asynchronous, pipelined message-passing interface between clients and the memory controller.
    Type: Grant
    Filed: December 20, 2017
    Date of Patent: July 14, 2020
    Assignee: EXTEN TECHNOLOGIES, INC.
    Inventors: Daniel B. Reents, Michael Enz, Ashwin Kamath
  • Patent number: 10671392
    Abstract: Systems, apparatuses, and methods for performing delta decoding on packed data elements of a source and storing the results in packed data elements of a destination using a single packed delta decode instruction are described. A processor may include a decoder to decode an instruction, and execution unit to execute the decoded instruction to calculate for each packed data element position of a source operand, other than a first packed data element position, a value that comprises a packed data element of that packed data element position and all packed data elements of packed data element positions that are of lesser significance, store a first packed data element from the first packed data element position of the source operand into a corresponding first packed data element position of a destination operand, and for each calculated value, store the value into a corresponding packed data element position of the destination operand.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: June 2, 2020
    Assignee: INTEL CORPORATION
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale
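    In scalar form, the delta decode described above is an inclusive prefix sum: each output element (other than the first) is its own input element plus all input elements of lesser significance. A minimal sketch:
    ```python
    def delta_decode(src):
        dst, running = [], 0
        for delta in src:
            running += delta          # element plus all less-significant elements
            dst.append(running)
        return dst

    print(delta_decode([5, 2, 1, 3]))   # [5, 7, 8, 11]
    ```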
  • Patent number: 10671292
    Abstract: A method of orchestrated shuffling of data in a non-uniform memory access device including a plurality of processing nodes that are connected by interconnects. The method includes running an application on a plurality of threads executing on the plurality of processing nodes. Data to be shuffled is identified from source threads running on source processing nodes among the processing nodes to target threads executing on target processing nodes among the processing nodes. The method further includes generating a plan for orchestrating the shuffling of the data among all of the memory devices associated with the threads and for simultaneously transmitting data over different interconnects to a plurality of different target processing nodes from a plurality of different source processing nodes. The data is shuffled among all of the memory devices based on the plan, and each processing node is capable of accessing data from first and second local memory devices.
    Type: Grant
    Filed: December 1, 2017
    Date of Patent: June 2, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yinan Li, Guy M. Lohman, Rene Mueller, Ippokratis Pandis, Vijayshankar Raman
  • Patent number: 10666984
    Abstract: Methods and apparatus for coding video information having a plurality of video samples include partitioning samples into groups for transmission within a single clock cycle, wherein the samples are associated with a bit length B and each group has a group size K. The sample group is mapped to a code number and coded to form a vector-based code comprising a first portion identifying a type of look-up-table used to perform the mapping, and a second portion representing the samples of the group. The look-up-table may be constructed based upon occurrence probabilities of different sample groups. In addition, different types of look-up-tables may be used for different B and K values.
    Type: Grant
    Filed: March 3, 2017
    Date of Patent: May 26, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Vijayaraghavan Thirumalai, Natan Haim Jacobson, Rajan Laxman Joshi
  • Patent number: 10534544
    Abstract: A method of orchestrated shuffling of data in a non-uniform memory access device that includes a plurality of processing nodes that are connected by interconnects. The method includes running an application on a plurality of threads executing on the plurality of processing nodes. Data to be shuffled is identified from source threads running on source processing nodes among the processing nodes to target threads executing on target processing nodes among the processing nodes. The method further includes generating a plan for orchestrating the shuffling of the data among all of the memory devices associated with the threads and for simultaneously transmitting data over different interconnects to a plurality of different target processing nodes from a plurality of different source processing nodes. The data is shuffled among all of the memory devices based on the plan. The data includes operand data and operational state data of the source threads.
    Type: Grant
    Filed: December 1, 2017
    Date of Patent: January 14, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yinan Li, Guy M. Lohman, Rene Mueller, Ippokratis Pandis, Vijayshankar Raman
  • Patent number: 10521128
    Abstract: A method of orchestrated shuffling of data in a non-uniform memory access device that includes a plurality of processing nodes connected by interconnects. The method includes running an application on a plurality of threads executing on the plurality of processing nodes. Data to be shuffled is identified from source threads running on source processing nodes among the processing nodes to target threads executing on target processing nodes among the processing nodes. The method further includes generating a plan for orchestrating the shuffling of the data among all of the memory devices associated with the threads and for simultaneously transmitting data over different interconnects to a plurality of different target processing nodes from a plurality of different source processing nodes. The data is shuffled among all of the memory devices based on the plan. Shifting the data-shifting table includes rotating a first ring with respect to a second ring.
    Type: Grant
    Filed: December 1, 2017
    Date of Patent: December 31, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yinan Li, Guy M. Lohman, Rene Mueller, Ippokratis Pandis, Vijayshankar Raman
  • Patent number: 10430192
    Abstract: Data processing apparatus comprises processing circuitry to selectively apply vector processing operations to one or more data items of a data vector comprising a plurality of data items at respective positions in the data vector, according to the state of respective predicate flags associated with the positions; the processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; wherein the instruction decoder circuitry is responsive to a WHILE instruction and a CHANGE instruction, to control the instruction processing dependent upon a number of the predicate flags.
    Type: Grant
    Filed: July 28, 2016
    Date of Patent: October 1, 2019
    Assignee: ARM Limited
    Inventors: Nigel John Stephens, Grigorios Magklis, Alejandro Martinez Vicente, Nathanael Premillieu, Mbou Eyole
  • Patent number: 10430229
    Abstract: To use SIMD lanes efficiently for domain shader execution, domain point data from different domain shader patches may be packed together into a single SIMD thread. To generate an efficient code sequence, each domain point occupies one SIMD lane and all attributes for the domain point reside in their own partition of General Register File (GRF) space. This technique is called the multiple-patch SIMD dispatch mode.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: October 1, 2019
    Assignee: Intel Corporation
    Inventors: Jayashree Venkatesh, Guei-Yuan Lueh, Subramaniam Maiyuran
  • Patent number: 10411881
    Abstract: A data processing apparatus for rearranging multiple items of data to be input, includes a processor; a memory; and an input unit configured to receive as input a rearrangement number with which a rearrangement pattern of the data can be identified. The processor executes calculating a rearrangement destination for each of the items of the data based on the rearrangement number; and rearranging the data based on the rearrangement destinations.
    Type: Grant
    Filed: April 24, 2017
    Date of Patent: September 10, 2019
    Assignee: FUJI ELECTRIC CO., LTD.
    Inventor: Kenji Takatsukasa
  • Patent number: 10372447
    Abstract: An instruction defined to be a looping instruction is obtained and processed. A determination is made as to whether an obtained selected character is an expected selected character. Based on the obtained selected character being the expected selected character, an execution process is used that includes a sequence of operations to perform an operation, the sequence of operations replacing a loop and providing a non-looping sequence to perform the operation on up to a defined number of units of data. The sequence of operations is configured to repeat one or more times and to terminate based on the obtained selected character. Based on the obtained selected character being different than the expected selected character, an alternate execution process is chosen.
    Type: Grant
    Filed: December 7, 2018
    Date of Patent: August 6, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 10372448
    Abstract: An instruction defined to be a looping instruction is obtained and processed. A determination is made as to whether an obtained selected character is an expected selected character. Based on the obtained selected character being the expected selected character, an execution process is used that includes a sequence of operations to perform an operation, the sequence of operations replacing a loop and providing a non-looping sequence to perform the operation on up to a defined number of units of data. The sequence of operations is configured to repeat one or more times and to terminate based on the obtained selected character. Based on the obtained selected character being different than the expected selected character, an alternate execution process is chosen.
    Type: Grant
    Filed: December 7, 2018
    Date of Patent: August 6, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 10268580
    Abstract: Processors and methods implementing a machine instruction to perform cache line demotion on multiple cache lines to enable efficient sharing of cache lines between processor cores. One general aspect includes a processor comprising: a plurality of hardware processor cores, where each of the hardware processor cores to include a first cache. The processor also includes a second cache, communicatively coupled to and shared by the plurality of hardware processor cores. The processor to support a first machine instruction, the first machine instruction to include a vector register operand identifying a vector register which contains a plurality of data elements each used to identify a cache line. An execution of the first machine instruction by one of the plurality of hardware processor cores to cause a plurality of identified cache lines to be demoted, such that the demoted cache lines are moved from the first cache to the second cache.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: April 23, 2019
    Assignee: Intel Corporation
    Inventors: Kshitij A. Doshi, Namakkal N. Venkatesan, Ren Wang, Andrew J. Herdrich
  • Patent number: 10180838
    Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
    Type: Grant
    Filed: September 19, 2017
    Date of Patent: January 15, 2019
    Assignee: Intel Corporation
    Inventor: Ashish Jha
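    Behaviorally, the multi-register gather can be modelled as copying, for each (source register, source index, destination index) triple named by the instruction's control content, the selected source element into the indicated slot of the destination register. The control layout used below is an illustrative assumption.
    ```python
    def multi_register_gather(vregs, control, dest_len=8):
        dst = [0] * dest_len
        for src_reg, src_idx, dst_idx in control:
            dst[dst_idx] = vregs[src_reg][src_idx]
        return dst

    vregs = {1: [10, 11, 12, 13], 2: [20, 21, 22, 23]}
    control = [(1, 0, 7), (1, 3, 0), (2, 2, 4)]    # (source register, source index, destination index)
    print(multi_register_gather(vregs, control))   # [13, 0, 0, 0, 22, 0, 0, 10]
    ```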
  • Patent number: 10152323
    Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set, and a zero is placed into the associated resultant data element position if its flush to zero field is set.
    Type: Grant
    Filed: October 21, 2016
    Date of Patent: December 11, 2018
    Assignee: Intel Corporation
    Inventors: Patrice L. Roussel, William W. Macy, Jr., Huy V. Nguyen, Eric L. Debes
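    A scalar model of this shuffle: each control element selects a source data element for the corresponding result position unless its flush-to-zero field is set, in which case the result position receives zero. The (index, flush_to_zero) control encoding is an illustrative assumption.
    ```python
    def shuffle(src, control):
        result = []
        for index, flush_to_zero in control:
            result.append(0 if flush_to_zero else src[index])
        return result

    src = [11, 22, 33, 44]
    control = [(3, False), (0, False), (2, True), (1, False)]
    print(shuffle(src, control))    # [44, 11, 0, 22]
    ```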
  • Patent number: 10111049
    Abstract: A method, an apparatus, and a computer program product for wireless communication are provided. The apparatus may be a UE. The UE receives at least one of a unicast or a broadcast/multicast communication on a first frequency from a first cell of a serving eNB through a first receive chain. In addition, the UE receives at least one of broadcast/multicast signal, synchronization signal, or reference signal communication on a second frequency from a second cell of the serving eNB through a second receive chain without having received instruction from the serving eNB to receive the at least one of the broadcast/multicast signal, the synchronization signal, or the reference signal communication.
    Type: Grant
    Filed: July 8, 2013
    Date of Patent: October 23, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Jack Shyh-Hurng Shauh, Srinivasan Balasubramanian, Kuo-Chun Lee, Sivaramakrishna Veerepalli, Jun Wang
  • Patent number: 10055225
    Abstract: A processor fetches a multi-register scatter instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes multiple source data elements. The destination operand identifies multiple destination data elements that each specify a destination vector register and an index into that destination vector register. The instruction is decoded and executed, causing, for each of those identified destination data elements, the one of the source data elements that is in a position in the source vector register that corresponds with a position of that destination data element to be stored in the destination vector register at the index specified by that destination data element.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: August 21, 2018
    Assignee: Intel Corporation
    Inventor: Ashish Jha
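    The scatter counterpart can be sketched the same way: the n-th destination descriptor names a destination vector register and an index within it, and receives the n-th element of the source vector register. The descriptor format below is an illustrative assumption.
    ```python
    def multi_register_scatter(vregs, src, dest_descriptors):
        for position, (dst_reg, dst_idx) in enumerate(dest_descriptors):
            vregs[dst_reg][dst_idx] = src[position]

    vregs = {3: [0] * 4, 4: [0] * 4}
    multi_register_scatter(vregs, [7, 8, 9], [(3, 2), (4, 0), (3, 0)])
    print(vregs)    # {3: [9, 0, 7, 0], 4: [8, 0, 0, 0]}
    ```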
  • Patent number: 10019262
    Abstract: A processor comprises a plurality of vector registers, and an execution unit, operatively coupled to the plurality of vector registers, the execution unit comprising a logic circuit implementing a load instruction for loading, into two or more vector registers, two or more data items associated with a data structure stored in a memory, wherein each one of the two or more vector registers is to store a data item associated with a certain position number within the data structure.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: July 10, 2018
    Assignee: Intel Corporation
    Inventors: Ashish Jha, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney, Milind B. Girkar
  • Patent number: 9893880
    Abstract: A method for secure comparison of encrypted symbols. According to one embodiment, a user may encrypt two symbols and share the encrypted symbols with an untrusted third party that can compute algorithms on these symbols without access to the original data or encryption keys, such that the result of running the algorithm on the encrypted data can be decrypted to a result which is equivalent to the result of running the algorithm on the original unencrypted data. In one embodiment the untrusted third party may perform a sequence of operations on the encrypted symbols to produce an encrypted result which, when decrypted by a trusted party, indicates whether the two symbols are the same.
    Type: Grant
    Filed: November 15, 2013
    Date of Patent: February 13, 2018
    Assignee: RAYTHEON BBN TECHNOLOGIES CORP.
    Inventors: Kurt Rohloff, David Bruce Cousins, Richard Schantz
  • Patent number: 9894371
    Abstract: A method, system and computer program for compressing and decompressing video data, storing the compressed video data in such a way that random access is possible and the data can be mapped efficiently into existing memory systems and interface protocols. The compression is accomplished via lossless compression using an algorithm optimized for video data. Due to the compressed format, data transmission consumes less bandwidth than using uncompressed data, and the lossless encoding prevents degradation in the decoded video.
    Type: Grant
    Filed: October 19, 2016
    Date of Patent: February 13, 2018
    Assignee: Entropic Communications, LLC
    Inventors: Matthew Damian Bates, Mark Alan Baur
  • Patent number: 9843551
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing and transferring messages. An example method includes providing a queue having an ordered plurality of storage blocks. Each storage block stores one or more respective messages and is associated with a respective time. The times increase from a block designating a head of the queue to a block designating a tail of the queue. The method also includes reading, by each of a plurality of first sender processes, messages from one or more blocks in the queue beginning at the head of the queue. The read messages are sent, by each of the plurality of first sender processes, to a respective recipient. One or more of the blocks are designated as old when they have associated times that are earlier than a first time. A block is designated as a new head of the queue when the block is associated with a time later than or equal to the first time.
    Type: Grant
    Filed: October 12, 2016
    Date of Patent: December 12, 2017
    Assignee: Machine Zone, Inc.
    Inventor: Igor Milyakov
  • Patent number: 9798541
    Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: October 24, 2017
    Assignee: INTEL CORPORATION
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
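    A literal scalar reading of the propagation rule above: a true mask bit yields its own bit position; a false bit after the first true bit yields the vector length plus the position of the last true bit seen. The abstract does not spell out what happens to false bits before any true bit, so they are left as 0 in this sketch.
    ```python
    def propagate_positions(mask):
        vlen = len(mask)
        out = [0] * vlen
        last_true = None
        for pos, bit in enumerate(mask):
            if bit:
                out[pos] = pos
                last_true = pos
            elif last_true is not None:
                out[pos] = vlen + last_true
        return out

    print(propagate_positions([0, 1, 0, 0, 1, 0]))   # [0, 1, 7, 7, 4, 10]
    ```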
  • Patent number: 9792115
    Abstract: A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further includes a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector input operand))+c.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: October 17, 2017
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Andrew T. Forsyth, Thomas D. Fletcher, Lisa K. Wu, Eric Sprangle
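    The per-element arithmetic spelled out in the abstract, a*(first vector) + b*(second vector) + c with the scalars taken from one packed operand, looks like this with plain Python lists standing in for the registers:
    ```python
    def scaled_vector_fma(v1, v2, scalars):
        a, b, c = scalars
        return [a * x + b * y + c for x, y in zip(v1, v2)]

    print(scaled_vector_fma([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], (2.0, 0.5, 1.0)))
    # [5.0, 7.5, 10.0]
    ```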
  • Patent number: 9766887
    Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: September 19, 2017
    Assignee: Intel Corporation
    Inventor: Ashish Jha
  • Patent number: 9760372
    Abstract: A method for combining data values through associative operations. The method includes, with a processor, arranging any number of data values into a plurality of columns according to natural parallelism of the associative operations and reading each column to a register of an individual processor. The processors are directed to combine the data values in the columns in parallel using a first associative operation. The results of the first associative operation for each column are stored in a register of each processor.
    Type: Grant
    Filed: September 1, 2011
    Date of Patent: September 12, 2017
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Ren Wu, Bin Zhang, Meichun Hsu, Qiming Chen
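    A sketch of the column arrangement described above: values are laid out round-robin into columns, each column is reduced independently with an associative operation (addition here), and the per-column results can then be combined again. The round-robin split is an illustrative assumption.
    ```python
    from functools import reduce
    import operator

    def columnwise_combine(values, num_columns, op=operator.add):
        columns = [values[i::num_columns] for i in range(num_columns)]   # round-robin split into columns
        partials = [reduce(op, col) for col in columns]                  # independent per-column reductions
        return partials, reduce(op, partials)                            # per-column results and overall result

    print(columnwise_combine(list(range(1, 13)), 4))
    # ([15, 18, 21, 24], 78)
    ```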
  • Patent number: 9665368
    Abstract: Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: May 30, 2017
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Robert Valentine
  • Patent number: 9594724
    Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
    Type: Grant
    Filed: August 9, 2012
    Date of Patent: March 14, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
  • Patent number: 9594556
    Abstract: A circuit arrangement and program product provide support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.
    Type: Grant
    Filed: March 18, 2016
    Date of Patent: March 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
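    A scalar reference for the packed sum-of-absolute-differences operation targeted above: corresponding groups of fixed-point elements from the two operands are differenced and the absolute differences within each group are summed. The group size of 4 is an illustrative assumption.
    ```python
    def packed_sad(a, b, group=4):
        return [sum(abs(x - y) for x, y in zip(a[i:i + group], b[i:i + group]))
                for i in range(0, len(a), group)]

    print(packed_sad([1, 2, 3, 4, 10, 10, 10, 10],
                     [4, 2, 1, 0, 8, 12, 9, 15]))   # [9, 10]
    ```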
  • Patent number: 9594557
    Abstract: A method provides support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.
    Type: Grant
    Filed: March 18, 2016
    Date of Patent: March 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9582466
    Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
    Type: Grant
    Filed: August 13, 2012
    Date of Patent: February 28, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
  • Patent number: 9507601
    Abstract: An apparatus for processing a plurality of data sets is disclosed, wherein one data set of the plurality of data sets includes N components and has a data type of one of a scalar type and a vector type, wherein N is a positive integer number. The apparatus includes a memory module and a data accessing module. The memory module comprises N memory units configured to store the plurality of data sets. The data accessing module is configured to write the data set into the memory module according to a write data index corresponding to the data set and one of a first writing mapping information and a second writing mapping information, wherein the first writing mapping information is employed when the data type is one of the scalar and the vector type and the second writing mapping information is employed when the data type is the other of the scalar and the vector type.
    Type: Grant
    Filed: February 19, 2014
    Date of Patent: November 29, 2016
    Assignee: MEDIATEK INC.
    Inventors: Pei-Kuei Tsung, Mu-Fan Murphy Chang, Po-Chun Fan
  • Patent number: 9471289
    Abstract: Systems and methods for source-to-source transformation for compiler optimization for many integrated core (MIC) coprocessors, including identifying data dependencies in candidate loops and data elements used in each iteration for arrays, profiling candidate loops to find a proper number m, wherein data transfer and computation for m iterations take an equal amount of time, and creating an outer loop outside the candidate loop, with each iteration of the outer loop executing m iterations of the candidate loop. Data streaming is performed by determining optimum buffer size for one or more arrays and inserting code before the outer loop to create optimum sized buffers, overlapping data transfer between central processing units (CPUs) and MICs with the computation; reusing buffers to reduce memory employed on the MICs, and reusing threads on MICs to repeatedly launch kernels on the MICs for asynchronous data transfer.
    Type: Grant
    Filed: March 25, 2015
    Date of Patent: October 18, 2016
    Assignee: NEC Corporation
    Inventors: Min Feng, Srimat Chakradhar, Linhai Song
  • Patent number: 9436465
    Abstract: A processor, which executes m number of arithmetic operations in parallel, executes a partial sum instruction which takes an i-th to (i+m−1)-th elements of an input data series as input elements, so as to obtain first vector data, executes the partial sum instruction which takes a (i+x)-th to (i+x+m−1)-th elements of the input data series as the input elements, so as to obtain second vector data, and performs operations to subtract the p-th element of the first vector data from, and add the p-th element of the second vector data to, a sum of the i-th to (i+x−1)-th elements of the input data series, in parallel for each of the 0-th to (m−1)-th elements, so as to calculate sums of elements for m sections different from each other in parallel, and moving average processing to calculate a moving average from the sums of elements of the sections.
    Type: Grant
    Filed: April 7, 2014
    Date of Patent: September 6, 2016
    Assignee: FUJITSU LIMITED
    Inventors: Makiko Ito, Manabu Kubota, Kazuhiro Nomoto
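    A scalar walk-through of the partial-sum trick: two partial-sum (prefix-sum) vectors plus one base sum yield m consecutive window sums of width x, from which the moving average follows directly. The indices and window choices below are illustrative.
    ```python
    def partial_sums(values):
        out, running = [], 0
        for v in values:
            running += v
            out.append(running)
        return out

    def moving_sums(src, i, x, m):
        first = partial_sums(src[i:i + m])              # sums of src[i..i+p]
        second = partial_sums(src[i + x:i + x + m])     # sums of src[i+x..i+x+p]
        base = sum(src[i:i + x])                        # sum of src[i..i+x-1]
        # Element p: subtract first[p], add second[p] -> sum of a width-x window starting at i+p+1.
        return [base - first[p] + second[p] for p in range(m)]

    src = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    x, m = 3, 4
    sums = moving_sums(src, 0, x, m)
    print(sums)                           # [9, 12, 15, 18]
    print([s / x for s in sums])          # moving averages over width-3 windows
    ```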
  • Patent number: 9424031
    Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: August 23, 2016
    Assignee: INTEL CORPORATION
    Inventors: Hariharan Thantry, Mani Azimi
  • Patent number: 9350584
    Abstract: An element selection unit (200) and a method therein for vector element selection. The element selection unit comprises a selector control circuit (404) and a selector data path circuit (406), which data path circuit comprises a plurality of layers of multiplexers. The element selection unit further comprises a receiving circuit (401) configured to receive an instruction to perform a selection of elements from an input vector. The selector control circuit (404) is configured to generate a multiplexer control signal for each multiplexer based on a bit map and on a plurality of relative offset values. The data path circuit is configured to propagate the elements comprised in the input vector through the plurality of layers of multiplexers towards an output vector based on the generated multiplexer control signals. The data path circuit is further configured to write the propagated elements to enabled elements of the output vector.
    Type: Grant
    Filed: June 10, 2013
    Date of Patent: May 24, 2016
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventor: David Van Kampen
  • Patent number: 9268566
    Abstract: Multiple sets of character data having termination characters are compared using parallel processing and without causing unwarranted exceptions. Each set of character data to be compared is loaded within one or more vector registers. In particular, in one embodiment, for each set of character data to be compared, an instruction is used that loads data in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. Further, an instruction is used to find the index of the first delimiter character, i.e., the first zero or null character, or the index of unequal characters. Using these instructions, a location of the end of one of the sets of data or a location of an unequal character is efficiently provided.
    Type: Grant
    Filed: March 15, 2012
    Date of Patent: February 23, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Timothy J. Slegel
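    In scalar form, the comparison above amounts to scanning two null-terminated buffers in step and reporting the index of the first terminator or of the first position where the characters differ:
    ```python
    def find_terminator_or_mismatch(a: bytes, b: bytes):
        for i, (x, y) in enumerate(zip(a, b)):
            if x == 0 or y == 0:
                return i, "terminator"
            if x != y:
                return i, "mismatch"
        return min(len(a), len(b)), "terminator"

    print(find_terminator_or_mismatch(b"vector\x00", b"vectal\x00"))   # (4, 'mismatch')
    print(find_terminator_or_mismatch(b"same\x00", b"same\x00"))       # (4, 'terminator')
    ```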
  • Patent number: 9268626
    Abstract: An apparatus and method are described for detecting and responding to fault conditions in a processor. For example, one embodiment of a method comprises: reading each active element in succession from a first vector register, each active element specifying an address for a gather or load operation; detecting one or more fault conditions associated with one or more of the active elements; for each active element read in succession prior to a detected fault condition on an element other than the first active element, storing the data loaded from an address associated with the active element in a first output vector register; and for each active element associated with the detected fault condition and following the detected fault condition, setting a bit in an output mask register to indicate the detected fault condition.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: February 23, 2016
    Assignee: INTEL CORPORATION
    Inventors: Jayashankar Bharadwaj, Victor W. Lee, Kim Daehyun, Nalini Vasudevan, Tin-Fook Ngai, Albert Hartono, Sara S. Baghsorkhi
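    A behavioral sketch of the fault-tolerant gather: addresses are processed in order, data loaded before the first faulting element lands in the output vector, and the faulting element plus every later element get their bits set in an output mask so software can retry them. Using a missing dictionary key to stand in for a fault is an illustrative assumption.
    ```python
    def gather_with_faults(memory, addresses):
        out = [0] * len(addresses)
        fault_mask = [0] * len(addresses)
        for i, addr in enumerate(addresses):
            if addr not in memory:                   # a missing address stands in for a fault condition
                for j in range(i, len(addresses)):
                    fault_mask[j] = 1
                break
            out[i] = memory[addr]
        return out, fault_mask

    memory = {100: 'a', 104: 'b', 108: 'c'}
    print(gather_with_faults(memory, [100, 104, 999, 108]))
    # (['a', 'b', 0, 0], [0, 0, 1, 1])
    ```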
  • Publication number: 20150143074
    Abstract: Vector exception handling is facilitated. A vector instruction is executed that operates on one or more elements of a vector register. When an exception is encountered during execution of the instruction, a vector exception code is provided that indicates a position within the vector register that caused the exception. The vector exception code also includes a reason for the exception.
    Type: Application
    Filed: December 5, 2014
    Publication date: May 21, 2015
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz, Timothy J. Slegel