Distributing Of Vector Data To Vector Registers Patents (Class 712/4)

Instructions to convert from FP16 to BF8

Patent number: 12135968

Abstract: Techniques for converting FP16 to BF8 using bias are described.

Type: Grant

Filed: December 26, 2020

Date of Patent: November 5, 2024

Assignee: Intel Corporation

Inventors: Alexander Heinecke, Naveen Mellempudi, Robert Valentine, Mark Charney, Christopher Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
Performing load and store operations of 2D arrays in a single cycle in a system on a chip

Patent number: 12099439

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: August 2, 2021

Date of Patent: September 24, 2024

Assignee: NVIDIA Corporation

Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Implementing specialized instructions for accelerating Smith-Waterman sequence alignments

Patent number: 11550584

Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.

Type: Grant

Filed: September 30, 2021

Date of Patent: January 10, 2023

Assignee: NVIDIA CORPORATION

Inventors: Maciej Piotr Tyrlik, Ajay Sudarshan Tirumala, Shirish Gadre
Monolithic vector processor configured to operate on variable length vectors using a vector length register

Patent number: 11544214

Abstract: A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.

Type: Grant

Filed: May 12, 2015

Date of Patent: January 3, 2023

Assignee: Optimum Semiconductor Technologies, Inc.

Inventors: Mayan Moudgill, Gary J. Nacer, C. John Glossner, Arthur Joseph Hoane, Paul Hurtley, Murugappan Senthilvelan, Pablo Balzola, Vitaly Kalashnikov, Sitij Agrawal
Compression assist instructions

Patent number: 11537399

Abstract: In an embodiment, a processor supports one or more compression assist instructions which may be employed in compression software to improve the performance of the processor when performing compression/decompression. That is, the compression/decompression task may be performed more rapidly and consume less power when the compression assist instructions are employed than when they are not. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex compression algorithm.

Type: Grant

Filed: July 12, 2021

Date of Patent: December 27, 2022

Assignee: Apple Inc.

Inventors: Eric Bainville, Ali Sazegari
Vector table load instruction with address generation field to access table offset value

Patent number: 11468003

Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop.

Type: Grant

Filed: September 21, 2020

Date of Patent: October 11, 2022

Assignee: Texas Instruments Incorporated

Inventors: Ching-Yu Hung, Shinri Inamori, Jagadeesh Sankaran, Peter Chang
Vector pack and unpack instructions

Patent number: 11442726

Abstract: Vector pack and unpack instructions are described. An instruction to perform a conversion between one decimal format and another decimal format is executed, in which the one decimal format or the other decimal format is a zoned decimal format. The executing includes obtaining a value from at least one register specified using the instruction. At least a portion of the value is converted from the one decimal format to the other decimal format different from the one decimal format to provide a converted result. A result obtained from the converted result is written into a single register specified using the instruction.

Type: Grant

Filed: February 26, 2021

Date of Patent: September 13, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Eric Mark Schwarz, Timothy Slegel, Jonathan D. Bradbury, Michael Klein, Reid Copeland, Xin Guo
Method and apparatus for vectorizing indirect update loops

Patent number: 11436010

Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, and a source vector identifier, wherein the execution circuit is to, for each data element position of a source vector identified by the source vector identifier, determine a nearest matching data element position in the source vector storing a same data value as stored at the data element position, the nearest matching data element position located between the data element position and a least significant data element position of the source vector, and store, in a corresponding data element position of a destination vector identified by the destination vector identifier, a value identifying the determined nearest data element position.
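
As an illustration only, the per-element behavior described in this abstract can be modeled in software. The sketch below is not the patented instruction; it assumes element position 0 is the least significant position, and the function name and the `no_match` sentinel are hypothetical, since the abstract does not say what is stored when no earlier match exists.

```python
def nearest_prior_match(src, no_match=-1):
    """For each position i, return the closest position j < i with src[j] == src[i].

    `no_match` is an assumed sentinel for elements that have no earlier match.
    """
    last_seen = {}  # value -> most recent position holding that value
    dst = []
    for pos, value in enumerate(src):
        dst.append(last_seen.get(value, no_match))
        last_seen[value] = pos
    return dst


# Positions 2 and 3 conflict with earlier occurrences of 5; position 4 with 7.
conflicts = nearest_prior_match([5, 7, 5, 5, 7])
```

In an indirect update loop such as `a[idx[i]] += x`, a result like this would tell a vectorized loop which lanes depend on an earlier lane writing the same index.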

Type: Grant

Filed: June 30, 2017

Date of Patent: September 6, 2022

Assignee: Intel Corporation

Inventors: Mikhail Plotnikov, Christopher J. Hughes, Andrey Naraikin
Data analyzer and data analysis method

Patent number: 11182561

Abstract: A similar expression collecting unit (103) collects a word/phrase of an expression similar to an analysis viewpoint word/phrase and a word embedding corresponding to the word/phrase from distributed representations of words data. A dimension selecting unit (104) selects a dimension of a word embedding depending on the analysis viewpoint word/phrase, and compresses a word embedding corresponding to the word/phrase collected by the similar expression collecting unit (103) in the selected dimension.

Type: Grant

Filed: February 14, 2017

Date of Patent: November 23, 2021

Assignee: MITSUBISHI ELECTRIC CORPORATION

Inventors: Takeyuki Aikawa, Hiroyasu Itsui
RISC processor having specialized registers

Patent number: 11126588

Abstract: A processor circuit is disclosed. The processor circuit includes a data path block circuit configured to perform a data path operation to generate one or more results. The processor circuit also includes a data register files circuit, having a first register file, where the first register file has a first quantity of read and write ports. The data register files circuit also includes a second register file, where the second register file has a second different quantity of read and write ports. The processor circuit also includes an instruction decoder circuit configured to provide an operation signal to the data path block circuit, where the operation signal identifies a particular data path operation to be performed by the data path block circuit and identifies one or more read ports of the data register files circuit for retrieving data encoding the first and second operands.

Type: Grant

Filed: July 28, 2020

Date of Patent: September 21, 2021

Assignee: SHENZHEN GOODIX TECHNOLOGY CO., LTD.

Inventor: Jaehoon Heo
Efficient hardware instructions for single instruction multiple data processors: fast fixed-length value compression

Patent number: 11113054

Abstract: Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (“SIMD”) architecture are presented herein. Specifically, methods and apparatuses are discussed for compressing or packing, in parallel, multiple fixed-length values into a stream of multiple variable-length values using SIMD architecture.

Type: Grant

Filed: July 15, 2016

Date of Patent: September 7, 2021

Assignee: Oracle International Corporation

Inventors: Shasank K. Chavan, Phumpong Watanaprakornkul, Victor Chen
Apparatus and methods for matrix multiplication

Patent number: 11080049

Abstract: Aspects for matrix multiplication in a neural network are described herein. The aspects may include a controller unit configured to receive a matrix-multiply-matrix (MM) instruction that includes a first starting address of a first matrix, a first size of the first matrix, a second starting address of a second matrix, and a second size of the second matrix; multiple computation modules configured to respectively multiply, in response to the MM instruction, row vectors of the first matrix with column vectors of the second matrix to generate one or more result elements; and an interconnection unit configured to combine the result elements to generate one or more row vectors of a result matrix.

Type: Grant

Filed: October 17, 2019

Date of Patent: August 3, 2021

Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED

Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
Vector data transfer instruction

Patent number: 11003450

Abstract: A vector data transfer instruction is provided for triggering a data transfer between storage locations corresponding to a contiguous block of addresses and multiple data elements of at least one vector register. The instruction specifies a start address of the contiguous block using a base register and an immediate offset value specified as a multiple of the size of the contiguous block of addresses. This is useful for loop unrolling which can help to improve performance of vectorised code by combining multiple iterations of a loop into a single iteration of an unrolled loop, to reduce the loop control overhead.

Type: Grant

Filed: September 14, 2016

Date of Patent: May 11, 2021

Assignee: ARM Limited

Inventor: Nigel John Stephens
Vector register access

Patent number: 10963251

Abstract: There is provided an apparatus that includes a set of vector registers, each of the vector registers being arranged to store a vector comprising a plurality of portions. The set of vector registers is logically divided into a plurality of columns, each of the columns being arranged to store a same portion of each vector. The apparatus also includes register access circuitry that comprises a plurality of access blocks. Each access block is arranged to access a portion in a different column when accessing one of the vector registers than when accessing at least one other of the vector registers. The register access circuitry is arranged to simultaneously access portions in any one of: the vector registers and the columns.

Type: Grant

Filed: June 15, 2017

Date of Patent: March 30, 2021

Assignee: ARM Limited

Inventor: Thomas Christopher Grocutt
Matrix vector multiplier with a vector register file comprising a multi-port memory

Patent number: 10795678

Abstract: Neural network processors including a vector register file (VRF) having a multi-port memory and related methods are provided. The processor may include tiles to process an N by N matrix of data elements and an N by 1 vector of data elements. The VRF may, in response to a write instruction, store N data elements in a multi-port memory and during each one out of P clock cycles provide N data elements to each one of P input interface circuits of the multi-port memory comprising an input lane configured to carry L data elements in parallel. During each one of the P clock cycles the multi-port memory may be configured to receive N data elements via a selected at least one of the P input interface circuits. The VRF may include output interface circuits for providing N data elements in response to a read instruction.

Type: Grant

Filed: April 21, 2018

Date of Patent: October 6, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
System memory controller with atomic operations

Patent number: 10713046

Abstract: Methods and a system for use on a memory controller are disclosed that provide atomic compute operations of any size using an asynchronous, pipelined message passing interface between clients and the memory controller.

Type: Grant

Filed: December 20, 2017

Date of Patent: July 14, 2020

Assignee: EXTEN TECHNOLOGIES, INC.

Inventors: Daniel B. Reents, Michael Enz, Ashwin Kamath
Systems, apparatuses, and methods for performing delta decoding on packed data elements

Patent number: 10671392

Abstract: Systems, apparatuses, and methods for performing delta decoding on packed data elements of a source and storing the results in packed data elements of a destination using a single packed delta decode instruction are described. A processor may include a decoder to decode an instruction, and an execution unit to execute the decoded instruction to calculate for each packed data element position of a source operand, other than a first packed data element position, a value that comprises a packed data element of that packed data element position and all packed data elements of packed data element positions that are of lesser significance, store a first packed data element from the first packed data element position of the source operand into a corresponding first packed data element position of a destination operand, and for each calculated value, store the value into a corresponding packed data element position of the destination operand.
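
As an illustration only, delta decoding of this kind is commonly an inclusive prefix sum over the source elements. The sketch below assumes that the "value that comprises" an element and all less significant elements is their sum, and it ignores fixed element widths and overflow; the function name is hypothetical.

```python
from itertools import accumulate


def delta_decode(src):
    """Return dst where dst[0] = src[0] and dst[i] = src[0] + ... + src[i]."""
    return list(accumulate(src))


# A base value followed by differences between consecutive values.
decoded = delta_decode([10, 2, 3, -1])
```

The first element passes through unchanged, which matches the abstract's separate handling of the first packed data element position.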

Type: Grant

Filed: July 31, 2018

Date of Patent: June 2, 2020

Assignee: INTEL CORPORATION

Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale
Data shuffling in a non-uniform memory access device

Patent number: 10671292

Abstract: A method of orchestrated shuffling of data in a non-uniform memory access device including a plurality of processing nodes that are connected by interconnects. The method includes running an application on a plurality of threads executing on the plurality of processing nodes. Data to be shuffled is identified from source threads running on source processing nodes among the processing nodes to target threads executing on target processing nodes among the processing nodes. The method further includes generating a plan for orchestrating the shuffling of the data among all of the memory devices associated with the threads and for simultaneously transmitting data over different interconnects to a plurality of different target processing nodes from a plurality of different source processing nodes. The data is shuffled among all of the memory devices based on the plan and each processing node is capable of accessing data from first and second local memory devices.

Type: Grant

Filed: December 1, 2017

Date of Patent: June 2, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yinan Li, Guy M. Lohman, Rene Mueller, Ippokratis Pandis, Vijayshankar Raman
Apparatus and method for vector-based entropy coding for display stream compression

Patent number: 10666984

Abstract: Methods and apparatus for coding video information having a plurality of video samples include partitioning samples into groups for transmission within a single clock cycle, wherein the samples are associated with a bit length B, and a group having a group size K. The sample group is mapped to a code number and coded to form a vector-based code comprising a first portion identifying a type of look-up-table used to perform the mapping, and a second portion representing the samples of the group. The look-up-table may be constructed based upon occurrence probabilities of different sample groups. In addition, different types of look-up-tables may be used for different B and K values.

Type: Grant

Filed: March 3, 2017

Date of Patent: May 26, 2020

Assignee: QUALCOMM Incorporated

Inventors: Vijayaraghavan Thirumalai, Natan Haim Jacobson, Rajan Laxman Joshi
Data shuffling in a non-uniform memory access device

Patent number: 10534544

Abstract: A method of orchestrated shuffling of data in a non-uniform memory access device that includes a plurality of processing nodes that are connected by interconnects. The method includes running an application on a plurality of threads executing on the plurality of processing nodes. Data to be shuffled is identified from source threads running on source processing nodes among the processing nodes to target threads executing on target processing nodes among the processing nodes. The method further includes generating a plan for orchestrating the shuffling of the data among all of the memory devices associated with the threads and for simultaneously transmitting data over different interconnects to a plurality of different target processing nodes from a plurality of different source processing nodes. The data is shuffled among all of the memory devices based on the plan. The data includes operand data and operational state data of the source threads.

Type: Grant

Filed: December 1, 2017

Date of Patent: January 14, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yinan Li, Guy M. Lohman, Rene Mueller, Ippokratis Pandis, Vijayshankar Raman
Data shuffling in a non-uniform memory access device

Patent number: 10521128

Abstract: A method of orchestrated shuffling of data in a non-uniform memory access device that includes a plurality of processing nodes connected by interconnects. The method includes running an application on a plurality of threads executing on the plurality of processing nodes. Data to be shuffled is identified from source threads running on source processing nodes among the processing nodes to target threads executing on target processing nodes among the processing nodes. The method further includes generating a plan for orchestrating the shuffling of the data among all of the memory devices associated with the threads and for simultaneously transmitting data over different interconnects to a plurality of different target processing nodes from a plurality of different source processing nodes. The data is shuffled among all of the memory devices based on the plan. Shifting the data-shifting table includes rotating a first ring with respect to a second ring.

Type: Grant

Filed: December 1, 2017

Date of Patent: December 31, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yinan Li, Guy M. Lohman, Rene Mueller, Ippokratis Pandis, Vijayshankar Raman
Vector processing using loops of dynamic vector length

Patent number: 10430192

Abstract: Data processing apparatus comprises processing circuitry to selectively apply vector processing operations to one or more data items of a data vector comprising a plurality of data items at respective positions in the data vector, according to the state of respective predicate flags associated with the positions; the processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; wherein the instruction decoder circuitry is responsive to a WHILE instruction and a CHANGE instruction, to control the instruction processing dependent upon a number of the predicate flags.

Type: Grant

Filed: July 28, 2016

Date of Patent: October 1, 2019

Assignee: ARM Limited

Inventors: Nigel John Stephens, Grigorios Magklis, Alejandro Martinez Vicente, Nathanael Premillieu, Mbou Eyole
Multiple-patch SIMD dispatch mode for domain shaders

Patent number: 10430229

Abstract: To use SIMD lanes efficiently for domain shader execution, domain point data from different domain shader patches may be packed together into a single SIMD thread. To generate an efficient code sequence, each domain point occupies one SIMD lane and all attributes for the domain point reside in their own partition of General Register File (GRF) space. This technique is called the multiple-patch SIMD dispatch mode.

Type: Grant

Filed: December 21, 2015

Date of Patent: October 1, 2019

Assignee: Intel Corporation

Inventors: Jayashree Venkatesh, Guei-Yuan Lueh, Subramaniam Maiyuran
Data processing apparatus, method for processing data, and medium

Patent number: 10411881

Abstract: A data processing apparatus for rearranging multiple items of data to be input, includes a processor; a memory; and an input unit configured to receive as input a rearrangement number with which a rearrangement pattern of the data can be identified. The processor executes calculating a rearrangement destination for each of the items of the data based on the rearrangement number; and rearranging the data based on the rearrangement destinations.

Type: Grant

Filed: April 24, 2017

Date of Patent: September 10, 2019

Assignee: FUJI ELECTRIC CO., LTD.

Inventor: Kenji Takatsukasa
Selecting processing based on expected value of selected character

Patent number: 10372447

Abstract: An instruction defined to be a looping instruction is obtained and processed. A determination is made as to whether an obtained selected character is an expected selected character. Based on the obtained selected character being the expected selected character, an execution process is used that includes a sequence of operations to perform an operation, the sequence of operations replacing a loop and providing a non-looping sequence to perform the operation on up to a defined number of units of data. The sequence of operations is configured to repeat one or more times and to terminate based on the obtained selected character. Based on the obtained selected character being different than the expected selected character, an alternate execution process is chosen.

Type: Grant

Filed: December 7, 2018

Date of Patent: August 6, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Michael K. Gschwind
Selecting processing based on expected value of selected character

Patent number: 10372448

Abstract: An instruction defined to be a looping instruction is obtained and processed. A determination is made as to whether an obtained selected character is an expected selected character. Based on the obtained selected character being the expected selected character, an execution process is used that includes a sequence of operations to perform an operation, the sequence of operations replacing a loop and providing a non-looping sequence to perform the operation on up to a defined number of units of data. The sequence of operations is configured to repeat one or more times and to terminate based on the obtained selected character. Based on the obtained selected character being different than the expected selected character, an alternate execution process is chosen.

Type: Grant

Filed: December 7, 2018

Date of Patent: August 6, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Michael K. Gschwind
Processors and methods for managing cache tiering with gather-scatter vector semantics

Patent number: 10268580

Abstract: Processors and methods implementing a machine instruction to perform cache line demotion on multiple cache lines to enable efficient sharing of cache lines between processor cores. One general aspect includes a processor comprising: a plurality of hardware processor cores, where each of the hardware processor cores to include a first cache. The processor also includes a second cache, communicatively coupled to and shared by the plurality of hardware processor cores. The processor to support a first machine instruction, the first machine instruction to include a vector register operand identifying a vector register which contains a plurality of data elements each used to identify a cache line. An execution of the first machine instruction by one of the plurality of hardware processor cores to cause a plurality of identified cache lines to be demoted, such that the demoted cache lines are moved from the first cache to the second cache.

Type: Grant

Filed: September 30, 2016

Date of Patent: April 23, 2019

Assignee: Intel Corporation

Inventors: Kshitij A. Doshi, Namakkal N. Venkatesan, Ren Wang, Andrew J. Herdrich
Multi-register gather instruction

Patent number: 10180838

Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.

Type: Grant

Filed: September 19, 2017

Date of Patent: January 15, 2019

Assignee: Intel Corporation

Inventor: Ashish Jha
Method and apparatus for shuffling data

Patent number: 10152323

Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is set.
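
As an illustration only, the shuffle described here can be modeled element by element. The sketch below assumes a zero is written when a control element's flush to zero field is set, and it represents each control element as a hypothetical `(flush, index)` pair rather than any real bit encoding.

```python
def shuffle_with_flush(data, controls):
    """Build a result with one element per control element.

    Each control is an assumed (flush, index) pair: if flush is true the result
    element is 0, otherwise it is data[index].
    """
    if len(data) != len(controls):
        raise ValueError("data and controls must both have L elements")
    return [0 if flush else data[index] for flush, index in controls]


shuffled = shuffle_with_flush(
    [10, 20, 30, 40],
    [(False, 3), (True, 0), (False, 0), (False, 3)],
)
```

Note that the same source element may be selected more than once, and a flushed position ignores its index entirely.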

Type: Grant

Filed: October 21, 2016

Date of Patent: December 11, 2018

Assignee: Intel Corporation

Inventors: Patrice L. Roussel, William W. Macy, Jr., Huy V. Nguyen, Eric L. Debes
Multiband eMBMS enhancement using carrier aggregation

Patent number: 10111049

Abstract: A method, an apparatus, and a computer program product for wireless communication are provided. The apparatus may be a UE. The UE receives at least one of a unicast or a broadcast/multicast communication on a first frequency from a first cell of a serving eNB through a first receive chain. In addition, the UE receives at least one of broadcast/multicast signal, synchronization signal, or reference signal communication on a second frequency from a second cell of the serving eNB through a second receive chain without having received instruction from the serving eNB to receive the at least one of the broadcast/multicast signal, the synchronization signal, or the reference signal communication.

Type: Grant

Filed: July 8, 2013

Date of Patent: October 23, 2018

Assignee: QUALCOMM Incorporated

Inventors: Jack Shyh-Hurng Shauh, Srinivasan Balasubramanian, Kuo-Chun Lee, Sivaramakrishna Veerepalli, Jun Wang
Multi-register scatter instruction

Patent number: 10055225

Abstract: A processor fetches a multi-register scatter instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes multiple source data elements. The destination operand identifies multiple destination data elements that each specify a destination vector register and an index into that destination vector register. The instruction is decoded and executed, causing, for each of those identified destination data elements, the one of the source data elements that is in a position in the source vector register that corresponds with a position of that destination data element to be stored in the destination vector register at the index specified by that destination data element.

Type: Grant

Filed: December 23, 2011

Date of Patent: August 21, 2018

Assignee: Intel Corporation

Inventor: Ashish Jha
Vector store/load instructions for array of structures

Patent number: 10019262

Abstract: A processor comprises a plurality of vector registers, and an execution unit, operatively coupled to the plurality of vector registers, the execution unit comprising a logic circuit implementing a load instruction for loading, into two or more vector registers, two or more data items associated with a data structure stored in a memory, wherein each one of the two or more vector registers is to store a data item associated with a certain position number within the data structure.
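
As an illustration only, loading an array of structures so that each register receives one field position amounts to de-interleaving memory. The sketch below models memory as a flat list and registers as Python lists; the function name and the `num_fields` parameter are hypothetical.

```python
def load_array_of_structures(memory, num_fields):
    """Split interleaved structures into one list ("register") per field position."""
    if num_fields <= 0 or len(memory) % num_fields != 0:
        raise ValueError("memory must hold a whole number of structures")
    # Register k collects the k-th field of every structure.
    return [memory[k::num_fields] for k in range(num_fields)]


# Two structures of three fields each: (x0, y0, z0), (x1, y1, z1).
registers = load_array_of_structures([1, 2, 3, 4, 5, 6], 3)
```

A matching store would interleave the registers back into memory in the same field order.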

Type: Grant

Filed: December 22, 2015

Date of Patent: July 10, 2018

Assignee: Intel Corporation

Inventors: Ashish Jha, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney, Milind B. Girkar
Video decoder memory bandwidth compression

Patent number: 9894371

Abstract: A method, system and computer program for decompressing video data, storing the compressed video data in such a way that random access is possible and the data can be mapped efficiently into existing memory systems and interface protocols. The compression is accomplished via lossless compression using an algorithm optimized for video data. Due to the compressed format, data transmission consumes less bandwidth than using uncompressed data and prevents degradation in the decoded video.

Type: Grant

Filed: October 19, 2016

Date of Patent: February 13, 2018

Assignee: Entropic Communications, LLC

Inventors: Matthew Damian Bates, Mark Alan Baur
Method for secure symbol comparison

Patent number: 9893880

Abstract: A method for secure comparison of encrypted symbols. According to one embodiment, a user may encrypt two symbols, share the encrypted symbols with an untrusted third party that can compute algorithms on these symbols without access to the original data or encryption keys such that the result of running the algorithm on the encrypted data can be decrypted to a result which is equivalent to the result of running the algorithm on the original unencrypted data. In one embodiment the untrusted third party may perform a sequence of operations on the encrypted symbols to produce an encrypted result which, when decrypted by a trusted party, indicates whether the two symbols are the same.

Type: Grant

Filed: November 15, 2013

Date of Patent: February 13, 2018

Assignee: RAYTHEON BBN TECHNOLOGIES CORP.

Inventors: Kurt Rohloff, David Bruce Cousins, Richard Schantz
Systems and methods for storing and transferring message data

Patent number: 9843551

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing and transferring messages. An example method includes providing a queue having an ordered plurality of storage blocks. Each storage block stores one or more respective messages and is associated with a respective time. The times increase from a block designating a head of the queue to a block designating a tail of the queue. The method also includes reading, by each of a plurality of first sender processes, messages from one or more blocks in the queue beginning at the head of the queue. The read messages are sent, by each of the plurality of first sender processes, to a respective recipient. One or more of the blocks are designated as old when they have associated times that are earlier than a first time. A block is designated as a new head of the queue when the block is associated with a time later than or equal to the first time.

Type: Grant

Filed: October 12, 2016

Date of Patent: December 12, 2017

Assignee: Machine Zone, Inc.

Inventor: Igor Milyakov
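The queue in the abstract of patent 9843551 can be sketched as a time-ordered list of blocks, with senders reading from the head and blocks older than a cutoff being retired. This is only an illustrative model. The names `Block`, `read_from_head`, and `retire_old_blocks` are hypothetical, and a single-threaded list stands in for whatever concurrent structure the patent uses.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    time: float                                   # time associated with the block
    messages: list = field(default_factory=list)  # messages stored in the block


def read_from_head(queue):
    """Yield messages in order, starting at the head block (index 0)."""
    for block in queue:
        yield from block.messages


def retire_old_blocks(queue, first_time):
    """Split the queue into (old blocks, remaining queue).

    Blocks with time < first_time are designated old. The first block with
    time >= first_time becomes the new head of the remaining queue.
    """
    split = 0
    while split < len(queue) and queue[split].time < first_time:
        split += 1
    return queue[:split], queue[split:]


queue = [Block(1.0, ["a", "b"]), Block(2.0, ["c"]), Block(3.0, ["d", "e"])]
sent = list(read_from_head(queue))
old, queue = retire_old_blocks(queue, first_time=2.0)
```

After the call, `old` holds the block with time 1.0 and the block with time 2.0 is the new head.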
Apparatus and method for propagating conditionally evaluated values in SIMD/vector execution using an input mask register

Patent number: 9798541

Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.

Type: Grant

Filed: December 23, 2011

Date of Patent: October 24, 2017

Assignee: INTEL CORPORATION

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
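The rule stated in the abstract of patent 9798541 can be traced in a few lines. A true bit yields its own position. A false bit after the first true bit yields the position of the last true bit plus the vector length. The sketch below is a scalar model under assumptions: the function name is hypothetical, and positions before the first true bit are left as `None` because the abstract does not say what they receive.

```python
def propagate_last_true(mask):
    """Scalar model of the propagation rule in the abstract.

    mask: list of booleans (or 0/1), one per bit position.
    """
    vlen = len(mask)
    out = [None] * vlen       # assumption: undefined before the first true bit
    last_true = None
    for pos, bit in enumerate(mask):
        if bit:
            last_true = pos
            out[pos] = pos                 # "first result": the bit position
        elif last_true is not None:
            out[pos] = last_true + vlen    # "second result": last true + length
    return out


result = propagate_last_true([0, 1, 0, 0, 1, 0])
# result == [None, 1, 7, 7, 4, 10]
```

Adding the vector length keeps propagated values distinguishable from positions that were themselves true.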
Super multiply add (super MADD) instructions with three scalar terms

Patent number: 9792115

Abstract: A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further includes a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector operand))+c.

Type: Grant

Filed: December 23, 2011

Date of Patent: October 17, 2017

Assignee: Intel Corporation

Inventors: Jesus Corbal, Andrew T. Forsyth, Thomas D. Fletcher, Lisa K. Wu, Eric Sprangle
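The operation named in the abstract of patent 9792115, (a*v1)+(b*v2)+c, is applied element by element. A minimal sketch follows. The function name and the use of a tuple for the packed scalars are assumptions for illustration.

```python
def super_madd(v1, v2, scalars):
    """Compute a*v1[i] + b*v2[i] + c for every element i.

    scalars: the packed (a, b, c) triple held in the third register.
    """
    a, b, c = scalars
    return [a * x + b * y + c for x, y in zip(v1, v2)]


result = super_madd([1, 2, 3], [4, 5, 6], (2, 3, 1))
# result == [15, 20, 25]
```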
Multi-register gather instruction

Patent number: 9766887

Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.

Type: Grant

Filed: December 23, 2011

Date of Patent: September 19, 2017

Assignee: Intel Corporation

Inventor: Ashish Jha
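The multi-register gather in the abstract of patent 9766887 pairs each source index in each source register with a destination index. The sketch below models this with lists. The function name and the triple layout of the source operand are hypothetical, chosen only to make the index pairing visible.

```python
def multi_register_gather(dest, sources):
    """Gather elements from several source registers into one destination.

    sources: list of (register, source_indexes, destination_indexes) triples.
    Element register[s] is stored at dest[d] for each paired (s, d).
    """
    result = list(dest)
    for register, src_indexes, dst_indexes in sources:
        for s, d in zip(src_indexes, dst_indexes):
            result[d] = register[s]
    return result


v1 = [10, 11, 12, 13]
v2 = [20, 21, 22, 23]
gathered = multi_register_gather(
    [0, 0, 0, 0],
    [(v1, [0, 2], [0, 1]), (v2, [1, 3], [2, 3])],
)
# gathered == [10, 12, 21, 23]
```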
Parallel processing in plural processors with result register each performing associative operation on respective column data

Patent number: 9760372

Abstract: A method for combining data values through associative operations. The method includes, with a processor, arranging any number of data values into a plurality of columns according to natural parallelism of the associative operations and reading each column to a register of an individual processor. The processors are directed to combine the data values in the columns in parallel using a first associative operation. The results of the first associative operation for each column are stored in a register of each processor.

Type: Grant

Filed: September 1, 2011

Date of Patent: September 12, 2017

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Ren Wu, Bin Zhang, Meichun Hsu, Qiming Chen
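The column-wise combination in the abstract of patent 9760372 can be sketched with a sequential stand-in for the parallel processors. Each column is reduced independently, which is safe because the operation is associative. The function name and the round-robin column assignment are assumptions. The final combination of the per-column results is also an assumption, since the abstract only covers the first operation.

```python
from functools import reduce
import operator


def column_reduce(values, num_columns, op):
    """Reduce each column with `op` and return one partial result per column.

    Values are dealt round-robin into columns. Each column's reduction is
    independent, so each could run on its own processor.
    Assumes every column receives at least one value.
    """
    columns = [values[c::num_columns] for c in range(num_columns)]
    return [reduce(op, column) for column in columns]


partials = column_reduce(list(range(1, 13)), num_columns=4, op=operator.add)
total = reduce(operator.add, partials)  # assumed follow-up combining step
# partials == [15, 18, 21, 24], total == 78
```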
Systems, apparatuses, and methods for performing conflict detection and broadcasting contents of a register to data element positions of another register

Patent number: 9665368

Abstract: Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.

Type: Grant

Filed: September 28, 2012

Date of Patent: May 30, 2017

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Robert Valentine
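The broadcast in the abstract of patent 9665368 copies the contents of a mask register into every element of a destination vector, zero-extending the mask to the element width first. A minimal sketch, with hypothetical parameter names and widths:

```python
def broadcast_mask(mask_value, mask_bits, element_bits, num_elements):
    """Zero-extend a mask register value and copy it to every element."""
    if mask_bits > element_bits:
        raise ValueError("mask does not fit in one destination element")
    # Keeping only the low mask_bits leaves the upper element bits at zero.
    zero_extended = mask_value & ((1 << mask_bits) - 1)
    return [zero_extended] * num_elements


dest = broadcast_mask(0b1011, mask_bits=8, element_bits=64, num_elements=8)
# every element of dest equals 0b1011
```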
Floating point execution unit for calculating packed sum of absolute differences

Patent number: 9594556

Abstract: A circuit arrangement and program product provide support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.

Type: Grant

Filed: March 18, 2016

Date of Patent: March 14, 2017

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
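A packed sum of absolute differences, the operation named in the abstract of patent 9594556, takes two words of packed fixed-point elements, subtracts them element by element, and sums the absolute values. The sketch below assumes 8-bit unsigned elements packed into an integer, least significant byte first. The patent's element width and packing may differ.

```python
def unpack_bytes(word, count=4):
    """Split an integer into `count` unsigned bytes, least significant first."""
    return [(word >> (8 * i)) & 0xFF for i in range(count)]


def packed_sad(word_a, word_b, count=4):
    """Sum of absolute differences over the packed bytes of two words."""
    pairs = zip(unpack_bytes(word_a, count), unpack_bytes(word_b, count))
    return sum(abs(x - y) for x, y in pairs)


sad = packed_sad(0x0A_C8_03_28, 0x0C_BE_03_2D)
# bytes differ by 5, 0, 10 and 2, so sad == 17
```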
Floating point execution unit for calculating packed sum of absolute differences

Patent number: 9594557

Abstract: A method provides support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.

Type: Grant

Filed: March 18, 2016

Date of Patent: March 14, 2017

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
Vector register file

Patent number: 9594724

Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.

Type: Grant

Filed: August 9, 2012

Date of Patent: March 14, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
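The read path in the abstract of patent 9594724 selects a register by decoding the register address, then selects a word inside that register using its read element counter. The following is a behavioral sketch only. The class and method names are hypothetical, and the choice to advance the counter after each read, wrapping at the end, is an assumption the abstract does not state.

```python
class VectorRegisterFile:
    """Behavioral model: one memory array and one read counter per register."""

    def __init__(self, num_registers, num_elements):
        self.num_elements = num_elements
        self.arrays = [[0] * num_elements for _ in range(num_registers)]
        self.read_counters = [0] * num_registers

    def write(self, reg_address, values):
        self.arrays[reg_address] = list(values)

    def read(self, reg_address):
        # "Decode" the register address by indexing the selected register.
        element_address = self.read_counters[reg_address]
        word = self.arrays[reg_address][element_address]
        # Assumption: the counter advances after each read and wraps around.
        self.read_counters[reg_address] = (element_address + 1) % self.num_elements
        return word


vrf = VectorRegisterFile(num_registers=4, num_elements=3)
vrf.write(1, [5, 6, 7])
words = [vrf.read(1) for _ in range(4)]
# words == [5, 6, 7, 5]
```

Because each register keeps its own counter, the read command needs only the register address.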
Vector register file

Patent number: 9582466

Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.

Type: Grant

Filed: August 13, 2012

Date of Patent: February 28, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
Apparatus for mutual-transposition of scalar and vector data sets and related method

Patent number: 9507601

Abstract: An apparatus for processing a plurality of data sets is disclosed, wherein one data set of the plurality of data sets includes N components and has a data type of one of a scalar type and a vector type, wherein N is a positive integer number. The apparatus includes a memory module and a data accessing module. The memory module comprises N memory units configured to store the plurality of data sets. The data accessing module is configured to write the data set into the memory module according to a write data index corresponding to the data set and one of a first writing mapping information and a second writing mapping information, wherein the first writing mapping information is employed when the data type is one of the scalar and the vector type and the second writing mapping information is employed when the data type is the other of the scalar and the vector type.

Type: Grant

Filed: February 19, 2014

Date of Patent: November 29, 2016

Assignee: MEDIATEK INC.

Inventors: Pei-Kuei Tsung, Mu-Fan Murphy Chang, Po-Chun Fan
Compiler optimization for many integrated core processors

Patent number: 9471289

Abstract: Systems and methods for source-to-source transformation for compiler optimization for many integrated core (MIC) coprocessors, including identifying data dependencies in candidate loops and data elements used in each iteration for arrays, profiling candidate loops to find a proper number m, wherein data transfer and computation for m iterations take an equal amount of time, and creating an outer loop outside the candidate loop, with each iteration of the outer loop executing m iterations of the candidate loop. Data streaming is performed by determining optimum buffer size for one or more arrays and inserting code before the outer loop to create optimum sized buffers, overlapping data transfer between central processing units (CPUs) and MICs with the computation; reusing buffers to reduce memory employed on the MICs, and reusing threads on MICs to repeatedly launch kernels on the MICs for asynchronous data transfer.

Type: Grant

Filed: March 25, 2015

Date of Patent: October 18, 2016

Assignee: NEC Corporation

Inventors: Min Feng, Srimat Chakradhar, Linhai Song
Moving average processing in processor and processor

Patent number: 9436465

Abstract: A processor, which executes m number of arithmetic operations in parallel, executes a partial sum instruction which takes an i-th to (i+m-1)-th elements of an input data series as input elements, so as to obtain first vector data, executes the partial sum instruction which takes a (i+x)-th to (i+x+m-1)-th elements of the input data series as the input elements, so as to obtain second vector data, and performs operations to subtract the p-th element of the first vector data and add the p-th element of the second vector data from and to a sum of the i-th to (i+x-1)-th elements of the input data series in parallel for each of the 0-th to (m-1)-th elements, so as to calculate sums of elements for m sections different from each other in parallel, and moving average processing to calculate a moving average from the sums of elements of the sections.

Type: Grant

Filed: April 7, 2014

Date of Patent: September 6, 2016

Assignee: FUJITSU LIMITED

Inventors: Makiko Ito, Manabu Kubota, Kazuhiro Nomoto
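The arithmetic in the abstract of patent 9436465 can be checked with a small model. It assumes the "partial sum instruction" produces inclusive prefix sums of its m input elements. That reading is an assumption, as are the function names. Under it, lane p yields the sum of the x elements starting at index i+p+1.

```python
from itertools import accumulate


def partial_sum(series, start, m):
    """Assumed partial sum instruction: inclusive prefix sums of m elements."""
    return list(accumulate(series[start:start + m]))


def moving_sums(series, i, x, m):
    """Sums of m windows of width x; lane p covers series[i+p+1 : i+p+1+x]."""
    base = sum(series[i:i + x])              # sum of the i-th to (i+x-1)-th
    first = partial_sum(series, i, m)        # elements leaving the window
    second = partial_sum(series, i + x, m)   # elements entering the window
    return [base - first[p] + second[p] for p in range(m)]


series = [1, 2, 3, 4, 5, 6, 7, 8]
sums = moving_sums(series, i=0, x=3, m=4)
averages = [s / 3 for s in sums]
# sums == [9, 12, 15, 18]
```

Each lane needs only one subtraction and one addition, so the m window sums can be formed in parallel.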
Techniques for enabling bit-parallel wide string matching with a SIMD register

Patent number: 9424031

Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.

Type: Grant

Filed: March 13, 2013

Date of Patent: August 23, 2016

Assignee: INTEL CORPORATION

Inventors: Hariharan Thantry, Mani Azimi
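The steps listed in the abstract of patent 9424031 amount to a one-bit shift across the whole register even though the hardware shifts each lane separately. The sketch below models those steps with integers. Lane 0 is taken as least significant, which is an assumption, and the function name is hypothetical.

```python
def wide_shift_left(lanes, lane_bits):
    """Shift a multi-lane register left by one bit, carrying across lanes."""
    lane_mask = (1 << lane_bits) - 1
    # Copy the MSB of each lane into a mask (bit k = MSB of lane k).
    msb_mask = 0
    for k, lane in enumerate(lanes):
        msb_mask |= ((lane >> (lane_bits - 1)) & 1) << k
    # Shift the mask as a scalar so each carry lines up with the next lane.
    carry_mask = (msb_mask << 1) & ((1 << len(lanes)) - 1)
    # Shift every lane independently; bits leaving a lane are dropped here.
    shifted = [(lane << 1) & lane_mask for lane in lanes]
    # Use the mask to fill the LSB of each lane of a second register.
    carries = [(carry_mask >> k) & 1 for k in range(len(lanes))]
    # OR the second register into the first.
    return [s | c for s, c in zip(shifted, carries)]


result = wide_shift_left([0xFF, 0x80, 0x7F, 0x01], lane_bits=8)
# result == [0xFE, 0x01, 0xFF, 0x02]
```

In bit-parallel matchers such as Shift-And, this is the shift that lets a pattern state longer than one lane live in a single vector register.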
Element selection unit and a method therein

Patent number: 9350584

Abstract: An element selection unit (200) and a method therein for vector element selection. The element selection unit comprises a selector control circuit (404) and a selector data path circuit (406), which data path circuit comprises a plurality of layers of multiplexers. The element selection unit further comprises a receiving circuit (401) configured to receive an instruction to perform a selection of elements from an input vector. The selector control circuit (404) is configured to generate a multiplexer control signal for each multiplexer based on a bit map and on a plurality of relative offset values. The data path circuit is configured to propagate the elements comprised in the input vector through the plurality of layers of multiplexers towards an output vector based on the generated multiplexer control signals. The data path circuit is further configured to write the propagated elements to enabled elements of the output vector.

Type: Grant

Filed: June 10, 2013

Date of Patent: May 24, 2016

Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)

Inventor: David Van Kampen
Apparatus and method for vectorization with speculation support

Patent number: 9268626

Abstract: An apparatus and method are described for detecting and responding to fault conditions in a processor. For example, one embodiment of a method comprises: reading each active element in succession from a first vector register, each active element specifying an address for a gather or load operation; detecting one or more fault conditions associated with one or more of the active elements; for each active element read in succession prior to a detected fault condition on an element other than the first active element, storing the data loaded from an address associated with the active element in a first output vector register; and for each active element associated with the detected fault condition and following the detected fault condition, setting a bit in an output mask register to indicate the detected fault condition.

Type: Grant

Filed: December 23, 2011

Date of Patent: February 23, 2016

Assignee: INTEL CORPORATION

Inventors: Jayashankar Bharadwaj, Victor W. Lee, Kim Daehyun, Nalini Vasudevan, Tin-Fook Ngai, Albert Hartono, Sara S. Baghsorkhi
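The fault handling in the abstract of patent 9268626 can be modeled as a gather that stops loading at the first faulting active element and reports it, together with every later active element, in an output mask. The sketch uses a dictionary as memory and treats a missing key as a fault. Both are assumptions, as is raising an exception when the first active element faults, a case the abstract leaves open.

```python
def speculative_gather(memory, addresses, active):
    """Return (loaded values, fault mask) for a speculative gather."""
    n = len(addresses)
    out = [None] * n
    fault_mask = [0] * n
    faulted = False
    seen_active = False
    for i in range(n):
        if not active[i]:
            continue                       # inactive elements are never read
        if not faulted and addresses[i] not in memory:
            if not seen_active:
                # Assumption: a fault on the first active element is not masked.
                raise MemoryError(f"fault at address {addresses[i]}")
            faulted = True
        if faulted:
            fault_mask[i] = 1              # faulting element and those after it
        else:
            out[i] = memory[addresses[i]]
        seen_active = True
    return out, fault_mask


memory = {0: "a", 4: "b", 8: "c"}
values, mask = speculative_gather(memory, [0, 4, 99, 8], [1, 1, 1, 1])
# values == ["a", "b", None, None], mask == [0, 0, 1, 1]
```

A caller can consume the loaded prefix and retry the masked elements later.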
