Patents by Inventor Vinodh Gopal

Vinodh Gopal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Instructions and logic to provide SIMD SM4 cryptographic block cipher functionality

Patent number: 11303438

Abstract: Instructions and logic provide for a Single Instruction Multiple Data (SIMD) SM4 round slice operation. Embodiments of an instruction specify a first and a second source data operand set, and substitution function indicators, e.g. in an immediate operand. Embodiments of a processor may include encryption units, responsive to the first instruction, to: perform a slice of SM4-round exchanges on a portion of the first source data operand set with corresponding keys from the second source data operand set in response to a substitution function indicator that indicates a first substitution function, perform a slice of SM4 key generations using another portion of the first source data operand set with corresponding constants from the second source data operand set in response to a substitution function indicator that indicates a second substitution function, and store a set of result elements of the first instruction in a SIMD destination register.

Type: Grant

Filed: July 14, 2020

Date of Patent: April 12, 2022

Assignee: Intel Corporation

Inventors: Sean M. Gulley, Gilbert M. Wolrich, Vinodh Gopal, Kirk S. Yap, Wajdi K. Feghali

ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION EITHER WITHOUT WRITING OR READING FLAGS

Publication number: 20220107806

Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.

Type: Application

Filed: August 30, 2021

Publication date: April 7, 2022

Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
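
The data result described in the abstract of publication 20220107806 (a source operand rotated by a rotate amount, with no flag read or written) can be modeled as a pure function. The following is a minimal sketch, not the claimed instruction: the function name `rotate_right`, the right-rotate direction, and the 32-bit default width are assumptions made for illustration.

```python
def rotate_right(value: int, amount: int, width: int = 32) -> int:
    """Rotate `value` right by `amount` bits within a `width`-bit word.

    The function takes no flag input and produces no flag output, which
    mirrors the "completes without reading a carry flag" behavior in the
    abstract. Name, direction, and default width are illustrative.
    """
    mask = (1 << width) - 1
    amount %= width              # rotating by a multiple of the width is a no-op
    value &= mask                # ignore any bits above the word width
    return ((value >> amount) | (value << (width - amount))) & mask
```

Because the result depends only on the two inputs, consecutive rotates in this model have no ordering dependency through shared flag state.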

METHODS AND APPARATUS TO HASH DATA

Publication number: 20220103345

Abstract: Methods, apparatus, and software for hashing data. The methods and apparatus employ novel improvements to hash algorithms, such as a SHA-2 hash algorithm, to reduce computations and increase performance. In one aspect, calculation of SHA-2 message scheduling and SHA compression operations are separated under which a SHA-2 message schedule is applied to multiple rounds of SHA compression operations over multiple chunks of data for the data item being hashed. In another aspect, the SHA-2 message schedule is implemented such that message schedules for multiple message words or data blocks are performed in parallel. The approaches may be employed to reduce hash calculations for various purposes, including generating Filecoin nodes.

Type: Application

Filed: December 9, 2021

Publication date: March 31, 2022

Inventors: Tomasz KANTECKI, Wei LI, Wajdi FEGHALI, James GUILFORD, Vinodh GOPAL

APPARATUS AND METHOD FOR LOW-LATENCY DECOMPRESSION ACCELERATION VIA A SINGLE JOB DESCRIPTOR

Publication number: 20220100526

Abstract: Apparatus and method for performing low-latency multi-job submission via a single job descriptor is described herein. An apparatus embodiment includes a plurality of descriptor queues to store job descriptors describing work to be performed and enqueue circuitry to receive a first job descriptor which includes a first field to store a Single Instruction Multiple Data (SIMD) width. If the SIMD width indicates that the first job descriptor is an SIMD job descriptor and open slots are available in the descriptor queues to store new job descriptors, then the enqueue circuitry is to generate a plurality of job descriptors based on fields of the first job descriptor and to store them in the open slots of the descriptor queues. The generated job descriptors are processed by processing pipelines to perform the work described. At least some of the generated job descriptors are processed concurrently or in parallel by different processing pipelines.

Type: Application

Filed: September 26, 2020

Publication date: March 31, 2022

Inventors: James Guilford, George Powley, Vinodh Gopal, Wajdi Feghali

SM4 NEW INSTRUCTIONS

Publication number: 20220100517

Abstract: Disclosed embodiments relate to systems and methods for performing instructions structured to compute a plurality of cryptic rounds of the block cipher. In one example, a processor includes fetch and decode circuitry to fetch and decode a single instruction comprising a first field to identify a destination of a first operand, a second field to identify a source of a second operand comprising an input state, a third field to identify a source of a third operand comprising a round key. The processor includes execution circuitry to execute the decoded instruction to compute a plurality of cryptic rounds of the block cipher by performing a round function on data elements of the second operand and the third operand to generate a word.

Type: Application

Filed: September 26, 2020

Publication date: March 31, 2022

Inventors: Ilya Albrekht, Wajdi Feghali, Regev Shemy, Or Beit Aharon, Mrinmay Dutta, Vinodh Gopal, Vikram B. Suresh

EFFICIENT ACCELERATOR OFFLOAD IN MULTI-ACCELERATOR FRAMEWORK

Publication number: 20220075655

Abstract: Methods, apparatus, and software for efficient accelerator offload in multi-accelerator frameworks. One multi-accelerator framework employs a compute platform including a plurality of processor cores and a plurality of accelerator devices. An application is executed on a first core and a portion of the application workload is offloaded to a first accelerator device. In connection with moving execution of the application to a second core, a second accelerator device to be used for the offloaded workload is selected based on core-to-accelerator cost information for the second core. This core-to-accelerator cost information includes core-accelerator cost information for combinations of core-accelerator pairs, which are based, at least in part, on latencies projected for interconnect paths between cores and accelerators. Both single-socket and multi-socket platforms are supported.

Type: Application

Filed: November 17, 2021

Publication date: March 10, 2022

Inventors: Akhilesh S. THYAGATURU, Mohit Kumar GARG, Vinodh GOPAL
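
The selection step in the abstract of publication 20220075655 (choosing an accelerator for a core from per-pair cost information) can be sketched as a lookup over a cost table. This is a minimal illustration under assumptions: the function name `select_accelerator`, the dictionary layout keyed by `(core, accelerator)`, and the "lowest cost wins" rule are hypothetical and are not taken from the publication.

```python
def select_accelerator(core, cost_table):
    """Return the accelerator with the lowest cost for `core`.

    `cost_table` maps (core, accelerator) pairs to a projected cost, for
    example an interconnect latency estimate. The table layout and the
    minimum-cost rule are assumptions made for this sketch.
    """
    candidates = {
        accelerator: cost
        for (table_core, accelerator), cost in cost_table.items()
        if table_core == core
    }
    if not candidates:
        raise LookupError(f"no cost information for core {core!r}")
    # Pick the accelerator whose projected cost from this core is smallest.
    return min(candidates, key=candidates.get)


# Hypothetical two-socket layout: each core is closer to one accelerator.
costs = {
    (0, "acc0"): 10, (0, "acc1"): 40,
    (1, "acc0"): 45, (1, "acc1"): 12,
}
```

With this table, an application that moves from core 0 to core 1 would switch its offload target from `acc0` to `acc1`.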

Methods and apparatus to parallelize data decompression

Patent number: 11258459

Abstract: Methods and apparatus to parallelize data decompression are disclosed. An example method includes selecting initial starting positions in a compressed data bitstream; adjusting a first one of the initial starting positions to determine a first adjusted starting position by decoding the bitstream starting at a training position in the bitstream, the decoding including traversing the bitstream from the training position as though first data located at the training position is a valid token; outputting first decoded data generated by decoding a first segment of the bitstream starting from the first adjusted starting position; and merging the first decoded data with second decoded data generated by decoding a second segment of the bitstream, the decoding of the second segment starting from a second position in the bitstream and being performed in parallel with the decoding of the first segment, and the second segment preceding the first segment in the bitstream.

Type: Grant

Filed: August 18, 2020

Date of Patent: February 22, 2022

Assignee: INTEL CORPORATION

Inventors: Vinodh Gopal, James D. Guilford, Sudhir K. Satpathy, Sanu K. Mathew

Systems and methods for performing matrix compress and decompress instructions

Patent number: 11249761

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

Type: Grant

Filed: July 20, 2020

Date of Patent: February 15, 2022

Assignee: Intel Corporation

Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
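
The first compress option in the abstract of patent 11249761 (pack the non-zero elements together and record each element's matrix position in a header) can be sketched in software. This is a minimal illustration only: the function names, the use of flat row-major positions, and the list-based header are assumptions, and the sketch does not cover the second option (representing elements with fewer bits).

```python
def compress_matrix(matrix):
    """Pack non-zero elements and record their positions in a header.

    Positions are flat row-major indices; that encoding is an assumption
    made for this sketch, not a format taken from the patent.
    """
    rows, cols = len(matrix), len(matrix[0])
    header = []   # position of each non-zero element
    values = []   # the non-zero elements, packed together in scan order
    for r, row in enumerate(matrix):
        for c, element in enumerate(row):
            if element != 0:
                header.append(r * cols + c)
                values.append(element)
    return (rows, cols), header, values


def decompress_matrix(shape, header, values):
    """Rebuild the dense matrix from the header and packed values."""
    rows, cols = shape
    matrix = [[0] * cols for _ in range(rows)]
    for position, element in zip(header, values):
        matrix[position // cols][position % cols] = element
    return matrix
```

A round trip through `compress_matrix` and `decompress_matrix` returns the original dense matrix, and the packed form shrinks as the matrix gets sparser.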

Supporting random access of compressed data

Patent number: 11243836

Abstract: A processing device comprising compression circuitry to: determine a compression configuration to compress source data; generate a checksum of the source data in an uncompressed state; compress the source data into at least one block based on the compression configuration, wherein the at least one block comprises: a plurality of sub-blocks, wherein the plurality of sub-blocks includes a predetermined size; a block header corresponding to the plurality of sub-blocks; and decompression circuitry coupled to the compression circuitry, wherein the decompression circuitry is to: while not outputting a decompressed data stream of the source data: generate index information corresponding to the plurality of sub-blocks; in response to generating the index information, generate a checksum of the compressed source data associated with the plurality of sub-blocks; and determine whether the checksum of the source data in the uncompressed format matches the checksum of the compressed source data.

Type: Grant

Filed: June 22, 2020

Date of Patent: February 8, 2022

Assignee: Intel Corporation

Inventors: Vinodh Gopal, James Guilford, Daniel Cutter, Kirk Yap

ADDITION INSTRUCTIONS WITH INDEPENDENT CARRY CHAINS

Publication number: 20220027154

Abstract: A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.

Type: Application

Filed: October 7, 2021

Publication date: January 27, 2022

Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Matthew C. Merten, Tong Li, Bret T. Toll, I
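
The idea in the abstract of publication 20220027154 (two addition instructions that each write their carry to a different flag, leaving the other flag untouched) can be modeled with two independent carry bits. The sketch below is illustrative only: the names `Flags`, `add_first`, `add_second`, and `add_two_pairs` are hypothetical, the 64-bit limb width is an assumption, and the sketch additionally assumes each instruction uses its own flag as carry-in, which the abstract does not state.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # assumed 64-bit limbs


@dataclass
class Flags:
    first: int = 0   # carry bit written only by add_first
    second: int = 0  # carry bit written only by add_second


def add_first(a, b, flags):
    """Add with carry through flags.first; flags.second is not modified."""
    total = a + b + flags.first
    flags.first = total >> 64
    return total & MASK64


def add_second(a, b, flags):
    """Add with carry through flags.second; flags.first is not modified."""
    total = a + b + flags.second
    flags.second = total >> 64
    return total & MASK64


def add_two_pairs(x, y, u, v):
    """Compute x+y and u+v limb by limb, interleaving the two carry chains."""
    flags = Flags()
    sum_xy, sum_uv = [], []
    for xi, yi, ui, vi in zip(x, y, u, v):   # limbs are least significant first
        sum_xy.append(add_first(xi, yi, flags))
        sum_uv.append(add_second(ui, vi, flags))
    return sum_xy, flags.first, sum_uv, flags.second
```

Because neither addition disturbs the other's carry bit, the two chains can be interleaved freely, which is the property multi-precision arithmetic code would rely on.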

METHODS AND APPARATUS FOR HIGH THROUGHPUT COMPRESSION OF NEURAL NETWORK WEIGHTS

Publication number: 20220012563

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for high throughput compression of neural network weights. An example apparatus includes at least one memory, instructions in the apparatus, and processor circuitry to execute the instructions to determine sizes of data lanes in a partition of neural network weights, determine a slice size based on a size difference between a first data lane and a second data lane of the data lanes in the partition, the first data lane including first data, the second data lane including second data, the second data of a smaller size than the first data, cut a portion of the first data from the first data lane based on the slice size, and append the portion of the first data to the second data lane.

Type: Application

Filed: September 24, 2021

Publication date: January 13, 2022

Inventors: Alejandro Castro Gonzalez, Praveen Nair, Somnath Paul, Sudheendra Kadri, Palanivel Guruvareddiar, Aaron Gubrud, Vinodh Gopal
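
The lane-balancing steps in the abstract of publication 20220012563 (derive a slice size from the size difference between two lanes, cut that much from the larger lane, append it to the smaller lane) can be sketched as follows. This is an illustration under assumptions: the function name `balance_lanes`, the choice of half the difference as the slice size, and cutting from the tail of the larger lane are hypothetical and are not taken from the publication.

```python
def balance_lanes(first_lane: bytes, second_lane: bytes):
    """Move a slice from the larger first lane to the smaller second lane.

    Returns (new_first_lane, new_second_lane, slice_size). The slice size
    is assumed here to be half the size difference, rounded down.
    """
    difference = len(first_lane) - len(second_lane)
    if difference <= 0:
        # The first lane is not larger, so nothing is moved.
        return first_lane, second_lane, 0
    slice_size = difference // 2
    split = len(first_lane) - slice_size
    kept, cut = first_lane[:split], first_lane[split:]
    return kept, second_lane + cut, slice_size
```

For an 8-byte lane and a 2-byte lane, this sketch moves 3 bytes and leaves both lanes at 5 bytes, so lanes processed in lockstep finish at about the same time.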

Atomic-copy-XOR instruction for replacing data in a first cacheline with data from a second cacheline

Patent number: 11200054

Abstract: Apparatus and associated methods for implementing atomic instructions for copy-XOR of data. An atomic-copy-xor instruction is defined having a first operand comprising an address of a first cacheline and a second operand comprising an address of a second cacheline. The atomic-copy-xor instruction, which may be included in an instruction set architecture (ISA) of a processor, performs a bitwise XOR operation on copies of data retrieved from the first cacheline and second cacheline to generate an XOR result, and replaces the data in the first cacheline with a copy of data from the second cacheline when the XOR result is non-zero.

Type: Grant

Filed: June 26, 2018

Date of Patent: December 14, 2021

Assignee: Intel Corporation

Inventor: Vinodh Gopal
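
The data semantics in the abstract of patent 11200054 (XOR the two cachelines, then replace the first with a copy of the second when the XOR result is non-zero) can be modeled in a few lines. This sketch models only the data movement: it is not atomic, and the function name `copy_xor`, the boolean return value, and the 64-byte line size used in the example are assumptions.

```python
def copy_xor(first: bytearray, second: bytes) -> bool:
    """Replace `first` with `second` when their bitwise XOR is non-zero.

    Returns True if `first` was replaced. A real instruction would do
    this atomically on cachelines; this software model does not.
    """
    if len(first) != len(second):
        raise ValueError("both lines must have the same length")
    xor_result = bytes(a ^ b for a, b in zip(first, second))
    if any(xor_result):          # non-zero XOR means the lines differ
        first[:] = second        # copy the second line over the first
        return True
    return False


# Hypothetical 64-byte lines.
line_a = bytearray(b"\x01" * 64)
line_b = b"\x02" * 64
```

In this model a second call with identical lines leaves the first line untouched, since the XOR result is all zeros.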

System, Apparatus And Method For Adaptive Peer-To-Peer Communication With Edge Platform

Publication number: 20210377356

Abstract: In one embodiment, a method includes: receiving, in an edge platform, a plurality of messages from a plurality of edge devices coupled to the edge platform, the plurality of messages comprising metadata including priority information and granularity information; extracting at least the priority information from the plurality of messages; storing the plurality of messages in entries of a pending request queue according to the priority information; selecting a first message stored in the pending request queue for delivery to a destination circuit; and sending a message header for the first message to the destination circuit via at least one interface circuit, the message header including the priority information, and thereafter sending a plurality of packets including payload information of the first message to the destination circuit via the at least one interface circuit. Other embodiments are described and claimed.

Type: Application

Filed: May 29, 2020

Publication date: December 2, 2021

Inventors: FRANCESC GUIM BERNAT, KSHITIJ ARUN DOSHI, KENNETH SHOEMAKER, VINODH GOPAL, NED M. SMITH

Apparatuses, methods, and systems for hashing instructions

Patent number: 11188335

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.

Type: Grant

Filed: November 2, 2020

Date of Patent: November 30, 2021

Assignee: Intel Corporation

Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati

ADDITION INSTRUCTIONS WITH INDEPENDENT CARRY CHAINS

Publication number: 20210365264

Abstract: A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.

Type: Application

Filed: August 3, 2021

Publication date: November 25, 2021

Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Matthew C. Merten, Tong Li, Bret T. Toll, I

APPARATUS AND METHOD FOR TWO-STAGE LOSSLESS DATA COMPRESSION, AND TWO-STAGE LOSSLESS DATA DECOMPRESSION

Publication number: 20210351790

Abstract: A lossless data compressor of an aspect includes a first lossless data compressor circuitry coupled to receive input data. The first lossless data compressor circuitry is to apply a first lossless data compression approach to compress the input data to generate intermediate compressed data. The apparatus also includes a second lossless data compressor circuitry coupled with the first lossless data compressor circuitry to receive the intermediate compressed data. The second lossless data compressor circuitry is to apply a second lossless data compression approach to compress at least some of the intermediate compressed data to generate compressed data. The second lossless data compression approach is different than the first lossless data compression approach. Lossless data decompressors are also disclosed, as are methods of lossless data compression and decompression.

Type: Application

Filed: May 11, 2020

Publication date: November 11, 2021

Inventors: James GUILFORD, Vinodh GOPAL, Dan CUTTER, Kirk YAP, Wajdi FEGHALI, George POWLEY

EFFICIENT ENCRYPTION IN VPN SESSIONS

Publication number: 20210314359

Abstract: Methods, apparatus, and software for efficient encryption in virtual private network (VPN) sessions. A VPN link and an auxiliary link (and associated sessions) are established between computing platforms to support end-to-end communication between respective applications running on the platforms. The VPN link may employ a conventional VPN protocol such as TLS or IPsec, while the auxiliary link comprises a NULL encryption VPN tunnel. To transfer data, a determination is made as to whether the data are encrypted or non-encrypted. Encrypted data are transferred over the auxiliary link to avoid re-encryption of the data. Non-encrypted data are transferred over the VPN link. TLS and IPsec VPN agents may be used to assist in setting up the VPN and auxiliary sessions. The techniques avoid double encryption of VPN traffic, while ensuring that various types of traffic transferred between platforms is encrypted.

Type: Application

Filed: June 16, 2021

Publication date: October 7, 2021

Inventors: Akhilesh S. THYAGATURU, Vinodh GOPAL

Method and apparatus for energy efficient decompression using ordered tokens

Patent number: 11126663

Abstract: In one embodiment, an apparatus comprises a decompression engine to determine a plurality of tokens used to encode a block of data; populate a lookup table with at least two of the tokens in order of increasing token length; disable a first portion of the lookup table and enable a second portion of the lookup table based on a value of a payload of the block of data; and search for a match between a token and the payload in the second portion of the lookup table.

Type: Grant

Filed: May 25, 2017

Date of Patent: September 21, 2021

Assignee: Intel Corporation

Inventors: Sudhir K. Satpathy, Vikram B. Suresh, Sanu K. Mathew, Vinodh Gopal

System, apparatus and method for dynamic priority-aware compression for interconnect fabrics

Patent number: 11108406

Abstract: In one embodiment, an apparatus includes: a compression circuit to compress data blocks of one or more traffic classes; and a control circuit coupled to the compression circuit, where the control circuit is to enable the compression circuit to concurrently compress data blocks of a first traffic class and not to compress data blocks of a second traffic class. Other embodiments are described and claimed.

Type: Grant

Filed: June 19, 2019

Date of Patent: August 31, 2021

Assignee: Intel Corporation

Inventors: Simon N. Peffers, Vinodh Gopal, Kirk Yap

Rotate instructions that complete execution either without writing or reading flags

Patent number: 11106461

Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.

Type: Grant

Filed: March 29, 2018

Date of Patent: August 31, 2021

Assignee: Intel Corporation

Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
