Patents by Inventor Vinodh Gopal

Vinodh Gopal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11303438
    Abstract: Instructions and logic provide for a Single Instruction Multiple Data (SIMD) SM4 round slice operation. Embodiments of an instruction specify a first and a second source data operand set, and substitution function indicators, e.g. in an immediate operand. Embodiments of a processor may include encryption units, responsive to the first instruction, to: perform a slice of SM4-round exchanges on a portion of the first source data operand set with a corresponding keys from the second source data operand set in response to a substitution function indicator that indicates a first substitution function, perform a slice of SM4 key generations using another portion of the first source data operand set with corresponding constants from the second source data operand set in response to a substitution function indicator that indicates a second substitution function, and store a set of result elements of the first instruction in a SIMD destination register.
    Type: Grant
    Filed: July 14, 2020
    Date of Patent: April 12, 2022
    Assignee: Intel Corporation
    Inventors: Sean M. Gulley, Gilbert M. Wolrich, Vinodh Gopal, Kirk S. Yap, Wajdi K. Feghali
  • Publication number: 20220107806
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Application
    Filed: August 30, 2021
    Publication date: April 7, 2022
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
  • Publication number: 20220103345
    Abstract: Methods, apparatus, and software for hashing data. The methods and apparatus employ novel improvements to hash algorithms, such as a SHA-2 hash algorithm to reduce computations and increase performance. In one aspect, calculation of SHA-2 message scheduling and SHA compression operations are separated under which an SHA-2 message schedule is applied to multiple rounds of SHA compression operations over multiple chunks of data for the data item being hashed. In another aspect, the SHA-2 message schedule is implemented such that message schedules for multiple message words or data blocks are performed in parallel. The approaches may be employed to reduce hash calculations for various purposes, including generating Filecoin nodes.
    Type: Application
    Filed: December 9, 2021
    Publication date: March 31, 2022
    Inventors: Tomasz KANTECKI, Wei LI, Wajdi FEGHALI, James GUILFORD, Vinodh GOPAL
  • Publication number: 20220100526
    Abstract: Apparatus and method for performing low-latency multi-job submission via a single job descriptor is described herein. An apparatus embodiment includes a plurality of descriptor queues to stores job descriptors describing work to be performed and enqueue circuitry to receive a first job descriptor which includes a first field to store a Single Instruction Multiple Data (SIMD) width. If the SIMD width indicates that the first job descriptor is an SIMD job descriptor and open slots are available in the descriptor queues to store new job descriptors, then the enqueue circuitry is to generate a plurality of job descriptors based on fields of the first job descriptor and to store them in the open slots of the descriptor queues. The generated job descriptors are processed by processing pipelines to perform the work described. At least some of the generated job descriptors are processed concurrently or in parallel by different processing pipelines.
    Type: Application
    Filed: September 26, 2020
    Publication date: March 31, 2022
    Inventors: James Guilford, George Powley, Vinodh Gopal, Wajdi Feghali
  • Publication number: 20220100517
    Abstract: Disclosed embodiments relate to systems and methods to performing instructions structured to compute a plurality of cryptic rounds of the block cipher. In one example, a processor includes fetch and decode circuitry to fetch and decode a single instruction comprising a first field to identify a destination of a first operand, a second field to identify a source of a second operand comprising an input state, a third field to identify a source of a third operand comprising a round key. The processor includes execution circuitry to execute the decoded instruction to compute a plurality of cryptic rounds of the block cipher by performing a round function on data elements of the second operand and the third operand to generate a word.
    Type: Application
    Filed: September 26, 2020
    Publication date: March 31, 2022
    Inventors: Ilya Albrekht, Wajdi Feghali, Regev Shemy, Or Beit Aharon, Mrinmay Dutta, Vinodh Gopal, Vikram B. Suresh
  • Publication number: 20220075655
    Abstract: Methods, apparatus, and software for efficient accelerator offload in multi-accelerator frameworks. One multi-accelerator framework employs a compute platform including a plurality of processor cores and a plurality of accelerator devices. An application is executed on a first core and a portion of the application workload is offloaded to a first accelerator device. In connection with moving execution of the application to a second core, a second accelerator devices to be used for the offloaded workload is selected based on core-to-accelerator cost information for the second core. This core-to-accelerator cost information includes core-accelerator cost information for combinations of core-accelerator pairs, which are based, at least on part, on latencies projected for interconnect paths between cores and accelerators. Both single-socket and multi-socket platform are supported.
    Type: Application
    Filed: November 17, 2021
    Publication date: March 10, 2022
    Inventors: Akhilesh S. THYAGATURU, Mohit Kumar GARG, Vinodh GOPAL
  • Patent number: 11258459
    Abstract: Methods and apparatus to parallelize data decompression are disclosed. An example method selecting initial starting positions in a compressed data bitstream; adjusting a first one of the initial starting positions to determine a first adjusted starting position by decoding the bitstream starting at a training position in the bitstream, the decoding including traversing the bitstream from the training position as though first data located at the training position is a valid token; outputting first decoded data generated by decoding a first segment of the bitstream starting from the first adjusted starting position; and merging the first decoded data with second decoded data generated by decoding a second segment of the bitstream, the decoding of the second segment starting from a second position in the bitstream and being performed in parallel with the decoding of the first segment, and the second segment preceding the first segment in the bitstream.
    Type: Grant
    Filed: August 18, 2020
    Date of Patent: February 22, 2022
    Assignee: INTEL CORPORATION
    Inventors: Vinodh Gopal, James D. Guilford, Sudhir K. Satpathy, Sanu K. Mathew
  • Patent number: 11249761
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: July 20, 2020
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
  • Patent number: 11243836
    Abstract: A processing device comprising compression circuitry to: determine a compression configuration to compress source data; generate a checksum of the source data in an uncompressed state; compress the source data into at least one block based on the compression configuration, wherein the at least one block comprises: a plurality of sub-blocks, wherein the plurality of sub-block includes a predetermined size; a block header corresponding to the plurality of sub-blocks; and decompression circuitry coupled to the compression circuitry, wherein the decompression circuitry to: while not outputting a decompressed data stream of the source data: generate index information corresponding to the plurality of sub-blocks; in response to generating the index information, generate a checksum of the compressed source data associated with the plurality of sub-blocks; and determine whether the checksum of the source data in the uncompressed format matches the checksum of the compressed source data.
    Type: Grant
    Filed: June 22, 2020
    Date of Patent: February 8, 2022
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James Guilford, Daniel Cutter, Kirk Yap
  • Publication number: 20220027154
    Abstract: A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
    Type: Application
    Filed: October 7, 2021
    Publication date: January 27, 2022
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Matthew C. Merten, Tong Li, Bret T. Toll, I
  • Publication number: 20220012563
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for high throughput compression of neural network weights. An example apparatus includes at least one memory, instructions in the apparatus and processor circuitry to execute the instructions to determine sizes of data lanes in a partition of neural network weights, determine a slice size based on a size difference between a first data lane and a second data lane of the data lanes in the partition, the first data lane including first data, the second data lane including second data, the second data of a smaller size than the first data, cut a portion of the first data from the first data lane based on the slice size, and append the portion of the first data to the second data lane.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Alejandro Castro Gonzalez, Praveen Nair, Somnath Paul, Sudheendra Kadri, Palanivel Guruvareddiar, Aaron Gubrud, Vinodh Gopal
  • Patent number: 11200054
    Abstract: Apparatus and associated methods for implementing atomic instructions for copy-XOR of data. An atomic-copy-xor instruction is defined having a first operand comprising an address of a first cacheline and a second operand comprising an address of a second cacheline. The atomic-copy-xor instruction, which may be included in an instruction set architecture (ISA) of a processor, performs a bitwise XOR operation on copies of data retrieved from the first cacheline and second cacheline to generate an XOR result, and replaces the data in the first cacheline with a copy of data from the second cacheline when the XOR result is non-zero.
    Type: Grant
    Filed: June 26, 2018
    Date of Patent: December 14, 2021
    Assignee: Intel Corporation
    Inventor: Vinodh Gopal
  • Publication number: 20210377356
    Abstract: In one embodiment, a method includes: receiving, in an edge platform, a plurality of messages from a plurality of edge devices coupled to the edge platform, the plurality of messages comprising metadata including priority information and granularity information; extracting at least the priority information from the plurality of messages; storing the plurality of messages in entries of a pending request queue according to the priority information; selecting a first message stored in the pending request queue for delivery to a destination circuit; and sending a message header for the first message to the destination circuit via at least one interface circuit, the message header including the priority information, and thereafter sending a plurality of packets including payload information of the first message to the destination circuit via the at least one interface circuit. Other embodiments are described and claimed.
    Type: Application
    Filed: May 29, 2020
    Publication date: December 2, 2021
    Inventors: FRANCESC GUIM BERNAT, KSHITIJ ARUN DOSHI, KENNETH SHOEMAKER, VINODH GOPAL, NED M. SMITH
  • Patent number: 11188335
    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: November 30, 2021
    Assignee: Intel Corporation
    Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati
  • Publication number: 20210365264
    Abstract: A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
    Type: Application
    Filed: August 3, 2021
    Publication date: November 25, 2021
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Matthew C. Merten, Tong Li, Bret T. Toll, I
  • Publication number: 20210351790
    Abstract: A lossless data compressor of an aspect includes a first lossless data compressor circuitry coupled to receive input data. The first lossless data compressor circuitry is to apply a first lossless data compression approach to compress the input data to generate intermediate compressed data. The apparatus also includes a second lossless data compressor circuitry coupled with the first lossless data compressor circuitry to receive the intermediate compressed data. The second lossless data compressor circuitry is to apply a second lossless data compression approach to compress at least some of the intermediate compressed data to generate compressed data. The second lossless data compression approach different than the first lossless data compression approach. Lossless data decompressors are also disclosed, as are methods of lossless data compression and decompression.
    Type: Application
    Filed: May 11, 2020
    Publication date: November 11, 2021
    Inventors: James GUILFORD, Vinodh GOPAL, Dan CUTTER, Kirk YAP, Wajdi FEGHALI, George POWLEY
  • Publication number: 20210314359
    Abstract: Methods, apparatus, and software for efficient encryption in virtual private network (VPN) sessions. A VPN link and an auxiliary link (and associated sessions) are established between computing platforms to support end-to-end communication between respective application running on the platforms. The VPN link may employ a conventional VPN protocol such as TLS or IPsec, while the auxiliary link comprises a NULL encryption VPN tunnel. To transfer data, a determination is made to whether the data are encrypted or non-encrypted. Encrypted data are transferred over the auxiliary link to avoid re-encryption of the data. Non-encrypted are transferred over the VPN link. TLS and IPsec VPN agents may be used to assist in setting up the VPN and auxiliary sessions. The techniques avoid double encryption of VPN traffic, while ensuring that various types of traffic transferred between platforms is encrypted.
    Type: Application
    Filed: June 16, 2021
    Publication date: October 7, 2021
    Inventors: Akhilesh S. THYAGATURU, Vinodh GOPAL
  • Patent number: 11126663
    Abstract: In one embodiment, an apparatus comprises a decompression engine to determine a plurality of tokens used to encode a block of data; populate a lookup table with at least two of the tokens in order of increasing token length; disable a first portion of the lookup table and enable a second portion of the lookup table based on a value of a payload of the block of data; and search for a match between a token and the payload in the second portion of the lookup table.
    Type: Grant
    Filed: May 25, 2017
    Date of Patent: September 21, 2021
    Assignee: Intel Corporation
    Inventors: Sudhir K. Satpathy, Vikram B. Suresh, Sanu K. Mathew, Vinodh Gopal
  • Patent number: 11108406
    Abstract: In one embodiment, an apparatus includes: a compression circuit to compress data blocks of one or more traffic classes; and a control circuit coupled to the compression circuit, where the control circuit is to enable the compression circuit to concurrently compress data blocks of a first traffic class and not to compress data blocks of a second traffic class. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 19, 2019
    Date of Patent: August 31, 2021
    Assignee: Intel Corporation
    Inventors: Simon N. Peffers, Vinodh Gopal, Kirk Yap
  • Patent number: 11106461
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: August 31, 2021
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy