Patents by Inventor Wajdi Feghali

Wajdi Feghali has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190310848
    Abstract: An apparatus and method are described for performing efficient Boolean operations in a pipelined processor which, in one embodiment, does not natively support three operand instructions. For example, in one embodiment, a processor comprises: a set of registers for storing packed operands; Boolean operation logic to execute a single instruction which uses three or more source operands packed in the set of registers, the Boolean operation logic to read at least three source operands and an immediate value to perform a Boolean operation on the three source operands, wherein the Boolean operation comprises: combining a bit read from each of the three operands to form an index to the immediate value, the index identifying a bit position within the immediate value; reading the bit from the identified bit position of the immediate value; and storing the bit from the identified bit position of the immediate value in a destination register.
    Type: Application
    Filed: June 25, 2019
    Publication date: October 10, 2019
    Inventors: Vinodh Gopal, Wajdi Feghali, Gilbert Wolrich, Kirk Yap
  • Publication number: 20190236022
    Abstract: Methods and apparatus for ultra-secure accelerators. New ISA enqueue (ENQ) instructions with a wrapping key (WK) are provided to facilitate secure access to on-chip and off-chip accelerators in computer platforms and systems. The ISA ENQ with WK instructions include a dest operand having an address of an accelerator portal and a src operand having the address of a request descriptor in system memory defining a job to be performed by an accelerator and including a wrapped key. Execution of the instruction writes a record including the src and a WK to the portal, and the record is enqueued in an accelerator queue if a slot is available. The accelerator reads the enqueued request descriptor and uses the WK to unwrap the wrapped key, which is then used to decrypt encrypted data read from one or more buffers in memory. The accelerator then performs one or more functions on the decrypted data as defined by the job and writes the output of the processing back to memory with optional encryption.
    Type: Application
    Filed: April 7, 2019
    Publication date: August 1, 2019
    Inventors: Vinodh Gopal, Wajdi Feghali, Raghunandan Makaram
  • Patent number: 10270464
    Abstract: An apparatus and method for performing efficient lossless compression.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: April 23, 2019
    Assignee: Intel Corporation
    Inventors: James Guilford, Kirk Yap, Vinodh Gopal, Daniel Cutter, Wajdi Feghali
  • Publication number: 20190042481
    Abstract: Systems, methods, and circuitries are disclosed for a per-process memory encryption system. At least one translation lookaside buffer (TLB) is configured to encode key identifiers for keys in one or more bits of either the virtual memory address or the physical address. A process state memory is configured to store a first process key table for a first process that maps key identifiers to unique keys and a second process key table that maps the key identifiers to different unique keys. An active process key table memory is configured to store an active key table. In response to a request for data corresponding to a virtual memory address, the at least one TLB is configured to provide a key identifier for the data to the active process key table to cause the active process key table to return the unique key mapped to the key identifier.
    Type: Application
    Filed: September 28, 2018
    Publication date: February 7, 2019
    Inventors: Wajdi Feghali, Vinodh Gopal, Kirk Yap, Sean Gulley, Raghunandan Makaram
  • Publication number: 20190045031
    Abstract: Methods and apparatus for low-latency link compression schemes. Under the schemes, selected packets or messages are dynamically selected for compression in view of current transmit queue levels. The latency incurred during compression and decompression is not added to the data-path, but sits on the side of the transmit queue. The system monitors the queue depth and, accordingly, initiates compression jobs based on the depth. Different compression levels may be dynamically selected and used based on queue depth. Under various schemes, either packets or messages are enqueued in the transmit queue or pointers to such packets and messages are enqueued. Additionally, packets/messages may be compressed prior to being enqueued, or after being enqueued, wherein an original uncompressed packet is replaced with a compressed packet. Compressed and uncompressed packets may be stored in queues or buffers and transmitted using different numbers of transmit cycles based on their compression ratios.
    Type: Application
    Filed: June 21, 2018
    Publication date: February 7, 2019
    Inventors: Wajdi Feghali, Vinodh Gopal, Kirk Yap, Sean Gulley, Simon Peffers
  • Publication number: 20190042496
    Abstract: Apparatus, systems and methods for implementing delayed decompression schemes. As a burst of packets comprising compressed packets and uncompressed packets is received over an interconnect link, the packets are buffered in a receive buffer without decompression. Subsequently, the packets are forwarded from the receive buffer to a consumer such as a processor core, with the compressed packets being decompressed prior to reaching the processor core. Under a first delayed decompression approach, packets are decompressed when they are read from the receive buffer in conjunction with forwarding the uncompressed packet (or uncompressed data contained therein) to the consumer. Under a second delayed decompression scheme, the packets are read from the receive buffer and forwarded to a decompressor using a first datapath width matching the width of the packets, decompressed, and then forwarded to the consumer using a second datapath width matching the width of the uncompressed data.
    Type: Application
    Filed: September 24, 2018
    Publication date: February 7, 2019
    Inventors: Simon N. Peffers, Kirk S. Yap, Sean Gulley, Vinodh Gopal, Wajdi Feghali
  • Patent number: 9503256
    Abstract: Embodiments of an invention for SMS4 acceleration hardware are disclosed. In an embodiment, an apparatus includes SMS4 hardware and key transformation hardware. The SMS4 hardware is to execute a round of encryption and a round of key expansion. The key transformation hardware is to transform a key to provide for the SMS4 hardware to execute a round of decryption.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: November 22, 2016
    Assignee: Intel Corporation
    Inventors: Kirk Yap, Gilbert Wolrich, Sudhir Satpathy, Sean Gulley, Vinodh Gopal, Sanu Mathew, Wajdi Feghali
  • Publication number: 20160191238
    Abstract: Embodiments of an invention for SMS4 acceleration hardware are disclosed. In an embodiment, an apparatus includes SMS4 hardware and key transformation hardware. The SMS4 hardware is to execute a round of encryption and a round of key expansion. The key transformation hardware is to transform a key to provide for the SMS4 hardware to execute a round of decryption.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: Kirk Yap, Gilbert Wolrich, Sudhir Satpathy, Sean Gulley, Vinodh Gopal, Sanu Mathew, Wajdi Feghali
  • Patent number: 9047082
    Abstract: A method and apparatus to perform Cyclic Redundancy Check (CRC) operations on a data block using a plurality of different n-bit polynomials is provided. A flexible CRC instruction performs a CRC operation using a programmable n-bit polynomial. The n-bit polynomial is provided to the CRC instruction by storing the n-bit polynomial in one of two operands.
    Type: Grant
    Filed: April 16, 2014
    Date of Patent: June 2, 2015
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, Shay Gueron, Gilbert Wolrich, Wajdi Feghali, Kirk Yap, Bradley Burres
  • Publication number: 20140229807
    Abstract: A method and apparatus to perform Cyclic Redundancy Check (CRC) operations on a data block using a plurality of different n-bit polynomials is provided. A flexible CRC instruction performs a CRC operation using a programmable n-bit polynomial. The n-bit polynomial is provided to the CRC instruction by storing the n-bit polynomial in one of two operands.
    Type: Application
    Filed: April 16, 2014
    Publication date: August 14, 2014
    Inventors: Vinodh Gopal, Shay Gueron, Gilbert Wolrich, Wajdi Feghali, Kirk Yap, Bradley Burres
  • Patent number: 8732548
    Abstract: A method and apparatus to perform Cyclic Redundancy Check (CRC) operations on a data block using a plurality of different n-bit polynomials is provided. A flexible CRC instruction performs a CRC operation using a programmable n-bit polynomial. The n-bit polynomial is provided to the CRC instruction by storing the n-bit polynomial in one of two operands.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: May 20, 2014
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, Shay Gueron, Gilbert Wolrich, Wajdi Feghali, Kirk Yap, Bradley Burres
  • Publication number: 20140095845
    Abstract: An apparatus and method are described for performing efficient Boolean operations in a pipelined processor which, in one embodiment, does not natively support three operand instructions. For example, a processor according to one embodiment of the invention comprises: a set of registers for storing packed operands; Boolean operation logic to execute a single instruction which uses three or more source operands packed in the set of registers, the Boolean operation logic to read at least three source operands and an immediate value to perform a Boolean operation on the three source operands, wherein the Boolean operation comprises: combining a bit read from each of the three operands to form an index to the immediate value, the index identifying a bit position within the immediate value; reading the bit from the identified bit position of the immediate value; and storing the bit from the identified bit position of the immediate value in a destination register.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Vinodh Gopal, Wajdi Feghali, Gilbert Wolrich, Kirk Yap
  • Publication number: 20130191699
    Abstract: A method and apparatus to perform Cyclic Redundancy Check (CRC) operations on a data block using a plurality of different n-bit polynomials is provided. A flexible CRC instruction performs a CRC operation using a programmable n-bit polynomial. The n-bit polynomial is provided to the CRC instruction by storing the n-bit polynomial in one of two operands.
    Type: Application
    Filed: March 11, 2013
    Publication date: July 25, 2013
    Inventors: Vinodh Gopal, Shay Gueron, Gilbert Wolrich, Wajdi Feghali, Kirk Yap, Bradley Burres
  • Patent number: 8417943
    Abstract: A method and apparatus is described for processing of network data packets by a network processor having cipher processing cores and authentication processing cores which operate on data within the network data packets, in order to provide a one-pass ciphering and authentication processing of the network data packets.
    Type: Grant
    Filed: October 11, 2011
    Date of Patent: April 9, 2013
    Assignee: Intel Corporation
    Inventors: Jaroslaw J. Sydir, Kamal J. Koshy, Wajdi Feghali, Bradley A. Burres, Gilbert M. Wolrich
  • Patent number: 8363828
    Abstract: An embodiment includes at least one processing unit to perform at least first and second sets of diffusion-related operations to produce a resulting block from a data block, and that includes at least one stage and at least one other stage. The at least one stage is to select one of first operands and second operands input to the at least one other stage. The first and second operands are associated with the first and second sets of operations, respectively. The at least one other stage involves arithmetic and logical operations common to both the first and second sets of operations. At least one other processing unit is to perform at least one set of cryptographic-related operations (different, at least in part, from the first and second sets of operations) on at least one of (1) another block to produce the data block and (2) the resulting block.
    Type: Grant
    Filed: February 9, 2009
    Date of Patent: January 29, 2013
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, Kirk Yap, Gilbert Wolrich, Wajdi Feghali, Robert Ottavi, Sean Gulley
  • Patent number: 8312363
    Abstract: In one embodiment, circuitry is provided to generate a residue based at least in part upon operations and a data stream generated based at least in part upon a packet. The operations may include at least one iteration of at least one reduction operation including (a) multiplying a first value with at least one portion of the data stream, and (b) producing a reduction by adding at least one other portion of the data stream to a result of the multiplying. The operations may include at least one other reduction operation including (c) producing another result by multiplying with a second value at least one portion of another stream based at least in part upon the reduction, (d) producing a third value by adding at least one other portion of the another stream to the another result, and (e) producing the residue by performing a Barrett reduction based at least in part upon the third value.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: November 13, 2012
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, Erdinc Ozturk, Gilbert Wolrich, Wajdi Feghali
  • Patent number: 8189792
    Abstract: In one embodiment, the present invention includes a processor having logic to perform a round of a cryptographic algorithm responsive to first and second round micro-operations to perform the round on first and second pairs of columns, where the logic includes dual datapaths that are half the width of the cryptographic algorithm width (or smaller). Additional logic may be used to combine the results of the first and second round micro-operations to obtain a round result. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 28, 2007
    Date of Patent: May 29, 2012
    Assignee: Intel Corporation
    Inventors: Brent Boswell, Kirk Yap, Gilbert Wolrich, Wajdi Feghali, Vinodh Gopal, Srinivas Chennupaty, Makaram Raghunandan
  • Publication number: 20120079564
    Abstract: A method and apparatus is described for processing of network data packets by a network processor having cipher processing cores and authentication processing cores which operate on data within the network data packets, in order to provide a one-pass ciphering and authentication processing of the network data packets.
    Type: Application
    Filed: October 11, 2011
    Publication date: March 29, 2012
    Applicant: Intel Corporation
    Inventors: Jaroslaw J. Sydir, Kamal J. Koshy, Wajdi Feghali, Bradley A. Burres, Gilbert M. Wolrich
  • Publication number: 20120060159
    Abstract: A method and apparatus for scheduling the processing of commands by a plurality of cryptographic algorithm cores in a network processor.
    Type: Application
    Filed: November 10, 2011
    Publication date: March 8, 2012
    Applicant: Intel Corporation
    Inventors: Jaroslaw J. Sydir, Chen-Chi Kuo, Kamal J. Koshy, Wajdi Feghali, Bradley A. Burres, Gilbert M. Wolrich
  • Patent number: 8065678
    Abstract: A method and apparatus for scheduling the processing of commands by a plurality of cryptographic algorithm cores in a network processor.
    Type: Grant
    Filed: February 27, 2009
    Date of Patent: November 22, 2011
    Assignee: Intel Corporation
    Inventors: Jaroslaw J. Sydir, Chen-Chi Kuo, Kamal J. Koshy, Wajdi Feghali, Bradley A. Burres, Gilbert M. Wolrich
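
The three-operand Boolean operation described in the abstracts of publications 20190310848 and 20140095845 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `ternary_logic`, the `width` parameter, and the choice to make the first operand the most significant bit of the index are all hypothetical, since the abstracts do not specify how the three bits are ordered.

```python
def ternary_logic(a: int, b: int, c: int, imm8: int, width: int = 64) -> int:
    """Software model of a three-input Boolean operation driven by an
    8-bit immediate that acts as a truth table.

    Assumption (not stated in the abstracts): the bit from `a` is the
    most significant bit of the 3-bit index and the bit from `c` the least.
    """
    result = 0
    for i in range(width):
        # Combine one bit from each source operand into a 3-bit index.
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        # Read the bit at that position of the immediate value.
        bit = (imm8 >> index) & 1
        # Store the selected bit at the same position of the destination.
        result |= bit << i
    return result


# Truth tables for a few common functions under the assumed bit ordering.
XOR3 = 0x96    # a ^ b ^ c
AND3 = 0x80    # a & b & c
OR3 = 0xFE     # a | b | c
SELECT = 0xCA  # b where a is 1, c where a is 0
```

Under this ordering, any three-input Boolean function is chosen purely by the immediate, so `ternary_logic(x, y, z, XOR3)` yields `x ^ y ^ z` and `ternary_logic(m, x, y, SELECT)` yields a bitwise select, each in a single pass over the operands.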