Patents by Inventor Ephrem Wu

Ephrem Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12273668
    Abstract: A non-blocking crossbar switch architecture circumvents the problem present in prior art crossbar switches where input signals may oversubscribe the available inter-die bandwidth. The new non-blocking crossbar switch architecture is split across a plurality of semiconductor dice, including a plurality of interleaved crossbar switch segments. Only one crossbar switch segment is implemented on each semiconductor die. A plurality of input ports and output ports are coupled to the crossbar switch. The crossbar switch is non-blocking, i.e., any one output port not currently receiving data may receive data from any one input port.
    Type: Grant
    Filed: December 14, 2022
    Date of Patent: April 8, 2025
    Assignee: XILINX, INC.
    Inventor: Ephrem Wu
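    Illustrative sketch (not from the patent): a minimal Python model of the non-blocking property described in the abstract above, assuming a hypothetical port-to-die interleaving of out_port % num_dice; class and method names are invented for illustration.

```python
# Minimal behavioral sketch of a crossbar whose interleaved segments sit on
# separate dice. The port-to-die mapping below is an assumption, not the
# mapping claimed in the patent.
class InterleavedCrossbar:
    def __init__(self, num_ports: int, num_dice: int):
        self.num_dice = num_dice
        # route[out_port] = in_port currently driving it, or None if idle
        self.route = [None] * num_ports

    def die_of(self, out_port: int) -> int:
        # Each die implements exactly one interleaved crossbar segment.
        return out_port % self.num_dice

    def connect(self, in_port: int, out_port: int) -> bool:
        # Non-blocking property: any idle output may accept data from any input.
        if self.route[out_port] is not None:
            return False  # output already receiving data
        self.route[out_port] = in_port
        return True

xbar = InterleavedCrossbar(num_ports=8, num_dice=2)
assert xbar.connect(3, 5) and xbar.die_of(5) == 1
```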
  • Patent number: 12159212
    Abstract: A digital processing engine is configured to receive input data from a memory. The input data comprises first input channels. The digital processing engine is further configured to convolve, with a convolution model, the input data. The convolution model comprises a first filter layer configured to generate first intermediate data having first output channels. A number of the first output channels is less than a number of the first input channels. The convolution model further comprises a second filter layer comprising shared spatial filters and is configured to generate second intermediate data by convolving each of the first output channels with a respective one of the shared spatial filters. Each of the shared spatial filters comprises first weights. The digital processing engine is further configured to generate output data from the second intermediate data and store the output data in the memory.
    Type: Grant
    Filed: February 22, 2021
    Date of Patent: December 3, 2024
    Assignee: XILINX, INC.
    Inventor: Ephrem Wu
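    Illustrative sketch (not from the patent): a rough NumPy version of the two-stage convolution in the abstract above, assuming a 1x1 channel-reducing first layer and one 3x3 spatial filter per intermediate channel; all shapes are invented for illustration.

```python
import numpy as np

# Shapes are assumptions; C_mid < C_in mirrors the abstract's "number of first
# output channels is less than the number of first input channels".
H, W, C_in, C_mid = 8, 8, 16, 4
x = np.random.rand(H, W, C_in)            # input data with C_in input channels

# First filter layer: 1x1 convolution mapping C_in channels to C_mid channels.
w1 = np.random.rand(C_in, C_mid)
mid = (x.reshape(-1, C_in) @ w1).reshape(H, W, C_mid)   # first intermediate data

# Second filter layer: one shared 3x3 spatial filter per intermediate channel.
k = np.random.rand(C_mid, 3, 3)
out = np.zeros((H - 2, W - 2, C_mid))     # "valid" convolution, no padding
for c in range(C_mid):
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j, c] = np.sum(mid[i:i+3, j:j+3, c] * k[c])

print(out.shape)                          # second intermediate data
```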
  • Patent number: 12147379
    Abstract: Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or its mean to be determined. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs, which then re-process their respective data chunks using the intermediate value to generate final results.
    Type: Grant
    Filed: December 28, 2022
    Date of Patent: November 19, 2024
    Assignee: XILINX, INC.
    Inventors: Rajeev Patwari, Jorn Tuyls, Elliott Delaye, Xiao Teng, Ephrem Wu
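    Illustrative sketch (not from the patent): a toy two-pass flow matching the abstract above, assuming the intermediate value is a row mean and the re-processing step is mean-centering; tokens are modeled as plain tuples.

```python
import numpy as np

row = np.arange(12, dtype=float)
chunks = np.array_split(row, 4)            # one chunk per processing element (PE)

# Pass 1: each PE reports a partial (sum, count) together with a "done" token.
tokens = [("done", chunk.sum(), chunk.size) for chunk in chunks]

# Controller: once every token has arrived, gather partials into an intermediate value.
assert all(t[0] == "done" for t in tokens)
mean = sum(s for _, s, _ in tokens) / sum(n for _, _, n in tokens)

# Pass 2: the controller distributes the intermediate value and the PEs re-process
# their chunks with it (mean-centering is only an example of re-processing).
final = np.concatenate([chunk - mean for chunk in chunks])
print(mean, final)
```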
  • Publication number: 20240220444
    Abstract: Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or its mean to be determined. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs, which then re-process their respective data chunks using the intermediate value to generate final results.
    Type: Application
    Filed: December 28, 2022
    Publication date: July 4, 2024
    Inventors: Rajeev PATWARI, Jorn TUYLS, Elliott DELAYE, Xiao TENG, Ephrem WU
  • Publication number: 20240205570
    Abstract: A non-blocking crossbar switch architecture is disclosed that circumvents the problem present in prior art crossbar switches where input signals may oversubscribe the available inter-die bandwidth. The new non-blocking crossbar switch architecture is split across a plurality of semiconductor dice, including a plurality of interleaved crossbar switch segments. Only one crossbar switch segment is implemented on each semiconductor die. A plurality of input ports and output ports are coupled to the crossbar switch. The crossbar switch is non-blocking, i.e., any one output port not currently receiving data may receive data from any one input port.
    Type: Application
    Filed: December 14, 2022
    Publication date: June 20, 2024
    Inventor: Ephrem WU
  • Publication number: 20230077616
    Abstract: Examples herein describe a hardware accelerator for affine transformations (matrix multiplications followed by additions) using an outer products process. In general, the hardware accelerator reduces memory bandwidth by computing matrix multiplications as a sum of outer products. Moreover, the sum of outer products benefits parallel hardware that accelerates matrix multiplication, and is compatible with both scalar and block affine transformations, and more generally, both scalar and block matrix multiplications.
    Type: Application
    Filed: September 10, 2021
    Publication date: March 16, 2023
    Inventor: Ephrem WU
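    Illustrative sketch (not from the patent): the outer-product formulation of matrix multiplication that the abstract above builds on, shown for a small affine transform; sizes are arbitrary.

```python
import numpy as np

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
bias = np.random.rand(4, 5)

# A @ B as a sum of outer products: column k of A times row k of B, accumulated.
acc = np.zeros((4, 5))
for k in range(A.shape[1]):
    acc += np.outer(A[:, k], B[k, :])

affine = acc + bias                        # affine transform: multiply, then add
assert np.allclose(affine, A @ B + bias)
```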
  • Patent number: 8473657
    Abstract: Described embodiments provide a first-in, first-out (FIFO) buffer for packet switching in a crossbar switch with a speedup factor of m. The FIFO buffer comprises a first logic module that receives m N-bit data portions from a switch fabric, the m N-bit data portions comprising one or more N-bit data words of one or more data packets. A plurality of one-port memories store the received data portions. Each one-port memory has a width W segmented into S portions of width W/S, where W/S is related to N. A second logic module provides one or more N-bit data words, from the one-port memories, corresponding to the received m N-bit data portions. In a sequence of clock cycles, the data portions are alternately transferred from corresponding segments of the one-port memories in a round-robin fashion, and, for each clock cycle, the second logic module constructs data out read from the one-port memories.
    Type: Grant
    Filed: March 22, 2010
    Date of Patent: June 25, 2013
    Assignee: LSI Corporation
    Inventors: Ting Zhou, Sheng Liu, Ephrem Wu
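    Illustrative sketch (not from the patent): a behavioral (not RTL) model of striping a FIFO across several one-port memories so that m portions per cycle never hit the same memory twice; the bank count and data values are invented.

```python
from collections import deque

class StripedFifo:
    # Behavioral model only; bank count and widths are assumptions for illustration.
    def __init__(self, num_banks: int):
        self.banks = [deque() for _ in range(num_banks)]
        self.wr = 0   # next bank to write (round-robin)
        self.rd = 0   # next bank to read (round-robin)

    def push_portions(self, portions):
        # Write m portions in one "cycle", one per bank, in round-robin order.
        for p in portions:
            self.banks[self.wr].append(p)
            self.wr = (self.wr + 1) % len(self.banks)

    def pop_word(self, n: int):
        # Reassemble an n-portion data word by reading the banks in the same order.
        return [self._pop_one() for _ in range(n)]

    def _pop_one(self):
        p = self.banks[self.rd].popleft()
        self.rd = (self.rd + 1) % len(self.banks)
        return p

fifo = StripedFifo(num_banks=4)
fifo.push_portions(["a0", "a1", "a2", "a3"])   # m = 4 portions in one cycle
print(fifo.pop_word(4))                        # ['a0', 'a1', 'a2', 'a3']
```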
  • Patent number: 8438641
    Abstract: Described embodiments provide a network processor that includes a security protocol processor to prevent replay attacks on the network processor. A memory stores security associations for anti-replay operations. A pre-fetch module retrieves an anti-replay window corresponding to a data stream of the network processor. The anti-replay window has a range of sequence numbers. When the network processor receives a data packet, the security hardware accelerator determines a value of the received sequence number with respect to minimum and maximum values of a sequence number range of the anti-replay window. Depending on the value, the data packet is either accepted or rejected. The anti-replay window might be updated to reflect the receipt of the most recent data packet.
    Type: Grant
    Filed: December 29, 2010
    Date of Patent: May 7, 2013
    Assignee: LSI Corporation
    Inventors: Vojislav Vukovic, Brian Vanderwarn, Nikola Radovanovic, Ephrem Wu
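    Illustrative sketch (not from the patent): a simplified software anti-replay check in the spirit of the abstract above, assuming a 64-entry window and a Python set in place of the hardware bitmap.

```python
class AntiReplayWindow:
    # Simplified model; the window size and set-based bitmap are assumptions.
    def __init__(self, size: int = 64):
        self.size = size
        self.max_seq = 0          # highest sequence number accepted so far
        self.seen = set()         # accepted sequence numbers still inside the window

    def check(self, seq: int) -> bool:
        if seq + self.size <= self.max_seq:
            return False          # older than the window minimum: reject
        if seq in self.seen:
            return False          # already received: replay, reject
        self.seen.add(seq)
        if seq > self.max_seq:    # newer than the window maximum: slide the window
            self.max_seq = seq
            self.seen = {s for s in self.seen if s + self.size > self.max_seq}
        return True

arw = AntiReplayWindow()
print(arw.check(5), arw.check(5), arw.check(200), arw.check(100))  # True False True False
```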
  • Patent number: 8396985
    Abstract: Described embodiments provide a network processor that includes a security sub-processor to prevent replay attacks on the network processor. A memory stores an anti-replay window corresponding to a data stream of the network processor. The anti-replay window has N bits initialized to correspond to data packet sequence numbers in the range 1 to N. The anti-replay window is stored in a plurality of data words. A plurality of flip-flops store word valid bits corresponding to each of the data words. A multiplexer selects the word valid bit corresponding to a data word requested by the security processor, and an AND gate performs a bitwise AND operation between the selected data word and word valid bit. When the network processor receives a data packet, the security sub-processor determines a value of the received sequence number with respect to minimum and maximum values of a sequence number range of the anti-replay window.
    Type: Grant
    Filed: August 11, 2010
    Date of Patent: March 12, 2013
    Assignee: LSI Corporation
    Inventor: Ephrem Wu
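    Illustrative sketch (not from the patent): the word-valid-bit idea from the abstract above in software form, assuming 32-bit data words; a word that has never been written reads back as zero because its valid bit masks it out.

```python
WORD_BITS = 32   # word width is an assumption for illustration

class WindowWords:
    def __init__(self, num_words: int):
        self.words = [0xDEADBEEF] * num_words   # stand-in for uninitialized memory
        self.valid = [0] * num_words            # one "flip-flop" valid bit per word

    def read(self, idx: int) -> int:
        # Mux selects the valid bit; AND masks the word with it (all zeros or all ones).
        mask = -self.valid[idx] & ((1 << WORD_BITS) - 1)
        return self.words[idx] & mask

    def write(self, idx: int, value: int):
        self.words[idx] = value & ((1 << WORD_BITS) - 1)
        self.valid[idx] = 1

w = WindowWords(num_words=4)
print(hex(w.read(0)))      # 0x0: never written, so it reads as zero
w.write(0, 0b1010)
print(hex(w.read(0)))      # 0xa
```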
  • Patent number: 8374050
    Abstract: A memory operative to provide multi-port functionality includes multiple single-port memory cells forming a first memory array. The first memory array is organized into multiple memory banks, each of the memory banks comprising a corresponding subset of the single-port memory cells. The memory further includes a second memory array including multiple multi-port memory cells and is operative to track status information of data stored in corresponding locations in the first memory array. At least one cache memory is connected with the first memory array and is operative to store data for resolving concurrent read and write access conflicts in the first memory array.
    Type: Grant
    Filed: June 4, 2011
    Date of Patent: February 12, 2013
    Assignee: LSI Corporation
    Inventors: Ting Zhou, Ephrem Wu, Sheng Liu, Hyuck Jin Kwon
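    Illustrative sketch (not from the patent): a behavioral model of getting a read and a write per cycle out of single-port banks, assuming an addr % num_banks mapping and a dictionary cache; the conflict policy is invented for illustration.

```python
class PseudoMultiPortMem:
    def __init__(self, num_banks: int, bank_size: int):
        self.num_banks = num_banks
        self.banks = [[0] * bank_size for _ in range(num_banks)]
        self.cache = {}            # values diverted here on a bank conflict
        self.in_cache = set()      # status info: addresses whose freshest copy is cached

    def _bank(self, addr: int) -> int:
        return addr % self.num_banks   # bank mapping is an assumption

    def access(self, read_addr: int, write_addr: int, write_val: int) -> int:
        # One read plus one write in the same "cycle".
        if self._bank(read_addr) == self._bank(write_addr):
            self.cache[write_addr] = write_val        # conflict: write goes to the cache
            self.in_cache.add(write_addr)
        else:
            self.banks[self._bank(write_addr)][write_addr // self.num_banks] = write_val
            self.in_cache.discard(write_addr)
        if read_addr in self.in_cache:
            return self.cache[read_addr]
        return self.banks[self._bank(read_addr)][read_addr // self.num_banks]

mem = PseudoMultiPortMem(num_banks=4, bank_size=16)
print(mem.access(read_addr=0, write_addr=4, write_val=7))   # both map to bank 0: conflict
print(mem.access(read_addr=4, write_addr=1, write_val=9))   # 7, served from the cache
```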
  • Patent number: 8359466
    Abstract: Described embodiments provide a network processor that includes a security protocol processor for staged security processing of a packet having a security association (SA). An SA request module computes an address for the SA. The SA is fetched to a local memory. An SA prefetch control word (SPCW) is read from the SA in the local memory. The SPCW identifies one or more regions of the SA and the associated stages for the one or more regions. An SPCW parser generates one or more stage SPCWs (SSPCWs) from the SPCW. Each of the SSPCWs is stored in a corresponding SSPCW register. A prefetch module services each SSPCW register in accordance with a predefined algorithm. The prefetch module fetches a requested SA region and provides the requested SA region to a corresponding stage for the staged security processing of an associated portion of the packet.
    Type: Grant
    Filed: April 29, 2011
    Date of Patent: January 22, 2013
    Assignee: LSI Corporation
    Inventors: Sheng Liu, Nikola Radovanovic, Ephrem Wu
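    Illustrative sketch (not from the patent): the control flow of parsing one prefetch control word into per-stage control words and servicing them in order, as described in the abstract above; the region names, field layout, and servicing policy are invented for illustration.

```python
# Hypothetical SA contents; real field names and layouts are not from the patent.
sa = {"cipher_keys": b"\x01" * 16, "auth_keys": b"\x02" * 20, "replay_state": b"\x03" * 8}

# SPCW: which SA regions exist and which processing stage consumes each of them.
spcw = [("cipher_keys", "encrypt_stage"),
        ("auth_keys", "auth_stage"),
        ("replay_state", "replay_stage")]

# Parser: one stage SPCW (SSPCW) per region, each held in its own register slot.
sspcw_regs = list(spcw)

def prefetch(regs, sa_memory):
    # Service each SSPCW register in turn, fetching its SA region and delivering it
    # to the corresponding stage (a stand-in for the predefined servicing algorithm).
    for region, stage in regs:
        data = sa_memory[region]
        print(f"{stage} <- {region} ({len(data)} bytes)")

prefetch(sspcw_regs, sa)
```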
  • Patent number: 8352669
    Abstract: Described embodiments provide for transfer of data between data modules. At least two crossbar switches are employed, where input nodes and output nodes of each crossbar switch are coupled to corresponding data modules. The i-th crossbar switch has an N_i-input by M_i-output switch fabric, wherein N_i and M_i are positive integers greater than one. Each crossbar switch includes an input buffer at each input node, a crosspoint buffer at each crosspoint of the switch fabric, and an output buffer at each output node. The input buffer has an arbiter that reads data packets from the input buffer according to a first scheduling algorithm. An arbiter reads data packets from a crosspoint buffer queue according to a second scheduling algorithm. The output node receives segments of data packets provided from one or more corresponding crosspoint buffers.
    Type: Grant
    Filed: April 27, 2009
    Date of Patent: January 8, 2013
    Assignee: LSI Corporation
    Inventors: Ephrem Wu, Ting Zhou, Steven Pollock
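    Illustrative sketch (not from the patent): a stick-figure model of the buffered-crossbar data path in the abstract above, with round-robin draining standing in for the first and second scheduling algorithms; packet names and sizes are invented.

```python
from collections import deque

N, M = 2, 2                                       # 2-input by 2-output fabric (assumed)
inputs = [deque() for _ in range(N)]
crosspoints = [[deque() for _ in range(M)] for _ in range(N)]
outputs = [deque() for _ in range(M)]

inputs[0].extend([("pkt-A", 1), ("pkt-B", 0)])    # (packet, destination output)
inputs[1].append(("pkt-C", 1))

# First arbiter: move packets from each input buffer into its crosspoint buffers.
for i in range(N):
    while inputs[i]:
        pkt, out = inputs[i].popleft()
        crosspoints[i][out].append(pkt)

# Second arbiter: each output drains its column of crosspoint buffers round-robin.
for out in range(M):
    for i in range(N):
        while crosspoints[i][out]:
            outputs[out].append(crosspoints[i][out].popleft())

print([list(q) for q in outputs])                 # [['pkt-B'], ['pkt-A', 'pkt-C']]
```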
  • Publication number: 20120278615
    Abstract: Described embodiments provide a network processor that includes a security protocol processor for staged security processing of a packet having a security association (SA). An SA request module computes an address for the SA. The SA is fetched to a local memory. An SA prefetch control word (SPCW) is read from the SA in the local memory. The SPCW identifies one or more regions of the SA and the associated stages for the one or more regions. An SPCW parser generates one or more stage SPCWs (SSPCWs) from the SPCW. Each of the SSPCWs is stored in a corresponding SSPCW register. A prefetch module services each SSPCW register in accordance with a predefined algorithm. The prefetch module fetches a requested SA region and provides the requested SA region to a corresponding stage for the staged security processing of an associated portion of the packet.
    Type: Application
    Filed: April 29, 2011
    Publication date: November 1, 2012
    Inventors: Sheng Liu, Nikola Radovanovic, Ephrem Wu
  • Patent number: 8243737
    Abstract: Described embodiments provide a first-in, first-out (FIFO) buffer for packet switching in a crossbar switch with a speedup factor of m. The FIFO buffer comprises a plurality of registers configured to receive N-bit portions of data in packets and a plurality of one-port memories, each having width W segmented into S portions of width W/S. A first logic module is coupled to the registers and the one-port memories and receives the N-bit portions of data in and the outputs of the registers. A second logic module coupled to the one-port memories constructs data out read from the one-port memories. In a sequence of clock cycles, the N-bit data portions are alternately transferred from the first logic module to a segment of the one-port memories, and, for each clock cycle, the second logic module constructs the data out packet with output width based on the speedup factor of m.
    Type: Grant
    Filed: March 22, 2010
    Date of Patent: August 14, 2012
    Assignee: LSI Corporation
    Inventors: Ting Zhou, Sheng Liu, Ephrem Wu
  • Publication number: 20120174216
    Abstract: Described embodiments provide a network processor that includes a security protocol processor to prevent replay attacks on the network processor. A memory stores security associations for anti-replay operations. A pre-fetch module retrieves an anti-replay window corresponding to a data stream of the network processor. The anti-replay window has a range of sequence numbers. When the network processor receives a data packet, the security hardware accelerator determines a value of the received sequence number with respect to minimum and maximum values of a sequence number range of the anti-replay window. Depending on the value, the data packet is either accepted or rejected. The anti-replay window might be updated to reflect the receipt of the most recent data packet.
    Type: Application
    Filed: December 29, 2010
    Publication date: July 5, 2012
    Inventors: Vojislav Vukovic, Brian Vanderwarn, Nikola Radovanovic, Ephrem Wu
  • Patent number: 8181147
    Abstract: Various embodiments of systems and methods are disclosed for providing adaptive body bias control. One embodiment comprises a method for adaptive body bias control. One such method comprises: modeling parametric data associated with a chip design; modeling critical path data associated with the chip design; providing a chip according to the chip design; storing the parametric data and the critical path data in a memory on the chip; reading data from a parametric sensor on the chip; based on the data from the parametric sensor and the stored critical path and parametric data, determining an optimized bulk node voltage for reducing power consumption of the chip without causing a timing failure; and adjusting the bulk node voltage according to the optimized bulk node voltage.
    Type: Grant
    Filed: June 29, 2009
    Date of Patent: May 15, 2012
    Assignee: LSI Corporation
    Inventors: Robin Tang, Ephrem Wu, Tezaswi Raja
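    Illustrative sketch (not from the patent): the selection step of adaptive body bias control only, assuming stored per-corner tables relating bulk voltage to critical-path delay and power, and a sensor that names the corner; every number and corner name is made up.

```python
CLOCK_PERIOD_NS = 1.0

# Stored on-chip: per-corner rows of (bulk_voltage_V, critical_path_delay_ns, power_mW).
# All values are fabricated for illustration.
models = {
    "slow": [(-0.4, 1.10, 4.0), (-0.2, 1.02, 5.0), (0.0, 0.95, 6.5), (0.2, 0.90, 8.0)],
    "fast": [(-0.4, 0.98, 3.5), (-0.2, 0.92, 4.5), (0.0, 0.88, 6.0), (0.2, 0.85, 7.5)],
}

def read_parametric_sensor() -> str:
    return "fast"        # stand-in for an on-chip process/temperature sensor reading

def choose_body_bias(target_ns: float) -> float:
    table = models[read_parametric_sensor()]
    # Keep only bias points whose modeled critical path still meets timing,
    # then pick the one with the lowest modeled power (no timing failure allowed).
    feasible = [(v, d, p) for v, d, p in table if d <= target_ns]
    return min(feasible, key=lambda row: row[2])[0]

print(choose_body_bias(CLOCK_PERIOD_NS))   # -0.4 V for the "fast" corner
```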
  • Publication number: 20120042096
    Abstract: Described embodiments provide a network processor that includes a security sub-processor to prevent replay attacks on the network processor. A memory stores an anti-replay window corresponding to a data stream of the network processor. The anti-replay window has N bits initialized to correspond to data packet sequence numbers in the range 1 to N. The anti-replay window is stored in a plurality of data words. A plurality of flip-flops store word valid bits corresponding to each of the data words. A multiplexer selects the word valid bit corresponding to a data word requested by the security processor, and an AND gate performs a bitwise AND operation between the selected data word and word valid bit. When the network processor receives a data packet, the security sub-processor determines a value of the received sequence number with respect to minimum and maximum values of a sequence number range of the anti-replay window.
    Type: Application
    Filed: August 11, 2010
    Publication date: February 16, 2012
    Inventor: Ephrem Wu
  • Publication number: 20110310691
    Abstract: A memory operative to provide multi-port functionality includes multiple single-port memory cells forming a first memory array. The first memory array is organized into multiple memory banks, each of the memory banks comprising a corresponding subset of the single-port memory cells. The memory further includes a second memory array including multiple multi-port memory cells and is operative to track status information of data stored in corresponding locations in the first memory array. At least one cache memory is connected with the first memory array and is operative to store data for resolving concurrent read and write access conflicts in the first memory array.
    Type: Application
    Filed: June 4, 2011
    Publication date: December 22, 2011
    Applicant: LSI Corporation
    Inventors: Ting Zhou, Ephrem Wu, Sheng Liu, Hyuck Jin Kwon
  • Publication number: 20100333057
    Abstract: Various embodiments of systems and methods are disclosed for providing adaptive body bias control. One embodiment comprises a method for adaptive body bias control. One such method comprises: modeling parametric data associated with a chip design; modeling critical path data associated with the chip design; providing a chip according to the chip design; storing the parametric data and the critical path data in a memory on the chip; reading data from a parametric sensor on the chip; based on the data from the parametric sensor and the stored critical path and parametric data, determining an optimized bulk node voltage for reducing power consumption of the chip without causing a timing failure; and adjusting the bulk node voltage according to the optimized bulk node voltage.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Robin Tang, Ephrem Wu, Tezaswi Raja
  • Publication number: 20100272117
    Abstract: Described embodiments provide for transfer of data between data modules. At least two crossbar switches are employed, where input nodes and output nodes of each crossbar switch are coupled to corresponding data modules. The i-th crossbar switch has an N_i-input by M_i-output switch fabric, wherein N_i and M_i are positive integers greater than one. Each crossbar switch includes an input buffer at each input node, a crosspoint buffer at each crosspoint of the switch fabric, and an output buffer at each output node. The input buffer has an arbiter that reads data packets from the input buffer according to a first scheduling algorithm. An arbiter reads data packets from a crosspoint buffer queue according to a second scheduling algorithm. The output node receives segments of data packets provided from one or more corresponding crosspoint buffers.
    Type: Application
    Filed: April 27, 2009
    Publication date: October 28, 2010
    Inventors: Ephrem Wu, Ting Zhou, Steven Pollock