Patents by Inventor Ephrem Wu
Ephrem Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12273668
Abstract: A non-blocking crossbar switch architecture circumvents the problem present in prior art crossbar switches where input signals may oversubscribe the available inter-die bandwidth. The new non-blocking crossbar switch architecture is split across a plurality of semiconductor dice, including a plurality of interleaved crossbar switch segments. Only one crossbar switch segment is implemented on each semiconductor die. A plurality of input ports and output ports are coupled to the crossbar switch. The crossbar switch is non-blocking, i.e., any one output port not currently receiving data may receive data from any one input port.
Type: Grant
Filed: December 14, 2022
Date of Patent: April 8, 2025
Assignee: XILINX, INC.
Inventor: Ephrem Wu
-
Patent number: 12159212
Abstract: A digital processing engine is configured to receive input data from a memory. The input data comprises first input channels. The digital processing engine is further configured to convolve, with a convolution model, the input data. The convolution model comprises a first filter layer configured to generate first intermediate data having first output channels. A number of the first output channels is less than a number of the first input channels. The convolution model further comprises a second filter layer comprising shared spatial filters and is configured to generate second intermediate data by convolving each of the first output channels with a respective one of the shared spatial filters. Each of the shared spatial filters comprises first weights. The digital processing engine is further configured to generate output data from the second intermediate data and store the output data in the memory.
Type: Grant
Filed: February 22, 2021
Date of Patent: December 3, 2024
Assignee: XILINX, INC.
Inventor: Ephrem Wu
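The two-layer structure in this abstract can be illustrated with a small pure-Python sketch. It rests on several assumptions that the abstract does not confirm: the first layer is modeled as a per-pixel (1x1) channel projection, the second layer applies one small 2-D kernel per intermediate channel, and the names `pointwise_reduce`, `spatial_filter`, and `reduced_shared_conv` are hypothetical. How the spatial filters are "shared" is not spelled out in the abstract, so the sketch does not try to model it.

```python
def pointwise_reduce(x, w):
    """Assumed first layer: project C_in channels down to C_mid < C_in.

    x is [C_in][H][W]; w is [C_mid][C_in].
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(w[o][c] * x[c][i][j] for c in range(c_in)) for j in range(wd)]
         for i in range(h)]
        for o in range(len(w))
    ]


def spatial_filter(channel, kernel):
    """'Valid' 2-D correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, wd = len(channel), len(channel[0])
    return [
        [sum(kernel[a][b] * channel[i + a][j + b]
             for a in range(kh) for b in range(kw))
         for j in range(wd - kw + 1)]
        for i in range(h - kh + 1)
    ]


def reduced_shared_conv(x, w, filters):
    """Reduce channels first, then filter each reduced channel spatially."""
    mid = pointwise_reduce(x, w)
    # Assumed second layer: intermediate channel o uses filters[o].
    return [spatial_filter(ch, filters[o]) for o, ch in enumerate(mid)]
```

Reducing the channel count before the spatial step means the more expensive spatial filtering runs over fewer channels, which is presumably the point of ordering the layers this way.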
-
Patent number: 12147379
Abstract: Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or to determine its mean. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs which then re-process their respective data chunks using the intermediate value to generate final results.
Type: Grant
Filed: December 28, 2022
Date of Patent: November 19, 2024
Assignee: XILINX, INC.
Inventors: Rajeev Patwari, Jorn Tuyls, Elliott Delaye, Xiao Teng, Ephrem Wu
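The gather-then-redistribute flow in this abstract can be sketched in software as a two-pass computation. The following is only an illustration under stated assumptions: worker threads stand in for PEs, the completion of each `pool.map` task stands in for the token sent to the controller, the intermediate value is the row mean, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # First pass on one "PE": reduce the chunk to a partial result.
    return sum(chunk), len(chunk)


def recenter(chunk, mean):
    # Second pass on one "PE": re-process the chunk with the shared value.
    return [x - mean for x in chunk]


def centered_row(row, num_pes=4):
    """Subtract the row mean from every element using chunked workers."""
    if not row:
        return []
    size = -(-len(row) // num_pes)  # ceiling division
    chunks = [row[i:i + size] for i in range(0, len(row), size)]
    with ThreadPoolExecutor(max_workers=num_pes) as pool:
        # "Controller" gathers partial results once every worker is done.
        partials = list(pool.map(partial_sum, chunks))
        total = sum(s for s, _ in partials)
        count = sum(n for _, n in partials)
        mean = total / count  # the intermediate value for the data set
        # "Controller" distributes the mean; workers re-process their chunks.
        finals = list(pool.map(recenter, chunks, [mean] * len(chunks)))
    return [x for chunk in finals for x in chunk]
```

For example, `centered_row([1, 2, 3, 4, 5, 6, 7, 8])` computes a mean of 4.5 from four partial sums and returns the eight values shifted by that mean.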
-
Publication number: 20240220444
Abstract: Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or to determine its mean. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs which then re-process their respective data chunks using the intermediate value to generate final results.
Type: Application
Filed: December 28, 2022
Publication date: July 4, 2024
Inventors: Rajeev PATWARI, Jorn TUYLS, Elliott DELAYE, Xiao TENG, Ephrem WU
-
Publication number: 20240205570
Abstract: A non-blocking crossbar switch architecture is disclosed that circumvents the problem present in prior art crossbar switches where input signals may oversubscribe the available inter-die bandwidth. The new non-blocking crossbar switch architecture is split across a plurality of semiconductor dice, including a plurality of interleaved crossbar switch segments. Only one crossbar switch segment is implemented on each semiconductor die. A plurality of input ports and output ports are coupled to the crossbar switch. The crossbar switch is non-blocking, i.e. any one output port not currently receiving data may receive data from any one input port.
Type: Application
Filed: December 14, 2022
Publication date: June 20, 2024
Inventor: Ephrem WU
-
Publication number: 20230077616
Abstract: Examples herein describe a hardware accelerator for affine transformations (matrix multiplications followed by additions) using an outer products process. In general, the hardware accelerator reduces memory bandwidth by computing matrix multiplications as a sum of outer products. Moreover, the sum of outer products benefits parallel hardware that accelerates matrix multiplication, and is compatible with both scalar and block affine transformations, and more generally, both scalar and block matrix multiplications.
Type: Application
Filed: September 10, 2021
Publication date: March 16, 2023
Inventor: Ephrem WU
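The identity behind this abstract is that a matrix product can be written as a sum of outer products: column p of A times row p of B, summed over p. The sketch below shows the scalar case in plain Python; it illustrates the arithmetic only and is not the accelerator's design. The function name `affine_outer` is hypothetical.

```python
def affine_outer(A, B, C):
    """Compute A @ B + C as a sum of outer products.

    A is m x k, B is k x n, C is m x n (lists of lists).
    """
    m, k, n = len(A), len(B), len(B[0])
    out = [row[:] for row in C]  # accumulator starts from the addend C
    for p in range(k):
        col = [A[i][p] for i in range(m)]  # column p of A, read once
        row = B[p]                         # row p of B, read once
        # Rank-1 update: every (i, j) product is independent of the others.
        for i in range(m):
            for j in range(n):
                out[i][j] += col[i] * row[j]
    return out
```

In this formulation each column of A and each row of B is fetched once per step while the accumulator stays in place, and the m x n products inside one step have no dependencies on each other, which is what makes the form attractive for parallel hardware.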
-
Patent number: 8473657
Abstract: Described embodiments provide a first-in, first-out (FIFO) buffer for packet switching in a crossbar switch with a speedup factor of m. The FIFO buffer comprises a first logic module that receives m N-bit data portions from a switch fabric, the m N-bit data portions comprising one or more N-bit data words of one or more data packets. A plurality of one-port memories store the received data portions. Each one-port memory has a width W segmented into S portions of width W/S, where W/S is related to N. A second logic module provides one or more N-bit data words, from the one-port memories, corresponding to the received m N-bit data portions. In a sequence of clock cycles, the data portions are alternately transferred from corresponding segments of the one-port memories in a round-robin fashion, and, for each clock cycle, the second logic module constructs data out read from the one-port memories.
Type: Grant
Filed: March 22, 2010
Date of Patent: June 25, 2013
Assignee: LSI Corporation
Inventors: Ting Zhou, Sheng Liu, Ephrem Wu
-
Patent number: 8438641
Abstract: Described embodiments provide a network processor that includes a security protocol processor to prevent replay attacks on the network processor. A memory stores security associations for anti-replay operations. A pre-fetch module retrieves an anti-replay window corresponding to a data stream of the network processor. The anti-replay window has a range of sequence numbers. When the network processor receives a data packet, the security hardware accelerator determines a value of the received sequence number with respect to minimum and maximum values of a sequence number range of the anti-replay window. Depending on the value, the data packet is either received or accepted. The anti-replay window might be updated to reflect the receipt of the most recent data packet.
Type: Grant
Filed: December 29, 2010
Date of Patent: May 7, 2013
Assignee: LSI Corporation
Inventors: Vojislav Vukovic, Brian Vanderwarn, Nikola Radovanovic, Ephrem Wu
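The comparison of a received sequence number against the window's minimum and maximum can be sketched with a generic sliding-window bitmap, a common software approach to anti-replay checking. This is an assumption-laden illustration, not the hardware described in the patent: the class name, the default window size of 64, and the bitmap layout are all hypothetical.

```python
class AntiReplayWindow:
    """Sliding anti-replay window kept as an integer bitmap."""

    def __init__(self, size=64):
        self.size = size
        self.top = 0      # window maximum: highest sequence number accepted
        self.bitmap = 0   # bit i set => sequence number (top - i) was seen

    def check_and_update(self, seq):
        """Return True if the packet is accepted, False if it is rejected."""
        if seq <= 0:
            return False
        bottom = self.top - self.size + 1  # window minimum
        if seq > self.top:
            # Above the window: slide it forward and mark the new maximum.
            shift = seq - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = seq
            return True
        if seq < bottom:
            return False  # below the window: too old to verify
        bit = 1 << (self.top - seq)
        if self.bitmap & bit:
            return False  # inside the window and already seen: replay
        self.bitmap |= bit
        return True
```

With `AntiReplayWindow(size=4)`, accepting sequence number 10 moves the window to cover 7 through 10, so a later packet numbered 6 is rejected as too old and a second packet numbered 7 is rejected as a replay.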
-
Patent number: 8396985
Abstract: Described embodiments provide a network processor that includes a security sub-processor to prevent replay attacks on the network processor. A memory stores an anti-replay window corresponding to a data stream of the network processor. The anti-replay window has N bits initialized to correspond to data packet sequence numbers in the range 1 to N. The anti-replay memory is stored in a plurality of data words. A plurality of flip-flops store word valid bits corresponding to each of the data words. A multiplexer selects the word valid bit corresponding to a data word requested by the security processor, and an AND gate performs a bitwise AND operation between the selected data word and word valid bit. When the network processor receives a data packet, the security sub-processor determines a value of the received sequence number with respect to minimum and maximum values of a sequence number range of the anti-replay window.
Type: Grant
Filed: August 11, 2010
Date of Patent: March 12, 2013
Assignee: LSI Corporation
Inventor: Ephrem Wu
-
Patent number: 8374050
Abstract: A memory operative to provide multi-port functionality includes multiple single-port memory cells forming a first memory array. The first memory array is organized into multiple memory banks, each of the memory banks comprising a corresponding subset of the single-port memory cells. The memory further includes a second memory array including multiple multi-port memory cells and is operative to track status information of data stored in corresponding locations in the first memory array. At least one cache memory is connected with the first memory array and is operative to store data for resolving concurrent read and write access conflicts in the first memory array.
Type: Grant
Filed: June 4, 2011
Date of Patent: February 12, 2013
Assignee: LSI Corporation
Inventors: Ting Zhou, Ephrem Wu, Sheng Liu, Hyuck Jin Kwon
-
Patent number: 8359466
Abstract: Described embodiments provide a network processor that includes a security protocol processor for staged security processing of a packet having a security association (SA). An SA request module computes an address for the SA. The SA is fetched to a local memory. An SA prefetch control word (SPCW) is read from the SA in the local memory. The SPCW identifies one or more regions of the SA and the associated stages for the one or more regions. An SPCW parser generates one or more stage SPCWs (SSPCWs) from the SPCW. Each of the SSPCWs is stored in a corresponding SSPCW register. A prefetch module services each SSPCW register in accordance with a predefined algorithm. The prefetch module fetches a requested SA region and provides the requested SA region to a corresponding stage for the staged security processing of an associated portion of the packet.
Type: Grant
Filed: April 29, 2011
Date of Patent: January 22, 2013
Assignee: LSI Corporation
Inventors: Sheng Liu, Nikola Radovanovic, Ephrem Wu
-
Patent number: 8352669
Abstract: Described embodiments provide for transfer of data between data modules. At least two crossbar switches are employed, where input nodes and output nodes of each crossbar switch are coupled to corresponding data modules. The ith crossbar switch has an Ni-input by Mi-output switch fabric, wherein Ni and Mi are positive integers greater than one. Each crossbar switch includes an input buffer at each input node, a crosspoint buffer at each crosspoint of the switch fabric, and an output buffer at each output node. The input buffer has an arbiter that reads data packets from the input buffer according to a first scheduling algorithm. An arbiter reads data packets from a crosspoint buffer queue according to a second scheduling algorithm. The output node receives segments of data packets provided from one or more corresponding crosspoint buffers.
Type: Grant
Filed: April 27, 2009
Date of Patent: January 8, 2013
Assignee: LSI Corporation
Inventors: Ephrem Wu, Ting Zhou, Steven Pollock
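The role of the crosspoint buffers can be illustrated with a toy model of a single crossbar switch. The abstract does not name either scheduling algorithm, so the round-robin policy below is purely an assumption, as are the class and method names. Input and output buffers are omitted to keep the sketch short.

```python
from collections import deque


class BufferedCrossbar:
    """Toy N-input by M-output switch with a queue at every crosspoint."""

    def __init__(self, n_in, n_out):
        # xp[src][dst] is the crosspoint buffer between input src and output dst.
        self.xp = [[deque() for _ in range(n_out)] for _ in range(n_in)]
        self.next_in = [0] * n_out  # assumed round-robin pointer per output

    def enqueue(self, src, dst, packet):
        """Place a packet from input src into the crosspoint for output dst."""
        self.xp[src][dst].append(packet)

    def drain_output(self, dst):
        """Output arbiter: take one packet for dst, rotating over the inputs."""
        n_in = len(self.xp)
        for k in range(n_in):
            src = (self.next_in[dst] + k) % n_in
            if self.xp[src][dst]:
                self.next_in[dst] = (src + 1) % n_in
                return self.xp[src][dst].popleft()
        return None  # no crosspoint buffer holds data for this output
```

Because every input and output pair has its own queue, each output can be scheduled by looking only at its own column of crosspoint buffers, without coordinating with the other outputs.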
-
Publication number: 20120278615
Abstract: Described embodiments provide a network processor that includes a security protocol processor for staged security processing of a packet having a security association (SA). An SA request module computes an address for the SA. The SA is fetched to a local memory. An SA prefetch control word (SPCW) is read from the SA in the local memory. The SPCW identifies one or more regions of the SA and the associated stages for the one or more regions. An SPCW parser generates one or more stage SPCWs (SSPCWs) from the SPCW. Each of the SSPCWs is stored in a corresponding SSPCW register. A prefetch module services each SSPCW register in accordance with a predefined algorithm. The prefetch module fetches a requested SA region and provides the requested SA region to a corresponding stage for the staged security processing of an associated portion of the packet.
Type: Application
Filed: April 29, 2011
Publication date: November 1, 2012
Inventors: Sheng Liu, Nikola Radovanovic, Ephrem Wu
-
Patent number: 8243737
Abstract: Described embodiments provide a first-in, first-out (FIFO) buffer for packet switching in a crossbar switch with a speedup factor of m. The FIFO buffer comprises a plurality of registers configured to receive N-bit portions of data in packets and a plurality of one-port memories, each having width W segmented into S portions a width W/S. A first logic module is coupled to the registers and the one-port memories and receives the N-bit portions of data in and the outputs of the registers. A second logic module coupled to the one-port memories constructs data out read from the one-port memories. In a sequence of clock cycles, the N-bit data portions are alternately transferred from the first logic module to a segment of the one-port memories, and, for each clock cycle, the second logic module constructs the data out packet with output width based on the speedup factor of m.
Type: Grant
Filed: March 22, 2010
Date of Patent: August 14, 2012
Assignee: LSI Corporation
Inventors: Ting Zhou, Sheng Liu, Ephrem Wu
-
Publication number: 20120174216
Abstract: Described embodiments provide a network processor that includes a security protocol processor to prevent replay attacks on the network processor. A memory stores security associations for anti-replay operations. A pre-fetch module retrieves an anti-replay window corresponding to a data stream of the network processor. The anti-replay window has a range of sequence numbers. When the network processor receives a data packet, the security hardware accelerator determines a value of the received sequence number with respect to minimum and maximum values of a sequence number range of the anti-replay window. Depending on the value, the data packet is either received or accepted. The anti-replay window might be updated to reflect the receipt of the most recent data packet.
Type: Application
Filed: December 29, 2010
Publication date: July 5, 2012
Inventors: Vojislav Vukovic, Brian Vanderwarn, Nikola Radovanovic, Ephrem Wu
-
Patent number: 8181147
Abstract: Various embodiments of systems and methods are disclosed for providing adaptive body bias control. One embodiment comprises a method for adaptive body bias control. One such method comprises: modeling parametric data associated with a chip design; modeling critical path data associated with the chip design; providing a chip according to the chip design; storing the parametric data and the critical path data in a memory on the chip; reading data from a parametric sensor on the chip; based on the data from the parametric sensor and the stored critical path and parametric data, determining an optimized bulk node voltage for reducing power consumption of the chip without causing a timing failure; and adjusting the bulk node voltage according to the optimized bulk node voltage.
Type: Grant
Filed: June 29, 2009
Date of Patent: May 15, 2012
Assignee: LSI Corporation
Inventors: Robin Tang, Ephrem Wu, Tezaswi Raja
-
Publication number: 20120042096
Abstract: Described embodiments provide a network processor that includes a security sub-processor to prevent replay attacks on the network processor. A memory stores an anti-replay window corresponding to a data stream of the network processor. The anti-replay window has N bits initialized to correspond to data packet sequence numbers in the range 1 to N. The anti-replay memory is stored in a plurality of data words. A plurality of flip-flops store word valid bits corresponding to each of the data words. A multiplexer selects the word valid bit corresponding to a data word requested by the security processor, and an AND gate performs a bitwise AND operation between the selected data word and word valid bit. When the network processor receives a data packet, the security sub-processor determines a value of the received sequence number with respect to minimum and maximum values of a sequence number range of the anti-replay window.
Type: Application
Filed: August 11, 2010
Publication date: February 16, 2012
Inventor: Ephrem Wu
-
Publication number: 20110310691
Abstract: A memory operative to provide multi-port functionality includes multiple single-port memory cells forming a first memory array. The first memory array is organized into multiple memory banks, each of the memory banks comprising a corresponding subset of the single-port memory cells. The memory further includes a second memory array including multiple multi-port memory cells and is operative to track status information of data stored in corresponding locations in the first memory array. At least one cache memory is connected with the first memory array and is operative to store data for resolving concurrent read and write access conflicts in the first memory array.
Type: Application
Filed: June 4, 2011
Publication date: December 22, 2011
Applicant: LSI Corporation
Inventors: Ting Zhou, Ephrem Wu, Sheng Liu, Hyuck Jin Kwon
-
Publication number: 20100333057
Abstract: Various embodiments of systems and methods are disclosed for providing adaptive body bias control. One embodiment comprises a method for adaptive body bias control. One such method comprises: modeling parametric data associated with a chip design; modeling critical path data associated with the chip design; providing a chip according to the chip design; storing the parametric data and the critical path data in a memory on the chip; reading data from a parametric sensor on the chip; based on the data from the parametric sensor and the stored critical path and parametric data, determining an optimized bulk node voltage for reducing power consumption of the chip without causing a timing failure; and adjusting the bulk node voltage according to the optimized bulk node voltage.
Type: Application
Filed: June 29, 2009
Publication date: December 30, 2010
Inventors: Robin Tang, Ephrem Wu, Tezaswi Raja
-
Publication number: 20100272117
Abstract: Described embodiments provide for transfer of data between data modules. At least two crossbar switches are employed, where input nodes and output nodes of each crossbar switch are coupled to corresponding data modules. The ith crossbar switch has an Ni-input by Mi-output switch fabric, wherein Ni and Mi are positive integers greater than one. Each crossbar switch includes an input buffer at each input node, a crosspoint buffer at each crosspoint of the switch fabric, and an output buffer at each output node. The input buffer has an arbiter that reads data packets from the input buffer according to a first scheduling algorithm. An arbiter reads data packets from a crosspoint buffer queue according to a second scheduling algorithm. The output node receives segments of data packets provided from one or more corresponding crosspoint buffers.
Type: Application
Filed: April 27, 2009
Publication date: October 28, 2010
Inventors: Ephrem Wu, Ting Zhou, Steven Pollack