Patents by Inventor Wen-mei W. Hwu
Wen-mei W. Hwu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11977883
Abstract: The present disclosure relates to systems and methods that provide a reconfigurable cryptographic coprocessor. An example system includes an instruction memory configured to provide ARX instructions and mode control instructions. The system also includes an adjustable-width arithmetic logic unit, an adjustable-width rotator, and a coefficient memory. A bit width of the adjustable-width arithmetic logic unit and a bit width of the adjustable-width rotator are adjusted according to the mode control instructions. The coefficient memory is configured to provide variable-width words to the arithmetic logic unit and the rotator. The arithmetic logic unit and the rotator are configured to carry out the ARX instructions on the provided variable-width words. The systems and methods described herein could accelerate various applications, such as deep learning, by assigning one or more of the disclosed reconfigurable coprocessors to work as a central computation unit in a neural network.
Type: Grant
Filed: September 24, 2021
Date of Patent: May 7, 2024
Assignees: The Board of Trustees of the University of Illinois, University of Virginia Patent Foundation
Inventors: Mohamed E. Aly, Wen-Mei W. Hwu, Kevin Skadron
-
Publication number: 20220012052
Abstract: The present disclosure relates to systems and methods that provide a reconfigurable cryptographic coprocessor. An example system includes an instruction memory configured to provide ARX instructions and mode control instructions. The system also includes an adjustable-width arithmetic logic unit, an adjustable-width rotator, and a coefficient memory. A bit width of the adjustable-width arithmetic logic unit and a bit width of the adjustable-width rotator are adjusted according to the mode control instructions. The coefficient memory is configured to provide variable-width words to the arithmetic logic unit and the rotator. The arithmetic logic unit and the rotator are configured to carry out the ARX instructions on the provided variable-width words. The systems and methods described herein could accelerate various applications, such as deep learning, by assigning one or more of the disclosed reconfigurable coprocessors to work as a central computation unit in a neural network.
Type: Application
Filed: September 24, 2021
Publication date: January 13, 2022
Inventors: Mohamed E. Aly, Wen-Mei W. Hwu, Kevin Skadron
-
Patent number: 11157275
Abstract: The present disclosure relates to systems and methods that provide a reconfigurable cryptographic coprocessor. An example system includes an instruction memory configured to provide ARX instructions and mode control instructions. The system also includes an adjustable-width arithmetic logic unit, an adjustable-width rotator, and a coefficient memory. A bit width of the adjustable-width arithmetic logic unit and a bit width of the adjustable-width rotator are adjusted according to the mode control instructions. The coefficient memory is configured to provide variable-width words to the arithmetic logic unit and the rotator. The arithmetic logic unit and the rotator are configured to carry out the ARX instructions on the provided variable-width words. The systems and methods described herein could accelerate various applications, such as deep learning, by assigning one or more of the disclosed reconfigurable coprocessors to work as a central computation unit in a neural network.
Type: Grant
Filed: July 3, 2018
Date of Patent: October 26, 2021
Assignees: The Board of Trustees of the University of Illinois, University of Virginia Patent Foundation
Inventors: Mohamed E Aly, Wen-Mei W. Hwu, Kevin Skadron
-
Publication number: 20200012495
Abstract: The present disclosure relates to systems and methods that provide a reconfigurable cryptographic coprocessor. An example system includes an instruction memory configured to provide ARX instructions and mode control instructions. The system also includes an adjustable-width arithmetic logic unit, an adjustable-width rotator, and a coefficient memory. A bit width of the adjustable-width arithmetic logic unit and a bit width of the adjustable-width rotator are adjusted according to the mode control instructions. The coefficient memory is configured to provide variable-width words to the arithmetic logic unit and the rotator. The arithmetic logic unit and the rotator are configured to carry out the ARX instructions on the provided variable-width words. The systems and methods described herein could accelerate various applications, such as deep learning, by assigning one or more of the disclosed reconfigurable coprocessors to work as a central computation unit in a neural network.
Type: Application
Filed: July 3, 2018
Publication date: January 9, 2020
Inventors: Mohamed E Aly, Wen-Mei W. Hwu, Kevin Skadron
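
The coprocessor abstracts above describe ARX (add, rotate, XOR) instructions running on words whose bit width is set by a mode control instruction. As a rough software illustration only, the sketch below applies one add-rotate-XOR step at a caller-chosen width. The function names `rotl` and `arx_step`, the order of the three operations, and the `width` parameter are assumptions made for this example and are not taken from the patents.

```python
def rotl(value: int, amount: int, width: int) -> int:
    """Rotate a width-bit word left by `amount` bits."""
    mask = (1 << width) - 1
    value &= mask
    amount %= width
    return ((value << amount) | (value >> (width - amount))) & mask


def arx_step(a: int, b: int, amount: int, width: int) -> tuple[int, int]:
    """One illustrative add-rotate-XOR step on width-bit words.

    `width` stands in for the bit width selected by a mode control
    instruction; the operation order here is a hypothetical choice.
    """
    mask = (1 << width) - 1
    a = (a + b) & mask          # Add, modulo 2**width
    b = rotl(b, amount, width)  # Rotate
    b ^= a                      # XOR
    return a, b


# The same inputs give different results at different widths,
# because the addition wraps at 8 bits but not at 16 bits.
narrow = arx_step(0xFF, 0x01, 1, 8)
wide = arx_step(0xFF, 0x01, 1, 16)
```

Changing only `width` changes where the addition wraps and where rotated bits re-enter the word, which is the behavior an adjustable-width ALU and rotator would need to reproduce in hardware.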
-
Patent number: 7725696
Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution.
Type: Grant
Filed: October 4, 2007
Date of Patent: May 25, 2010
Inventors: Wen-mei W. Hwu, Matthew C. Merten
-
Patent number: 7302557
Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution.
Type: Grant
Filed: December 1, 2000
Date of Patent: November 27, 2007
Assignee: Impact Technologies, Inc.
Inventors: Wen-mei W. Hwu, Matthew C. Merten
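
The two loop-scheduling abstracts above describe keeping a single copy of the loop body while something other than the code decides which overlapped iterations are active. One way to picture this, as a sketch under assumptions and not the patented mechanism, is a kernel of `n_stages` stages that is replayed each cycle, with a stage enabled only when the iteration it would serve exists. The function name `active_stages` and the one-stage-per-cycle timing are hypothetical.

```python
def active_stages(n_iters: int, n_stages: int) -> list[list[tuple[int, int]]]:
    """List, per cycle, the (stage, iteration) pairs that are enabled.

    Iteration i enters stage s at cycle i + s, so at cycle t stage s
    works on iteration t - s, if that iteration exists.
    """
    if n_iters <= 0 or n_stages <= 0:
        return []
    schedule = []
    for t in range(n_iters + n_stages - 1):
        enabled = [(s, t - s) for s in range(n_stages) if 0 <= t - s < n_iters]
        schedule.append(enabled)
    return schedule


# Two iterations through a three-stage kernel: fewer iterations than
# stages, so the kernel is never completely full.
for cycle, enabled in enumerate(active_stages(2, 3)):
    print(cycle, enabled)
```

In this model the ramp-up and ramp-down cycles fall out of the enable test rather than from separate prologue and epilogue code, and a loop with fewer iterations than stages needs no special case.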
-
Patent number: 6640315
Abstract: Disclosed is a method and system for handling inline recovery from speculatively executed instructions. Each register may be provided with an E-tag that, when set, indicates an exception occurred in the generation of the value stored in its register, and an R-tag, which is used to manage data flow dependencies in recovery mode. Recovery is performed by re-executing speculatively that set of speculative instructions that are data flow dependent upon a first excepting speculative instruction. The disclosed invention provides an architecture and method for efficient exception handling when combining control speculation, data speculation and predication, thereby resulting in substantially enhanced instruction level parallelism.
Type: Grant
Filed: June 26, 1999
Date of Patent: October 28, 2003
Assignee: Board of Trustees of the University of Illinois
Inventors: Wen-mei W. Hwu, Daniel A. Connors, David I. August, John W. Sias
-
Patent number: 5694577
Abstract: An apparatus is provided, for use in a computer having a register bank and a device for operand fetch and instruction execution, for monitoring a store address to maintain coherency of preloaded data that is fetched by a load operation and should be affected by at least one subsequent store operation. The apparatus includes an address register bank having entries for holding the address of a load having loaded data which should be affected by at least one subsequent store operation. Each of the entries has associated therewith a pre-load flag and a type field, the pre-load flag being set when the load is executed and reset when there is no need to be affected by a subsequent store operation.
Type: Grant
Filed: June 6, 1995
Date of Patent: December 2, 1997
Assignees: Matsushita Electric Industrial Co., Ltd., The Board of Trustees of the University of Illinois
Inventors: Tokuzo Kiyohara, Wen-mei W. Hwu, William Chen
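
The abstract above describes an address register bank whose entries hold a preloaded address, a pre-load flag, and a type field, with later store addresses monitored against those entries. The sketch below models only that bookkeeping in software. The class and method names, the `conflict` marker, and the treatment of the type field as an opaque label are assumptions for illustration; the abstract does not say what the hardware does when a store matches.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    address: int = 0
    preload: bool = False   # set when the load executes
    kind: str = ""          # stand-in for the "type field"; meaning assumed
    conflict: bool = False  # hypothetical marker: a later store hit this address


class AddressRegisterBank:
    """Toy model with one entry per destination register."""

    def __init__(self, n_registers: int) -> None:
        self.entries = [Entry() for _ in range(n_registers)]

    def load(self, reg: int, address: int, kind: str = "word") -> None:
        # Record the preloaded address and set the pre-load flag.
        self.entries[reg] = Entry(address, True, kind, False)

    def store(self, address: int) -> list[int]:
        # Compare the store address against every flagged entry.
        hits = []
        for reg, entry in enumerate(self.entries):
            if entry.preload and entry.address == address:
                entry.conflict = True
                hits.append(reg)
        return hits

    def release(self, reg: int) -> bool:
        # Reset the flag once later stores no longer matter for this load.
        entry = self.entries[reg]
        had_conflict = entry.conflict
        entry.preload = False
        entry.conflict = False
        return had_conflict


bank = AddressRegisterBank(4)
bank.load(1, 0x1000)
missed = bank.store(0x2000)   # different address, no entry affected
hit = bank.store(0x1000)      # matches the preloaded address
```

After `release`, a store to the same address no longer matches the entry, mirroring the flag being reset when the preloaded data no longer needs to track later stores.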