Patents by Inventor Olivia K. Wu
Olivia K. Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240112006Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.Type: ApplicationFiled: December 8, 2023Publication date: April 4, 2024Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
-
Publication number: 20230222331Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.Type: ApplicationFiled: March 15, 2023Publication date: July 13, 2023Inventors: Horce H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
-
Patent number: 11567555Abstract: Embodiments include an apparatus comprising an execution unit coupled to a memory, a microcode controller, and a hardware controller. The microcode controller is to identify a global power and performance hint in an instruction stream that includes first and second instruction phases to be executed in parallel, identify a local hint based on synchronization dependence in the first instruction phase, and use the first local hint to balance power consumption between the execution unit and the memory during parallel executions of the first and second instruction phases. The hardware controller is to use the global hint to determine an appropriate voltage level of a compute voltage and a frequency of a compute clock signal for the execution unit during the parallel executions of the first and second instruction phases. The first local hint includes a processing rate for the first instruction phase or an indication of the processing rate.Type: GrantFiled: August 30, 2019Date of Patent: January 31, 2023Assignee: Intel CorporationInventors: Jason Seung-Min Kim, Sundar Ramani, Yogesh Bansal, Nitin N. Garegrat, Olivia K. Wu, Mayank Kaushik, Mrinal Iyer, Tom Schebye, Andrew Yang
-
Publication number: 20220245438Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.Type: ApplicationFiled: April 25, 2022Publication date: August 4, 2022Inventors: Horce H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
-
Publication number: 20190392297Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.Type: ApplicationFiled: December 28, 2017Publication date: December 26, 2019Applicant: Intel CorporationInventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
-
Publication number: 20190384370Abstract: Embodiments include an apparatus comprising an execution unit coupled to a memory, a microcode controller, and a hardware controller. The microcode controller is to identify a global power and performance hint in an instruction stream that includes first and second instruction phases to be executed in parallel, identify a local hint based on synchronization dependence in the first instruction phase, and use the first local hint to balance power consumption between the execution unit and the memory during parallel executions of the first and second instruction phases. The hardware controller is to use the global hint to determine an appropriate voltage level of a compute voltage and a frequency of a compute clock signal for the execution unit during the parallel executions of the first and second instruction phases. The first local hint includes a processing rate for the first instruction phase or an indication of the processing rate.Type: ApplicationFiled: August 30, 2019Publication date: December 19, 2019Inventors: Jason Seung-Min Kim, Sundar Ramani, Yogesh Bansal, Nitin N. Garegrat, Olivia K. Wu, Mayank Kaushik, Mrinal Iyer, Tom Schebye, Andrew Yang
-
Patent number: 10305508Abstract: A processor comprises a first memory to store data elements that are encoded according to a floating point format including a sign field, an exponent field, and a significand field; and a compression engine comprising circuitry, the compression engine to generate a compressed data block that is to include a tag type per data element, wherein responsive to a determination that a first data element includes a value in its exponent field that does not match a value of any entry in a dictionary, a first tag type and an uncompressed value of the data element are included in the compressed data block; and responsive to a determination that a second data element includes a value in its exponent field that matches a value of a first entry in the dictionary, a second tag type and a compressed value of the data element are included in the compressed data block.Type: GrantFiled: May 11, 2018Date of Patent: May 28, 2019Assignee: Intel CorporationInventors: James D. Guilford, Vinodh Gopal, Kirk S. Yap, Olivia K. Wu
-
Publication number: 20190044531Abstract: A processor comprises a first memory to store data elements that are encoded according to a floating point format including a sign field, an exponent field, and a significand field; and a compression engine comprising circuitry, the compression engine to generate a compressed data block that is to include a tag type per data element, wherein responsive to a determination that a first data element includes a value in its exponent field that does not match a value of any entry in a dictionary, a first tag type and an uncompressed value of the data element are included in the compressed data block; and responsive to a determination that a second data element includes a value in its exponent field that matches a value of a first entry in the dictionary, a second tag type and a compressed value of the data element are included in the compressed data block.Type: ApplicationFiled: May 11, 2018Publication date: February 7, 2019Applicant: Intel CorporationInventors: James D. Guilford, Vinodh Gopal, Kirk S. Yap, Olivia K. Wu