Patents by Inventor Eric Wayne Mahurin

Eric Wayne Mahurin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240118902
    Abstract: An aspect of the disclosure relates to a data processing system, including: an input medium configured to include a first set of blocks of data including a first set of blocks of compressed data and a first set of metadata, respectively; an output medium configured to include a first set of blocks of decompressed data each having a predetermined number of decompressed elements; and a set of single instruction multiple data (SIMD) processors configured to: access the first set of blocks of data from the input medium, respectively; decompress the first set of blocks of compressed data to generate the first set of blocks of decompressed data based on the first set of metadata, respectively; and provide the first set of blocks of decompressed data to the output medium, respectively.
    Type: Application
    Filed: June 22, 2023
    Publication date: April 11, 2024
    Inventors: Eric Wayne MAHURIN, Erich PLONDKE, Hitesh Kumar GUPTA, Colin Beaton VERRILLI, Rexford Alan HILL
  • Publication number: 20230306233
    Abstract: A processor-implemented method includes bit shifting a binary representation of a neural network parameter. The neural network parameter has fewer bits, b, than a number of hardware bits, B, supported by hardware that processes the neural network parameter. The bit shifting effectively multiplies the neural network parameter by 2^(B-b). The method also includes dividing a quantization scale by 2^(B-b) to obtain an updated quantization scale. The method further includes quantizing the bit shifted binary representation with the updated quantization scale to obtain a value for the neural network parameter.
    Type: Application
    Filed: January 30, 2023
    Publication date: September 28, 2023
    Inventors: Marinus Willem VAN BAALEN, Brian KAHNE, Eric Wayne MAHURIN, Tijmen Pieter Frederik BLANKEVOORT, Andrey KUZMIN, Andrii SKLIAR, Markus NAGEL
  • Patent number: 11669273
    Abstract: A device includes a scoreboard and a processor. The scoreboard includes scoreboard entries configured to store information regarding one or more uncompleted memory access operations. The scoreboard also includes a dependency matrix configured to store dependency information corresponding to the scoreboard entries. The processor is configured to retrieve a first memory access instruction that indicates a first address range of a first memory access operation, and to add an indication of the first memory access instruction to a first scoreboard entry. The processor is further configured to, based on determining that the first address range at least partially overlaps a second address range associated with a second scoreboard entry that corresponds to a second memory access instruction, set an element of the dependency matrix to have a has-dependency value indicating a dependency of the first scoreboard entry on the second scoreboard entry.
    Type: Grant
    Filed: February 3, 2021
    Date of Patent: June 6, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Eric Wayne Mahurin, Hitesh Kumar Gupta, Ahmad Radaideh
  • Patent number: 11669747
    Abstract: A method of constraining data represented in a deep neural network is described. The method includes determining an initial shifting specified to convert a fixed-point input value to a floating-point output value. The method also includes determining an additional shifting specified to constrain a dynamic range during converting of the fixed-point input value to the floating-point output value. The method further includes performing both the initial shifting and the additional shifting together to form a dynamic-range-constrained, normalized floating-point output value.
    Type: Grant
    Filed: October 29, 2019
    Date of Patent: June 6, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Rexford Alan Hill, Eric Wayne Mahurin, Aaron Douglass Lamb, Albert Danysh, Erich Plondke, David Hoyle
  • Patent number: 11609764
    Abstract: Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduler circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation to generate the same produced value as was generated by previous execution of the producer instruction in the instruction pipeline. Thus, the produced value will remain available in the instruction pipeline to again be available through a data forwarding path to an earlier stage of the instruction pipeline to be consumed by a consumer instruction, which may avoid a pipeline stall.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: March 21, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Eric Wayne Mahurin, Ahmad Mahmoud Radaideh
  • Patent number: 11586272
    Abstract: Systems and methods for power control based on performance modification through pulse modulation include an integrated circuit (IC) that may evaluate certain limit conditions within a computing device and compare the limit conditions to corresponding predefined thresholds. When a given predefined threshold is exceeded, an overage signal may be sent to a limits management circuit within the initial IC or another IC. The limits management circuit may generate a single-bit throttle signal through a pulse modulation circuit. The single-bit throttle signal may modify internal processing of an associated processor, which in turn changes power consumption.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: February 21, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Vijay Kiran Kalyanam, Eric Wayne Mahurin
  • Publication number: 20220365780
    Abstract: Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduler circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation to generate the same produced value as was generated by previous execution of the producer instruction in the instruction pipeline. Thus, the produced value will remain available in the instruction pipeline to again be available through a data forwarding path to an earlier stage of the instruction pipeline to be consumed by a consumer instruction, which may avoid a pipeline stall.
    Type: Application
    Filed: August 3, 2020
    Publication date: November 17, 2022
    Inventors: Eric Wayne Mahurin, Ahmad Mahmoud Radaideh
  • Publication number: 20220309314
    Abstract: Various embodiments include methods and devices for processing a neural network by an artificial intelligence (AI) processor. Embodiments may include receiving AI processor operating condition information, dynamically adjusting an AI quantization level for a segment of a neural network in response to the operating condition information, and processing the segment of the neural network using the adjusted AI quantization level.
    Type: Application
    Filed: March 24, 2021
    Publication date: September 29, 2022
    Inventors: Hee Jun PARK, Eric Wayne MAHURIN, Tijmen Pieter Frederik BLANKEVOORT
  • Patent number: 11287872
    Abstract: Systems and methods for multi-thread power limiting via a shared limit estimate power consumed in a processing core on a thread-by-thread basis by counting how many power events occur in each thread. Power consumed by each thread is approximated based on the number of power events that have occurred. Power consumed by individual threads is compared to a shared power limit derived from a sum of the power consumed by all threads. Threads that are above the shared power limit are stalled while threads below the shared power limit are allowed to continue without throttling. In this fashion, the most power-intensive threads are throttled to stay below the shared power limit while still maintaining performance.
    Type: Grant
    Filed: March 25, 2020
    Date of Patent: March 29, 2022
    Assignee: Qualcomm Incorporated
    Inventors: Eric Wayne Mahurin, Vijay Kiran Kalyanam
  • Publication number: 20220035891
    Abstract: Matrix multiply operations may use a reduced result matrix to increase the speed and accuracy of the operation. In one example, each higher precision row/column is decomposed into multiple component rows/columns of the base type that can be combined as weighted sums to form the original higher precision row/column. In another example, the decomposition may be independent for each input matrix and decompose to any multiple of the base type. In another example, the base type for each input matrix could be different. In another example, after decomposition, a matrix operation is performed (e.g., matrix multiply, convolutional layer, or possibly other matrix operation) on decomposed base type input matrices to yield a result matrix that contains components of the higher precision results. The results may be combined together to obtain higher-precision results.
    Type: Application
    Filed: July 30, 2021
    Publication date: February 3, 2022
    Inventors: Eric Wayne MAHURIN, Erich PLONDKE
  • Publication number: 20210240251
    Abstract: Systems and methods for multi-thread power limiting via a shared limit estimate power consumed in a processing core on a thread-by-thread basis by counting how many power events occur in each thread. Power consumed by each thread is approximated based on the number of power events that have occurred. Power consumed by individual threads is compared to a shared power limit derived from a sum of the power consumed by all threads. Threads that are above the shared power limit are stalled while threads below the shared power limit are allowed to continue without throttling. In this fashion, the most power-intensive threads are throttled to stay below the shared power limit while still maintaining performance.
    Type: Application
    Filed: March 25, 2020
    Publication date: August 5, 2021
    Inventors: Eric Wayne Mahurin, Vijay Kiran Kalyanam
  • Publication number: 20210241070
    Abstract: A device includes one or more processors configured to retrieve a first block of data, the data corresponding to an array of values arranged along at least a first dimension and a second dimension, to retrieve at least a portion of a second block of the data, and to perform a first hybrid convolution operation that applies a filter across the first block and at least the portion of the second block to generate output data. The output data includes a first accumulated block and at least a portion of a second accumulated block. The one or more processors are also configured to store the first accumulated block as first output data. The portion of the second block is adjacent to the first block along the first dimension and the portion of the second accumulated block is adjacent to the first accumulated block along the second dimension.
    Type: Application
    Filed: February 2, 2021
    Publication date: August 5, 2021
    Inventor: Eric Wayne MAHURIN
  • Publication number: 20210240394
    Abstract: A device includes a scoreboard and a processor. The scoreboard includes scoreboard entries configured to store information regarding one or more uncompleted memory access operations. The scoreboard also includes a dependency matrix configured to store dependency information corresponding to the scoreboard entries. The processor is configured to retrieve a first memory access instruction that indicates a first address range of a first memory access operation, and to add an indication of the first memory access instruction to a first scoreboard entry. The processor is further configured to, based on determining that the first address range at least partially overlaps a second address range associated with a second scoreboard entry that corresponds to a second memory access instruction, set an element of the dependency matrix to have a has-dependency value indicating a dependency of the first scoreboard entry on the second scoreboard entry.
    Type: Application
    Filed: February 3, 2021
    Publication date: August 5, 2021
    Inventors: Eric Wayne MAHURIN, Hitesh Kumar GUPTA, Ahmad RADAIDEH
  • Publication number: 20210096635
    Abstract: Systems and methods for power control based on performance modification through pulse modulation include an integrated circuit (IC) that may evaluate certain limit conditions within a computing device and compare the limit conditions to corresponding predefined thresholds. When a given predefined threshold is exceeded, an overage signal may be sent to a limits management circuit within the initial IC or another IC. The limits management circuit may generate a single-bit throttle signal through a pulse modulation circuit. The single-bit throttle signal may modify internal processing of an associated processor, which in turn changes power consumption.
    Type: Application
    Filed: October 31, 2019
    Publication date: April 1, 2021
    Inventors: Vijay Kiran Kalyanam, Eric Wayne Mahurin
  • Patent number: 10860051
    Abstract: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal. The CGS further includes a voltage-clock gate (VCG) circuit coupled to the digital power estimator. The VCG circuit is configured to gate and un-gate the clock signal based on the indications prior to occurrence of a voltage droop event and using hardware voltage model circuitry of the VCG circuit. The VCG circuit is further configured to gate the clock signal based on an undershoot phase associated with the voltage droop event and to un-gate the clock signal based on an overshoot phase associated with the voltage droop event.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: December 8, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Vijay Kiran Kalyanam, Eric Wayne Mahurin
  • Publication number: 20200134475
    Abstract: A method of constraining data represented in a deep neural network is described. The method includes determining an initial shifting specified to convert a fixed-point input value to a floating-point output value. The method also includes determining an additional shifting specified to constrain a dynamic range during converting of the fixed-point input value to the floating-point output value. The method further includes performing both the initial shifting and the additional shifting together to form a dynamic-range-constrained, normalized floating-point output value.
    Type: Application
    Filed: October 29, 2019
    Publication date: April 30, 2020
    Inventors: Rexford Alan HILL, Eric Wayne MAHURIN, Aaron Douglass LAMB, Albert DANYSH, Erich PLONDKE, David HOYLE
  • Publication number: 20200081479
    Abstract: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal. The CGS further includes a voltage-clock gate (VCG) circuit coupled to the digital power estimator. The VCG circuit is configured to gate and un-gate the clock signal based on the indications prior to occurrence of a voltage droop event and using hardware voltage model circuitry of the VCG circuit. The VCG circuit is further configured to gate the clock signal based on an undershoot phase associated with the voltage droop event and to un-gate the clock signal based on an overshoot phase associated with the voltage droop event.
    Type: Application
    Filed: September 6, 2019
    Publication date: March 12, 2020
    Inventors: Vijay Kiran KALYANAM, Eric Wayne MAHURIN
  • Patent number: 10489155
    Abstract: Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to odd-numbered data elements of the other vector operand expressed as a single register.
    Type: Grant
    Filed: July 21, 2015
    Date of Patent: November 26, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Eric Wayne Mahurin, Ajay Anant Ingle
  • Patent number: 10459723
    Abstract: Systems and methods relate to performing data movement operations using single instruction multiple data (SIMD) instructions. A first SIMD instruction comprises a first input data vector having a number N of two or more data elements in corresponding N SIMD lanes and a control vector having N control elements in the corresponding N SIMD lanes. A first multi-stage cube network is controllable by the first SIMD instruction, and includes movement elements, with one movement element per SIMD lane, per stage. A movement element selects between one of two data elements based on a corresponding control element and moves the data elements across the stages of the first multi-stage cube network by a zero distance or power-of-two distance between adjacent stages to generate a first output data vector. A second multi-stage cube network can be used in conjunction to generate all possible data movement operations of the input data vector.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: October 29, 2019
    Assignee: QUALCOMM Incorporated
    Inventor: Eric Wayne Mahurin
  • Patent number: 10152101
    Abstract: Systems and methods relate to controlling voltage deviations in processing systems. A scheduler receives transactions to be issued for execution in a pipeline. A voltage deviation that will occur if a particular transaction is executed in the pipeline is estimated before the transaction is issued. Threshold comparators are used to determine if the estimated voltage deviation will exceed specified thresholds to cause voltage overshoots or undershoots. The scheduler is configured to implement one or more corrective measures, such as increasing or decreasing energy in the pipeline, to mitigate possible voltage overshoots or undershoots, before the transaction is issued to be executed in the pipeline.
    Type: Grant
    Filed: September 22, 2015
    Date of Patent: December 11, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Eric Wayne Mahurin, Sanjay Bhagawan Patil, Martin Pierre Saint-Laurent
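
To illustrate the arithmetic summarized in the abstract of Publication number 20230306233 above, here is a minimal Python sketch. It is not the applicants' implementation. It assumes the parameter is stored as a signed integer q with an associated floating-point scale, so that the represented value is q * scale, and that widening from b bits to B hardware bits is done with a left shift of B - b. The function name shift_to_hardware_width and the example values are hypothetical.

```python
from typing import Tuple


def shift_to_hardware_width(q: int, scale: float, b: int, B: int) -> Tuple[int, float]:
    """Widen a b-bit integer parameter to B hardware bits (hypothetical helper).

    Returns the shifted integer and the updated scale, chosen so that
    shifted * updated_scale represents the same value as q * scale.
    """
    if not 0 < b <= B:
        raise ValueError("expected 0 < b <= B")
    shift = B - b
    shifted = q << shift                  # multiplies q by 2^(B-b)
    updated_scale = scale / (1 << shift)  # divides the scale by 2^(B-b)
    return shifted, updated_scale


# Example values are assumptions chosen for illustration only.
q, scale = 5, 0.25                        # a 4-bit parameter with scale 0.25
shifted, updated_scale = shift_to_hardware_width(q, scale, b=4, B=8)
assert shifted * updated_scale == q * scale  # represented value is unchanged
```

Because the shift and the scale adjustment use the same power of two, the two factors cancel and the represented value stays the same in this example, while the integer now occupies the full hardware width.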