Patents by Inventor Eric Wayne Mahurin
Eric Wayne Mahurin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240118902
Abstract: An aspect of the disclosure relates to a data processing system, including: an input medium configured to include a first set of blocks of data including a first set of blocks of compressed data and a first set of metadata, respectively; an output medium configured to include a first set of blocks of decompressed data each having a predetermined number of decompressed elements; and a set of single instruction multiple data (SIMD) processors configured to: access the first set of blocks of data from the input medium, respectively; decompress the first set of blocks of compressed data to generate the first set of blocks of decompressed data based on the first set of metadata, respectively; and provide the first set of blocks of decompressed data to the output medium, respectively.
Type: Application
Filed: June 22, 2023
Publication date: April 11, 2024
Inventors: Eric Wayne MAHURIN, Erich PLONDKE, Hitesh Kumar GUPTA, Colin Beaton VERRILLI, Rexford Alan HILL
-
Publication number: 20230306233
Abstract: A processor-implemented method includes bit shifting a binary representation of a neural network parameter. The neural network parameter has fewer bits, b, than a number of hardware bits, B, supported by hardware that processes the neural network parameter. The bit shifting effectively multiplies the neural network parameter by 2^(B-b). The method also includes dividing a quantization scale by 2^(B-b) to obtain an updated quantization scale. The method further includes quantizing the bit shifted binary representation with the updated quantization scale to obtain a value for the neural network parameter.
Type: Application
Filed: January 30, 2023
Publication date: September 28, 2023
Inventors: Marinus Willem VAN BAALEN, Brian KAHNE, Eric Wayne MAHURIN, Tijmen Pieter Frederik BLANKEVOORT, Andrey KUZMIN, Andrii SKLIAR, Markus NAGEL
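The rescaling in this abstract can be illustrated with a minimal Python sketch. The function name widen_parameter and the example values are hypothetical and do not come from the application. The sketch assumes only that the parameter is stored as a signed integer code q with a scale, so that the represented value is q * scale.

```python
from typing import Tuple


def widen_parameter(q: int, scale: float, b: int, B: int) -> Tuple[int, float]:
    """Shift a b-bit integer code into a B-bit container and rescale.

    Hypothetical helper: shifting left by (B - b) multiplies the code by
    2**(B - b), so the scale is divided by the same factor to compensate.
    """
    if not 0 < b <= B:
        raise ValueError("expected 0 < b <= B")
    shift = B - b
    q_shifted = q << shift                # code multiplied by 2**(B - b)
    scale_updated = scale / (1 << shift)  # scale divided by 2**(B - b)
    return q_shifted, scale_updated


# A 4-bit code widened for 8-bit hardware (illustrative values only).
q, scale = -3, 0.25
q8, scale8 = widen_parameter(q, scale, b=4, B=8)

# The represented value is unchanged by the shift-and-rescale step.
assert q8 * scale8 == q * scale
```

Because both factors are powers of two, the product q * scale is preserved exactly in this example; only the integer code's alignment within the wider container changes.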
-
Patent number: 11669273
Abstract: A device includes a scoreboard and a processor. The scoreboard includes scoreboard entries configured to store information regarding one or more uncompleted memory access operations. The scoreboard also includes a dependency matrix configured to store dependency information corresponding to the scoreboard entries. The processor is configured to retrieve a first memory access instruction that indicates a first address range of a first memory access operation, and to add an indication of the first memory access instruction to a first scoreboard entry. The processor is further configured to, based on determining that the first address range at least partially overlaps a second address range associated with a second scoreboard entry that corresponds to a second memory access instruction, set an element of the dependency matrix to have a has-dependency value indicating a dependency of the first scoreboard entry on the second scoreboard entry.
Type: Grant
Filed: February 3, 2021
Date of Patent: June 6, 2023
Assignee: Qualcomm Incorporated
Inventors: Eric Wayne Mahurin, Hitesh Kumar Gupta, Ahmad Radaideh
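As a rough illustration of the scoreboard and dependency matrix described in this abstract, the following Python sketch records address ranges and marks a dependency whenever a new range overlaps an existing one. The class and method names, the half-open range convention, and the complete/ready logic are assumptions made for the example, not details from the patent.

```python
class Scoreboard:
    """Toy scoreboard: entries hold address ranges, dep[i][j] means i waits on j."""

    def __init__(self, size: int):
        self.entries = [None] * size  # (start, end) half-open range, or None if free
        self.dep = [[False] * size for _ in range(size)]

    def add(self, start: int, length: int) -> int:
        """Record a memory access and mark dependencies on overlapping entries."""
        idx = self.entries.index(None)  # raises ValueError if the scoreboard is full
        end = start + length
        for j, other in enumerate(self.entries):
            # Two half-open ranges overlap when each starts before the other ends.
            if other is not None and start < other[1] and other[0] < end:
                self.dep[idx][j] = True  # the "has-dependency" value
        self.entries[idx] = (start, end)
        return idx

    def ready(self, idx: int) -> bool:
        """An entry may proceed once its dependency row is clear (assumed policy)."""
        return not any(self.dep[idx])

    def complete(self, idx: int) -> None:
        """Free an entry and clear every dependency that referred to it."""
        self.entries[idx] = None
        self.dep[idx] = [False] * len(self.entries)
        for row in self.dep:
            row[idx] = False


sb = Scoreboard(4)
first = sb.add(0x100, 64)    # covers 0x100..0x13F
second = sb.add(0x120, 16)   # overlaps the first range
third = sb.add(0x200, 8)     # disjoint from both
```

In this sketch, second depends on first until sb.complete(first) is called, while third is ready immediately.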
-
Patent number: 11669747
Abstract: A method of constraining data represented in a deep neural network is described. The method includes determining an initial shifting specified to convert a fixed-point input value to a floating-point output value. The method also includes determining an additional shifting specified to constrain a dynamic range during converting of the fixed-point input value to the floating-point output value. The method further includes performing both the initial shifting and the additional shifting together to form a dynamic, range constrained, normalized floating-point output value.
Type: Grant
Filed: October 29, 2019
Date of Patent: June 6, 2023
Assignee: Qualcomm Incorporated
Inventors: Rexford Alan Hill, Eric Wayne Mahurin, Aaron Douglass Lamb, Albert Danysh, Erich Plondke, David Hoyle
-
Patent number: 11609764
Abstract: Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduler circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation to generate the same produced value as was generated by previous execution of the producer instruction in the instruction pipeline. Thus, the produced value will remain available in the instruction pipeline to again be available through a data forwarding path to an earlier stage of the instruction pipeline to be consumed by a consumer instruction, which may avoid a pipeline stall.
Type: Grant
Filed: August 3, 2020
Date of Patent: March 21, 2023
Assignee: Qualcomm Incorporated
Inventors: Eric Wayne Mahurin, Ahmad Mahmoud Radaideh
-
Patent number: 11586272
Abstract: Systems and methods for power control based on performance modification through pulse modulation include an integrated circuit (IC) that may evaluate certain limit conditions within a computing device and compare the limit conditions to corresponding predefined thresholds. When a given predefined threshold is exceeded, an overage signal may be sent to a limits management circuit within the initial IC or another IC. The limits management circuit may generate a single-bit throttle signal through a pulse modulation circuit. The single-bit throttle signal may modify internal processing of an associated processor, which in turn changes power consumption.
Type: Grant
Filed: October 31, 2019
Date of Patent: February 21, 2023
Assignee: Qualcomm Incorporated
Inventors: Vijay Kiran Kalyanam, Eric Wayne Mahurin
-
Publication number: 20220365780
Abstract: Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduler circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation to generate the same produced value as was generated by previous execution of the producer instruction in the instruction pipeline. Thus, the produced value will remain available in the instruction pipeline to again be available through a data forwarding path to an earlier stage of the instruction pipeline to be consumed by a consumer instruction, which may avoid a pipeline stall.
Type: Application
Filed: August 3, 2020
Publication date: November 17, 2022
Inventors: Eric Wayne Mahurin, Ahmad Mahmoud Radaideh
-
Publication number: 20220309314
Abstract: Various embodiments include methods and devices for processing a neural network by an artificial intelligence (AI) processor. Embodiments may include receiving AI processor operating condition information, dynamically adjusting an AI quantization level for a segment of a neural network in response to the operating condition information, and processing the segment of the neural network using the adjusted AI quantization level.
Type: Application
Filed: March 24, 2021
Publication date: September 29, 2022
Inventors: Hee Jun PARK, Eric Wayne MAHURIN, Tijmen Pieter Frederik BLANKEVOORT
-
Patent number: 11287872
Abstract: Systems and methods for multi-thread power limiting via a shared limit estimate power consumed in a processing core on a thread-by-thread basis by counting how many power events occur in each thread. Power consumed by each thread is approximated based on the number of power events that have occurred. Power consumed by individual threads is compared to a shared power limit derived from a sum of the power consumed by all threads. Threads that are above the shared power limit are stalled while threads below the shared power limit are allowed to continue without throttling. In this fashion, the most power intensive threads are throttled to stay below the shared power limit while still maintaining performance.
Type: Grant
Filed: March 25, 2020
Date of Patent: March 29, 2022
Assignee: Qualcomm Incorporated
Inventors: Eric Wayne Mahurin, Vijay Kiran Kalyanam
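The per-thread comparison in this abstract can be sketched in a few lines of Python. The abstract does not say how the shared limit is derived from the summed power, so the rule below is purely an assumption for illustration: throttling is considered only when the summed estimate exceeds a budget, and the limit is then an equal share of that budget. The function name, the energy_per_event factor, and the budget parameter are likewise hypothetical.

```python
from typing import List, Sequence


def threads_to_stall(event_counts: Sequence[int],
                     energy_per_event: float,
                     budget: float) -> List[int]:
    """Return the indices of threads whose estimated power exceeds the shared limit."""
    # Approximate each thread's power from its count of power events.
    power = [count * energy_per_event for count in event_counts]
    if not power:
        return []
    total = sum(power)
    if total <= budget:
        return []  # under budget: no thread is throttled
    # Assumed derivation of the shared limit: an equal share of the budget.
    shared_limit = budget / len(power)
    return [tid for tid, p in enumerate(power) if p > shared_limit]


# Four threads with different activity levels (illustrative numbers only).
counts = [10, 40, 5, 25]
stalled = threads_to_stall(counts, energy_per_event=1.0, budget=60.0)
```

With these numbers the summed estimate is 80 against a budget of 60, so the assumed limit is 15 per thread and only threads 1 and 3 are stalled; the two lighter threads continue unthrottled.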
-
Publication number: 20220035891
Abstract: Matrix multiply operations may use a reduced result matrix to increase the speed and accuracy of the operation. In one example, each higher precision row/column is decomposed into multiple component rows/columns of the base type that can be combined as weighted sums to form the original higher precision row/column. In another example, the decomposition may be independent for each input matrix and decompose to any multiple of the base type. In another example, the base type for each input matrix could be different. In another example, after decomposition, a matrix operation is performed (e.g., matrix multiply, convolutional layer, or possibly other matrix operation) on decomposed base type input matrices to yield a result matrix that contains components of the higher precision results. The results may be combined together to obtain higher-precision results.
Type: Application
Filed: July 30, 2021
Publication date: February 3, 2022
Inventors: Eric Wayne MAHURIN, Erich PLONDKE
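The row decomposition described in this abstract can be illustrated with a small pure-Python sketch. It assumes unsigned 16-bit entries split into two unsigned 8-bit component rows (high and low bytes), with weights 256 and 1; these widths, the helper names, and the choice to decompose only the left-hand matrix are assumptions for the example, not details from the application.

```python
def matmul(a, b):
    """Plain matrix multiply on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def decompose_rows(a, base_bits=8):
    """Split each higher-precision row into a high row and a low row of the base type."""
    mask = (1 << base_bits) - 1
    out = []
    for row in a:
        out.append([v >> base_bits for v in row])  # high component, weight 2**base_bits
        out.append([v & mask for v in row])        # low component, weight 1
    return out


def recombine_rows(r, base_bits=8):
    """Combine consecutive (high, low) result rows as a weighted sum."""
    weight = 1 << base_bits
    return [[weight * hi + lo for hi, lo in zip(r[i], r[i + 1])]
            for i in range(0, len(r), 2)]


a16 = [[300, 1000], [65535, 2]]  # entries need 16 bits
b8 = [[1, 2], [3, 4]]            # entries already fit the 8-bit base type

components = decompose_rows(a16)    # four rows, every entry below 256
partial = matmul(components, b8)    # base-type multiply on the decomposed input
result = recombine_rows(partial)    # weighted sum restores the full-precision product
```

Because matrix multiplication is linear in each row, recombining the partial results with the same weights used in the decomposition gives the same product as multiplying the original matrix directly.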
-
Publication number: 20210240251
Abstract: Systems and methods for multi-thread power limiting via a shared limit estimate power consumed in a processing core on a thread-by-thread basis by counting how many power events occur in each thread. Power consumed by each thread is approximated based on the number of power events that have occurred. Power consumed by individual threads is compared to a shared power limit derived from a sum of the power consumed by all threads. Threads that are above the shared power limit are stalled while threads below the shared power limit are allowed to continue without throttling. In this fashion, the most power intensive threads are throttled to stay below the shared power limit while still maintaining performance.
Type: Application
Filed: March 25, 2020
Publication date: August 5, 2021
Inventors: Eric Wayne Mahurin, Vijay Kiran Kalyanam
-
Publication number: 20210241070
Abstract: A device includes one or more processors configured to retrieve a first block of data, the data corresponding to an array of values arranged along at least a first dimension and a second dimension, to retrieve at least a portion of a second block of the data, and to perform a first hybrid convolution operation that applies a filter across the first block and at least the portion of the second block to generate output data. The output data includes a first accumulated block and at least a portion of a second accumulated block. The one or more processors are also configured to store the first accumulated block as first output data. The portion of the second block is adjacent to the first block along the first dimension and the portion of the second accumulated block is adjacent to the first accumulated block along the second dimension.
Type: Application
Filed: February 2, 2021
Publication date: August 5, 2021
Inventor: Eric Wayne MAHURIN
-
Publication number: 20210240394
Abstract: A device includes a scoreboard and a processor. The scoreboard includes scoreboard entries configured to store information regarding one or more uncompleted memory access operations. The scoreboard also includes a dependency matrix configured to store dependency information corresponding to the scoreboard entries. The processor is configured to retrieve a first memory access instruction that indicates a first address range of a first memory access operation, and to add an indication of the first memory access instruction to a first scoreboard entry. The processor is further configured to, based on determining that the first address range at least partially overlaps a second address range associated with a second scoreboard entry that corresponds to a second memory access instruction, set an element of the dependency matrix to have a has-dependency value indicating a dependency of the first scoreboard entry on the second scoreboard entry.
Type: Application
Filed: February 3, 2021
Publication date: August 5, 2021
Inventors: Eric Wayne MAHURIN, Hitesh Kumar GUPTA, Ahmad RADAIDEH
-
Publication number: 20210096635
Abstract: Systems and methods for power control based on performance modification through pulse modulation include an integrated circuit (IC) that may evaluate certain limit conditions within a computing device and compare the limit conditions to corresponding predefined thresholds. When a given predefined threshold is exceeded, an overage signal may be sent to a limits management circuit within the initial IC or another IC. The limits management circuit may generate a single-bit throttle signal through a pulse modulation circuit. The single-bit throttle signal may modify internal processing of an associated processor, which in turn changes power consumption.
Type: Application
Filed: October 31, 2019
Publication date: April 1, 2021
Inventors: Vijay Kiran Kalyanam, Eric Wayne Mahurin
-
Patent number: 10860051
Abstract: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal. The CGS further includes a voltage-clock gate (VCG) circuit coupled to the digital power estimator. The VCG circuit is configured to gate and un-gate the clock signal based on the indications prior to occurrence of a voltage droop event and using hardware voltage model circuitry of the VCG circuit. The VCG circuit is further configured to gate the clock signal based on an undershoot phase associated with the voltage droop event and to un-gate the clock signal based on an overshoot phase associated with the voltage droop event.
Type: Grant
Filed: September 6, 2019
Date of Patent: December 8, 2020
Assignee: Qualcomm Incorporated
Inventors: Vijay Kiran Kalyanam, Eric Wayne Mahurin
-
Publication number: 20200134475
Abstract: A method of constraining data represented in a deep neural network is described. The method includes determining an initial shifting specified to convert a fixed-point input value to a floating-point output value. The method also includes determining an additional shifting specified to constrain a dynamic range during converting of the fixed-point input value to the floating-point output value. The method further includes performing both the initial shifting and the additional shifting together to form a dynamic, range constrained, normalized floating-point output value.
Type: Application
Filed: October 29, 2019
Publication date: April 30, 2020
Inventors: Rexford Alan HILL, Eric Wayne MAHURIN, Aaron Douglass LAMB, Albert DANYSH, Erich PLONDKE, David HOYLE
-
Publication number: 20200081479
Abstract: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal. The CGS further includes a voltage-clock gate (VCG) circuit coupled to the digital power estimator. The VCG circuit is configured to gate and un-gate the clock signal based on the indications prior to occurrence of a voltage droop event and using hardware voltage model circuitry of the VCG circuit. The VCG circuit is further configured to gate the clock signal based on an undershoot phase associated with the voltage droop event and to un-gate the clock signal based on an overshoot phase associated with the voltage droop event.
Type: Application
Filed: September 6, 2019
Publication date: March 12, 2020
Inventors: Vijay Kiran KALYANAM, Eric Wayne MAHURIN
-
Patent number: 10489155
Abstract: Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to odd-numbered data elements of the other vector operand expressed as a single register.
Type: Grant
Filed: July 21, 2015
Date of Patent: November 26, 2019
Assignee: QUALCOMM Incorporated
Inventors: Eric Wayne Mahurin, Ajay Anant Ingle
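The register-pair layout in this abstract can be sketched with Python lists standing in for vector registers. The sketch assumes that the first register of the pair holds the even-numbered elements of the single-register operand and the second register holds the odd-numbered ones; the function names and the 8-bit and 16-bit widths are hypothetical choices for the example.

```python
def widen(src):
    """Single register of narrow elements -> pair of registers of wide elements.

    first receives the even-numbered source elements and second the
    odd-numbered ones (assumed layout). Python integers are unbounded,
    so zero-extension to the wider type is implicit here.
    """
    first = src[0::2]
    second = src[1::2]
    return first, second


def narrow(first, second, bits=8):
    """Pair of registers of wide elements -> single register of narrow elements."""
    mask = (1 << bits) - 1
    out = []
    for even, odd in zip(first, second):
        out.append(even & mask)  # lands in an even-numbered destination slot
        out.append(odd & mask)   # lands in the following odd-numbered slot
    return out


first, second = widen([1, 2, 3, 4])
packed = narrow([0x101, 3], [2, 0x1FF])  # the upper bits are truncated
```

Here widen splits [1, 2, 3, 4] into [1, 3] and [2, 4], and narrow interleaves the pair back into a single register while keeping only the low 8 bits of each element.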
-
Patent number: 10459723
Abstract: Systems and methods relate to performing data movement operations using single instruction multiple data (SIMD) instructions. A first SIMD instruction comprises a first input data vector having a number N of two or more data elements in corresponding N SIMD lanes and a control vector having N control elements in the corresponding N SIMD lanes. A first multi-stage cube network is controllable by the first SIMD instruction, and includes movement elements, with one movement element per SIMD lane, per stage. A movement element selects between one of two data elements based on a corresponding control element and moves the data elements across the stages of the first multi-stage cube network by a zero distance or power-of-two distance between adjacent stages to generate a first output data vector. A second multi-stage cube network can be used in conjunction to generate all possible data movement operations of the input data vector.
Type: Grant
Filed: July 20, 2015
Date of Patent: October 29, 2019
Assignee: QUALCOMM Incorporated
Inventor: Eric Wayne Mahurin
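A multi-stage network of this kind can be modeled loosely in Python. The sketch below is an assumption-laden illustration, not the patented design: it assumes log2(N) stages, that stage s pairs lane i with lane i XOR 2^s, and that bit s of a lane's control element chooses between keeping the lane's own value (zero distance) and taking the partner's value (distance 2^s). The reverse flag is a hypothetical stand-in for a second network that visits the stages in the opposite order.

```python
def cube_network(data, control, reverse=False):
    """Route N lanes through log2(N) stages of two-way movement elements."""
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("lane count must be a power of two and at least 2")
    if len(control) != n:
        raise ValueError("need one control element per lane")
    stages = range(n.bit_length() - 1)  # stage indices 0 .. log2(n) - 1
    if reverse:
        stages = reversed(stages)
    current = list(data)
    for s in stages:
        distance = 1 << s
        # One movement element per lane: keep own value or take the partner's.
        current = [current[i ^ distance] if control[i] & distance else current[i]
                   for i in range(n)]
    return current


lanes = [10, 11, 12, 13]
swapped = cube_network(lanes, [1, 1, 1, 1])    # exchange adjacent lanes
flipped = cube_network(lanes, [3, 3, 3, 3])    # reverse the whole vector
broadcast = cube_network(lanes, [0, 1, 2, 3])  # copy lane 0 to every lane
```

In this model a single control vector is enough to express a pairwise swap, a full reversal, or a broadcast of lane 0, since each lane independently decides at every stage whether to move.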
-
Patent number: 10152101
Abstract: Systems and methods relate to controlling voltage deviations in processing systems. A scheduler receives transactions to be issued for execution in a pipeline. A voltage deviation that will occur if a particular transaction is executed in the pipeline is estimated before the transaction is issued. Threshold comparators are used to determine if the estimated voltage deviation will exceed specified thresholds to cause voltage overshoots or undershoots. The scheduler is configured to implement one or more corrective measures, such as increasing or decreasing energy in the pipeline, to mitigate possible voltage overshoots or undershoots, before the transaction is issued to be executed in the pipeline.
Type: Grant
Filed: September 22, 2015
Date of Patent: December 11, 2018
Assignee: QUALCOMM Incorporated
Inventors: Eric Wayne Mahurin, Sanjay Bhagawan Patil, Martin Pierre Saint-Laurent