Patents by Inventor Kyle Ernewein
Kyle Ernewein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240112090
Abstract: Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model, along with performance metric targets for performance of the inferences. At least a first inference is performed on the data set using the machine learning model to meet a latency specified for generation of the first inference from receipt of the request. While performing at least the first inference, operational parameters resulting in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are applied to performance of subsequent inferences using the machine learning model.
Type: Application
Filed: December 13, 2023
Publication date: April 4, 2024
Inventors: Serag Gadelrab, James Lyall Esliger, Meghal Varia, Kyle Ernewein, Alwyn Dos Remedios, George Lee
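The abstract above describes serving a first inference immediately (to meet the specified latency) while a concurrent search identifies operational parameters that approach the performance targets, which are then applied to later inferences. A minimal Python sketch of that idea follows; all names, the candidate-parameter list, and the latency figures are hypothetical illustrations, not details from the patent:

```python
import threading

def run_inference(batch, params):
    # Stand-in for executing the model with the given operational parameters.
    return [x * params["scale"] for x in batch]

def find_best_params(candidates, target_latency_ms):
    # Stand-in tuning loop: pick the candidate whose estimated latency comes
    # closest to the target without exceeding it.
    feasible = [c for c in candidates if c["est_latency_ms"] <= target_latency_ms]
    return max(feasible, key=lambda c: c["est_latency_ms"]) if feasible else candidates[0]

class TuningInferenceEngine:
    """Serve the first inference with default parameters while a background
    thread searches for parameters approaching the performance targets."""

    def __init__(self, default_params, candidates, target_latency_ms):
        self.params = default_params
        self._tuner = threading.Thread(
            target=self._tune, args=(candidates, target_latency_ms))
        self._tuner.start()

    def _tune(self, candidates, target_latency_ms):
        # Runs concurrently with the first inference(s).
        self.params = find_best_params(candidates, target_latency_ms)

    def infer(self, batch):
        self._tuner.join(timeout=0)  # never block an inference on tuning
        return run_inference(batch, self.params)
```

The key design point mirrored from the abstract is that `infer` never waits on the tuner: early requests use default parameters, and later requests pick up the tuned ones once the search completes.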
-
Patent number: 11907810
Abstract: Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model, along with performance metric targets for performance of the inferences. At least a first inference is performed on the data set using the machine learning model to meet a latency specified for generation of the first inference from receipt of the request. While performing at least the first inference, operational parameters resulting in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are applied to performance of subsequent inferences using the machine learning model.
Type: Grant
Filed: July 18, 2019
Date of Patent: February 20, 2024
Assignee: QUALCOMM Incorporated
Inventors: Serag Gadelrab, James Esliger, Meghal Varia, Kyle Ernewein, Alwyn Dos Remedios, George Lee
-
Patent number: 11029745
Abstract: Systems and methods are disclosed for controlling instantaneous current changes in parallel processors with arrays of parallel computing elements, such as neural processors. An exemplary method comprises monitoring the array of computing elements and determining a transition from a first activity level of the array to a second activity level, such as an idle-to-active or active-to-idle transition. Once a transition is determined, the array is selectively controlled to minimize the instantaneous current change resulting from the transition from the first activity level to the second activity level.
Type: Grant
Filed: November 8, 2018
Date of Patent: June 8, 2021
Assignee: QUALCOMM Incorporated
Inventors: Kyle Ernewein, Jason Edward Podaima, Francisco Perez, John Daniels, Alex Miler, Jeffrey Gemar, Rexford Alan Hill, Haoping Xu
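A common way to limit instantaneous current change (di/dt) during the idle-to-active and active-to-idle transitions described above is to stage the transition, enabling or disabling computing elements in bounded increments rather than all at once. The sketch below illustrates that general staging idea in Python; the class, the element counts, and the step limit are hypothetical, and the patent's actual control mechanism is not specified here:

```python
def ramp_schedule(current_active, target_active, max_step):
    """Yield intermediate active-element counts so no single step changes
    the count (a proxy for instantaneous current draw) by more than max_step."""
    n = current_active
    while n != target_active:
        step = max(-max_step, min(max_step, target_active - n))
        n += step
        yield n

class ArrayController:
    """Stage activity-level transitions of a computing-element array."""

    def __init__(self, max_step):
        self.active = 0          # array starts idle
        self.max_step = max_step

    def transition(self, target):
        # Idle-to-active or active-to-idle transition, spread over stages.
        stages = list(ramp_schedule(self.active, target, self.max_step))
        self.active = target
        return stages
```

For example, activating 256 elements with a step limit of 64 produces four stages instead of one abrupt jump, bounding the per-stage current change.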
-
Publication number: 20210019652
Abstract: Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model, along with performance metric targets for performance of the inferences. At least a first inference is performed on the data set using the machine learning model to meet a latency specified for generation of the first inference from receipt of the request. While performing at least the first inference, operational parameters resulting in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are applied to performance of subsequent inferences using the machine learning model.
Type: Application
Filed: July 18, 2019
Publication date: January 21, 2021
Inventors: Serag Gadelrab, James Esliger, Meghal Varia, Kyle Ernewein, Alwyn Dos Remedios, George Lee
-
Publication number: 20200195977
Abstract: An exemplary method for intelligent compression defines a threshold value for a temperature reading generated by a temperature sensor. Data blocks received into the compression module are compressed according to either a first mode or a second mode, the selection of which is determined by comparing the active temperature reading to the defined threshold value. The first compression mode may be associated with a lossless compression algorithm while the second compression mode is associated with a lossy compression algorithm. Alternatively, both compression modes may be associated with a lossless compression algorithm; in that case, data blocks received under the first compression mode are produced at a default high quality level setting, while under the second compression mode they are produced at a reduced quality level setting.
Type: Application
Filed: February 26, 2020
Publication date: June 18, 2020
Inventors: Serag Gadelrab, Chinchuan Chiu, Moinul Khan, Kyle Ernewein, Tom Longo, Simon Booth, Meghal Varia, Milivoje Aleksic
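The selection logic in this abstract reduces to a simple policy: compare the temperature reading against the threshold, and switch between a default-quality mode and a reduced-quality (or lossy) mode accordingly. The Python sketch below illustrates that policy; the threshold value, mode names, and the masking used to stand in for lossy compression are all hypothetical, not taken from the patent:

```python
def select_mode(temp_c, threshold_c):
    # Below the threshold: first mode (lossless, default quality).
    # At or above it: second mode (lossy, or lossless at reduced quality),
    # trading fidelity for lower power/thermal load.
    return "lossless_default" if temp_c < threshold_c else "lossy_reduced"

def compress(block, temp_c, threshold_c=85.0):
    """Compress a block of byte values under the mode chosen for the
    current temperature reading (threshold value is illustrative)."""
    mode = select_mode(temp_c, threshold_c)
    if mode == "lossless_default":
        return {"mode": mode, "data": list(block)}  # e.g. a lossless codec here
    # Lossy path sketched as quantization: discard the low-order bits.
    return {"mode": mode, "data": [b & ~0x0F for b in block]}
```

The point of the pattern is that the mode decision is made per block at compression time, so the system adapts as the sensor reading crosses the threshold in either direction.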
-
Patent number: 10609418
Abstract: An exemplary method for intelligent compression defines a threshold value for a temperature reading generated by a temperature sensor. Data blocks received into the compression module are compressed according to either a first mode or a second mode, the selection of which is determined by comparing the active temperature reading to the defined threshold value. The first compression mode may be associated with a lossless compression algorithm while the second compression mode is associated with a lossy compression algorithm. Alternatively, both compression modes may be associated with a lossless compression algorithm; in that case, data blocks received under the first compression mode are produced at a default high quality level setting, while under the second compression mode they are produced at a reduced quality level setting.
Type: Grant
Filed: April 18, 2017
Date of Patent: March 31, 2020
Assignee: QUALCOMM Incorporated
Inventors: Serag Gadelrab, Chinchuan Chiu, Moinul Khan, Kyle Ernewein, Tom Longo, Simon Booth, Meghal Varia, Milivoje Aleksic
-
Publication number: 20200073470
Abstract: Systems and methods are disclosed for controlling instantaneous current changes in parallel processors with arrays of parallel computing elements, such as neural processors. An exemplary method comprises monitoring the array of computing elements and determining a transition from a first activity level of the array to a second activity level, such as an idle-to-active or active-to-idle transition. Once a transition is determined, the array is selectively controlled to minimize the instantaneous current change resulting from the transition from the first activity level to the second activity level.
Type: Application
Filed: November 8, 2018
Publication date: March 5, 2020
Inventors: Kyle Ernewein, Jason Edward Podaima, Francisco Perez, John Daniels, Alex Miler, Jeffrey Gemar, Rexford Alan Hill, Haoping Xu
-
Patent number: 10484685
Abstract: An exemplary method for intelligent compression defines a threshold value for a key performance indicator. Based on the key performance indicator value, data blocks generated by a producer component may be scaled down to reduce power and/or bandwidth consumption before being compressed by a lossless compression module. The compressed data blocks are then stored in a memory component along with metadata that signals the scaling factor applied prior to compression. Consumer components that later retrieve the compressed data blocks from the memory component may decompress the data blocks and upscale them, if required, based on the scaling factor signaled by the metadata.
Type: Grant
Filed: April 18, 2017
Date of Patent: November 19, 2019
Assignee: QUALCOMM Incorporated
Inventors: Serag Gadelrab, Chinchuan Chiu, Moinul Khan, Kyle Ernewein, Tom Longo, Simon Booth, Meghal Varia, Milivoje Aleksic, King-Chung Lai
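The producer/consumer flow described above — scale down when a key performance indicator warrants it, compress losslessly, record the scaling factor as metadata, and let the consumer decompress and upscale from that metadata — can be sketched in a few lines of Python. The KPI threshold, the decimation-based downscale, and the repetition-based upscale are hypothetical stand-ins; the patent does not prescribe these specifics:

```python
import zlib  # stands in for the lossless compression module

def producer_compress(block: bytes, kpi_value: float, kpi_threshold: float):
    """Scale the block down when the KPI exceeds its threshold, compress
    losslessly, and record the scaling factor as metadata."""
    scale = 2 if kpi_value > kpi_threshold else 1
    scaled = block[::scale]  # crude 1-D downscale by decimation
    return {"payload": zlib.compress(scaled), "scale": scale}

def consumer_decompress(packet):
    """Decompress, then upscale if the signalled metadata says to."""
    data = zlib.decompress(packet["payload"])
    if packet["scale"] > 1:
        # Upscale by sample repetition, guided by the metadata.
        data = bytes(b for byte in data for b in [byte] * packet["scale"])
    return data
```

Carrying the scaling factor in metadata is what decouples the two sides: any consumer can reconstruct a full-size block without knowing what KPI condition the producer saw at compression time.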
-
Publication number: 20180302624
Abstract: An exemplary method for intelligent compression defines a threshold value for a temperature reading generated by a temperature sensor. Data blocks received into the compression module are compressed according to either a first mode or a second mode, the selection of which is determined by comparing the active temperature reading to the defined threshold value. The first compression mode may be associated with a lossless compression algorithm while the second compression mode is associated with a lossy compression algorithm. Alternatively, both compression modes may be associated with a lossless compression algorithm; in that case, data blocks received under the first compression mode are produced at a default high quality level setting, while under the second compression mode they are produced at a reduced quality level setting.
Type: Application
Filed: April 18, 2017
Publication date: October 18, 2018
Inventors: Serag Gadelrab, Chinchuan Chiu, Moinul Khan, Kyle Ernewein, Tom Longo, Simon Booth, Meghal Varia, Milivoje Aleksic
-
Publication number: 20180302625
Abstract: An exemplary method for intelligent compression defines a threshold value for a key performance indicator. Based on the key performance indicator value, data blocks generated by a producer component may be scaled down to reduce power and/or bandwidth consumption before being compressed by a lossless compression module. The compressed data blocks are then stored in a memory component along with metadata that signals the scaling factor applied prior to compression. Consumer components that later retrieve the compressed data blocks from the memory component may decompress the data blocks and upscale them, if required, based on the scaling factor signaled by the metadata.
Type: Application
Filed: April 18, 2017
Publication date: October 18, 2018
Inventors: Serag Gadelrab, Chinchuan Chiu, Moinul Khan, Kyle Ernewein, Tom Longo, Simon Booth, Meghal Varia, Milivoje Aleksic, King-Chung Lai
-
Publication number: 20180253236
Abstract: A method and system for dynamic control of shared memory resources within a portable computing device ("PCD") are disclosed. A limit request of an unacceptable deadline miss ("UDM") engine of the portable computing device may be determined with a limit request sensor within the UDM engine. Next, a memory management unit modifies a shared memory resource arbitration policy in view of the limit request. By modifying the shared memory resource arbitration policy, the memory management unit may intelligently allocate resources to service translation requests that are separately queued based on whether they emanated from a flooding engine or a non-flooding engine.
Type: Application
Filed: March 2, 2017
Publication date: September 6, 2018
Inventors: Serag Gadelrab, Jason Edward Podaima, Kyle Ernewein, Meghal Varia
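The arbitration scheme in this abstract keeps separate queues for translation requests from flooding and non-flooding engines, and reshapes the grant policy when a UDM engine raises a limit request. A minimal Python sketch of that queue-priority shift follows; the class name, the two-queue model, and the simple priority flip are illustrative assumptions, not the patent's actual arbitration policy:

```python
from collections import deque

class TranslationArbiter:
    """Separate queues for flooding and non-flooding engines; a limit
    request from a UDM engine shifts grants toward the non-flooding queue."""

    def __init__(self):
        self.flooding = deque()
        self.non_flooding = deque()
        self.limit_requested = False  # set when a UDM limit request arrives

    def enqueue(self, request, flooding):
        (self.flooding if flooding else self.non_flooding).append(request)

    def grant(self):
        # Default order serves the flooding queue first; under a limit
        # request, the deadline-sensitive non-flooding queue is served first.
        first, second = ((self.non_flooding, self.flooding)
                         if self.limit_requested
                         else (self.flooding, self.non_flooding))
        for queue in (first, second):
            if queue:
                return queue.popleft()
        return None  # nothing pending
```

Separating the queues is what makes the policy change cheap: the arbiter only reorders which queue it drains first, rather than inspecting or reordering individual requests.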
-
Patent number: 10067691
Abstract: A method and system for dynamic control of shared memory resources within a portable computing device ("PCD") are disclosed. A limit request of an unacceptable deadline miss ("UDM") engine of the portable computing device may be determined with a limit request sensor within the UDM engine. Next, a memory management unit modifies a shared memory resource arbitration policy in view of the limit request. By modifying the shared memory resource arbitration policy, the memory management unit may intelligently allocate resources to service translation requests that are separately queued based on whether they emanated from a flooding engine or a non-flooding engine.
Type: Grant
Filed: March 2, 2017
Date of Patent: September 4, 2018
Assignee: QUALCOMM Incorporated
Inventors: Serag Gadelrab, Jason Edward Podaima, Kyle Ernewein, Meghal Varia