Patents by Inventor Ben J. Ashbaugh
Ben J. Ashbaugh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20200019401Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.Type: ApplicationFiled: June 18, 2019Publication date: January 16, 2020Applicant: Intel CorporationInventors: Feng Chen, Narayan Srinivasa, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Joydeep Ray, Nicolas C. Galoppo Von Borries, Prasoonkumar Surti, Ben J. Ashbaugh, Sanjeev Jahagirdar, Vasanth Ranganathan
-
Patent number: 10521349Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.Type: GrantFiled: February 15, 2019Date of Patent: December 31, 2019Assignee: INTEL CORPORATIONInventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
-
Publication number: 20190318550Abstract: One embodiment provides for a computing device within an autonomous vehicle, the compute device comprising a wireless network device to enable a wireless data connection with an autonomous vehicle network, a set of multiple processors including a general-purpose processor and a general-purpose graphics processor, the set of multiple processors to execute a compute manager to manage execution of compute workloads associated with the autonomous vehicle, the compute workload associated with autonomous operations of the autonomous vehicle, and offload logic configured to execute on the set of multiple processors, the offload logic to determine to offload one or more of the compute workloads to one or more autonomous vehicles within range of the wireless network device.Type: ApplicationFiled: April 15, 2019Publication date: October 17, 2019Applicant: Intel CorporationInventors: BARATH LAKSHAMANAN, LINDA l. HURD, BEN J. ASHBAUGH, ELMOUSTAPHA OULD-AHMED-VALL, LIWEI MA, JINGYI JIN, JUSTIN E. GOTTSCHLICH, CHANDRASEKARAN SAKTHIVEL, MICHAEL S. STRICKLAND, BRIAN T. LEWIS, LINDSEY KUPER, ALTUG KOKER, ABHISHEK R. APPU, PRASOONKUMAR SURTI, JOYDEEP RAY, BALAJI VEMBU, JAVIER S. TUREK, NAILA FAROOQUI
-
Publication number: 20190304053Abstract: Embodiments described herein provide a graphics processor that can perform a variety of mixed and multiple precision instructions and operations. One embodiment provides a streaming multiprocessor that can concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for each of multiple instructions. The streaming multiprocessor can perform concurrent integer and floating-point operations and includes a mixed precision core to perform operations at multiple precisions.Type: ApplicationFiled: June 19, 2019Publication date: October 3, 2019Applicant: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
-
Patent number: 10417731Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.Type: GrantFiled: April 24, 2017Date of Patent: September 17, 2019Assignee: INTEL CORPORATIONInventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
-
Patent number: 10417734Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.Type: GrantFiled: September 7, 2017Date of Patent: September 17, 2019Assignee: INTEL CORPORATIONInventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
-
Patent number: 10409614Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands having differing precisions and a general-purpose graphics compute unit including a first logic unit and a second logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform a first instruction operation on a first set of operands of the multiple operands at a first precision and a simultaneously perform second instruction operation on a second set of operands of the multiple operands at a second precision.Type: GrantFiled: April 24, 2017Date of Patent: September 10, 2019Assignee: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Tatiana Shpeisman, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma
-
Publication number: 20190243764Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: February 15, 2019Publication date: August 8, 2019Applicant: Intel CorporationInventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
-
Patent number: 10346166Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.Type: GrantFiled: April 28, 2017Date of Patent: July 9, 2019Assignee: INTEL CORPORATIONInventors: Feng Chen, Narayan Srinivasa, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Joydeep Ray, Nicolas C. Galoppo Von Borries, Prasoonkumar Surti, Ben J. Ashbaugh, Sanjeev Jahagirdar, Vasanth Ranganathan
-
Publication number: 20190206020Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction. The at least one single instruction is to cause at least a portion of the GPU to perform a floating point operation on input having differing precisions. The floating point operation is a two-dimensional matrix multiply and accumulate operation.Type: ApplicationFiled: November 21, 2018Publication date: July 4, 2019Applicant: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
-
Patent number: 10338953Abstract: A mechanism is described for facilitating execution-aware hybrid preemption for execution of tasks in computing environments. A method of embodiments, as described herein, includes detecting a software application being hosted by a computing device, where the software applications to facilitate one or more tasks that are capable of being executed by a graphics processor of the computing device. The method may further include selecting at least one of a fine grain preemption or a coarse grain preemption based on comparison of a first time estimation and a second time estimation relating to the one or more tasks at thread level execution and work group level execution, respectively. The method may further include initiating performance of the selected one of the fine grain preemption and the coarse grain preemption.Type: GrantFiled: March 18, 2016Date of Patent: July 2, 2019Assignee: Intel CorporationInventors: Ben J. Ashbaugh, Raun M. Krisch
-
Patent number: 10332320Abstract: One embodiment provides for a computing device within an autonomous vehicle, the compute device comprising a wireless network device to enable a wireless data connection with an autonomous vehicle network, a set of multiple processors including a general-purpose processor and a general-purpose graphics processor, the set of multiple processors to execute a compute manager to manage execution of compute workloads associated with the autonomous vehicle, the compute workload associated with autonomous operations of the autonomous vehicle, and offload logic configured to execute on the set of multiple processors, the offload logic to determine to offload one or more of the compute workloads to one or more autonomous vehicles within range of the wireless network device.Type: GrantFiled: April 17, 2017Date of Patent: June 25, 2019Assignee: Intel CorporationInventors: Barath Lakshamanan, Linda L. Hurd, Ben J. Ashbaugh, Elmoustapha Ould-Ahmed-Vall, Liwei Ma, Jingyi Jin, Justin E. Gottschlich, Chandrasekaran Sakthivel, Michael S. Strickland, Brian T. Lewis, Lindsey Kuper, Altug Koker, Abhishek R. Appu, Prasoonkumar Surti, Joydeep Ray, Balaji Vembu, Javier S. Turek, Naila Farooqui
-
Publication number: 20190188554Abstract: Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.Type: ApplicationFiled: February 22, 2019Publication date: June 20, 2019Applicant: Intel CorporationInventors: Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
-
Publication number: 20190146800Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The streaming multiprocessor comprises multiple processing blocks including multiple processing cores. The processing cores include independent integer and floating-point data paths that are configurable to concurrently execute multiple independent instructions. A memory is coupled with the multiple processing blocks.Type: ApplicationFiled: December 20, 2018Publication date: May 16, 2019Applicant: Intel CorporationInventors: ELMOUSTAPHA OULD-AHMED-VALL, BARATH LAKSHMANAN, TATIANA SHPEISMAN, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma
-
Patent number: 10261903Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.Type: GrantFiled: April 17, 2017Date of Patent: April 16, 2019Assignee: INTEL CORPORATIONInventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
-
Patent number: 10242423Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction; the at least one single instruction to cause at least a portion of the GPU to perform a floating-point operation on input having differing precisions; and the floating-point operation is a two-dimensional matrix multiply and accumulate operation.Type: GrantFiled: October 20, 2017Date of Patent: March 26, 2019Assignee: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
-
Publication number: 20180314521Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.Type: ApplicationFiled: April 28, 2017Publication date: November 1, 2018Applicant: Intel CorporationInventors: Feng Chen, Narayan Srinivasa, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Joydeep Ray, Nicolas C. Galoppo Von Borries, Prasoonkumar Surti, Ben J. Ashbaugh, Sanjeev Jahagirdar, Vasanth Ranganathan
-
Publication number: 20180315159Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction; the at least one single instruction to cause at least a portion of the GPU to perform a floating-point operation on input having differing precisions; and the floating-point operation is a two-dimensional matrix multiply and accumulate operation.Type: ApplicationFiled: October 20, 2017Publication date: November 1, 2018Applicant: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
-
Publication number: 20180307494Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands having differing precisions and a general-purpose graphics compute unit including a first logic unit and a second logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform a first instruction operation on a first set of operands of the multiple operands at a first precision and a simultaneously perform second instruction operation on a second set of operands of the multiple operands at a second precision.Type: ApplicationFiled: April 24, 2017Publication date: October 25, 2018Applicant: Intel CorporationInventors: ELMOUSTAPHA OULD-AHMED-VALL, BARATH LAKSHMANAN, TATIANA SHPEISMAN, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma
-
Publication number: 20180308208Abstract: An apparatus to facilitate compute optimization is disclosed.Type: ApplicationFiled: November 21, 2017Publication date: October 25, 2018Applicant: Intel CorporationInventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu