Patents by Inventor Nicolas C. Galoppo Von Borries
Nicolas C. Galoppo Von Borries has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20200051203Abstract: An apparatus to facilitate processing of a sparse matrix is disclosed. The apparatus includes a plurality of processing units each comprising one or more processing elements, including logic to read operands, a multiplication unit to multiply two or more operands and a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit.Type: ApplicationFiled: May 20, 2019Publication date: February 13, 2020Applicant: Intel CorporationInventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajikshore Barik, Nicolas C. Galoppo Von Borries
-
Publication number: 20200034946Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.Type: ApplicationFiled: August 5, 2019Publication date: January 30, 2020Applicant: Intel CorporationInventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
-
Publication number: 20200019401Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.Type: ApplicationFiled: June 18, 2019Publication date: January 16, 2020Applicant: Intel CorporationInventors: Feng Chen, Narayan Srinivasa, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Joydeep Ray, Nicolas C. Galoppo Von Borries, Prasoonkumar Surti, Ben J. Ashbaugh, Sanjeev Jahagirdar, Vasanth Ranganathan
-
Patent number: 10521349Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.Type: GrantFiled: February 15, 2019Date of Patent: December 31, 2019Assignee: INTEL CORPORATIONInventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
-
Publication number: 20190369988Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.Type: ApplicationFiled: June 5, 2019Publication date: December 5, 2019Applicant: Intel CorporationInventors: HIMANSHU KAUL, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 10474458Abstract: One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between integer data path and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.Type: GrantFiled: October 18, 2017Date of Patent: November 12, 2019Assignee: Intel CorporationInventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Publication number: 20190332903Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a bipolar binary weight associated with a neural network and an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input based on the bipolar binary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.Type: ApplicationFiled: July 8, 2019Publication date: October 31, 2019Applicant: Intel CorporationInventors: Kevin Nealis, Anbang Yao, Xiaoming Chen, Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha
-
Patent number: 10417734Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.Type: GrantFiled: September 7, 2017Date of Patent: September 17, 2019Assignee: INTEL CORPORATIONInventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
-
Patent number: 10417731Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.Type: GrantFiled: April 24, 2017Date of Patent: September 17, 2019Assignee: INTEL CORPORATIONInventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
-
Patent number: 10410098Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including an input value and a quantized weight value associated with a neural network and an arithmetic logic unit including a barrel shifter, an adder, and an accumulator register, wherein to execute the decoded instruction, the barrel shifter is to shift the input value by the quantized weight value to generate a shifted input value and the adder is to add the shifted input value to a value stored in the accumulator register and update the value stored in the accumulator register.Type: GrantFiled: April 24, 2017Date of Patent: September 10, 2019Assignee: Intel CorporationInventors: Kevin Nealis, Anbang Yao, Xiaoming Chen, Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha
-
Publication number: 20190243764Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: February 15, 2019Publication date: August 8, 2019Applicant: Intel CorporationInventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
-
Patent number: 10353706Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.Type: GrantFiled: November 21, 2017Date of Patent: July 16, 2019Assignee: Intel CorporationInventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 10346944Abstract: An apparatus to facilitate processing of a sparse matrix is disclosed. The apparatus includes a plurality of processing units each comprising one or more processing elements, including logic to read operands, a multiplication unit to multiply two or more operands and a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit.Type: GrantFiled: April 9, 2017Date of Patent: July 9, 2019Assignee: INTEL CORPORATIONInventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
-
Patent number: 10346166Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.Type: GrantFiled: April 28, 2017Date of Patent: July 9, 2019Assignee: INTEL CORPORATIONInventors: Feng Chen, Narayan Srinivasa, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Joydeep Ray, Nicolas C. Galoppo Von Borries, Prasoonkumar Surti, Ben J. Ashbaugh, Sanjeev Jahagirdar, Vasanth Ranganathan
-
Publication number: 20190139182Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.Type: ApplicationFiled: November 21, 2018Publication date: May 9, 2019Applicant: Intel CorporationInventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
-
Patent number: 10261903Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.Type: GrantFiled: April 17, 2017Date of Patent: April 16, 2019Assignee: INTEL CORPORATIONInventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
-
Patent number: 10186011Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.Type: GrantFiled: April 28, 2017Date of Patent: January 22, 2019Assignee: Intel CorporationInventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
-
Publication number: 20180315398Abstract: One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between integer data path and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.Type: ApplicationFiled: October 18, 2017Publication date: November 1, 2018Applicant: Intel CorporationInventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Publication number: 20180315399Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.Type: ApplicationFiled: November 21, 2017Publication date: November 1, 2018Applicant: Intel CorporationInventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Publication number: 20180314521Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.Type: ApplicationFiled: April 28, 2017Publication date: November 1, 2018Applicant: Intel CorporationInventors: Feng Chen, Narayan Srinivasa, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Joydeep Ray, Nicolas C. Galoppo Von Borries, Prasoonkumar Surti, Ben J. Ashbaugh, Sanjeev Jahagirdar, Vasanth Ranganathan