Patents by Inventor Ramon Matas
Ramon Matas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240046065
Abstract: Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to determine options for decisions in connection with design features of a computing device. In a particular implementation, design options for two or more design decisions of a neural network processing device may be identified based, at least in part, on a combination of a definition of available computing resources and one or more predefined performance constraints.
Type: Application
Filed: August 3, 2022
Publication date: February 8, 2024
Inventors: Hokchhay Tann, Ramon Matas Navarro, Igor Fedorov, Chuteng Zhou, Paul Nicholas Whatmough, Matthew Mattina
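The abstract describes identifying design options from a combination of available computing resources and predefined performance constraints. A minimal sketch of that kind of constraint-based pruning is shown below; the option fields, resource numbers, and constraint values are hypothetical illustrations, not taken from the application.

```python
# Hypothetical sketch of constraint-based design-option identification.
# Field names, resource limits, and candidate options are illustrative only.
from dataclasses import dataclass

@dataclass
class DesignOption:
    name: str
    sram_kib: int         # on-chip buffer the option would require
    macs_per_cycle: int   # peak multiply-accumulate throughput
    est_latency_ms: float

AVAILABLE_SRAM_KIB = 512   # definition of available computing resources
LATENCY_BUDGET_MS = 5.0    # predefined performance constraint

candidates = [
    DesignOption("small-array", 128, 64, 9.0),
    DesignOption("medium-array", 256, 256, 4.2),
    DesignOption("large-array", 1024, 1024, 1.1),
]

# Keep only options that fit the available resources and meet the constraint.
feasible = [c for c in candidates
            if c.sram_kib <= AVAILABLE_SRAM_KIB
            and c.est_latency_ms <= LATENCY_BUDGET_MS]

for option in feasible:
    print(f"feasible design option: {option.name}")
```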
-
Publication number: 20230042271
Abstract: Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to select options for decisions in connection with design features of a computing device. In a particular implementation, design options for two or more design decisions of a neural network processing device may be selected based, at least in part, on a combination of function values that are computed based, at least in part, on a tensor expressing sample neural network weights.
Type: Application
Filed: August 4, 2021
Publication date: February 9, 2023
Inventors: Igor Fedorov, Ramon Matas Navarro, Chuteng Zhou, Hokchhay Tann, Paul Nicholas Whatmough, Matthew Mattina
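Here the selection is driven by function values computed over a tensor of sample network weights. In the sketch below, the "function values" are simple statistics such as sparsity and dynamic range; that choice, and the option names, are assumptions for illustration, not a claim about the actual functions used.

```python
import numpy as np

# Hypothetical sketch: rank design options by statistics of a sample weight tensor.
# The scoring functions and option names are illustrative assumptions.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(64, 64))   # tensor of sample network weights

sparsity = float(np.mean(np.abs(weights) < 1e-2))
dyn_range = float(np.max(np.abs(weights)) / (np.min(np.abs(weights)) + 1e-12))

# Each design option receives a score derived from the computed function values.
options = {
    "sparse-datapath": sparsity,                          # favours zero-skipping hardware
    "low-precision-datapath": 1.0 / np.log10(dyn_range + 10.0),
}
best = max(options, key=options.get)
print(f"selected design option: {best}")
```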
-
Patent number: 11405106
Abstract: The disclosure relates to a setup for receiving an optical data signal, having input optics for receiving the signal. An optical receiving fiber with an end facet is provided, into which the signal can be injected by an optical collimation system. A detector for detecting the optical data content is connected to the optical receiving fiber. A receive calibration source is provided, which is connected to the optical receiving fiber by a circulator. For adjusting the setup, an insertable retroreflector is provided in the light path so that light from the receive calibration source is reflected and focused by the optical collimation system onto the end facet of the receiving fiber. The distance in the z-direction between the optical collimation system and the end facet of the receiving fiber is adjusted based on the detected power of the light from the receive calibration source.
Type: Grant
Filed: January 10, 2020
Date of Patent: August 2, 2022
Assignee: Deutsches Zentrum für Luft- und Raumfahrt e.V.
Inventors: Fabian Rein, Juraj Poliak, Ramon Mata Calvo
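The final step (adjusting the z-distance using the detected calibration power) is essentially a one-dimensional optimisation. The toy hill-climbing sketch below assumes a Gaussian coupling-efficiency model; the model, focal distance, and step sizes are invented for illustration and are not from the patent.

```python
# Toy sketch: adjust the z-distance by maximising the detected calibration power.
# The coupling model, focal distance, and step size are illustrative assumptions.
import math

def detected_power(z_mm: float, focus_mm: float = 12.0) -> float:
    """Hypothetical coupling efficiency that peaks at the focal distance."""
    return math.exp(-((z_mm - focus_mm) ** 2) / 0.5)

z = 10.0      # initial collimator-to-facet distance (mm)
step = 0.2
best = detected_power(z)
for _ in range(100):
    for candidate in (z + step, z - step):
        p = detected_power(candidate)
        if p > best:
            best, z = p, candidate
            break
    else:
        step /= 2   # no improvement in either direction: refine the search
print(f"z = {z:.3f} mm, detected power = {best:.3f}")
```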
-
Patent number: 11307903
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
Type: Grant
Filed: January 31, 2018
Date of Patent: April 19, 2022
Assignee: NVIDIA Corporation
Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
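As a rough illustration of the scheduling policy described above (check a subcontext's processor credits, then launch the thread group on the least-loaded processor), the sketch below uses invented TPC names, loads, and credit counts; it is not the hardware's actual implementation.

```python
# Illustrative sketch of the described policy: check a subcontext's processor
# credits, then launch the thread group on the least-loaded processor.
# TPC names, loads, and credit counts are hypothetical.
from typing import Optional

processor_load = {"tpc0": 7, "tpc1": 3, "tpc2": 5}
subcontext_credits = {"ctx_a": 1, "ctx_b": 0}

def launch_thread_group(subcontext: str) -> Optional[str]:
    """Launch a thread group for `subcontext` on the least-loaded processor."""
    if subcontext_credits.get(subcontext, 0) < 1:
        # No credit; the real design may still launch onto an already-acquired
        # TPC with free space, which this toy model does not cover.
        return None
    target = min(processor_load, key=processor_load.get)   # least-loaded processor
    subcontext_credits[subcontext] -= 1
    processor_load[target] += 1
    return target

print(launch_thread_group("ctx_a"))   # -> "tpc1"
print(launch_thread_group("ctx_b"))   # -> None (no processor credits)
```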
-
Publication number: 20220092404
Abstract: A computer-implemented method of identifying a neural network for processing data includes: clustering a training dataset into a plurality of data clusters based on similarities in activation patterns generated in neurons of a teacher neural network in response to inputting the training dataset into the teacher neural network, training a student neural network for processing each of the plurality of data clusters, and providing a data classifier neural network for identifying one or more of the trained student neural networks to process data based on a data cluster of the data.
Type: Application
Filed: September 18, 2020
Publication date: March 24, 2022
Inventors: Mark John O'Connor, Ramon Matas Navarro
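A compact sketch of the pipeline this abstract describes (cluster inputs by teacher activation patterns, then route each input to a per-cluster student) follows. k-means and the trivial "teacher" and "students" are stand-ins chosen to keep the example self-contained; the application does not prescribe them.

```python
import numpy as np

# Illustrative sketch: cluster inputs by teacher activation patterns, then
# route each input to a per-cluster "student". k-means and the toy models
# are stand-ins, not the method prescribed by the application.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))          # training dataset
W = rng.normal(size=(8, 4))            # fixed toy "teacher" weights

def teacher_activations(x):
    """Toy teacher hidden layer whose activation pattern drives clustering."""
    return np.maximum(x @ W, 0.0)

A = teacher_activations(X)

def kmeans(data, k=2, iters=20):
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.stack([data[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, labels

centers, cluster_of = kmeans(A)

# "Train" one student per cluster; here a student just remembers its cluster's
# mean input, standing in for a real per-cluster network.
students = {j: X[cluster_of == j].mean(0) for j in range(len(centers))}

def route(x):
    """Data-classifier step: pick the student whose cluster the input falls in."""
    a = teacher_activations(x[None])[0]
    j = int(np.argmin(((centers - a) ** 2).sum(-1)))
    return j, students[j]

print("routed new input to student", route(rng.normal(size=8))[0])
```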
-
Publication number: 20220085885
Abstract: The disclosure relates to a setup for receiving an optical data signal, having input optics for receiving the signal. An optical receiving fiber with an end facet is provided, into which the signal can be injected by an optical collimation system. A detector for detecting the optical data content is connected to the optical receiving fiber. A receive calibration source is provided, which is connected to the optical receiving fiber by a circulator. For adjusting the setup, an insertable retroreflector is provided in the light path so that light from the receive calibration source is reflected and focused by the optical collimation system onto the end facet of the receiving fiber. The distance in the z-direction between the optical collimation system and the end facet of the receiving fiber is adjusted based on the detected power of the light from the receive calibration source.
Type: Application
Filed: January 10, 2020
Publication date: March 17, 2022
Inventors: Fabian Rein, Juraj Poliak, Ramon Mata Calvo
-
Patent number: 10817338
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
Type: Grant
Filed: January 31, 2018
Date of Patent: October 27, 2020
Assignee: NVIDIA Corporation
Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
-
Patent number: 10637576
Abstract: A transmitter for an optical free-beam communication system includes two light transmitters for the optical transmission of a data signal using single-sideband modulation, wherein each light transmitter emits one side of the sideband modulation so that the light signal arriving at a receiver corresponds to a double-sideband modulation.
Type: Grant
Filed: October 28, 2016
Date of Patent: April 28, 2020
Assignee: Deutsches Zentrum für Luft- und Raumfahrt e.V.
Inventors: Ramon Mata Calvo, Dirk Giggenbach, Christian Fuchs, Ahmad Mustafa
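The core idea (two transmitters each sending one sideband of the same message so that their sum looks like a double-sideband signal at the receiver) can be checked numerically for a single-tone message, as in the sketch below. The carrier and tone frequencies are arbitrary illustrative choices.

```python
import numpy as np

# Toy check of the two-transmitter idea: an upper-sideband and a lower-sideband
# signal of the same single-tone message sum to a double-sideband signal.
# Frequencies and amplitudes are arbitrary illustrative choices.
fs, fc, fm = 10_000.0, 1_000.0, 100.0            # sample, carrier, message freq (Hz)
t = np.arange(0, 0.1, 1 / fs)

upper_ssb = np.cos(2 * np.pi * (fc + fm) * t)    # transmitter 1: upper sideband
lower_ssb = np.cos(2 * np.pi * (fc - fm) * t)    # transmitter 2: lower sideband
received = upper_ssb + lower_ssb                 # superposition at the receiver

dsb = 2 * np.cos(2 * np.pi * fm * t) * np.cos(2 * np.pi * fc * t)
print("matches DSB:", np.allclose(received, dsb))   # True, by the product-to-sum identity
```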
-
Publication number: 20190235924
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
Type: Application
Filed: January 31, 2018
Publication date: August 1, 2019
Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
-
Publication number: 20190235928
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
Type: Application
Filed: January 31, 2018
Publication date: August 1, 2019
Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
-
Patent number: 10318427
Abstract: An instruction in a first cache line may be identified and an address associated with the instruction may be determined. The address may be determined to cross a cache line boundary associated with the first cache line and a second cache line. In response to determining that the address crosses the cache line boundary, the instruction may be adjusted based on a portion of the address included in the first cache line and a second instruction may be created based on a portion of the address included in the second cache line. The second instruction may be injected into an instruction pipeline after the adjusted instruction.
Type: Grant
Filed: December 18, 2014
Date of Patent: June 11, 2019
Assignee: Intel Corporation
Inventors: Ramon Matas, Chung-Lun Chan, Alexey P. Suprun, Aditya Kesiraju
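A rough model of the splitting step described here appears below, assuming a 64-byte cache line; the line size and the (address, length) representation are assumptions for the example, not details from the patent.

```python
# Illustrative sketch: split an access whose address range crosses a cache-line
# boundary into an adjusted first part and an injected second part.
# The 64-byte line size is an assumption for the example.
LINE_SIZE = 64

def split_crossing_access(addr: int, length: int):
    end = addr + length
    boundary = (addr // LINE_SIZE + 1) * LINE_SIZE
    if end <= boundary:
        return [(addr, length)]                 # does not cross: leave as-is
    first_len = boundary - addr                 # portion inside the first line
    return [(addr, first_len),                  # adjusted instruction
            (boundary, length - first_len)]     # injected second instruction

print(split_crossing_access(0x3C, 8))   # crosses: [(60, 4), (64, 4)]
print(split_crossing_access(0x10, 8))   # stays within one line: [(16, 8)]
```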
-
Patent number: 10175986
Abstract: A processor includes logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired from a reorder buffer, capture the output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.
Type: Grant
Filed: May 8, 2017
Date of Patent: January 8, 2019
Assignee: Intel Corporation
Inventors: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith
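A behavioural sketch of the counting-and-capture flow in this abstract follows: the counter increments on retirement of the designated micro-op, an overflow arms capture, the next executing instance's data linear address is captured, and a record is stored when that instance retires. All structures and the record format are simplified stand-ins, not the hardware's actual design.

```python
class PebsUnit:
    """Simplified stand-in for the counting-and-capture flow in the abstract."""
    def __init__(self, overflow_after: int):
        self.reload = overflow_after
        self.remaining = overflow_after
        self.armed = False
        self.pending_dla = None
        self.debug_store = []            # stand-in for the debug storage area

    def on_execute(self, uop: str, dla: int):
        # After the counter has overflowed, capture the DLA referenced by the
        # next executing instance of the designated micro-op.
        if uop == "designated" and self.armed and self.pending_dla is None:
            self.pending_dla = dla

    def on_retire(self, uop: str, rob_id: int):
        if uop != "designated":
            return
        if self.pending_dla is not None:
            # Store a record when the captured instance retires from the ROB.
            self.debug_store.append({"rob_id": rob_id, "dla": self.pending_dla})
            self.pending_dla, self.armed = None, False
            self.remaining = self.reload
            return
        self.remaining -= 1              # count retirements of the designated micro-op
        if self.remaining <= 0:
            self.armed = True            # counter overflow arms the capture

pebs = PebsUnit(overflow_after=2)
for i, dla in enumerate([0x1000, 0x2000, 0x3000]):
    pebs.on_execute("designated", dla)
    pebs.on_retire("designated", rob_id=i)
print(pebs.debug_store)                  # one record, for the third instance
```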
-
Publication number: 20180337729
Abstract: A transmitter for an optical free-beam communication system includes two light transmitters for the optical transmission of a data signal using single-sideband modulation, wherein each light transmitter emits one side of the sideband modulation so that the light signal arriving at a receiver corresponds to a double-sideband modulation.
Type: Application
Filed: October 28, 2016
Publication date: November 22, 2018
Inventors: Ramon Mata Calvo, Dirk Giggenbach, Christian Fuchs, Mustafa Ahmad
-
Patent number: 10108554
Abstract: Methods, systems, and apparatuses relating to sharing translation lookaside buffer entries are described. In one embodiment, a processor includes one or more cores to execute a plurality of threads, a translation lookaside buffer comprising a plurality of entries, each entry comprising a virtual address to physical address translation and a plurality of bit positions, and each set bit of the plurality of bit positions in each entry indicating that the virtual address to physical address translation is valid for a respective thread of the plurality of threads, and a memory management circuit to clear all set bits for a thread by asserting a reset command to a respective reset port of the translation lookaside buffer for the thread, wherein the translation lookaside buffer comprises a separate reset port for each of the plurality of threads.
Type: Grant
Filed: December 5, 2016
Date of Patent: October 23, 2018
Assignee: Intel Corporation
Inventors: Chung-Lun Chan, Ramon Matas
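A small software model of the sharing scheme described here: each TLB entry carries one valid bit per thread, and asserting a thread's reset port clears that thread's bit in every entry at once. The entry layout and field names below are illustrative, not the circuit itself.

```python
class SharedTlb:
    """Toy model: entries hold a translation plus one valid bit per thread."""
    def __init__(self, num_threads: int):
        self.num_threads = num_threads
        self.entries = {}                        # vpn -> (ppn, per-thread valid bitmask)

    def fill(self, vpn: int, ppn: int, thread: int):
        _, bits = self.entries.get(vpn, (ppn, 0))
        self.entries[vpn] = (ppn, bits | (1 << thread))    # set this thread's valid bit

    def lookup(self, vpn: int, thread: int):
        ppn, bits = self.entries.get(vpn, (None, 0))
        return ppn if bits & (1 << thread) else None       # valid only for this thread

    def reset_thread(self, thread: int):
        # Models asserting the per-thread reset port: clear all set bits for
        # that thread across every entry in one step.
        mask = ~(1 << thread)
        for vpn, (ppn, bits) in self.entries.items():
            self.entries[vpn] = (ppn, bits & mask)

tlb = SharedTlb(num_threads=2)
tlb.fill(0x10, 0x80, thread=0)
tlb.fill(0x10, 0x80, thread=1)      # the same translation is shared by both threads
tlb.reset_thread(0)
print(tlb.lookup(0x10, 0), tlb.lookup(0x10, 1))   # None 128
```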
-
Publication number: 20180157598
Abstract: Methods, systems, and apparatuses relating to sharing translation lookaside buffer entries are described. In one embodiment, a processor includes one or more cores to execute a plurality of threads, a translation lookaside buffer comprising a plurality of entries, each entry comprising a virtual address to physical address translation and a plurality of bit positions, and each set bit of the plurality of bit positions in each entry indicating that the virtual address to physical address translation is valid for a respective thread of the plurality of threads, and a memory management circuit to clear all set bits for a thread by asserting a reset command to a respective reset port of the translation lookaside buffer for the thread, wherein the translation lookaside buffer comprises a separate reset port for each of the plurality of threads.
Type: Application
Filed: December 5, 2016
Publication date: June 7, 2018
Inventors: Chung-Lun Chan, Ramon Matas
-
Patent number: 9891914
Abstract: An apparatus and method for performing an efficient scatter operation. For example, one embodiment of a processor comprises: an allocator unit to receive a scatter operation comprising a number of data elements and responsively allocate resources to execute the scatter operation; a memory execution cluster comprising at least a portion of the resources to execute the scatter operation, the resources including one or more store data buffers and one or more store address buffers; and a senior store pipeline to transfer store data elements from the store data buffers to system memory using addresses from the store address buffers prior to retirement of the scatter operation.
Type: Grant
Filed: April 10, 2015
Date of Patent: February 13, 2018
Assignee: Intel Corporation
Inventors: Ramon Matas, Alexey P. Suprun, Roger Gramunt, Chung-Lun Chan, Rammohan Padmanabhan
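A simplified model of the flow the abstract describes: scatter elements are placed in store-data and store-address buffers, and a senior-store stage drains them to memory before the scatter retires. The data structures and ordering below are assumptions made to keep the sketch small.

```python
# Simplified model of the described scatter flow: elements go into store-data
# and store-address buffers, and a senior-store stage drains them to memory
# before the scatter operation retires. Structures are illustrative.
memory = {}
store_data_buffer = []
store_address_buffer = []

def allocate_scatter(addresses, values):
    """Allocator step: place each element's address and data into the buffers."""
    store_address_buffer.extend(addresses)
    store_data_buffer.extend(values)

def senior_store_drain():
    """Senior-store step: write buffered elements to memory in order."""
    while store_address_buffer:
        addr = store_address_buffer.pop(0)
        val = store_data_buffer.pop(0)
        memory[addr] = val               # transferred to memory prior to retirement

allocate_scatter([0x100, 0x140, 0x180], [1, 2, 3])
senior_store_drain()                     # drain completes first...
scatter_retired = True                   # ...then the scatter operation can retire
print(memory)
```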
-
Patent number: 9886396
Abstract: In one embodiment, a processor includes a frontend unit having an instruction decoder to receive and to decode instructions of a plurality of threads, an execution unit coupled to the instruction decoder to receive and execute the decoded instructions, and an instruction retirement unit having a retirement logic to receive the instructions from the execution unit and to retire the instructions associated with one or more of the threads that have an instruction or an event pending to be retired. The instruction retirement unit includes a thread arbitration logic to select one of the threads at a time and to dispatch the selected thread to the retirement logic for retirement processing.
Type: Grant
Filed: December 23, 2014
Date of Patent: February 6, 2018
Assignee: Intel Corporation
Inventors: Roger Gramunt, Rammohan Padmanabhan, Ramon Matas, Neal S. Moyer, Benjamin C. Chaffin, Avinash Sodani, Alexey P. Suprun, Vikram S. Sundaram, Chung-Lun Chan, Gerardo A. Fernandez, Julio Gago, Michael S. Yang, Aditya Kesiraju
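A toy version of the thread-arbitration step described here is sketched below. The patent does not say the arbitration is round-robin; that policy, and the pending-instruction data, are assumptions for the example.

```python
# Toy arbitration sketch: pick one thread at a time among those with an
# instruction or event pending retirement. Round-robin order is an assumption;
# the patent does not specify the arbitration policy.
from collections import deque

pending = {0: deque(["add", "load"]), 1: deque(), 2: deque(["store"])}
rr_order = deque(sorted(pending))

def retire_next():
    """Select one thread with pending work and dispatch its oldest instruction."""
    for _ in range(len(rr_order)):
        thread = rr_order[0]
        rr_order.rotate(-1)                     # advance the round-robin pointer
        if pending[thread]:
            return thread, pending[thread].popleft()
    return None                                 # nothing pending in any thread

print(retire_next())   # (0, 'add')
print(retire_next())   # (2, 'store')  -- thread 1 skipped, nothing pending
print(retire_next())   # (0, 'load')
```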
-
Publication number: 20170242698
Abstract: A processor includes logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired from a reorder buffer, capture the output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.
Type: Application
Filed: May 8, 2017
Publication date: August 24, 2017
Inventors: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith
-
Patent number: 9715432
Abstract: Exemplary aspects are directed toward resolving fault suppression in hardware without incurring a performance hit. For example, when multiple instructions are executing simultaneously, a mask can specify which elements need not be executed; elements whose mask bit is disabled do not need to be executed. A determination is then made as to whether a fault happens in one of the elements that have been disabled. If there is a fault in one of the elements that has been disabled, a state machine re-fetches the instructions in a special mode. More specifically, the state machine determines whether the fault is on a disabled element, and if so, it specifies that the fault should be ignored. If there was no mask during the first execution and an error is present during execution, then the element is re-run with the mask to see whether the error is a "real" fault.
Type: Grant
Filed: December 23, 2014
Date of Patent: July 25, 2017
Assignee: Intel Corporation
Inventors: Ramon Matas, Roger Gramunt, Chung-Lun Chan, Benjamin C. Chaffin, Aditya Kesiraju, Jonathan C. Hall, Jesus Corbal
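A behavioural sketch of the suppression check described here: on the re-run, a faulting element is treated as a real fault only if its mask bit is enabled; faults on disabled elements are ignored. The vector data and fault model are invented for illustration.

```python
# Behavioural sketch of the described fault-suppression check: re-run a masked
# vector operation element by element and ignore faults on elements whose mask
# bit is disabled. Data and fault model are illustrative.
def element_op(x):
    if x == 0:
        raise ZeroDivisionError("faulting element")
    return 100 // x

def masked_rerun(values, mask):
    results = []
    for value, enabled in zip(values, mask):
        if not enabled:
            results.append(None)           # disabled element: any fault is ignored
            continue
        results.append(element_op(value))  # enabled element: a fault here is "real"
    return results

print(masked_rerun([4, 0, 5], mask=[1, 0, 1]))   # fault on the disabled lane is suppressed
```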
-
Patent number: 9658861
Abstract: Following a restart or a reboot of a system that includes a multi-core processor, the multi-core processor may assign one of the cores as a boot strap processor (BSP). Initialization logic may detect a state of each of the plurality of processing cores as active or inactive. The initialization logic may detect an attribute of each of the plurality of processing cores as eligible to be assigned as the BSP or as ineligible to be assigned as the BSP. The initialization logic may detect the last processing core of the plurality of processing cores in the interconnect that is an active processing core based at least in part on the state and is eligible to be assigned as the BSP based at least in part on the attribute. In various embodiments, the initialization logic may assign the last processing core as the BSP.
Type: Grant
Filed: December 29, 2011
Date of Patent: May 23, 2017
Assignee: Intel Corporation
Inventors: Steven S. Chang, Anshuman Thakur, Ramacharan Sundararaman, Ramon Matas, Jay S. Lawlor, Robert F. Netting
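A minimal sketch of the selection rule the abstract describes: scan the cores in interconnect order and pick the last one that is both active and BSP-eligible. The core list and its fields are invented for the example.

```python
# Minimal sketch of the described boot-strap-processor (BSP) selection rule:
# pick the last core on the interconnect that is active and BSP-eligible.
# The core list and its fields are invented for the example.
cores = [
    {"id": 0, "active": True,  "bsp_eligible": True},
    {"id": 1, "active": True,  "bsp_eligible": False},
    {"id": 2, "active": False, "bsp_eligible": True},
    {"id": 3, "active": True,  "bsp_eligible": True},
]

bsp = None
for core in cores:                        # traverse in interconnect order
    if core["active"] and core["bsp_eligible"]:
        bsp = core["id"]                  # remember the last qualifying core

print(f"assigned core {bsp} as BSP")      # -> core 3
```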