Patents by Inventor Ramon Matas

Ramon Matas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240046065
    Abstract: Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to determine options for decisions in connection with design features of a computing device. In a particular implementation, design options for two or more design decisions of a neural network processing device may be identified based, at least in part, on a combination of a definition of available computing resources and one or more predefined performance constraints.
    Type: Application
    Filed: August 3, 2022
    Publication date: February 8, 2024
    Inventors: Hokchhay Tann, Ramon Matas Navarro, Igor Fedorov, Chuteng Zhou, Paul Nicholas Whatmough, Matthew Mattina
  • Publication number: 20230042271
    Abstract: Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to select options for decisions in connection with design features of a computing device. In a particular implementation, design options for two or more design decisions of a neural network processing device may be selected based, at least in part, on a combination of function values that are computed based, at least in part, on a tensor expressing sample neural network weights.
    Type: Application
    Filed: August 4, 2021
    Publication date: February 9, 2023
    Inventors: Igor Fedorov, Ramon Matas Navarro, Chuteng Zhou, Hokchhay Tann, Paul Nicholas Whatmough, Matthew Mattina
  • Patent number: 11405106
    Abstract: The disclosure relates to a setup for receiving an optical data signal having input optics for receiving the signal. An optical receiving fiber with an end facet is provided, into which the signal can be injected by an optical collimation system. A detector for detecting the optical data content is connected to the optical receiving fiber. A receive calibration source is provided, which is connected to the optical receiving fiber by a circulator. A retroreflector that can be inserted into the light path is provided for adjusting the setup, so that light from the receive calibration source is reflected and focused by the optical collimation system onto the end facet of the receiving fiber. The distance in the z-direction between the optical collimation system and the end facet of the receiving fiber is adjusted based on the detected power of the light from the receive calibration source.
    Type: Grant
    Filed: January 10, 2020
    Date of Patent: August 2, 2022
    Assignee: Deutsches Zentrum für Luft- und Raumfahrt e.V.
    Inventors: Fabian Rein, Juraj Poliak, Ramon Mata Calvo
  • Patent number: 11307903
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Grant
    Filed: January 31, 2018
    Date of Patent: April 19, 2022
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Publication number: 20220092404
    Abstract: A computer-implemented method of identifying a neural network for processing data includes: clustering a training dataset into a plurality of data clusters based on similarities in activation patterns generated in neurons of a teacher neural network in response to inputting the training dataset into the teacher neural network, training a student neural network for processing each of the plurality of data clusters, and providing a data classifier neural network for identifying one or more of the trained student neural networks to process data based on a data cluster of the data.
    Type: Application
    Filed: September 18, 2020
    Publication date: March 24, 2022
    Inventors: Mark John O'Connor, Ramon Matas Navarro
  • Publication number: 20220085885
    Abstract: The disclosure relates to a setup for receiving an optical data signal having input optics for receiving the signal. An optical receiving fiber with an end facet is provided, into which the signal can be injected by an optical collimation system. A detector for detecting the optical data content is connected to the optical receiving fiber. A receive calibration source is provided, which is connected to the optical receiving fiber by a circulator. A retroreflector that can be inserted into the light path is provided for adjusting the setup, so that light from the receive calibration source is reflected and focused by the optical collimation system onto the end facet of the receiving fiber. The distance in the z-direction between the optical collimation system and the end facet of the receiving fiber is adjusted based on the detected power of the light from the receive calibration source.
    Type: Application
    Filed: January 10, 2020
    Publication date: March 17, 2022
    Inventors: Fabian Rein, Juraj Poliak, Ramon Mata Calvo
  • Patent number: 10817338
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Grant
    Filed: January 31, 2018
    Date of Patent: October 27, 2020
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Patent number: 10637576
    Abstract: A transmitter for an optical free-beam communication system includes two light transmitters for the optical transmission of a data signal using one single-sideband modulation, wherein each light transmitter emits a side of the band modulation so that a light signal arriving at a receiver corresponds to a double-sideband modulation.
    Type: Grant
    Filed: October 28, 2016
    Date of Patent: April 28, 2020
    Assignee: Deutsches Zentrum für Luft- und Raumfahrt e.V.
    Inventors: Ramon Mata Calvo, Dirk Giggenbach, Christian Fuchs, Ahmad Mustafa
  • Publication number: 20190235928
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Application
    Filed: January 31, 2018
    Publication date: August 1, 2019
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Publication number: 20190235924
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Application
    Filed: January 31, 2018
    Publication date: August 1, 2019
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Patent number: 10318427
    Abstract: An instruction in a first cache line may be identified and an address associated with the instruction may be determined. The address may be determined to cross a cache line boundary associated with the first cache line and a second cache line. In response to determining that the address crosses the cache line boundary, the instruction may be adjusted based on a portion of the address included in the first cache line and a second instruction may be created based on a portion of the address included in the second cache line. The second instruction may be injected into an instruction pipeline after the adjusted instruction.
    Type: Grant
    Filed: December 18, 2014
    Date of Patent: June 11, 2019
    Assignee: Intel Corporation
    Inventors: Ramon Matas, Chung-Lun Chan, Alexey P. Suprun, Aditya Kesiraju
  • Patent number: 10175986
    Abstract: A processor includes a logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired from a reorder buffer, capture output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.
    Type: Grant
    Filed: May 8, 2017
    Date of Patent: January 8, 2019
    Assignee: Intel Corporation
    Inventors: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith
  • Publication number: 20180337729
    Abstract: A transmitter for an optical free-beam communication system includes two light transmitters for the optical transmission of a data signal using one single-sideband modulation, wherein each light transmitter emits a side of the band modulation so that a light signal arriving at a receiver corresponds to a double-sideband modulation.
    Type: Application
    Filed: October 28, 2016
    Publication date: November 22, 2018
    Inventors: Ramon Mata Calvo, Dirk Giggenbach, Christian Fuchs, Ahmad Mustafa
  • Patent number: 10108554
    Abstract: Methods, systems, and apparatuses relating to sharing translation lookaside buffer entries are described. In one embodiment, a processor includes one or more cores to execute a plurality of threads, a translation lookaside buffer comprising a plurality of entries, each entry comprising a virtual address to physical address translation and a plurality of bit positions, and each set bit of the plurality of bit positions in each entry indicating that the virtual address to physical address translation is valid for a respective thread of the plurality of threads, and a memory management circuit to clear all set bits for a thread by asserting a reset command to a respective reset port of the translation lookaside buffer for the thread, wherein the translation lookaside buffer comprises a separate reset port for each of the plurality of threads.
    Type: Grant
    Filed: December 5, 2016
    Date of Patent: October 23, 2018
    Assignee: Intel Corporation
    Inventors: Chung-Lun Chan, Ramon Matas
  • Publication number: 20180157598
    Abstract: Methods, systems, and apparatuses relating to sharing translation lookaside buffer entries are described. In one embodiment, a processor includes one or more cores to execute a plurality of threads, a translation lookaside buffer comprising a plurality of entries, each entry comprising a virtual address to physical address translation and a plurality of bit positions, and each set bit of the plurality of bit positions in each entry indicating that the virtual address to physical address translation is valid for a respective thread of the plurality of threads, and a memory management circuit to clear all set bits for a thread by asserting a reset command to a respective reset port of the translation lookaside buffer for the thread, wherein the translation lookaside buffer comprises a separate reset port for each of the plurality of threads.
    Type: Application
    Filed: December 5, 2016
    Publication date: June 7, 2018
    Inventors: Chung-Lun Chan, Ramon Matas
  • Patent number: 9891914
    Abstract: An apparatus and method for performing an efficient scatter operation. For example, one embodiment of a processor comprises: an allocator unit to receive a scatter operation comprising a number of data elements and responsively allocate resources to execute the scatter operation; a memory execution cluster comprising at least a portion of the resources to execute the scatter operation, the resources including one or more store data buffers and one or more store address buffers; and a senior store pipeline to transfer store data elements from the store data buffers to system memory using addresses from the store address buffers prior to retirement of the scatter operation.
    Type: Grant
    Filed: April 10, 2015
    Date of Patent: February 13, 2018
    Assignee: Intel Corporation
    Inventors: Ramon Matas, Alexey P. Suprun, Roger Gramunt, Chung-Lun Chan, Rammohan Padmanabhan
  • Patent number: 9886396
    Abstract: In one embodiment, a processor includes a frontend unit having an instruction decoder to receive and to decode instructions of a plurality of threads, an execution unit coupled to the instruction decoder to receive and execute the decoded instructions, and an instruction retirement unit having a retirement logic to receive the instructions from the execution unit and to retire the instructions associated with one or more of the threads that have an instruction or an event pending to be retired. The instruction retirement unit includes a thread arbitration logic to select one of the threads at a time and to dispatch the selected thread to the retirement logic for retirement processing.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: February 6, 2018
    Assignee: Intel Corporation
    Inventors: Roger Gramunt, Rammohan Padmanabhan, Ramon Matas, Neal S. Moyer, Benjamin C. Chaffin, Avinash Sodani, Alexey P. Suprun, Vikram S. Sundaram, Chung-Lun Chan, Gerardo A. Fernandez, Julio Gago, Michael S. Yang, Aditya Kesiraju
  • Publication number: 20170242698
    Abstract: A processor includes a logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired from a reorder buffer, capture output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.
    Type: Application
    Filed: May 8, 2017
    Publication date: August 24, 2017
    Inventors: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith
  • Patent number: 9715432
    Abstract: Exemplary aspects are directed toward resolving fault suppression in hardware, which at the same time does not incur a performance hit. For example, when multiple instructions are executing simultaneously, a mask can specify which elements need not be executed. If the mask is disabled, those elements do not need to be executed. A determination is then made as to whether a fault happens in one of the elements that have been disabled. If there is a fault in one of the elements that has been disabled, a state machine re-fetches the instructions in a special mode. More specifically, the state machine determines if the fault is on a disabled element, and if the fault is on a disabled element, then the state machine specifies that the fault should be ignored. If there was no mask during the first execution and an error is present during execution, then the element is re-run with the mask to see if the error is a “real” fault.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: July 25, 2017
    Assignee: Intel Corporation
    Inventors: Ramon Matas, Roger Gramunt, Chung-Lun Chan, Benjamin C. Chaffin, Aditya Kesiraju, Jonathan C. Hall, Jesus Corbal
  • Patent number: 9658861
    Abstract: Following a restart or a reboot of a system that includes a multi-core processor, the multi-core processor may assign one of the cores as a boot strap processor (BSP). Initialization logic may detect a state of each of the plurality of processing cores as active or inactive. The initialization logic may detect an attribute of each of the plurality of processing cores as eligible to be assigned as a BSP or as ineligible to be assigned as the BSP. The initialization logic may detect a last processing core of the plurality of processing cores in the interconnect that is an active processing core based at least in part on the state and is eligible to be assigned as the BSP based at least in part on the attribute. In various embodiments, the initialization logic may assign the last processing core as the BSP.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: May 23, 2017
    Assignee: Intel Corporation
    Inventors: Steven S. Chang, Anshuman Thakur, Ramacharan Sundararaman, Ramon Matas, Jay S. Lawlor, Robert F. Netting
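
The selection rule described in the abstract of patent 9658861 (pick the last core on the interconnect that is both active and eligible to be the BSP) can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not the patented hardware logic: the `Core` record, its field names, and the `select_bsp` function are hypothetical, and the interconnect is modeled as a simple ordered list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Core:
    # All fields are hypothetical names chosen for this illustration.
    core_id: int        # position of the core in interconnect order
    active: bool        # state: active or inactive
    bsp_eligible: bool  # attribute: may or may not be assigned as BSP


def select_bsp(cores: Sequence[Core]) -> Optional[Core]:
    """Return the last core, in interconnect order, that is active and BSP-eligible."""
    for core in reversed(cores):
        if core.active and core.bsp_eligible:
            return core
    return None  # no core qualifies


cores = [
    Core(0, active=True, bsp_eligible=True),
    Core(1, active=True, bsp_eligible=False),
    Core(2, active=True, bsp_eligible=True),
    Core(3, active=False, bsp_eligible=True),  # inactive, so skipped
]
bsp = select_bsp(cores)
print(bsp.core_id if bsp else None)  # prints 2
```

Scanning from the end of the list means the first match is the last qualifying core, so the loop can stop as soon as one core passes both checks.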