Patents by Inventor Xinmin Tian
Xinmin Tian has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230028666
Abstract: Embodiments are directed to systems and methods for performing global memory atomics in a private cache of a sub-core of a GPU. An embodiment of a GPU includes multiple sub-cores each including a load/store pipeline. The load/store pipeline is operable to receive information specifying an atomic operation to be performed within a primary data cache of the load/store pipeline. The load/store pipeline is also operable to read data to be modified by the atomic operation into the primary data cache from a memory hierarchy shared by the multiple sub-cores. The load/store pipeline is further operable to produce an atomic result of the atomic operation by modifying the data within the primary data cache based on the atomic operation.
Type: Application
Filed: July 19, 2021
Publication date: January 26, 2023
Applicant: Intel Corporation
Inventors: Joydeep Ray, Prathamesh Raghunath Shinde, Yue Qi, Abhishek R. Appu, Xinmin Tian, Vasanth Ranganathan, Ben J. Ashbaugh
-
Publication number: 20220350751
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.
Type: Application
Filed: July 12, 2022
Publication date: November 3, 2022
Applicant: Intel Corporation
Inventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K, SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
-
Publication number: 20220156202
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.
Type: Application
Filed: February 1, 2022
Publication date: May 19, 2022
Applicant: Intel Corporation
Inventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K, SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
-
Publication number: 20220114108
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received.
Type: Application
Filed: March 14, 2020
Publication date: April 14, 2022
Applicant: Intel Corporation
Inventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K., SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
-
Publication number: 20210382717
Abstract: Examples described herein relate to a graphics processing apparatus that includes a memory device and a graphics processing unit (GPU) coupled to the memory device, the GPU can be configured to: execute an instruction thread; determine if a signal barrier is associated with the instruction thread; for a signal barrier associated with the instruction thread, determine if the signal barrier is cleared; and based on the signal barrier being cleared, permit any waiting instruction thread associated with the signal barrier identifier to commence with execution but not permit any waiting thread that is not associated with the signal barrier identifier to commence with execution. In some examples, the signal barrier includes a signal barrier identifier. In some examples, the signal barrier identifier is one of a plurality of values.
Type: Application
Filed: June 3, 2020
Publication date: December 9, 2021
Inventors: Hong JIANG, Sabareesh GANAPATHY, Xinmin TIAN, Fangwen FU, James VALERIO
-
Publication number: 20210326504
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to improve FPGA pipeline emulation efficiency on CPUs. An example disclosed apparatus includes a loop detector to identify a register shift loop in field programmable gate array (FPGA) code, an unroller to shift and store pipeline stages in the register shift loop to a temporary unroll array, an intermediate canceller to cancel out intermediate load and store values of the temporary unroll array to retain last shifted values of the pipeline stages, and a propagator to improve emulation efficiency of the FPGA code by generating a scalar loop of the retained last shifted values for a vectorization input.
Type: Application
Filed: December 26, 2020
Publication date: October 21, 2021
Inventors: Xinmin Tian, Geoff Lowney
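For readers unfamiliar with the term, a register shift loop is the pattern FPGA code uses to model a pipeline: on every step each stage's value moves one slot along and a new value enters the first stage. The Python sketch below only illustrates why that pattern is costly to emulate literally on a CPU, and shows one generic way to avoid the per-step copying (a circular buffer). It is not the unroll, cancel, and propagate method the abstract describes, and the names `shift_naive`, `shift_ring`, and `DEPTH` are hypothetical.

```python
from collections import deque

DEPTH = 4  # hypothetical number of pipeline stages


def shift_naive(inputs, depth=DEPTH):
    """Emulate a shift register by moving every stage on each step."""
    stages = [0] * depth
    outputs = []
    for x in inputs:
        outputs.append(stages[-1])         # value leaving the last stage
        for i in range(depth - 1, 0, -1):  # the register shift loop
            stages[i] = stages[i - 1]
        stages[0] = x                      # new value enters stage 0
    return outputs


def shift_ring(inputs, depth=DEPTH):
    """Same observable behaviour without copying every stage per step."""
    stages = deque([0] * depth, maxlen=depth)
    outputs = []
    for x in inputs:
        outputs.append(stages[-1])  # oldest value sits at the right end
        stages.appendleft(x)        # a full deque drops its rightmost item
    return outputs
```

Both functions return the same output stream for the same inputs; the second does constant work per step instead of work proportional to the pipeline depth.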
-
Publication number: 20210149763
Abstract: Apparatuses including a graphics processing unit, graphics multiprocessor, or graphics processor having an error detection correction logic for cache memory or shared memory are disclosed. In one embodiment, a graphics multiprocessor includes cache or local memory for storing data and error detection correction circuitry integrated with or coupled to the cache or local memory. The error detection correction circuitry is configured to perform a tag read for data of the cache or local memory to check error detection correction information.
Type: Application
Filed: November 11, 2020
Publication date: May 20, 2021
Applicant: Intel Corporation
Inventors: Vasanth Ranganathan, Joydeep Ray, Abhishek R. Appu, Nikos Kaburlasos, Lidong Xu, Subramaniam Maiyuran, Altug Koker, Naveen Matam, James Holland, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Durgaprasad Bilagi, Xinmin Tian
-
Publication number: 20210150663
Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
Type: Application
Filed: November 11, 2020
Publication date: May 20, 2021
Applicant: Intel Corporation
Inventors: Subramaniam Maiyuran, Durgaprasad Bilagi, Joydeep Ray, Scott Janus, Sanjeev Jahagirdar, Brent Insko, Lidong Xu, Abhishek R. Appu, James Holland, Vasanth Ranganathan, Nikos Kaburlasos, Altug Koker, Xinmin Tian, Guei-Yuan Lueh, Changliang Wang
-
Patent number: 10909287
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to improve FPGA pipeline emulation efficiency on CPUs. An example disclosed apparatus includes a loop detector to identify a register shift loop in field programmable gate array (FPGA) code, an unroller to shift and store pipeline stages in the register shift loop to a temporary unroll array, an intermediate canceller to cancel out intermediate load and store values of the temporary unroll array to retain last shifted values of the pipeline stages, and a propagator to improve emulation efficiency of the FPGA code by generating a scalar loop of the retained last shifted values for a vectorization input.
Type: Grant
Filed: June 28, 2017
Date of Patent: February 2, 2021
Assignee: INTEL CORPORATION
Inventors: Xinmin Tian, Geoff Lowney
-
Patent number: 10877910
Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
Type: Grant
Filed: March 31, 2017
Date of Patent: December 29, 2020
Assignee: Intel Corporation
Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John P. Shen, Xinmin Tian, Milind Girkar, Perry H. Wang, Piyush N. Desai
-
Patent number: 10795682
Abstract: In one example, a system for generating vector based selection control statements can include a processor to determine a vector cost of the selection control statement is below a scalar cost and determine the selection control statement is to be executed in a sorted order based on dependencies between branch instructions of the selection control statement. The processor can also determine a program ordering of labels of the selection control statement does not match a mathematical ordering of the labels and execute the selection control statement with a vector of values, wherein the selection control statement is to be executed based on a jump table and a sorted unique value technique, wherein the sorted unique value technique comprises selecting at least one of the plurality of branch instructions from the jump table.
Type: Grant
Filed: December 28, 2016
Date of Patent: October 6, 2020
Assignee: Intel Corporation
Inventors: Hideki Saito Ido, Eric N Garcia, Xinmin Tian, Milind B. Girkar, James Brodman
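As a rough illustration of what running a switch-like statement over a vector of values can look like, the Python sketch below visits the distinct selector values in sorted order, looks each one up once in a jump table, and applies the selected branch to every lane holding that value. This is a simplified reading of the abstract, not the patented implementation: the cost comparison and dependency analysis it mentions are omitted, and `vector_switch`, `jump_table`, and `default` are hypothetical names.

```python
def vector_switch(values, jump_table, default):
    """Dispatch each lane of `values` through a table of branch handlers."""
    results = [None] * len(values)
    for label in sorted(set(values)):             # sorted unique selector values
        handler = jump_table.get(label, default)  # one lookup per distinct label
        for lane, v in enumerate(values):
            if v == label:                        # mask: lanes that selected this label
                results[lane] = handler(v)
    return results


# Hypothetical branch bodies, keyed by case label.
table = {
    0: lambda v: v + 100,
    1: lambda v: v * 10,
    2: lambda v: -v,
}
out = vector_switch([2, 0, 2, 7, 1], table, default=lambda v: 0)
```

With these inputs, lanes holding the same label share a single table lookup, and the label `7`, which has no entry in the table, falls through to `default`.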
-
Patent number: 10776093
Abstract: Methods, apparatus, and system to optimize compilation of source code into vectorized compiled code, notwithstanding the presence of output dependencies which might otherwise preclude vectorization.
Type: Grant
Filed: July 1, 2016
Date of Patent: September 15, 2020
Assignee: Intel Corporation
Inventors: Mikhail Plotnikov, Hideki Ido, Xinmin Tian, Sergey Preis, Milind B. Girkar, Maxim Shutov
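An output dependency arises when two loop iterations write to the same location, so the final value depends on which write happens last. The Python sketch below shows the problem with an indexed store and one generic way to make the writes independent (keep only the last writer for each index). The abstract does not say how the patent handles the dependency, so this is an illustration of the concept only; `scatter_sequential` and `scatter_last_writer` are hypothetical names.

```python
def scatter_sequential(out, idx, src):
    """Reference semantics: later iterations overwrite earlier ones."""
    out = list(out)
    for i, s in zip(idx, src):
        out[i] = s
    return out


def scatter_last_writer(out, idx, src):
    """Resolve duplicate indices first, then perform independent writes."""
    out = list(out)
    last = {}
    for pos, i in enumerate(idx):
        last[i] = pos              # a later position replaces an earlier one
    for i, pos in last.items():    # no two writes share an index, so order is free
        out[i] = src[pos]
    return out


idx = [1, 3, 1, 0]       # index 1 is written twice: an output dependency
src = [10, 20, 30, 40]
seq = scatter_sequential([0] * 4, idx, src)
par = scatter_last_writer([0] * 4, idx, src)
```

Both calls produce the same list; in the second, the final loop could run its writes in any order or all at once, which is the property a vectorized store needs.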
-
Patent number: 10642587
Abstract: Technologies for indirectly calling vector functions include a compute device that includes a memory device to store source code and a compiler module. The compiler module is to identify a set of declarations of vector variants for scalar functions in the source code, generate a vector variant address map for each set of vector variants, generate an offset map for each scalar function, and identify, in the source code, an indirect call to the scalar functions, wherein the indirect call is to be vectorized. The compiler module is also to determine, based on a context of the indirect call, a vector variant to be called and store, in object code and in association with the indirect call, an offset into one of the vector variant address maps based on (i) the determined vector variant to be called and (ii) the offset map that corresponds to each scalar function.
Type: Grant
Filed: March 11, 2016
Date of Patent: May 5, 2020
Assignee: Intel Corporation
Inventors: Hideki Saito Ido, Serge V. Preis, Sergey S. Kozhukhov, Xinmin Tian, Sergey V. Maslov, Clark Nelson, Jianfei Yu
-
Patent number: 10459858
Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
Type: Grant
Filed: November 6, 2017
Date of Patent: October 29, 2019
Assignee: Intel Corporation
Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John P. Shen, Xinmin Tian, Milind Girkar, Perry H. Wang, Piyush N. Desai
-
Patent number: 10452403
Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
Type: Grant
Filed: September 26, 2015
Date of Patent: October 22, 2019
Assignee: Intel Corporation
Inventors: Hong Wang, John P. Shen, Edward T. Grochowski, Richard A. Hankins, Gautham N. Chinya, Bryant E. Bigbee, Shivnandan D. Kaushik, Xiang Chris Zou, Per Hammarlund, Scott Dion Rodgers, Xinmin Tian, Anil Aggawal, Prashant Sethi, Baiju V. Patel, James P Held
-
Publication number: 20190278577
Abstract: Methods, apparatus, and system to optimize compilation of source code into vectorized compiled code, notwithstanding the presence of output dependencies which might otherwise preclude vectorization.
Type: Application
Filed: July 1, 2016
Publication date: September 12, 2019
Inventors: Mikhail PLOTNIKOV, Hideki IDO, Xinmin TIAN, Sergey PREIS, Milind B. GIRKAR, Maxim SHUTOV
-
Publication number: 20190050212
Abstract: Technologies for indirectly calling vector functions include a compute device that includes a memory device to store source code and a compiler module. The compiler module is to identify a set of declarations of vector variants for scalar functions in the source code, generate a vector variant address map for each set of vector variants, generate an offset map for each scalar function, and identify, in the source code, an indirect call to the scalar functions, wherein the indirect call is to be vectorized. The compiler module is also to determine, based on a context of the indirect call, a vector variant to be called and store, in object code and in association with the indirect call, an offset into one of the vector variant address maps based on (i) the determined vector variant to be called and (ii) the offset map that corresponds to each scalar function.
Type: Application
Filed: March 11, 2016
Publication date: February 14, 2019
Inventors: Hideki Saito IDO, Serge V. PREIS, Sergey S. KOZHUKHOV, Xinmin TIAN, Sergey V. MASLOV, Clark NELSON, Jianfei YU
-
Publication number: 20190005175
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to improve FPGA pipeline emulation efficiency on CPUs. An example disclosed apparatus includes a loop detector to identify a register shift loop in field programmable gate array (FPGA) code, an unroller to shift and store pipeline stages in the register shift loop to a temporary unroll array, an intermediate canceller to cancel out intermediate load and store values of the temporary unroll array to retain last shifted values of the pipeline stages, and a propagator to improve emulation efficiency of the FPGA code by generating a scalar loop of the retained last shifted values for a vectorization input.
Type: Application
Filed: June 28, 2017
Publication date: January 3, 2019
Inventors: Xinmin Tian, Geoff Lowney
-
Publication number: 20180181404
Abstract: In one example, a system for generating vector based selection control statements can include a processor to determine a vector cost of the selection control statement is below a scalar cost and determine the selection control statement is to be executed in a sorted order based on dependencies between branch instructions of the selection control statement. The processor can also determine a program ordering of labels of the selection control statement does not match a mathematical ordering of the labels and execute the selection control statement with a vector of values, wherein the selection control statement is to be executed based on a jump table and a sorted unique value technique, wherein the sorted unique value technique comprises selecting at least one of the plurality of branch instructions from the jump table.
Type: Application
Filed: December 28, 2016
Publication date: June 28, 2018
Applicant: Intel Corporation
Inventors: Hideki Saito Ido, Eric N. Garcia, Xinmin Tian, Milind B. Girkar, James Brodman
-
Patent number: 9990206
Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
Type: Grant
Filed: March 15, 2013
Date of Patent: June 5, 2018
Assignee: INTEL CORPORATION
Inventors: Hong Wang, John Shen, Edward Grochowski, Richard Hankins, Gautham Chinya, Bryant Bigbee, Shivnandan Kaushik, Xiang Chris Zou, Per Hammarlund, Scott Dion Rodgers, Xinmin Tian, Anil Aggawal, Prashant Sethi, Baiju Patel, James Held