Patents by Inventor Dibyendu Das
Dibyendu Das has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230205923
Abstract: Disclosed is an electronic apparatus. The electronic apparatus may comprise: a memory storing an application and a neural network model; a display; and a processor which, by executing the application, differentiates an input image into a plurality of patches, obtains information about a privacy object included in the input image by inputting the plurality of patches to the neural network model, obtains an output image including a guide GUI for guiding the privacy object based on the obtained information, and controls the display to display the obtained output image.
Type: Application
Filed: March 1, 2023
Publication date: June 29, 2023
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Nabil IBTEHAZ, Dibyendu DAS
-
Patent number: 10866790
Abstract: An electronic device acquires, from program code, two or more program code loops having specified data dependencies. The electronic device places each of the program code loops into a corresponding blocking loop, each blocking loop including at least one blocking loop induction variable that is incremented by a corresponding block size and used to specify a number of iterations for at least one internal loop induction variable of the respective program code loop. The electronic device fuses the blocking loops into a fused loop by placing all of the blocking loops in the fused loop and replacing the blocking loop induction variables of the blocking loops with a fused loop induction variable that is incremented by the corresponding block size and used to specify the number of iterations for respective internal loop induction variables in the blocking loops.
Type: Grant
Filed: November 30, 2018
Date of Patent: December 15, 2020
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Dibyendu Das, Pradeep H. Rao
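The blocking-then-fusing idea in this abstract can be illustrated with a minimal sketch. It is not the patented method or any compiler's implementation; the two loop bodies, the function names `unfused` and `blocked_and_fused`, and the default block size are assumptions chosen for illustration.

```python
def unfused(a):
    """Two separate loops; the second reads what the first wrote."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = a[i] * 2
    for i in range(n):
        c[i] = b[i] + 1
    return c


def blocked_and_fused(a, block_size=4):
    """Both loops share one outer loop that advances by block_size.

    `start` plays the role of the fused loop induction variable: it is
    incremented by the block size and bounds both inner loops.
    """
    n = len(a)
    b = [0] * n
    c = [0] * n
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)  # last block may be partial
        for i in range(start, stop):       # first original loop, one block
            b[i] = a[i] * 2
        for i in range(start, stop):       # second original loop, same block
            c[i] = b[i] + 1
    return c
```

Because each block of `b` is consumed right after it is produced, the data is more likely to still be in cache when the second inner loop runs; both versions return the same result.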
-
Publication number: 20200174764
Abstract: An electronic device acquires, from program code, two or more program code loops having specified data dependencies. The electronic device places each of the program code loops into a corresponding blocking loop, each blocking loop including at least one blocking loop induction variable that is incremented by a corresponding block size and used to specify a number of iterations for at least one internal loop induction variable of the respective program code loop. The electronic device fuses the blocking loops into a fused loop by placing all of the blocking loops in the fused loop and replacing the blocking loop induction variables of the blocking loops with a fused loop induction variable that is incremented by the corresponding block size and used to specify the number of iterations for respective internal loop induction variables in the blocking loops.
Type: Application
Filed: November 30, 2018
Publication date: June 4, 2020
Inventors: Dibyendu Das, Pradeep H. Rao
-
Patent number: 10387128
Abstract: A method and apparatus provides for compiling a computer-readable computer program having a plurality of computer-readable statements into a plurality of computer-executable instructions. In one example, the method and apparatus determines when at least one pair of the computer-readable statements is partial-isomorphic having an equivalent operation and same order of operation but causing non-consecutive memory accesses, and generates the computer-executable instructions causing the at least one pair of the partial-isomorphic statements to perform sequential physical memory accesses.
Type: Grant
Filed: April 27, 2017
Date of Patent: August 20, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Md Asghar Ahmad Shahid, Shivarama Kokrady Rao, Dibyendu Das
-
Patent number: 10353708
Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
Type: Grant
Filed: September 23, 2016
Date of Patent: July 16, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Anupama Rajesh Rasale, Dibyendu Das, Ashutosh Nema, Md Asghar Ahmad Shahid, Prathiba Kumar
-
Patent number: 10275230
Abstract: Methods of compiling source code are provided. A method includes identifying a first array of structures (AOS), having a plurality of array elements, each array element being a structure with a plurality of fields, and performing structure peeling on the first AOS to convert a data layout of the first AOS to an array of structure of arrays (AOSOA) including a plurality of memory blocks of uniform block size. At least one of the plurality of memory blocks is allocated for each field of the plurality of fields. The method further includes allocating a number of complete memory blocks to accommodate all of the plurality of array elements of the AOS.
Type: Grant
Filed: July 14, 2017
Date of Patent: April 30, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Suresh Mani, Dibyendu Das, Shivarama Rao, Ashutosh Nema
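As a rough illustration of the AOS-to-AOSOA layout change described in this abstract, the sketch below shows one plausible reading, not the patented compiler transformation. The function names `aos_to_aosoa` and `get_field`, the dictionary-based record representation, and the zero padding value are assumptions.

```python
import math


def aos_to_aosoa(records, fields, block_size, pad=0):
    """Regroup a list of records into chunks of per-field blocks.

    Each chunk holds one fixed-size block per field. Only complete
    blocks are allocated, so the last chunk is padded with `pad`.
    """
    num_chunks = math.ceil(len(records) / block_size)
    chunks = []
    for c in range(num_chunks):
        chunk = {}
        for field in fields:
            block = [pad] * block_size           # one uniform block per field
            for k in range(block_size):
                idx = c * block_size + k
                if idx < len(records):
                    block[k] = records[idx][field]
            chunk[field] = block
        chunks.append(chunk)
    return chunks


def get_field(chunks, block_size, index, field):
    """Read field `field` of logical element `index` from the new layout."""
    return chunks[index // block_size][field][index % block_size]
```

For example, five records with fields `x` and `y` and a block size of 4 occupy two chunks; values of the same field sit next to each other inside a block, which is the property that makes field-wise loops friendlier to caches and vector units.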
-
Publication number: 20190018664
Abstract: Methods of compiling source code are provided. A method includes identifying a first array of structures (AOS), having a plurality of array elements, each array element being a structure with a plurality of fields, and performing structure peeling on the first AOS to convert a data layout of the first AOS to an array of structure of arrays (AOSOA) including a plurality of memory blocks of uniform block size. At least one of the plurality of memory blocks is allocated for each field of the plurality of fields. The method further includes allocating a number of complete memory blocks to accommodate all of the plurality of array elements of the AOS.
Type: Application
Filed: July 14, 2017
Publication date: January 17, 2019
Applicant: Advanced Micro Devices, Inc.
Inventors: Suresh Mani, Dibyendu Das, Shivarama Rao, Ashutosh Nema
-
Patent number: 10180826
Abstract: A compiler generates transfer functions for blocks of a program during compilation of the program. The transfer functions estimate bit widths of variables in the blocks based on numbers of bits needed to carry out at least one instruction in the blocks and whether the variables are live in the blocks. For example, a transfer function may return a number indicating how many bits of a variable are needed to execute a current instruction as a function of the number of bits of the variable used by the program in subsequent instructions. Numbers of bits to represent the variables in the compiled program based on the transfer functions.
Type: Grant
Filed: October 22, 2015
Date of Patent: January 15, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Prakash Sathyanath Raghavendra, Dibyendu Das, Arun Rangasamy
-
Publication number: 20180314506
Abstract: A method and apparatus provides for compiling a computer-readable computer program having a plurality of computer-readable statements into a plurality of computer-executable instructions. In one example, the method and apparatus determines when at least one pair of the computer-readable statements is partial-isomorphic having an equivalent operation and same order of operation but causing non-consecutive memory accesses, and generates the computer-executable instructions causing the at least one pair of the partial-isomorphic statements to perform sequential physical memory accesses.
Type: Application
Filed: April 27, 2017
Publication date: November 1, 2018
Inventors: Md Asghar Ahmad Shahid, Shivarama Kokrady Rao, Dibyendu Das
-
Publication number: 20180085896
Abstract: An abrasive tool including a bonded abrasive including a body comprising abrasive particles contained within a three-dimensional matrix of bond material, the bond material including an organic material, the abrasive tool further including a first filler contained within the three-dimensional matrix of bond material including a silicate in a first content and a second filler contained within the three-dimensional matrix of bond material including a sulfate in a second content, and the first content is greater than the second content.
Type: Application
Filed: March 21, 2016
Publication date: March 29, 2018
Inventors: Udayan PARAMBAN, Adiseshaiah K. SATHYANARAYANAIAH, Jagadis SANKARANARAYANAN, Dibyendu DAS
-
Publication number: 20180088948
Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
Type: Application
Filed: September 23, 2016
Publication date: March 29, 2018
Inventors: Anupama Rajesh Rasale, Dibyendu Das, Ashutosh Nema, Md Asghar Ahmad Shahid, Prathiba Kumar
-
Publication number: 20170115970
Abstract: A compiler generates transfer functions for blocks of a program during compilation of the program. The transfer functions estimate bit widths of variables in the blocks based on numbers of bits needed to carry out at least one instruction in the blocks and whether the variables are live in the blocks. For example, a transfer function may return a number indicating how many bits of a variable are needed to execute a current instruction as a function of the number of bits of the variable used by the program in subsequent instructions. Numbers of bits to represent the variables in the compiled program based on the transfer functions.
Type: Application
Filed: October 22, 2015
Publication date: April 27, 2017
Inventors: Prakash Sathyanath Raghavendra, Dibyendu Das, Arun Rangasamy
-
Patent number: 9235445
Abstract: A method of mapping processes to processors in a parallel computing environment where a parallel application is to be run on a cluster of nodes wherein at least one of the nodes has multiple processors sharing a common memory, the method comprising using compiler based communication analysis to map Message Passing Interface processes to processors on the nodes, whereby at least some more heavily communicating processes are mapped to processors within nodes. Other methods, apparatus, and computer readable media are also provided.
Type: Grant
Filed: November 28, 2011
Date of Patent: January 12, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Dibyendu Das, Nagarajan Kathiresan, Rajan Ravindran, Bhaskaran Venkatsubramaniam
-
Patent number: 9158511
Abstract: A system, method and computer program product to compute latencies of a plurality of expression trees in a basic block and to select a first and a second expression tree from the plurality of expression trees based on the computed latencies. The first expression tree is isomorphic to the second expression tree and the first and second expression trees are selected in order of largest to smallest latency. This selection ensures that the largest isomorphic expression trees are vectorized first. By vectorizing the largest isomorphic expression trees first, a basic block containing hundreds of statements can be vectorized without significant compile time. Moreover, vectorization of the largest isomorphic expression trees results in a significant improvement in system performance on SIMD processors.
Type: Grant
Filed: May 20, 2013
Date of Patent: October 13, 2015
Assignee: Advanced Micro Devices, Inc.
Inventors: Ramshankar Ramanarayanan, Meghana Gupta, Soham S. Chakraborty, Dibyendu Das
-
Patent number: 9063826
Abstract: A method of mapping processes to processors in a parallel computing environment where a parallel application is to be run on a cluster of nodes wherein at least one of the nodes has multiple processors sharing a common memory, the method comprising using compiler based communication analysis to map Message Passing Interface processes to processors on the nodes, whereby at least some more heavily communicating processes are mapped to processors within nodes. Other methods, apparatus, and computer readable media are also provided.
Type: Grant
Filed: November 28, 2011
Date of Patent: June 23, 2015
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Dibyendu Das, Nagarajan Kathiresan, Rajan Ravindran, Bhaskaran Venkatsubramaniam
-
Publication number: 20140344555
Abstract: A system, method and computer program product to compute latencies of a plurality of expression trees in a basic block and to select a first and a second expression tree from the plurality of expression trees based on the computed latencies. The first expression tree is isomorphic to the second expression tree and the first and second expression trees are selected in order of largest to smallest latency. This selection ensures that the largest isomorphic expression trees are vectorized first. By vectorizing the largest isomorphic expression trees first, a basic block containing hundreds of statements can be vectorized without significant compile time. Moreover, vectorization of the largest isomorphic expression trees results in a significant improvement in system performance on SIMD processors.
Type: Application
Filed: May 20, 2013
Publication date: November 20, 2014
Applicant: Advanced Micro Devices, Inc.
Inventors: Ramshankar Ramanarayanan, Meghana Gupta, Soham S. Chakraborty, Dibyendu Das
-
Patent number: 8701099
Abstract: A method, a system and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and initiates/spawns a speculative iterator thread at the start of the basic code block ahead of the code block that initiates a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs/executes ahead of the execution of the nested loop and calculates indices in a corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility “dequeues” an entry from the queue to use the entry to access the array upon which the ELI utility iterates.
Type: Grant
Filed: November 2, 2010
Date of Patent: April 15, 2014
Assignee: International Business Machines Corporation
Inventors: Ganesh Bikshandi, Dibyendu Das, Smruti Ranjan Sarangi
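The queue-based idea in this abstract, where a helper thread precomputes array indices so that the nested loop collapses into a single loop, can be sketched as follows. This is a simplified illustration only: the names `index_producer` and `sum_with_iterator_thread` are hypothetical, the thread is an ordinary Python thread rather than one pinned to a dedicated processor, and no speculation or rollback is modeled.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the index stream


def index_producer(rows, cols, out):
    """Helper thread: walk the 2-D index space and enqueue each index."""
    for i in range(rows):
        for j in range(cols):
            out.put((i, j))
    out.put(_DONE)


def sum_with_iterator_thread(grid):
    """Sum a 2-D grid using a single consumer loop over queued indices."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    indices = queue.Queue()
    worker = threading.Thread(target=index_producer, args=(rows, cols, indices))
    worker.start()

    total = 0
    while True:                 # the nested loop is now one flat loop
        item = indices.get()    # dequeue the next precomputed index
        if item is _DONE:
            break
        i, j = item
        total += grid[i][j]
    worker.join()
    return total
```

Calling `sum_with_iterator_thread([[1, 2, 3], [4, 5, 6]])` visits the elements in the same row-major order as the original nested loop; the consumer only dequeues and accesses, while index arithmetic happens on the helper thread.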
-
Patent number: 8612953
Abstract: A method, system and article of manufacture are disclosed for registering and deregistering memory pages in a computer system. The method comprises the steps of hoisting register and deregister calls in a given routine where temporal locality is present to overlap computation and communication; using software pipelined registration and deregistration where spatial locality is observed; and using intra-procedural and inter-procedural analysis by a compiler of the computer system to deregister dynamically allocated buffers. The preferred embodiment of the invention is based on an optimizing compiler. The compiler is used to extract information such as addresses of buffers which are being reused repeatedly (temporal locality), preferably in a loop. The compiler may also find information about spatial locality, such as arrays whose indexes are used in a well-defined manner in a series of messages, for example, array pages being accessed in a pre-defined pattern in a loop.
Type: Grant
Filed: September 13, 2012
Date of Patent: December 17, 2013
Assignee: International Business Machines Corporation
Inventors: Dibyendu Das, Manish Gupta
-
Publication number: 20130139175
Abstract: A method of mapping processes to processors in a parallel computing environment where a parallel application is to be run on a cluster of nodes wherein at least one of the nodes has multiple processors sharing a common memory, the method comprising using compiler based communication analysis to map Message Passing Interface processes to processors on the nodes, whereby at least some more heavily communicating processes are mapped to processors within nodes. Other methods, apparatus, and computer readable media are also provided.
Type: Application
Filed: November 28, 2011
Publication date: May 30, 2013
Applicant: International Business Machines Corporation
Inventors: DIBYENDU DAS, Nagarajan Kathiresan, Rajan Ravindran, Bhaskaran Venkatsubramaniam
-
Publication number: 20130138929
Abstract: A method of mapping processes to processors in a parallel computing environment where a parallel application is to be run on a cluster of nodes wherein at least one of the nodes has multiple processors sharing a common memory, the method comprising using compiler based communication analysis to map Message Passing Interface processes to processors on the nodes, whereby at least some more heavily communicating processes are mapped to processors within nodes. Other methods, apparatus, and computer readable media are also provided.
Type: Application
Filed: November 28, 2011
Publication date: May 30, 2013
Applicant: International Business Machines Corporation
Inventors: DIBYENDU DAS, Nagarajan Kathiresan, Rajan Ravindran, Bhaskaran Venkatsubramaniam