Patents by Inventor David Hennah Mansell

David Hennah Mansell has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10908916
    Abstract: An apparatus and method are provided for executing a plurality of threads. The apparatus has processing circuitry arranged to execute the plurality of threads, with each thread executing a program to perform processing operations on thread data. Each thread has a thread identifier, and the thread data includes a value which is dependent on the thread identifier. Value generator circuitry is provided to perform a computation using the thread identifier of a chosen thread in order to generate the above mentioned value for that chosen thread, and to make that value available to the processing circuitry for use by the processing circuitry when executing the chosen thread. Such an arrangement can give rise to significant performance benefits when executing the plurality of threads on the apparatus.
    Type: Grant
    Filed: March 2, 2016
    Date of Patent: February 2, 2021
    Assignee: ARM Limited
    Inventors: Timothy Holroyd Glauert, David Hennah Mansell, Rune Holm
  • Publication number: 20200372097
    Abstract: There is provided a data processing apparatus to perform an operation on a first matrix and a second matrix. The data processing apparatus includes receiver circuitry to receive elements of the first matrix, elements of the second matrix, and correspondence data to indicate where the elements of the first matrix are located in the first matrix. Determination circuitry performs, using the correspondence data, a determination of whether, for a given element of the first matrix in column i of the first matrix, a given element of the second matrix occurs in row i of the second matrix. Aggregation circuitry calculates an aggregation between a given row in the first matrix and a given column in the second matrix and includes: functional circuitry to perform, in dependence on the determination, a function on the given element of the first matrix and the given element of the second matrix to produce a partial result.
    Type: Application
    Filed: May 21, 2019
    Publication date: November 26, 2020
    Inventors: Matthew MATTINA, Zhigang LIU, Paul Nicholas WHATMOUGH, David Hennah MANSELL
  • Patent number: 10712965
    Abstract: An apparatus and method are provided for transferring data between address ranges in memory. The apparatus comprises a data transfer controller, that is responsive to a data transfer request received by the apparatus from a processing element, to perform a transfer operation to transfer data from at least one source address range in memory to at least one destination address range in the memory. A redirect controller is then arranged, whilst the transfer operation is being performed, to intercept an access request that specifies a target address within a target address range, and to perform a memory redirection operation so as to cause the access request to be processed without awaiting completion of the transfer operation.
    Type: Grant
    Filed: November 8, 2017
    Date of Patent: July 14, 2020
    Assignee: ARM Limited
    Inventors: Andreas Lars Sandberg, Nikos Nikoleris, David Hennah Mansell
  • Publication number: 20200218538
    Abstract: A data processing apparatus, a method of operating a data processing apparatus, a non-transitory computer readable storage medium, and an instruction are provided. The instruction specifies a first source register and a second source register. In response to the instruction, control signals are generated, causing processing circuitry to perform a dot product operation. For this operation at least a first data element and a second data element are extracted from each of the first source register and the second source register, such that at least first data element pairs and second data element pairs are multiplied together. The dot product operation is performed independently in each of multiple intra-register lanes across each of the first source register and the second source register. A widening operation with a large density of operations per instruction is thus provided.
    Type: Application
    Filed: January 26, 2018
    Publication date: July 9, 2020
    Inventor: David Hennah MANSELL
  • Publication number: 20200117450
    Abstract: Techniques for performing matrix multiplication in a data processing apparatus are disclosed, comprising apparatuses, matrix multiply instructions, methods of operating the apparatuses, and virtual machine implementations. Registers, each register for storing at least four data elements, are referenced by a matrix multiply instruction and in response to the matrix multiply instruction a matrix multiply operation is carried out. First and second matrices of data elements are extracted from first and second source registers, and plural dot product operations, acting on respective rows of the first matrix and respective columns of the second matrix are performed to generate a square matrix of result data elements, which is applied to a destination register. A higher computation density for a given number of register operands is achieved with respect to vector-by-element techniques.
    Type: Application
    Filed: June 8, 2018
    Publication date: April 16, 2020
    Inventors: David Hennah MANSELL, Rune HOLM, Ian Michael CAULFIELD, Jelena MILANOVIC
  • Patent number: 10528350
    Abstract: A data processing apparatus (100) executes threads and includes a general program counter (PC) (120) identifying an instruction to be executed for at least a subset of the threads. Each thread has a thread PC (184). The subset of threads has at least one lock parameter (188, 500-504) for tracking exclusive access to shared resources. In response to a first instruction executed for a thread, the processor (160) modifies the at least one lock parameter (188, 500-504) to indicate that the thread has gained exclusive access to the shared resource. In response to a second instruction, the processor modifies the at least one lock parameter (188, 500-504) to indicate that the thread no longer has exclusive access. A selector (110) selects one of the subset of threads based on the at least one lock parameter (188, 500-504) and sets the general PC (120) to the thread PC (184) of the selected thread.
    Type: Grant
    Filed: July 28, 2015
    Date of Patent: January 7, 2020
    Assignee: ARM Limited
    Inventors: Rune Holm, David Hennah Mansell
  • Publication number: 20190377573
    Abstract: A data processing apparatus, a method of operating a data processing apparatus, a non-transitory computer readable storage medium, and an instruction are provided. The instruction specifies a first source register, a second source register, and an index. In response to the instruction, control signals are generated, causing processing circuitry to perform a data processing operation with respect to each data group in the first source register and the second source register to generate respective result data groups forming a result of the data processing operation. Each of the first source register and the second source register has a size which is an integer multiple at least twice a predefined size of the data group, and each data group comprises a plurality of data elements. The operands of the data processing operation for each data group are a selected data element identified in the data group of the first source register by the index and each data element in the data group of the second source register.
    Type: Application
    Filed: February 2, 2018
    Publication date: December 12, 2019
    Inventors: Grigorios MAGKLIS, Nigel John STEPHENS, Jacob EAPEN, Mbou EYOLE, David Hennah MANSELL
  • Publication number: 20190369989
    Abstract: A data processing apparatus, a method of operating a data processing apparatus, a non-transitory computer readable storage medium, and an instruction are provided. The instruction specifies a first source register, a second source register, and a set of N accumulation registers. In response to the instruction, control signals are generated, causing processing circuitry to extract N data elements from content of the first source register, perform a multiplication of each of the N data elements by content of the second source register, and apply a result of each multiplication to content of a respective target register of the set of N accumulation registers. As a result, plural (N) multiplications are performed in a manner that effectively provides a multiplier N times the register width, but without requiring the register file to be made N times larger.
    Type: Application
    Filed: January 26, 2018
    Publication date: December 5, 2019
    Inventors: David Hennah MANSELL, Grigorios MAGKLIS
  • Publication number: 20190355142
    Abstract: A method of processing image data representative of an image using a multi-stage system comprising a first neural network (NN) for identifying a first image characteristic and a second NN for identifying a second image characteristic. The method comprises processing the image data using a first at least one layer of the first NN to generate feature data representative of at least one feature of the image and processing the feature data using a second at least one layer of the first NN to generate first image characteristic data indicative of whether the image includes the first image characteristic. The feature data is transferred from the first NN to the second NN. The feature data is processed using the second NN to generate second image characteristic data representative of whether the image includes the second image characteristic.
    Type: Application
    Filed: May 15, 2018
    Publication date: November 21, 2019
    Inventors: Daren CROXFORD, David Hennah MANSELL
  • Patent number: 10423467
    Abstract: A data processing apparatus and method are provided for executing a plurality of threads. Processing circuitry performs processing operations required by the plurality of threads, the processing operations including a lock-protected processing operation with which a lock is associated, where the lock needs to be acquired before the processing circuitry performs the lock-protected processing operation. Baton maintenance circuitry is used to maintain a baton in association with the plurality of threads, the baton forming a proxy for the lock, and the baton maintenance circuitry being configured to allocate the baton between the threads.
    Type: Grant
    Filed: May 19, 2015
    Date of Patent: September 24, 2019
    Assignee: ARM Limited
    Inventors: David Hennah Mansell, Timothy Holroyd Glauert
  • Patent number: 10296340
    Abstract: A data processing apparatus 10 for executing an access instruction for n threads in order to access data values for the n threads includes storage circuitry 100 that stores data values associated with the n threads in groups defined by storage boundaries. The data processing apparatus also includes processing circuitry 80 that processes the access instruction for a set of threads at a time (where each set of threads comprises fewer than n threads) and splitting circuitry 110, responsive to the access instruction, to divide the n threads into multiple sets of threads, and to generate at least one control signal identifying the multiple sets. For each of the sets, the processing circuitry responds to the at least one control signal by issuing at least one access request to the storage circuitry in order to access the data values for that set. The splitting circuitry determines into which set each of the n threads is allocated having regards to the storage boundaries.
    Type: Grant
    Filed: March 10, 2015
    Date of Patent: May 21, 2019
    Assignee: ARM Limited
    Inventors: David Hennah Mansell, Timothy Holroyd Glauert
  • Publication number: 20180157437
    Abstract: An apparatus and method are provided for transferring data between address ranges in memory. The apparatus comprises a data transfer controller, that is responsive to a data transfer request received by the apparatus from a processing element, to perform a transfer operation to transfer data from at least one source address range in memory to at least one destination address range in the memory. A redirect controller is then arranged, whilst the transfer operation is being performed, to intercept an access request that specifies a target address within a target address range, and to perform a memory redirection operation so as to cause the access request to be processed without awaiting completion of the transfer operation.
    Type: Application
    Filed: November 8, 2017
    Publication date: June 7, 2018
    Inventors: Andreas Lars SANDBERG, Nikos NIKOLERIS, David Hennah MANSELL
  • Publication number: 20170286107
    Abstract: A data processing apparatus (100) executes threads and includes a general program counter (PC) (120) identifying an instruction to be executed for at least a subset of the threads. Each thread has a thread PC (184). The subset of threads has at least one lock parameter (188, 500-504) for tracking exclusive access to shared resources. In response to a first instruction executed for a thread, the processor (160) modifies the at least one lock parameter (188, 500-504) to indicate that the thread has gained exclusive access to the shared resource. In response to a second instruction, the processor modifies the at least one lock parameter (188, 500-504) to indicate that the thread no longer has exclusive access. A selector (110) selects one of the subset of threads based on the at least one lock parameter (188, 500-504) and sets the general PC (120) to the thread PC (184) of the selected thread.
    Type: Application
    Filed: July 28, 2015
    Publication date: October 5, 2017
    Inventors: Rune HOLM, David Hennah MANSELL
  • Publication number: 20170139757
    Abstract: A data processing apparatus and method are provided for executing a plurality of threads. Processing circuitry performs processing operations required by the plurality of threads, the processing operations including a lock-protected processing operation with which a lock is associated, where the lock needs to be acquired before the processing circuitry performs the lock-protected processing operation. Baton maintenance circuitry is used to maintain a baton in association with the plurality of threads, the baton forming a proxy for the lock, and the baton maintenance circuitry being configured to allocate the baton between the threads.
    Type: Application
    Filed: May 19, 2015
    Publication date: May 18, 2017
    Applicant: ARM LIMITED
    Inventors: David Hennah MANSELL, Timothy Holroyd GLAUERT
  • Patent number: 9547530
    Abstract: A data processing apparatus has processing circuitry for processing threads each having thread state data. The threads may be processed in thread groups, with each thread group comprising a number of threads processed in parallel with a common program executed for each thread. Several thread state storage regions are provided with fixed number of thread state entries for storing thread state data for a corresponding thread. At least two of the storage regions have different fixed numbers of entries. The processing circuitry processes as the same thread group threads having thread state data stored in the same storage region and processes threads having thread state data stored in different storage regions as different thread groups.
    Type: Grant
    Filed: November 1, 2013
    Date of Patent: January 17, 2017
    Assignee: ARM Limited
    Inventor: David Hennah Mansell
  • Publication number: 20160259668
    Abstract: An apparatus and method are provided for executing a plurality of threads. The apparatus has processing circuitry arranged to execute the plurality of threads, with each thread executing a program to perform processing operations on thread data. Each thread has a thread identifier, and the thread data includes a value which is dependent on the thread identifier. Value generator circuitry is provided to perform a computation using the thread identifier of a chosen thread in order to generate the above mentioned value for that chosen thread, and to make that value available to the processing circuitry for use by the processing circuitry when executing the chosen thread. Such an arrangement can give rise to significant performance benefits when executing the plurality of threads on the apparatus.
    Type: Application
    Filed: March 2, 2016
    Publication date: September 8, 2016
    Inventors: Timothy Holroyd GLAUERT, David Hennah MANSELL, Rune HOLM
  • Patent number: 9436473
    Abstract: A single instruction multiple thread (SIMT) processor includes scheduling circuitry for calculating a next scheduled execution point for execution circuits which execute respective threads corresponding to a common program. In addition to calculating the next scheduled execution point, the scheduling circuitry determines a runner up execution point which would have been determined as the next scheduled execution point if the threads which actually correspond to the next scheduled execution point had been removed from consideration. This runner up execution point is used to identify points of re-convergence within the program flow and as part of the operation of a static branch predictor.
    Type: Grant
    Filed: October 8, 2013
    Date of Patent: September 6, 2016
    Assignee: ARM Limited
    Inventors: Rune Holm, Jr., David Hennah Mansell
  • Patent number: 9311088
    Abstract: An apparatus and method are provided for performing register renaming. Available register identifying circuitry is provided to identify which physical registers form a pool of physical registers available to be mapped by register renaming circuitry to an architectural register specified by an instruction to be executed. Configuration data whose value is modified during operation of the processing circuitry is stored such that, when the configuration data has a first value, the configuration data identifies at least one architectural register of the architectural register set which does not require mapping to a physical register by the register renaming circuitry. The register identifying circuitry is arranged to reference the modified data value, such that when the configuration data has the first value, the number of physical registers in the pool is increased due to the reduction in the number of architectural registers which require mapping to physical registers.
    Type: Grant
    Filed: June 26, 2013
    Date of Patent: April 12, 2016
    Assignee: ARM Limited
    Inventors: Frederic Claude Marie Piry, Louis-Marie Vincent Mouton, Luca Scalabrino, Richard Roy Grisenthwaite, David Hennah Mansell
  • Patent number: 9158574
    Abstract: A method and apparatus for processing data are provided. When an interrupt is received during processing of a function, at a point at which a portion of the function has been processed, a control parameter is accessed. In response to the control parameter having a value indicating that the function has idempotence, processing of the function is stopped, and information on progress of the function is discarded such that following completion of the interrupt the portion of the function that has already been processed is processed again. In response to the control parameter having a value indicating that the function does not have idempotence, processing of the function is suspended without discarding information on progress of the function that has already been processed such that following completion of the interrupt the processing is resumed from a point that it reached when it was suspended.
    Type: Grant
    Filed: November 18, 2011
    Date of Patent: October 13, 2015
    Assignee: ARM Limited
    Inventors: David Hennah Mansell, Timothy Holroyd Glauert
  • Publication number: 20150261538
    Abstract: A data processing apparatus 10 for executing an access instruction for n threads in order to access data values for the n threads includes storage circuitry 100 that stores data values associated with the n threads in groups defined by storage boundaries. The data processing apparatus also includes processing circuitry 80 that processes the access instruction for a set of threads at a time (where each set of threads comprises fewer than n threads) and splitting circuitry 110, responsive to the access instruction, to divide the n threads into multiple sets of threads, and to generate at least one control signal identifying the multiple sets. For each of the sets, the processing circuitry responds to the at least one control signal by issuing at least one access request to the storage circuitry in order to access the data values for that set. The splitting circuitry determines into which set each of the n threads is allocated having regards to the storage boundaries.
    Type: Application
    Filed: March 10, 2015
    Publication date: September 17, 2015
    Inventors: David Hennah MANSELL, Timothy Holroyd GLAUERT
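
Illustrative sketch for publication 20150261538 (same abstract as patent 10296340): the abstract describes splitting circuitry that divides n threads into smaller sets "having regards to the storage boundaries". The Python model below is one possible software reading of that idea, not the claimed hardware. It assumes that each thread accesses one address, that a storage group is an aligned block of line_size bytes, and that each set may hold at most set_size threads. The names split_threads, count_requests, line_size and set_size are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def split_threads(addresses, line_size, set_size):
    """Divide thread ids 0..n-1 into sets of at most set_size threads.

    Assumption: threads whose addresses fall in the same aligned block of
    line_size bytes (a hypothetical "storage group") are placed next to
    each other, so that a set tends to touch as few groups as possible.
    """
    if line_size <= 0 or set_size <= 0:
        raise ValueError("line_size and set_size must be positive")

    # Bucket thread ids by the storage group their address falls in.
    by_group = defaultdict(list)
    for tid, addr in enumerate(addresses):
        by_group[addr // line_size].append(tid)

    # Walk the groups in address order and cut a new set every set_size threads.
    sets, current = [], []
    for group in sorted(by_group):
        for tid in by_group[group]:
            current.append(tid)
            if len(current) == set_size:
                sets.append(current)
                current = []
    if current:
        sets.append(current)
    return sets


def count_requests(thread_set, addresses, line_size):
    """Number of distinct storage groups (access requests) a set touches."""
    return len({addresses[tid] // line_size for tid in thread_set})


# Eight threads with interleaved addresses across two 64-byte groups.
addresses = [0, 64, 4, 68, 8, 72, 12, 76]
boundary_aware = split_threads(addresses, line_size=64, set_size=4)
naive = [[0, 1, 2, 3], [4, 5, 6, 7]]  # split by thread id only

aware_total = sum(count_requests(s, addresses, 64) for s in boundary_aware)
naive_total = sum(count_requests(s, addresses, 64) for s in naive)
print(boundary_aware, aware_total, naive_total)
```

In this toy input the boundary-aware split issues one request per set, while splitting purely by thread id makes each set straddle both storage groups. Whether the patented circuitry uses this particular grouping policy is not stated in the abstract.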