Patents by Inventor Jens Olson

Jens Olson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20260119604
    Abstract: There is provided tensor processing circuitry comprising a plurality of dot-product units, each of which is configured to perform a multiply-accumulate operation. A format conversion unit is configured to convert the format of a first data element before processing by the plurality of dot-product units. The format conversion unit is configured to convert the first data element from a first data format to one or more data elements in a second floating-point data format, the first data format being one of a plurality of data formats supported by the tensor processing circuitry and the second floating-point data format being a predefined floating-point data format in which data elements are input to the dot-product units. If the first data format is a higher precision data format than the second floating-point data format, the format conversion unit generates two or more data elements in the second floating-point data format.
    Type: Application
    Filed: October 30, 2024
    Publication date: April 30, 2026
    Inventors: John Wakefield BROTHERS, III, Jens OLSON, Fredrik Peter STOLT
  • Publication number: 20260111174
    Abstract: Tensor processing circuitry comprising a plurality of dot-product units and normalization circuitry. Each dot-product unit comprises first-stage circuitry and second-stage circuitry. The first-stage circuitry is configured to receive a plurality of input values and perform at least a multiply-accumulate operation on pairs of the plurality of input values, the multiply-accumulate operation producing an output value in an unnormalized floating-point format. The second-stage circuitry is configured to receive a plurality of the unnormalized floating-point output values from the first-stage circuitry and perform an accumulate operation on each of the received unnormalized floating-point output values to generate an unnormalized result. The unnormalized result of the accumulate operation is then output to the normalization circuitry, which normalizes the unnormalized results.
    Type: Application
    Filed: October 17, 2024
    Publication date: April 23, 2026
    Inventors: John Wakefield BROTHERS, III, Jens OLSON
  • Publication number: 20260111230
    Abstract: A processor comprises a handling unit configured to issue invocation data to a storage access controller to load multi-dimensional bricks from a tensor. The multi-dimensional bricks comprise a brick of primary data and a brick of auxiliary data. The storage access controller is configured to: identify a location of the brick of primary data in the storage of the processor using one or more strides of the primary data in one or more dimensions of the tensor, load the brick of primary data from the identified location, determine one or more virtual strides for one or more dimensions of the auxiliary data based on the one or more strides of the primary data, identify a location of the brick of auxiliary data in the first storage using the determined one or more virtual strides, and load the brick of auxiliary data from the identified location.
    Type: Application
    Filed: October 17, 2024
    Publication date: April 23, 2026
    Inventors: Andreas Herman HANSSON, Elliot Maurice Simon ROSEMARINE, Dominic Hugo SYMES, Jens OLSON
  • Publication number: 20260104929
    Abstract: A processor comprising storage, execution circuitry and a handling unit. The handling unit is configured to obtain task data that describes a task to be executed. The task comprises a plurality of operations representable as a directed graph of operations comprising operations connected by connections corresponding to respective logical storage locations. In executing the task, the execution circuitry is configured to operate over a multi-dimensional nested loop. The task data comprises operation-specific control data for an operation of the operations, the operation-specific control data providing an indication, for each respective dimension of a plurality of dimensions of the multi-dimensional nested loop on a per-dimension basis, of whether the operation is to be executed for each iteration of a plurality of iterations over the respective dimension. The handling unit manages execution of the operation, using the execution circuitry, based on the operation-specific control data.
    Type: Application
    Filed: October 15, 2024
    Publication date: April 16, 2026
    Inventors: Dominic Hugo SYMES, Jens OLSON, Jared Corey SMOLENS, Rune HOLM
  • Patent number: 12547330
    Abstract: A processor comprising storage, execution circuitry and a handling unit configured to obtain task data that describes a task to be executed, comprising a plurality of operations representable as a directed graph of operations. The plurality of operations comprises: a set of production operations comprising generating a set of blocks comprising an intermediate block generated by a production operation in determining a final block; and a consumption operation. The handling unit generates a set of location data indicative of respective physical storage locations allocated to store respective blocks, traverses the set of location data to obtain location data indicative of a physical storage location for storing the intermediate block, and generates and sends execution instructions to instruct the execution circuitry to execute at least part of the consumption operation to read the intermediate block from the physical storage location, the execution instructions comprising the location data.
    Type: Grant
    Filed: July 19, 2024
    Date of Patent: February 10, 2026
    Assignee: Arm Limited
    Inventors: Dominic Hugo Symes, Jens Olson, Elliot Maurice Simon Rosemarine, Ian Rudolf Bratt, Jared Corey Smolens, Rajanarayana Priyanka Marigi, Fredrik Peter Stolt
  • Patent number: 12547450
    Abstract: A processor to generate position data indicative of a position within a compressed data stream, wherein, previously, in executing a task, data of the compressed data stream ending at the position has been read by the processor from storage storing the compressed data stream. After reading the data, the processor reads further data of the compressed data stream from the storage, in executing the task, the further data located beyond the position within the compressed data stream. After reading the further data, the processor reads, based on the position data, a portion of the compressed data stream from the storage, in executing the task, starting from the position within the compressed data stream. The processor decompresses the portion of the compressed data stream to generate decompressed data, in executing the task.
    Type: Grant
    Filed: January 20, 2023
    Date of Patent: February 10, 2026
    Assignee: Arm Limited
    Inventors: Elliot Maurice Simon Rosemarine, Jared Corey Smolens, Rune Holm, John Wakefield Brothers, III, Jens Olson
  • Publication number: 20260023568
    Abstract: A data processing unit is provided comprising a handling unit configured to send invocation data including a first operation and a second operation to an execution unit to cause the execution unit to process the invocation data. The execution unit processes the data by: obtaining data from a non-local storage based on a logical source pipe of the first operation, and performing the first operation and the second operation for portions of the data received from the logical source pipe. In response to the output of the first operation and the input of the second operation referring to a logical forwarding pipe, the execution unit performs processing for a portion of the data for the first and second operations without storing the output data of the first operation in the non-local storage.
    Type: Application
    Filed: July 19, 2024
    Publication date: January 22, 2026
    Inventors: Elliot Maurice Simon ROSEMARINE, Jens OLSON, John Wakefield BROTHERS, III, Dominic Hugo SYMES, Thomas NYBERG, Ola Markus LEMBKE
  • Publication number: 20260023687
    Abstract: A method and processing unit for performing a reduction operation on a tensor, where the processing unit comprises a plurality of processing slices having access to a portion of a shared storage and comprising circuitry configured to perform operations on the tensor, and a transform unit having access to portions of the shared storage and coupled to each of the processing slices. A part of the tensor is transferred to at least one of the processing slices for processing, and at each of the processing slices, processing circuitry performs a reduction operation on the part of the tensor. Each processing slice is constrained in a particular dimension of the tensor. The processing slices output a partially reduced part of the tensor to the transform unit, which performs a further reduction operation on the partially reduced parts of the tensor, such that the further reduction operation outputs a further reduced tensor.
    Type: Application
    Filed: July 19, 2024
    Publication date: January 22, 2026
    Inventors: Dominic Hugo SYMES, John Wakefield BROTHERS, III, Rune HOLM, Jens OLSON
  • Publication number: 20260023684
    Abstract: A processor comprising storage, execution circuitry and a handling unit configured to obtain task data that describes a task to be executed, the task comprising a plurality of operations representable as a directed graph of operations, a consumption operation comprising reading of an intermediate block of intermediate data values generated by a production operation in determining a final block of final data values based on the intermediate block. The handling unit allocates a physical storage location of the storage for storing the intermediate block, generates location data indicative of the physical storage location and generates and sends execution instructions to instruct the execution circuitry to at least partly execute the production operation to generate the intermediate block and to store the intermediate block in the physical storage location, the execution instructions comprising the location data.
    Type: Application
    Filed: July 19, 2024
    Publication date: January 22, 2026
    Inventors: Jens OLSON, Elliot Maurice Simon ROSEMARINE, Ian Rudolf BRATT, Jared Corey SMOLENS
  • Publication number: 20260023593
    Abstract: A method for executing a task using a processing unit, wherein the task comprises at least one operation. The method comprises obtaining, by a command unit of the processing unit, a pseudo-random number, and scheduling, by the command unit, the at least one operation. The command unit generates at least one second pseudo-random number based on the pseudo-random number and one or more scheduling-independent parameters relating to the operation. The scheduling-independent parameters are independent of the scheduling of the operation. The processing unit executes the at least one operation based on the at least one second pseudo-random number.
    Type: Application
    Filed: July 19, 2024
    Publication date: January 22, 2026
    Inventors: Elliot Maurice Simon ROSEMARINE, Sven Ola Johannes HUGOSSON, John Wakefield BROTHERS, III, Jared Corey SMOLENS, Rune HOLM, Jens OLSON, Dominic Hugo SYMES
  • Publication number: 20260023603
    Abstract: A processor comprising a storage unit comprising a plurality of storage elements. The processor is configured to obtain task data that describes a task to be executed, the task comprising a plurality of operations representable as a directed graph of operations comprising operations connected by connections corresponding to respective logical storage locations. A first connection associated with a first output of a first operation corresponds to a first logical storage location and a second connection associated with a second output of a second operation corresponds to a second logical storage location. The processor is configured to dynamically allocate a first set of the plurality of storage elements of the storage unit to correspond to the first logical storage location and a second set of the plurality of storage elements of the storage unit to correspond to the second logical storage location.
    Type: Application
    Filed: July 19, 2024
    Publication date: January 22, 2026
    Inventors: Jens OLSON, Jared Corey SMOLENS
  • Publication number: 20260023489
    Abstract: A processor comprising storage, execution circuitry and a handling unit configured to obtain task data that describes a task to be executed, comprising a plurality of operations representable as a directed graph of operations. The plurality of operations comprises: a set of production operations comprising generating a set of blocks comprising an intermediate block generated by a production operation in determining a final block; and a consumption operation. The handling unit generates a set of location data indicative of respective physical storage locations allocated to store respective blocks, traverses the set of location data to obtain location data indicative of a physical storage location for storing the intermediate block, and generates and sends execution instructions to instruct the execution circuitry to execute at least part of the consumption operation to read the intermediate block from the physical storage location, the execution instructions comprising the location data.
    Type: Application
    Filed: July 19, 2024
    Publication date: January 22, 2026
    Inventors: Dominic Hugo SYMES, Jens OLSON, Elliot Maurice Simon ROSEMARINE, Ian Rudolf BRATT, Jared Corey SMOLENS, Rajanarayana Priyanka MARIGI, Fredrik Peter STOLT
  • Patent number: 12498976
    Abstract: A processor to execute a plurality of tasks comprising a first task and a second task. At least a part of the first task is to be executed simultaneously with at least a part of the second task. The processor comprises a handling unit to: determine an available portion of a storage available during execution of the part of the first task; determine a mapping between at least one logical address associated with data associated with the part of the second task and a corresponding at least one physical address of the storage corresponding to the available portion; and identify, based on the mapping, the at least one physical address corresponding to the at least one logical address associated with the data, for storing the data in the available portion of the storage.
    Type: Grant
    Filed: October 17, 2022
    Date of Patent: December 16, 2025
    Assignee: Arm Limited
    Inventors: Jens Olson, John Wakefield Brothers, III
  • Patent number: 12499045
    Abstract: A method and processing unit for mapping coordinates of a tensor to a shared storage of the processing unit. The processing unit comprises processing slices, each for performing a suboperation of the operation, and having prioritized access to a preferred portion of the shared storage. The method comprises obtaining a layout indicating configurable data regions in the shared storage. The configurable data regions are banks comprising regions of shared storage. A stride based on at least one dimension of the tensor and the layout is obtained. Coordinates of the tensor are mapped to locations in the shared storage, wherein the mapping comprises calculating a bank number and offset, which are calculated based on coordinates of the tensor, the layout of the configurable data regions, and the stride of the at least one dimension. Each processing slice performs the suboperation on the data of the tensor based on the mapping.
    Type: Grant
    Filed: July 19, 2024
    Date of Patent: December 16, 2025
    Assignee: Arm Limited
    Inventors: Jens Olson, John Wakefield Brothers, III
  • Publication number: 20250362966
    Abstract: A processor and method for handling data, by obtaining operations from storage, analyzing each of the operations to determine an associated operation space, and generating at least one operation set, wherein the operations of the operation set have substantially similar operation spaces. Input data is received in the form of a tensor and allocated as the input to a given operation of the operation set, the input data having the predetermined input characteristics associated with the given operation. The given operation is executed using the input to produce an output with the known output characteristics. The input data and the output associated with an operation of the operation set are stored in a segment associated with that operation.
    Type: Application
    Filed: January 12, 2024
    Publication date: November 27, 2025
    Applicant: Arm Limited
    Inventors: Rune Holm, Jared Corey Smolens, Elliot Maurice Simon Rosemarine, Jens Olson
  • Patent number: 12475046
    Abstract: A method and processing unit for mapping coordinates of a tensor to a shared storage of the processing unit. The processing unit comprises processing slices, each for performing a suboperation of the operation, and having prioritized access to a preferred portion of the shared storage. The method comprises obtaining a layout indicating configurable data regions in the shared storage. The configurable data regions are banks comprising regions of shared storage. A stride based on at least one dimension of the tensor and the layout is obtained. Coordinates of the tensor are mapped to locations in the shared storage, wherein the mapping comprises calculating a bank number and offset, which are calculated based on coordinates of the tensor, the layout of the configurable data regions, and the stride of the at least one dimension. Each processing slice performs the suboperation on the data of the tensor based on the mapping.
    Type: Grant
    Filed: July 19, 2024
    Date of Patent: November 18, 2025
    Assignee: Arm Limited
    Inventors: Jens Olson, John Wakefield Brothers, III
  • Patent number: 12271608
    Abstract: A processor to generate accumulated data comprising, for an operation cycle: performing an operation on a first bit range of a set of first input data to generate a set of operation data, which is accumulated with stored data within a first storage device. A lowest n bits of the accumulated data are accumulated with first further stored data within a first bit range of a second storage device, and are bit-shifted from the first storage device. Further accumulated data is generated, comprising, for an operation cycle: performing the operation on a second bit range of the set of first input data to generate a further set of operation data, which is accumulated with the stored data within the first storage device. A lowest m bits of the further accumulated data are accumulated with second further stored data within a second bit range of the second storage device.
    Type: Grant
    Filed: January 20, 2023
    Date of Patent: April 8, 2025
    Assignee: Arm Limited
    Inventors: Dominic Hugo Symes, John Wakefield Brothers, III, Jens Olson, Peter Mattias Hansson
  • Patent number: 12072808
    Abstract: A processor comprising a first storage managed as a circular buffer to store a plurality of data structures. Each data structure comprises: an identifier, a size indicator and first data associated with instructions for execution of a task. The processor is configured for searching for a data structure in the first storage. A data structure subsequent to a tail data structure can be located using a storage address in the first storage of the tail data structure and the size indicators of all data structures preceding the subsequent data structure among the plurality of data structures. When a data structure is found, the task may be executed based at least in part on the first data of the found data structure.
    Type: Grant
    Filed: December 8, 2022
    Date of Patent: August 27, 2024
    Assignee: Arm Limited
    Inventors: Jens Olson, Jared Corey Smolens
  • Publication number: 20240248621
    Abstract: A processor to generate accumulated data comprising, for an operation cycle: performing an operation on a first bit range of a set of first input data to generate a set of operation data, which is accumulated with stored data within a first storage device. A lowest n bits of the accumulated data are accumulated with first further stored data within a first bit range of a second storage device, and are bit-shifted from the first storage device. Further accumulated data is generated, comprising, for an operation cycle: performing the operation on a second bit range of the set of first input data to generate a further set of operation data, which is accumulated with the stored data within the first storage device. A lowest m bits of the further accumulated data are accumulated with second further stored data within a second bit range of the second storage device.
    Type: Application
    Filed: January 20, 2023
    Publication date: July 25, 2024
    Inventors: Dominic Hugo SYMES, John Wakefield BROTHERS, III, Jens OLSON, Peter Mattias HANSSON
  • Publication number: 20240248764
    Abstract: A memory unit configured for handling task data, the task data describing a task to be executed as a directed acyclic graph of operations, wherein each operation maps to a corresponding execution unit, and wherein each connection between operations in the acyclic graph maps to a corresponding storage element of the execution unit. The task data defines an operation space representing the dimensions of a multi-dimensional arrangement of the connected operations to be executed represented by the data blocks; the memory unit configured to receive a sequence of processing requests comprising the one or more data blocks with each data block assigned a priority value and comprising a block command. The memory unit is configured to arbitrate between the data blocks based upon the priority value and block command to prioritize the sequence of processing requests and wherein the processing requests include writing data to, or reading data from storage.
    Type: Application
    Filed: May 12, 2023
    Publication date: July 25, 2024
    Inventors: Rune HOLM, Jens OLSON, Elliot Maurice Simon ROSEMARINE, Jared SMOLENS
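
For readers unfamiliar with the lookup described in the abstract of patent 12072808 above, the following is a minimal Python sketch of walking a circular buffer from its tail using each entry's size indicator. It is an illustration under stated assumptions, not the patented implementation: the names Record, find_record, CAPACITY and storage are hypothetical, the dictionary keyed by address stands in for the first storage, and the size unit is arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for one data structure held in the buffer."""
    identifier: int  # value the search compares against
    size: int        # size indicator: storage units the record occupies (assumed unit)
    first_data: str  # placeholder for the data associated with task instructions


def find_record(
    storage: Dict[int, Record],
    capacity: int,
    tail_addr: int,
    count: int,
    wanted_id: int,
) -> Optional[Tuple[int, Record]]:
    """Search `count` records starting at the tail address.

    The address of each following record is the previous address plus the
    previous record's size, wrapped around the buffer capacity.
    """
    addr = tail_addr
    for _ in range(count):
        record = storage[addr]
        if record.identifier == wanted_id:
            return addr, record
        addr = (addr + record.size) % capacity  # wrap: the buffer is circular
    return None


# Example: an 8-unit buffer whose tail record starts at address 6 and wraps around.
CAPACITY = 8
storage = {
    6: Record(identifier=1, size=3, first_data="task A"),  # occupies 6, 7, 0
    1: Record(identifier=2, size=2, first_data="task B"),  # occupies 1, 2
    3: Record(identifier=3, size=1, first_data="task C"),  # occupies 3
}
found = find_record(storage, CAPACITY, tail_addr=6, count=3, wanted_id=3)
```

In this sketch the third record is reached without any per-record address table: only the tail address and the sizes of the records before it are needed, which is the property the abstract relies on.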