Patents by Inventor Jens Olson
Jens Olson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260119604
Abstract: There is provided tensor processing circuitry comprising a plurality of dot-product units, each of which is configured to perform a multiply accumulate operation. A format conversion unit is configured to convert the format of a first data element before processing by the plurality of dot product units. The format conversion unit is configured to convert the first data element from a first data format to one or more data elements in a second floating point data format, the first data format being one of a plurality of data formats supported by the tensor processing circuitry and the second data format being a predefined floating-point data format in which data elements are input to the dot-product units. If the first data format is a higher precision data format than the second floating-point data format, the format conversion unit generates two or more data elements in the second floating-point data format.
Type: Application
Filed: October 30, 2024
Publication date: April 30, 2026
Inventors: John Wakefield BROTHERS, III, Jens OLSON, Fredrik Peter STOLT
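The splitting step described in the abstract of publication 20260119604 (one higher-precision element becoming two or more lower-precision elements) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first format is float32 and the second is bfloat16 emulated by truncation; the abstract does not name either format, and the function names are hypothetical.

```python
import struct


def to_bfloat16(x):
    """Truncate x to bfloat16 precision by keeping the top 16 bits of its float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def split_to_bfloat16(x, parts=2):
    """Split x into `parts` bfloat16 values whose sum approximates x (assumed scheme)."""
    # Round to float32 first so the residual arithmetic below starts from the assumed input format.
    residual = struct.unpack("<f", struct.pack("<f", x))[0]
    pieces = []
    for _ in range(parts):
        piece = to_bfloat16(residual)
        pieces.append(piece)
        residual -= piece  # what the pieces so far fail to represent
    return pieces


hi, lo = split_to_bfloat16(3.14159274)
```

Each piece is itself a valid element of the lower-precision format, so a dot-product unit that only accepts that format can process the pieces separately and sum the partial products.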
-
Publication number: 20260111174
Abstract: A tensor processing circuitry comprising a plurality of dot product units and normalization circuitry. Each dot product unit comprises first-stage circuitry and second-stage circuitry. The first-stage circuitry is configured to receive a plurality of input values and perform at least a multiply-accumulate operation on pairs of the plurality of input values, where the multiply-accumulate operation produces an output value in an unnormalized floating-point format. The second-stage circuitry is configured to receive a plurality of the unnormalized floating-point output values from the first-stage circuitry and perform an accumulate operation on each of the received unnormalized floating-point output values to generate an unnormalized result. The unnormalized result of the accumulate operation is then output to the normalization circuitry, which normalizes the unnormalized results.
Type: Application
Filed: October 17, 2024
Publication date: April 23, 2026
Inventors: John Wakefield BROTHERS, III, Jens OLSON
-
Publication number: 20260111230
Abstract: A processor comprises a handling unit configured to issue invocation data to a storage access controller to load multi-dimensional bricks from a tensor. The multi-dimensional bricks comprise a brick of primary data and a brick of auxiliary data. The storage access controller is configured to: identify a location of the brick of primary data in the storage of the processor using one or more strides of the primary data in one or more dimensions of the tensor, load the brick of primary data from the identified location, determine one or more virtual strides for one or more dimensions of the auxiliary data based on the one or more strides of the primary data, identify a location of the brick of auxiliary data in the first storage using the determined one or more virtual strides, and load the brick of the auxiliary data from the identified location.
Type: Application
Filed: October 17, 2024
Publication date: April 23, 2026
Inventors: Andreas Herman HANSSON, Elliot Maurice Simon ROSEMARINE, Dominic Hugo SYMES, Jens OLSON
-
Publication number: 20260104929
Abstract: A processor comprising storage, execution circuitry and a handling unit. The handling unit is configured to obtain task data that describes a task to be executed. The task comprises a plurality of operations representable as a directed graph of operations comprising operations connected by connections corresponding to respective logical storage locations. In executing the task, the execution circuitry is configured to operate over a multi-dimensional nested loop. The task data comprises operation-specific control data for an operation of the operations, the operation-specific control data providing an indication, for each respective dimension of a plurality of dimensions of the multi-dimensional nested loop on a per-dimension basis, of whether the operation is to be executed for each iteration of a plurality of iterations over the respective dimension. The handling unit manages execution of the operation, using the execution circuitry, based on the operation-specific control data.
Type: Application
Filed: October 15, 2024
Publication date: April 16, 2026
Inventors: Dominic Hugo SYMES, Jens OLSON, Jared Corey SMOLENS, Rune HOLM
-
Patent number: 12547330
Abstract: A processor comprising storage, execution circuitry and a handling unit configured to obtain task data that describes a task to be executed, comprising a plurality of operations representable as a directed graph of operations. The plurality of operations comprises: a set of production operations comprising generating a set of blocks comprising an intermediate block generated by a production operation in determining a final block; and a consumption operation. The handling unit generates a set of location data indicative of respective physical storage locations allocated to store respective blocks, traverses the set of location data to obtain location data indicative of a physical storage location for storing the intermediate block, and generates and sends execution instructions to instruct the execution circuitry to execute at least part of the consumption operation to read the intermediate block from the physical storage location, the execution instructions comprising the location data.
Type: Grant
Filed: July 19, 2024
Date of Patent: February 10, 2026
Assignee: Arm Limited
Inventors: Dominic Hugo Symes, Jens Olson, Elliot Maurice Simon Rosemarine, Ian Rudolf Bratt, Jared Corey Smolens, Rajanarayana Priyanka Marigi, Fredrik Peter Stolt
-
Patent number: 12547450
Abstract: A processor to generate position data indicative of a position within a compressed data stream, wherein, previously, in executing a task, data of the compressed data stream ending at the position has been read by the processor from storage storing the compressed data stream. After reading the data, the processor reads further data of the compressed data stream from the storage, in executing the task, the further data located beyond the position within the compressed data stream. After reading the further data, the processor reads, based on the position data, a portion of the compressed data stream from the storage, in executing the task, starting from the position within the compressed data stream. The processor decompresses the portion of the compressed data stream to generate decompressed data, in executing the task.
Type: Grant
Filed: January 20, 2023
Date of Patent: February 10, 2026
Assignee: Arm Limited
Inventors: Elliot Maurice Simon Rosemarine, Jared Corey Smolens, Rune Holm, John Wakefield Brothers, III, Jens Olson
-
Publication number: 20260023568
Abstract: A data processing unit is provided comprising a handling unit configured to send invocation data including the first and second operation to an execution unit to cause the execution unit to process the invocation data. The execution unit processes the data by: obtaining data from a non-local storage based on a logical source pipe of a first operation, performing the first and a second operation for portions of the data received from the logical source pipe. In response to the output of the first operation and input of the second operation referring to a logical forwarding pipe, the execution unit performs processing for a portion of the data for the first and second operation without storing the output data of the first operation in the non-local storage.
Type: Application
Filed: July 19, 2024
Publication date: January 22, 2026
Inventors: Elliot Maurice Simon ROSEMARINE, Jens OLSON, John Wakefield BROTHERS, III, Dominic Hugo SYMES, Thomas NYBERG, Ola Markus LEMBKE
-
Publication number: 20260023687
Abstract: A method and processing unit for performing a reduction operation on a tensor, where the processing unit comprises a plurality of processing slices having access to a portion of a shared storage and comprising circuitry configured to perform operations on the tensor, and a transform unit having access to portions of the shared storage and coupled to each of the processing slices. A part of the tensor is transferred to at least one of the processing slices for processing; and at each of the processing slices, processing circuitry performs a reduction operation on the part of the tensor. Each processing slice is constrained in a particular dimension of the tensor. The processing slices output a partially reduced part of the tensor to the transform unit, which performs a further reduction operation on the partially reduced parts of the tensor, such that the further reduction operation outputs a further reduced tensor.
Type: Application
Filed: July 19, 2024
Publication date: January 22, 2026
Inventors: Dominic Hugo SYMES, John Wakefield BROTHERS, III, Rune HOLM, Jens OLSON
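The two-stage reduction in the abstract of publication 20260023687 can be illustrated with a small software model. This is only a sketch under assumptions not stated in the abstract: the tensor is a 2-D list of rows, rows are dealt to slices round-robin, and the reduction is a column-wise sum. All function names are hypothetical.

```python
def slice_reduce(rows):
    """Stage 1: one processing slice sums the rows it was given, column by column."""
    return [sum(col) for col in zip(*rows)]


def transform_reduce(partials):
    """Stage 2: the transform unit combines the partial results from all slices."""
    return [sum(col) for col in zip(*partials)]


def reduce_tensor(tensor, num_slices):
    """Distribute rows over slices, reduce per slice, then reduce the partial results."""
    parts = [tensor[i::num_slices] for i in range(num_slices)]
    partials = [slice_reduce(p) for p in parts if p]  # skip slices that received no rows
    return transform_reduce(partials)


result = reduce_tensor([[1, 2], [3, 4], [5, 6]], num_slices=2)
```

Because addition is associative, the result is the same as a single-pass column sum; only the work is divided so that each slice touches its own part of the data.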
-
Publication number: 20260023684
Abstract: A processor comprising storage, execution circuitry and a handling unit configured to obtain task data that describes a task to be executed, the task comprising a plurality of operations representable as a directed graph of operations, a consumption operation comprising reading of an intermediate block of intermediate data values generated by a production operation in determining a final block of final data values based on the intermediate block. The handling unit allocates a physical storage location of the storage for storing the intermediate block, generates location data indicative of the physical storage location and generates and sends execution instructions to instruct the execution circuitry to at least partly execute the production operation to generate the intermediate block and to store the intermediate block in the physical storage location, the execution instructions comprising the location data.
Type: Application
Filed: July 19, 2024
Publication date: January 22, 2026
Inventors: Jens OLSON, Elliot Maurice Simon ROSEMARINE, Ian Rudolf BRATT, Jared Corey SMOLENS
-
Publication number: 20260023593
Abstract: A method for executing a task using a processing unit, wherein the task comprises at least one operation. The method comprises obtaining, by a command unit of the processing unit, a pseudo-random number, and scheduling by the command unit, the at least one operation. The command unit generates at least one second pseudo-random number based on the pseudo-random number, and one or more scheduling-independent parameters relating to the operation. The scheduling-independent parameters are independent of the scheduling of the operation. The processing unit executes the at least one operation based on the at least one second pseudo-random number.
Type: Application
Filed: July 19, 2024
Publication date: January 22, 2026
Inventors: Elliot Maurice Simon ROSEMARINE, Sven Ola Johannes HUGOSSON, John Wakefield BROTHERS, III, Jared Corey SMOLENS, Rune HOLM, Jens OLSON, Dominic Hugo SYMES
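The idea in the abstract of publication 20260023593, deriving a second pseudo-random number from a first one plus parameters that do not depend on scheduling, can be sketched as follows. The choice of SHA-256 as the mixing function, and of an operation identifier and block coordinates as the scheduling-independent parameters, are assumptions made for illustration; the abstract specifies neither.

```python
import hashlib


def derive_random(seed, op_id, coords):
    """Derive a 32-bit value from the task seed and scheduling-independent parameters.

    `op_id` and `coords` are hypothetical examples of such parameters.
    """
    payload = repr((seed, op_id, tuple(coords))).encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:4], "little")


blocks = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Two different execution orders of the same blocks.
forward = {c: derive_random(1234, 7, c) for c in blocks}
backward = {c: derive_random(1234, 7, c) for c in reversed(blocks)}
```

Since the derived value depends only on the seed and on what is being computed, not on when or where it is scheduled, `forward` and `backward` hold identical values for every block.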
-
Publication number: 20260023603
Abstract: A processor comprising a storage unit comprising a plurality of storage elements. The processor is configured to obtain task data that describes a task to be executed, the task comprising a plurality of operations representable as a directed graph of operations comprising operations connected by connections corresponding to respective logical storage locations. A first connection associated with a first output of a first operation corresponds to a first logical storage location and a second connection associated with a second output of a second operation corresponds to a second logical storage location. The processor is configured to dynamically allocate a first set of the plurality of storage elements of the storage unit to correspond to the first logical storage location and a second set of the plurality of storage elements of the storage unit to correspond to the second logical storage location.
Type: Application
Filed: July 19, 2024
Publication date: January 22, 2026
Inventors: Jens OLSON, Jared Corey SMOLENS
-
Publication number: 20260023489
Abstract: A processor comprising storage, execution circuitry and a handling unit configured to obtain task data that describes a task to be executed, comprising a plurality of operations representable as a directed graph of operations. The plurality of operations comprises: a set of production operations comprising generating a set of blocks comprising an intermediate block generated by a production operation in determining a final block; and a consumption operation. The handling unit generates a set of location data indicative of respective physical storage locations allocated to store respective blocks, traverses the set of location data to obtain location data indicative of a physical storage location for storing the intermediate block, and generates and sends execution instructions to instruct the execution circuitry to execute at least part of the consumption operation to read the intermediate block from the physical storage location, the execution instructions comprising the location data.
Type: Application
Filed: July 19, 2024
Publication date: January 22, 2026
Inventors: Dominic Hugo SYMES, Jens OLSON, Elliot Maurice Simon ROSEMARINE, Ian Rudolf BRATT, Jared Corey SMOLENS, Rajanarayana Priyanka MARIGI, Fredrik Peter STOLT
-
Patent number: 12498976
Abstract: A processor to execute a plurality of tasks comprising a first task and a second task. At least a part of the first task is to be executed simultaneously with at least a part of the second task. The processor comprises a handling unit to: determine an available portion of a storage available during execution of the part of the first task; determine a mapping between at least one logical address associated with data associated with the part of the second task and a corresponding at least one physical address of the storage corresponding to the available portion; and identify, based on the mapping, the at least one physical address corresponding to the at least one logical address associated with the data, for storing the data in the available portion of the storage.
Type: Grant
Filed: October 17, 2022
Date of Patent: December 16, 2025
Assignee: Arm Limited
Inventors: Jens Olson, John Wakefield Brothers, III
-
Patent number: 12499045
Abstract: A method and processing unit for mapping coordinates of a tensor to a shared storage of the processing unit. The processing unit comprises processing slices, each for performing a suboperation of the operation, and having prioritized access to a preferred portion of the shared storage. The method comprises obtaining a layout indicating configurable data regions in the shared storage. The configurable data regions are banks comprising regions of shared storage. A stride based on for at least one dimension of the tensor and the layout is obtained. Coordinates of the tensor are mapped to locations in the shared storage, wherein the mapping comprises calculating a bank number and offset which are calculated based on coordinates of the tensor, the layout of the configurable data regions, and the stride of the at least one dimension. Each processing slice performs the suboperation on the data of the tensor based on the mapping.
Type: Grant
Filed: July 19, 2024
Date of Patent: December 16, 2025
Assignee: Arm Limited
Inventors: Jens Olson, John Wakefield Brothers, III
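The coordinate mapping in the abstract of patent 12499045 (a bank number and an offset computed from tensor coordinates, a layout, and per-dimension strides) can be sketched as below. The interleaving scheme, the `Layout` fields, and the function name are assumptions made for illustration; the abstract does not describe how the bank number and offset are actually calculated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout: banks interleaved in fixed-size chunks."""
    num_banks: int   # number of banks in the shared storage
    bank_width: int  # elements stored contiguously in one bank before moving to the next


def map_coords(coords, strides, layout):
    """Map tensor coordinates to (bank number, offset within that bank)."""
    # Linear element index from coordinates and per-dimension strides.
    linear = sum(c * s for c, s in zip(coords, strides))
    chunk, within = divmod(linear, layout.bank_width)
    bank = chunk % layout.num_banks
    offset = (chunk // layout.num_banks) * layout.bank_width + within
    return bank, offset


layout = Layout(num_banks=4, bank_width=8)
location = map_coords((1, 2, 3), strides=(64, 8, 1), layout=layout)
```

With this interleaving, consecutive chunks land in different banks, so slices working on neighbouring chunks would tend to access different banks.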
-
Publication number: 20250362966
Abstract: A processor and method for handling data, by obtaining operations from storage, analyzing each of the operations to determine an associated operation space, and generating at least one operation set, wherein the operations of the operation set have substantially similar operation spaces. Input data is received in the form of a tensor and allocated as the input to a given operation of the operation set, the input data having the predetermined input characteristics associated with the given operation. The given operation is executed using the input to produce an output with the known output characteristics. The input data and the output associated with an operation of the operation set are stored in a segment associated with that operation.
Type: Application
Filed: January 12, 2024
Publication date: November 27, 2025
Applicant: Arm Limited
Inventors: Rune Holm, Jared Corey Smolens, Elliot Maurice Simon Rosemarine, Jens Olson
-
Patent number: 12475046
Abstract: A method and processing unit for mapping coordinates of a tensor to a shared storage of the processing unit. The processing unit comprises processing slices, each for performing a suboperation of the operation, and having prioritized access to a preferred portion of the shared storage. The method comprises obtaining a layout indicating configurable data regions in the shared storage. The configurable data regions are banks comprising regions of shared storage. A stride based on for at least one dimension of the tensor and the layout is obtained. Coordinates of the tensor are mapped to locations in the shared storage, wherein the mapping comprises calculating a bank number and offset which are calculated based on coordinates of the tensor, the layout of the configurable data regions, and the stride of the at least one dimension. Each processing slice performs the suboperation on the data of the tensor based on the mapping.
Type: Grant
Filed: July 19, 2024
Date of Patent: November 18, 2025
Assignee: Arm Limited
Inventors: Jens Olson, John Wakefield Brothers, III
-
Patent number: 12271608
Abstract: A processor to generate accumulated data comprising, for an operation cycle: performing an operation on a first bit range of a set of first input data to generate a set of operation data, which is accumulated with stored data within a first storage device. A lowest n bits of the accumulated data are accumulated with first further stored data within a first bit range of a second storage device, and are bit-shifted from the first storage device. Further accumulated data is generated, comprising, for an operation cycle: performing the operation on a second bit range of the set of first input data to generate a further set of operation data, which is accumulated with the stored data within the first storage device. A lowest m bits of the further accumulated data is accumulated with second further stored data within a second bit range of the second storage device.
Type: Grant
Filed: January 20, 2023
Date of Patent: April 8, 2025
Assignee: Arm Limited
Inventors: Dominic Hugo Symes, John Wakefield Brothers, III, Jens Olson, Peter Mattias Hansson
-
Patent number: 12072808
Abstract: A processor comprising a first storage managed as a circular buffer to store a plurality of data structures. Each data structure comprises: an identifier, a size indicator and first data associated with instructions for execution of a task. The processor is configured for searching for a data structure in the first storage. A data structure subsequent to the tail data structure can be located using a storage address in the first storage of a tail data structure and the size indicator of all data structures preceding the second data structure among the plurality of data structures. When a data structure is found, the task may be executed based at least in part on the first data of the found data structure.
Type: Grant
Filed: December 8, 2022
Date of Patent: August 27, 2024
Assignee: Arm Limited
Inventors: Jens Olson, Jared Corey Smolens
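The search described in the abstract of patent 12072808, locating later data structures from the tail address and the size indicators of the structures before them, can be sketched with a simple software model. The word-based record layout (identifier, size, then payload), the head pointer, and the function name are assumptions for illustration only. The sketch also assumes the buffer is never completely full and that every size is at least 2.

```python
def find_record(storage, tail, head, wanted_id):
    """Walk records from tail to head; return the payload of the first match, or None.

    Assumed record layout: [identifier, size, payload...], where size counts all
    words of the record including the two header words.
    """
    capacity = len(storage)
    addr = tail
    while addr != head:
        ident = storage[addr]
        size = storage[(addr + 1) % capacity]
        if ident == wanted_id:
            # Payload may wrap around the end of the circular buffer.
            return [storage[(addr + 2 + i) % capacity] for i in range(size - 2)]
        addr = (addr + size) % capacity  # next record starts `size` words later
    return None


# Record 7 occupies words 6-8; record 9 starts at word 9 and wraps to words 0-2.
storage = [4, 200, 201, 0, 0, 0, 7, 3, 100, 9]
payload = find_record(storage, tail=6, head=3, wanted_id=9)
```

No per-record pointers are stored: each record's address follows from the tail address plus the sizes of the records before it.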
-
Publication number: 20240248621
Abstract: A processor to generate accumulated data comprising, for an operation cycle: performing an operation on a first bit range of a set of first input data to generate a set of operation data, which is accumulated with stored data within a first storage device. A lowest n bits of the accumulated data are accumulated with first further stored data within a first bit range of a second storage device, and are bit-shifted from the first storage device. Further accumulated data is generated, comprising, for an operation cycle: performing the operation on a second bit range of the set of first input data to generate a further set of operation data, which is accumulated with the stored data within the first storage device. A lowest m bits of the further accumulated data is accumulated with second further stored data within a second bit range of the second storage device.
Type: Application
Filed: January 20, 2023
Publication date: July 25, 2024
Inventors: Dominic Hugo SYMES, John Wakefield BROTHERS, III, Jens OLSON, Peter Mattias HANSSON
-
Publication number: 20240248764
Abstract: A memory unit configured for handling task data, the task data describing a task to be executed as a directed acyclic graph of operations, wherein each operation maps to a corresponding execution unit, and wherein each connection between operations in the acyclic graph maps to a corresponding storage element of the execution unit. The task data defines an operation space representing the dimensions of a multi-dimensional arrangement of the connected operations to be executed represented by the data blocks; the memory unit configured to receive a sequence of processing requests comprising the one or more data blocks with each data block assigned a priority value and comprising a block command. The memory unit is configured to arbitrate between the data blocks based upon the priority value and block command to prioritize the sequence of processing requests and wherein the processing requests include writing data to, or reading data from storage.
Type: Application
Filed: May 12, 2023
Publication date: July 25, 2024
Inventors: Rune HOLM, Jens OLSON, Elliot Maurice Simon ROSEMARINE, Jared SMOLENS