AUTOMATICALLY GENERATING TRAINING DATA FOR A LIDAR USING SIMULATED VEHICLES IN VIRTUAL SPACE
Automated training dataset generators that generate feature training datasets for use in real-world autonomous driving applications based on virtual environments are disclosed herein. The feature training datasets may be associated with training a machine learning model to control real-world autonomous vehicles. In some embodiments, an occupancy grid generator is used to generate an occupancy grid indicative of an environment of an autonomous vehicle from an imaging scene that depicts the environment. The occupancy grid is used to control the vehicle as the vehicle moves through the environment. In further embodiments, a sensor parameter optimizer may determine parameter settings for use by real-world sensors in autonomous driving applications. The sensor parameter optimizer may determine, based on operation of the autonomous vehicle, an optimal parameter setting of the parameter setting where the optimal parameter setting may be applied to a real-world sensor associated with real-world autonomous driving applications.
This application claims the benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Application No. 62/726,841, filed Sep. 4, 2018, which is incorporated herein by reference.
FIELD OF TECHNOLOGY
The present disclosure generally relates to autonomous vehicles, and, more particularly, to generating feature training datasets, and/or other data, for use in real-world autonomous driving applications based on virtual environments.
BACKGROUND
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Machine learning techniques allow correlations, or other associations, to be defined between training datasets and labels. Typical machine learning models may require only a trivial number of data tuples in order to generate sufficiently accurate models. However, sufficiently training a machine-learning based model to control a real-world autonomous vehicle generally requires numerous (e.g., tens of millions of) feature-rich training datasets that correspond to real-world driving concerns and experiences. While collecting real-world observations to build the relatively small datasets used by traditional machine learning models may be feasible, it is generally extremely costly, burdensome, dangerous, and/or impracticable to collect sufficient amounts of training data for real-world driving and/or autonomous vehicle activities or purposes. For example, collecting such large amounts of real-world driving datasets may not only be time intensive, but also dangerous because it would necessarily include collecting data related to dangerous real-world vehicular events such as crashes, risky driving, vehicle-and-pedestrian interactions (e.g., including serious injury), etc.
For the foregoing reasons, there is a need for alternative systems and methods to generate feature training datasets for use in real-world autonomous driving applications.
SUMMARY
As described in various embodiments herein, simulated or virtual data may be used to generate and/or obtain feature-rich and plentiful training datasets. In addition, the techniques and embodiments disclosed in the various embodiments herein improve the efficiency and effectiveness of generating and/or collecting numerous autonomous driving datasets, and also address safety concerns with respect to generating sufficient datasets in a non-dangerous and controlled manner when training autonomous vehicles in real-world driving applications.
For example, in various embodiments a non-transitory computer-readable medium, storing thereon instructions executable by one or more processors, may implement an automated training dataset generator that generates feature training datasets for use in real-world autonomous driving applications based on virtual environments. In various aspects, the automated training dataset generator may include an imaging engine configured to generate a plurality of imaging scenes defining a virtual environment. The plurality of imaging scenes may include a plurality of photo-realistic scenes and a plurality of corresponding depth-map-realistic scenes. The automated training dataset generator may include a physics component configured to generate environment-object data defining how objects or surfaces interact with each other in the virtual environment. The automated training dataset generator may further include an autonomous vehicle simulator configured to control an autonomous vehicle within the virtual environment based on one or both of (i) the plurality of photo-realistic scenes and (ii) the plurality of depth-map-realistic scenes. The automated training dataset generator may further include a dataset component configured to generate one or more feature training datasets based on at least one of (i) the plurality of photo-realistic scenes, (ii) the plurality of depth-map-realistic scenes, or (iii) the environment-object data. The feature training dataset may be associated with training a machine learning model to control an autonomous vehicle in a real-world autonomous driving application.
In additional embodiments, an automated training dataset generation method is disclosed for generating feature training datasets for use in real-world autonomous driving applications based on virtual environments. The automated training dataset generation method may include generating a plurality of imaging scenes defining a virtual environment. The plurality of imaging scenes may include a plurality of photo-realistic scenes and a plurality of corresponding depth-map-realistic scenes. The automated training dataset generation method may further include generating environment-object data defining how objects or surfaces interact with each other in the virtual environment. The automated training dataset generation method may further include controlling an autonomous vehicle within the virtual environment based on one or both of (i) the plurality of photo-realistic scenes and (ii) the plurality of depth-map-realistic scenes. The automated training dataset generation method may further include generating one or more feature training datasets based on at least one of (i) the plurality of photo-realistic scenes, (ii) the plurality of depth-map-realistic scenes, or (iii) the environment-object data. The feature training dataset may be associated with training a machine learning model to control an autonomous vehicle in a real-world autonomous driving application.
In further embodiments, a non-transitory computer-readable medium, storing thereon instructions executable by one or more processors, may implement an occupancy grid generator for generating an occupancy grid indicative of an environment of a vehicle from an imaging scene that depicts the environment. The occupancy grid generator may include a normal layer component configured to generate a normal layer based on the imaging scene. The normal layer may define a two-dimensional (2D) view of the imaging scene. The occupancy grid generator may further include a label layer component configured to generate a label layer. In various aspects, the label layer may be mapped to the normal layer and encoded with a first channel set. The first channel set may be associated with one or more text-based or state-based values of one or more objects of the environment. The occupancy grid generator may further include a velocity layer component configured to generate a velocity layer. In various aspects, the velocity layer may be mapped to the normal layer and encoded with a second channel set. In various aspects the second channel set may be associated with one or more velocity values of one or more objects of the environment. In some embodiments, the occupancy grid generator may generate an occupancy grid based on the normal layer, the label layer, and the velocity layer. The occupancy grid may be used to control the vehicle as the vehicle moves through the environment.
In additional embodiments, an occupancy grid generation method is disclosed for generating an occupancy grid indicative of an environment of a vehicle from an imaging scene that depicts the environment. The occupancy grid generation method may include generating a normal layer based on the imaging scene, the normal layer defining a two-dimensional (2D) view of the imaging scene. The occupancy grid generation method may further include generating a label layer. The label layer may be mapped to the normal layer and encoded with a first channel set. The first channel set may be associated with one or more text-based or state-based values of one or more objects of the environment. The occupancy grid generation method may further include generating a velocity layer. The velocity layer may be mapped to the normal layer and encoded with a second channel set. The second channel set may be associated with one or more velocity values of one or more objects of the environment. The occupancy grid generation method may further include generating an occupancy grid based on the normal layer, the label layer, and the velocity layer. The occupancy grid may be used to control the vehicle as the vehicle moves through the environment.
In further embodiments, a non-transitory computer-readable medium, storing thereon instructions executable by one or more processors, may be configured to implement a sensor parameter optimizer that determines parameter settings for use by real-world sensors in autonomous driving applications. In various aspects, the sensor parameter optimizer may include an imaging engine configured to generate a plurality of imaging scenes defining a virtual environment. The sensor parameter optimizer may further include a sensor simulator configured to receive a parameter setting for each of one or more virtual sensors. The sensor simulator may be configured to generate, based on the parameter settings and the plurality of imaging scenes, sensor data indicative of current states of the virtual environment. The sensor parameter optimizer may also include an autonomous vehicle simulator configured to control an autonomous vehicle within the virtual environment based on the sensor data. In various aspects, the sensor parameter optimizer may determine, based on operation of the autonomous vehicle, an optimal parameter setting of the parameter settings, where the optimal parameter setting may be applied to a real-world sensor associated with real-world autonomous driving applications.
In additional embodiments, a sensor parameter optimizer method for determining parameter settings for use by real-world sensors in autonomous driving applications is disclosed. The sensor parameter optimizer method may include generating a plurality of imaging scenes defining a virtual environment. The sensor parameter optimizer method may further include receiving a parameter setting for each of one or more virtual sensors, and generating, based on the parameter settings and the plurality of imaging scenes, sensor data indicative of current states of the virtual environment. The sensor parameter optimizer method may further include controlling an autonomous vehicle within the virtual environment based on the sensor data, and determining, based on operation of the autonomous vehicle, an optimal parameter setting of the parameter setting. The optimal parameter setting may be applied to a real-world sensor associated with real-world autonomous driving applications.
In accordance with the above, and with the disclosure herein, the present disclosure includes improvements in computer functionality or in improvements to other technologies at least because the claims recite, e.g., generating feature training datasets, or other data, for use in real-world autonomous driving applications based on virtual environments. That is, the present disclosure describes improvements in the functioning of the computer itself or “any other technology or technical field” because feature training datasets, or other data, may be generated for use in real-world autonomous driving applications based on virtual environments. This improves over the prior art at least because collecting large amounts of training data in a real-world environment is time intensive, dangerous, and generally infeasible.
The present disclosure includes specific features other than what is well-understood, routine, conventional activity in the field, or adds unconventional steps that confine the claim to a particular useful application, e.g., because use of the techniques disclosed allows machine learning models and self-driving control architectures for controlling virtual or autonomous vehicles to be generated or developed in a safe, efficient, and effective manner compared with collecting such data to train such models or develop such self-driving control architectures in the real world.
Advantages will become more apparent to those of ordinary skill in the art from the following description of the preferred embodiments which have been shown and described by way of illustration. As will be realized, the present embodiments may be capable of other and different embodiments, and their details are capable of modification in various respects. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
The Figures described below depict various aspects of the system and methods disclosed therein. It should be understood that each Figure depicts an embodiment of a particular aspect of the disclosed system and methods, and that each of the Figures is intended to accord with a possible embodiment thereof. Further, wherever possible, the following description refers to the reference numerals included in the following Figures, in which features depicted in multiple Figures are designated with consistent reference numerals.
There are shown in the drawings arrangements which are presently discussed, it being understood, however, that the present embodiments are not limited to the precise arrangements and instrumentalities shown, wherein:
The Figures depict preferred embodiments for purposes of illustration only.
Alternative embodiments of the systems and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION
Overview
Accordingly, a software architecture includes an automated training dataset generator that generates feature training datasets based on simulated or virtual environments. The feature training datasets may be used to train various machine learning models for use in real-world autonomous driving applications, e.g., to control the maneuvering of autonomous vehicles. The feature training datasets may include virtual data based on photo-realistic scenes (e.g., simulated 2D image data), depth-map realistic scenes (e.g., simulated 3D image data), and/or environment-object data (e.g., simulated data defining how objects or surfaces interact), each corresponding to the same virtual environment. For example, the environment-object data for a particular vehicle in the virtual environment may relate to the vehicle's motion (e.g., position, velocity, acceleration, trajectory, etc.). In some embodiments, interactions between objects or surfaces within the virtual environment can affect the data outputted for the simulated environment, e.g., rough roads or potholes may affect measurements of a virtual inertial measurement unit (IMU) of a vehicle. As one embodiment, for example, environment-object data could include data regarding geometry or physics related to a vehicle striking a pothole in a virtual environment. More generally, environment-object data may broadly refer to information about objects/surfaces within a virtual environment, e.g., interactions between objects or surfaces in the virtual environment and how those interactions affect the objects or surfaces in the virtual environment, e.g., a vehicle hitting a pothole. In addition, or alternatively, environment-object data may define how objects or surfaces will interact if they come into contact with other objects or surfaces in a virtual environment (e.g., indicating hardness, shape/profile, roughness, etc. of objects or surfaces in a virtual environment). Still further, in addition, or alternatively, environment-object data may define how objects or surfaces interact when such objects or surfaces do, in fact, interact with each other within a virtual environment (e.g., data indicating shock to a virtual vehicle when it strikes a virtual pothole, etc.).
The virtual environment may be generated and/or rendered from the viewpoint of one or more autonomous vehicle(s) operating within the virtual environment. In some implementations, the feature training datasets may be updated with real-world data such that the feature training datasets include both simulated data and real-world data.
In some implementations, the autonomous vehicle may follow either a standard or a randomized route within the virtual environment. The standard route may cause the training dataset generator to produce virtual data that tests autonomous vehicle behavior via a predefined route (e.g., to provide a better comparative assessment of the performance of the autonomous vehicle as design changes are made over time). The randomized route may cause the training dataset generator to produce virtual data that tests autonomous vehicle behavior via a route with a number of randomly-generated aspects (e.g., random street layouts, random driving behaviors of other vehicles, etc.). In this way, the randomized route may cause the generation of robust training data by ensuring that a broad array of environments and scenarios are encountered.
Each object or surface within the virtual environment may be associated with one or more descriptors or labels. Such descriptors or labels can include a unique identifier (ID) identifying the surface or object within the virtual environment. The descriptors or labels can also be used to define starting points, starting orientations, and/or other states or statuses of objects or surfaces within the virtual environment. The descriptors or labels can also be used to define object class(es) and/or future trajectory of an object or surface within the virtual environment.
In some implementations, a fully autonomous vehicle may interact with simple waypoint vehicles that follow predetermined routes within the virtual environment. The training dataset generator may generate feature training datasets for the fully autonomous vehicle based in part on interactions between the fully autonomous vehicle and the waypoint vehicles. Despite their relatively simple control algorithms or architectures, the waypoint vehicles may simulate different driving strategies so as to vary the interactions between the waypoint vehicles and the fully autonomous vehicle, and thereby vary the feature training datasets generated from such interactions. For example, one or more virtual waypoint vehicles may be configured to navigate respective predetermined route(s) including a number of roads or intersections. The one or more virtual waypoint vehicles may also be configured to perform certain activities within the virtual environment or have certain behaviors. For example, in one embodiment, a waypoint vehicle may be configured to exceed a speed limit or to run a red light. Such activity or behavior may cause the fully autonomous vehicle to react in a particular manner within the virtual environment, which, in turn, would cause the training dataset generator to generate feature training datasets for the fully autonomous vehicle based on the reaction.
In some implementations, a sensor simulator may generate simulated sensor data within the virtual environment. For example, one or more virtual sensors may be placed in various positions around one or more vehicles in the virtual environment for the purpose of generating the simulated sensor data. The sensor simulator may simulate lidar (e.g., light detection and ranging) readings using ray casting or depth maps, for example, and/or images captured by a camera, etc. In addition, particular objects or surfaces in the virtual environment may be associated with reflectivity values for the purpose of simulating lidar and/or thermal camera readings. Lidar parameters such as scan patterns, etc., can be optimized, and/or models that control lidar parameters may be trained, using the data collected by simulating lidar readings in the virtual environment. The reflectivity data or other simulated data may be accessed efficiently and quickly using direct memory access (DMA) techniques.
In still further implementations, the virtual environment may be at least partially generated based on geo-spatial data. Such geo-spatial data may be sourced from predefined or existing images or other geo-spatial data (e.g., height maps or geo-spatial semantic data such as road versus terrain versus building data) as retrieved from remote sources (e.g., Mapbox images, Google Maps images, etc.). For example, the geo-spatial data may be used as a starting point to construct detailed representations of roads, lanes for the roads, and/or other objects or surfaces within the virtual environment. If previously collected image or depth data is available for a particular region of the virtual environment, then the system also can use real-world lidar data, and/or use techniques such as SLAM or photogrammetry to construct the virtual environment to provide additional real-world detail not specified by the map-based geo-spatial data.
The autonomous vehicle may implement configurable driving strategies for more diversified generation of feature training datasets. In some implementations, generative machine learning models, such as generative adversarial networks (GANs), may be used to dynamically generate objects, surfaces, or scenarios within the virtual environment, including, for example, dynamically generated signs, obstacles, intersections, etc. In other embodiments, standard procedural generation (“proc gen”) may also be used.
More generally, generative machine learning models may be used to generate at least a portion of the virtual environment. In addition, user-built and procedurally generated parts of the virtual world can be combined. Configurable parameters may allow a user to set the status or state of objects, surfaces, or other attributes of the virtual environment. For example, the configurable parameters may include the starting position of a vehicle within the virtual environment, or time of day, weather conditions, etc., or ranges thereof. A configuration file manager may be used to accept a predefined configuration that defines the configurable parameters.
In some implementations, correspondences between actions (e.g., driving forward in a certain setting) and safety-related outcomes (e.g., avoiding collision) can be expressed as a ground truth and used in generating training dataset(s) or other data as described herein. For example, the ground truth may be expressed as a series of ground truth values that each include an action parameter and a corresponding safety parameter. The safety parameter may define a safety-related outcome (e.g., crash, no crash, etc.), or a degree of safety (e.g., 1% collision risk, etc.). Unlike in the real-world, ground truth correspondences may be learned by simulating alternative virtual realities relative to any given starting point/scenario. For example, the simulator may show that maintaining a lane in a certain scenario results in no crash (or results in a situation with a 0.002% crash risk, etc.), while moving to the right lane in the exact same scenario results in a crash (or results in a situation with a 1.5% crash risk, etc.). The ground truth data may be used for various types of training. For example, in an embodiment where an autonomous vehicle implements a number of independent, self-driving control architectures (SDCAs) in parallel, and makes driving decisions by selecting the driving maneuvers that are indicated by the most SDCAs (i.e., a “vote counting” process), the ground truth data may be useful to learn which SDCAs are more trustworthy in various scenarios. As another example, because the simulator can be forward-run many times from any starting point/scenario, the likelihood that a given prediction (e.g., of the state of the vehicle environment) will come to pass can be determined with a fairly high level of confidence. Thus, the ground truth data may be used to train a neural network that predicts future states of the vehicle environment (e.g., for purposes of making driving decisions).
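As a non-limiting illustration of this ground truth generation, the following Python sketch forward-runs a simulator many times from the same starting scenario to estimate a safety parameter (here, an estimated crash rate) for each candidate action. The simulate_fn callback, the toy_simulator stand-in, and the risk figures are hypothetical placeholders rather than part of the disclosed system.

```python
import random
from dataclasses import dataclass

@dataclass
class GroundTruthValue:
    action: str          # action parameter, e.g., "maintain_lane" or "move_right"
    crash_rate: float    # safety parameter, e.g., estimated collision risk

def estimate_crash_risk(simulate_fn, scenario, action, num_rollouts=1000):
    """Forward-run the simulator many times from the same starting scenario
    and count how often the chosen action ends in a collision."""
    crashes = sum(1 for _ in range(num_rollouts) if simulate_fn(scenario, action))
    return crashes / num_rollouts

def build_ground_truth(simulate_fn, scenario, candidate_actions):
    return [GroundTruthValue(a, estimate_crash_risk(simulate_fn, scenario, a))
            for a in candidate_actions]

# Toy stand-in for the simulator: returns True if the rollout ended in a crash.
def toy_simulator(scenario, action):
    risk = {"maintain_lane": 0.00002, "move_right": 0.015}[action]
    return random.random() < risk

if __name__ == "__main__":
    gt = build_ground_truth(toy_simulator, scenario={"id": 42},
                            candidate_actions=["maintain_lane", "move_right"])
    for value in gt:
        print(value)
```

The resulting (action, safety) tuples could then serve as labels, e.g., for weighting SDCA votes or for training a prediction network, as described above.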
In some implementations, an occupancy grid indicative of an environment of a vehicle is generated from an imaging scene that depicts the environment. The occupancy grid includes, or is generated from several layers (e.g., rendering layers), including a normal layer (e.g., a game engine or camera layer representing a virtual camera view, e.g., of a road/building picture scene), a label layer (e.g., text-based or state-based values describing objects in the virtual environment), and a velocity layer (e.g., velocity values defining direction/speed of moving objects). The label layer and velocity layers have channel sets (e.g., RGB based channel) for encoding their respective values within the layers. For example, class labels and velocity vectors can be transformed into an RGB encoding at different rendering layers, including, for example, the label layer and velocity layer, each of which a virtual camera of the virtual environment would recognize. The RGB encoding may then be decoded to generate information related to, for example, the locations of objects of different classes and their velocities. The occupancy grid may be used to control an autonomous vehicle as the autonomous vehicle moves through the environment, in either a virtual or a real-world environment. The multi-layer encoding of the occupancy grid, including normal, label, and velocity layers, provides a highly efficient representation of the environment.
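The sketch below illustrates one possible RGB encoding and decoding of the label and velocity layers described above. The class palette and the velocity scaling (signed components mapped onto 0-255 with an assumed 40 m/s full scale) are illustrative assumptions, not a required encoding.

```python
import numpy as np

# Hypothetical class-label palette for the label layer (one RGB triple per class).
CLASS_TO_RGB = {"road": (0, 0, 255), "vehicle": (255, 0, 0), "pedestrian": (0, 255, 0)}
RGB_TO_CLASS = {rgb: name for name, rgb in CLASS_TO_RGB.items()}

def encode_velocity(vx, vy, v_max=40.0):
    """Encode a 2D velocity vector into an RGB triple: R and G carry the signed
    x/y components scaled to 0-255; B is unused in this sketch."""
    r = int(np.clip((vx / v_max) * 127.5 + 127.5, 0, 255))
    g = int(np.clip((vy / v_max) * 127.5 + 127.5, 0, 255))
    return (r, g, 0)

def decode_velocity(rgb, v_max=40.0):
    r, g, _ = rgb
    return ((r - 127.5) / 127.5 * v_max, (g - 127.5) / 127.5 * v_max)

def decode_cell(label_rgb, velocity_rgb):
    """Recover the per-cell class label and velocity from the two encoded layers."""
    return RGB_TO_CLASS.get(tuple(label_rgb), "unknown"), decode_velocity(velocity_rgb)

if __name__ == "__main__":
    label_rgb = CLASS_TO_RGB["vehicle"]
    velocity_rgb = encode_velocity(12.0, -3.0)
    print(decode_cell(label_rgb, velocity_rgb))
```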
In still further implementations, a sensor parameter optimizer may determine parameter settings for use by real-world sensors in autonomous driving applications. For example, the sensor parameter optimizer may include the sensor simulator discussed above, and may determine, based on the operation of an autonomous vehicle reacting to the simulated sensor data within the virtual environment, an optimal parameter setting, and/or a range or an approximation of such settings, for use with a real-world sensor associated with real-world autonomous driving applications.
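A minimal sketch of how such a sensor parameter optimizer might search candidate settings follows. The parameter names (scan rate, vertical field of view) and the scoring callback are hypothetical; in practice the score would come from how the simulated autonomous vehicle performs under each setting.

```python
import itertools

def optimize_sensor_parameters(run_simulation, scan_rates_hz, vertical_fovs_deg):
    """Grid-search candidate virtual-sensor settings; run_simulation is assumed
    to drive the simulated vehicle with those settings and return a score
    (e.g., a safety/perception metric where higher is better)."""
    best_setting, best_score = None, float("-inf")
    for rate, fov in itertools.product(scan_rates_hz, vertical_fovs_deg):
        setting = {"scan_rate_hz": rate, "vertical_fov_deg": fov}
        score = run_simulation(setting)
        if score > best_score:
            best_setting, best_score = setting, score
    return best_setting, best_score

if __name__ == "__main__":
    # Toy stand-in scoring function; a real evaluation would come from the
    # autonomous vehicle simulator's behavior under each candidate setting.
    toy_score = lambda s: -abs(s["scan_rate_hz"] - 10) - abs(s["vertical_fov_deg"] - 20)
    print(optimize_sensor_parameters(toy_score, [5, 10, 20], [15, 20, 30]))
```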
Example Automated Training Dataset Generator
Graphics platform 101 may include one or more processor(s) 150 as well as computer memory 152, which could comprise one or more computer memories, memory chips, etc. as illustrated in
Processor(s) 150 may be connected to memory 152 via a computer bus 151 responsible for transmitting electronic data, data packets, or otherwise electronic signals to and from the processor(s) 150 and memory 152 in order to implement or perform the machine readable instructions, methods, processes, elements or limitations, as illustrated, depicted, or described for the various flowcharts, illustrations, diagrams, figures, and/or other disclosure herein.
Processor(s) 150 may interface with memory 152 via computer bus 151 to execute the operating system (OS). The processor(s) 150 may also interface with memory 152 via computer bus 151 to create, read, update, delete, or otherwise access or interact with the data stored in memory 152. In some embodiments, the memory(s) may store information or other data as described herein in a database (e.g., a relational database, such as Oracle, DB2, MySQL, or a NoSQL based database, such as MongoDB). The data stored in memory 152 may include all or part of any of the data or information described herein, including, for example, the photo-realistic scenes, the depth-map-realistic scenes, the environment-object data, feature training dataset(s), or other information or scenes as described herein.
Graphics platform 101 may include one or more graphical processing unit(s) (GPU) 154 for rendering, generating, visualizing, or otherwise determining the photo-realistic scenes, depth-map-realistic scenes, point cloud information, the feature training dataset(s), views, visualizations, 2D or 3D scenes, or other information as described herein.
Graphics platform 101 may further include a communication component 156 configured to communicate (e.g., send and receive) data via one or more external/network port(s) to one or more network(s) 166. According to some embodiments, communication component 156 may include, or interact with, one or more transceivers (e.g., WWAN, WLAN, and/or WPAN transceivers) functioning in accordance with IEEE standards, 3GPP standards, or other standards, and that may be used in receipt and transmission of data via external/network ports.
In some embodiments, graphics platform 101 may include a client-server platform technology such as ASP.NET, Java J2EE, Ruby on Rails, Node.js, a web service or online API, responsible for receiving and responding to electronic requests via communication component 156.
Processor(s) 150 may interact, via the computer bus 151, with memor(ies) 152 (including the applications(s), component(s), API(s), data, etc. stored therein) to implement or perform the machine readable instructions, methods, processes, elements or limitations, as illustrated, depicted, or described for the various flowcharts, illustrations, diagrams, figures, and/or other disclosure herein.
Graphics platform 101 may further include or implement I/O connections 158 that interface with I/O device(s) 168 configured to present information to an administrator or operator and/or receive inputs from the administrator or operator. For example, an operator interface may include a display screen. I/O device(s) 168 may include touch sensitive input panels, keys, keyboards, buttons, lights, LEDs, which may be accessible via graphics platform 101. According to some embodiments, an administrator or operator may access the graphics platform 101 via I/O connections 158 and I/O device(s) 168 to review information, make changes, input training data, and/or perform other functions.
In some embodiments, graphics platform 101 may perform the functionalities as discussed herein as part of a “cloud” network or may otherwise communicate with other hardware or software components within the cloud to send, retrieve, or otherwise analyze data, dataset(s), or information described herein.
In general, a computer program or computer based product in accordance with some embodiments may include a computer usable storage medium, or tangible, non-transitory computer-readable medium (e.g., standard random access memory (RAM), an optical disc, a universal serial bus (USB) drive, or the like) having computer-readable program code or computer instructions embodied therein, wherein the computer-readable program code or computer instructions may be installed on or otherwise adapted to be executed by the processor(s) 150 (e.g., working in connection with the respective operating system in memory 152) to facilitate, implement, or perform the machine readable instructions, methods, processes, elements or limitations, as illustrated, depicted, or described for the various flowcharts, illustrations, diagrams, figures, and/or other disclosure herein. In this regard, the program code may be implemented in any desired program language, and may be implemented as machine code, assembly code, byte code, interpretable source code or the like (e.g., via Golang, Python, C, C++, C #, Objective-C, Java, Scala, Actionscript, JavaScript, HTML, CSS, XML, etc.).
Automated training dataset generator 100, in some implementations, may include one or more software engines, components, or simulators for rendering, generating, or otherwise determining the feature training dataset(s), scenes, data, or other information as described herein. In some embodiments, imaging engine 102, sensor simulator 104, physics component 106, autonomous vehicle simulator 108, dataset component 110, and/or sensor parameter optimizer 112 may be separate software entities. For example, the imaging engine 102 may be provided by a third-party provider, such as a commercial or open source based gaming engine. In some embodiments, for instance, imaging engine 102 of automated training dataset generator 100 may be a gaming engine implemented via multimedia application programming interface(s) (e.g., DirectX, OpenGL, etc.) that is/are executed by the automated training dataset generator 100. In other embodiments, imaging engine 102, sensor simulator 104, physics component 106, autonomous vehicle simulator 108, dataset component 110, and/or sensor parameter optimizer 112 may be part of the same software library, package, API, or other comprehensive software stack designed to implement the functionality as described herein.
It will be understood that various arrangements and configurations of the components of automated training dataset generator 100 (e.g., imaging engine 102, sensor simulator 104, physics component 106, autonomous vehicle simulator 108, dataset component 110, and/or sensor parameter optimizer 112) are contemplated, such that the disclosure of the components of automated training dataset generator 100 does not limit the disclosure to any one particular embodiment. It is to be further understood that, in some embodiments (not shown), certain components may perform the features of other components. For example, in some embodiments the imaging engine 102 may perform one or more of the features of the sensor simulator 104 and/or physics component 106. Thus, the components of automated training dataset generator 100 (e.g., imaging engine 102, sensor simulator 104, physics component 106, autonomous vehicle simulator 108, dataset component 110, and/or sensor parameter optimizer 112) are not limited and may perform the features of other components of automated training dataset generator 100 as described herein.
Automated training dataset generator 100 of
Imaging engine 102 may be configured to generate a plurality of imaging scenes defining a virtual environment. The plurality of imaging scenes generated by imaging engine 102 may include a plurality of photo-realistic scenes and a plurality of corresponding depth-map-realistic scenes. The imaging scenes may be generated by processor(s) 150 and/or GPU(s) 154.
In various embodiments, the imaging engine 102 may be a virtual engine or gaming engine (e.g., a DirectX-based, OpenGL-based, or other gaming engine) that can render 2D and/or 3D images of a virtual environment. The virtual environment, as referred to herein, may include a computer rendered environment including streets, roads, intersections, overpasses, vehicles, pedestrians, buildings or other structures, traffic lights or signs, or any other object or surface capable of being rendered in a virtual environment, such as a 2D or 3D environment. In some embodiments, imaging engine 102 may consist of a third-party engine, such as a gaming engine including any of the Unreal gaming engine, the Unity gaming engine, the Godot gaming engine, the Amazon Lumberyard gaming engine, or other such engines. In other embodiments, imaging engine 102 may also be a proprietary engine, or a partial-proprietary engine (e.g., comprising third-party and proprietary source code), developed for the purpose of generating imaging scenes, e.g., photo-realistic scenes, depth-map-realistic scenes, or other such information as described herein. Imaging engine 102 may implement one or many graphic API(s) for rendering or generating imaging scenes, depth-map-realistic scenes, or other such information as described herein. Such APIs may include the OpenGL API, DirectX API, Vulkan API, or other such graphics and rendering APIs. The APIs may interact with GPU(s) 154 to render the imaging scenes, e.g., photo-realistic scenes, depth-map-realistic scenes, or other such information as described herein, and/or to provide hardware-accelerated rendering, which, in some embodiments, could increase the performance, speed, or efficiency in rendering such scenes or information.
Imaging scenes generated, rendered or otherwise determined via imaging engine 102 of the automated training dataset generator 100 of
A photo-realistic scene, such as photo-realistic scene 200 of
In various embodiments, a 2D image representing a photo-realistic scene (e.g., photo-realistic scene 200) may comprise 2D pixel data (e.g., RGB pixel data) that may be a part of, may include, may be used for, or otherwise may be associated with the feature training dataset(s) described herein. It is to be understood that 2D images, in at least some (but not necessarily all) embodiments, may be initially generated by imaging engine 102 (e.g., a gaming engine) as a 3D image. The 3D image may then be rasterized, converted, or otherwise transformed into a 2D image, e.g., having RGB pixel data. Such RGB pixel data may be used as training data, datasets(s), or as otherwise described herein. In addition, the 3D and/or 2D image may also be converted or otherwise transformed into point cloud data and/or simulated point cloud data, e.g., as described with respect to
Additionally, or alternatively, imaging scenes generated, rendered or otherwise determined via imaging engine 102 of the automated training dataset generator 100 of
In some embodiments, processor(s) 150 may determine or predict a vehicle's action(s), e.g., how a vehicle is to move within a virtual environment or otherwise act. For example, processor(s) 150 may be configured to operate an autonomous vehicle in accordance with a predetermined ground truth route. For example, in a reinforcement learning simulation (e.g., a simulation run against a ground truth route 100 times), a vehicle acting according to, and/or in operation with, a ground truth (e.g., by staying on, or operating in accordance with, a ground truth route) would cause the generation of a digital or electronic reward (e.g., incrementing an overall success rate based on the vehicle's behavior). Based on the reward, the automated training dataset generator 100 may adjust vehicle or driving parameters to maximize reward/increase performance of predictions (e.g., update weights of a machine learning model to correspond to a higher margin of safety) in order to cause the autonomous vehicle to operate more closely with the predetermined ground truth route. For example, rewards, e.g., positive values generated based on positive actions by the vehicle, may be generated when the vehicle, e.g., avoids safety violations (e.g., crashes, and/or disobeying rules of the road, etc.), executes a particular driving style (e.g., aggressive/fast, or smooth with low G-force levels, etc.), and/or performs any other similar or suitable positive action (e.g., by operating in accordance with, or closer to, the ground truth route). In some aspects, the standard route may be useful for implementing vote counters and the like.
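For instance, a reward of the kind described above might be computed as in the following sketch, which assumes a 2D ground truth route given as waypoints and rewards staying near the route while strongly penalizing collisions. The reward magnitudes and the 3-meter deviation scale are illustrative assumptions only.

```python
import math

def route_deviation(position, route_waypoints):
    """Distance from the vehicle's current position to the nearest waypoint
    on the predetermined ground truth route."""
    return min(math.dist(position, wp) for wp in route_waypoints)

def step_reward(position, route_waypoints, collided, max_deviation_m=3.0):
    """Simple shaped reward: positive when the vehicle stays near the ground
    truth route, strongly negative on a safety violation such as a crash."""
    if collided:
        return -100.0
    deviation = route_deviation(position, route_waypoints)
    return max(0.0, 1.0 - deviation / max_deviation_m)

if __name__ == "__main__":
    route = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.5)]
    print(step_reward((9.0, 0.4), route, collided=False))  # near the route -> high reward
    print(step_reward((9.0, 8.0), route, collided=False))  # far off route -> zero reward
```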
In some embodiments, a standard route (e.g., such as a ground truth route) may be used to collect safety data. In such embodiments, ground truth correspondences (e.g., data) may be determined and generated based on an autonomous vehicle's behavior, and autonomous decisions (e.g., as determined by processor(s) 150, etc.) when choosing between actions taking safety into account (e.g., whether to swerve away from a group of pedestrians at the risk of colliding with a wall). In certain embodiments, one or more outputs of a machine learning model may be compared to ground truth value(s). In such embodiments, the ground truth value(s) may each include representations of vehicle action (e.g., from vehicles including vehicles 401, 451, 700, and/or 756 as described herein) and a corresponding safety parameter defining, e.g., a safety-related outcome, or a degree of safety that is associated with the vehicle action. In some embodiments, a machine learning model may be updated to choose vehicle actions that maximize a degree of safety across a plurality of ground truth values. However, in other embodiments, a machine learning model may be updated to choose vehicle actions that vary the degree of safety (e.g., from risky driving to safe driving) across a plurality of ground truth values.
In other embodiments, a video (e.g., multiple frames, images, scenes, etc. as described herein) may define an autonomous vehicle (e.g., vehicle 401, 451, 700, and/or 756 as described herein) moving along an undetermined route within the virtual environment. In such embodiments, the undetermined route may be a randomized route. Such randomized route may have multiple different permutations (e.g., different environment characteristics, streets, or other objects or surfaces) for testing or verifying a virtual autonomous vehicle, and its actions, in its virtual environment.
A point cloud representation of
Imaging scenes generated via imaging engine 102 of automated training dataset generator 100 of
As represented in
In other embodiments, one or more color or RGB pixels of a depth-map-realistic scene (e.g., depth-map-realistic scene 390) may be associated with one or more corresponding simulated intensity or reflectivity values. An intensity value may correspond to the intensity of scattered light received at the lidar sensor, and a reflectivity value may correspond to the reflectivity of an object or surface in the virtual environment. In such embodiments, the intensity or reflectivity values may represent one or more virtual lidar sensors, e.g., of a virtual autonomous vehicle such as vehicle 700 or 760, which may simulate one or more real-world lidar sensors as described herein.
Physics component 106 may be configured to generate environment-object data defining how objects or surfaces interact with each other in the virtual environment. Environment-object data provides the feature training dataset(s) with high quality metric(s), e.g., of how a vehicle (e.g., vehicle 700 or 760) reacts to virtual environment stimuli. In various embodiments, environment-object data defines how a first object or surface interacts with a second object or surface within the virtual environment. For example, a first object or surface may be a virtual autonomous vehicle (e.g., vehicle 401, 451, 700, and/or 756 as described herein) operating within the virtual environment and a second object or surface may be a virtual pothole within the virtual environment. Environment-object data may be generated that details, or explains, how the virtual vehicle reacts to striking the pothole. In such an embodiment, for example, environment-object data may be physics-based data, such as force, speed, timing, damage, or other such metrics, generated by physics component 106 detailing how the virtual autonomous vehicle reacts to physics. In some embodiments, the environment-object data may indicate or detail how parts of the car may react to such physical stimuli (e.g., striking the pothole 398). For example, an autonomous vehicle (e.g., vehicle 401, 451, 700, and/or 756 as described herein) may be associated with virtual or simulated shocks or sensors, which may record, or cause the recordation of, environment-object data when a car interacts with objects or surfaces within the virtual environment (e.g., strikes a pothole). In other words, the environment-object data may describe, or numerically explain, what happens to the autonomous vehicle as it interacts with objects or surfaces in its virtual environment. Further examples of environment-object data are described with respect to
In some embodiments, environment-object data may be generated for, and thus relate to, the motion of the autonomous vehicle 401 itself within the virtual environment. For example, in some embodiments, the motion of an autonomous vehicle (e.g., vehicle 401) may be defined by one or more of a position of the autonomous vehicle (e.g., vehicle 401), a velocity of the autonomous vehicle (e.g., vehicle 401), an acceleration of the autonomous vehicle (e.g., vehicle 401), or a trajectory of the autonomous vehicle (e.g., vehicle 401) as depicted in
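One possible data layout for such environment-object and motion data is sketched below. The field names and the pothole-strike values are hypothetical and serve only to show how an interaction event and the vehicle's motion state might be recorded together.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class MotionState:
    position: Tuple[float, float, float]
    velocity: Tuple[float, float, float]
    acceleration: Tuple[float, float, float]
    heading_deg: float  # coarse stand-in for a trajectory descriptor

@dataclass
class EnvironmentObjectEvent:
    subject_id: str            # e.g., the simulated vehicle
    other_id: str              # e.g., the pothole it struck
    impact_force_n: float      # physics-based interaction metrics
    impact_speed_mps: float
    timestamp_s: float
    subject_motion: MotionState

# Illustrative record of a simulated pothole strike.
pothole_strike = EnvironmentObjectEvent(
    subject_id="vehicle_401", other_id="pothole_398",
    impact_force_n=5200.0, impact_speed_mps=11.2, timestamp_s=14.7,
    subject_motion=MotionState((102.4, 8.1, 0.0), (11.2, 0.1, -0.4),
                               (0.0, 0.0, -9.8), 2.5))
print(pothole_strike.impact_force_n)
```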
In other embodiments, an autonomous vehicle simulator 108 (as further disclosed herein for
In some embodiments, the automated training dataset generator 100 may include a configuration manager (not shown). The configuration manager may accept a predefined configuration defining configuration information for one or more objects or surfaces (e.g., objects or surfaces 401-418) within the virtual environment (e.g., the virtual environment of scene 400 from
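By way of example, a predefined configuration accepted by the configuration manager might resemble the following JSON sketch; all field names and values here are hypothetical and not a required schema.

```python
import json

# Hypothetical predefined configuration; field names are illustrative only.
PREDEFINED_CONFIG = """
{
  "time_of_day": "dusk",
  "weather": "light_rain",
  "ego_vehicle": {"start_position": [12.0, 4.5, 0.0], "start_heading_deg": 90.0},
  "objects": [
    {"id": "vehicle_412", "type": "waypoint_vehicle", "start_position": [40.0, 4.5, 0.0]},
    {"id": "pedestrian_409", "type": "pedestrian", "start_position": [55.0, 9.0, 0.0]}
  ]
}
"""

def load_configuration(raw_json):
    """Parse a predefined configuration and hand back the parameters the
    configuration manager would apply to the virtual environment."""
    return json.loads(raw_json)

if __name__ == "__main__":
    config = load_configuration(PREDEFINED_CONFIG)
    print(config["weather"], len(config["objects"]))
```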
In some embodiments, objects or surfaces (e.g., 401-418 of
In some embodiments automated training dataset generator 100 may include a geo-spatial component (not shown) configured to generate a virtual environment based on geo-spatial data. In various embodiments, geo-spatial data may define one or more positions of simulated objects or surfaces within a virtual environment. For example, as illustrated by
In some embodiments, geo-spatial data may include geo-spatial metadata. The geo-spatial metadata may include or expose detail parameters used by automated training dataset generator 100 (e.g., by the imaging engine 102) for generating the one or more simulated objects or surfaces (e.g., 401-418 of
Together, geo-spatial data and its related metadata may be used by the automated training dataset generator 100 and/or geo-spatial component to render such data within a virtual environment into a detailed roadway that has realistic lanes and shoulders, etc. For example, in such embodiments, geo-spatial metadata may define a four-lane, two-way highway with a particular width and particular waypoints, which may be rendered by the automated training dataset generator 100 and/or geo-spatial component into a virtual four-lane highway mesh suitable for simulation within a virtual environment (e.g., the virtual environment of scene 400).
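A simplified sketch of that expansion step is shown below: coarse geo-spatial waypoints plus lane-count and lane-width metadata are turned into per-lane centerlines from which a road mesh could then be built. The function and parameter names are illustrative assumptions, and the geometry is kept to 2D for brevity.

```python
import math

def expand_highway(waypoints, num_lanes, lane_width_m):
    """Expand coarse geo-spatial waypoints plus lane metadata into per-lane
    centerlines suitable for building a road mesh (2D sketch only)."""
    lanes = []
    half = (num_lanes - 1) / 2.0
    for lane_idx in range(num_lanes):
        offset = (lane_idx - half) * lane_width_m
        centerline = []
        for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
            # Unit normal of the segment, used to offset each lane sideways.
            dx, dy = x1 - x0, y1 - y0
            length = math.hypot(dx, dy) or 1.0
            nx, ny = -dy / length, dx / length
            centerline.append((x0 + nx * offset, y0 + ny * offset))
        lanes.append(centerline)
    return lanes

if __name__ == "__main__":
    highway_waypoints = [(0, 0), (100, 0), (200, 20)]
    for lane in expand_highway(highway_waypoints, num_lanes=4, lane_width_m=3.7):
        print(lane)
```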
In still further embodiments, the objects or surfaces generated via geo-spatial data and/or the geo-spatial component may include predefined images. In some instances, the predefined images may be sourced (e.g., downloaded) from a remote server (e.g., via computer network(s), such as network(s) 166), such that the predefined images are loaded into a virtual environment (e.g., the virtual environment of scene 400 of
Similarly, in additional embodiments, geo-spatial data may include real-world lidar based data. Such real-world lidar based data may, for example, be loaded into, and used to update and/or build, a virtual environment (e.g., the virtual environment of scene 400 of
In still further embodiments, the geo-spatial component of automated training dataset generator 100 may update a virtual environment via a simultaneous localization and mapping (SLAM) technique. SLAM is a mapping and navigation technique that constructs and/or updates a map of an unknown environment while simultaneously keeping track of an agent's (e.g., vehicle's, such as vehicle 401 and/or 451) location within it. For example, in the embodiment of
Photo-realistic scene 450 illustrates the application of descriptors to various environment-object data of various objects or surfaces within the virtual environment. In particular, various objects or surfaces include descriptors 451-482 that may indicate the type of objects or surfaces of the environment-object data that may interact with one another. In various embodiments, descriptor data (e.g., descriptors 451-482) may be included in training data/datasets to train machine learning models and/or self-driving control architectures for controlling autonomous vehicles as described herein. In some embodiments, each of the objects or surfaces may be associated with a tracking identifier (TID) (e.g., a unique identifier (ID)) that tracks objects and surfaces (e.g., vehicles) within each frame. In certain embodiments, a descriptor of each object or surface may include any one or more of the following: a unique identifier (ID) of the object or surface in the virtual environment, a category of the object or surface as defined within the virtual environment, a position value of the object or surface within the virtual environment, an orientation of the object or surface within the virtual environment, a velocity of the object or surface within the virtual environment, a reflectivity of the object or surface within the virtual environment, or a status of the object within the virtual environment. An orientation of an object or a surface may be represented by a surface normal vector (e.g., a vector that is orthogonal to the object or surface at a particular location or pixel on the object or surface).
In still further embodiments, a descriptor (e.g., any of descriptors 451-482) of an object or surface may include one or both of the following: an object class of an object or surface in the virtual environment or a future trajectory of an object or surface in the virtual environment. In this way, each object or surface within the virtual environment of
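For illustration, a descriptor of this kind could be represented as the following Python dataclass; the field names and example values are hypothetical rather than a required schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ObjectDescriptor:
    tracking_id: str                                # unique ID / TID across frames
    object_class: str                               # e.g., "vehicle", "pedestrian", "building"
    position: Tuple[float, float, float]
    orientation_normal: Tuple[float, float, float]  # surface normal vector
    velocity: Tuple[float, float, float]
    reflectivity: float                             # 0.0 - 1.0
    status: str = "active"                          # state-based value, e.g., "parked"
    future_trajectory: Optional[List[Tuple[float, float, float]]] = None

descriptor = ObjectDescriptor(
    tracking_id="veh_0451", object_class="vehicle",
    position=(34.2, 6.1, 0.0), orientation_normal=(0.0, 0.0, 1.0),
    velocity=(8.3, 0.0, 0.0), reflectivity=0.42,
    future_trajectory=[(36.0, 6.1, 0.0), (38.1, 6.2, 0.0)])
print(descriptor.object_class, descriptor.reflectivity)
```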
For example, the descriptors 451-482 may be used to determine features or descriptors, e.g., feature training dataset(s) used to train machine learning models as described herein. As depicted in
Each of the descriptors 451-482 may represent, mark, identify, or otherwise describe individual pixels, or multiple pixels, within photo-realistic scene 450 of
As depicted by
In some embodiments, a virtual environment may include simple waypoint vehicles (e.g., vehicles 412 and/or 414 of
In certain aspects, the one or more waypoint vehicles may implement, via autonomous vehicle simulator 108, one or more driving strategies, which may include, e.g., a conservative driving strategy, an aggressive driving strategy, or a normal driving strategy. The different driving strategies may add variability to waypoint vehicle behavior, thereby adding variability to any feature training dataset(s) generated from the autonomous vehicle interacting with the waypoint vehicle. In some embodiments, a machine learning model, as described herein, may be trained with reinforcement learning techniques based on vehicle operation data captured when the autonomous vehicle interacts with the one or more waypoint vehicles. For example, reinforcement learning may be used on full-stack autonomous vehicles to train such vehicles in environments having waypoint vehicles moving in predictable ways.
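The following sketch shows one way a simple waypoint vehicle with a configurable driving strategy might be stepped along its predetermined route; the per-strategy speeds and class structure are illustrative assumptions, not the disclosed control algorithm.

```python
import math

class WaypointVehicle:
    """Simple vehicle that follows a predetermined route at a speed chosen
    by its driving strategy (conservative, normal, or aggressive)."""
    SPEEDS_MPS = {"conservative": 8.0, "normal": 12.0, "aggressive": 18.0}

    def __init__(self, waypoints, strategy="normal"):
        self.waypoints = list(waypoints)
        self.speed = self.SPEEDS_MPS[strategy]
        self.position = self.waypoints[0]
        self.target_idx = 1

    def step(self, dt):
        if self.target_idx >= len(self.waypoints):
            return self.position  # route finished
        tx, ty = self.waypoints[self.target_idx]
        px, py = self.position
        dist = math.hypot(tx - px, ty - py)
        if dist <= self.speed * dt:
            self.position, self.target_idx = (tx, ty), self.target_idx + 1
        else:
            self.position = (px + (tx - px) / dist * self.speed * dt,
                             py + (ty - py) / dist * self.speed * dt)
        return self.position

if __name__ == "__main__":
    vehicle = WaypointVehicle([(0, 0), (50, 0), (50, 30)], strategy="aggressive")
    for _ in range(5):
        print(vehicle.step(dt=1.0))
```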
In other embodiments, autonomous vehicle simulator 108 may be further configured to apply or execute one or more driving strategies within a virtual environment (e.g., the virtual environment of
In some embodiments, a scenario simulator (not shown) may be configured to generate one or more simulated environment scenarios, wherein each of the simulated environment scenario(s) corresponds to a variation of a particular object, surface, or situation within the virtual environment. A particular object, surface, or situation may include, for example, a road (e.g., 402 or 403), an intersection (e.g., 407), a stop sign, or a traffic light (e.g., 416). For example, in one embodiment, automatic generation of simulated scenarios may include generation of variations on scenarios including traffic signage, e.g., the generation of thousands of different stop signs with weeds or other obstructions in front of them to determine how an autonomous vehicle (e.g., vehicle 401) would react to such variation within the virtual environment. In still further embodiments, a particular object, surface, or situation may be a pedestrian's activity within the virtual environment (e.g., of pedestrian 409) or a railroad arm's behavior within the virtual environment. Accordingly, such embodiments, as associated with the scenario simulator, may include the automatic generation of simulated situations, e.g., various vehicle, surface, and/or object situations that may provide diversity and/or variability with respect to the generation of feature training dataset(s) of a virtual environment, e.g., for training machine learning models as described herein.
In some embodiments, simulated scenarios may be generated by the scenario simulator via procedural generation (“proc gen”) or other techniques, including machine learning models, such as generative adversarial networks (GANs). For example, at least a portion of the virtual environment depicted in scene 400 may be generated via a plurality of generative machine learning models. In such embodiments, at least one of the plurality of generative machine learning models may be a GAN. A GAN-based approach may involve artificial intelligence algorithm(s) implementing unsupervised machine learning. The algorithms may include two neural networks contesting with each other to generate or determine feature training dataset(s) or other information within a virtual environment (e.g., the virtual environment of
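As a lightweight counterpart to a full GAN, the sketch below uses plain procedural generation to produce stop-sign scenario variations of the kind described above (occlusions, placement offsets, weathering). The occluder list and parameter ranges are hypothetical.

```python
import random

def generate_stop_sign_scenarios(count, seed=0):
    """Procedurally generate variations of a stop-sign scenario, each with a
    randomly chosen occlusion (weeds, parked vehicle, etc.), placement, and
    weathering, for testing how the simulated vehicle reacts."""
    rng = random.Random(seed)
    occluders = ["none", "weeds", "tree_branch", "parked_truck", "snow"]
    scenarios = []
    for i in range(count):
        scenarios.append({
            "scenario_id": f"stop_sign_{i:05d}",
            "occluder": rng.choice(occluders),
            "occlusion_fraction": round(rng.uniform(0.0, 0.8), 2),
            "sign_offset_m": round(rng.uniform(-1.0, 1.0), 2),
            "sign_rotation_deg": round(rng.uniform(-15.0, 15.0), 1),
            "fade_level": round(rng.uniform(0.0, 1.0), 2),
        })
    return scenarios

if __name__ == "__main__":
    for scenario in generate_stop_sign_scenarios(3):
        print(scenario)
```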
As depicted by
In some embodiments, the sensor simulator 104 may position one or more virtual sensors in a virtual environment (e.g., any of the virtual environment(s) depicted and described for
In other embodiments, the sensor simulator 104 may generate the sensor data via ray casting. Ray casting may include a rendering technique to create a 3D perspective of a scene of a virtual environment. Ray casting may include casting a virtual ray from a point of origin in a scene, in a given direction, against colliders (e.g., objects or surfaces) in the scene. Ray casting may be performed, for example, for validation purposes (e.g., to validate depths, etc.). In some aspects, the sensor simulator 104 may generate simulated lidar data or simulated radar data.
In further embodiments, the sensor simulator 104 may generate the sensor data based on the depth-map-realistic scenes. In some aspects, the sensor simulator may generate the sensor data using a graphic shader, e.g., such as a graphic shader of a gaming engine.
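For example, simulated lidar returns might be derived from a depth-map-realistic scene as in the following sketch, which assigns each depth pixel an azimuth/elevation angle across an assumed 60°×15° field of regard and converts it to a 3D point. The field-of-regard values and array layout are illustrative assumptions.

```python
import numpy as np

def depth_map_to_point_cloud(depth_m, h_fov_deg=60.0, v_fov_deg=15.0):
    """Convert a depth-map-realistic scene (per-pixel range in meters) into a
    simulated lidar point cloud by assigning each pixel an azimuth/elevation
    angle across the field of regard."""
    rows, cols = depth_m.shape
    az = np.deg2rad(np.linspace(-h_fov_deg / 2, h_fov_deg / 2, cols))
    el = np.deg2rad(np.linspace(v_fov_deg / 2, -v_fov_deg / 2, rows))
    az_grid, el_grid = np.meshgrid(az, el)
    x = depth_m * np.cos(el_grid) * np.cos(az_grid)
    y = depth_m * np.cos(el_grid) * np.sin(az_grid)
    z = depth_m * np.sin(el_grid)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

if __name__ == "__main__":
    fake_depth = np.full((40, 200), 25.0)   # a flat wall 25 m ahead
    cloud = depth_map_to_point_cloud(fake_depth)
    print(cloud.shape)                      # (8000, 3) simulated lidar returns
```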
In some embodiments, a particular object or surface may be associated with a reflectivity value within a virtual environment, and the sensor simulator 104 may generate at least a portion of the sensor data (e.g., virtual lidar data) based on the reflectivity value. In such embodiments, the reflectivity value is derived from a color of the particular object or surface. For example, a reflectivity value may be derived from the color of an object, where brighter objects have higher reflectivity values. This may be based on albedo properties and/or basic light properties: in the real world, white objects reflect all colors of light (i.e., reflect all light spectrums), while black objects absorb all colors of light. Basing reflectivity values on colors in a virtual environment allows a variability of color so that objects may be detected differently by sensors of an autonomous vehicle (e.g., vehicle 401 and/or 451) in the virtual environment (e.g., a black car and a person in a black shirt may be seen by infrared sensors but not by regular color sensors). In still further embodiments, the reflectivity value is derived from a normal angle to a position of a virtual sensor, e.g., within the virtual environment.
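One simple, assumed way to derive such a reflectivity value from an object's color and the normal angle to the virtual sensor is sketched below, using luminance as a proxy for albedo and a cosine falloff with the incidence angle.

```python
import math

def reflectivity_from_color(rgb, surface_normal, ray_direction):
    """Derive a simulated reflectivity value from an object's color (brighter
    colors reflect more) attenuated by the angle between the surface normal
    and the incoming virtual lidar ray."""
    r, g, b = (c / 255.0 for c in rgb)
    albedo = 0.2126 * r + 0.7152 * g + 0.0722 * b      # luminance as a proxy for albedo
    # Cosine of the incidence angle; rays hitting the surface head-on reflect most.
    nx, ny, nz = surface_normal
    dx, dy, dz = ray_direction
    n_len = math.sqrt(nx * nx + ny * ny + nz * nz)
    d_len = math.sqrt(dx * dx + dy * dy + dz * dz)
    cos_incidence = abs(nx * dx + ny * dy + nz * dz) / (n_len * d_len)
    return albedo * cos_incidence

if __name__ == "__main__":
    print(reflectivity_from_color((240, 240, 240), (0, 0, 1), (0, 0, -1)))  # bright, head-on
    print(reflectivity_from_color((20, 20, 20), (0, 0, 1), (1, 0, -1)))     # dark, glancing
```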
Automated training dataset generator 100 may further include a dataset component 110 configured to generate one or more feature training datasets based on at least one of (i) the plurality of photo-realistic scenes, (ii) the plurality of depth-map-realistic scenes, or (iii) the environment-object data. For example, pixel data or other such information of a virtual environment (e.g., any of the virtual environment(s) depicted and described for
In various embodiments, pixel data or information of the imaging scenes and/or virtual environments disclosed herein simulates or mimics pixel data captured from, and/or generated by, real-world cameras or other sensors. For example, as described in U.S. Provisional Patent Application Ser. No. 62/573,795 entitled “Software Systems and Methods for controlling an Autonomous Vehicle,” which was filed on Oct. 28, 2017, the entire disclosure of which is hereby incorporated by reference, a real-world lidar system of a vehicle (e.g., of vehicles 700 or 760) may be used to determine the distance to one or more downrange targets, objects, or surfaces. The lidar system may scan a field of regard to map the distance to a number of points within the field of regard. Each of these depth-mapped points may be referred to as a pixel. A collection of pixels captured in succession (which may be referred to as a depth map, a point cloud, or a point cloud frame) may be rendered as an image or may be analyzed to identify or detect objects or to determine a shape and/or distance of objects within the field of regard. For example, a depth map may cover a field of regard that extends 60° horizontally and 15° vertically, and the depth map may include a frame of 100-2000 pixels in the horizontal direction by 4-400 pixels in the vertical direction. Accordingly, each pixel may be associated with a distance (e.g., a distance to a portion of a target, object, or surface from which the corresponding laser pulse was scattered) or one or more angular values. Thus, the pixel data or information of the imaging scenes and/or virtual environments disclosed herein simulates or mimics pixel data captured from, and/or generated by, real-world cameras or other sensors, and thus can be used to effectively train machine learning models applicable to real-world driving applications, such as real-world autonomous vehicles operating in real-world environments.
In some embodiments, virtual data and real-world data may be combined for purposes of generating feature training dataset(s) and/or for generating machine learning model(s) for operation of real or virtual autonomous vehicle(s). For example, one or more virtual objects (e.g., a virtual road or street, a virtual building, a virtual tree, a virtual traffic sign, a virtual traffic light, a virtual pedestrian, a virtual vehicle, or a virtual bicycle) may be superimposed onto real-world sensor data to generate a training dataset. As another example, real-world sensor data and simulated sensor data may be combined, and in some instances, normalized using a same format (e.g., having same data fields). In some embodiments, for example, dataset component 110 of automated training dataset generator 100 may be configured to generate at least one real-world training dataset. The real-world data may include real-world environment-object data as captured by one or more sensors (e.g., accelerometers, gyroscopes, motion sensors, or GPS devices) associated with a real-world vehicle, or as derived from such sensor data (e.g., in some embodiments, real-world environment-object data could be derived or determined indirectly or calculated from sensor data). The real-world training dataset may be based on real-world data and may be normalized with respect to one or more feature training datasets (e.g., with respect to the data formats of the one or more feature training datasets). In such embodiments, the real-world training dataset may be associated with training a machine learning model to control an autonomous vehicle in a real-world autonomous driving application. In some embodiments, the real-world data may include a real-world photo-realistic scene as captured by a two-dimensional (2D) camera. In still further embodiments, the real-world data may include a real-world depth-map realistic scene as captured by a three-dimensional (3D) sensor. In such embodiments, the three-dimensional (3D) sensor may be a lidar-based sensor.
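The following Python sketch illustrates, under assumed field names and units, how real-world and simulated returns might be normalized into a single common record format before being combined into one training dataset; the dataclass and its fields are hypothetical and are not the format used by dataset component 110.

from dataclasses import dataclass

@dataclass
class LidarReturn:
    x: float             # meters, vehicle frame
    y: float
    z: float
    reflectivity: float  # 0.0 - 1.0
    source: str          # "real" or "simulated"

def normalize_real(raw):
    # e.g., a real sensor reporting centimeters and 0-255 intensity
    return LidarReturn(raw["x_cm"] / 100.0, raw["y_cm"] / 100.0,
                       raw["z_cm"] / 100.0, raw["intensity"] / 255.0, "real")

def normalize_simulated(raw):
    # simulated returns already in meters with unit-range reflectivity
    return LidarReturn(raw["x"], raw["y"], raw["z"], raw["reflectivity"], "simulated")

# combined, format-normalized training points drawn from both sources
training_points = [
    normalize_real({"x_cm": 1250, "y_cm": -80, "z_cm": 30, "intensity": 200}),
    normalize_simulated({"x": 8.4, "y": 1.2, "z": 0.3, "reflectivity": 0.7}),
]
print(training_points)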
Feature training dataset(s) as generated by automated training dataset generator 100 may be used to train a machine learning model to control an autonomous vehicle in a real-world autonomous driving application. In some embodiments, the feature training dataset(s) may be stored in memory 152. The machine learning model may be trained, for example, via the processor(s) 150 executing one or more machine learning algorithms using the feature training dataset(s), whether stored in memory 152 or read directly from dataset component 110, as input (e.g., as features and labels) to the one or more machine learning algorithms.
The machine learning model, as trained with the training dataset(s) as generated by the automated training dataset generator 100, may be trained using a supervised or unsupervised machine learning program or algorithm. The machine learning program or algorithm may employ a neural network, which may be a convolutional neural network, a deep learning neural network, or a combined learning module or program that learns from two or more features or feature datasets in a particular area of interest. The machine learning programs or algorithms may also include natural language processing, semantic analysis, automatic reasoning, regression analysis, support vector machine (SVM) analysis, decision tree analysis, random forest analysis, k-nearest neighbor analysis, naïve Bayes analysis, clustering, reinforcement learning, and/or other machine learning algorithms and/or techniques. Machine learning may involve identifying and recognizing patterns in data (such as pixel or other data or information of the imaging scenes, e.g., photo-realistic scenes, depth-map-realistic scenes, or other such information as described herein) in order to facilitate making predictions for subsequent data (to predict or determine actions and behaviors of objects or surfaces in an environment for the purpose of controlling an autonomous vehicle in a real-world autonomous driving application in that environment).
Machine learning model(s), such as those trained using feature training dataset(s) as generated by automated training dataset generator 100, may be created and trained based upon example inputs or data (e.g., “training data,” which may be termed “features” and “labels”) in order to make valid and reliable predictions for new inputs, such as testing level or production level data or inputs. In supervised machine learning, a machine learning program operating on a server, computing device, or otherwise processor(s), may be provided with example inputs (e.g., “features”) and their associated outputs (e.g., “labels”) in order for the machine learning program or algorithm to determine or discover rules, relationships, or otherwise machine learning “models” that map such inputs (e.g., “features”) to the outputs (e.g., labels), for example, by determining and/or assigning weights or other metrics to the model across its various feature categories. For example, in at least some embodiments, virtual environments as described herein may include various labels and related features that may be used in training data (see, e.g.,
In unsupervised machine learning, the server, computing device, or otherwise processor(s), may be required to find its own structure in unlabeled example inputs, where, for example, multiple training iterations are executed by the server, computing device, or otherwise processor(s) to train multiple generations of models until a satisfactory model, e.g., a model that provides sufficient prediction accuracy when given test level or production level data or inputs, is generated. The disclosures herein may use one or both of such supervised or unsupervised machine learning techniques.
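As a hedged illustration of the supervised case, the following Python sketch fits an off-the-shelf classifier to synthetic feature/label pairs of the sort a virtual environment could supply (ground-truth labels being known exactly because the scene is simulated); the feature layout and class codes are assumptions made for illustration, not the generator's actual schema.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# features: e.g., per-object [range in meters, relative speed in m/s, reflectivity]
features = rng.random((1000, 3)) * [100.0, 30.0, 1.0]
# labels: e.g., 0 = static object, 1 = vehicle, 2 = pedestrian
labels = rng.integers(0, 3, size=1000)

# map example inputs ("features") to associated outputs ("labels")
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(features, labels)
print(model.predict(features[:5]))  # predictions for new inputs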
A machine learning model, as used herein to control a real-world autonomous vehicle, may be trained using pixel data, label data, or other such information associated with an imaging scene, e.g., photo-realistic scenes, depth-map-realistic scenes, or other such information as described herein, as feature and/or label data. The machine learning models may then be implemented as, or as part of, a self-driving control architecture (SDCA) to control a real-world autonomous vehicle as further described herein.
Control of a real-world autonomous vehicle may involve using a machine learning model, as trained in accordance with the disclosure herein, to predict, detect, and/or track various objects or surfaces experienced in a virtual environment (such as the environments illustrated by each of
The sensor data 502 is input to a perception component 506 of the SDCA 500, and is processed by the perception component 506 to generate perception signals 508 descriptive of a current state of the autonomous vehicle's environment, whether virtual or real-world. It is understood that the term “current” may actually refer to a very short time prior to the generation of any given perception signals 508, e.g., due to the short processing delay introduced by the perception component 506 and other factors. To generate the perception signals, the perception component may include a segmentation module 510, a classification module 512, and a tracking module 514.
The segmentation module 510 is generally configured to identify distinct objects within the sensor data representing the sensed environment. Depending on the embodiment and/or scenario, the segmentation task may be performed separately for each of a number of different types of sensor data, or may be performed jointly on a fusion of multiple types of sensor data. In some embodiments where lidar devices are used, the segmentation module 510 analyzes point cloud or other data frames to identify subsets of points within each frame that correspond to probable physical objects or surfaces in the environment. In other embodiments, the segmentation module 510 jointly analyzes lidar point cloud or other data frames in conjunction with camera image frames to identify objects in the environment. Other suitable techniques, and/or data from other suitable sensor types, may also be used to identify objects or surfaces. It is noted that, as used herein, references to different or distinct “objects” or “surfaces” may encompass physical things that are entirely disconnected (e.g., with two vehicles being two different “objects”), as well as physical things that are connected or partially connected (e.g., with a vehicle being a first “object” and the vehicle's hitched trailer being a second “object”).
The segmentation module 510 may use predetermined rules or algorithms to identify objects. For example, the segmentation module 510 may identify as distinct objects, within a point cloud, any clusters of points that meet certain criteria (e.g., having no more than a certain maximum distance between all points in the cluster, etc.). Alternatively, the segmentation module 510 may utilize a neural network that has been trained to identify distinct objects or surfaces within the environment (e.g., using supervised learning with manually generated labels for different objects within test data point clouds, etc.), or another type of machine learning based model. For example, the machine learning model associated with segmentation module 510 could be trained using virtual sensor (e.g., lidar and/or camera) data from a virtual environment/scene as described herein (e.g., virtual environments/scenes as described for any of
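For illustration only, the following Python sketch implements a simple distance-based (single-linkage) clustering of the kind suggested by the maximum-distance criterion above; the threshold, point layout, and function name are assumptions, and this is not presented as the segmentation module's actual algorithm.

import numpy as np

def segment_points(points, max_gap=0.5):
    """points: an (N, 3) array; returns a list of clusters (lists of point indices)."""
    points = np.asarray(points, dtype=float)
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        frontier = [seed]
        # grow the cluster while any unvisited point lies within max_gap of it
        while frontier and unvisited:
            i = frontier.pop()
            candidates = np.array(sorted(unvisited))
            dists = np.linalg.norm(points[candidates] - points[i], axis=1)
            for j in candidates[dists <= max_gap]:
                unvisited.remove(int(j))
                cluster.append(int(j))
                frontier.append(int(j))
        clusters.append(sorted(cluster))
    return clusters

pts = [[0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.6, 0.0, 0.0],   # one probable object
       [10.0, 0.0, 0.0], [10.2, 0.1, 0.0]]                  # another probable object
print(segment_points(pts))  # e.g., [[0, 1, 2], [3, 4]]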
The classification module 512 is generally configured to determine classes (labels, descriptors, categories, etc.) for different objects that have been identified by the segmentation module 510. Like the segmentation module 510, the classification module 512 may perform classification separately for different sets of the sensor data 502, or may classify objects based on data from multiple sensors, etc. Moreover, and also similar to the segmentation module 510, the classification module 512 may execute predetermined rules or algorithms to classify objects, or may utilize a neural network or other machine learning based model to classify objects. For example, in some embodiments, machine learning model(s) may be trained for classification module 512 using virtual sensor data as described herein. In further example embodiments, virtual data output by a virtual version of segmentation module 510 may be used to train a machine learning model of classification module 512. Further example operation of the classification module 512 is discussed in more detail in
The tracking module 514 is generally configured to track distinct objects or surfaces over time (e.g., across multiple lidar point cloud or camera image frames). The tracked objects or surfaces are generally objects or surfaces that have been identified by the segmentation module 510, but may or may not be objects that were classified by the classification module 512, depending on the embodiment and/or scenario. The segmentation module 510 may assign identifiers and/or descriptors to identified objects or surfaces, and the tracking module 514 may associate existing identifiers with specific objects or surfaces where appropriate (e.g., for lidar data, by associating the same identifier with different clusters of points, at different locations, in successive point cloud frames). Like the segmentation module 510 and the classification module 512, the tracking module 514 may perform separate object tracking based on different sets of the sensor data 502, or may track objects based on data from multiple sensors. Moreover, and also similar to the segmentation module 510 and the classification module 512, the tracking module 514 may execute predetermined rules or algorithms to track objects or surfaces, or may utilize a neural network or other machine learning model to track objects. For example, in some embodiments, a machine learning model for tracking module 514 may be trained using virtual sensor data. In additional embodiments, virtual data output by a virtual version of classification module 512 may be used to train a machine learning model of tracking module 514.
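A minimal Python sketch of rule-based identifier association across frames is shown below; the nearest-centroid matching, distance threshold, and 2D centroids are illustrative assumptions rather than the tracking module's actual implementation.

import math

def associate_tracks(tracks, detections, max_match_dist=2.0):
    """tracks: {track_id: (x, y)} from the previous frame.
    detections: list of (x, y) centroids from the current frame.
    Returns {track_id: (x, y)} with new ids assigned to unmatched detections."""
    next_id = max(tracks, default=-1) + 1
    updated, unmatched = {}, list(enumerate(detections))
    for track_id, (tx, ty) in tracks.items():
        if not unmatched:
            break
        # nearest remaining detection to this existing track
        k, (idx, (dx, dy)) = min(
            enumerate(unmatched),
            key=lambda item: math.hypot(item[1][1][0] - tx, item[1][1][1] - ty))
        if math.hypot(dx - tx, dy - ty) <= max_match_dist:
            updated[track_id] = (dx, dy)   # keep the same identifier
            unmatched.pop(k)
    for _, centroid in unmatched:
        updated[next_id] = centroid        # unmatched detections get new identifiers
        next_id += 1
    return updated

# frame-to-frame usage: the same identifier follows the nearby moving object
tracks = {0: (5.0, 0.0), 1: (20.0, 3.0)}
print(associate_tracks(tracks, [(5.5, 0.1), (40.0, 0.0)]))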
The SDCA 500 also includes a prediction component 520, which processes the perception signals 508 to generate prediction signals 522 descriptive of one or more predicted future states of the autonomous vehicle's environment. For a given object, for example, the prediction component 520 may analyze the type/class of the object (as determined by the classification module 512) along with the recent tracked movement of the object (as determined by the tracking module 514) to predict one or more future positions of the object. As a relatively simple example, the prediction component 520 may assume that any moving objects will continue to travel in their current direction and at their current speed, possibly taking into account first- or higher-order derivatives to better track objects that have continuously changing directions, objects that are accelerating, and so on. In some embodiments, the prediction component 520 also predicts movement of objects based on more complex behaviors. For example, the prediction component 520 may assume that an object that has been classified as another vehicle will follow rules of the road (e.g., stop when approaching a red light), and will react in a certain way to other dynamic objects (e.g., attempt to maintain some safe distance from other vehicles). The prediction component 520 may inherently account for such behaviors by utilizing a neural network or other machine learning model, for example. For example, in some embodiments, a machine learning model for prediction component 520 may be trained using virtual sensor data. In additional embodiments, virtual data output by a virtual version of perception component 506 may be used to train a machine learning model of prediction component 520. The prediction component 520 may be omitted from the SDCA 500, in some embodiments.
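The constant-velocity assumption described above can be illustrated with the short Python sketch below; the prediction horizons and optional acceleration terms are assumptions chosen for illustration.

def predict_positions(x, y, vx, vy, horizons=(1.0, 2.0, 5.0), ax=0.0, ay=0.0):
    """Return predicted (x, y) positions at each future horizon (seconds),
    assuming constant velocity plus an optional first-order acceleration term."""
    return [(x + vx * t + 0.5 * ax * t * t,
             y + vy * t + 0.5 * ay * t * t) for t in horizons]

# a tracked vehicle 30 m ahead moving at 10 m/s in +x, decelerating slightly
print(predict_positions(30.0, 0.0, 10.0, 0.0, ax=-1.0))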
In some embodiments, the perception signals 508 include data representing “occupancy grids” (e.g., one grid per T milliseconds), with each occupancy grid indicating object positions (and possibly object boundaries, orientations, etc.) within an overhead view of the autonomous vehicle's environment. Within the occupancy grid, each “cell” (e.g., pixel) may be associated with a particular class as determined by the classification module 512, possibly with an “unknown” class for certain pixels that were not successfully classified. Similarly, the prediction signals 522 may include, for each such grid generated by the perception component 506, one or more “future occupancy grids” that indicate predicted object positions, boundaries and/or orientations at one or more future times (e.g., one, two, and five seconds ahead). Occupancy grids are discussed further below in connection with
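For illustration, the Python sketch below builds a toy overhead occupancy grid with one class code per cell (including an “unknown” class) and a naive future grid obtained by shifting a tracked object by a predicted displacement; the grid size, cell resolution, and class codes are assumptions, not the format of perception signals 508.

import numpy as np

UNKNOWN, FREE, VEHICLE, PEDESTRIAN = 0, 1, 2, 3

def make_grid(width=200, height=200):
    """Overhead grid with one class code per cell, initialized to 'unknown'."""
    return np.full((height, width), UNKNOWN, dtype=np.uint8)

def mark_object(grid, x_m, y_m, cls, cell_size=0.5, half_extent=1):
    """Mark a small object footprint centered at (x_m, y_m) meters."""
    col, row = int(x_m / cell_size), int(y_m / cell_size)
    grid[row - half_extent:row + half_extent + 1,
         col - half_extent:col + half_extent + 1] = cls

grid_now = make_grid()
mark_object(grid_now, 30.0, 40.0, VEHICLE)

# future occupancy grid one second ahead, assuming the vehicle moves +5 m in x
grid_future = make_grid()
mark_object(grid_future, 35.0, 40.0, VEHICLE)
print((grid_now == VEHICLE).sum(), (grid_future == VEHICLE).sum())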
A mapping component 530 obtains map data (e.g., a digital map including the area currently being traversed by the autonomous vehicle) and/or navigation data (e.g., data indicating a route for the autonomous vehicle to reach the destination, such as turn-by-turn instructions), and outputs the data (possibly in a converted format) as mapping and navigation signals 532. In some embodiments, the mapping and navigation signals 532 include other map or location-related information, such as speed limits, traffic indicators, and so on. The navigation signals 532 may be obtained from a remote server (e.g., via a network, or, in the event of a real-world implementation, from a cellular or other communication network of the autonomous vehicle, or of a smartphone coupled to the autonomous vehicle, etc.), and/or may be locally stored in a persistent memory of the autonomous vehicle or other computing devices (e.g., graphics platform 101 and memory 152).
A motion planner 540 processes the perception signals 508, the prediction signals 522, and the mapping and navigation signals 532 to generate decisions 542 regarding the next movements of the autonomous vehicle. Depending on the type of the motion planner 540, the decisions 542 may be operational parameters (e.g., braking, speed, and steering parameters) or particular maneuvers (e.g., turn left, move to right lane, move onto shoulder of road, etc.). Decisions 542 may be provided to one or more operational subsystems of the autonomous vehicle (e.g., if decisions 542 indicate specific operational parameters), or may be provided to one or more intermediate stages that convert the decisions 542 to operational parameters (e.g., if the decisions indicate specific maneuvers).
The motion planner 540 may utilize any suitable type(s) of rules, algorithms, heuristic models, machine learning models, or other suitable techniques to make driving decisions based on the perception signals 508, prediction signals 522, and mapping and navigation signals 532. For example, in some embodiments, a machine learning model for motion planner 540 may be trained using virtual sensor data. In additional embodiments, virtual data output by a virtual version of any of mapping component 530, perception component 506, and/or prediction component 520 may be used to train a machine learning model of motion planner 540. For example, the motion planner 540 may be a “learning based” planner (e.g., a planner that is trained using supervised learning or reinforcement learning), a “search based” planner (e.g., a continuous A* planner), a “sampling based” planner (e.g., a planner that performs random searches in a space that represents a universe of possible decisions), a “predictive control based” planner (e.g., a model predictive control (MPC) planner), and so on.
Referring back to
For various reasons, it may be more difficult for the segmentation module 510 to identify certain objects 296, and/or for the classification module 512 to classify certain objects 296, within the point cloud 290. As can also be seen in
Despite such difficulties, the segmentation module 510, classification module 512, and/or tracking module 514 may use techniques that make object identification, classification, and/or tracking highly accurate across a very wide range of scenarios, even with scarce or otherwise suboptimal point cloud or other data representations of objects. For example, as discussed above in connection with
In some embodiments, a non-transitory computer-readable medium, storing thereon instructions executable by one or more processors, may be configured to implement a sensor parameter optimizer 112 that determines parameter settings for use by real-world sensors in autonomous driving applications. For example, sensor parameter optimizer 112, as shown in
In some embodiments, the sensor parameter optimizer 112 may be used for virtual autonomous driving applications in a virtual environment (e.g., scenes 400 or 450 described herein) in order to train, test, generate or otherwise determine enhanced parameters for use by a real-world sensor (or virtual sensor) in autonomous driving applications. In still further embodiments, parameter settings for use by virtual or real-world sensors may be determined, via sensor parameter optimizer 112, by one or more machine learning models or self-driving control architectures, where, for example, a number of various parameter settings are tested against operation of a vehicle (e.g., any of vehicles 401, 451, 700, and/or 760) in a real or virtual environment to determine parameters that cause the vehicle to operate in a desired manner (e.g., operate in a safe manner or operate in accordance with a ground truth).
Sensor parameter optimizer 112 may include, or use, an imaging engine (e.g., imaging engine 102) configured to generate a plurality of imaging scenes (e.g., scenes 400 or 450) defining a virtual environment.
Sensor parameter optimizer 112 may further include, or use, a sensor simulator (e.g., sensor simulator 104) configured to receive a parameter setting for each of one or more virtual sensors (e.g., virtual sensors associated with any of vehicles 401, 451, 700, and/or 760). The parameter setting may be of various types. For example, the parameter setting may define a spatial distribution of scan lines of a point cloud (e.g., as described and depicted for
Sensor simulator 104 may generate, based on the parameter settings and the plurality of imaging scenes (e.g., scene 400 of
In certain embodiments, sensor data may be generated by sensor simulator 104 via ray casting. For example, sensor simulator 104 may be configured to detect objects or surfaces within a virtual environment (e.g., by casting rays against such objects or surfaces and determining respective distances and/or depths within the virtual environment). In still further embodiments, sensor simulator 104 may simulate sensor data using a graphic shader (e.g., using imaging engine 102). In other embodiments, sensor simulator 104 may generate simulated lidar or radar data.
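As one hedged illustration of ray-cast sensor simulation, the Python sketch below casts rays from a virtual sensor against simple stand-in geometry (a ground plane and spheres) and reports the nearest hit distance per ray; the scene layout, sensor pose, and scan angles are assumptions and are not the implementation of sensor simulator 104.

import math

def cast_ray(origin, direction, spheres, ground_z=0.0, max_range=200.0):
    """Return the nearest hit distance along a unit-length ray, or None if no hit."""
    best = max_range
    # ground plane at z = ground_z
    if direction[2] < 0:
        t = (ground_z - origin[2]) / direction[2]
        if 0 < t < best:
            best = t
    # spheres given as ((cx, cy, cz), radius), standing in for objects
    for (cx, cy, cz), r in spheres:
        ox, oy, oz = origin[0] - cx, origin[1] - cy, origin[2] - cz
        b = 2 * (direction[0] * ox + direction[1] * oy + direction[2] * oz)
        c = ox * ox + oy * oy + oz * oz - r * r
        disc = b * b - 4 * c
        if disc >= 0:
            t = (-b - math.sqrt(disc)) / 2
            if 0 < t < best:
                best = t
    return best if best < max_range else None

# a sensor 2 m above the ground scanning a small horizontal fan of rays
origin = (0.0, 0.0, 2.0)
spheres = [((20.0, 0.0, 1.0), 1.0)]  # an object roughly 20 m ahead
for az_deg in (-5, 0, 5):
    az = math.radians(az_deg)
    d = (math.cos(az), math.sin(az), -0.02)
    n = math.sqrt(sum(v * v for v in d))
    d = tuple(v / n for v in d)  # normalize the ray direction
    print(az_deg, cast_ray(origin, d, spheres))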
Sensor parameter optimizer 112 may also include, or use, an autonomous vehicle simulator (e.g., autonomous vehicle simulator 108) configured to control an autonomous vehicle (e.g., vehicles 401 and/or 451) within the virtual environment (e.g., the virtual environment(s) depicted by each of
In various aspects, sensor parameter optimizer 112 may determine, based on operation of the autonomous vehicle (e.g., vehicles 401, 451, 700, and/or 760), an optimal parameter setting of the parameter setting, where the optimal parameter setting may be applied to a real-world sensor associated with real-world autonomous driving applications. For example, optimal parameters of real-world sensor(s) (e.g., regarding scan patterns, field of view, range, etc.) may be based on simulation performance determined and experienced in a virtual environment based on different choices regarding the limitations of the sensor(s). In some embodiments, the optimal parameter setting may be determined, by sensor parameter optimizer 112, via evolutionary learning based on vehicle operation data captured when an autonomous vehicle (e.g., vehicles 401 and/or 451) interacts with one or more objects or surfaces (e.g., 402-418 and/or 452-482) within a virtual environment (e.g., virtual environments of
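A minimal Python sketch of such an evolutionary search over a single sensor parameter is shown below; the fitness function is a stand-in for performance measured by operating the vehicle in the virtual environment, and the parameter range, population size, and mutation scale are illustrative assumptions.

import random

def simulate_fitness(fov_deg):
    # placeholder fitness: pretend driving performance peaks near a 20-degree FOV;
    # in practice this score would come from a simulated driving run
    return -abs(fov_deg - 20.0) + random.uniform(-0.5, 0.5)

def evolve(pop_size=20, generations=30, mutation=2.0):
    population = [random.uniform(5.0, 60.0) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=simulate_fitness, reverse=True)
        parents = scored[: pop_size // 4]                   # keep the best quarter
        population = parents + [
            random.choice(parents) + random.gauss(0.0, mutation)
            for _ in range(pop_size - len(parents))         # mutated offspring
        ]
    return max(population, key=simulate_fitness)

print(f"selected field of view: {evolve():.1f} degrees")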
With reference to
In the embodiment of
Occupancy grid generator 600 may include a normal layer component 602 configured to generate a normal layer 612 of an occupancy grid 610 based on the imaging scene (e.g., scene 400 or 450 of
Occupancy grid generator 600 may further include a label layer component 604 configured to generate a label layer 614. In various aspects, label layer 614 may be mapped to normal layer 612 (e.g., as depicted by occupancy grid 610), and encoded with a first channel set. While occupancy grid 610 is represented as a series of layered objects, it is to be understood that occupancy grid 610 need not be visualized and may exist as a computing structure or object, e.g., in memory 152 of graphics platform 101. The first channel set may be associated with one or more text-based or state-based values of one or more objects of the environment (e.g., objects or surfaces 402-418 of the virtual environment of
Occupancy grid generator 600 may further include a velocity layer component 606 configured to generate a velocity layer 616. In various aspects, velocity layer 616 may be mapped to normal layer 612 (e.g., as depicted by occupancy grid 610), and encoded with a second channel set. In various aspects, the second channel set may be associated with one or more velocity values of one or more objects of the environment (e.g., vehicles 401, 412, 414 of the virtual environment of
In various embodiments, occupancy grid generator 600 may generate an occupancy grid 610 based on normal layer 612, label layer 614, and velocity layer 616. Occupancy grid 610 may be used to control a vehicle (e.g., a vehicle 401, 412, 414, and/or 656A-C) as the vehicle moves through the environment (e.g., virtual environment of
In additional embodiments, occupancy grid generator 600 may further include a height layer component (not shown) configured to generate a height layer (not shown). In such embodiments, the height layer may be mapped to normal layer 612 of occupancy grid 610. The height layer may be encoded with a third channel set associated with one or more height values. The third channel set may include a plurality of third channels of a pixel. For example, the plurality of third channels of the pixel may include red (R), green (G), and blue (B) channels. Each of the plurality of third channels of the pixel indicates a particular height value. For example, channel R may relate to ground values, channel B may relate to sky values, and channel G may relate to mid-range (e.g., between ground and sky) values. As with the first and second channel sets, the third channel set may be defined by 256-bit RGB values that act as hash values for respective height values of objects or surfaces. For example, height channels may indicate a height of a building (e.g., building 418 of
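For illustration only, the following Python sketch encodes label, velocity, and height layers as per-pixel RGB arrays mapped to a common 2D view; the specific RGB codes, velocity scaling, and height bands are assumptions rather than the encodings used by occupancy grid generator 600.

import numpy as np

H, W = 128, 128
label_layer = np.zeros((H, W, 3), dtype=np.uint8)     # first channel set
velocity_layer = np.zeros((H, W, 3), dtype=np.uint8)  # second channel set
height_layer = np.zeros((H, W, 3), dtype=np.uint8)    # third channel set

# label layer: an arbitrary RGB code per class, e.g. vehicle = (200, 30, 30)
label_layer[60:68, 60:64] = (200, 30, 30)

def encode_velocity(vx, vy, v_max=25.0):
    """Map vx, vy in [-v_max, v_max] m/s into 0-255 R and G channels; B unused here."""
    def to_byte(v):
        return int(np.clip((v + v_max) / (2 * v_max) * 255.0, 0, 255))
    return (to_byte(vx), to_byte(vy), 0)

velocity_layer[60:68, 60:64] = encode_velocity(vx=10.0, vy=0.0)

# height layer: R = ground band, G = mid band, B = sky band (one band per pixel)
height_layer[60:68, 60:64] = (0, 255, 0)  # the object occupies the mid band

occupancy_grid = np.stack([label_layer, velocity_layer, height_layer])
print(occupancy_grid.shape)  # (3, 128, 128, 3): three layers mapped to one 2D view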
While depicted as a visual image in
In the example scenario of
Object classes/types may be indicated at a relatively high level of generality (e.g., with each of objects 656A-C having the class “vehicle,” each of objects 660, 662 having the class “lane marker,” etc.), or with more specificity (e.g., with object 656A having the class “sport utility vehicle” and object 656B having the class “sedan,” and/or with objects 660 having the class “lane marker: solid” and objects 662 having the class “lane marker: dashed,” etc.). Globally or locally unique identifiers (e.g., labels or descriptors) may also be specified by the occupancy grid 650 (e.g., “VEH001” through “VEH003” for vehicles 656A through 656C, respectively, and “PED001” for pedestrian 656D, etc.). Depending on the embodiment, the occupancy grid 650 may also be associated with state data, such as a current direction and/or speed of some or all depicted objects. In other embodiments, however, the state of each object or area is not embedded in the occupancy grid 650, and the occupancy grid 650 only includes data representing a stateless snapshot in time. For example, the prediction component 520 may infer the speed, direction, and/or other state parameters of dynamic objects using the unique identifiers of specific objects, and the change in the positions of those objects within a succession of occupancy grids over time.
In some embodiments, the occupancy grid 650 only associates certain types of objects and/or types of areas with current states. For each of the 16 different traffic light areas 664 (e.g., each corresponding to an area in which vehicles are expected to stop when the light is red), for example, the occupancy grid 650 may include not only data specifying the location of that area 664, but also data indicating whether the traffic light associated with that area 664 is currently red, yellow, or green (or possibly whether the traffic light is blinking, an arrow versus a circle, etc.).
Virtual and Real-world Autonomous Vehicles
Vehicle 700 includes lidar system 702. The lidar system 702 includes a laser 710 with multiple sensor heads 712A-D coupled to the laser 710 via multiple laser-sensor links 714. Each of the sensor heads 712 may include some or all of the components of the lidar system 300 as illustrated and described in U.S. Provisional Patent Application Ser. No. 62/573,795 entitled “Software Systems and Methods for controlling an Autonomous Vehicle,” which was filed on Oct. 28, 2017, the entire disclosure of which is hereby incorporated by reference.
Each of the laser-sensor links 714 may include one or more optical links and/or one or more electrical links. The sensor heads 712 in
In the example of
Data from each of the sensor heads 712 may be combined, processed, or otherwise stitched together to generate a point cloud or other image (e.g., 2D, 3D, and/or RGB image as described herein) that covers a horizontal view of at least 30 degrees around a vehicle. For example, the laser 710 may include a controller or processor that receives data from each of the sensor heads 712 (e.g., via a corresponding electrical link 720) and processes the received data to construct a point cloud or other image (e.g., 2D, 3D, and/or RGB image as described herein) covering a 360-degree horizontal view around a vehicle or to determine distances to one or more targets. The point cloud, information from the point cloud, or other image may be provided to a vehicle controller 722 via a corresponding electrical, optical, or radio link 720. The vehicle controller 722 may include one or more CPUs, GPUs, and a non-transitory memory with persistent components (e.g., flash memory, an optical disk) and/or non-persistent components (e.g., RAM).
In some implementations, the point cloud or other image (e.g., 2D, 3D, and/or RGB image as described herein) is generated by combining data from each of the multiple sensor heads 712 at a controller included within the laser 710, and is provided to the vehicle controller 722. In other implementations, each of the sensor heads 712 includes a controller or processor that constructs a point cloud or other image (e.g., 2D, 3D, and/or RGB image) for a portion of the 360-degree horizontal view around the vehicle and provides the respective point cloud to the vehicle controller 722. The vehicle controller 722 then combines or stitches together the point clouds from the respective sensor heads 712 to construct a combined point cloud or other image (e.g., 2D, 3D, and/or RGB image) covering a 360-degree horizontal view. Still further, the vehicle controller 722 in some implementations communicates with a remote server to process point cloud or other image (e.g., 2D, 3D, and/or RGB image) data.
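The stitching step can be illustrated with the short Python sketch below, which transforms per-head point clouds into a common vehicle frame using each head's mounting yaw and offset and then concatenates them; the mounting poses and random points are assumptions made for illustration.

import math
import numpy as np

def to_vehicle_frame(points, yaw_deg, offset_xy):
    """points: (N, 3) in a sensor head's frame; rotate about z and translate into the vehicle frame."""
    yaw = math.radians(yaw_deg)
    rot = np.array([[math.cos(yaw), -math.sin(yaw), 0.0],
                    [math.sin(yaw),  math.cos(yaw), 0.0],
                    [0.0,            0.0,           1.0]])
    return points @ rot.T + np.array([offset_xy[0], offset_xy[1], 0.0])

# four corner-mounted heads, each reporting points in its own frame
heads = {
    "front_left":  (45.0,   (2.0,  1.0)),
    "front_right": (-45.0,  (2.0, -1.0)),
    "rear_left":   (135.0,  (-2.0,  1.0)),
    "rear_right":  (-135.0, (-2.0, -1.0)),
}
clouds = {name: np.random.rand(100, 3) * 10.0 for name in heads}

stitched = np.vstack([to_vehicle_frame(clouds[name], yaw, offset)
                      for name, (yaw, offset) in heads.items()])
print(stitched.shape)  # (400, 3): one combined cloud around the vehicle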
In any event, the vehicle 700 may be an autonomous vehicle where the vehicle controller 722 provides control signals to various components 730 within the vehicle 700 to maneuver and otherwise control operation of the vehicle 700. It is to be understood that, for embodiments where vehicle 700 is a virtual vehicle, some or all of components 730 may be omitted, or approximated via a simplified model, where such simplified model accounts for only those portions used for testing or generating training data as described herein.
The components 730 are depicted in an expanded view in
In some implementations, the vehicle controller 722 may receive point cloud or other image (e.g., 2D, 3D, and/or RGB image) data from the sensor heads 712 via the link 720 and analyze the received point cloud data or other image (e.g., 2D, 3D, and/or RGB image), using any one or more of aggregate or individual SDCAs as disclosed herein or in U.S. Provisional Patent Application Ser. No. 62/573,795 entitled “Software Systems and Methods for controlling an Autonomous Vehicle,” which was filed on Oct. 28, 2017, the entire disclosure of which is hereby incorporated by reference, to sense or identify targets, objects, or surfaces (see, e.g.,
In addition to the lidar system 702, the vehicle 700 may also be equipped with other sensors such as a camera, a thermal imager, a conventional radar (none illustrated to avoid clutter), etc. The sensors can provide additional data to the vehicle controller 722 via wired or wireless communication links. Further, the vehicle 700 in an example implementation includes a microphone array operating as a part of an acoustic source localization system configured to determine sources of sounds.
Computing system 800 may be included, or partially included, within the vehicle controller 722 of
In embodiments where the processor(s) 802 include more than a single processor, each processor may be a different programmable microprocessor that executes software instructions stored in the memory 804. Alternatively, each of the processor(s) 802 may be a different set of such microprocessors, or a set that includes one or more microprocessors and one or more other processor types (e.g., ASICs, FPGAs, etc.) for certain functions.
The memory 804 may include one or more physical memory devices with non-volatile memory. Any suitable memory type or types may be used, such as ROM, solid-state drives (SSDs), hard disk drives (HDDs), and so on. The processor(s) 802 are coupled to the memory 804 via a bus or other network 808. The network 808 may be a single wired network, or may include any suitable number of wired and/or wireless networks. For example, the network 808 may be or include a controller area network (CAN) bus, a Local Interconnect Network (LIN) bus, and so on.
In some embodiments, the SDCA instructions 806 correspond to an SDCA or machine learning model as described herein, and the processor(s) 802 execute the corresponding SDCA or machine learning model for control and/or operation of a virtual or real-world autonomous vehicle.
Also coupled to the network 808 are a vehicle control interface 810, a passenger interface 812, a sensor interface 814, and a network interface 816. Each of the interfaces 810, 812, 814, and 816 may include one or more processors (e.g., ASICs, FPGAs, microprocessors, etc.) and/or other hardware, firmware and/or software to enable communication with systems, subsystems, devices, etc., whether real or simulated, that are external to the computing system 800.
The vehicle control interface 810 is generally configured to provide control data generated by the processor(s) 802 to the appropriate operational subsystems of the autonomous vehicle, such that the appropriate subsystems can effectuate driving decisions made by the processor(s) 802. For example, the vehicle control interface 810 may provide the control signals to the appropriate subsystem(s) (e.g., accelerator 740, brakes 742, and steering mechanism 746 of
The passenger interface 812 is generally configured to provide alerts, warnings, notifications, and/or other information to one or more passengers of the autonomous vehicle. In some embodiments where the vehicle is not fully autonomous (e.g., allowing human driving in certain modes and/or situations), the passenger interface 812 may specifically provide such information to the driver (e.g., via dashboard indicators, etc.). As just one example, the passenger interface 812, whether real or virtual, may cause a display and/or speaker in the vehicle to generate an alert when the processor(s) 802 (executing the SDCA instructions 806) determine that a collision with another object is likely. As another example, the passenger interface 812 may cause a display in the vehicle to show an estimated time of arrival (ETA) to passengers. In some embodiments, the passenger interface 812 also permits certain user inputs. If the vehicle supports passenger selection of specific driving styles (e.g., as discussed above in connection with
The sensor interface 814 is generally configured to convert raw sensor data, whether real or virtual, from one or more real or simulated sensor devices (e.g., lidar, camera, microphones, thermal imaging units, IMUs, etc.) to a format that is consistent with a protocol of the network 808 and is recognized by one or more of the processor(s) 802. The sensor interface 814 may be coupled to a lidar system, whether real or virtual (e.g., the lidar system 702 of
The network interface 816, whether real or virtual, is generally configured to convert data received from one or more devices or systems external to the autonomous vehicle to a format that is consistent with a protocol of the network 808 and is recognized by one or more of the processor(s) 802. In some embodiments, the network interface 816 includes separate interface hardware, firmware, and/or software for different external sources. For example, a remote mapping/navigation server may send mapping and navigation/route data (e.g., mapping and navigation signals 532 of
In some embodiments, no sensor data (or only limited sensor data) of the autonomous vehicle is received via the sensor interface 814, whether real or virtual. Instead, the processor(s) 802 execute the SDCA instructions 806 using, as input, only (or primarily) data that is received by the network interface 816 from other vehicles, infrastructure, and/or other external devices/systems. In such an embodiment, the external data may include raw sensor data that is indicative of the vehicle environment (but was generated off-vehicle), and/or may include higher-level information that was generated externally using raw sensor data (e.g., occupancy grids, as discussed herein for
The network 808, whether real or virtual, may also couple to other types of interfaces and/or components, and/or some of the interfaces shown in
At block 906, method 900 may further include generating (e.g., via automated training dataset generator 100) environment-object data defining how objects or surfaces (e.g., objects and surfaces 391-398 of
At block 908, method 900 may further include controlling an autonomous vehicle within the virtual environment based on one or both of (i) the plurality of photo-realistic scenes (e.g., scenes 400 and 450) and (ii) the plurality of depth-map-realistic scenes (e.g., scene 390).
At block 910, method 900 may further include generating one or more feature training datasets based on at least one of (i) the plurality of photo-realistic scenes (e.g., scenes 400 and 450), (ii) the plurality of depth-map-realistic scenes (e.g., scene 390), or (iii) the environment-object data (e.g., data associated with objects and surfaces 391-398 of
Method 1000 may begin (1002) at block 1004 where, e.g., occupancy grid generator 600, generates a normal layer (e.g., normal layer 612) based on the imaging scene (e.g., as exemplified by scenes 400 and 450). As described elsewhere herein, the normal layer may define a two-dimensional (2D) view of the imaging scene.
At block 1006, method 1000 may further include generating a label layer (e.g., label layer 614). The label layer may be mapped to the normal layer and encoded with a first channel set (e.g., plurality of first channels of a pixel that may include RGB channels). The first channel set may be associated with one or more text-based or state-based values of one or more objects of the environment (e.g., one or more classifications or one or more states of the one or more objects of the environment).
At block 1008, method 1000 may include generating, e.g., via occupancy grid generator 600, a velocity layer (e.g., velocity layer 616). The velocity layer (e.g., velocity layer 616) may be mapped to the normal layer (e.g., normal layer 612) and encoded with a second channel set (e.g., a plurality of second channels of a pixel, which may include RGB values). The second channel set may be associated with one or more velocity values of one or more objects of the environment.
At block 1010, method 1000 may include generating, e.g., via occupancy grid generator 600, an occupancy grid (e.g., occupancy grid 610 or 650) based on the normal layer, the label layer, and the velocity layer. The occupancy grid may be used to control the vehicle (e.g., vehicle 401, 451, 700, and/or 760) as the vehicle moves through an environment (e.g., any of the environments depicted in
At block 1106, method 1100 may further include receiving, e.g., at automated training dataset generator 100, or specifically at sensor parameter optimizer 112, a parameter setting for each of one or more virtual sensors. The virtual sensors may be associated with a virtual vehicle, e.g., vehicles 700 and/or 760 as described herein for
At block 1108, method 1100 may further include controlling an autonomous vehicle within the virtual environment based on the sensor data.
At block 1110, method 1100 may further include determining, based on operation of the autonomous vehicle within the virtual environment, an optimal parameter setting of the parameter setting. The optimal parameter may be determined while the autonomous vehicle is operating within the virtual environment, or the optimal parameter may be determined at a later time after data for the autonomous vehicle operating within the virtual environment has been collected. As the term is used herein, “optimal parameter” may refer to a value, control signal, setting, or other parameter within a range or ranges of such values, control signals, settings, or other parameters within which an autonomous vehicle operates in a controlled, safe, efficient, and/or otherwise desired manner. That is, in various embodiments there may be more than one such “optimal” value, control signal, setting, or other parameter by which an autonomous vehicle may operate in order to achieve such controlled, safe, efficient, and/or otherwise desired operation(s); rather, a range of such values may apply. The optimal parameter setting(s), so determined, may be applied to a real-world sensor associated with real-world autonomous driving applications.
General Considerations
Although the disclosure herein sets forth a detailed description of numerous different embodiments, it should be understood that the legal scope of the description is defined by the words of the claims set forth at the end of this patent and equivalents. The detailed description is to be construed as exemplary only and does not describe every possible embodiment since describing every possible embodiment would be impractical. Numerous alternative embodiments may be implemented, using either current technology or technology developed after the filing date of this patent, which would still fall within the scope of the claims.
The following additional considerations apply to the foregoing discussion. Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Additionally, certain embodiments are described herein as including logic or a number of routines, subroutines, applications, or instructions. These may constitute either software (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware. In hardware, the routines, etc., are tangible units capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Hardware modules may provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and may operate on a resource (e.g., a collection of information).
The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
Similarly, the methods or routines described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location, while in other embodiments the processors may be distributed across a number of locations.
In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
This detailed description is to be construed as exemplary only and does not describe every possible embodiment, as describing every possible embodiment would be impractical, if not impossible. A person of ordinary skill in the art may implement numerous alternate embodiments, using either current technology or technology developed after the filing date of this application.
Those of ordinary skill in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described embodiments without departing from the scope of the invention, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the inventive concept.
The patent claims at the end of this patent application are not intended to be construed under 35 U.S.C. § 112(f) unless traditional means-plus-function language is expressly recited, such as “means for” or “step for” language being explicitly recited in the claim(s). The systems and methods described herein are directed to an improvement to computer functionality, and improve the functioning of conventional computers.
Claims
1. A non-transitory computer-readable medium storing thereon instructions executable by one or more processors to implement an occupancy grid generator for generating an occupancy grid indicative of an environment of a vehicle from an imaging scene that depicts the environment, the occupancy grid generator comprising:
- a normal layer component configured to generate a normal layer based on the imaging scene, the normal layer defining a two-dimensional (2D) view of the imaging scene,
- a label layer component configured to generate a label layer, the label layer being mapped to the normal layer and encoded with a first channel set, and the first channel set being associated with one or more text-based or state-based values of one or more objects of the environment; and
- a velocity layer component configured to generate a velocity layer, the velocity layer being mapped to the normal layer, the velocity layer being encoded with a second channel set, and the second channel set being associated with one or more velocity values of one or more objects of the environment,
- wherein the occupancy grid generator generates an occupancy grid based on the normal layer, the label layer, and the velocity layer, the occupancy grid being used to control the vehicle as the vehicle moves through the environment.
2. The non-transitory computer-readable medium of claim 1, wherein the normal layer is a top-down graphical view of the virtual environment.
3. The non-transitory computer-readable medium of claim 1, wherein the first channel set includes a plurality of first channels of a pixel.
4. The non-transitory computer-readable medium of claim 3, wherein the plurality of first channels of the pixel include red (R), green (G), and blue (B) channels.
5. The non-transitory computer-readable medium of claim 3, wherein each of the plurality of first channels of the pixel indicates a particular text-based or state-based value.
6. The non-transitory computer-readable medium of claim 1, wherein the one or more text-based or state-based values define one or more classifications or one or more states of the one or more objects of the environment.
7. The non-transitory computer-readable medium of claim 1, wherein the second channel set includes a plurality of second channels of a pixel.
8. The non-transitory computer-readable medium of claim 7, wherein the plurality of second channels of the pixel include a red (R) channel, a green (G) channel, and a blue (B) channel.
9. The non-transitory computer-readable medium of claim 8, wherein the R channel defines a first component for the velocity layer, the G channel defines a second component for the velocity layer, and the B channel defines a third component for the velocity layer.
10. The non-transitory computer-readable medium of claim 7, wherein each of the plurality of second channels of the pixel indicates a particular velocity value.
11. The non-transitory computer-readable medium of claim 1, wherein the one or more velocity values define corresponding one or more velocities of one or more vehicles moving within the environment.
12. The non-transitory computer-readable medium of claim 1, wherein the occupancy grid generator further comprises a height layer component configured to generate a height layer, the height layer mapped to the normal layer, the height layer encoded with a third channel set, the third channel set associated with one or more height values.
13. The non-transitory computer-readable medium of claim 12, wherein the third channel set includes a plurality of third channels of a pixel.
14. The non-transitory computer-readable medium of claim 13, wherein the plurality of third channels of the pixel include red (R), green (G), and blue (B) channels.
15. The non-transitory computer-readable medium of claim 13, wherein each of the plurality of third channels of the pixel indicates a particular height value.
16. The non-transitory computer-readable medium of claim 1, wherein the imaging scene of the virtual environment is a frame in a set of frames, the set of frames defining the operation of the virtual vehicle within the virtual environment.
17. The non-transitory computer-readable medium of claim 16, wherein the set of frames form a video of the virtual vehicle operating in the virtual environment.
18. The non-transitory computer-readable medium of claim 1, wherein the environment is a virtual environment.
19. The non-transitory computer-readable medium of claim 1, wherein the environment is a real-world environment.
20. An occupancy grid generation method for generating an occupancy grid indicative of an environment of a vehicle from an imaging scene that depicts the environment, the occupancy grid generation method comprising:
- generating a normal layer based on the imaging scene, the normal layer defining a two-dimensional (2D) view of the imaging scene,
- generating a label layer, the label layer being mapped to the normal layer and encoded with a first channel set, and the first channel set being associated with one or more text-based or state-based values of one or more objects of the environment;
- generating a velocity layer, the velocity layer being mapped to the normal layer, the velocity layer being encoded with a second channel set, and the second channel set being associated with one or more velocity values of one or more objects of the environment; and
- generating an occupancy grid based on the normal layer, the label layer, and the velocity layer, the occupancy grid being used to control the vehicle as the vehicle moves through the environment.
Type: Application
Filed: Sep 4, 2019
Publication Date: Mar 5, 2020
Inventors: Benjamin Englard (Palo Alto, CA), Miguel Alexander Peake (Daly City, CA)
Application Number: 16/560,018