SYSTEMS AND METHODS FOR USING REAL-TIME IMAGERY IN NAVIGATION
To generate navigation directions for a driver of a vehicle, a route for guiding the driver to a destination is obtained, visual landmarks corresponding to prominent physical objects disposed along the route are retrieved, and real-time imagery is collected at the vehicle approximately from a vantage point of the driver during navigation along the route. Using (i) the retrieved visual landmarks and (ii) the imagery collected at the vehicle, a subset of the visual landmarks that are currently visible to the driver is selected. Navigation directions describing the route are provided to the driver, the navigation directions referencing the selected subset of the visual landmarks and excluding the remaining visual landmarks.
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 15/144,300, filed May 2, 2016; the disclosure of which is incorporated herein by reference in its entirety for all purposes.
FIELD OF THE DISCLOSURE
The present disclosure relates to navigation directions and, in particular, to using imagery in navigation directions.
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Systems that automatically route drivers between geographic locations generally utilize indications of distance, street names, and building numbers to generate navigation directions based on the route. For example, these systems can provide to a driver such instructions as “proceed for one-fourth of a mile, then turn right onto Maple Street.” However, it is difficult for drivers to accurately judge distance, and it is not always easy for drivers to see street signs. Moreover, there are geographic areas where street and road signage is poor.
To provide guidance to a driver that is more similar to what another person may say to the driver, it is possible to augment navigation directions with references to prominent objects along the route, such as visually salient buildings or billboards. These prominent objects can be referred to as “visual landmarks.” Thus, a system can generate such navigation directions as “in one-fourth of a mile, you will see a McDonald's restaurant on your right; make the next right turn onto Maple Street.” To this end, an operator can enter descriptions and indications of locations (e.g., street addresses, coordinates) for visual landmarks, so that the system can automatically select suitable visual landmarks when generating navigation directions.
However, not every landmark is visible at all times. For example, some billboards may be brightly illuminated at night but may be generally unnoticeable during the day. On the other hand, an intricate façade of a building may be easy to notice during the day but may be poorly illuminated and accordingly unnoticeable at night.
Generally speaking, a system of this disclosure provides a driver with navigation directions using visual landmarks that are likely to be visible at the time when the driver reaches the corresponding geographic location. In one implementation, the system selects visual landmarks from a relatively large and redundant set of previously identified visual landmarks. To make the selection, the system can consider one or more of the time of day, the current weather conditions, the current season, etc. Moreover, the system can utilize real-time imagery collected by the dashboard camera, the camera of a smartphone mounted on the dashboard, or another camera that approximately corresponds to the vantage point of the driver. As discussed in more detail below, the system also can use implicit and explicit feedback regarding visibility and/or prominence of physical objects to improve subsequent references to visual landmarks.
An example embodiment of these techniques is a method for generating navigation directions for drivers, executed by one or more processors. The method includes obtaining a route for guiding a driver of a vehicle to a destination, retrieving visual landmarks corresponding to prominent physical objects disposed along the route, obtaining real-time imagery collected at the vehicle approximately from a vantage point of the driver during navigation along the route, and using (i) the retrieved visual landmarks and (ii) the imagery collected at the vehicle, selecting a subset of the visual landmarks that are currently visible to the driver. The method further includes providing, to the driver, navigation directions describing the route, the navigation directions referencing the selected subset of the visual landmarks and excluding the remaining visual landmarks.
Another example embodiment of these techniques is a system operating in a vehicle. The system includes a camera configured to capture real-time imagery approximately from a vantage point of the driver, a positioning module configured to determine a current geographic location of the vehicle, a network interface to communicate with a server system via a communication network, a user interface, and processing hardware configured to (i) obtain, using the captured real-time imagery and the current geographic location of the vehicle, driving directions including an instruction that references a visual landmark automatically determined as being visible in the captured real-time imagery, and (ii) provide the instruction to the driver via the user interface.
Yet another example embodiment of these techniques is a method in a mobile system operating in a vehicle for providing driving directions. The method comprises receiving a request for driving directions to a destination from a driver of the vehicle, receiving real-time imagery collected at the vehicle approximately from a vantage point of the driver, obtaining, using the real-time imagery and a current location of the vehicle, the driving directions including an instruction that references a visual landmark automatically determined as being visible in the real-time imagery, and providing the instruction to the driver in response to the request.
Still another example embodiment of these techniques is a method for generating navigation directions for drivers. The method includes obtaining, by one or more processors, a route for guiding a driver of a vehicle to a destination as well as real-time imagery collected at the vehicle approximately from a vantage point of the driver during navigation along the route. The method further includes automatically identifying, by the one or more processors, a physical object within the real-time imagery to be used as a visual landmark in navigation, including recognizing at least one of (i) one of a finite set of pre-set objects or (ii) text within the real-time imagery. Further, the method includes determining a position of the physical object relative to a point on the route, and providing, to the driver, navigation directions describing the route, the navigation directions including a reference to the identified physical object.
BRIEF DESCRIPTION OF THE DRAWINGS
To better guide a driver along a navigation route, a system collects real-time imagery from approximately the user's vantage point (e.g., using a dashboard camera, a camera built into the vehicle, the user's smartphone mounted on the dashboard), retrieves a set of visual landmarks for the user's current position along the navigation route, and uses the real-time imagery to determine which of the retrieved visual landmarks should be used to augment step-by-step navigation directions for the navigation route, according to one implementation. In this manner, the system omits visual landmarks that are occluded by trees or vehicles, obscured due to current lighting conditions, or poorly visible from the user's current vantage for some other reason.
In addition to selecting salient visual landmarks from among pre-stored static landmarks, the system can identify dynamic visual landmarks, such as changing electronic billboards or trucks with machine-readable text. When capable of automatically recognizing such an object in the video or photo feed, the system can position the object relative to the next navigation instruction and reference the object in the navigation instruction. For example, the system can modify the instruction “turn left in 200 feet” to “turn left by the red truck.” Moreover, the system in some scenarios may select a route from among multiple routing options based on live states of traffic lights. For example, the system may determine that the red light at the intersection the driver is approaching makes another routing option more appealing.
Additionally or alternatively to processing real-time imagery, the system can assess the usefulness of a certain visual landmark based on explicit and/or implicit user signals. For example, the driver can indicate that she cannot see a landmark, using a voice command. When it is desirable to collect more information about visual landmarks, the system can present visual landmarks in interrogative sentences, e.g., “do you see the billboard on the left?” As an example of an implicit signal, when drivers tend to miss a turn which the system describes using a visual landmark, the system may flag the visual landmark as not useful. The system can assess usefulness at different times and under different weather conditions, so that a certain billboard can be marked as not useful during daytime but useful when illuminated at night. Further, the system can receive signals indicative of current time, weather conditions, etc. from other sources, such as a weather service, and select landmarks suitable for the current environmental conditions. The system can use explicit and/or implicit user feedback to modify subsequent navigation directions even when no real-time video or still photography is available to a driver. For example, the system may be able to determine only that the driver is requesting navigation directions at nighttime, and accordingly provide indications of visual landmarks that have been determined to be visible, or particularly well noticeable, at night.
The system can use object and/or character recognition techniques to automatically recognize vehicles, billboards, text written on surfaces of various kinds, etc. Further, to identify currently visible landmarks within real-time imagery, the system can match features of a captured image with features of an image previously captured from the same location and with the same orientation of the camera (i.e., with the same camera pose) and known to depict a visual landmark. In some implementations, the system uses a convolutional neural network to implement an object detector which determines whether a captured scene includes an object of one of several predefined classes (e.g., car, person, traffic light). Further, the object detector can implement semantic segmentation to label every pixel in the image.
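As a rough illustration of the feature-matching idea, the following Python sketch counts how many feature descriptors of a pre-stored landmark image find a distinctive nearest neighbor among the descriptors of the captured scene. The disclosure does not name a matching algorithm; the nearest-neighbor ratio test, the function name `match_fraction`, the tuple-based descriptors, and the 0.75 threshold are all assumptions made for this example.

```python
import math


def _distance(a, b):
    """Euclidean distance between two equal-length descriptor tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_fraction(reference, scene, ratio=0.75):
    """Return the fraction of reference descriptors matched in the scene.

    A reference descriptor counts as matched when its nearest scene
    descriptor is clearly closer than the second nearest (ratio test).
    The ratio value is an illustrative assumption.
    """
    if not reference or len(scene) < 2:
        return 0.0
    matched = 0
    for descriptor in reference:
        distances = sorted(_distance(descriptor, s) for s in scene)
        if distances[0] < ratio * distances[1]:
            matched += 1
    return matched / len(reference)


# Hypothetical descriptors: the first scene contains the landmark,
# the second scene does not.
landmark_descriptors = [(0.0, 0.0), (1.0, 1.0)]
scene_with_landmark = [(0.0, 0.01), (1.0, 1.02), (5.0, 5.0)]
scene_without_landmark = [(5.0, 5.0), (6.0, 6.0)]
```

A caller could treat a landmark as probably visible when `match_fraction` exceeds some threshold; that threshold would need tuning on real imagery.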
Example Computing Environment
The mobile system 12 can include a portable electronic device such as a smartphone, a wearable device such as a smartwatch or a head-mounted display, or a tablet computer. In some implementations or scenarios, the mobile system 12 also includes components embedded or mounted in a vehicle. For example, a driver of a vehicle equipped with electronic components such as a head unit with a touchscreen or a built-in camera can use her smartphone for navigation. The smartphone can connect to the head unit via a short-range communication link such as Bluetooth® to access the sensors of the vehicle and/or to project the navigation directions onto the screen of the head unit. As another example, the user's smartphone can connect to a standalone dashboard camera mounted on the windshield of the vehicle. More generally, modules of a portable or wearable user device, modules of a vehicle, and external devices or modules of devices can operate as components of the mobile system 12.
These components can include a camera 20, which can be a standard monocular camera mounted on the dashboard or windshield. In some scenarios, the driver mounts the smartphone so that the camera of the smartphone faces the road similar to a dashboard camera. In other scenarios, the vehicle includes a camera or even multiple cameras built into the dashboard or the exterior of the vehicle, and the mobile system 12 accesses these cameras via a standard interface (e.g., USB). Depending on the implementation, the camera 20 is configured to collect a digital video stream or capture still photographs at certain intervals. Moreover, the mobile system 12 in some implementations uses multiple cameras to collect redundant imagery in real time. One camera may be mounted on the left side of the dashboard and another camera may be mounted on the right side of the dashboard to generate slightly different views of the surroundings, which in some cases may make it easier for the landmark selection system 18 to compare real-time imagery to previously captured images of landmarks.
The mobile system 12 also can include a processing module 22, which can include one or more central processing units (CPUs), one or more graphics processing units (GPUs) for efficiently rendering graphics content, an application-specific integrated circuit (ASIC), or any other suitable type of processing hardware. Further, the mobile system 12 can include a memory 24 made up of persistent (e.g., a hard disk, a flash drive) and/or non-persistent (e.g., RAM) components. In the example implementation illustrated in
Further, the mobile system 12 includes a user interface 28 and a network interface 30. Depending on the scenario, the user interface 28 can correspond to the user interface of the portable electronic device or the user interface of the vehicle. In either case, the user interface 28 can include one or more input components such as a touchscreen, a microphone, a keyboard, etc. as well as one or more output components such as a screen or speaker.
The network interface 30 can support short-range and/or long-range communications. For example, the network interface 30 can support cellular communications, wireless local area network protocols such as IEEE 802.11 (e.g., Wi-Fi), or personal area network protocols such as IEEE 802.15 (e.g., Bluetooth). In some implementations, the mobile system 12 includes multiple network interface modules to interconnect multiple devices within the mobile system 12 and to connect the mobile system 12 to the network 16. For example, the mobile system 12 can include a smartphone, the head unit of a vehicle, and a camera mounted on the windshield. The smartphone and the head unit can communicate using Bluetooth, the smartphone and the camera can communicate using USB, and the smartphone can communicate with the server 14 via the network 16 using a 4G cellular service, to pass information to and from various components of the mobile system 12.
Further, the network interface 30 in some cases can support geopositioning. For example, the network interface 30 can support Wi-Fi trilateration. In other cases, the mobile system 12 can include a dedicated positioning module 32 such as a Global Positioning System (GPS) module. In general, the mobile system 12 can include various additional components, including redundant components such as positioning modules implemented both in the vehicle and in the smartphone.
With continued reference to
In operation, the routing engine 40 can receive a request for navigation directions from the mobile system 12. The request can include a source, a destination, and constraints such as a request to avoid toll roads, for example. The routing engine 40 can retrieve road geometry data, road and intersection restrictions (e.g., one-way, no left turn), road type data (e.g., highway, local road), speed limit data, etc. from the map database 50 to generate a route from the source to the destination. In some implementations, the routing engine 40 also obtains live traffic data when selecting the best route. In addition to the best, or “primary,” route, the routing engine 40 can generate one or several alternate routes.
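The route generation described above can be sketched as a shortest-path search that honors a request constraint. The sketch below uses Dijkstra's algorithm over a small in-memory graph; the disclosure does not state which algorithm the routing engine 40 uses, so the algorithm choice, the function name `shortest_route`, the graph layout, and the travel times are all assumptions for illustration.

```python
import heapq


def shortest_route(graph, source, destination, avoid_tolls=False):
    """Dijkstra search over graph: node -> [(neighbor, minutes, is_toll)].

    Returns (total_minutes, [nodes...]) or None when no route exists.
    """
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, minutes, is_toll in graph.get(node, []):
            if avoid_tolls and is_toll:
                continue  # honor the "avoid toll roads" constraint
            if neighbor not in settled:
                heapq.heappush(
                    queue, (cost + minutes, neighbor, path + [neighbor])
                )
    return None


# Hypothetical road graph: the A-to-C segment is a toll road.
ROADS = {
    "A": [("B", 5.0, False), ("C", 2.0, True)],
    "B": [("D", 5.0, False)],
    "C": [("D", 2.0, False)],
    "D": [],
}

primary = shortest_route(ROADS, "A", "D")
toll_free = shortest_route(ROADS, "A", "D", avoid_tolls=True)
```

Running the search twice, with and without the constraint, also gives a simple way to produce a primary route and an alternate route.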
In addition to road data, the map database 50 can store descriptions of geometry and location indications for various natural geographic features such as rivers, mountains, and forests, as well as artificial geographic features such as buildings and parks. The map data can include, among other data, vector graphics data, raster image data, and text data. In an example implementation, the map database 50 organizes map data into map tiles, which generally correspond to a two-dimensional organization of geospatial data into a traversable data structure such as a quadtree.
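One common way to address tiles in a quadtree is a string key with one digit per zoom level, so that the key of a parent tile is a prefix of the keys of its children. The Python sketch below assumes a Web Mercator tiling and this "quadkey" digit convention; the disclosure does not specify the tiling scheme of the map database 50, and the function names are hypothetical.

```python
import math


def lat_lng_to_tile(lat_deg, lng_deg, zoom):
    """Map a coordinate to (x, y) tile indices in a Web Mercator grid."""
    n = 2 ** zoom
    x = int((lng_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that coordinates on the far edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x, y, zoom):
    """Interleave the bits of x and y into one base-4 digit per level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


child_key = tile_to_quadkey(3, 5, 3)
parent_key = tile_to_quadkey(3 >> 1, 5 >> 1, 2)
```

Because `child_key` starts with `parent_key`, walking from a tile to its ancestors or descendants reduces to trimming or extending the key.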
The navigation instructions generator 42 can use the one or more routes generated by the routing engine 40 and generate a sequence of navigation instructions. Examples of navigation instructions include “in 500 feet, turn right on Elm St.” and “continue straight for four miles.” The navigation instructions generator 42 can implement natural language generation techniques to construct these and similar phrases, in the language of the driver associated with the mobile system 12. The instructions can include text, audio, or both.
The visual landmark selection module 44 operates as part of the landmark selection system 18, which also includes the navigation application 26. The visual landmark selection module 44 can augment the navigation directions generated by the navigation instructions generator 42 with references to visual landmarks such as prominent buildings, billboards, traffic lights, stop signs, statues and monuments, and symbols representing businesses. To this end, the visual landmark selection module 44 initially can access the visual landmark database 52 to select a set of visual landmarks disposed along the navigation route. However, as discussed in more detail below, the landmark selection system 18 then can select a subset of these visual landmarks in accordance with the likelihood the driver can actually see the landmarks when driving, and/or dynamically identify visual landmarks that were not previously stored in the visual landmark database 52.
The visual landmark database 52 can store information regarding prominent geographic entities that can be visible when driving (or bicycling, walking, or otherwise moving along a navigation route) and thus serve as visual landmarks. For each visual landmark, the visual landmark database 52 can store one or several photographs, geographic coordinates, a textual description, remarks submitted by users, and numeric metrics indicative of usefulness of the visual landmark and/or of a particular image of the visual landmark. In some implementations, a landmark-specific record in the visual landmark database 52 stores multiple views of the visual landmark from the same vantage point, i.e., captured from the same location and with the same orientation of the camera. However, the multiple views of the visual landmark can differ according to the time of day, weather conditions, season, etc. The data record can include metadata that specifies these parameters for each image. For example, the data record may include a photograph of a billboard at night when it is illuminated along with a timestamp indicating when the photograph was captured and another photograph of the billboard at daytime from the same vantage point along with the corresponding timestamp. Further, the data record may include photographs of the billboard captured during snowy weather, during rainy weather, during foggy weather, etc., and corresponding indicators for each photograph. Still further, the data record may include photographs captured during different seasons.
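A landmark-specific record of this kind might be modeled as follows. This is only a sketch: the class names, field names, and file names are hypothetical and do not come from the disclosure, which describes the stored information but not a schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LandmarkImage:
    uri: str
    captured_at: str   # ISO 8601 timestamp of the capture
    time_of_day: str   # e.g. "day" or "night"
    weather: str       # e.g. "clear", "snow", "rain", "fog"
    season: str        # e.g. "summer", "winter"


@dataclass
class LandmarkRecord:
    landmark_id: str
    description: str
    latitude: float
    longitude: float
    images: List[LandmarkImage] = field(default_factory=list)

    def images_for(
        self,
        time_of_day: Optional[str] = None,
        weather: Optional[str] = None,
        season: Optional[str] = None,
    ) -> List[LandmarkImage]:
        """Return the stored views whose metadata match the given conditions."""
        def matches(image: LandmarkImage) -> bool:
            return (
                (time_of_day is None or image.time_of_day == time_of_day)
                and (weather is None or image.weather == weather)
                and (season is None or image.season == season)
            )
        return [image for image in self.images if matches(image)]


# Hypothetical record: one billboard photographed at night and by day.
billboard = LandmarkRecord(
    "lm-1", "the illuminated billboard", 41.88, -87.63,
    images=[
        LandmarkImage("billboard_night.jpg", "2016-01-10T21:00:00",
                      "night", "clear", "winter"),
        LandmarkImage("billboard_day.jpg", "2016-01-11T12:00:00",
                      "day", "snow", "winter"),
    ],
)
night_views = billboard.images_for(time_of_day="night")
```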
In short, the visual landmark database 52 can store a large set of visual landmarks that in some cases is redundant both in terms of the number of landmarks available for the same maneuver (e.g., a billboard on the right and a church on the left near the same intersection) and in terms of imagery available for the same landmark. The landmark selection system 18 can determine which of the redundant landmarks are useful for particular lighting conditions, weather conditions, traffic conditions (as drivers may find it difficult to recognize certain visual landmarks when driving fast), and how well the corresponding scene is visible from the driver's vantage point (as inferred from real-time imagery).
In addition to multiple images of a same visual landmark, the visual landmark database 52 can store multiple descriptions of the same landmark, such as “the large glass building,” “the building with a large ‘M’ in front of it,” “the building with international flags,” etc. Operators of the server system 14 and/or users submitting landmark information as part of a crowd-sourcing effort can submit these descriptions, and the server system 14 can determine which description drivers find more helpful using the feedback processing techniques discussed in more detail below. To keep track of drivers' feedback, the visual landmark database 52 in one example implementation stores an overall numeric metric for a visual landmark that can be used to assess whether the visual landmark should be referenced in navigation directions at all, separate numeric metrics for different times of day, different weather conditions, etc. and/or separate numeric metrics for different images.
To populate the visual landmark database 52, the server system 14 can receive satellite imagery, photographs and videos submitted by various users, street-level imagery collected by cars equipped with specialized panoramic cameras, street and sidewalk imagery collected by pedestrians and bicyclists, etc. Similarly, the visual landmark database 52 can receive descriptions of landmarks from various sources such as operators of the server system 14 and people submitting user-generated content.
With continued reference to
In operation, the camera 20 can capture a scene 60 as a still photograph or a frame in a video feed. The scene 60 approximately corresponds to what the driver of the vehicle operating in the mobile system 12 currently sees. Based on the captured scene 60, the landmark selection system 18 can determine that the driver can clearly see the landmark stadium depicted in a pre-stored image 70, but that the landmark building depicted in a pre-stored image 72 is largely obscured. The better visibility of the landmark stadium is at least one of the signals the landmark selection system 18 can use to determine whether to reference the landmark stadium, the landmark building, or both.
As indicated above, functionality of the landmark selection system 18 can be distributed between the mobile system 12 and the server system 14 in any suitable manner. In some implementations, for example, the processing capability of the mobile system 12 is insufficiently robust to implement image processing. The mobile system 12 accordingly can capture photographs and/or video and provide the captured imagery to the server system 14, where the visual landmark selection module executes a video processing pipeline. In other implementations, the mobile system 12 has sufficient processing capability to implement image matching. The server system 14 in this case can provide relevant visual landmark imagery such as the images 70 and 72 to the mobile system 12, and the navigation application 26 can compare the scene 60 to the images 70 and 72 to detect probable matches. In yet other implementations, the mobile system 12 implements a less constrained image processing pipeline and attempts to automatically recognize in the scene 60 objects of certain pre-defined types such as people, small cars, large cars, trucks, traffic lights, billboards, etc.
Next, example methods for generating navigation directions using real-time imagery and for adjusting visual landmark metrics are discussed with reference to
Example Methods for Providing Navigation Directions Using Real-Time Imagery
In an example scenario, a driver launches a navigation application on her smartphone and requests driving directions to her friends' home. She connects her smartphone to the camera mounted on the windshield of her car and starts driving. As she drives through a busy part of town and approaches the intersection where she must turn left, three objects potentially could serve as visual landmarks: a fast-food restaurant with an easily recognizable logo on the right, a bus stop shelter on the left, and a distinctive building on the left just past the intersection. The scene as captured by the driver's camera indicates that while the bus stop shelter is visible, the fast-food restaurant and the distinctive building are obscured by trees. The navigation application accordingly generates the audio message “turn left at the bus stop you will see on your left” when the driver is approximately 200 feet away from the intersection.
The method 100 begins at block 102, where a route for driving to a certain destination from the current location of the user or from some other location is obtained. At block 104, indications of landmarks corresponding to prominent physical objects disposed along the route are retrieved. Each indication can include the coordinates of the corresponding visual landmark and the corresponding pre-stored imagery (e.g., photographs or a video sequence of a short fixed duration). Depending on the implementation, visual landmarks can be retrieved for the entire route or for a portion of the route, e.g., for the current location of the user. In a sense, these visual landmarks are only candidate visual landmarks for the current navigation session, and it can be determined that some or all of these visual landmarks are not visible (or, as discussed above, some currently visible visual landmarks may not be selected when better candidates are available).
At block 106, real-time imagery is collected at the vehicle approximately from the vantage point of the driver. The real-time imagery can be one or several still photographs defining a scene. For some image processing techniques, feature comparison or recognition is more reliable when a video stream rather than a single photograph is available, and thus the real-time imagery defining the scene also can be a video feed of a certain duration (e.g., 0.5 sec).
The real-time imagery of the scene then is processed at block 108. To this end, the collected real-time imagery can be uploaded to a network server. Alternatively, the real-time imagery can be processed at a mobile system such as the user's smartphone or the head unit of the vehicle. For example, the mobile system 12 can receive a representative image of a visual landmark and locally process the real-time imagery using the processing module 22 to determine whether this candidate visual landmark is visible in the real-time imagery. As yet another alternative, processing of the real-time imagery can be distributed between the mobile system and the server system. The processing at block 108 can include comparing the captured scene to the pre-stored imagery of the landmarks obtained at block 104. The processing can produce an indication of which of the visual landmarks identified at block 104 can be identified in the captured scene, and thus probably are visible to the driver.
At block 110, navigation directions referencing the one or more visible visual landmarks are provided to the driver, whereas the visual landmarks identified at block 104 but not located within the scene captured at block 106 are omitted. The instructions can include text to be displayed on the driver's smartphone or projected via the head unit and/or audio announcements, for example. Additionally, a pre-stored image of a visual landmark referenced in the directions can be downloaded from the visual landmark database 52 to the mobile system 12 and displayed in the projected mode on the head unit of the vehicle, so that the user can glance at the display and see to which visual landmark the directions refer.
The method 100 completes after block 110. Thus, in a sense, the system implementing the method 100 uses real-time imagery as a filter applied to the redundant set of visual landmarks. Of course, if more than the necessary number of visual landmarks (typically one) are determined to be visible for a single maneuver, the visual landmarks can be further filtered based on other signals. Some of these signals, including the signals based on user feedback, are discussed below.
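The filtering role of real-time imagery in the method 100 can be sketched in a few lines of Python, using the bus stop scenario described earlier. The sketch assumes that some upstream image-processing step has already produced a set of labels for the objects recognized in the captured scene; the function names, dictionary keys, and usefulness scores are hypothetical.

```python
def select_visible_landmarks(candidates, detected_labels, max_landmarks=1):
    """Keep candidates found in the scene, best usefulness score first."""
    visible = [c for c in candidates if c["label"] in detected_labels]
    visible.sort(key=lambda c: c["usefulness"], reverse=True)
    return visible[:max_landmarks]


def augment_instruction(maneuver, landmark):
    """Attach a landmark reference to a maneuver, if one was selected."""
    if landmark is None:
        return maneuver
    return (
        f"{maneuver} at the {landmark['description']} "
        f"you will see on your {landmark['side']}"
    )


# Candidate landmarks retrieved for the maneuver (block 104).
candidates = [
    {"label": "restaurant", "description": "fast-food restaurant",
     "side": "right", "usefulness": 0.9},
    {"label": "bus_stop", "description": "bus stop",
     "side": "left", "usefulness": 0.6},
    {"label": "building", "description": "distinctive building",
     "side": "left", "usefulness": 0.8},
]
# Labels recognized in the captured scene (blocks 106 and 108);
# the restaurant and the building are hidden by trees.
detected = {"bus_stop"}

selected = select_visible_landmarks(candidates, detected)
instruction = augment_instruction("turn left", selected[0] if selected else None)
```

The `usefulness` sort stands in for the further filtering mentioned above when more than one landmark is visible for a single maneuver.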
Example Methods for Collecting and Utilizing Driver Feedback
Referring back to
Now referring to
The method 150 begins at block 152. Here, the landmark selection system 18 can select a visual landmark for a certain location and maneuver, during navigation. Next, the landmark selection system 18 can provide an indication of the visual landmark to the driver at block 154, and provide a prompt regarding this visual landmark at block 156 so as to assess the quality of the suggestion. For example, the indication can be “after you pass the statue of a bull, turn right on Financial Pl.” To obtain explicit user feedback after the user completes the maneuver by turning right, the follow-up yes/no prompt at block 156 can be “did you see the statue of a bull?” In some implementations, the landmark selection system 18 does not generate a follow-up prompt every time the visual landmark is referenced but rather at a certain relatively low rate, such as once per hundred references to the visual landmark. Additionally or alternatively, the landmark selection system 18 can collect implicit user feedback by determining whether the user successfully completed the maneuver or missed the turn. Thus, if the indication above is provided to one hundred drivers over a certain period of time, and only 85% of the drivers turn right on Financial Pl. (while the overall success rate for maneuvers specified in the navigation directions and augmented by references to visual landmarks is 99%, for example), it is probable that the statue of a bull is not a good visual landmark. The landmark selection system 18 can utilize any suitable statistical technique to assess the probability of recognizing visual landmarks.
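One candidate for such a statistical technique is a one-sided binomial test: how likely is it to observe 85 or fewer successful maneuvers out of 100 if the landmark were as reliable as the 99% baseline? The sketch below computes that tail probability exactly. The choice of test and the function name are assumptions for illustration, not something the disclosure prescribes.

```python
from math import comb


def binomial_lower_tail(successes, trials, baseline_rate):
    """P(X <= successes) for X ~ Binomial(trials, baseline_rate)."""
    return sum(
        comb(trials, k)
        * baseline_rate ** k
        * (1.0 - baseline_rate) ** (trials - k)
        for k in range(successes + 1)
    )


# 85 of 100 drivers completed the turn; the baseline success rate is 99%.
p_value = binomial_lower_tail(85, 100, 0.99)
```

A very small `p_value` suggests the shortfall is unlikely to be chance, which supports flagging the landmark as not useful.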
Further, because some users may dislike any follow-up prompts, the landmark selection system 18 can format the reference to the visual landmark at block 154 as a question. Thus, for example, the navigation application can generate the question “do you see the statue of a bull on your right?” If the driver answers in the affirmative, the landmark selection system 18 can immediately provide the complete instruction “after you pass the statue of a bull, turn right on Financial Pl.” Otherwise, the landmark selection system 18 can select the next visual landmark, when available, and generate the next question.
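The question-first flow can be sketched as a loop over the available landmarks. In this sketch, `ask` is a hypothetical callback that poses a yes/no question to the driver (for example, through speech recognition) and returns a boolean; the function name and the fallback to a plain instruction are assumptions.

```python
def guide_with_questions(landmarks, maneuver, ask):
    """Ask about each landmark in turn; use the first one the driver sees."""
    for landmark in landmarks:
        if ask(f"do you see {landmark}?"):
            return f"after you pass {landmark}, {maneuver}"
    # No landmark was confirmed: fall back to the plain maneuver.
    return maneuver


# Simulated driver who can see the billboard but not the statue.
def simulated_driver(question):
    return "billboard" in question


instruction = guide_with_questions(
    ["the statue of a bull on your right", "the billboard on your left"],
    "turn right on Financial Pl.",
    simulated_driver,
)
```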
If it is determined at block 158 that the user can see the visual landmark, the flow proceeds to block 160. Otherwise, the flow proceeds to block 162. At block 160, the landmark selection system 18 can adjust the numeric metric for the visual landmark upward to indicate an instance of success. On the other hand, at block 162 the landmark selection system 18 can adjust the numeric metric for the visual landmark downward to indicate an instance of failure. Further, depending on the implementation, the landmark selection system 18 can adjust the metric for a particular time of day, particular weather, particular season, particular lighting conditions, etc.
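The upward and downward adjustments at blocks 160 and 162 can be sketched as success and trial counters kept both overall and per condition. The class name, the use of a smoothed success ratio as the numeric metric, and the fallback to the overall metric for conditions without data are assumptions made for this example.

```python
from collections import defaultdict


class LandmarkMetrics:
    """Tracks [successes, trials] per (landmark_id, condition) key."""

    def __init__(self):
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, landmark_id, seen, condition=None):
        """Register one instance of success (seen) or failure (not seen)."""
        keys = [(landmark_id, None)]           # overall metric
        if condition is not None:
            keys.append((landmark_id, condition))  # condition-specific metric
        for key in keys:
            self._counts[key][0] += 1 if seen else 0
            self._counts[key][1] += 1

    def score(self, landmark_id, condition=None):
        """Smoothed success ratio; 0.5 when nothing is known yet."""
        successes, trials = self._counts.get((landmark_id, condition), (0, 0))
        if trials == 0 and condition is not None:
            # No data for this condition: fall back to the overall metric.
            successes, trials = self._counts.get((landmark_id, None), (0, 0))
        return (successes + 1) / (trials + 2)


metrics = LandmarkMetrics()
for _ in range(3):
    metrics.record("billboard", seen=True, condition="night")
    metrics.record("billboard", seen=False, condition="day")
```

With these counts the billboard scores higher at night than during the day, matching the illuminated billboard example discussed earlier.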
At block 164, the landmark selection system 18 can also adjust the probability of selecting other landmarks that belong to the same type (or images of landmarks of a certain type). For example, if it is determined at block 158 that the driver found a certain billboard to be a useful landmark, the probability of preferring billboards to other types of landmarks can increase. After block 164, the flow proceeds to block 166, where the next maneuver is selected. The flow then returns to block 152, where a set of visual landmarks is selected for the new maneuver and the location of the driver.
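A simple way to express the type-level adjustment at block 164 is a weight per landmark type that is scaled up or down after each feedback event and then normalized into selection probabilities. The multiplicative update, the step size of 0.1, and the function names are assumptions for illustration only.

```python
def update_type_weights(weights, landmark_type, useful, step=0.1):
    """Return a copy of weights with one type scaled up or down."""
    updated = dict(weights)
    factor = 1.0 + step if useful else 1.0 - step
    updated[landmark_type] = updated.get(landmark_type, 1.0) * factor
    return updated


def selection_probabilities(weights):
    """Normalize the weights so that they sum to one."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


weights = {"billboard": 1.0, "building": 1.0}
# The driver found a billboard useful at block 158.
weights = update_type_weights(weights, "billboard", useful=True)
probabilities = selection_probabilities(weights)
```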
Thus, when a redundant set of visual landmarks is available, the landmark selection system 18 can utilize explicit and/or implicit driver feedback to determine which visual landmarks are more likely to be useful for the remainder of the navigation session, and which visual landmarks are likely to be useful to other drivers in the future. The overall accuracy of assessing usefulness of visual landmarks is expected to increase when the method 150 is executed for a large number of navigation sessions, and for a large number of drivers.
In some cases, the method 150 can be extended to other types of navigation directions or geographic suggestions. For example, a navigation system can use the method 150 to determine whether a certain reference to a street name is a reliable reference in navigation directions. Because street signs may be missing or poorly lit, and because some street and road information may be out of date, the navigation system can format certain directions as questions (e.g., “Do you see Elm St. 300 feet ahead?”), receive explicit feedback when the user chooses to comment on the previously provided directions (e.g., “In 300 feet, turn right on Elm St.”—“I cannot see Elm St.”), and/or collect implicit feedback (e.g., missed turn, sudden deceleration prior to the turn).
Further, in a generally similar manner, the devices illustrated in
Example Image Processing Techniques
In some implementations, the landmark selection system 18 compares the captured real-time imagery to pre-stored images to detect a match or absence of a match. As a more specific example, the visual landmark database 52 of
As the camera 20 captures the scene 60, a positioning module operating in the mobile system 12 determines the location from which the scene 60 was captured. The landmark selection system 18 then can retrieve those images of the landmarks depicted in the images 70 and 72 that match the pose of the camera 20 at the time of capture. Thus, the visual landmark database 52 can store numerous photographs of the stadium depicted in
In another implementation, the landmark selection system 18 implements less constrained image processing.
In the example scenario of
The landmark selection system 18 also can process color characteristics of the identified objects. Thus, the instruction above can become “turn where the red sports car is now turning,” which may be more helpful to the driver. Further, the landmark selection system 18 can be configured to recognize alphanumeric characters and generate such instructions as “keep going past the sign that says ‘car wash,’” when the camera captures an image of a person holding up a temporary car wash sign.
In some implementations, the landmark selection system 18 labels every pixel in the scene 60 in accordance with semantic segmentation techniques. For the example scene 60, semantic segmentation can produce an indication of where the sidewalk, the road, and the trees are located. A more robust image processing pipeline generally is required to conduct semantic segmentation, but using semantic segmentation the landmark selection system 18 can identify additional landmarks and/or generate better explanations of where visual landmarks are located. For example, the navigation instruction “turn right after you see a large yellow billboard” can be improved to “turn right after you see a large yellow billboard on the sidewalk.”
Dynamically Identifying Visual Landmarks
Referring back to
At block 302, the landmark selection system 18 can determine a route for guiding a driver to a destination. The route can include a graph traversing several road segments, and the corresponding navigation directions can include a sequence of descriptions of maneuvers. In some implementations, the navigation directions can be generated at the server system 14 and provided to the mobile system 12 in relevant portions.
Next, at block 304, the landmark selection system 18 can receive real-time imagery for a scene, collected at a certain location of the vehicle. Typically but not necessarily, the real-time imagery is collected when the vehicle approaches the location of the next maneuver. The camera pose for the captured imagery approximately corresponds to the vantage point of the driver. When geo-positioning is available, the real-time imagery can be geographically tagged, i.e., include an indication of the location where the real-time imagery was captured.
At block 306, the landmark selection system 18 can identify objects of certain pre-defined types within the captured scene. As discussed above, this identification can be based on training data and can include semantic image segmentation. In some cases, the identification is based on the presence of letters, numbers, and other alphanumeric characters. To this end, the landmark selection system 18 can implement any suitable character recognition technique. Moreover, the landmark selection system 18 may implement both object identification and character recognition to identify objects of pre-defined types with alphanumeric characters.
At block 308, the landmark selection system 18 can determine which of the detected objects appear prominently within the scene. Referring back to
At block 310, the landmark selection system 18 can determine the positions of the one or more prominent objects relative to the current location of the vehicle and/or to the locations of road intersections and other geographic waypoints, in a two- or three-dimensional coordinate system. Where relevant, the landmark selection system 18 also can determine the orientation of the prominent object. Referring back to
At block 312, the landmark selection system 18 can include in the navigation directions a reference to the one or more prominent objects identified at block 306. As discussed above, the landmark selection system 18 can generate such instructions as “turn left on Main St., where the red sports car is turning” or “turn right on Central St. after the blue billboard.” The instructions can include any suitable combination of text and multimedia.
Modifying Navigation Route using Live States of Traffic Lights
In the scenario schematically illustrated in
Prior to the car 400 reaching the intersection 402, the routing engine 40 (see
The method 450 begins at block 452, where two or more routing options for reaching a certain intermediate point along the route or the endpoint of the route, from a certain location controlled by a traffic light, are identified. At block 454, the current state of the traffic light is determined using real-time imagery captured at the vehicle approaching the location. If the traffic light is determined to be displaying the green arrow, the flow proceeds to block 460, where the first routing option is selected. Otherwise, if the traffic light is determined to not be displaying the green arrow, the flow proceeds to block 462, and the second routing option is selected. The corresponding navigation instruction then is provided to the user at block 464.
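The decision at blocks 454 through 464 reduces to a two-way branch on the detected state of the traffic light. The sketch below assumes the state has already been extracted from the real-time imagery; the `LightState` and `RoutingOption` names and the example instruction strings are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class LightState(Enum):
    GREEN_ARROW = "green_arrow"
    OTHER = "other"  # any state in which the green arrow is not displayed


@dataclass(frozen=True)
class RoutingOption:
    name: str
    instruction: str


def select_routing_option(
    light_state: LightState,
    first_option: RoutingOption,
    second_option: RoutingOption,
) -> RoutingOption:
    """Blocks 454-462: pick the first option only on a green arrow."""
    if light_state is LightState.GREEN_ARROW:
        return first_option
    return second_option


if __name__ == "__main__":
    # Hypothetical routing options identified at block 452.
    first = RoutingOption("first", "take the turn indicated by the green arrow")
    second = RoutingOption("second", "continue through the intersection")
    chosen = select_routing_option(LightState.OTHER, first, second)
    # Block 464: provide the corresponding instruction to the user.
    print(chosen.instruction)
```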
Using Real Time Imagery for Lane Guidance and Improving Positioning
In some implementations, the components of the landmark selection system 18 can use real-time imagery to improve lane guidance. In general, positioning solutions such as GPS or Wi-Fi triangulation cannot yield a position fix precise enough to determine in which lane the vehicle is currently located. Using the techniques discussed above and/or other suitable techniques, the landmark selection system 18 can recognize lane markings (e.g., white and yellow divider strips), arrows and highway signs painted on the road, the dimensionality of lanes based on detected boundaries of the sidewalk, presence of other vehicles from which the existence of other lanes can be inferred, etc.
For example, the camera 20 of
Using lane recognition, the navigation application 26 can provide lane-specific guidance. For example, the navigation application 26 can guide the driver to avoid left-turn-only or right-turn-only lanes when the vehicle needs to travel straight, generate more relevant warnings regarding merging left or right, warn the driver when he or she is in a lane that is about to end, etc.
In some implementations, the navigation application 26 and/or the navigation instructions generator 42 can also use lane data available in the map database 50. For example, the navigation application 26 can receive an indication that the vehicle is currently traveling in a three-lane road segment, based on the most recent GPS or Wi-Fi positioning fix. Using this information along with real-time imagery, the navigation application 26 can determine in which lane the vehicle is travelling and generate appropriate instructions when necessary.
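One way the map lane count and the real-time imagery might be combined is sketched below. This is an assumption-laden illustration, not the disclosed method: it supposes that an upstream image-processing step reports how many lane boundaries (divider strips or road edges) were detected to the left and to the right of the vehicle, and the function name and parameters are hypothetical.

```python
from typing import Optional


def estimate_lane_index(
    map_lane_count: int, boundaries_left: int, boundaries_right: int
) -> Optional[int]:
    """Return the 1-based lane index counted from the left, or None.

    A road segment with N lanes has N + 1 lane boundaries. If the
    boundaries detected in the imagery are consistent with the lane
    count from the map data, the number of boundaries to the left of
    the vehicle gives the lane index. Otherwise the detection is
    treated as unreliable and None is returned.
    """
    if boundaries_left < 1 or boundaries_right < 1:
        return None
    if boundaries_left + boundaries_right != map_lane_count + 1:
        return None
    return boundaries_left


if __name__ == "__main__":
    # Three-lane segment per the map data; two boundaries seen on each side.
    print(estimate_lane_index(3, boundaries_left=2, boundaries_right=2))
```

Returning None on inconsistent input lets the caller fall back to instructions that are not lane-specific rather than guess.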
Generating Warnings about Potential Traffic Violations using Real Time Imagery
Further, the navigation application 26 can use the imagery captured by the camera 20 to automatically generate warnings regarding potential traffic violations. For example, drivers have been observed making an illegal right-on-red turn onto Shoreline Blvd. from US 101 North in Mountain View, Calif. It is believed that many drivers simply do not notice the “no right on red” sign. While the map database 50 can store an indication that the right turn on red is not allowed at this road junction, preemptively generating a warning whenever the driver is about to turn onto Shoreline Blvd. can be distracting and unnecessary, as the driver may be turning right on green.
Accordingly, the landmark selection system 18 can process the state of the traffic light as discussed above when the driver enters the ramp. When the state of the traffic light is determined to be red, and when the driver appears to start moving based on the positioning data or vehicle sensor data, the landmark selection system 18 can automatically provide an instruction “no right on red here!,” for example. To determine whether such an instruction should be provided, the landmark selection system 18 also can consider statistical indicators for the road junction, when available. For example, an operator can manually provision the server system 14 with an indication that this particular Shoreline Blvd. exit is associated with frequent traffic violations. These indications also can be user-generated.
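The conditions for issuing the warning can be summarized in a small predicate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the inputs are presumed to come from the image processing, positioning or vehicle sensor data, and map data described above, and treating the statistical indicator as an optional gate is one possible policy among several.

```python
from typing import Optional


def should_warn_no_right_on_red(
    light_is_red: bool,
    vehicle_starting_to_move: bool,
    right_on_red_prohibited: bool,
    junction_violation_prone: Optional[bool] = None,
) -> bool:
    """Decide whether to issue the "no right on red here!" warning.

    junction_violation_prone is the optional statistical indicator for
    the road junction; None means no indicator is available, in which
    case it does not affect the decision.
    """
    if not (light_is_red and vehicle_starting_to_move and right_on_red_prohibited):
        return False
    if junction_violation_prone is None:
        return True
    return junction_violation_prone


if __name__ == "__main__":
    if should_warn_no_right_on_red(True, True, True, junction_violation_prone=True):
        print("no right on red here!")
```

Because the warning is gated on the light actually being red and the vehicle starting to move, a driver turning right on green receives no warning, which addresses the distraction concern raised above.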
In some embodiments, the landmark selection system 18 also can process and interpret the “no right on red” sign prior to generating the warning. In particular, the map database 50 may not have specific turn restriction data for a certain residential area.
The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
Similarly, the methods or routines described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations.
The one or more processors may also operate to support performance of the relevant operations in a cloud computing environment or as a software as a service (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
Upon reading this disclosure, those of ordinary skill in the art will appreciate still additional alternative structural and functional designs for the systems for using real-time imagery and/or driver feedback in navigation. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
1. A method for generating navigation directions for drivers, the method comprising:
- retrieving, by one or more processors, a video feed from a camera operating in a vehicle to obtain real-time imagery of an intersection controlled by a traffic light; determining, by the one or more processors, a current state of the traffic light;
- identifying, by one or more processors, at least two routing options for guiding the vehicle from the intersection to a destination;
- selecting one of the first routing option or the second routing option based on the determined state of the traffic light; and
- providing navigation instructions corresponding to the selected routing option via a user interface.