Patents by Inventor David Lomet
David Lomet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9003162Abstract: A request to modify an object in storage that is associated with one or more computing devices may be obtained, the storage organized based on a latch-free B-tree structure. A storage address of the object may be determined, based on accessing a mapping table that includes map indicators mapping logical object identifiers to physical storage addresses. A prepending of a first delta record to a prior object state of the object may be initiated, the first delta record indicating an object modification associated with the obtained request. Installation of a first state change associated with the object modification may be initiated via a first atomic operation on a mapping table entry that indicates the prior object state of the object. For example, the latch-free B-tree structure may include a B-tree like index structure over records as the objects, and logical page identifiers as the logical object identifiers.Type: GrantFiled: June 20, 2012Date of Patent: April 7, 2015Assignee: Microsoft Technology Licensing, LLCInventors: David Lomet, Justin Levandoski, Sudipta Sengupta
-
Patent number: 8930321Abstract: This patent application relates to enhanced logical recovery techniques for redo recovery operations of a system with an unbundled storage engine. These techniques can be implemented by utilizing an enhanced logical recovery approach in which a dirty page table (DPT) is constructed based on information logged during normal execution. The unbundled storage engine can include a transaction component (TC) that is architecturally independent of a data component (DC). These techniques can enhance redo recovery operations by mitigating the resources needed to determine whether previously executed operations sent from the TC to the DC are to be repeated in response to a recovery-initiating event. This can include using the DPT to avoid fetching every data page corresponding to every previously executed operation received by the DC during recovery and/or pre-fetching data pages and/or index pages that correspond to PIDs in the DPT.Type: GrantFiled: June 30, 2010Date of Patent: January 6, 2015Assignee: Microsoft CorporationInventors: David Lomet, Kostas Tzoumas
-
Patent number: 8868514Abstract: A distributed system with transaction support may have a transaction component and one or more data components. The transaction component may manage a transaction using a log sequence number for each operation, and then transmit operations to one or more data components with log sequence numbers. The data components may perform the data operations in an idempotent manner and return a reply. The transaction component may then write the operation, its log sequence number, and information from the reply message to its log. The transaction component is able to commit a transaction, as well as retry or undo portions of a transaction, by using the information stored on its log. This may be possible even when a single transaction uses multiple data components, which may be located on different devices or manage separate and independent data sources.Type: GrantFiled: January 7, 2011Date of Patent: October 21, 2014Assignee: Microsoft CorporationInventors: David Lomet, Mohamed Mokbel, Justin Levandoski, Keliang Zhao
-
Publication number: 20130346725Abstract: A request to modify an object in storage that is associated with one or more computing devices may be obtained, the storage organized based on a latch-free B-tree structure. A storage address of the object may be determined, based on accessing a mapping table that includes map indicators mapping logical object identifiers to physical storage addresses. A prepending of a first delta record to a prior object state of the object may be initiated, the first delta record indicating an object modification associated with the obtained request. Installation of a first state change associated with the object modification may be initiated via a first atomic operation on a mapping table entry that indicates the prior object state of the object. For example, the latch-free B-tree structure may include a B-tree like index structure over records as the objects, and logical page identifiers as the logical object identifiers.Type: ApplicationFiled: June 20, 2012Publication date: December 26, 2013Applicant: MICROSOFT CORPORATIONInventors: David Lomet, Justin Levandoski, Sudipta Sengupta
-
Publication number: 20120179645Abstract: A distributed system with transaction support may have a transaction component and one or more data components. The transaction component may manage a transaction using a log sequence number for each operation, and then transmit operations to one or more data components with log sequence numbers. The data components may perform the data operations in an idempotent manner and return a reply. The transaction component may then write the operation, its log sequence number, and information from the reply message to its log. The transaction component is able to commit a transaction, as well as retry or undo portions of a transaction, by using the information stored on its log. This may be possible even when a single transaction uses multiple data components, which may be located on different devices or manage separate and independent data sources.Type: ApplicationFiled: January 7, 2011Publication date: July 12, 2012Applicant: MICROSOFT CORPORATIONInventors: David Lomet, Mohamed Mokbel, Justin Levandoski, Keliang Zhao
-
Publication number: 20120005168Abstract: This patent application relates to enhanced logical recovery techniques for redo recovery operations of a system with an unbundled storage engine. These techniques can be implemented by utilizing an enhanced logical recovery approach in which a dirty page table (DPT) is constructed based on information logged during normal execution. The unbundled storage engine can include a transaction component (TC) that is architecturally independent of a data component (DC). These techniques can enhance redo recovery operations by mitigating the resources needed to determine whether previously executed operations sent from the TC to the DC are to be repeated in response to a recovery-initiating event.Type: ApplicationFiled: June 30, 2010Publication date: January 5, 2012Applicant: MICROSOFT CORPORATIONInventors: David Lomet, Kostas Tzoumas
-
Patent number: 7668845Abstract: An abstract indexing structure called a C-tree that it is an access method that exploits search space “containment” is described. The C-tree structure includes objects spaces that overlap, but search spaces that are disjoint. Every part of the search space is indexed by some node. Moreover, every object has a unique location in the index, despite the object space overlap. The C-tree is a tree of pages, typically disk based, like a B-tree, but it handles a greater variety of data, e.g., spatial data, temporal data, etc. In particular, it can handle the indexing of objects that have extents, not merely point data, and hence objects that can overlap. These objects can be indexed without remapping them to a higher dimensional point space. At least one particular aspect of the invention is premised on a notion of an object being contained in some kind of search space.Type: GrantFiled: July 15, 2004Date of Patent: February 23, 2010Assignee: Microsoft CorporationInventors: David Lomet, Betty Joan Salzberg
-
Patent number: 7668961Abstract: The present invention leverages a unilaterally-based interaction contract that enables applications to have a persistent state and ensures an exactly-once execution despite failures between and by communicating entities, permitting disparate software applications to be robustly supported in an environment where little is known about the implementation of the interaction contract. In one instance of the present invention, a web services interaction contract provides a communicating application with duplicate commit request elimination, persistent state transitions, and/or unique persistent reply requests. The present invention permits this interaction contract to be supported by, for example, a persistent application, a workflow, a transaction queue, a database, and a file system to facilitate in providing idempotent executions for requests from a communicating application.Type: GrantFiled: September 23, 2004Date of Patent: February 23, 2010Assignee: Microsoft CorporationInventor: David Lomet
-
Patent number: 7424499Abstract: Lazy timestamping in a transaction time database is performed using volatile reference counting and checkpointing. Volatile reference counting is employed to provide a low cost way of garbage collecting persistent timestamp information about a transaction by identifying exactly when all record versions of a transaction are timestamped and the versions are persistent. A volatile timestamp (VTS) table is created in a volatile memory, and stores timestamp, reference count, transaction ID, and LSN information. Active portions of a persisted timestamp (PTS) table are stored in the VTS table to provide faster and more efficient timestamp processing via accesses to the VTS table information. The reference count information is stored only in the VTS table for faster access. When the reference count information decrements to zero, it is known that all record versions that were updates for a transaction were timestamped.Type: GrantFiled: January 21, 2005Date of Patent: September 9, 2008Assignee: Microsoft CorporationInventor: David Lomet
-
Publication number: 20080071809Abstract: A data structure, added to a modified form of the Blink-tree data structure, tracks delete states for nodes. The index delete state (DX) indicates whether it is safe to directly access an index node without re-traversing the B-tree. The DX state is maintained globally, outside of the tree structure. The data delete state (DD) indicates whether it is safe to post an index term for a new leaf node. A DD state is maintained in each level 1 node for its leaf nodes. Delete states indicate whether a specific node has not been deleted, or whether it may have been deleted. Delete states are used to remove the necessity for atomic node splits and chains of latches for deletes, while not requiring retraversal. This property of not requiring a retraversal is exploited to simplify the tree modification operations.Type: ApplicationFiled: September 21, 2007Publication date: March 20, 2008Applicant: Microsoft CorporationInventor: David Lomet
-
Publication number: 20070226705Abstract: Architecture that facilitates exactly-once application execution via a wrap-up procedure. A logless component (LLcom) processes a last-read activity (or wrap-up read) against external state when processing a method of a middle-tier environment. A client logging component logs results of the method to a client log. The wrap-up procedure of the LLcom is initiated to read external state. External state is read after all GIR requests have been processed but just before the method returns to the client. The LLcom method returns to the client at a point in the caller method that immediately issues the return for the method call. Subsequent client requests become replayable due to the logged results. With client logging, both idempotence of requests and guiding execution back to its original execution path is accomplished. Alternatively, logging can occur via a middle-tier decision service.Type: ApplicationFiled: February 15, 2006Publication date: September 27, 2007Applicant: Microsoft CorporationInventor: David Lomet
-
Publication number: 20070192373Abstract: Recovery processing of logless components is disclosed. Logless components in middle-tier systems can be checkpointed to provide faster recovery. In particular, a client system, executing a persistent component and itself logging, initiates a snapshot method that returns to the client the values of all variables and other state of the logless component during normal execution. The client writes this data to the client log along with information about the initiation call. To recover the logless component, the client invokes a restore method which takes as an argument values returned from the snapshot method and included in the checkpointing portion of the client log relating to the logless component. This information is sufficient for recreating the logless component which is logically identical to the failed logless component and for setting its state to the checkpoint state. This can occur transparently and shorten the recovery time in providing exactly-once execution.Type: ApplicationFiled: February 15, 2006Publication date: August 16, 2007Applicant: Microsoft CorporationInventor: David Lomet
-
Publication number: 20070067266Abstract: A system and methodology that facilitate persistence for an execution state is provided. The system and methodology employ generalized “idempotent” request(s) that have the property they only execute a request once, and always return the result of that first execution should the request be repeated so as to ensure exactly once execution. A calling middle tier component can exploit these procedures so that it can engage in exploratory reads (which are not idempotent) yet still be able to have their state recovered via replay based on the log at the client and the results retained by the generalized idempotent procedures provided by back end services. The system and methodology can be employed to facilitate successful replay of logless persistent component(s), (e.g., components that do not themselves log any information).Type: ApplicationFiled: September 21, 2005Publication date: March 22, 2007Applicant: Microsoft CorporationInventor: David Lomet
-
Publication number: 20070016617Abstract: Systems and methods that create persistence for an execution state via employing a logless component with persistent stateful functionality. The logless component is introduced as part of a runtime service that supplies transparent state persistence and automatic recovery for component based applications. Such logless component can avoid logging at a middle tier, and exploit logging that is already performed at a client side and/or various end point servers. The execution state can be re-created entirely via replay of the component execution, without the need to replicate the execution state or save the component's interactions in the middle tier.Type: ApplicationFiled: July 12, 2005Publication date: January 18, 2007Applicant: Microsoft CorporationInventor: David Lomet
-
Publication number: 20060259518Abstract: The subject invention pertains to data store corruption recovery. More specifically, the invention concerns systems and methods for identifying corrupt data in a manner that prevents de-committing or removal of valid or consistent transactions from a database. This can be accomplished at least in part by logging the identities of data items that a transaction reads. Furthermore, the subject invention provides for employment of a multi-version (or transaction-time) database to reduce significantly reduce any down time or database unavailability caused by a corrupt transaction and associated corrupt data items. Accordingly, no backups need to be installed and only updates by the original corrupt transaction and transactions that read corrupt data need to be de-committed or removed.Type: ApplicationFiled: May 10, 2005Publication date: November 16, 2006Applicant: Microsoft CorporationInventors: David Lomet, Roger Barga
-
Publication number: 20060248386Abstract: Persistent components are provided across both process and server failures, without the application programmer needing take actions for component recoverability. Application interactions with a stateful component are transparently intercepted and stably logged to persistent storage. A “virtual” component isolates an application from component failures, permitting the mapping of a component to an arbitrary “physical” component. Component failures are detected and masked from the application. A virtual component is re-mapped to a new physical component, and the operations required to recreate a component and reinstall state up to the point of the last logged interaction is replayed from the log automatically.Type: ApplicationFiled: April 7, 2006Publication date: November 2, 2006Applicant: Microsoft CorporationInventors: Roger Barga, David Lomet
-
Publication number: 20060167960Abstract: Lazy timestamping in a transaction time database is performed using volatile reference counting and checkpointing. Volatile reference counting is employed to provide a low cost way of garbage collecting persistent timestamp information about a transaction by identifying exactly when all record versions of a transaction are timestamped and the versions are persistent. A volatile timestamp (VTS) table is created in a volatile memory, and stores timestamp, reference count, transaction ID, and LSN information. Active portions of a persisted timestamp (PTS) table are stored in the VTS table to provide faster and more efficient timestamp processing via accesses to the VTS table information. The reference count information is stored only in the VTS table for faster access. When the reference count information decrements to zero, it is known that all record versions that were updates for a transaction were timestamped.Type: ApplicationFiled: January 21, 2005Publication date: July 27, 2006Applicant: Microsoft CorporationInventor: David Lomet
-
Publication number: 20060090095Abstract: A method and system for increasing server cluster availability by requiring at a minimum only one node and a quorum replica set of replica members to form and operate a cluster. Replica members maintain cluster operational data. A cluster operates when one node possesses a majority of replica members, which ensures that any new or surviving cluster includes consistent cluster operational data via at least one replica member from the immediately prior cluster. Arbitration provides exclusive ownership by one node of the replica members, including at cluster formation, and when the owning node fails. Arbitration uses a fast mutual exclusion algorithm and a reservation mechanism to challenge for and defend the exclusive reservation of each member. A quorum replica set algorithm brings members online and offline with data consistency, including updating unreconciled replica members, and ensures consistent read and update operations.Type: ApplicationFiled: September 12, 2005Publication date: April 27, 2006Applicant: Microsoft CorporationInventors: Michael Massa, David Dion, Rajsekhar Das, Rushabh Doshi, David Lomet, Gor Nishanov, Philip Bernstein, Rod Gamache, Rohit Jain, Sunita Shrivastava
-
Publication number: 20060080670Abstract: The present invention leverages a unilaterally-based interaction contract that enables applications to have a persistent state and ensures an exactly-once execution despite failures between and by communicating entities, permitting disparate software applications to be robustly supported in an environment where little is known about the implementation of the interaction contract. In one instance of the present invention, a web services interaction contract provides a communicating application with duplicate commit request elimination, persistent state transitions, and/or unique persistent reply requests. The present invention permits this interaction contract to be supported by, for example, a persistent application, a workflow, a transaction queue, a database, and a file system to facilitate in providing idempotent executions for requests from a communicating application.Type: ApplicationFiled: September 23, 2004Publication date: April 13, 2006Applicant: Microsoft CorporationInventor: David Lomet
-
Publication number: 20060041602Abstract: Logical logging to extend recovery is described. In one aspect, a dependency cycle between at least two objects is detected. The dependency cycle indicates that the two objects should be flushed simultaneously from a volatile main memory to a non-volatile memory to preserve those objects in the event of a system crash. One of the two objects is written to a stable of to break the dependency cycle. The other of the two objects is flushed to the non-volatile memory. The object that has been written to the stable log is then flushed to the stable log to the non-volatile memory.Type: ApplicationFiled: October 7, 2005Publication date: February 23, 2006Applicant: Microsoft CorporationInventors: David Lomet, Mark Tuttle