PSBits

Simple (relatively) things allowing you to dig a bit deeper than usual.


Forensic value of File History database

File History is a Windows feature that allows users to automatically create snapshots of selected files and folders on another drive. Users can choose their own schedule, retention period, included folders, backup location, and so on. To make the feature work efficiently, Windows keeps the history of File History operations in a dedicated ESE (Extensible Storage Engine, aka Jet Blue) database located in %localappdata%\Microsoft\Windows\FileHistory\Configuration\. The database is stored in two mirrored files, Catalog1.edb and Catalog2.edb. Analysts should pay attention to the File History schedule and retention settings; the defaults are one snapshot per hour and no purging. Changes that happen between two consecutive snapshots (such as a file being created and then deleted) may not be reflected in the database.
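As a quick sketch (in Python for portability; the repository's own tooling is PowerShell), the configuration directory described above can be derived from %LOCALAPPDATA%:

```python
import os

def filehistory_config_dir(localappdata: str) -> str:
    """Build the expected File History configuration directory from the
    value of %LOCALAPPDATA%. The folder only exists if File History has
    ever been enabled for the profile."""
    return os.path.join(localappdata, "Microsoft", "Windows",
                        "FileHistory", "Configuration")

# On a live Windows system one would call:
#   filehistory_config_dir(os.environ["LOCALAPPDATA"])
# and then look for Catalog1.edb / Catalog2.edb inside the result.
```

Remember that both mirrored catalog files should be collected; if one is corrupted, the other may still be readable.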

The content of the database is not officially documented; however, one of Microsoft's patents (US9824091B2, available at https://patents.google.com/patent/US9824091B2) provides detailed information about the tables, structures, meaning of the fields, etc., including a database diagram.

The description appears to be accurate, even though it is a patent description rather than feature documentation. Some fields (or even whole tables, such as library) exist in the real database but are not mentioned in the patent.

The real Catalog1.edb file contains the following tables: backupset, file, library, namespace, and string.

The structure may be represented by the following graph. The diagram in the patent document is slightly different, but the overall idea remains very similar.

```mermaid
erDiagram
    namespace {
        int id PK
        int parentId FK "string.id"
        int childId FK "string.id"
        int status "described in patent document"
        int fileAttrib "see GetFileAttributes()"
        FILETIME fileCreated
        FILETIME fileModified
        int usn "NTFS USN Journal entry ID, may be 0"
        int tCreated FK "timestamp from backupset"
        int tVisible FK "timestamp from backupset, may be -1"
        int fileRecordId FK "file.id, may be 0"
    }

    file {
        int id PK
        int parentId FK "string.id"
        int childId FK "string.id"
        int state "described in patent document"
        int status "described in patent document"
        int fileSize
        int tQueued FK "timestamp from backupset"
        int tCaptured FK "timestamp from backupset"
        int tUpdated FK "timestamp from backupset"
    }

    string {
        int id PK
        string string
    }

    backupset {
        int id PK
        FILETIME timestamp
    }

    library {
        int id PK
        int parentId "???"
        int childId "???"
        int tCreated FK "timestamp?"
        int tVisible FK "timestamp?, may be -1"
    }

    string ||--|{ namespace : ""
    backupset |o--|{ namespace : ""
    string ||--|{ file : ""
    file |o--|{ namespace : ""
    backupset ||--|{ file : ""
    backupset |o--|{ library : ""
```
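The FILETIME columns above hold 100-nanosecond ticks counted from 1601-01-01 UTC. A small helper (a Python sketch, since any language works here) converts such values into readable timestamps:

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME epoch: 1601-01-01 00:00:00 UTC
EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME value (100-ns ticks since the 1601
    epoch) to a timezone-aware datetime."""
    return EPOCH_1601 + timedelta(microseconds=filetime // 10)

# Sanity check: the Unix epoch expressed as FILETIME
print(filetime_to_datetime(116444736000000000))  # 1970-01-01 00:00:00+00:00
```

The same conversion applies to the fileCreated and fileModified columns of the namespace table and to backupset.timestamp.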

Despite the lack of official documentation, a subset of the data is relatively easy to understand and interpret, giving an analyst valuable forensic information. Scenarios may include proving that a file once existed on the system, or establishing when it appeared and disappeared between snapshots.

To simplify processing, the data (all columns from all tables) may be exported to CSV files using the attached PowerShell script. PowerShell is slower than native binaries such as NirSoft ESEDatabaseView, but the portability of a scripted solution seems to justify the approach.

As the data structures are relatively simple, there is no need for a dedicated analytical tool covering every possible scenario. Instead, importing the CSV files into Excel tables lets an analyst manipulate and filter the data according to current needs. Tables can be joined with VLOOKUP, INDEX/MATCH, XLOOKUP, XMATCH, and other standard Excel functions. For very large databases, an analyst should pre-filter the data to make sure that no more than 1,048,576 (2^20) rows appear in any source CSV file.
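Analysts who prefer scripting over Excel can perform the same id-based joins in a few lines of Python. The column names below follow the diagram above, and the inline CSV data is purely hypothetical, standing in for the files produced by the export script:

```python
import csv
from io import StringIO

# Hypothetical miniature exports of the string and namespace tables.
strings_csv = "id,string\n1,C:,\n2,report.docx\n"
namespace_csv = "id,parentId,childId\n10,1,2\n"

# Index the string table by id, like building a lookup column in Excel.
strings = {row["id"]: row["string"]
           for row in csv.DictReader(StringIO(strings_csv))}

# Resolve parentId/childId to readable names -- the scripted equivalent
# of an XLOOKUP from the namespace table into the string table.
resolved = [(strings[row["parentId"]], strings[row["childId"]])
            for row in csv.DictReader(StringIO(namespace_csv))]

print(resolved)
```

The same pattern extends to the backupset table for resolving tCreated/tVisible references into actual snapshot timestamps.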