EDG Backup and Restore

Administrators can back up and restore an EDG workspace either by using the built-in Backup and Restore Utility or by manually backing up and restoring the workspace files. These two methods are described in more detail in the sections below.

The Backup and Restore Utility can back up EDG while the system is online and operational, whereas the manual backup process requires EDG to be shut down while its files are backed up. Both methods require the EDG web application to be restarted when restoring from a backup.

Although these two methods can both be used to back up and restore the same EDG server at different times, their artifacts are not interchangeable. That is, backups created by the Backup and Restore Utility cannot be used by the manual restore process, and vice versa.

Administrators can also back up and restore Explorer using these same options.

Backup and Restore Utility

Administrators can use the EDG Backup and Restore Utility to back up and restore the EDG workspace and its collections. Please test this utility in a development environment to get familiar with the process before using it in a production environment. This utility should not replace your organization’s existing backup strategy; it should be used to supplement and enhance the existing strategy.

Important notes

- Only workspaces configured to use Separate TDB, Shared TDB, or Data Platform datastores are supported. Workspaces configured to use an RDBMS datastore are not supported and must be backed up manually.
- When restoring a workspace, the workspace performing the restore must match the workspace that originally generated the backup in two ways:
  - The two workspaces must be configured to use the same database type (e.g. Shared TDB or Data Platform).
  - The two workspaces must have the same major and minor version numbers, but the patch version numbers can differ.
- Each backup is created as a zip file. The zip file can be either downloaded to the user’s local machine or stored as an object in an Amazon S3 bucket. See External System Integration Management and S3 Configuration.
- Everything in the workspace is backed up and restored, with the exception of the read-only EDG system files.
- During the restore process, all existing TDB databases in the workspace will be deleted and replaced by only those TDB databases present in the backup file.
- Data Platform-backed workspaces can be backed up and restored using this utility, but the restore process requires manual intervention. See the discussion below.
- Once a restore is in progress, do not navigate away from the Backup/Restore page or shut down the server. The restore may take a while, so allow it to finish. If an error occurs, it may be necessary to restart the web server.
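
The two matching rules above can be checked before a restore is attempted. The following Python sketch is illustrative only: it assumes the backup's metadata is available as the JSON zip file comment shown later in this document, and the function names, as well as the way the target workspace's version and database type are supplied, are hypothetical. It is not the utility's own validation logic.

```python
import json
from typing import Tuple


def parse_major_minor(product_version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version such as '7.3.0.v20220628-2156'."""
    major, minor = product_version.split(".")[:2]
    return int(major), int(minor)


def can_restore(backup_comment: str, target_version: str, target_db_type: str) -> bool:
    """Check the two rules: same database type, same major and minor version."""
    meta = json.loads(backup_comment)
    same_db = meta["databaseType"] == target_db_type
    same_release = (
        parse_major_minor(meta["productVersion"]) == parse_major_minor(target_version)
    )
    # The patch version is deliberately ignored, since it is allowed to differ.
    return same_db and same_release
```

A script could call can_restore with the comment read from a candidate backup file and refuse to continue when it returns False.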

Utility Backup

Administrators can use the Backup and Restore Utility to back up the EDG workspace via either the Backup/Restore page or an API call. Either method can download the backup file or store it in an Amazon S3 bucket.

The backup process is straightforward: An administrator requests a backup and the utility generates a zip file containing the requested backup.

Backup/Restore Page

Administrators can use the Backup/Restore page to generate a backup using the EDG UI.

Download

To download the backup file, leave the backup-related Amazon S3 settings unconfigured and press the Create Local Backup button. The backup file will be created and downloaded like any other browser download.

Amazon S3 Bucket

To store the backup file in an Amazon S3 bucket, configure the backup-related Amazon S3 settings. (See External System Integration Management and S3 Configuration.) If the appropriate Amazon S3 settings are configured, the Backup/Restore page will display the name of the target Amazon S3 bucket. If the settings are not completely configured, the Backup/Restore page will display links to the appropriate EDG pages for configuring those settings. Once the settings are configured, press the Create Backup button. The backup file will be created and uploaded to the configured Amazon S3 bucket with a time-stamped object name that will look something like this: TopBraid-EDG-Studio-Backup-20220615T195844Z.zip.

API

Administrators can use the Backup and Restore API to generate a backup using any HTTP client. This allows backups to be created using scripts that can be executed automatically by scheduled jobs and the like.

More details about the Backup and Restore API can be found in the EDG online OpenAPI documentation.

Download

To download a newly generated backup file, use an HTTP GET request similar to this curl command:

curl --request GET \
  http://localhost/tbl/backup \
  --header "Accept: application/zip" \
  --header "Authorization: Basic xxxxxxxxxxxxxx" \
  --output backup.zip
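
For scheduled jobs, the same request can be issued from a script. The Python sketch below mirrors the curl command above using only the standard library. The function names are hypothetical, and the base URL and credentials are placeholders to be replaced with values for the actual server. Because a streamed download cannot report a failed backup, it is worth checking the resulting file's zip comment afterwards, as described under Invalid URIs.

```python
import base64
import shutil
import urllib.request


def build_backup_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build the GET request for a new backup, using HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/backup",
        method="GET",
        headers={"Accept": "application/zip", "Authorization": f"Basic {token}"},
    )


def download_backup(request: urllib.request.Request, target_path: str) -> None:
    """Stream the response body to target_path without loading it into memory."""
    with urllib.request.urlopen(request) as response, open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

For example, download_backup(build_backup_request("http://localhost/tbl", "admin", "secret"), "backup.zip") would write the backup to backup.zip in the current directory.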

Amazon S3 Bucket

To store a newly generated backup file in an Amazon S3 bucket, use an HTTP POST request similar to this curl command:

curl --request POST \
  http://localhost/tbl/s3-backups \
  --header "Accept: application/json" \
  --header "Authorization: Basic xxxxxxxxxxxxxx"

Invalid URIs

Warning

The Backup Utility is unable to export and save any graph that contains an invalid URI (e.g. a URI with a space in its path). As a result, if any graph in the EDG workspace contains an invalid URI, the backup will fail, sometimes without any obvious user feedback.

Although rare, invalid URIs can occur in an EDG workspace’s graphs. When backing up an EDG workspace with an invalid URI, with either the Backup/Restore page or an API call, the backup service will produce a valid zip file but not a valid backup file. The zip file will contain only the graphs successfully backed up before the backup encountered the invalid data along with a truncated (and likely unusable) Turtle file of the graph that contained the invalid data. The zip file will also lack the metadata necessary for a restore. If this invalid backup zip file is used for a restore, the restore will fail to initiate with the message “Backup zip file metadata is missing”.

In the case of a downloaded backup, since the backup file is being streamed directly to the client’s download folder, there is no way for the EDG server to communicate a failed backup to the client, whether it is the EDG UI or an API client. In this case, there are two ways to determine that a backup failed because of an invalid URI:

The error will be captured in the EDG log, which can be viewed from the Server Administration page. (See View EDG Log.)
The backup zip file can be inspected directly. See the discussion below.

In the case of a backup uploaded to Amazon S3, any failure will be reported directly to the client, whether it is the EDG UI or an API client. The invalid backup zip file will still be uploaded to Amazon S3, but, if it is used for a restore, the restore will fail to initiate. As with a downloaded backup, the two techniques described above can be used to confirm that a backup failed because of an invalid URI.

To determine whether a backup zip file is valid, inspect its zip file comment. Use a utility like zipnote on Linux or 7zip on Windows to view the zip file’s comment. An invalid backup will have no comment, whereas a valid backup will have a comment that looks something like this:

{"productVersion":"7.3.0.v20220628-2156","databaseType":"SharedTDB"}
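
The same check can be scripted, which is useful after an unattended download. This Python sketch reads the archive-level comment with the standard zipfile module and treats a missing comment as an invalid backup. The function name is hypothetical, and the sketch assumes the comment is UTF-8 encoded JSON as in the example above.

```python
import json
import zipfile
from typing import Optional


def read_backup_metadata(zip_file) -> Optional[dict]:
    """Return the metadata from the zip comment, or None if the comment is missing.

    zip_file may be a path or a binary file object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        comment = archive.comment  # bytes; empty when the archive has no comment
    if not comment:
        return None
    return json.loads(comment.decode("utf-8"))
```

A return value of None indicates a file that should not be used for a restore.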

If the EDG workspace contains invalid URIs, those URIs must be either deleted or changed to valid URIs before a backup can succeed. The SPARQL query below will return a list of the invalid URIs in an EDG workspace, along with each URI’s graph and triple position (subject, predicate, or object):

PREFIX teamwork: <http://topbraid.org/teamwork#>
SELECT ?graph ?bad_uri ?position
WHERE {
    { () teamwork:graphsUnderTeamControl ?graph }
    UNION
    { () teamwork:graphsUnderTeamControl (?x ?graph) }
    GRAPH ?graph {
        SELECT DISTINCT ?bad_uri ?position {
            { ?uri ?p1 ?o1 . FILTER (isIRI(?uri) && !COALESCE(isIRI(IRI(STR(?uri))), false)) . BIND ("subject" AS ?position) }
            UNION
            { ?s2 ?uri ?o2 . FILTER (isIRI(?uri) && !COALESCE(isIRI(IRI(STR(?uri))), false)) . BIND ("predicate" AS ?position) }
            UNION
            { ?s3 ?p3 ?uri . FILTER (isIRI(?uri) && !COALESCE(isIRI(IRI(STR(?uri))), false)) . BIND ("object" AS ?position) }
            BIND (STR(?uri) AS ?bad_uri)
        }
    }
}
ORDER BY ?graph ?bad_uri DESC(?position)

If an invalid URI cannot be repaired using the EDG UI (e.g. if the URI is in a .tch graph), then an ADS script like the one below can be used to repair a graph’s invalid URIs. This example will replace any spaces in object URIs in the geo.tch graph with underscores:

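graph.transaction('urn:x-evn-master:geo.tch', 'Repairing invalid URIs', () => {
    // Match every subject, predicate, and object in the graph
    let triples = graph.triples(null, null, null, true);
    triples.forEach(t => {
        // Only triples whose object is a URI containing a space are rewritten
        if (t.object.isURI() && t.object.uri.includes(' ')) {
            // Replace the triple with a copy whose object URI uses underscores
            graph.remove(t.subject, t.predicate, t.object);
            graph.add(t.subject, t.predicate, graph.namedNode(t.object.uri.replaceAll(' ', '_')));
        }
    });
});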

Note

If it is possible to replicate the creation of invalid URIs in an EDG workspace, please notify TopQuadrant support, as this may well be a bug.

Utility Restore

Administrators can use the Backup and Restore Utility to restore a previously backed up EDG workspace via either the Backup/Restore page or an API call. Either method can upload the backup file to be used for the restore or fetch it from an Amazon S3 bucket.

To restore an EDG workspace, an administrator initiates the restore with a specified backup file, either an uploaded file or a file stored in an Amazon S3 bucket. This backup file is staged on the EDG server and, in most cases, a restart of the EDG web application is automatically initiated. Once EDG is restarted, it will wipe clean the existing workspace and replace it with the contents of the staged backup file. When EDG has completed its initialization, it will have been restored to the state it was in when the backup file was created.

In some cases, initiating a restore will not automatically restart the EDG web application, and the entire web server must be manually restarted:

EDG Studio: EDG Studio does not support restarting the EDG web application. The entire web server must be manually restarted.
Windows server: EDG cannot be restarted when the web server is running on Windows. The entire web server must be manually restarted.
Data Platform: The restoration of an EDG workspace running on Data Platform requires some manual intervention, so it cannot be automatically restarted. See the discussion below.

Note

When initiating a restore with the Backup and Restore Utility, since the user-specified backup file must be staged on the EDG server, there must be enough disk space available to the server to hold the backup file. If there is not enough disk space, the restore will fail.
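
A script that runs on the EDG server host can check for sufficient space before initiating a restore. The Python sketch below is one way to do that. The function name and the headroom parameter are hypothetical, and the staging directory must be supplied by the caller because its location depends on the installation. The sketch only compares the backup file's size with the free space. It does not account for any additional space the restore itself may need.

```python
import os
import shutil


def has_room_for_backup(backup_path: str, staging_dir: str, headroom: float = 1.0) -> bool:
    """Return True if staging_dir's filesystem can hold the backup file.

    headroom is a safety multiplier applied to the backup size (assumed, tune as needed).
    """
    needed = os.path.getsize(backup_path) * headroom
    free = shutil.disk_usage(staging_dir).free
    return free >= needed
```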

Backup/Restore Page

Administrators can use the Backup/Restore page to initiate a restore using the EDG UI.

Upload

To upload a backup zip file for restore, press the Choose File button and select the desired file from the local workstation; then press the Restore button. The backup file will be uploaded and staged on the EDG server and a system restart initiated.

Amazon S3 Bucket

To use a backup zip file stored in an Amazon S3 bucket for restore, configure the backup-related Amazon S3 settings. Once the appropriate Amazon S3 settings are configured, the Backup/Restore page will display a list of the objects in the configured Amazon S3 bucket. Press the Restore button next to the desired file in the list. That backup file will be downloaded from Amazon S3 and staged on the EDG server and a system restart initiated.

API

Administrators can use the Backup and Restore API to initiate a restore using any HTTP client. This allows restores to be initiated using scripts that can be executed automatically by scheduled jobs and the like.

More details about the Backup and Restore API can be found in the EDG online OpenAPI documentation.

Upload

To upload a backup file for restore and initiate a system restart, use an HTTP POST request similar to this curl command:

curl --request POST \
  http://localhost/tbl/backup \
  --header "Accept: */*" \
  --header "Content-Type: multipart/form-data" \
  --header "Authorization: Basic xxxxxxxxxxxxxx" \
  --include \
  --form "file=@TopBraid-EDG-Studio-Backup-20220621T041100Z.zip;type=application/zip"

Amazon S3 Bucket

To use a backup zip file stored in an Amazon S3 bucket for restore and initiate a system restart, use an HTTP POST request similar to this curl command:

curl --request POST \
  http://localhost/tbl/s3-backups/TopBraid-EDG-Studio-Backup-20220621T041100Z.zip/backup \
  --header "Accept: application/json" \
  --header "Authorization: Basic xxxxxxxxxxxxxx"

To list the backup zip files stored in an Amazon S3 bucket, use an HTTP GET request similar to this curl command:

curl --request GET \
  http://localhost/tbl/s3-backups \
  --header "Accept: application/json" \
  --header "Authorization: Basic xxxxxxxxxxxxxx"

The keys of the objects in the returned list can be used in the path of the POST request above to indicate which backup file is to be used in the restore.
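
As an illustration of combining the two requests, the Python sketch below picks the most recent backup from a list of object keys and builds the corresponding restore URL. It assumes the keys have already been extracted from the listing response (the response's JSON layout is not shown here), that keys end in a time stamp like the one in the example file names, and that keys should be percent-encoded when placed in the path. The function names are hypothetical.

```python
import re
from typing import Iterable, Optional
from urllib.parse import quote

# Assumed naming pattern: any prefix, then a time stamp such as 20220621T041100Z, then .zip
_STAMP = re.compile(r"-(\d{8}T\d{6}Z)\.zip$")


def latest_backup_key(keys: Iterable[str]) -> Optional[str]:
    """Return the key with the newest time stamp, ignoring keys that do not match."""
    stamped = [(m.group(1), key) for key in keys if (m := _STAMP.search(key))]
    # Time stamps in this format sort chronologically when compared as strings.
    return max(stamped)[1] if stamped else None


def restore_url(base_url: str, key: str) -> str:
    """Build the URL for the POST request that restores from the given S3 object."""
    return f"{base_url.rstrip('/')}/s3-backups/{quote(key, safe='')}/backup"
```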

Data Platform

Administrators can use either the Backup and Restore Utility or the Backup and Restore API to initiate a restore of a set of clustered Data Platform EDG workspaces, but this will only stage the backup-related files on the EDG web server from which the restore was initiated. It will not restart any of the clustered EDG workspaces, because Data Platform restores must be completed manually.

When a Data Platform workspace is backed up using the Backup Utility or API, the produced zip file contains two nested zip files:

backup.zip contains the files from the EDG node that originally performed the backup. These files will be the same across all the EDG nodes.
dp-server-patch-logs.zip contains the Data Coordinator patch logs, which will be used to restore the Data Coordinator server.
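
When a Data Platform backup file needs to be inspected or unpacked outside EDG, the nested archives can be extracted with a short script. The Python sketch below is an assumption-laden example: it assumes the nested zip sits at the root of the outer archive under the name given, and it loads the nested zip fully into memory, which may be unsuitable for very large patch logs. The function name is hypothetical.

```python
import io
import zipfile
from typing import List


def extract_nested_zip(outer_zip, nested_name: str, target_dir: str) -> List[str]:
    """Extract the contents of a zip stored inside another zip; return the entry names.

    outer_zip may be a path or a binary file object.
    """
    with zipfile.ZipFile(outer_zip) as outer:
        nested_bytes = outer.read(nested_name)  # whole nested archive held in memory
    with zipfile.ZipFile(io.BytesIO(nested_bytes)) as nested:
        nested.extractall(target_dir)
        return nested.namelist()
```

For example, extract_nested_zip("dp-backup.zip", "dp-server-patch-logs.zip", "patch-logs") would unpack the patch logs into a patch-logs directory, where dp-backup.zip is a placeholder file name.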

Once the backup files are staged, follow these steps:

Shut down all the EDG nodes.
Shut down the Data Coordinator server.
Replace the Data Coordinator server’s --base (patch logs) directory with the contents of dp-server-patch-logs.zip. This zip file can be found staged on the EDG node that initiated the restore, typically in the Tomcat directory ./webapps/edg/WEB-INF/restore.
Restart the Data Coordinator server.
Restart the EDG node that initiated the restore, automatically triggering its restoration from the locally staged backup.zip file.
Once this EDG node is successfully restored, shut it down again.
Replace the other EDG nodes’ edg/workspace directory trees with the restored node’s edg/workspace directory tree.
Restart all the EDG nodes.

Manual Backup and Restore

Administrators can manually back up and restore an EDG workspace’s files. The steps necessary depend on the workspace’s configured datastore:

TDB (Shared or Separate)
Data Platform
RDBMS

Manual Backup

TDB

Steps to back up an EDG workspace that is configured to use Separate or Shared TDB:

Shut down the EDG web server.
Back up the entire EDG workspace directory tree, edg/workspace, e.g. by adding it to a zip or tar archive. This backup will include the workspace’s TDB databases.
Store this backup as appropriate.
Once the backup is complete, restart the EDG web server.
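
The archiving step above can be scripted. The Python sketch below writes the workspace directory tree to a time-stamped gzip-compressed tar file. It assumes the EDG web server has already been shut down, and the function name, archive naming scheme, and directory arguments are hypothetical choices rather than anything EDG requires.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_workspace(workspace_dir: str, dest_dir: str) -> Path:
    """Write <dest_dir>/edg-workspace-<UTC time stamp>.tar.gz and return its path."""
    workspace = Path(workspace_dir)
    if not workspace.is_dir():
        raise FileNotFoundError(f"workspace directory not found: {workspace}")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive_path = Path(dest_dir) / f"edg-workspace-{stamp}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name, e.g. workspace/...
        tar.add(workspace, arcname=workspace.name)
    return archive_path
```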

Data Platform

Steps to back up a cluster of EDG workspaces that are configured to use Data Platform:

Shut down all the EDG nodes.
Shut down the Data Coordinator server.
Back up the entire EDG workspace directory tree, edg/workspace, of one EDG node, e.g. by adding it to a zip or tar archive.
Back up the Data Coordinator server’s --base (patch logs) directory, e.g. by adding it to a zip or tar archive.
Store these two backups together as appropriate.
Once the backup is complete, restart the Data Coordinator server.
Restart all the EDG nodes.

RDBMS

Steps to back up an EDG workspace that is configured to use RDBMS:

Shut down the EDG web server.
Back up the entire EDG workspace directory tree, edg/workspace, e.g. by adding it to a zip or tar archive.
Back up the configured RDBMS server, ensuring the appropriate database or schema is included in the backup. See the RDBMS documentation for more information.
Store these two backups together as appropriate.
Once the backup is complete, restart the EDG web server.

Manual Restore

TDB

Steps to restore an EDG workspace that is configured to use Separate or Shared TDB:

Shut down the EDG web server.
Replace the entire EDG workspace directory tree, edg/workspace, with the contents of a TDB workspace backup.
Restart the EDG web server.
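
The replacement step above can also be scripted. The Python sketch below assumes the backup is a tar archive whose single top-level entry is a directory with the same name as the live workspace directory, that the archive comes from a trusted source, and that the EDG web server is shut down. Rather than deleting the existing workspace, it renames it so that it can be recovered if the restore goes wrong. The function name and the ".previous" suffix are hypothetical.

```python
import tarfile
from pathlib import Path


def restore_workspace(archive_path: str, workspace_dir: str) -> Path:
    """Replace workspace_dir with the archive contents; return the set-aside old copy."""
    workspace = Path(workspace_dir)
    previous = workspace.with_name(workspace.name + ".previous")
    if previous.exists():
        raise FileExistsError(f"remove or rename the earlier copy first: {previous}")
    if workspace.exists():
        workspace.rename(previous)  # keep the old workspace until the restore is verified
    with tarfile.open(archive_path) as tar:
        # Extracts <workspace name>/... into the parent directory (trusted archive assumed).
        tar.extractall(workspace.parent)
    return previous
```

Once the restored server has been verified, the set-aside directory can be deleted.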

Data Platform

Steps to restore a cluster of EDG workspaces that are configured to use Data Platform:

Shut down all the EDG nodes.
Shut down the Data Coordinator server.
Replace the entire EDG workspace directory tree, edg/workspace, of each EDG node with the contents of a Data Platform node workspace backup.
Replace the Data Coordinator server’s --base (patch logs) directory with the contents of the corresponding Data Coordinator backup.
Restart the Data Coordinator server.
Restart all the EDG nodes.

RDBMS

Steps to restore an EDG workspace that is configured to use RDBMS:

Shut down the EDG web server.
Replace the entire EDG workspace directory tree, edg/workspace, with the contents of an RDBMS workspace backup.
Restore the configured RDBMS server with the corresponding RDBMS backup. See the RDBMS documentation for more information.
Restart the EDG web server.