Incapsula Management Console Event: Root Cause Analysis

Yesterday, the Incapsula management console was unavailable for several hours, starting at 09:09 UTC. While the Incapsula service itself continued to function normally, customers were unable to log into the console. This prevented them from viewing their network and application layer traffic and from making changes to their delivery and security logic.

The issue stemmed from human error during routine database administration. An engineer inadvertently executed a “drop database” operation against the production database that backs the console UI. The operation was intended for our disaster recovery site, where it would have had no effect on production. Dropping the production database caused a system-wide failure of the Incapsula management console.

Because the command was executed on the master database server, it was replicated to all slaves. As a result, the operations team had to restore the master database from backup rather than promote a slave server to master, and the restore delayed full resolution of the issue by a few hours. Users were kept updated about the incident via the Incapsula status page.

We resolved the console outage at 18:00 UTC, and no data loss has been detected. All clients are now able to access the management console. To prevent a recurrence, we are reviewing our database administration policies, database user permissions, and database cluster topology to ensure that slave servers are available for promotion when needed.

The status of the Incapsula service is always available on the Incapsula status page. In addition, you may subscribe to status alerts via email and SMS by sending an email request to support@incapsula.com.
