Salto KS Platform degraded performance

Major incident · Affected components: Clay Control Center (CCC), KS Identity Server, KS Core API, KS Connect API, app.saltoks.com, Larry Commissioning, Larry Commissioning API, commissioning.saltoks.com, Larry Support, Larry Support API, support.saltoks.com
2025-04-24 02:00 UTC · 16 hours, 59 minutes

Updates

Resolved

Hi everyone,

The LAM sync queue and the events queue are now caught up. Events are being processed immediately, and all access changes have been synced to the IQs.

In addition, Offline Access sync has been activated once again and all IQs are online.

We are now resolving this incident.

Regards,
The Salto CloudWorks Team

April 24, 2025 · 18:43 UTC
Update

Hi all,

We have now reconnected almost 100% of IQs and have re-enabled LAM syncs. This means that access-related changes are starting to sync to the IQs. There may be a small delay while the queue is processed, but it is clearing quickly.

Once all LAM syncs have been processed, we will look to reactivate Offline Access syncs.

Regards,
The Salto CloudWorks Team

April 24, 2025 · 18:06 UTC
Update

Hello,

We’ve found a sequence for restarting the services that allows disconnected IQs to reconnect, and 50% of IQs are now fully connected again. Completing this may take longer, as we need to be careful not to overwhelm the platform.

In the meantime, we’ve fully stopped LAM (access) syncs, so no new access updates are coming through. This was necessary to keep IQ connections fully stable. We will restart LAM syncs gradually as soon as we reach a higher number of online IQs, hopefully in around an hour.

We will work through the evening and the night in order to ensure full operation is reached as soon as possible.

Regards,
The Salto CloudWorks Team

April 24, 2025 · 15:10 UTC
Investigating

Hello all,

We are still investigating the cause of the outage and trying different approaches to get all IQs to permanently reconnect to our platform. IQs are edge devices that transfer information between the locks and the platform.

To give a clearer picture of the impact: currently around 40% of IQs are connected, and connectivity is intermittent for the remaining 60%. This means that new access granted after 10 PM last night may not have come through. Access given before that should be working.

List of services and features that may be affected:

  • Access creation and access updates (so no new bookings, user access, etc.)
  • New access events
  • IQ and lock commissioning
  • Any IQ or lock operation on Larry Support
  • Remote opening and remote lock actions (like office mode) not possible for disconnected IQs

As soon as we have a lead or more information to share, we will send an update.

Regards,
The Salto CloudWorks team

April 24, 2025 · 12:25 UTC
Investigating

Good morning,

We’ve seen that scaling up CCCs has not allowed IQs to reconnect, so we will try unplugging CCCs from the traffic manager in order to stabilise reconnections. Once this is stable, we will gradually re-introduce CCCs to the traffic manager to allow the rest of the IQs to reconnect. This may take an hour to test and apply. Once we know whether this strategy works, we will send another update.

Thanks
Salto CloudWorks team

April 24, 2025 · 08:29 UTC
Update

We are continuing to scale up our CCCs to relieve pressure on the platform and to ensure IQ connectivity and stability.

April 24, 2025 · 04:39 UTC
Investigating

We are currently investigating an incident that occurred during our scheduled database maintenance today.

IQs are experiencing disconnections due to increased load on our CCCs, and events are delayed.

More information on the database maintenance here: https://status.saltoks.com/incidents/284043

April 24, 2025 · 02:18 UTC
