Slowness and login issues
Incident Report for Learnsoft Platform and Support
Postmortem

Update on Login Slowness and Issues (Friday, Jan 10 & Monday, Jan 13):

Timeline:

  • Friday, Jan 10 (9:30 AM ET): Reports of login slowness and failures began. Internal monitors showed no issues, but further investigation revealed a non-responsive server instance in Google Cloud. The instance was replaced by 10:30 AM ET, resolving the issue temporarily.
  • Friday, Jan 10 (1:30 PM ET): The issue resurfaced. A three-hour troubleshooting session involving Development, DevOps, and Client Support identified upload processing (user, course, and roster uploads) as the cause of the server failures. Reinstalling the software package that handles this processing resolved the issue by 4:30 PM ET.
  • Monday, Jan 13 (11:30 AM ET): The issue returned. We decided to roll back the code deployed on Jan 9. Since the rollback, no further issues have been reported.

Next Steps:

  • The Development and DevOps teams are investigating the root cause of the issues introduced by the Jan 9 code deployment. Once the cause is fixed, the updated code will undergo thorough internal testing and QA before re-release, which is targeted within the week.

Further Improvements:

  • We are enhancing our monitoring tools to detect login health issues proactively and to raise alerts more quickly.
Posted Jan 14, 2025 - 18:01 UTC

Resolved
Sites are now stable and performance is as expected.
Posted Jan 13, 2025 - 19:02 UTC
Monitoring
We have rolled back last week's release and are monitoring performance.
Posted Jan 13, 2025 - 17:20 UTC
Investigating
We have received multiple reports of issues with slowness and logins in the last 20 minutes. We are currently investigating.
Posted Jan 13, 2025 - 16:54 UTC
This incident affected: Learnsoft Platform (Learnsoft Application).