Staffing and automation failures cited by Microsoft as causes for Australian data center outage

Microsoft has blamed insufficient staffing and failed automation for a data center outage in Australia that took place on August 30 and left users unable to access Azure, Microsoft 365, and Power Platform services for more than 24 hours.

In a post-incident analysis report, Microsoft said the outage occurred due to a utility power sag in the Australia East region, which in turn “tripped a subset of the cooling units offline in one data center, within one of the Availability Zones.”

With those cooling units offline, rising temperatures forced an automated shutdown of the data center to preserve data and infrastructure health, affecting compute, network, and storage services.

However, Microsoft said the cooling units could have been restarted manually, but there were not enough personnel at the data center to do so in time.

“Due to the size of the data center campus, the staffing of the team at night was insufficient to restart the chillers in a timely manner. We have temporarily increased the team size from three to seven, until the underlying issues are better understood and appropriate mitigations can be put in place,” Microsoft wrote as part of the report.

In addition, the company said it is working on other major reforms, such as making the data center’s existing automation better at restoring services when an incident occurs.

“We are exploring ways to improve existing automation to be more resilient to various voltage sag event types,” Microsoft said, adding that an evaluation was underway to ensure that the highest-load servers and their corresponding chillers are restarted first.

In the past few months, Microsoft has reported several outages, many of them affecting Microsoft 365 (M365) services. In July, an outage took out its OneDrive for Business and SharePoint Online services.

In June, users faced issues with Outlook Web, Teams, OneDrive for Business, and SharePoint for more than eight hours.

In May, the company reported that UK users were facing issues accessing some service offerings under Microsoft 365. In April, Microsoft said it was investigating an issue where certain users were unable to use the search functionality in multiple Microsoft 365 services. Outlook on the Web, Exchange Online, SharePoint Online, Microsoft Teams, and Outlook desktop clients were among the affected services.

In another incident in April, users could not access Microsoft 365 web applications and Teams.

Microsoft also suffered a global outage in February, during which users again could not access email or Teams. It suffered a similar outage in January.

2023-09-04 21:24:03
Source: www.networkworld.com
