With so many cloud services dependent on it, Azure Active Directory has become a single point of failure for Microsoft


steven36

Does Redmond have a reliability problem?

Comment Microsoft has fixed an issue with its OneDrive and SharePoint services where users were unable to sign in, caused by a faulty remediation for the earlier Azure Active Directory outage.

 

"We're investigating an issue affecting access to multiple Microsoft 365 services. We're working to identify the full impact," said a Microsoft 365 status tweet at around 10:45pm last night GMT. It was a reference to a major outage across the company's cloud services, beginning perhaps 20 minutes earlier, including both Microsoft 365 and some Azure services. The incident continued for hours until around 3:20am today when Microsoft reported that "the majority of services are now recovered for most users".

 

The core service affected was Azure Active Directory, which controls login to everything from Outlook email to Teams to the Azure portal, which is used for managing other cloud services. The five-hour outage also caused productivity-stopping annoyances: some installations of Microsoft Office and Visual Studio, even on the desktop, declared that they could not check their licensing and therefore would not run.

 

There are claims that the US emergency 911 service was affected, which is not implausible given that the RapidDeploy Nimbus Dispatch system describes itself as "a Microsoft Azure–based Computer Aided Dispatch platform". If the problem is authentication, even resilient services with failover to other Azure regions may become inaccessible and therefore useless.
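The point about authentication can be made concrete with a little availability arithmetic. The following is a minimal sketch, not a model of Azure itself: the availability figures are hypothetical, failures are assumed to be independent, and the function names are invented for illustration. It shows that redundant regions multiply out their own downtime, while a shared sign-in dependency sits in series with all of them and caps the end-to-end result.

```python
def parallel(*availabilities: float) -> float:
    """Availability of redundant replicas: up if at least one replica is up."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def serial(*availabilities: float) -> float:
    """Availability of a chain of hard dependencies: up only if every link is up."""
    p_up = 1.0
    for a in availabilities:
        p_up *= a
    return p_up


def is_reachable(identity_up: bool, regions_up: list[bool]) -> bool:
    """A user reaches the service only if sign-in works AND some region is up."""
    return identity_up and any(regions_up)


# Hypothetical figures, chosen only for illustration (not published SLAs).
REGION = 0.999    # a single region hosting the application
IDENTITY = 0.999  # the shared sign-in service every region depends on

# Two regions with failover between them.
app_two_regions = parallel(REGION, REGION)

# Users must sign in before they can reach either region.
end_to_end = serial(IDENTITY, app_two_regions)

print(f"two regions, no sign-in dependency: {app_two_regions:.6f}")
print(f"two regions behind shared sign-in:  {end_to_end:.6f}")
print("reachable with sign-in down:", is_reachable(False, [True, True]))
```

However many regions are added, the end-to-end figure in this sketch can never exceed that of the shared identity service, which is the sense in which a common directory becomes a single point of failure.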

The company has yet to provide full details, but a status report today said that "a recent configuration change impacted a backend storage layer, which caused latency to authentication requests".

 

Status tweets allow us to track some of the developments. 11:36pm: "We've rolled back the change that is likely the source of impact." 11:49pm: "We're not observing an increase in successful connections after rolling back a recent change." 12:48am: "We're rerouting traffic to alternate infrastructure to improve the user experience." 1:40am: "We're seeing improvement for multiple services after applying mitigation steps."

 

It was not completely over even after the main outage was fixed. Microsoft reported today via the Admin Center that "some users were unable to access SharePoint Online or OneDrive for Business" between 7:20am and 11:52am UK time. The problem was that "a change put in place to mitigate impact during the recent AAD outage caused this issue". Microsoft added: "We're reviewing our deployment and provisioning procedures to help prevent similar problems in the future."

 

Every IT administrator will feel sympathy for the engineers working under stress to fix issues that have such wide consequences. "We acknowledge the unfortunate reality that – given the scale of our operations and the pace of change – we will never be able to avoid outages entirely," said CTO Mark Russinovich on 17 August. Subsequent events proved the truth of those words, especially in the UK, where a major Azure data centre suffered an outage only two weeks ago.

 

Outages may be inevitable, but nevertheless Microsoft has some hard questions to answer. Measuring cloud reliability is non-trivial since what matters is not the number of outages but their extent and impact.

 

Microsoft seems to have more than its fair share of problems. Gartner noted recently that it "continues to have concerns related to the overall architecture and implementation of Azure, despite resilience-focused efforts and improved service availability metrics during the past year". The analyst's reservations were based in part on the low ratio of availability zones to regions, and on the fact that "a limited set of services support the availability zone model".

 

Gartner's concerns are valid, but this was not the cause of the recent disruption. Bill Witten, identity architect at Okta, was to the point, commenting: "So, does everyone get why the mono-directory is not a good idea?"

 

Microsoft has built so much on Azure Active Directory that it is a single point of failure. The company either needs to make it so resilient that failure is near-impossible (which is likely to be its intention), or consider gradually reducing the dependence of so many services on it.

 

The recent outages are an embarrassment for the company, coming so soon after the Ignite online conference. Microsoft does not talk about reliability much, but it is perhaps the single biggest issue facing its cloud ambitions and its ability to continue its catch-up effort with AWS.

 

Source

6 minutes ago, steven36 said:

11:36pm: "We've rolled back the change that is likely the source of impact." 11:49pm: "We're not observing an increase in successful connections after rolling back a recent change."

 

If Microsoft has problems with their own patches and their deployment, what hope do the rest of us have? :P



4 minutes ago, Karlston said:

 

If Microsoft has problems with their own patches and their deployment, what hope do the rest of us have? :P

M$ numbers have been in decline for a while. Cloud growth keeps slowing down for them, while AWS is doing well.

 

Quote

A 47% revenue increase for Azure is bad news even though cloud growth was already slowing for the company. The figure has been falling steadily: 76% in Q2 2019, 73% in Q3 2019, 64% in Q4 2019, and 59% in Q1 2020. It rebounded slightly to 62% in Q2 2020 but returned to 59% in Q3 2020. Slowing growth is normal at Azure size, but the pandemic appears to be accelerating the trend.

https://venturebeat.com/2020/07/22/microsoft-earnings-q4-2020/

 

The fact that they don't report the exact Azure numbers, only the percentage, and list them under the Intelligent Cloud category has made people wonder whether they are really doing that well.

 

The lack of specificity around Azure frustrates many pundits as it simply can’t be compared directly to AWS, and inevitably raises eyebrows about how Azure is really doing. Of course, it also assumes that IaaS is the only piece of “cloud” that’s important, but then, that’s how AWS has grown to dominate the market. Microsoft’s release noted that “cloud usage and demand increased as customers continued to work and learn from home. Transactional license purchasing continued to slow, particularly in small and medium businesses, and LinkedIn was negatively impacted by the weak job market and reductions in advertising spend.”

https://www.parkmycloud.com/blog/aws-vs-azure-vs-google-cloud-market-share/



Archived

This topic is now archived and is closed to further replies.
