Amazon Web Services problems take down websites, apps

28 February 2017

From MarketWatch:

Issues with a cloud storage service operated by Amazon.com Inc.’s Amazon Web Services division caused outages and curtailed functionality for websites and apps Tuesday. AWS’s S3 service, described by Amazon as “designed to deliver 99.999999999% durability,” was experiencing “high error rates” due to problems with servers located on the East Coast of the United States, the company said Tuesday.

Link to the rest at MarketWatch

Friends with online businesses hosted on AWS have told PG they’ve been having problems. He’s had one report of someone not being able to place an order with Amazon.

Here’s a link to the AWS status page.


22 Comments to “Amazon Web Services problems take down websites, apps”

  1. I was unable to change the price of one of our KDP ebooks today.

  2. Fixed at 2:08 PT.

  3. People buying Bujold’s new Penric novella, Mira’s Last Dance, which released today, had trouble downloading their purchased ebook because of the server issues. All fixed now. I think I know what I’ll be reading shortly! 😉

    • Ohhh! I’ll bet that’s also why I had trouble downloading a couple of books from Audible. I couldn’t figure out why they wouldn’t let me download books they said I owned.

  4. Prime music was down for a while. I had to play music off of my hard drive; it was a rough two hours.

  5. Not just Amazon. I had trouble with accessing Gmail and one other site today. I had wondered what the heck was up.

  6. AWS has advanced tremendously, in my opinion.

    This whole “outage” showed up as “increased error rates.” I believe this is the disaster scenario that AWS has been working toward for a number of years. Instead of a catastrophic failure, the error rate goes up and the service struggles on.

    Of course, any degradation in service is bad, and AWS may have some penalties to pay, but degradation is so much better than failure.

    The basic design tenet of most modern internet and cloud systems is called “eventual consistency.” A gross simplification is that as long as a transaction will eventually complete satisfactorily, the system remains up. This is unlike the older approach that said no part of a transaction can be committed until every part of the transaction is ready to commit. Amazon practically invented this concept because they prefer to keep the store open instead of closing every time a router flaps or a disk is accidentally powered down.

    Data in systems like AWS S3 is replicated so widely that, short of simultaneous floods, earthquakes, and nuclear attacks in several places around the globe, data is always eventually read or written correctly. Sometimes, though, it can get beastly slow, to the point that some of the services that use the storage system throw up their hands and time out, even though the storage service is actually still working, just slowly.
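One common way for a caller to cope with storage that is slow but still working is to retry with growing delays instead of giving up on the first timeout. The sketch below is only an illustration of that idea: `call_with_retries`, its parameters, and the use of `TimeoutError` are assumptions made for the example, not part of any AWS API.

```python
import time


def call_with_retries(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TimeoutError, wait and try again.

    The wait doubles after each failed attempt (exponential backoff).
    The last failure is re-raised so the caller still sees the error.
    All names here are hypothetical and chosen for illustration.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: report the failure
            sleep(base_delay * 2 ** attempt)
```

The trade-off is the one described above: the request may take much longer, but it completes if the backing service recovers within the retry window.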

    • I didn’t experience it as degradation; for me, many things wouldn’t work at all. I use an iPad Pro and access files through iCloud, GoodReader and SugarSync. Between noon and 4 pm Central time, I could not access files or even send a file as an attachment using any of the services/apps. I first noticed it about noon, when I tried to sideload a book to iBooks, and what should have taken 3 seconds took about 10 minutes. Depending on the service, I either got an error message or the “working” circle. After waiting 25 minutes to try to attach a 56KB file, I finally killed it. To me that’s not degradation; it was failure. For half the business day, I could not work. I didn’t really have any problems browsing the Internet, but I could not access the files that I needed to work on and send to customers.

      It certainly taught me a lesson about using cloud services. I was going to start using OneDrive on my office PC so that all my files would be accessible more easily from any device. But for me, after this experience, I would not risk it.

      I actually called Apple and the support person was trying to troubleshoot it. I told him I had tried googling the outage and asked if the down Amazon server could affect all three of these applications. He said that he didn’t think Amazon had anything to do with my iPad issues. Turns out Apple uses Amazon servers for iCloud among other things. Evidently, so does SugarSync. So I was out of luck.

      • I am sorry about your difficulties. Complex systems sometimes go awry and sometimes you are the goat that takes it in the shorts. Wish it were not so.

        When you look at an elephant, it looks different depending on where you stand. I suspect the system architects at AWS are looking at today and saying that it was a bad day, but the system functioned as designed, which is a lot better than it would have a few years ago. BTW, I have no connection with Amazon beyond casual acquaintance with a few of their architects and engineers.

        When you make a decision to use a cloud service, you have to take into consideration that there will be bad days, even bad weeks when the system will slow down, even stop. Your own use of the service, or the services you use that use the cloud service, have to compensate. Compensating costs money. If the cost of adequate compensation exceeds the cost of hosting the service locally, maybe you had better take the cloud service out of your stack. Individual users don’t have much leverage, but large enterprises write, or should write, contracts with their providers that compensate for the risk.

        For myself, I use many cloud services because they are cheap and I am willing to accept their vagaries, which I estimate are less deleterious than my own maintenance procedures which are interrupted and perverted by the necessities of my household. But that is my decision.

        • Reality Observer

          After thirty years, I’ve learned.

          One “working” two terabyte drive. Four terabyte drive plugged into my working machine, that backs up the first drive and the internal computer drive continuously.

          Nightly backup of that drive to another computer in the house.

          Then – and only then – are “critical” folders encrypted and sent to the cloud service (also nightly).

          Every fourteen days, I burn critical files to CD, and those go in the safe deposit box.

          Yes, I am very paranoid. But I’ll never run back into a burning house to rescue the only copy that exists of my latest manuscript…

          • Good for you, Reality Observer. In my experience few individuals, even some enterprises, have the discipline to run a reliable backup scheme. For a lot of people, including me, using an automated cloud backup system is the best protection against themselves. Two important points: first, some ransomware detects connected storage and will encrypt it and hold it for ransom. Make sure you can’t be caught that way.

            Second, I don’t know how many times I have talked to folks who had a good backup plan, but for some reason it would not restore at a critical moment. Backup systems have a way of not keeping up exactly with changing systems. You are not safe unless you run periodic test restores and verify that they work and you are backing up the data you think you are.

            And I would call you sensible, not paranoid. And encrypting anything you hand over to a third party cloud is an excellent practice. But be sure you can decrypt it if a brontosaurus steps on your house.
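A test restore can be checked mechanically by comparing checksums of the original files against the restored copies. The sketch below is a minimal illustration under stated assumptions: the function names and the two-directory layout are hypothetical, and it only reports files that are missing or different after a restore, not extra files.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_dir, restored_dir) -> list[str]:
    """Compare every file under source_dir with its restored counterpart.

    Returns a list of problems; an empty list means every source file
    was restored with identical contents.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(src) != file_digest(restored):
            problems.append(f"differs: {rel}")
    return problems
```

Running something like this after each periodic test restore turns "I think my backups work" into a list that is either empty or tells you what to fix.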

        • Felix J. Torres

          The degradation was hours, not days; and partial, not across the board. I actually saw no effect. Many others sailed on unaffected. They probably discovered an area or two that needs more redundancy or robustness to prevent a repeat. It’ll be fixed. Knowing you have a problem is the first step. 🙂

          The iCloud issues are a surprise in one way yet not so surprising in another.

          Apple really isn’t a Cloud company but they are very controlling of their user experience so relying on an external provider for what is becoming a critical part of that experience is somewhat unexpected and something they are going to have to address. Like with the Mac Pro issues, it seems they have been too focused on phones and tablets and not enough on the rest of their business. They are big enough to do both if they choose to.

          • Meanwhile, people elsewhere were complaining that their store could no longer accept credit card payments, and their ‘Internet Of Things’ devices were no longer working. One even claimed their ‘smart’ oven wouldn’t turn off until Amazon came back.

            ‘The Cloud’ has created a single point of failure for millions of things that people rely on.

            Every day I find more reasons to move to that cabin in the woods with my solar panels and wood stove…

            • Felix J. Torres

              The cloud is a metaphor, you know.
              If there’s one thing it isn’t supposed to be, it’s a single point.
              For starters there are multiple clouds competing for business: Amazon, Microsoft, Google, IBM, Oracle. Big companies can roll their own, too.
              Second, each commercial cloud is composed of multiple datacenters spread across the world. Microsoft even has one underwater. (Looking to save on cooling costs.)
              Third, though the media speaks of AWS going down, only a part of it went down. The rest stayed up as did the other players’ facilities.
              No need to head for the hills just yet.
              But if you want to plan for a major disaster, plan for another regional blackout. That is way more likely. Especially in the coastal states.

      • Depending on the Cloud is silly for individuals, especially at home. Our internet service goes down unpredictably but regularly – if we had the time and energy we’d yell at them. Instead, we switch easily to our own resources when that happens, and keep working.

        For businesses, this may be less of an option, but always expecting perfection results in people who don’t manage their time very well and work in crisis mode – these folk are bound to be disappointed as nothing is perfect.

    • In case anyone cares, when AWS says they aim to “deliver 99.999999999% durability” they are using a precise technical term.

      Durability measures the probability that all transactions that are partially committed will completely commit. For instance, if you place an order on Amazon, Amazon has built their system to make your order show up on your doorstep with the proper amount deducted from your account or otherwise correctly resolved 99.999999999% of the time. In the jargon, this is eleven-nines durability.

      Durability says nothing about when the order will show up, only guarantees that it eventually will. Clearly, durability is not the only metric that matters, but without a high level of durability, an online system is certainly not to be trusted.

      Most of the time, for high-availability systems, offering five-nines (99.999%) availability is considered very good. That would be a little over 5 minutes down time per year. Nine-nines (99.9999999%) is about three hundredths of a second down time per year. Comparing availability to durability is comparing apples to oranges, but proposing eleven-nines for anything is impressive. It takes heavy stones to back a claim like that.

      Bezos has built his empire on durability. Timely service is an essential, but the foundation of online commerce is durability. Durability means the store is always open. Amazon’s achievement there leaves me in awe.
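The availability figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name is made up for the example, and it assumes a 365-day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines.

    For example, nines=5 means 99.999% availability, so the service
    may be unavailable for a fraction 10**-5 of the year.
    """
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (5, 9):
        seconds = downtime_seconds_per_year(n)
        print(f"{n} nines: {seconds:.4f} s/year ({seconds / 60:.4f} min/year)")
```

Five nines works out to about 315 seconds (roughly 5.26 minutes) per year, and nine nines to about 0.03 seconds per year, matching the figures in the comment.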

  7. AWS powers the system for my day job. Was not a good day.

  8. That’s really weird. Is there some sort of major cyber attack going on?

    • Not likely. In a major cyber attack you would not be able to turn on a light and there would be fires burning. This is probably just expected noise.

      • Felix J. Torres

        It might have been an *attempt* at a cyber attack. How major nobody knows or will say, but the internet is under 24x7x365 attack anyway. All sorts of black hats out there. It’s a fact of daily life in the IT business. Take your eye off the ball for even a second and you will be impacted.

        Next thing you know you’ll have a “nice” Target moment or a Sony moment or a Yahoo moment or…

        It is literally a never-ending fight.

  9. I guess I was lucky. I never noticed any problem (though I wasn’t trying to upload/download/buy anything), and only knew there were issues because people kept posting about it. I’m on the East Coast, too.

  10. Heh, and people look at me funny when I comment that today’s kids are helpless if their cellphones die.

    Clouds are nice (one shading me now) but sometimes they rain on your parades, so be sure to carry an umbrella (backups and non-cloud ways of getting things done.)
