Browse Source

Merge pull request #66 from open-guides/more-updates

Various expansions
pull/73/head
Thanos Baskous 4 years ago
parent
commit
7ebe70440e
4 changed files with 171 additions and 133 deletions
  1. 4
    2
      AUTHORS.md
  2. 163
    129
      README.md
  3. 4
    2
      admin/authors-info.yml
  4. BIN
      figures/aws-market-landscape.png

+ 4
- 2
AUTHORS.md View File

@@ -11,15 +11,17 @@ The following people (in alphabetical order) have contributed to or reviewed thi
* [Donne Martin (donnemartin)](https://github.com/donnemartin)
* [Max Grigorev (forwidur)](https://github.com/forwidur)
* [Dmitry Golyshev (golyshev)](https://github.com/golyshev)
* [Joshua Levy (jlevy)](https://github.com/jlevy) — [45+](https://github.com/open-guides/og-aws/commits?author=jlevy)/[52+](https://github.com/open-guides/og-aws/issues?q=author%3Ajlevy) — _project editor_
* [Joshua Levy (jlevy)](https://github.com/jlevy) — [46+](https://github.com/open-guides/og-aws/commits?author=jlevy)/[59+](https://github.com/open-guides/og-aws/issues?q=author%3Ajlevy) — _general editor_
* Jurgen Philippaerts
* [KAZUYUKI TANIMURA (kazuyukitanimura)](https://github.com/kazuyukitanimura)
* [Marcello Bastéa-Forte (marcello3d)](https://github.com/marcello3d)
* [Max Zanko (max-zanko)](https://github.com/max-zanko)
* [John Merrells (merrells)](https://github.com/merrells)
* [Nitin Sharma (nitingithub)](https://github.com/nitingithub) — [1+](https://github.com/open-guides/og-aws/commits?author=nitingithub)/[1+](https://github.com/open-guides/og-aws/issues?q=author%3Anitingithub) — _editor (cost management)_
* [Ola Wiberg (olawiberg)](https://github.com/olawiberg)
* Praveen Patnala
* [Russell Power (rjpower)](https://github.com/rjpower)
* [Thanos Baskous (ThanosBaskous)](https://github.com/ThanosBaskous) — [8+](https://github.com/open-guides/og-aws/commits?author=ThanosBaskous)/[5+](https://github.com/open-guides/og-aws/issues?q=author%3AThanosBaskous) — _project editor_
* [Thanos Baskous (ThanosBaskous)](https://github.com/ThanosBaskous) — [8+](https://github.com/open-guides/og-aws/commits?author=ThanosBaskous)/[5+](https://github.com/open-guides/og-aws/issues?q=author%3AThanosBaskous) — _general editor_
* [Stefan Zier (weirded)](https://github.com/weirded)

Additional authors are welcome; see the [contribution guidelines](CONTRIBUTING.md).

+ 163
- 129
README.md View File

@@ -3,45 +3,51 @@ The Open Guide to Amazon Web Services

**Table of Contents**

- [Why an Open Guide?](#why-an-open-guide)
- [Scope](#scope)
- [General Information](#general-information)
- [Managing AWS](#managing-aws)
- [Managing Servers and Applications](#managing-servers-and-applications)
- [Billing and Cost Management](#billing-and-cost-management)
- [Security and IAM](#security-and-iam)
- [S3](#s3)
- [EC2](#ec2)
- [AMIs](#amis)
- [Auto Scaling](#auto-scaling)
- [EBS](#ebs)
- [EFS](#efs)
- [Load Balancers](#load-balancers)
- [Elastic IPs](#elastic-ips)
- [Glacier](#glacier)
- [RDS](#rds)
- [DynamoDB](#dynamodb)
- [ECS](#ecs)
- [Lambda](#lambda)
- [API Gateway](#api-gateway)
- [Route 53](#route-53)
- [CloudFormation](#cloudformation)
- [VPCs, Network Security, and Security Groups](#vpcs-network-security-and-security-groups)
- [KMS](#kms)
- [CloudFront](#cloudfront)
- [DirectConnect](#directconnect)
- [Redshift](#redshift)
- [EMR](#emr)
- [High Availability](#high-availability)
- [Further Reading](#further-reading)
- [Disclaimer](#disclaimer)
- [License](#license)

**List of Tables**

- [Service Matrix](#service-matrix)
- [AWS Product Maturity and Releases](#aws-product-maturity-and-releases)
- [Storage Durability, Availability, and Price](#storage-durability-availability-and-price)
- *Purpose*
- [Why an Open Guide?](#why-an-open-guide)
- [Scope](#scope)
- *AWS in General*
- [General Information](#general-information)
- [Managing AWS](#managing-aws)
- [Managing Servers and Applications](#managing-servers-and-applications)
- *Specific AWS Products*
- [Security and IAM](#security-and-iam)
- [S3](#s3)
- [EC2](#ec2)
- [AMIs](#amis)
- [Auto Scaling](#auto-scaling)
- [EBS](#ebs)
- [EFS](#efs)
- [Load Balancers](#load-balancers)
- [Elastic IPs](#elastic-ips)
- [Glacier](#glacier)
- [RDS](#rds)
- [DynamoDB](#dynamodb)
- [ECS](#ecs)
- [Lambda](#lambda)
- [API Gateway](#api-gateway)
- [Route 53](#route-53)
- [CloudFormation](#cloudformation)
- [VPCs, Network Security, and Security Groups](#vpcs-network-security-and-security-groups)
- [KMS](#kms)
- [CloudFront](#cloudfront)
- [DirectConnect](#directconnect)
- [Redshift](#redshift)
- [EMR](#emr)
- *Special Topics*
- [High Availability](#high-availability)
- [Billing and Cost Management](#billing-and-cost-management)
- [Further Reading](#further-reading)
- *Legal*
- [Disclaimer](#disclaimer)
- [License](#license)

**Figures and Tables**

- [Tools and Services Market Landscape](#tools-and-services-market-landscape): A selection of third-party companies/products
- [Service Matrix](#service-matrix): How AWS services compare to alternatives
- [AWS Product Maturity and Releases](#aws-product-maturity-and-releases): AWS product releases
- [Storage Durability, Availability, and Price](#storage-durability-availability-and-price): A quantitative comparison

Why an Open Guide?
------------------
@@ -76,13 +82,13 @@ Scope
- 🔹 Important or often overlooked tip
- ❗ Gotcha or warning (where risks or time or resource costs are significant)
- 🔸 Limitation or quirk (where it’s not quite so bad)
- 🐥 Relatively new or immature services
- 🐥 Relatively new (and perhaps immature) services or features
- ⏱ Performance discussions
- ⛓ Lock-in (decisions that are likely to tie you to AWS in a new or significant way)
- ⛓ Lock-in: Products or decisions that are likely to tie you to AWS in a new or significant way — that is, later moving to a non-AWS alternative would be costly in terms of engineering effort
- 🚪 Alternative non-AWS options
- 💸 Cost issues and discussion
- 💸 Cost issues, discussion, and gotchas
- 🕍 A mild warning attached to “full solution” or opinionated frameworks that may take significant time to understand and/or might not fit your needs exactly; the opposite of a point solution (the cathedral is a nod to [Raymond’s metaphor](https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar)\)
- 🚧 Areas where correction or improvement are needed (possibly with link to an issue — do help)
- 🚧 Areas where correction or improvement are needed (possibly with link to an issue — do help!)

General Information
-------------------
@@ -105,6 +111,7 @@ General Information
- and you can afford, manage, or negotiate its somewhat higher costs
- ... then AWS is likely a good option for your company.
- Each of those reasons above might point to situations where other services are preferable. In practice, many, if not most, tech startups as well as a number of modern large companies can or already do benefit from using AWS. Many large enterprises are partly migrating internal infrastructure to Azure, Google Cloud, and AWS.
- **Costs:** Billing and cost management are such big topics that we have [an entire section on this](#billing-and-cost-management).
- 🔹**EC2 vs. other services:** Most users of AWS are most familiar with [EC2](#ec2), AWS’ flagship virtual server product, and possibly a few others like S3 and ELBs. But AWS products now extend far beyond basic IaaS, and often companies do not properly understand or appreciate all the many AWS services and how they can be applied, due to the [sharply growing](#which-services-to-use) number of services, their novelty and complexity, branding confusion, and fear of ⛓lock-in to proprietary AWS technology. Although a bit daunting, it’s important for technical decision-makers in companies to understand the breadth of the AWS services and make informed decisions. (We hope this guide will help.)
- 🚪**AWS vs. other cloud providers:** While AWS is the dominant IaaS provider (31% market share in [this 2016 estimate](https://www.srgresearch.com/articles/aws-remains-dominant-despite-microsoft-and-google-growth-surges)), there is significant competition and alternatives that are better suited to some companies:
- The most significant direct competitor is [**Google Cloud**](https://cloud.google.com/). It arrived later to market than AWS, but has vast resources and is now used widely by many companies, including a few large ones. It is gaining market share. Not all AWS services have similar or analogous services in Google Cloud. And vice versa: In particular Google offers some more advanced machine learning-based services like the [Vision](https://cloud.google.com/vision/), [Speech](https://cloud.google.com/speech/), and [Natural Language](https://cloud.google.com/natural-language/) APIs. It’s not common to switch once you’re up and running, but it does happen: [Spotify migrated](http://www.wsj.com/articles/google-cloud-lures-amazon-web-services-customer-spotify-1456270951) from AWS to Google Cloud. There is more discussion [on Quora](https://www.quora.com/What-are-the-reasons-to-choose-AWS-over-Google-Cloud-or-vice-versa-for-a-high-traffic-web-application) about relative benefits.
@@ -117,8 +124,15 @@ General Information
- 🚪**AWS vs. web hosting:** If your main goal is to host a website or blog, and you don’t expect to be building an app or more complex service, you may wish consider one of the myriad of [web hosting services](https://www.google.com/search?q=web+hosting).
- 🚪**AWS vs. managed hosting:** Traditionally, many companies pay [managed hosting](https://en.wikipedia.org/wiki/Dedicated_hosting_service) providers to maintain physical servers for them, then build and deploy their software on top of the rented hardware. This makes sense for businesses who want direct control over hardware, due to legacy, performance, or special compliance constraints, but is usually considered old fashioned or unnecessary by many developer-centric startups and younger tech companies.
- **Complexity:** AWS will let you build and scale systems to the size of the largest companies, but the complexity of the services when used at scale requires significant depth of knowledge and experience. Even very simple use cases often require more knowledge to do “right” in AWS than in a simpler environment like Heroku or Digital Ocean. (This guide may help!)
- **Geographic locations:** AWS has data centers in [about 10 geographic locations](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions) (known as **regions**) in Europe, Asia, and North and South America. If your infrastructure needs to be in close physical proximity to another service for latency or throughput reasons (for example, latency to an ad exchange), viability of AWS will depend on the location.
- ⛓**Lock-in:** As you use AWS, it’s important to be aware when you are depending on AWS services that do not have equivalents elsewhere. Basic services like virtual servers in EC2 are usually easy to migrate to other vendors, but the more services you use, the more lock-in you have to AWS, and the more difficult it will be to change to other providers in the future. It is quite common to mix and match services from different vendors (such as using S3 for storage but a different vendor for serving) and, in larger enterprises, to hybridize between private cloud or on-premises servers and AWS.
- **Geographic locations:** AWS has data centers in [over a dozen geographic locations](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions), known as **regions**, in Europe, East Asia, North and South America, and now Australia and India. It also has many more **edge locations** globally for reduced latency of services like CloudFront.
- See the [current list](https://aws.amazon.com/about-aws/global-infrastructure/) of regions and edge locations, including upcoming ones.
- If your infrastructure needs to be in close physical proximity to another service for latency or throughput reasons (for example, latency to an ad exchange), viability of AWS may depend on the location.
- ⛓**Lock-in:** As you use AWS, it’s important to be aware when you are depending on AWS services that do not have equivalents elsewhere.
- Lock-in may be completely fine for your company. But it’s important from a business perspective to make this choice explicitly, and consider the cost, operational, business continuity, and competitive risks of being tied to AWS. AWS is such a dominant and reliable vendor, many companies are comfortable with using AWS to its full extent. Other companies avoid this.
- Generally, the more AWS services you use, the more lock-in you have to AWS — that is, the more engineering resources (time and money) it will take to change to other providers in the future.
- Basic services like virtual servers and standard databases are usually easy to migrate to other providers or on premises. Others like load balancers and IAM are specific to AWS but have close equivalents from other providers. The key thing to consider is whether engineers are architecting systems around specific AWS services that are not open source or relatively interchangeable. For example, Lambda, API Gateway, Kinesis, Redshift, and DynamoDB do not have have substantially equivalent open source or commercial service equivalents, while EC2, RDS (MySQL or Postgres), EMR, and ElastiCache more or less do. (See more [below](#which-services-to-use), where these are noted with ⛓.)
- **Combining AWS and other cloud providers:** Many customers combine AWS with other non-AWS services. For example, legacy systems or secure data might be in a managed hosting provider, while other systems are AWS. Or a company might only use S3 with another provider doing everything else. However small startups or projects starting fresh will typically stick to AWS or Google Cloud only.
- **Hybrid cloud:** In larger enterprises, it is common to have [hybrid deployments](https://aws.amazon.com/enterprise/hybrid/) encompassing private cloud or on-premises servers and AWS — or other enterprise cloud providers like [IBM](https://www.ibm.com/cloud-computing/solutions/hybrid-cloud)/[Bluemix](http://www.ibm.com/cloud-computing/bluemix/hybrid/), [Microsoft](https://www.microsoft.com/en-us/cloud-platform/hybrid-cloud)/[Azure](https://azure.microsoft.com/en-us/overview/azure-stack/), [NetApp](http://www.netapp.com/us/solutions/cloud/hybrid-cloud/), or [EMC](http://www.emc.com/en-us/cloud/hybrid-cloud-computing/index.htm).
- **Major customers:** Who uses AWS and Google Cloud?
- AWS’s [list of customers](https://aws.amazon.com/solutions/case-studies/) includes large numbers of mainstream online properties and major brands, such as Netflix, Pinterest, Spotify, Airbnb, Expedia, Yelp, Zynga, Comcast, Nokia, and Bristol-Myers Squibb.
- Google Cloud’s [list of customers](https://cloud.google.com/customers/) is large as well, and includes a few mainstream sites, such as [Snapchat](http://www.businessinsider.com/snapchat-is-built-on-googles-cloud-2014-1), Best Buy, Domino’s, and Sony Music.
@@ -127,8 +141,7 @@ General Information

- AWS offers a *lot* of different services — [about fifty](https://aws.amazon.com/products/) at last count.
- Most customers use a few services heavily, a few services lightly, and the rest not at all. What services you’ll use depends on your use cases. Choices differ substantially from company to company.
- Just because AWS has a service that sounds promising, it doesn’t mean you should use it. Some services are very narrow in use case, not mature, are overly opinionated, or have limitations, so very few people use them. More on this next.
- Many customers combine AWS with other non-AWS services. For example, legacy systems or secure data might be in a managed hosting provider, while other systems are AWS. Or a company might only use S3 with another provider doing everything else. However small startups or projects starting fresh will typically stick to AWS or Google Cloud only.
- **Immature and unpopular services:** Just because AWS has a service that sounds promising, it doesn’t mean you should use it. Some services are very narrow in use case, not mature, are overly opinionated, or have limitations, so building your own solution may be better. We try to give a sense for this by breaking products into categories.
- **Must-know infrastructure:** Most typical small to medium-size users will focus on the following services first. If you manage use of AWS systems, you likely need to know at least a little about all of these. (Even if you don’t use them, you should learn enough to make that choice intelligently.)
- [IAM](#security-and-iam): User accounts and identities (you need to think about accounts early on!)
- [EC2](#ec2): Virtual servers and associated components, including:
@@ -147,37 +160,37 @@ General Information
- [EMR](#emr): Managed Hadoop
- [Elasticsearch](https://aws.amazon.com/elasticsearch-service/): Managed Elasticsearch
- [ElastiCache](https://aws.amazon.com/elasticache/): Managed Redis and Memcached
- **Optional but important infrastructure:** These are key and useful infrastructure are less widely known used. You may have legitimate reasons to prefer alternatives, so evaluate with care you to be sure they fit your needs:
- [Lambda](#lambda): Running small, fully managed tasks “serverless”
- **Optional but important infrastructure:** These are key and useful infrastructure components that are less widely known and used. You may have legitimate reasons to prefer alternatives, so evaluate with care you to be sure they fit your needs:
- [Lambda](#lambda): Running small, fully managed tasks “serverless”
- [CloudTrail](https://aws.amazon.com/cloudtrail/): AWS API logging and audit (often neglected but important)
- 🕍[CloudFormation](#cloudformation): Templatized configuration of collections of AWS resources
- 🕍[CloudFormation](#cloudformation): Templatized configuration of collections of AWS resources
- 🕍[Elastic Beanstalk](https://aws.amazon.com/elasticbeanstalk/): Fully managed (PaaS) deployment of packaged Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker applications
- 🐥[EFS](https://aws.amazon.com/efs/): Network filesystem
- 🕍[ECS](#ecs): Docker container/cluster management. Note Docker can be used directly, without ECS.
- [ECR](https://aws.amazon.com/ecr/): Hosted private Docker registry.
- 🐥[EFS](https://aws.amazon.com/efs/): Network filesystem
- 🕍[ECS](#ecs): Docker container/cluster management (note Docker can also be used directly, without ECS)
- [ECR](https://aws.amazon.com/ecr/): Hosted private Docker registry
- 🐥[Config](https://aws.amazon.com/config/): AWS configuration inventory, history, change notifications
- **Special-purpose infrastructure:** These services are focused on specific use cases and should be evaluated if they apply to your situation:
- [DynamoDB](#dynamodb): Low-latency NoSQL key-value store
- [Glacier](#glacier): Slow and cheap alternative to S3
- [Kinesis](https://aws.amazon.com/kinesis/): Streaming (distributed log) service
- [SQS](https://aws.amazon.com/sqs/): Message queueing service
- [Redshift](#redshift): Data warehouse
- **Special-purpose infrastructure:** These services are focused on specific use cases and should be evaluated if they apply to your situation. Many also are proprietary architectures, so tend to tie you to AWS.
- [DynamoDB](#dynamodb): Low-latency NoSQL key-value store
- [Glacier](#glacier): Slow and cheap alternative to S3
- [Kinesis](https://aws.amazon.com/kinesis/): Streaming (distributed log) service
- [SQS](https://aws.amazon.com/sqs/): Message queueing service
- [Redshift](#redshift): Data warehouse
- 🐥[QuickSight](https://aws.amazon.com/quicksight/): Business intelligence service
- [SES](https://aws.amazon.com/ses/): Send and receive e-mail for marketing or transactions
- [API Gateway](https://aws.amazon.com/api-gateway/): Proxy, manage, and secure API calls
- [IoT](https://aws.amazon.com/iot/): Manage bidirectional communication over HTTP, WebSockets, and MQTT between AWS and clients (often but not necessarily “things” like appliances or sensors)
- [WAF](https://aws.amazon.com/waf/): Web firewall for CloudFront to deflect attacks
- [KMS](#kms): Store and manage encryption keys securely
- [API Gateway](https://aws.amazon.com/api-gateway/): Proxy, manage, and secure API calls
- [IoT](https://aws.amazon.com/iot/): Manage bidirectional communication over HTTP, WebSockets, and MQTT between AWS and clients (often but not necessarily “things” like appliances or sensors)
- [WAF](https://aws.amazon.com/waf/): Web firewall for CloudFront to deflect attacks
- [KMS](#kms): Store and manage encryption keys securely
- [Inspector](https://aws.amazon.com/inspector/): Security audit
- [Trusted Advisor](https://aws.amazon.com/premiumsupport/trustedadvisor/): Automated tips on reducing cost or making improvements
- ⛓🕍**Compound services:** These are similarly specific, but are full-blown services that tackle complex problems and may tie you in. Usefulness depends on your requirements. If you have large or significant need, you may have these already managed by in-house systems and engineering teams:
- **Compound services:** These are similarly specific, but are full-blown services that tackle complex problems and may tie you in. Usefulness depends on your requirements. If you have large or significant need, you may have these already managed by in-house systems and engineering teams.
- [Machine Learning](https://aws.amazon.com/machine-learning/): Machine learning model training and classification
- [Data Pipeline](https://aws.amazon.com/datapipeline/): Managed ETL service
- [SWF](https://aws.amazon.com/swf/): Managed background job workflow
- [Lumberyard](https://aws.amazon.com/lumberyard/): 3D game engine
- ⛓🕍[Data Pipeline](https://aws.amazon.com/datapipeline/): Managed ETL service
- ⛓🕍[SWF](https://aws.amazon.com/swf/): Managed background job workflow
- ⛓🕍[Lumberyard](https://aws.amazon.com/lumberyard/): 3D game engine
- **Mobile/app development:**
- [SNS](https://aws.amazon.com/sns/): Manage app push notifications and other end-user notifications
- [Cognito](https://aws.amazon.com/cognito/): User authentication via Facebook, Twitter, etc.
- ⛓🕍[Cognito](https://aws.amazon.com/cognito/): User authentication via Facebook, Twitter, etc.
- [Device Farm](https://aws.amazon.com/device-farm/): Cloud-based device testing
- [Mobile Analytics](https://aws.amazon.com/mobileanalytics/): Analytics solution for app usage
- 🕍[Mobile Hub](https://aws.amazon.com/mobile/): Comprehensive, managed mobile app framework
@@ -198,6 +211,16 @@ General Information
- 🕍[OpsWorks](https://aws.amazon.com/opsworks/): Management of your deployments using Chef. While Chef is popular, it seems few people use OpsWorks, since it involves going in on a whole different code deployment framework.
- [AWS in Plain English](https://www.expeditedssl.com/aws-in-plain-english) offers more friendly explanation of what all the other different services are.

### Tools and Services Market Landscape

There are now enough cloud and “big data” enterprise companies and products that few can keep up with the market landscape. (See the [Big Data Evolving Landscape – 2016](https://practicalanalytics.co/2016/02/09/big-data-evolving-landscape-2016/) for one attempt at this.)

We’ve assembled a landscape of a few of the services. This is far from complete, but tries to emphasize services that are popular with AWS practitioners — services that specifically help with AWS, or a complementary, or tools almost anyone using AWS must learn.

![Popular Tools and Services for AWS Practitioners](figures/aws-market-landscape.png)

🚧 *Suggestions to improve this figure? Please [file an issue](CONTRIBUTING.md).*

### Common Concepts

- 📒 The AWS [**General Reference**](https://docs.aws.amazon.com/general/latest/gr/Welcome.html) covers a bunch of common concepts that are relevant for multiple services.
@@ -295,7 +318,7 @@ It’s important to know the maturity of each AWS product. Here is a mostly comp
| [CloudWatch](https://aws.amazon.com/releasenotes/CloudWatch?browse=1) | 2009-05 | General |
| [CloudFront](https://aws.amazon.com/releasenotes/CloudFront?browse=1) | 2008-11 | General |
| [Fulfillment Web Service](https://aws.amazon.com/releasenotes/Amazon-FWS?browse=1) | 2008-03 | Obsolete? |
| [SimpleDB](https://aws.amazon.com/releasenotes/Amazon-SimpleDB?browse=1) | 2007-12 | Obsolete |
| [SimpleDB](https://aws.amazon.com/releasenotes/Amazon-SimpleDB?browse=1) | 2007-12 | ⚠️[Nearly obsolete](https://forums.aws.amazon.com/thread.jspa?threadID=121711) |
| [DevPay](https://aws.amazon.com/releasenotes/DevPay?browse=1) | 2007-12 | General |
| [Flexible Payments Service](https://aws.amazon.com/releasenotes/Amazon-FPS?browse=1) | 2007-08 | Retired |
| [EC2](https://aws.amazon.com/releasenotes/Amazon-EC2?browse=1) | 2006-08 | General |
@@ -322,7 +345,7 @@ It’s important to know the maturity of each AWS product. Here is a mostly comp
- These are a great resource, even if you’re not paying for premium support. Build a good relationship with them and make use of them, for questions, problems, and guidance.
- Assign a single point of contact on your company’s side, to avoid confusing or overwhelming them.
- **Contact:** The main web contact point for AWS is [here](https://aws.amazon.com/contact-us/). Many technical requests can be made via these channels.
- **Consulting:** For more hands-on assistance, AWS maintains a list of [consulting partners](https://aws.amazon.com/partners/consulting/). These won’t be cheap but depending on your needs, may save you costs long term by helping you set up your architecture more effectively, or offering specific expertise, e.g. security.
- **Consulting and managed services:** For more hands-on assistance, AWS has established relationships with many [consulting partners](https://aws.amazon.com/partners/consulting/) and [managed service partners](https://aws.amazon.com/partners/msp/). The big consultants won’t be cheap, but depending on your needs, may save you costs long term by helping you set up your architecture more effectively, or offering specific expertise, e.g. security. Managed service providers provide longer-term full-service management of cloud resources.
- **AWS Professional Services:** AWS provides [consulting services](https://aws.amazon.com/professional-services/) alone or in combination with partners.

### Restrictions and Other Notes
@@ -361,8 +384,8 @@ So if you’re not going to manage your AWS configurations manually, what should
- Ideally, you should only use the AWS Console in a few specific situations:
- It’s great for read-only usage. If you’re trying to understand the state of your system, logging in and browsing it is very helpful.
- It is also reasonably workable for very small systems and teams (for example, one engineer setting up one server that doesn’t change often).
- It can be useful for operations you’re only going to do rarely, like less than once a month. In this case using the console can be the simplest approach.
- ❗**Think before you use the console:** The AWS Console is convenient, but also the enemy of automation and reproducibility. If you’re likely to be making the same change multiple times, avoid the console. Favor some sort of automation, or at least have a path toward automation, as discussed next. Not only does using the console preclude automation, which wastes time later, but it prevents documentation, clarity, and standardization around processes for yourself and your team.
- It can be useful for operations you’re only going to do rarely, like less than once a month (for example, a one-time VPC setup you probably won’t revisit for a year). In this case using the console can be the simplest approach.
- ❗**Think before you use the console:** The AWS Console is convenient, but also the enemy of automation, reproducibility, and team communication. If you’re likely to be making the same change multiple times, avoid the console. Favor some sort of automation, or at least have a path toward automation, as discussed next. Not only does using the console preclude automation, which wastes time later, but it prevents documentation, clarity, and standardization around processes for yourself and your team.

### Command-Line tools

@@ -438,23 +461,6 @@ This guide is about AWS, not DevOps or server configuration management in genera

- **NTP and accurate time:** If you are not using Amazon Linux (which comes preconfigured), you should confirm your servers [configure NTP correctly](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#configure_ntp), to avoid insidious time drift (which can then cause all sorts of issues, from breaking API calls to misleading logs). This should be part of your automatic configuration for every server. If time has already drifted substantially (generally >1000 seconds), remember NTP won’t shift shift it back, so you may need to remediate manually (for example, [like this](http://askubuntu.com/questions/254826/how-to-force-a-clock-update-using-ntp) on Ubuntu).

Billing and Cost Management
---------------------------

- AWS offers a [**free tier**](https://aws.amazon.com/free/) of service, that allows very limited usage of resources at no cost. For example, a micro instance and small amount of storage is available for no charge. (If you have an old account but starting fresh, sign up for a new one to qualify for the free tier.) [AWS Activate](https://aws.amazon.com/activate/) extends this to tens of thousands of dollars of free credits to startups in [certain funds or accelerators](https://aws.amazon.com/activate/portfolio-detail/).
- You can set [**billing alerts**](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/free-tier-alarms.html) to be notified of unexpected costs, such as costs exceeding the free tier. You can set these in a [granular way](https://wblinks.com/notes/aws-tips-i-wish-id-known-before-i-started/#billing).
- AWS offers [Cost Explorer](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-explorer-what-is.html), a tool to get better visibility into costs.
- Unfortunately, the AWS console and billing tools are rarely enough to give good visibility into costs. For large accounts, the AWS billing console can time out or be too slow to use.
- **Tools:**
- 🔹Enable [billing reports](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/detailed-billing-reports.html) and install an open source tool to help manage or monitor AWS resource utilization. [**Netflix Ice**](https://github.com/Netflix/ice) is probably the first one you should try. Check out [docker-ice](https://github.com/jonbrouse/docker-ice) for a Dockerized version that eases installation.
- 🔸One challenge with Ice is that it doesn’t cover amortized cost of reserved instances.
- Other tools include [Security Monkey](https://github.com/Netflix/security_monkey) and [Cloud Custodian](https://github.com/capitalone/cloud-custodian).
- **Third-party services:** Several companies offer services designed to help you gain insights into expenses or lower your AWS bill, such as [OpsClarity](http://www.opsclarity.com/), [Cloudability](https://www.cloudability.com/), [CloudHealth Technologies](https://www.cloudhealthtech.com/), and [ParkMyCloud](http://www.parkmycloud.com/). Some of these charge a percentage of your bill, which may be expensive.
- AWS’s [Trusted Advisor](https://aws.amazon.com/premiumsupport/trustedadvisor/) is another service that can help with cost concerns.
- Don’t be shy about asking your account manager for guidance in reducing your bill. It’s their job to keep you happily using AWS.
- **Tagging for cost visibility:** As the infrastructure grows, a key part of managing costs is understanding where they lie. It’s strongly advisable to [tag resources](https://aws.amazon.com/blogs/aws/resource-groups-and-tagging/), and as complexity grows, group them effectively. If you [set up billing allocation appropriately](http://aws.amazon.com/blogs/aws/aws-cost-allocation/), you can then get visibility into expenses according to organization, product, individual engineer, or any other way that is helpful.
- If you need to do custom analysis of raw billing data or want to feed it to a third party cost analysis service, [enable](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/detailed-billing-reports.html#turnonreports) the [detailed billing report](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/detailed-billing-reports.html#detailed-billing-report) feature.
- Multiple Amazon accounts can be linked for billing purposes using the [Consolidated Billing](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html) feature. Large enterprises may need complex billing structures depending on ownership and approval processes.

Security and IAM
----------------
@@ -507,7 +513,9 @@ We cover security basics first, since configuring user accounts is something you
- [**Trusted Advisor**](https://aws.amazon.com/blogs/aws/trusted-advisor-console-basic/) addresses a variety of best practices, but also offers some basic security checks around IAM usage, security group configurations, and MFA.
- **Use KMS for managing keys**: AWS offers [KMS](#kms) for securely managing encryption keys, which is usually a far better option than handling key security yourself. See [below](#kms).
- [**AWS WAF**](https://aws.amazon.com/waf) is a web application firewall to help you protect your applications from common attack patterns.
- 🔹**Export and audit security settings:** You can audit security policies simply by exporting settings using AWS APIs, e.g. using a Boto script like [SecConfig.py](https://gist.github.com/jlevy/cce1b44fc24f94599d0a4b3e613cc15d) (from [this 2013 talk](http://www.slideshare.net/AmazonWebServices/intrusion-detection-in-the-cloud-sec402-aws-reinvent-2013)) and then reviewing and monitoring changes manually or automatically.
- **Security auditing:**
- [Security Monkey](https://github.com/Netflix/security_monkey) is an open source tool that is designed to assist with security audits.
- 🔹**Export and audit security settings:** You can audit security policies simply by exporting settings using AWS APIs, e.g. using a Boto script like [SecConfig.py](https://gist.github.com/jlevy/cce1b44fc24f94599d0a4b3e613cc15d) (from [this 2013 talk](http://www.slideshare.net/AmazonWebServices/intrusion-detection-in-the-cloud-sec402-aws-reinvent-2013)) and then reviewing and monitoring changes manually or automatically.

### Security and IAM Gotchas and Limitations

@@ -618,16 +626,18 @@ S3

### Storage Durability, Availability, and Price

As an illustration of comparative features and price, the table below gives S3 Standard, RRS, IA, in comparison with [Glacier](#glacier), [EBS](#ebs), and [EFS](#efs), using Virginia region as of August 2016. (Sources: S3 [pricing](https://aws.amazon.com/s3/pricing/), [SLA](https://aws.amazon.com/s3/sla/), [FAQ](https://aws.amazon.com/s3/faqs/), [RRS info](https://aws.amazon.com/s3/reduced-redundancy/), [Glacier pricing](https://aws.amazon.com/glacier/pricing/), [EBS pricing](https://aws.amazon.com/ebs/pricing/), [EFS pricing](https://aws.amazon.com/efs/pricing/), [EC2 SLA](https://aws.amazon.com/ec2/sla/).)
As an illustration of comparative features and price, the table below gives S3 Standard, RRS, IA, in comparison with [Glacier](#glacier), [EBS](#ebs), and [EFS](#efs), using **Virginia region** as of **August 2016**.

| | Durability (per year) | Availability “designed” | Availability SLA | Storage (per TB per month) | GET or retrieve (per million) | Write or archive (per million) |
|-----------------|-----------------------|-------------------------|------------------|--------------------------------------------------------------------------------------------------------------------------|-------------------------------|--------------------------------|
| **Glacier** | Eleven 9s | Sloooow | – | $7 | $50 | $50 |
| **S3 IA** | Eleven 9s | 99.9% | **99%** | $12.50 | $1 | $10 |
| **S3 RRS** | **99.99%** | 99.99% | 99.9% | $24 | $0.40 | $5 |
| **S3 Standard** | Eleven 9s | 99.99% | 99.9% | $30 | $0.40 | $5 |
| **EBS** | **99.8%** | Unstated | 99.95% | $25/$45/**$100**/$125+ ([sc1/st1/**gp2**/io1](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)\) | | |
| **EFS** | “High” | “High” | – | $300 | | |

| | Durability (per year) | Availability “designed” | Availability SLA | Storage (per TB per month) | GET or retrieve (per million) | Write or archive (per million) |
|-----------------|-----------------------|-------------------------|------------------|----------------------------|-------------------------------|--------------------------------|
| **Glacier** | Eleven 9s | Sloooow | – | $7 | $50 | $50 |
| **S3 IA** | Eleven 9s | 99.9% | 99% | $12.50 | $1 | $10 |
| **S3 RRS** | 99.99% | 99.99% | 99.9% | $24 | $0.40 | $5 |
| **S3 Standard** | Eleven 9s | 99.99% | 99.9% | $30 | $0.40 | $5 |
| **EBS** | 99.8% | Unstated | 99.95% | $100 and up | | |
| **EFS** | “High” | “High” | – | $300 | | |
Especially notable items are in **boldface**. Sources: [S3 pricing](https://aws.amazon.com/s3/pricing/), [S3 SLA](https://aws.amazon.com/s3/sla/), [S3 FAQ](https://aws.amazon.com/s3/faqs/), [RRS info](https://aws.amazon.com/s3/reduced-redundancy/), [Glacier pricing](https://aws.amazon.com/glacier/pricing/), [EBS availability and durability](https://aws.amazon.com/ebs/details/#Amazon_EBS_Availability_and_Durability), [EBS pricing](https://aws.amazon.com/ebs/pricing/), [EFS pricing](https://aws.amazon.com/efs/pricing/), [EC2 SLA](https://aws.amazon.com/ec2/sla/)

EC2
---
@@ -643,6 +653,7 @@ EC2
- Running EC2 is akin to running a set of physical servers, as long as you don’t do automatic scaling or tooled cluster setup. If you just run a set of static instances, migrating to another VPS or dedicated server provider should not be too hard.
- 🚪**Alternatives to EC2:** The direct alternatives are Google Cloud, Microsoft Azure, Rackspace, DigitalOcean and other VPS providers, some of which offer similar API for setting up and removing instances. (See the comparisons [above](#when-to-use-aws).)
- **Should you use Amazon Linux?** AWS encourages use of their own [Amazon Linux](https://aws.amazon.com/amazon-linux-ami/), which is evolved from from [Red Hat Enterprise Linux (RHEL)](https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux) and [CentOS](https://en.wikipedia.org/wiki/CentOS). It’s used by many, but [others are skeptical](https://www.exratione.com/2014/08/do-not-use-amazon-linux/). Whatever you do, think this decision through carefully. It’s true Amazon Linux is heavily tested and better supported in the unlikely event you have deeper issues with OS and virtualization on EC2. But in general, many companies do just fine using a standard, non-Amazon Linux distribution, such as Ubuntu or CentOS. Using a standard Linux distribution means you have an exactly replicable environment should you use another hosting provider instead of (or in addition to) AWS. It’s also helpful if you wish to test deployments on local developer machines running the same standard Linux distribution (a practice that’s getting more common with Docker, too, and not currently possible with Amazon Linux).
- **EC2 costs:** See the [section on this](#ec2-cost-management).

### EC2 Tips

@@ -664,33 +675,6 @@ EC2
- How to avoid sharing keys; how to add individual ssh keys for individual users.
- **GPU support:** You can rent GPU-enabled instances on EC2. There are [two instance types](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cluster_computing.html). Both sport an NVIDIA card (K520, 1536 CUDA cores and M2050, 448 CUDA cores).

### EC2 Cost Management

💸 This section covers EC2 cost management; be sure to read the [general cost management section](#billing-and-cost-management) as well.

- With EC2, there is a trade-off between engineering effort (more analysis, more tools, more complex architectures) and spend rate on AWS. If your EC2 costs are small, many of the efforts here are not worth the engineering time required to make them work. But once you know your costs will be growing in excess of an engineer’s salary, serious investment is often worthwhile.
- 🔹**Spot instances:**
- EC2 [spot instances](https://aws.amazon.com/ec2/spot/) are a way to get EC2 resources at significant discount — often many times cheaper than standard on-demand prices — if you’re willing to accept the possibility that they be terminated with little to no warning.
- Use spot instances for potentially very significant discounts whenever you can use resources that may be restarted and don’t maintain long-term state.
- The huge savings that you can get with Spot come at the cost of a significant increase in complexity when provisioning and reasoning about the availability of compute capacity.
- Amazon maintains spot prices at a market-driven fluctuating level, based on their inventory of unused capacity. Prices are typically low but can [spike](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html#spot-bid-limit) very high. See the [price history](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances-history.html) to get a sense for this.
- You set a bid price high to indicate how high you’re willing to pay, but you only pay the going rate, not the bid rate. If the market rate exceeds the bid, your instance may be terminated.
- Prices are per instance type and per availability zone. The same instance type may have wildly different price in different zones at the same time. Different instance types can have very different prices, even for similarly powered instance types in the same zone.
- Compare prices across instance types for better deals.
- Use spot instances whenever possible. Setting a high bid price will assure your machines stay up the vast majority of the time, at a fraction of the price of normal instances.
- Get notified up to two minutes before price-triggered shutdown by polling [your spot instances’ metadata](https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termination-notices/).
- Make sure your usage profile works well for Spot before investing heavily in tools to manage a particular configuration.
- **Spot fleet:**
- You can realize even bigger cost reductions at the same time as improvements to fleet stability relative to regular spot usage by using [Spot fleet](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html) to bid on instances across instance types, availability zones, and (through multiple Spot Fleet Requests) regions.
- Spot fleet targets maintaining a specified (and weighted-by-instance-type) total capacity across a cluster of servers. If the spot price of one instance type and availability zone combination rises above the weighted bid, it will rotate running instances out and bring up new ones of another type and location up in order to maintain the target capacity without going over target cluster cost.
- **Reserved Instances** allow you to get significant discounts on EC2 compute hours in return for a commitment to pay for instance hours of a specific instance type in a specific AWS region and availability zone for a pre-established time frame (1 or 3 years). Further discounts can be realized through “partial” or “all upfront” payment options.
- Consider using Reserved Instances when you can predict your longer-term compute needs and need a stronger guarantee of compute availability and continuity than the (typically cheaper) spot market can provide. However be aware that if your architecture changes your computing needs may change as well so long term contracts can seem attractive but may turn out to be cumbersome.
- Instance reservations are not tied to specific EC2 instances - they are applied at the billing level to eligible compute hours as they are consumed across all of the instances in an account.
- **Hourly billing waste:** EC2 instances are [billed in instance-hours](https://aws.amazon.com/ec2/faqs/#How_will_I_be_charged_and_billed_for_my_use_of_Amazon_EC2) — rounded up to the nearest full hour! For long-lived instances, this is not a big worry, but for large transient deployments, like EMR jobs or test deployments, this can be a significant expense. Never deploy many instances and terminate them after only a few minutes. In fact, if transient instances are part of your regular processing workflow, you should put in protections or alerts to check for this kind of waste.
- If you have multiple AWS accounts and have configured them to roll charges up to one account using the “Consolidated Billing” feature, you can expect *unused* Reserved Instance hours from one account to be applied to matching (region, availability zone, instance type) compute hours from another account.
- If you have multiple AWS accounts that are linked with Consolidated Billing, plan on using reservations, and want unused reservation capacity to be able to apply to compute hours from other accounts, you’ll need to create your instances in the availability zone with the same *name* across accounts. Keep in mind that when you have done this, your instances may not end up in the same *physical* data center across accounts - Amazon shuffles availability zones names across accounts in order to equalize resource utilization.
- Make use of dynamic [Auto Scaling](#auto-scaling), where possible, in order to better match your cluster size (and cost) to the current resource requirements of your service.

### EC2 Gotchas and Limitations

- ❗Never use ssh passwords. Just don’t do it; they are too insecure, and consequences of compromise too severe. Use keys instead. [Read up on this](https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2) and fully disable ssh password access to your ssh server by making sure 'PasswordAuthentication no' is in your /etc/ssh/sshd_config file. If you’re careful about managing ssh private keys everywhere they are stored, it is a major improvement on security over password-based authentication.
@@ -835,7 +819,7 @@ Load Balancers
### ALB Basics

- 📒 [Homepage](https://aws.amazon.com/elasticloadbalancing/applicationloadbalancer/) ∙ [User guide](https://aws.amazon.com/elasticloadbalancing/applicationloadbalancer/developer-resources/) ∙ [FAQ](https://aws.amazon.com/elasticloadbalancing/applicationloadbalancer/faqs/) ∙ [Pricing](https://aws.amazon.com/elasticloadbalancing/applicationloadbalancer/pricing/)
- **Websockets and HTTP/2** are [now supported](https://aws.amazon.com/blogs/aws/new-aws-application-load-balancer/).
- 🐥**Websockets and HTTP/2** are [now supported](https://aws.amazon.com/blogs/aws/new-aws-application-load-balancer/).
- Prior to the Application Load Balancer, you were advised to use TCP instead of HTTP as the protocol to make it work (as described [here](http://www.quora.com/When-will-Amazon-ELB-offer-SPDY-support)) and use [the obscure but useful Proxy Protocol](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/enable-proxy-protocol.html) ([more on this](https://chrislea.com/2014/03/20/using-proxy-protocol-nginx/)) to pass client IPs over a TCP load balancer.

### ALB Tips
@@ -1116,6 +1100,10 @@ CloudFront

### CloudFront Tips

- 🐥**HTTP/2** is [now supported](https://aws.amazon.com/about-aws/whats-new/2016/09/amazon-cloudfront-now-supports-http2/)! Clients [must support TLS 1.2 and SNI](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesSupportedHTTPVersions).
- While the most common use is for users to browse and download content (GET or HEAD methods) requests, CloudFront also supports ([since 2013](https://aws.amazon.com/blogs/aws/amazon-cloudfront-content-uploads-post-put-other-methods/)) uploaded data (POST, PUT, DELETE, OPTIONS, and PATCH).
- You must enable this by specifying the [allowed HTTP methods](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesAllowedHTTPMethods) when you create the distribution.
- Interestingly, the cost of accepting (uploaded) data [is usually less](https://aws.amazon.com/cloudfront/pricing/) than for sending (downloaded) data.
- In its basic version, CloudFront [supports SSL](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/SecureConnections.html) via the [SNI extension to TLS](https://en.wikipedia.org/wiki/Server_Name_Indication), which is supported by all modern web browsers. If you need to support older browsers, you need to pay a few hundred dollars a month for dedicated IPs.
- 💸⏱Consider invalidation needs carefully. CloudFront [does support invalidation](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html) of objects from edge locations, but this typically takes many minutes to propagate to edge locations, and costs $0.005 per request after the first 1000 requests. (Some other CDNs support this better.)
- Everyone should use TLS nowadays if possible. [Ilya Grigorik’s table](https://istlsfastyet.com/#cdn-paas) offers a good summary of features regarding TLS performance features of CloudFront.
@@ -1124,7 +1112,6 @@ CloudFront

### CloudFront Gotchas and Limitations

- HTTP/2 is not yet supported.
- If using S3 as a backing store, remember that the endpoints for website hosting and for general S3 are different. Example: “bucketname.s3.amazonaws.com” is a standard S3 serving endpoint, but to have redirect and error page support, you need to use the website hosting endpoint listed for that bucket, e.g. “bucketname.s3-website-us-east-1.amazonaws.com” (or the appropriate region).

DirectConnect
@@ -1186,7 +1173,7 @@ EMR
### EMR Tips

- EMR relies on many versions of Hadoop and other supporting software. Be sure to check [which versions are in use](https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-components.html).
- 💸❗**EMR costs** can pile up quickly since it involves lots of instances, efficiency can be poor depending on cluster configuration and choice of workload, and accidents like hung jobs are costly. See the [section on EC2 cost management](#ec2-cost-management), especially the tips there about spot instances and avoiding hourly billing. [This blog post](http://engineering.bloomreach.com/strategies-for-reducing-your-amazon-emr-costs/) has additional tips.
- 💸❗**EMR costs** can pile up quickly since it involves lots of instances, efficiency can be poor depending on cluster configuration and choice of workload, and accidents like hung jobs are costly. See the [section on EC2 cost management](#ec2-cost-management), especially the tips there about Spot instances and avoiding hourly billing. [This blog post](http://engineering.bloomreach.com/strategies-for-reducing-your-amazon-emr-costs/) has additional tips.
- ⏱Off-the-shelf EMR and Hadoop can have significant overhead when compared with efficient processing on a single machine. If your data is small and performance matters, you may wish to consider alternatives, as [this post](http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html) illustrates.
- Python programmers may want to take a look at Yelp’s [mrjob](https://github.com/Yelp/mrjob).
- It takes time to tune performance of EMR jobs, which is why third-party services such as [Qubole’s data service](https://www.qubole.com/mapreduce-as-a-service/) are gaining popularity as ways to improve performance or reduce costs.
@@ -1207,6 +1194,8 @@ This section covers tips and information on achieving [high availability](https:
- The bulk of outages in AWS services affect one zone only. There have been rare outages affecting multiple zones simultaneously (for example, the [great EBS failure of 2011](http://aws.amazon.com/message/65648/)) but in general most customers’ outages are due to using only a single AZ for some infrastructure.
- Consequently, design your architecture to minimize the impact of AZ outages, especially single-zone outages.
- Deploy key infrastructure across at least two or three AZs. Replicating a single resource across more than three zones often won’t make sense if you have other backup mechanisms in place, like S3 snapshots.
- A second or third AZ should significantly improve availability, but additional reliability of 4 or more AZs may not justify the costs or complexity (unless you have other reasons like capacity or Spot market prices).
- 💸Watch out for **cross-AZ traffic costs**. This can be an unpleasant surprise in architectures with large volume of traffic crossing AZ boundaries.
- Deploy instances evenly across all available AZs, so that only a minimal fraction of your capacity is lost in case of an AZ outage.
- If your architecture has single points of failure, put all of them into a single AZ. This may seem counter-intuitive, but it minimizes the likelihood of any one SPOF to go down on an outage of a single AZ.
- **EBS vs instance storage:** For a number of years, EBSs had a poorer track record for availability than instance storage. For systems where individual instances can be killed and restarted easily, instance storage with sufficient redundancy could give higher availability overall. EBS has improved, and modern instance types (since 2015) are now EBS-only, so this approach, while helpful at one time, may be increasingly archaic.
@@ -1217,6 +1206,51 @@ This section covers tips and information on achieving [high availability](https:
- **AZ naming** differs from one customer account to the next. Your “us-west-1a” is not the same as another customer’s “us-west-1a” — the letters are assigned to physical AZs randomly per account. This can also be a gotcha if you have multiple AWS accounts.
- **Cross-AZ traffic** is not free. At large scale, the costs add up to a significant amount of money. If possible, optimize your traffic to stay within the same AZ as much as possible.

Billing and Cost Management
---------------------------

### Billing and Cost Visibility

- AWS offers a [**free tier**](https://aws.amazon.com/free/) of service, that allows very limited usage of resources at no cost. For example, a micro instance and small amount of storage is available for no charge. (If you have an old account but starting fresh, sign up for a new one to qualify for the free tier.) [AWS Activate](https://aws.amazon.com/activate/) extends this to tens of thousands of dollars of free credits to startups in [certain funds or accelerators](https://aws.amazon.com/activate/portfolio-detail/).
- You can set [**billing alerts**](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/free-tier-alarms.html) to be notified of unexpected costs, such as costs exceeding the free tier. You can set these in a [granular way](https://wblinks.com/notes/aws-tips-i-wish-id-known-before-i-started/#billing).
- AWS offers [Cost Explorer](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-explorer-what-is.html), a tool to get better visibility into costs.
- Unfortunately, the AWS console and billing tools are rarely enough to give good visibility into costs. For large accounts, the AWS billing console can time out or be too slow to use.
- **Tools:**
- 🔹Enable [billing reports](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/detailed-billing-reports.html) and install an open source tool to help manage or monitor AWS resource utilization. [**Netflix Ice**](https://github.com/Netflix/ice) is probably the first one you should try. Check out [docker-ice](https://github.com/jonbrouse/docker-ice) for a Dockerized version that eases installation.
- 🔸One challenge with Ice is that it doesn’t cover amortized cost of reserved instances.
- Other tools include [Security Monkey](https://github.com/Netflix/security_monkey) and [Cloud Custodian](https://github.com/capitalone/cloud-custodian).
- **Third-party services:** Several companies offer services designed to help you gain insights into expenses or lower your AWS bill, such as [OpsClarity](http://www.opsclarity.com/), [Cloudability](https://www.cloudability.com/), [CloudHealth Technologies](https://www.cloudhealthtech.com/), and [ParkMyCloud](http://www.parkmycloud.com/). Some of these charge a percentage of your bill, which may be expensive. See the [market landscape](#tools-and-services-market-landscape).
- AWS’s [Trusted Advisor](https://aws.amazon.com/premiumsupport/trustedadvisor/) is another service that can help with cost concerns.
- Don’t be shy about asking your account manager for guidance in reducing your bill. It’s their job to keep you happily using AWS.
- **Tagging for cost visibility:** As the infrastructure grows, a key part of managing costs is understanding where they lie. It’s strongly advisable to [tag resources](https://aws.amazon.com/blogs/aws/resource-groups-and-tagging/), and as complexity grows, group them effectively. If you [set up billing allocation appropriately](http://aws.amazon.com/blogs/aws/aws-cost-allocation/), you can then get visibility into expenses according to organization, product, individual engineer, or any other way that is helpful.
- If you need to do custom analysis of raw billing data or want to feed it to a third party cost analysis service, [enable](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/detailed-billing-reports.html#turnonreports) the [detailed billing report](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/detailed-billing-reports.html#detailed-billing-report) feature.
- Multiple Amazon accounts can be linked for billing purposes using the [Consolidated Billing](http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html) feature. Large enterprises may need complex billing structures depending on ownership and approval processes.

### EC2 Cost Management

- With EC2, there is a trade-off between engineering effort (more analysis, more tools, more complex architectures) and spend rate on AWS. If your EC2 costs are small, many of the efforts here are not worth the engineering time required to make them work. But once you know your costs will be growing in excess of an engineer’s salary, serious investment is often worthwhile.
- 🔹**Spot instances:**
- EC2 [Spot instances](https://aws.amazon.com/ec2/spot/) are a way to get EC2 resources at significant discount — often many times cheaper than standard on-demand prices — if you’re willing to accept the possibility that they be terminated with little to no warning.
- Use Spot instances for potentially very significant discounts whenever you can use resources that may be restarted and don’t maintain long-term state.
- The huge savings that you can get with Spot come at the cost of a significant increase in complexity when provisioning and reasoning about the availability of compute capacity.
- Amazon maintains Spot prices at a market-driven fluctuating level, based on their inventory of unused capacity. Prices are typically low but can [spike](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html#spot-bid-limit) very high. See the [price history](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances-history.html) to get a sense for this.
- You set a bid price high to indicate how high you’re willing to pay, but you only pay the going rate, not the bid rate. If the market rate exceeds the bid, your instance may be terminated.
- Prices are per instance type and per availability zone. The same instance type may have wildly different price in different zones at the same time. Different instance types can have very different prices, even for similarly powered instance types in the same zone.
- Compare prices across instance types for better deals.
- Use Spot instances whenever possible. Setting a high bid price will assure your machines stay up the vast majority of the time, at a fraction of the price of normal instances.
- Get notified up to two minutes before price-triggered shutdown by polling [your Spot instances’ metadata](https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termination-notices/).
- Make sure your usage profile works well for Spot before investing heavily in tools to manage a particular configuration.
- **Spot fleet:**
- You can realize even bigger cost reductions at the same time as improvements to fleet stability relative to regular Spot usage by using [Spot fleet](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html) to bid on instances across instance types, availability zones, and (through multiple Spot Fleet Requests) regions.
- Spot fleet targets maintaining a specified (and weighted-by-instance-type) total capacity across a cluster of servers. If the Spot price of one instance type and availability zone combination rises above the weighted bid, it will rotate running instances out and bring up new ones of another type and location up in order to maintain the target capacity without going over target cluster cost.
- **Reserved Instances** allow you to get significant discounts on EC2 compute hours in return for a commitment to pay for instance hours of a specific instance type in a specific AWS region and availability zone for a pre-established time frame (1 or 3 years). Further discounts can be realized through “partial” or “all upfront” payment options.
- Consider using Reserved Instances when you can predict your longer-term compute needs and need a stronger guarantee of compute availability and continuity than the (typically cheaper) Spot market can provide. However be aware that if your architecture changes your computing needs may change as well so long term contracts can seem attractive but may turn out to be cumbersome.
- Instance reservations are not tied to specific EC2 instances - they are applied at the billing level to eligible compute hours as they are consumed across all of the instances in an account.
- **Hourly billing waste:** EC2 instances are [billed in instance-hours](https://aws.amazon.com/ec2/faqs/#How_will_I_be_charged_and_billed_for_my_use_of_Amazon_EC2) — rounded up to the nearest full hour! For long-lived instances, this is not a big worry, but for large transient deployments, like EMR jobs or test deployments, this can be a significant expense. Never deploy many instances and terminate them after only a few minutes. In fact, if transient instances are part of your regular processing workflow, you should put in protections or alerts to check for this kind of waste.
- If you have multiple AWS accounts and have configured them to roll charges up to one account using the “Consolidated Billing” feature, you can expect *unused* Reserved Instance hours from one account to be applied to matching (region, availability zone, instance type) compute hours from another account.
- If you have multiple AWS accounts that are linked with Consolidated Billing, plan on using reservations, and want unused reservation capacity to be able to apply to compute hours from other accounts, you’ll need to create your instances in the availability zone with the same *name* across accounts. Keep in mind that when you have done this, your instances may not end up in the same *physical* data center across accounts - Amazon shuffles availability zones names across accounts in order to equalize resource utilization.
- Make use of dynamic [Auto Scaling](#auto-scaling), where possible, in order to better match your cluster size (and cost) to the current resource requirements of your service.

Further Reading
---------------


+ 4
- 2
admin/authors-info.yml View File

@@ -15,8 +15,8 @@ exclude:
ReadmeCritic

roles:
jlevy: project editor
ThanosBaskous: project editor
jlevy: general editor
ThanosBaskous: general editor
marcello3d:
golyshev:
forwidur:
@@ -31,3 +31,5 @@ roles:
alexanderatallah:
donnemartin:
benkehoe:
Jurgen Philippaerts:
nitingithub: editor (cost management)

BIN
figures/aws-market-landscape.png View File


Loading…
Cancel
Save