Browse Source

Improvemetns from review. Couple more tweaks.

pull/56/head
Joshua Levy 4 years ago
parent
commit
c777b2e83b
2 changed files with 80 additions and 80 deletions
  1. 79
    79
      README.md
  2. 1
    1
      admin/authors-info.yml

+ 79
- 79
README.md View File

@@ -40,7 +40,7 @@ The Open Guide to Amazon Web Services
**List of Tables**

- [Service Matrix](#service-matrix)
- [Maturity and Releases](#maturity-and-releases)
- [AWS Product Maturity and Releases](#aws-product-maturity-and-releases)
- [Storage Durability, Availability, and Price](#storage-durability-availability-and-price)

Why an Open Guide?
@@ -245,62 +245,62 @@ Selected resources with more detail on this chart:

- Google internal: [MapReduce](http://research.google.com/archive/mapreduce.html), [Bigtable](http://research.google.com/archive/bigtable.html), [Spanner](http://research.google.com/archive/spanner.html), [F1 vs Spanner](http://highscalability.com/blog/2013/10/8/f1-and-spanner-holistically-compared.html), [Bigtable vs Megastore](http://perspectives.mvdirona.com/2008/07/google-megastore/)

### Maturity and Releases
It’s important to know the maturity of each product. Here is a mostly complete list of first release date, with links to the [release notes](https://aws.amazon.com/releasenotes/). Most recently released services are first. Not all services are available in all regions; see [this table](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).
| Service | Original release | Availability |
|-----------------------------------------------------------------------------------------------------------|------------------|--------------|
| [Database Migration Service](https://aws.amazon.com/releasenotes/AWS-Database-Migration-Service?browse=1) | 2016-03 | General |
| [IoT](https://aws.amazon.com/blogs/aws/aws-iot-now-generally-available/) | 2015-08 | General |
| [WAF](https://aws.amazon.com/releasenotes/AWS-WAF?browse=1) | 2015-10 | General |
| [Data Pipeline](https://aws.amazon.com/releasenotes/AWS-Data-Pipeline?browse=1) | 2015-10 | General |
| [Elasticsearch](https://aws.amazon.com/releasenotes/Amazon-Elasticsearch-Service?browse=1) | 2015-10 | General |
| [Service Catalog](https://aws.amazon.com/releasenotes/AWS-Service-Catalog?browse=1) | 2015-07 | General |
| [CodePipeline](https://aws.amazon.com/releasenotes/AWS-CodePipeline?browse=1) | 2015-07 | General |
| [CodeCommit](https://aws.amazon.com/releasenotes/AWS-CodeCommit?browse=1) | 2015-07 | General |
| [API Gateway](https://aws.amazon.com/releasenotes/Amazon-API-Gateway?browse=1) | 2015-07 | General |
| [Config](https://aws.amazon.com/releasenotes/AWS-Config?browse=1) | 2015-06 | General |
| [EFS](https://aws.amazon.com/releasenotes/Amazon-EFS?browse=1) | 2015-05 | Preview |
| [Machine Learning](https://aws.amazon.com/releasenotes/AmazonML?browse=1) | 2015-04 | General |
| [Lambda](https://aws.amazon.com/releasenotes/AWS-Lambda?browse=1) | 2014-11 | General |
| [ECS](https://aws.amazon.com/ecs/release-notes/) | 2014-11 | General |
| [KMS](https://aws.amazon.com/releasenotes/AWS-KMS?browse=1) | 2014-11 | General |
| [CodeDeploy](https://aws.amazon.com/releasenotes/AWS-CodeDeploy?browse=1) | 2014-11 | General |
| [Kinesis](https://aws.amazon.com/releasenotes/Amazon-Kinesis?browse=1) | 2013-12 | General |
| [CloudTrail](https://aws.amazon.com/releasenotes/AWS-CloudTrail?browse=1) | 2013-11 | General |
| [AppStream](https://aws.amazon.com/releasenotes/Amazon-AppStream?browse=1) | 2013-11 | Preview |
| [CloudHSM](https://aws.amazon.com/releasenotes/AWS-CloudHSM?browse=1) | 2013-03 | General |
| [Silk](https://aws.amazon.com/releasenotes/Amazon-Silk?browse=1) | 2013-03 | Obsolete? |
| [OpsWorks](https://aws.amazon.com/releasenotes/AWS-OpsWorks?browse=1) | 2013-02 | General |
| [Redshift](https://aws.amazon.com/releasenotes/Amazon-Redshift?browse=1) | 2013-02 | General |
| [Elastic Transcoder](https://aws.amazon.com/releasenotes/Amazon-Elastic-Transcoder?browse=1) | 2013-01 | General |
| [Glacier](https://aws.amazon.com/releasenotes/Amazon-Glacier?browse=1) | 2012-08 | General |
| [CloudSearch](https://aws.amazon.com/releasenotes/Amazon-CloudSearch?browse=1) | 2012-04 | General |
| [SWF](https://aws.amazon.com/releasenotes/Amazon-SWF?browse=1) | 2012-02 | General |
| [Storage Gateway](https://aws.amazon.com/releasenotes/AWS-Storage-Gateway?browse=1) | 2012-01 | General |
| [DynamoDB](https://aws.amazon.com/releasenotes/Amazon-DynamoDB?browse=1) | 2012-01 | General |
| [DirectConnect](https://aws.amazon.com/releasenotes/AWS-Direct-Connect?browse=1) | 2011-08 | General |
| [ElastiCache](https://aws.amazon.com/releasenotes/Amazon-ElastiCache?browse=1) | 2011-08 | General |
| [CloudFormation](https://aws.amazon.com/releasenotes/AWS-CloudFormation?browse=1) | 2011-04 | General |
| [SES](https://aws.amazon.com/releasenotes/Amazon-SES?browse=1) | 2011-01 | General |
| [Elastic Beanstalk](https://aws.amazon.com/releasenotes/AWS-Elastic-Beanstalk?browse=1) | 2010-12 | General |
| [Route 53](https://aws.amazon.com/releasenotes/Amazon-Route-53?browse=1) | 2010-10 | General |
| [IAM](https://aws.amazon.com/releasenotes/AWS-Identity-and-Access-Management?browse=1) | 2010-09 | General |
| [SNS](https://aws.amazon.com/releasenotes/Amazon-SNS?browse=1) | 2010-04 | General |
| [EMR](https://aws.amazon.com/releasenotes/Elastic-MapReduce?browse=1) | 2010-04 | General |
| [RDS](https://aws.amazon.com/releasenotes/Amazon-RDS?browse=1) | 2009-12 | General |
| [VPC](https://aws.amazon.com/releasenotes/Amazon-VPC?browse=1) | 2009-08 | General |
| [Snowball](https://aws.amazon.com/releasenotes/AWS-ImportExport?browse=1) | 2009-05 | General |
| [CloudWatch](https://aws.amazon.com/releasenotes/CloudWatch?browse=1) | 2009-05 | General |
| [CloudFront](https://aws.amazon.com/releasenotes/CloudFront?browse=1) | 2008-11 | General |
| [Fulfillment Web Service](https://aws.amazon.com/releasenotes/Amazon-FWS?browse=1) | 2008-03 | Obsolete? |
| [SimpleDB](https://aws.amazon.com/releasenotes/Amazon-SimpleDB?browse=1) | 2007-12 | Obsolete |
| [DevPay](https://aws.amazon.com/releasenotes/DevPay?browse=1) | 2007-12 | General |
| [Flexible Payments Service](https://aws.amazon.com/releasenotes/Amazon-FPS?browse=1) | 2007-08 | Retired |
| [EC2](https://aws.amazon.com/releasenotes/Amazon-EC2?browse=1) | 2006-08 | General |
| [SQS](https://aws.amazon.com/releasenotes/Amazon-SQS?browse=1) | 2006-07 | General |
| [S3](https://aws.amazon.com/releasenotes/Amazon-S3?browse=1) | 2006-03 | General |
### AWS Product Maturity and Releases
It’s important to know the maturity of each AWS product. Here is a mostly complete list of first release date, with links to the [release notes](https://aws.amazon.com/releasenotes/). Most recently released services are first. Not all services are available in all regions; see [this table](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).
| Service | Original release | Availability |
|------------------------------------------------------------------------------------------------------------|------------------|--------------|
| 🐥[Database Migration Service](https://aws.amazon.com/releasenotes/AWS-Database-Migration-Service?browse=1) | 2016-03 | General |
| 🐥[IoT](https://aws.amazon.com/blogs/aws/aws-iot-now-generally-available/) | 2015-08 | General |
| 🐥[WAF](https://aws.amazon.com/releasenotes/AWS-WAF?browse=1) | 2015-10 | General |
| 🐥[Data Pipeline](https://aws.amazon.com/releasenotes/AWS-Data-Pipeline?browse=1) | 2015-10 | General |
| 🐥[Elasticsearch](https://aws.amazon.com/releasenotes/Amazon-Elasticsearch-Service?browse=1) | 2015-10 | General |
| 🐥[Service Catalog](https://aws.amazon.com/releasenotes/AWS-Service-Catalog?browse=1) | 2015-07 | General |
| 🐥[CodePipeline](https://aws.amazon.com/releasenotes/AWS-CodePipeline?browse=1) | 2015-07 | General |
| 🐥[CodeCommit](https://aws.amazon.com/releasenotes/AWS-CodeCommit?browse=1) | 2015-07 | General |
| 🐥[API Gateway](https://aws.amazon.com/releasenotes/Amazon-API-Gateway?browse=1) | 2015-07 | General |
| 🐥[Config](https://aws.amazon.com/releasenotes/AWS-Config?browse=1) | 2015-06 | General |
| 🐥[EFS](https://aws.amazon.com/releasenotes/Amazon-EFS?browse=1) | 2015-05 | Preview |
| 🐥[Machine Learning](https://aws.amazon.com/releasenotes/AmazonML?browse=1) | 2015-04 | General |
| [Lambda](https://aws.amazon.com/releasenotes/AWS-Lambda?browse=1) | 2014-11 | General |
| [ECS](https://aws.amazon.com/ecs/release-notes/) | 2014-11 | General |
| [KMS](https://aws.amazon.com/releasenotes/AWS-KMS?browse=1) | 2014-11 | General |
| [CodeDeploy](https://aws.amazon.com/releasenotes/AWS-CodeDeploy?browse=1) | 2014-11 | General |
| [Kinesis](https://aws.amazon.com/releasenotes/Amazon-Kinesis?browse=1) | 2013-12 | General |
| [CloudTrail](https://aws.amazon.com/releasenotes/AWS-CloudTrail?browse=1) | 2013-11 | General |
| [AppStream](https://aws.amazon.com/releasenotes/Amazon-AppStream?browse=1) | 2013-11 | Preview |
| [CloudHSM](https://aws.amazon.com/releasenotes/AWS-CloudHSM?browse=1) | 2013-03 | General |
| [Silk](https://aws.amazon.com/releasenotes/Amazon-Silk?browse=1) | 2013-03 | Obsolete? |
| [OpsWorks](https://aws.amazon.com/releasenotes/AWS-OpsWorks?browse=1) | 2013-02 | General |
| [Redshift](https://aws.amazon.com/releasenotes/Amazon-Redshift?browse=1) | 2013-02 | General |
| [Elastic Transcoder](https://aws.amazon.com/releasenotes/Amazon-Elastic-Transcoder?browse=1) | 2013-01 | General |
| [Glacier](https://aws.amazon.com/releasenotes/Amazon-Glacier?browse=1) | 2012-08 | General |
| [CloudSearch](https://aws.amazon.com/releasenotes/Amazon-CloudSearch?browse=1) | 2012-04 | General |
| [SWF](https://aws.amazon.com/releasenotes/Amazon-SWF?browse=1) | 2012-02 | General |
| [Storage Gateway](https://aws.amazon.com/releasenotes/AWS-Storage-Gateway?browse=1) | 2012-01 | General |
| [DynamoDB](https://aws.amazon.com/releasenotes/Amazon-DynamoDB?browse=1) | 2012-01 | General |
| [DirectConnect](https://aws.amazon.com/releasenotes/AWS-Direct-Connect?browse=1) | 2011-08 | General |
| [ElastiCache](https://aws.amazon.com/releasenotes/Amazon-ElastiCache?browse=1) | 2011-08 | General |
| [CloudFormation](https://aws.amazon.com/releasenotes/AWS-CloudFormation?browse=1) | 2011-04 | General |
| [SES](https://aws.amazon.com/releasenotes/Amazon-SES?browse=1) | 2011-01 | General |
| [Elastic Beanstalk](https://aws.amazon.com/releasenotes/AWS-Elastic-Beanstalk?browse=1) | 2010-12 | General |
| [Route 53](https://aws.amazon.com/releasenotes/Amazon-Route-53?browse=1) | 2010-10 | General |
| [IAM](https://aws.amazon.com/releasenotes/AWS-Identity-and-Access-Management?browse=1) | 2010-09 | General |
| [SNS](https://aws.amazon.com/releasenotes/Amazon-SNS?browse=1) | 2010-04 | General |
| [EMR](https://aws.amazon.com/releasenotes/Elastic-MapReduce?browse=1) | 2010-04 | General |
| [RDS](https://aws.amazon.com/releasenotes/Amazon-RDS?browse=1) | 2009-12 | General |
| [VPC](https://aws.amazon.com/releasenotes/Amazon-VPC?browse=1) | 2009-08 | General |
| [Snowball](https://aws.amazon.com/releasenotes/AWS-ImportExport?browse=1) | 2009-05 | General |
| [CloudWatch](https://aws.amazon.com/releasenotes/CloudWatch?browse=1) | 2009-05 | General |
| [CloudFront](https://aws.amazon.com/releasenotes/CloudFront?browse=1) | 2008-11 | General |
| [Fulfillment Web Service](https://aws.amazon.com/releasenotes/Amazon-FWS?browse=1) | 2008-03 | Obsolete? |
| [SimpleDB](https://aws.amazon.com/releasenotes/Amazon-SimpleDB?browse=1) | 2007-12 | Obsolete |
| [DevPay](https://aws.amazon.com/releasenotes/DevPay?browse=1) | 2007-12 | General |
| [Flexible Payments Service](https://aws.amazon.com/releasenotes/Amazon-FPS?browse=1) | 2007-08 | Retired |
| [EC2](https://aws.amazon.com/releasenotes/Amazon-EC2?browse=1) | 2006-08 | General |
| [SQS](https://aws.amazon.com/releasenotes/Amazon-SQS?browse=1) | 2006-07 | General |
| [S3](https://aws.amazon.com/releasenotes/Amazon-S3?browse=1) | 2006-03 | General |

### Compliance

@@ -370,7 +370,7 @@ So if you’re not going to manage your AWS configurations manually, what should
- Don’t underestimate its power. It also has the advantage of being well-maintained — it covers a large proportion of all AWS services, and is up to date.
- In general, whenever you can, prefer the command line to the AWS Console for performing operations.
- 🔹Even in absence of fancier tools, you can **write simple Bash scripts** that invoke *aws* with specific arguments, and check these into Git. This is a primitive but effective way to document operations you’ve performed. It improves automation, allows code review and sharing on a team, and gives others a starting point for future work.
- 🔹For use that is primarily interactive, and not scripted, consider instead using the [**aws-shell**](https://github.com/awslabs/aws-shell) tool from AWS. It is easier to use, with auto-completion and a colorful UI, but still works on the command line. If you're using [SAWS](https://github.com/donnemartin/saws), a previous version of the program, [you should migrate to aws-shell](https://github.com/donnemartin/saws/issues/68#issuecomment-240067034).
- 🔹For use that is primarily interactive, and not scripted, consider instead using the [**aws-shell**](https://github.com/awslabs/aws-shell) tool from AWS. It is easier to use, with auto-completion and a colorful UI, but still works on the command line. If youre using [SAWS](https://github.com/donnemartin/saws), a previous version of the program, [you should migrate to aws-shell](https://github.com/donnemartin/saws/issues/68#issuecomment-240067034).

### APIs and SDKs

@@ -463,7 +463,8 @@ We cover security basics first, since configuring user accounts is something you

### Security and IAM Basics

- 📒 [Homepage](https://aws.amazon.com/iam/) ∙ [User guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) ∙ [FAQ](https://aws.amazon.com/iam/faqs/)
- 📒 IAM [Homepage](https://aws.amazon.com/iam/) ∙ [User guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) ∙ [FAQ](https://aws.amazon.com/iam/faqs/)
- The [AWS Security Blog](https://blogs.aws.amazon.com/security) is one of the best sources of news and information on AWS security.
- **IAM** is the service you use to manage accounts and permissioning for AWS.
- Managing security and access control with AWS is critical, so every AWS administrator needs to use and understand IAM, at least at a basic level.
- IAM manages various kinds of authentication, for both users and for software services that may need to authenticate with AWS, including:
@@ -488,7 +489,7 @@ We cover security basics first, since configuring user accounts is something you
- Most users can use the Google Authenticator app (on [iOS](https://itunes.apple.com/us/app/google-authenticator/id388497605) or [Android](https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2)) to support two-factor authentication. For the root account, consider a hardware fob.
- ❗**Turn on CloudTrail:** One of the first things you should do is [enable CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-a-trail-using-the-console-first-time.html). Even if you are not a security hawk, there is little reason not to do this from the beginning, so you have data on what has been happening in your AWS account should you need that information. You’ll likely also want to set up a [log management service](#visibility) to search and access these logs.
- 🔹**Use IAM roles for EC2:** Rather than assign IAM users to applications like services and then sharing the sensitive credentials, [define and assign roles to EC2 instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) and have applications retrieve credentials from the [instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html).
- Assign IAM roles by realm — for example, to development, staging, and production. If you're setting up a role, it should be tied to a specific realm so you have clean separation. This avoids allowing a development instance to connect to a production database by accident, for example.
- Assign IAM roles by realm — for example, to development, staging, and production. If you’re setting up a role, it should be tied to a specific realm so you have clean separation. This prevents, for example, a development instance from connecting to a production database.
- **Best practices:** AWS’ [list of best practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html) is worth reading in full up front.
- **Multiple accounts:** Decide on whether you want to use multiple AWS accounts and [research](https://dab35129f0361dca3159-2fe04d8054667ffada6c4002813eccf0.ssl.cf1.rackcdn.com/downloads/pdfs/Rackspace%20Best%20Practices%20for%20AWS%20-%20Identity%20Managment%20-%20Billing%20-%20Auditing.pdf) how to organize access across them. Factors to consider:
- Number of users
@@ -510,9 +511,10 @@ We cover security basics first, since configuring user accounts is something you

### Security and IAM Gotchas and Limitations

- ❗**Don’t share user credentials.** It’s remarkably common for first-time AWS users create one account and one set of credentials (access key or password), and then use them for a while, sharing among engineers and others within a company. This is easy. But *don’t do this*. This is an insecure practice for many reasons, but in particular, if you do, you will have reduced ability to revoke credentials on a per-user or per-service basis (for example, if an employee leaves or a key is compromised), which can lead to serious complications.
- ❗**Instance metadata throttling:** The [instance metadata service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html) has rate limiting on API calls. If you deploy IAM roles widely (as you should!) and have lots of services, you may hit global account limits easily. One solution is to store have code or scripts cache and reuse the credentials locally (for example, in ~/.aws/credentials).

- ❗**Don’t share user credentials:** It’s remarkably common for first-time AWS users to create one account and one set of credentials (access key or password), and then use them for a while, sharing among engineers and others within a company. This is easy. But *don’t do this*. This is an insecure practice for many reasons, but in particular, if you do, you will have reduced ability to revoke credentials on a per-user or per-service basis (for example, if an employee leaves or a key is compromised), which can lead to serious complications.
- ❗**Instance metadata throttling:** The [instance metadata service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html) has rate limiting on API calls. If you deploy IAM roles widely (as you should!) and have lots of services, you may hit global account limits easily.
- One solution is to have code or scripts cache and reuse the credentials locally for a short period (say 2 minutes). For example, they can be put into the ~/.aws/credentials file but must also be refreshed automatically.
- But be careful not to cache credentials for too long, as [they expire](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials). (Note the other [dynamic metadata](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html#dynamic-data-categories) also changes over time and should not be cached a long time, either.)
- 🔸Some IAM operations are slower than other API calls (many seconds), since AWS needs to propagate these globally across regions.

S3
@@ -557,9 +559,10 @@ S3
- **Multi-part uploads:** For large objects you want to take advantage of the multi-part uploading capabilities (starting with minimum chunk sizes of 5 MB).
- **Large downloads:** Also you can download chunks of a single large object in parallel by exploiting the HTTP GET range-header capability.
- 🔸**List pagination:** Listing contents happens at 1000 responses per request, so for buckets with many millions of objects listings will take time.
- ❗In addition, latency on operations is [highly dependent on prefix similarities among key names](http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html). If you have need for high volumes of operations, it is essential to consider naming schemes with more randomness early in the key name (first 6 or 8 characters) in order to avoid “hot spots”.
- 🔸Note that sadly, the latter advice about random key names goes against having a consistent layout with common prefixes to manage data lifecycles in an automated way.
- For data outside AWS, [DirectConnect](https://aws.amazon.com/directconnect/) and [**S3 Transfer Acceleration**](https://aws.amazon.com/blogs/aws/aws-storage-update-amazon-s3-transfer-acceleration-larger-snowballs-in-more-regions/) can help. For S3 Transfer Acceleration, you [pay](https://aws.amazon.com/s3/pricing/) about the equivalent of 1-2 months of storage for the transfer in either direction for using nearer endpoints.
- ❗**Key prefixes:** In addition, latency on operations is [highly dependent on prefix similarities among key names](http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html). If you have need for high volumes of operations, it is essential to consider naming schemes with more randomness early in the key name (first 6 or 8 characters) in order to avoid “hot spots”.
- We list this as a major gotcha since it’s often painful to do large-scale renames.
- 🔸Note that sadly, the advice about random key names goes against having a consistent layout with common prefixes to manage data lifecycles in an automated way.
- For data outside AWS, [**DirectConnect**](https://aws.amazon.com/directconnect/) and [**S3 Transfer Acceleration**](https://aws.amazon.com/blogs/aws/aws-storage-update-amazon-s3-transfer-acceleration-larger-snowballs-in-more-regions/) can help. For S3 Transfer Acceleration, you [pay](https://aws.amazon.com/s3/pricing/) about the equivalent of 1-2 months of storage for the transfer in either direction for using nearer endpoints.
- **Command-line applications:** There are a few ways to use S3 from the command line:
- Originally, [**s3cmd**](https://github.com/s3tools/s3cmd) was the best tool for the job. It’s still used heavily by many.
- The regular [**aws**](https://aws.amazon.com/cli/) command-line interface now supports S3 well, and is useful for most situations.
@@ -601,11 +604,11 @@ S3
- [S3QL](https://github.com/s3ql/s3ql) ([discussion](https://news.ycombinator.com/item?id=10150684)) is a Python implementation that offers data de-duplication, snap-shotting, and encryption, but only one client at a time.
- [ObjectiveFS](https://objectivefs.com/) ([discussion](https://news.ycombinator.com/item?id=10117506)) is a commercial solution that supports filesystem features and concurrent clients.
- If you are primarily using a VPC, consider setting up a [VPC Endpoint](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html) for S3 in order to allow your VPC-hosted resources to easily access it without the need for extra network configuration or hops.
- **Cross-region replication:** S3 has [a feature](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) for replicating a bucket between one region and a another. Note that S3 is already highly replicated within one region, so usually this isn’t necessary, except for compliance or latency reasons.
- **Cross-region replication:** S3 has [a feature](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) for replicating a bucket between one region and a another. Note that S3 is already highly replicated within one region, so usually this isn’t necessary for durability, but it could be useful for compliance (geographically distributed data storage), lower latency, or as a strategy to reduce region-to-region bandwidth costs by mirroring heavily used data in a second region.

### S3 Gotchas and Limitations

- 🔸For many years, there was a notorious [**100-bucket limit**](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_s3) per account, which could not be raised and caused many companies significant pain. As of 2015, you can [request increases](https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/). You can ask for a raise in the number of buckets but it will still be capped (generally below ~1000 per account).
- 🔸For many years, there was a notorious [**100-bucket limit**](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_s3) per account, which could not be raised and caused many companies significant pain. As of 2015, you can [request increases](https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/). You can ask to increase the limit, but it will still be capped (generally below ~1000 per account).
- 🔸Be careful not to make implicit assumptions about transactionality or sequencing of updates to objects. Never assume that if you modify a sequence of objects, the clients will see the same modifications in the same sequence, or if you upload a whole bunch of files, that they will all appear at once to all clients.
- 🔸S3 has an [**SLA**](https://aws.amazon.com/s3/sla/) with 99.9% uptime. If you use S3 heavily, you’ll inevitably see occasional error accessing or storing data as disks or other infrastructure fail. Availability is usually restored in seconds or minutes. Although availability is not extremely high, as mentioned above, durability is excellent.
- 🔸After uploading, any change that you make to the object causes a full rewrite of the object, so avoid appending-like behavior with regular files.
@@ -654,14 +657,11 @@ EC2
- **Operating system:** To use EC2, you’ll need to pick a base operating system. It can be Windows or Linux, such as Ubuntu or [Amazon Linux](https://aws.amazon.com/amazon-linux-ami/). You do this with AMIs, which are covered in more detail in their own section below.
- **Limits:** You can’t create arbitrary numbers of instances. Default limits on numbers of EC2 instances per account vary by instance type, as described in [this list](http://aws.amazon.com/ec2/faqs/#How_many_instances_can_I_run_in_Amazon_EC2).
- ❗**Use termination protection:** For any instances that are important and long-lived (in particular, aren't part of auto-scaling), [enable termination protection](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html#Using_ChangingDisableAPITermination). This is an important line of defense against user mistakes, such as accidentally terminating many instances instead of just one due to human error.

- **SSH key management:**

- When you start an instance, you need to have at least one [ssh key pair](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) set up, to bootstrap, i.e., allow you to ssh in the first time.
- Aside from bootstrapping, you should manage keys yourself on the instances, assigning individual keys to individual users or services as appropriate.
- Avoid reusing the original boot keys except by administrators when creating new instances.
- How to avoid sharing keys; how to add individual ssh keys for individual users.

- **GPU support:** You can rent GPU-enabled instances on EC2. There are [two instance types](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cluster_computing.html). Both sport an NVIDIA card (K520, 1536 CUDA cores and M2050, 448 CUDA cores).

### EC2 Cost Management
@@ -670,7 +670,7 @@ EC2

- With EC2, there is a trade-off between engineering effort (more analysis, more tools, more complex architectures) and spend rate on AWS. If your EC2 costs are small, many of the efforts here are not worth the engineering time required to make them work. But once you know your costs will be growing in excess of an engineer’s salary, serious investment is often worthwhile.
- 🔹**Spot instances:**
- EC2 [spot instances](https://aws.amazon.com/ec2/spot/) are a way to get EC2 resources at significant discount — often many times cheaper than standard on-demand prices — if you’re willing to accept the possibility that they be terminated little to no warning.
- EC2 [spot instances](https://aws.amazon.com/ec2/spot/) are a way to get EC2 resources at significant discount — often many times cheaper than standard on-demand prices — if you’re willing to accept the possibility that they be terminated with little to no warning.
- Use spot instances for potentially very significant discounts whenever you can use resources that may be restarted and don’t maintain long-term state.
- The huge savings that you can get with Spot come at the cost of a significant increase in complexity when provisioning and reasoning about the availability of compute capacity.
- Amazon maintains spot prices at a market-driven fluctuating level, based on their inventory of unused capacity. Prices are typically low but can [spike](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html#spot-bid-limit) very high. See the [price history](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances-history.html) to get a sense for this.
@@ -686,7 +686,7 @@ EC2
- **Reserved Instances** allow you to get significant discounts on EC2 compute hours in return for a commitment to pay for instance hours of a specific instance type in a specific AWS region and availability zone for a pre-established time frame (1 or 3 years). Further discounts can be realized through “partial” or “all upfront” payment options.
- Consider using Reserved Instances when you can predict your longer-term compute needs and need a stronger guarantee of compute availability and continuity than the (typically cheaper) spot market can provide. However be aware that if your architecture changes your computing needs may change as well so long term contracts can seem attractive but may turn out to be cumbersome.
- Instance reservations are not tied to specific EC2 instances - they are applied at the billing level to eligible compute hours as they are consumed across all of the instances in an account.
- **Hourly billing waste:** EC2 instances are [billed in instance-hours](https://aws.amazon.com/ec2/pricing/) — rounded up to the nearest full hour! For long-lived instances, this is not a big worry, but for large transient deployments, like EMR jobs or test deployments, this can be a significant expense. Never deploy many instances and terminate them after only a few minutes. In fact, if transient instances are part of your regular processing workflow, you should put in protections or alerts to check for this kind of waste.
- **Hourly billing waste:** EC2 instances are [billed in instance-hours](https://aws.amazon.com/ec2/faqs/#How_will_I_be_charged_and_billed_for_my_use_of_Amazon_EC2) — rounded up to the nearest full hour! For long-lived instances, this is not a big worry, but for large transient deployments, like EMR jobs or test deployments, this can be a significant expense. Never deploy many instances and terminate them after only a few minutes. In fact, if transient instances are part of your regular processing workflow, you should put in protections or alerts to check for this kind of waste.
- If you have multiple AWS accounts and have configured them to roll charges up to one account using the “Consolidated Billing” feature, you can expect *unused* Reserved Instance hours from one account to be applied to matching (region, availability zone, instance type) compute hours from another account.
- If you have multiple AWS accounts that are linked with Consolidated Billing, plan on using reservations, and want unused reservation capacity to be able to apply to compute hours from other accounts, you’ll need to create your instances in the availability zone with the same *name* across accounts. Keep in mind that when you have done this, your instances may not end up in the same *physical* data center across accounts - Amazon shuffles availability zones names across accounts in order to equalize resource utilization.
- Make use of dynamic [Auto Scaling](#auto-scaling), where possible, in order to better match your cluster size (and cost) to the current resource requirements of your service.
@@ -779,11 +779,11 @@ EFS
### EFS Basics

- 🐥**EFS** is Amazon’s new (general release 2016) network filesystem.
- It is similar to [EBS](#ebs) in that it is a network-attached drive, but it [differs in important ways](https://aws.amazon.com/efs/details/):
- It can be attached to many instances (up to thousands), when EBS can only be attached to one drive. It does this via [NFSv4](https://en.wikipedia.org/wiki/Network_File_System).
- It can offers higher thoughput (multiple gigabytes per second), higher latency, and likely better durability and availability than EBS (see [the comparison table](#storage-durability-availability-and-price)).
- It cannot be used as a boot volume or in certain other ways EBS can.
- It costs much more than EBS (up to three times as much).
- It is similar to [EBS](#ebs) in that it is a network-attached drive, but it [differs in important ways](https://aws.amazon.com/efs/details/#When_to_Use_Amazon_EFS_vs._Amazon_EBS):
- EFS can be attached to many instances (up to thousands), while EBS can only be attached to one drive. It does this via [NFSv4](https://en.wikipedia.org/wiki/Network_File_System).
- EFS can offer higher throughput (multiple gigabytes per second) and better durability and availability than EBS (see [the comparison table](#storage-durability-availability-and-price)), but with higher latency.
- EFS cannot be used as a boot volume or in certain other ways EBS can.
- EFS costs much more than EBS (up to [three times as much](#storage-durability-availability-and-price)).

🚧 [*Please help expand this incomplete section.*](CONTRIBUTING.md)

@@ -1184,7 +1184,7 @@ EMR
### EMR Tips

- EMR relies on many versions of Hadoop and other supporting software. Be sure to check [which versions are in use](https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-components.html).
- 💸❗**EMR costs** can pile up quickly since it involves lots of instances, and efficiency can be poor depending on cluster configuration and choice of workload, and accidents like hung jobs are costly. See the [section on EC2 cost management](#ec2-cost-management), especially the tips there about spot instances and avoiding hourly billing. [This blog post](http://engineering.bloomreach.com/strategies-for-reducing-your-amazon-emr-costs/) has additional tips.
- 💸❗**EMR costs** can pile up quickly since it involves lots of instances, efficiency can be poor depending on cluster configuration and choice of workload, and accidents like hung jobs are costly. See the [section on EC2 cost management](#ec2-cost-management), especially the tips there about spot instances and avoiding hourly billing. [This blog post](http://engineering.bloomreach.com/strategies-for-reducing-your-amazon-emr-costs/) has additional tips.
- ⏱Off-the-shelf EMR and Hadoop can have significant overhead when compared with efficient processing on a single machine. If your data is small and performance matters, you may wish to consider alternatives, as [this post](http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html) illustrates.
- Python programmers may want to take a look at Yelp’s [mrjob](https://github.com/Yelp/mrjob).
- It takes time to tune performance of EMR jobs, which is why third-party services such as [Qubole’s data service](https://www.qubole.com/mapreduce-as-a-service/) are gaining popularity as ways to improve performance or reduce costs.

+ 1
- 1
admin/authors-info.yml View File

@@ -24,7 +24,7 @@ roles:
Praveen Patnala:
kazuyukitanimura:
olawiberg:
Max Zanko:
max-zanko:
weirded:
bittlingmayer:
rjpower:

Loading…
Cancel
Save