Amazon EMR, Apache Spark, and Apache Zeppelin

I wrote a couple of guest posts about getting started with Apache Spark and Apache Zeppelin on Amazon EMR. These articles assumed that an IAM user had already been created with the AmazonElasticMapReduceFullAccess managed policy attached to it. In this blog post, I will show you how to set up this IAM user for EMR.

To begin using the AWS CLI, you need to create an administrative IAM user and configure the CLI to use that user's access keys. After that, we can create the IAM user for EMR. We will call this user emr.

```sh
$ aws iam create-user --user-name emr
{
    "User": {
        "UserName": "emr",
        "Path": "/",
        "CreateDate": "2016-03-11T14:32:08.782Z",
        "UserId": "AIDAID746LKJUJI5SGLOC",
        "Arn": "arn:aws:iam::548698731369:user/emr"
    }
}
```

To attach a policy to user emr, we can call the aws iam attach-user-policy command with the policy ARN.

```sh
$ aws iam attach-user-policy --policy-arn arn:aws:iam::aws:policy/AmazonElasticMapReduceFullAccess --user-name emr
```

We also want to grant the user the iam:GetUser action. This way, we can verify that we are executing the AWS CLI commands as user emr.

```sh
$ cat iam-get-user.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "iam:GetUser"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:iam::548698731369:user/emr"
        }
    ]
}
$ aws iam create-policy --policy-name make-emr-get-user --policy-document file://iam-get-user.json
{
    "Policy": {
        "PolicyName": "make-emr-get-user",
        "CreateDate": "2016-03-11T15:05:22.936Z",
        "AttachmentCount": 0,
        "IsAttachable": true,
        "PolicyId": "ANPAJ3I5BYXIMET5E5UVC",
        "DefaultVersionId": "v1",
        "Path": "/",
        "Arn": "arn:aws:iam::548698731369:policy/make-emr-get-user",
        "UpdateDate": "2016-03-11T15:05:22.936Z"
    }
}
$ aws iam attach-user-policy --user-name emr --policy-arn arn:aws:iam::548698731369:policy/make-emr-get-user
```
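If you would rather not type the policy document by hand, it could be generated with a heredoc, roughly as sketched here. The variable names `ACCOUNT_ID` and `USER_NAME` are my own for illustration; substitute your own account ID, since the one shown is just the example value used in this post.

```sh
# Illustrative values (assumptions): replace with your own account ID and user name.
ACCOUNT_ID="548698731369"
USER_NAME="emr"

# Write the policy document; the unquoted heredoc expands the two variables above.
cat > iam-get-user.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "iam:GetUser"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:iam::${ACCOUNT_ID}:user/${USER_NAME}"
        }
    ]
}
EOF
```

The resulting file can then be passed to aws iam create-policy with file://iam-get-user.json, as in the command above.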

Once this is done, we can go ahead and create an access key for user emr.

```sh
$ aws iam create-access-key --user-name emr
{
    "AccessKey": {
        "UserName": "emr",
        "Status": "Active",
        "CreateDate": "2016-03-11T14:37:03.144Z",
        "SecretAccessKey": "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890abcd",
        "AccessKeyId": "ABCDEFGHIJKLMNOPQRST"
    }
}
```

We will export two environment variables, AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID, set to the values of the access key we just created. After that, run aws iam get-user and it should show you information about user emr.

```sh
$ export AWS_SECRET_ACCESS_KEY=ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890abcd
$ export AWS_ACCESS_KEY_ID=ABCDEFGHIJKLMNOPQRST
$ aws iam get-user
{
    "User": {
        "UserName": "emr",
        "Path": "/",
        "CreateDate": "2016-03-11T14:32:08Z",
        "UserId": "AIDAID746LKJUJI5SGLOC",
        "Arn": "arn:aws:iam::548698731369:user/emr"
    }
}
```
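A forgotten export is an easy mistake to make here, so a small guard like the following may help before you call the AWS CLI. This is only a sketch: `check_aws_env` is a name I made up, and it only checks that the two variables are non-empty, not that the keys are valid.

```sh
# Return 0 only when both credential variables are set and non-empty.
check_aws_env() {
    missing=0
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # eval reads the variable whose name is stored in $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

You could then run something like check_aws_env && aws iam get-user, so that the CLI is only invoked when both variables are present.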

If you see this error, it means that the user has not been granted the iam:GetUser action.

```sh
$ aws iam get-user

A client error (AccessDenied) occurred when calling the GetUser operation: User: arn:aws:iam::548698731369:user/emr is not authorized to perform: iam:GetUser on resource: arn:aws:iam::548698731369:user/emr
```
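To track down this error, one option is to list the managed policies attached to the user and check that make-emr-get-user is among them. The helper below is a hypothetical wrapper of my own around aws iam list-attached-user-policies. It needs to run with the administrative credentials (for example, in a shell where the emr keys are not exported), because user emr has not been granted permission to list its own policies.

```sh
# Print the names of the managed policies attached to the given IAM user.
# Assumption: the caller has administrative credentials configured.
attached_policy_names() {
    aws iam list-attached-user-policies \
        --user-name "$1" \
        --query 'AttachedPolicies[].PolicyName' \
        --output text
}
```

Calling attached_policy_names emr should print the policy names on one line. If make-emr-get-user is missing, repeat the aws iam attach-user-policy step for that policy.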

After performing all these steps, you can now follow along with my articles to create your own EMR cluster. Have fun!