The puppet master must have a public name and a fixed IP address. I need to be able to reach it via SSH, and the puppet agents need to be able to find it by name (oversimplification, go with me on this).
With Route53 and EC2 configured, I can request a static (elastic) IP and associate it with a hostname in my domain. I can also associate it with a new instance after the instance is launched. I can specify the network filtering rules so I can access the host over the network.
I actually have a task that does all this in one go, but I'm going to walk through the steps once so it's not magic.
NOTE: if you haven't pulled down the origin-setup tools from GitHub and you want to follow along, you should go back to the first post in this series and do so.
This is not the only way to accomplish the goals set here. You can use the AWS web console, CloudFormation or even tools like the control plugins for Vagrant.
Instances and Images in EC2
First, a little terminology. EC2 has a number of terms to disambiguate... things.
An image is a static piece of storage which contains an OS. It is the "gold copy" that we used to make when we still cloned hard disks to copy systems. An image cannot run an OS. It's storage. An image does have some metadata though. It has an associated machine architecture. It has instructions for how it is to be mounted when it is used to create an instance.
(Actually, this is a lie: an image is the metadata, and the storage is really in a snapshot with a volume, but that's too much and not really important right now.)
An instance is a runnable copy of an image. It has a copy of the disk, but it also has the ability to start and stop. It is assigned an IP address when it starts. A copy of your RSA security key is installed when it starts so that you can log in.
When you want a new machine, you create an instance from an image. You select an image which uses the architecture and contains the OS that you want. You give the instance a name, a comment, and its security groups. There are other things you can specify as well, but they don't come into play here.
Finding the Right Image
People like their three-letter abbreviations. On the web interface you'll see the term "AMI", which I think stands for "Amazon Machine Image", otherwise known as "an image" in this context. While the image IDs all begin with ami-, I'm going to continue to refer to them as "images".
For OpenShift I want to start with either a Fedora or a RHEL (or CentOS) image. I can't think of a reason anymore not to use a 64-bit OS and VM, so I'll specify that. You can easily find official RHEL images on the AWS web console or using the AWS Marketplace. You can find CentOS in the Marketplace. There are "official" Fedora images there too, though they're not publicized.
What I do is use the web interface to find a recommended image and then make a note of the owner ID of the image. From then on I can use the owner ID to find images using the CLI tools. It doesn't look like you can look up an owner's information from their owner ID.
New instances (running machines) are created from images. Conversely new images can be created from an instance. People can create and register and publish their images, so there can be lots of things that look like they're "official" which may have been altered. It takes a little sleuthing to find the images that come from the source you want.
Using the AWS console, I narrowed the Fedora x86_64 images down to this:
I made a note of the owner ID, and the pattern for the names and I can search for them on the CLI like this:
thor ec2:image list --owner 125523088429 --name 'Fedora*' --arch x86_64
ami-2509664c 125523088429 x86_64 Fedora-x86_64-17-1-sda
ami-6f3b5006 125523088429 x86_64 Fedora-x86_64-19-Beta-20130523-sda
ami-b71078de 125523088429 x86_64 Fedora-x86_64-18-20130521-sda
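To make the filtering concrete, here is a minimal Ruby sketch of the same idea: select images by owner ID, architecture and a shell-style name pattern. It does not talk to AWS and is not how the thor task is implemented; the `Image` struct and `find_images` helper are names I made up for illustration, and the records are just the three rows from the listing above.

```ruby
# Hypothetical image records; the fields mirror the columns in the listing above.
Image = Struct.new(:id, :owner, :arch, :name)

IMAGES = [
  Image.new("ami-2509664c", "125523088429", "x86_64", "Fedora-x86_64-17-1-sda"),
  Image.new("ami-6f3b5006", "125523088429", "x86_64", "Fedora-x86_64-19-Beta-20130523-sda"),
  Image.new("ami-b71078de", "125523088429", "x86_64", "Fedora-x86_64-18-20130521-sda")
].freeze

# Keep only the images matching owner, architecture and a shell-style name pattern.
def find_images(images, owner:, arch:, name: "*")
  images.select do |img|
    img.owner == owner && img.arch == arch && File.fnmatch(name, img.name)
  end
end

# Narrow the Fedora images down to the Fedora 18 build.
find_images(IMAGES, owner: "125523088429", arch: "x86_64", name: "Fedora*18*").each do |img|
  puts [img.id, img.owner, img.arch, img.name].join(" ")
end
```

Filtering on the owner ID first is what keeps look-alike images from other publishers out of the result.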
Launching the First Instance
I think I have enough information now to fire up the first instance for my OpenShift service. The first one will be the puppet master, as that will control the configuration of the rest.
- hostname - puppet.infra.example.org
- base image - ami-b71078de
- securitygroup(s) - default, allow SSH
- SSH key pair name
Later, I will also need a static (ElasticIP) address and a DNS A record. Both of those can be set after the instance is running.
There is one last thing to decide. When you create an EC2 instance, you must specify the instance type, which is a kind of sizing for the machine resources. AWS has a table of EC2 instance types that you can use to help you size your instances to your needs. Since I'm only building a demo, I'm going to use the t1.micro type. This has 7GB of instance storage, a single virtual core, and enough memory for this purpose. The CPU usage is also free (storage and unused Elastic IPs still cost).
- size: t1.micro
So, here we go, with the CLI tools:
thor ec2:instance create --name puppet --type t1.micro --image ami-b71078de --key <mykeyname> --securitygroup default
task: ec2:instance:create --image ami-b71078de --name puppet
id = i-d8c912bb
That's actually pretty... anti-climactic. I've got a convention that each task echoes the required arguments back as it is invoked. That way, as the tasks are composed into bigger tasks, you can see what's going on inside while it runs.
All this one seemed to do was to return an instance ID. To see what's going on, I can request the status of the instance:
thor ec2:instance status --id i-d8c912bb
pending
Since I'm impatient, I do that a few more times and after about 30 seconds it changes to this:
thor ec2:instance status --id i-d8c912bb
running
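Rather than re-running the status command by hand, the waiting can be written as a small polling loop. This is a sketch under assumptions: `wait_for_state` is a hypothetical helper, and `fetch_status` stands in for whatever actually asks EC2 for the instance state. Here it is a fake that reports "pending" three times and then "running".

```ruby
# Poll a status source until it reports the wanted state or we give up.
# `fetch_status` is any callable returning a state string such as "pending"
# or "running"; in real use it would wrap an EC2 status query (assumption).
def wait_for_state(fetch_status, wanted: "running", attempts: 20, delay: 5)
  attempts.times do |n|
    state = fetch_status.call
    return n + 1 if state == wanted # number of polls it took
    sleep delay
  end
  nil # never reached the wanted state
end

# Stand-in status source: "pending" three times, then "running".
states = %w[pending pending pending running]
fake_status = -> { states.size > 1 ? states.shift : states.first }

polls = wait_for_state(fake_status, delay: 0)
puts polls ? "running after #{polls} polls" : "gave up"
```

Capping the number of attempts matters: an instance that never leaves "pending" should end the loop with a failure instead of hanging forever.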
I want to log in, but so far all I know is the instance ID. I can ask for the hostname.
thor ec2:instance hostname --id i-d8c912bb
ec2-23-22-234-113.compute-1.amazonaws.com
And with that I should be able to log in via SSH using my private key:
ssh -i ~/.ssh/<mykeyfile>.pem ec2-user@ec2-23-22-234-113.compute-1.amazonaws.com
The authenticity of host 'ec2-23-22-234-113.compute-1.amazonaws.com (23.22.234.113)' can't be established.
RSA key fingerprint is 64:ec:6d:7d:af:ae:9a:70:78:0d:02:28:f1:c3:45:50.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-23-22-234-113.compute-1.amazonaws.com,23.22.234.113' (RSA) to the list of known hosts.
It looks like I generated and saved my key pair right, and specified it correctly when creating the instance.
The Fedora instances don't use the root user as the primary remote login. Instead, there's an ec2-user account which has sudo ALL:ALL permissions. That is, the ec2-user account can use sudo without providing a password. This really just gives you a little separation and forces you to think before you take some action as root.
Getting a Static IP Address
Now I have a host running and I can get into it, but the hostname is some long abstract string in the EC2 amazonaws.com domain. I want MY name on it. I also want to be able to reboot the host and have it get the same IP address and name. Well, it's not quite that simple.
Amazon EC2 has a curious and wonderful feature. Each running instance actually has two IP addresses associated with it. One is the internal IP address (the one configured in eth0). But that's in an RFC 1918 private network space. You can't route it. You can't reach it. You could even have a duplicate inside your corporate or home network.
The second address is an external IP address and this is the one you can see and can route to. Amazon works some router table magic at the network border to establish the connection between the internal and external addresses. What this means is that EC2 can change your external IP address without doing a thing to the host behind it. This is where Elastic IP addresses come in.
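A quick way to tell which of an instance's two addresses is which is to test it against the RFC 1918 ranges. The sketch below uses only Ruby's standard `ipaddr` library; `private_address?` is a hypothetical helper name, and the two sample addresses are the internal and external addresses this instance ends up with.

```ruby
require "ipaddr"

# The three RFC 1918 private IPv4 ranges.
RFC1918 = %w[10.0.0.0/8 172.16.0.0/12 192.168.0.0/16].map { |cidr| IPAddr.new(cidr) }.freeze

# True when the address falls inside one of the private ranges.
def private_address?(address)
  ip = IPAddr.new(address)
  RFC1918.any? { |range| range.include?(ip) }
end

puts private_address?("10.212.234.234") # the internal address
puts private_address?("184.72.228.220") # the external (Elastic IP) address
```

The first check reports true and the second false, which is why only the second address is of any use for reaching the host from outside.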
As with all of these things, you can do it from the web interface, but since I'm trying to automate things, I've made a set of tasks to manipulate the elastic IPs. I'm lazy and there's no other kind of IP in EC2 that I can change, so the tasks are in the ec2:ip namespace.
Creating a new IP is pretty much what you'd expect. You're not allowed to specify anything about it so it's as simple as can be:
thor ec2:ip create
task: ec2:ip:create
184.72.228.220
Once again, not very exciting. Since each IP must be unique, the address itself serves as an ID. An address isn't very useful until it's associated with a running instance. The ipaddress task can retrieve the IP address of an instance. It can also set the external IP address (the address must be an allocated Elastic IP).
thor ec2:instance ipaddress 184.72.228.220 --id i-d8c912bb
task: ec2:instance:ipaddress 184.72.228.220
You can get the status and more information about an instance. You can also request the status using the instance name rather than the ID. For objects which have an ID and a name, you can query using either one, but you must specify it with an argument. For objects like the IP address which do not have a name, the ID is the first argument of any query.
thor ec2:instance info --name puppet --verbose
EC2 Instance: i-d8c912bb (puppet)
DNS Name: ec2-184-72-228-220.compute-1.amazonaws.com
IP Address: 184.72.228.220
Status: running
Image: ami-b71078de
Platform:
Private IP: 10.212.234.234
Private Hostname: ip-10-212-234-234.ec2.internal
And now for something completely different: Route53 and DNS
I now have a running host with the operating system and architecture I want. It has a fixed address. But it has a really funny domain name.
When I created my Route53 zones, I split them in two. infra.example.org will contain my service hosts. app.example.org will contain the application CNAME records. The broker will only have permission to change the application zone. It won't be able to damage the infrastructure either through a compromise or a bug.
I'm going to call the puppet master puppet.infra.example.org. It will have the IP address I was granted above.
All of the previous tasks were in the ec2: namespace. Route53 is actually a different service within AWS, so it gets its own namespace.
An IP address record has four components:
- type
- name
- value
- ttl (time to live, in seconds)
All of the infrastructure records will be A (address) records. The TTL has a regular default and there's generally no reason to override it. The value of an A record is an IP address.
The name in an A record is a Fully Qualified Domain Name (FQDN). It has both the domain suffix and the hostname and any sub-domain parts. To save some trouble parsing, the route53:record:create task expects the zone first, and the host part next as a separate argument. The last two arguments are the type and value.
thor route53:record create infra.example.org puppet a 184.72.228.220
task: route53:record:create infra.example.org puppet a 184.72.228.220
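If you start from an FQDN, the zone and host arguments can be derived mechanically. This is only an illustrative sketch: `split_fqdn` is a hypothetical helper, not part of the origin-setup tasks, and it assumes you already know which zone the name should live in.

```ruby
# Split a fully qualified name into [host part, zone], given the zone name.
# Returns nil when the name does not belong to the zone.
def split_fqdn(fqdn, zone)
  name = fqdn.chomp(".").downcase   # tolerate a trailing root dot
  zone = zone.chomp(".").downcase
  suffix = ".#{zone}"
  return nil unless name.end_with?(suffix)
  [name[0, name.length - suffix.length], zone]
end

host, zone = split_fqdn("puppet.infra.example.org", "infra.example.org.")
puts "thor route53:record create #{zone} #{host} a 184.72.228.220"
```

The printed line is the same command used above, with the zone and host part as separate arguments. Any sub-domain parts stay attached to the host part.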
Also pretty anti-climactic. This time though there will be an external effect.
First, I can list the contents of the infra.example.org zone from Route53. Then I can also query the A record from DNS, though this may take some time to be available.
thor route53:record:get infra.example.org puppet A
task: route53:record:get infra.example.org puppet A
puppet.infra.example.org. A 184.72.228.220
And the same when viewed with host:
host puppet.infra.example.org
puppet.infra.example.org has address 184.72.228.220
The SOA records for AWS Route53 have a TTL of 900 seconds (15 minutes). When you add or remove a record from a zone, you also cause an update to the SOA record serial number. Between you and Amazon there are almost certainly one or more caching nameservers and they will only refresh their cache when the SOA TTL expires. So you could experience a delay of up to 15 minutes from the time that you create a new record in a zone and when it resolves. I'm hoping this doesn't hold true for individual records, because it's going to cause problems for OpenShift.
You can check the TTL of the SOA record by requesting the record directly using dig:
dig infra.example.org soa

; <<>> DiG 9.9.2-rl.028.23-P2-RedHat-9.9.2-10.P2.fc18 <<>> infra.example.org soa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60006
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;infra.example.org. IN SOA

;; ANSWER SECTION:
infra.example.org. 900 IN SOA ns-1450.awsdns-53.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

;; Query time: 222 msec
;; SERVER: 172.30.42.65#53(172.30.42.65)
;; WHEN: Wed May 29 18:46:46 2013
;; MSG SIZE rcvd: 130
The '900' on the first line of the answer section is the record TTL.
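The rest of the numbers on that answer line have fixed meanings too. As a sketch, the line can be pulled apart using the SOA field order from RFC 1035 (primary nameserver, responsible mailbox, serial, refresh, retry, expire, minimum). `parse_soa` is a hypothetical helper and only handles a single, well-formed answer line like the one above.

```ruby
# Field names of SOA record data, in the order dig prints them (RFC 1035).
SOA_FIELDS = %i[mname rname serial refresh retry expire minimum].freeze

# Parse one answer line of the form: NAME TTL CLASS TYPE RDATA...
def parse_soa(line)
  name, ttl, klass, type, *rdata = line.split
  unless type == "SOA" && rdata.size == SOA_FIELDS.size
    raise ArgumentError, "not an SOA record"
  end
  soa = SOA_FIELDS.zip(rdata).to_h
  # The five trailing fields are integers (a serial, then times in seconds).
  %i[serial refresh retry expire minimum].each { |k| soa[k] = Integer(soa[k]) }
  { name: name, ttl: Integer(ttl), class: klass }.merge(soa)
end

answer = "infra.example.org. 900 IN SOA ns-1450.awsdns-53.org. " \
         "awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400"
soa = parse_soa(answer)
puts "record TTL: #{soa[:ttl]}s, serial: #{soa[:serial]}, SOA minimum: #{soa[:minimum]}s"
```

Note that 900 appears twice on the line: once as the record TTL and once as the SOA retry interval. They are different fields that happen to share a value.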
Wrapping it all up.
The beauty of Thor is that you can take each of the tasks defined above and compose them into more complex tasks. You can invoke each task individually from the command line or you can invoke the composed task and observe the process.
Because this task uses several others from both EC2 and Route53, I put it under a different namespace. All of the specific composed tasks will go in the origin: namespace.
The composed task is called origin:baseinstance.
At the top I know the fully qualified domain name of the host, the image and securitygroups that I want to use to create the instance. Since I already have the puppet master this one will be the broker.
- hostname: broker.infra.example.org
- image: ami-b71078de
- instance type: t1.micro
- securitygroups: default, broker
- key pair name: <mykeypair>
thor origin:baseinstance broker --hostname broker.infra.example.org --image ami-b71078de --type t1.micro --keypair <mykeypair> --securitygroup default broker
task: origin:baseinstance broker
task: ec2:ip:create
184.73.182.10
task: route53:zone:contains broker.infra.example.org
Z1PLM62Y00LCIN infra.example.org.
task: route53:record:create infra.example.org. broker A 184.73.182.10
- image id: ami-b71078de
task: ec2:instance:create ami-b71078de broker
id = i-19b1f576
task: remote:available ec2-54-226-116-229.compute-1.amazonaws.com
task: ec2:ip:associate 184.73.182.10 i-19b1f576
This process takes about two minutes. If you add --verbose you can see more of what is happening. There is a delay waiting for the A record creation to sync so that you don't accidentally create negative cache records which can slow propagation. Also you can see the remote:available task which polls a host for SSH login access. This allows time for the instance to be created, start running and reach multi-user network state.
ssh ec2-user@broker.infra.example.org
The authenticity of host 'broker.infra.example.org (184.73.182.10)' can't be established.
RSA key fingerprint is 8f:db:46:25:bf:19:2e:47:f5:f4:4a:23:a5:98:e3:5c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'broker.infra.example.org,184.73.182.10' (RSA) to the list of known hosts.
Last login: Thu May 30 11:37:08 2013 from 66.187.233.206
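The waiting that remote:available does can be approximated with a plain TCP probe. This sketch is a weaker check than the real task: it only confirms that something accepts connections on the port, not that an SSH login would succeed. `port_open?` and `wait_for_port` are hypothetical helpers, and the demo probes a throwaway local listener so it runs without an EC2 host.

```ruby
require "socket"

# True once something accepts TCP connections on host:port.
def port_open?(host, port, timeout: 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false # refused, unreachable, unresolvable or timed out
end

# Retry the probe until it succeeds or the attempts run out.
def wait_for_port(host, port: 22, attempts: 30, delay: 5)
  attempts.times do
    return true if port_open?(host, port)
    sleep delay
  end
  false
end

# Demo against a local listener on an ephemeral port instead of a real host.
server = TCPServer.new("127.0.0.1", 0)
puts wait_for_port("127.0.0.1", port: server.addr[1], attempts: 3, delay: 0)
server.close
```

Against a freshly launched instance you would pass its DNS name and leave the port at 22. The attempt count and delay need to cover the time it takes to boot to a multi-user network state.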
I will duplicate this process for the data and message servers, and for one node to begin.
My tier of AWS only allows 5 Elastic IP addresses, so I'm at my limit. For a real production setup, only the broker, nodes and possibly the puppet master require fixed IP addresses and public DNS. The datastore and message servers could use dynamic addresses, but then they will require some tweaking on restart. I'm sure Amazon will give you more IP addresses for money, but I haven't looked into it.
Summary
There's a lot packed into this post:
- Select an image to use as a base
- Manage IP addresses
- Bind IP addresses to running instances
- Create a running instance.
All of this can be done with the AWS console. The ec2 and route53 tasks just make it a little easier, and the origin:baseinstance task wraps it all up so that creating new bare hosts is a single step.
In the next post I'll establish the puppet master service on the puppet server and install a puppet agent on each of the other infrastructure hosts. From then all of the service management will happen in puppet and we can let EC2 fade into the background.
Hello Mark, first of all congratulations on those fantastic EC2 and OpenShift Origin posts.
I was trying this on Windows 7, using PuTTY for SSH.
All the tasks run individually, except the last one that wraps them all up. The origin:baseinstance task shows output like this:
- instance id: i-46bd4421
- instance i-46bd4421 starting
- instance i-46bd4421 running
- waiting for ec2-107-21-74-205.compute-1.amazonaws.com to accept SSH connections
C:/.../origin-setup/tasks/remote.thor:43:in `ssh_username': undefined method `[]' for nil:NilClass (NoMethodError)
from C:/.../origin-setup/tasks/origin.thor:186:in `baseinstance'
from D:/.../Ruby193/Ruby193/lib/ruby/gems/1.9.1/gems/
The origin:baseinstance task runs all the previous tasks except the ec2:instance:ipaddress task, so I ran it myself.
thor ec2:instance:ipaddress 184.72.228.220 --id i-d8c912bb
Greetings!
Wow, I'm amazed any of this works on Windows at all. Nice to know. I'll look at that last piece. I thought I'd replaced all instances of call-outs to the SSH client with calls to rubygem-net-ssh.
I've been diverted to some work integrating Kerberos while some friends have been working on the next level of installation with Puppet which I need to get back to.
Thanks again.
I took a look at the first failed line: tasks/remote.thor:43
This is part of the Remote.ssh_username() method:
https://github.com/markllama/origin-setup/blob/master/tasks/remote.thor#L43
I think you've been bit by an out-of-date bit of blogging.
To get the username the script tries to read ~/.awscred or ./.awscred and looks for a value for RemoteUser.
...
> [fedora18]
> SourceOwner=125523088429
> BaseOSImage=ami-1c0d6b75
> RemoteUser=ec2-user
>
> [fedora19]
> SourceOwner=125523088429
> BaseOSImage=ami-b22e5cdb
> RemoteUser=fedora
> ...
Do you have that file and does it have a section for your baseos which indicates the remote user?
If so I'll keep tracing back.
Thanks for your response.
RemoteUser is in my .awscred file
C:\..\origin-setup\.awscred
AWSAccessKeyId=***********************
AWSSecretKey=**************************
AWSKeyPairName=rhem
RemoteUser=ec2-user
AWSEC2Type=t1.micro
Well, as a workaround I commented out line 21 of the origin.thor file and added line 22:
21 #class_option :baseos, :type => :string, :default => "fedora19"
class_option :baseos
22 class_option :baseos
And I commented out lines 188, 189 and 192 of the origin.thor file:
188 #available = invoke("remote:available", [instance.dns_name], :username => username,
189 # :wait => true, :verbose => options[:verbose])
192 #raise Exception.new("host #{instance.dns_name} not available") if not available
With this workaround I can use the origin:baseinstance task
C:\...\origin-setup>thor origin:baseinstance data1 --hostname data1.infra.example.com --image ami-b71078de --type t1.micro --keypair --securitygroup default
The output:
--hostname data1.infra.example.com --image ami-b71078de --type t1.micro --keypair --securitygroup default
task: origin:baseinstance data1
task: route53:zone:contains data1.infra.example.com
Z3FJXM0PCNH529 infra.example.com.
task: route53:record:get infra.example.com. data1
- no IP address
task: ec2:ip:create
54.204.11.24
task: route53:record:create infra.example.com. data1 A 54.204.11.24
- image id: ami-b71078de
task: ec2:instance:create ami-b71078de data1
- instance id: i-45c90e3c
- instance i-45c90e3c starting
- instance i-45c90e3c running
- waiting for ec2-54-224-111-205.compute-1.amazonaws.com to accept SSH connections
- host ec2-54-224-111-205.compute-1.amazonaws.com is available
task: ec2:ip:associate 54.204.11.24 i-45c90e3c
trying to resolve 'data1.infra.example.com.'
- data1.infra.example.com. resolves to 54.204.11.24
I'll run the same task to create the hosts msg1 and node1.
I would like to ask a question. Can I put the hosts data1, msg1 and the broker in the same ec2 micro instance?
Thanks!
I'll use an Ubuntu client and install Ruby 1.9 there, because I can't continue on Windows.
Thanks
Hello Mark, it's me again.
I am working fine on Ubuntu, but I still have to comment out line 21 of the origin.thor file and add line 22:
21 #class_option :baseos, :type => :string, :default => "fedora19"
class_option :baseos
Greetings!