Installing Ceph using ceph-deploy

[Ceph](http://ceph.com/) is a distributed open source storage solution that supports [Object Storage](https://en.wikipedia.org/wiki/Object_storage), [Block Storage](https://en.wikipedia.org/wiki/Block_(data_storage)) and [File Storage](https://en.wikipedia.org/wiki/File_systems).

Other open source distributed storage systems are [GlusterFS](https://www.gluster.org/) and [HDFS](https://en.wikipedia.org/wiki/Apache_Hadoop#File_systems).

In this guide, we describe how to set up a basic Ceph cluster for Block Storage. Our setup has 25 nodes. The masternode is a [MAAS](https://maas.io/) Region and Rack controller. The rest of the nodes run Ubuntu 16.04 deployed through MAAS. The recommended filesystem for Ceph is [XFS](https://en.wikipedia.org/wiki/XFS), and this is what is used on the nodes.

This guide is based on the [Quick Installation](http://docs.ceph.com/docs/master/start/) guide from the [Ceph Documentation](http://docs.ceph.com/docs/master/). It uses the *ceph-deploy* tool, which is a relatively quick way to set up Ceph, especially for newcomers. There is also the [Manual Installation](http://docs.ceph.com/docs/master/install/), as well as deployment [through Ansible](https://github.com/ceph/ceph-ansible) or [juju](https://jujucharms.com/ceph/).

## Prerequisites

### Topology

* 1 deploy node (masternode). MAAS region and rack controller installed, plus Ansible
* 3 monitor nodes (node01,node11,node24). Ubuntu 16.04 on XFS deployed through MAAS
* 20 OSD nodes (node02-10, node12-14, node16-23). Ubuntu 16.04 on XFS deployed through MAAS

### Create an Ubuntu user on masternode

It is convenient to create an *ubuntu* user on the masternode, with passwordless sudo access:

```
$ sudo useradd -m -s /bin/bash ubuntu
```

Run `visudo` and give passwordless sudo access to the *ubuntu* user:

```
ubuntu ALL=NOPASSWD:ALL
```

Generate an SSH key pair for the *ubuntu* user:

```
$ ssh-keygen -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/ubuntu/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ubuntu/.ssh/id_rsa.
Your public key has been saved in /home/ubuntu/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:t1zWURVk7j6wJPkA3VmbcHtAKh3EB0kyanORVbiiBkU ubuntu@masternode
The key's randomart image is:
+---[RSA 4096]----+
|       .E +**B=*=|
|        ..o==oOo+|
|       .+.o.o=.=.|
|      .. oo.o....|
|       .S..=oo.. |
|        oo += +  |
|       . o o o   |
|                .|
|                 |
+----[SHA256]-----+
```

Deploy the */home/ubuntu/.ssh/id_rsa.pub* pubkey on all the nodes (append it to */home/ubuntu/.ssh/authorized_keys*). You could also add this pubkey to the MAAS user before deploying Ubuntu 16.04 on the nodes.
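If the nodes already accept SSH logins as *ubuntu* (for example through a key injected by MAAS at deploy time), a short loop with `ssh-copy-id` can do the appending for you. This is a sketch; adjust the user and node range to your setup:

```shell
# Push the masternode pubkey to every node. Assumes we can already
# authenticate as ubuntu on the nodes (e.g. via a MAAS-injected key).
for ID in {01..24}; do
  ssh-copy-id -i /home/ubuntu/.ssh/id_rsa.pub ubuntu@node${ID}
done
```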

### Set /etc/hosts

```
$ for ID in {01..24}; do echo "$(dig +short node${ID}.maas @127.0.0.1) node${ID}.maas node${ID}"; done > nodes.txt
```
It should look like this:

```
192.168.10.28 node01.maas node01
192.168.10.29 node02.maas node02
192.168.10.30 node03.maas node03
192.168.10.31 node04.maas node04
192.168.10.32 node05.maas node05
192.168.10.33 node06.maas node06
192.168.10.34 node07.maas node07
192.168.10.35 node08.maas node08
192.168.10.36 node09.maas node09
192.168.10.37 node10.maas node10
192.168.10.38 node11.maas node11
192.168.10.39 node12.maas node12
192.168.10.40 node13.maas node13
192.168.10.41 node14.maas node14
192.168.10.42 node16.maas node16
192.168.10.43 node17.maas node17
192.168.10.44 node18.maas node18
192.168.10.45 node19.maas node19
192.168.10.46 node20.maas node20
192.168.10.47 node21.maas node21
192.168.10.48 node22.maas node22
192.168.10.49 node23.maas node23
192.168.10.50 node24.maas node24
```

Now you can append the result in */etc/hosts*:

```
$ cat nodes.txt | sudo tee -a /etc/hosts
```

### Ansible setup

Use this setup in */etc/ansible/hosts* on masternode:

```
[masternode]
masternode

[nodes]
node01
node02
node03
node04
node05
node06
node07
node08
node09
node10
node11
node12
node13
node14
node15
node16
node17
node18
node19
node20
node21
node22
node23
node24

[ceph-mon]
node01
node11
node24

[ceph-osd]
node02
node03
node04
node05
node06
node07
node08
node09
node10
node12
node13
node14
node15
node16
node17
node18
node19
node20
node21
node22
node23
```

### Install python on all the nodes

```
$ for ID in {01..24}
> do
>   ssh node${ID} "sudo apt -y install python-minimal"
> done
```

### Ensure time synchronization of the nodes

Install the *theodotos/debian-ntp* role from [Ansible Galaxy](https://galaxy.ansible.com):

```
$ sudo ansible-galaxy install theodotos.debian-ntp
```

Create a basic playbook *ntp-init.yml*:

```
---
- hosts: nodes
  remote_user: ubuntu
  become: yes
  roles:
    - { role: theodotos.debian-ntp, ntp.server: masternode }
```

Apply the playbook:

```
$ ansible-playbook ntp-init.yml
```

Verify that the monitor nodes are time synchronized:

```
$ ansible ceph-mon -a 'timedatectl'
node11 | SUCCESS | rc=0 >>
      Local time: Fri 2017-04-28 08:06:30 UTC
  Universal time: Fri 2017-04-28 08:06:30 UTC
        RTC time: Fri 2017-04-28 08:06:30
       Time zone: Etc/UTC (UTC, +0000)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

node24 | SUCCESS | rc=0 >>
      Local time: Fri 2017-04-28 08:06:30 UTC
  Universal time: Fri 2017-04-28 08:06:30 UTC
        RTC time: Fri 2017-04-28 08:06:30
       Time zone: Etc/UTC (UTC, +0000)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

node01 | SUCCESS | rc=0 >>
      Local time: Fri 2017-04-28 08:06:30 UTC
  Universal time: Fri 2017-04-28 08:06:30 UTC
        RTC time: Fri 2017-04-28 08:06:30
       Time zone: Etc/UTC (UTC, +0000)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no
```

Check also the OSD nodes:

```
$ ansible ceph-osd -a 'timedatectl'
```

## Install *Ceph*

On the masternode, install *ceph-deploy*:

```
$ sudo apt install ceph-deploy
```

Create a new cluster and set the monitor nodes (there must be an odd number of them, so they can maintain quorum):

```
$ ceph-deploy new node01 node11 node24
```
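`ceph-deploy new` writes a *ceph.conf* and a monitor keyring into the working directory. On the jewel release the generated file should look roughly like this (the *fsid* is a randomly generated UUID; the monitor addresses are the ones from our */etc/hosts* table):

```
[global]
fsid = <generated uuid>
mon_initial_members = node01, node11, node24
mon_host = 192.168.10.28,192.168.10.38,192.168.10.50
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
```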

Install Ceph on the masternode and all the other nodes:

```
$ ceph-deploy install masternode node{01..24}
```

Deploy the monitors and gather the keys:

```
$ ceph-deploy mon create-initial
```
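If all went well, the gathered keys now sit in the working directory. The exact set varies by release; on jewel you should see something like:

```
$ ls -1 ceph*.keyring
ceph.bootstrap-mds.keyring
ceph.bootstrap-osd.keyring
ceph.bootstrap-rgw.keyring
ceph.client.admin.keyring
ceph.mon.keyring
```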

## Prepare the OSD nodes

### Create the OSD directories

Create the OSD directories on the OSD nodes:

```
$ I=0
$ for ID in {02..10} {12..14} {16..23}
> do
>   ssh -l ubuntu node${ID} "sudo mkdir /var/local/osd${I}"
>   I=$((${I}+1))
> done
```

Verify that the OSD directories are created:

```
$ ansible ceph-osd -a "ls /var/local" | cut -d\| -f1 | xargs -n2 | sort
node02 osd0
node03 osd1
node04 osd2
node05 osd3
node06 osd4
node07 osd5
node08 osd6
node09 osd7
node10 osd8
node12 osd9
node13 osd10
node14 osd11
node16 osd12
node17 osd13
node18 osd14
node19 osd15
node20 osd16
node21 osd17
node22 osd18
node23 osd19
```

Nodes 01, 11 and 24 are excluded because those are the monitor nodes.

### Fix OSD permissions

Because of a [bug](https://github.com/carmstrong/multinode-ceph-vagrant/issues/5) we need to change the ownership of the OSD directories to *ceph:ceph*. Otherwise you will get this error:

```
** ERROR: error creating empty object store in /var/local/osd0: (13) Permission denied
```

Change the ownership of the OSD directories on the OSD nodes:

```
$ I=0
$ for ID in {02..10} {12..14} {16..23}
> do
>   ssh -l ubuntu node${ID} "sudo chown ceph:ceph /var/local/osd${I}"
>   I=$((${I}+1))
> done
```

### Prepare the OSDs

```
$ I=0
$ for ID in {02..10} {12..14} {16..23}
> do
>   ceph-deploy --username ubuntu osd prepare node${ID}:/var/local/osd${I}
>   I=$((${I}+1))
> done
```

### Activate the OSDs

For all the OSD nodes:

```
$ I=0
$ for ID in {02..10} {12..14} {16..23}
> do
>   ceph-deploy --username ubuntu osd activate node${ID}:/var/local/osd${I}
>   I=$((${I}+1))
> done
```

### Deploy the configuration file and admin key

Now we need to deploy the configuration file and admin key to the admin node and our Ceph nodes. This saves us from having to specify the monitor address and keyring every time we execute a Ceph CLI command.

```
$ ceph-deploy admin masternode node{01..24}
```

Set the keyring to be world readable:

```
$ sudo chmod +r /etc/ceph/ceph.client.admin.keyring
```

## Test and verify

```
$ ceph health
HEALTH_WARN too few PGs per OSD (9 < min 30)
HEALTH_ERR clock skew detected on mon.node11, mon.node24; 64 pgs are stuck inactive for more than 300 seconds; 64 pgs stuck inactive; 64 pgs stuck unclean; Monitor clock skew detected
```

Our newly built cluster is not healthy. We need to increase the number of [Placement Groups](http://docs.ceph.com/docs/master/rados/operations/placement-groups/). The formula is the *number_of_minimum_expected_PGs* (30) times the *number_of_OSDs* (20), rounded to the closest power of 2:

```
30x20=600 => pg_num=512
```
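The rounding step is easy to script. Here is a minimal sketch (plain shell arithmetic, nothing Ceph-specific) that picks whichever neighbouring power of 2 is closer to the target:

```shell
# target = minimum expected PGs per OSD (30) x number of OSDs (20)
target=$((30 * 20))
lo=1                        # will become the largest power of 2 <= target
while [ $((lo * 2)) -le $target ]; do lo=$((lo * 2)); done
hi=$((lo * 2))              # smallest power of 2 > target
if [ $((target - lo)) -le $((hi - target)) ]; then pg_num=$lo; else pg_num=$hi; fi
echo $pg_num                # prints 512 for a target of 600
```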

Increase PGs:

```
$ ceph osd pool set rbd pg_num 512
```

Now we run `ceph health` again:

```
$ ceph health
HEALTH_WARN pool rbd pg_num 512 > pgp_num 64
```

Still some tweaking needs to be done. We need to adjust *pgp_num* to 512:

```
$ ceph osd pool set rbd pgp_num 512
```

And we are there at last:

```
$ ceph health
HEALTH_OK
```

## Create a Ceph Block Device

Check the available storage:

```
$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    11151G     10858G         293G          2.63
POOLS:
    NAME     ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd      0       306         0         3619G           4
```

Now we need to create a RADOS Block Device (RBD) to hold our data.

```
$ rbd create clusterdata --size 4T --image-feature layering
```

Check the new block device:

```
$ rbd ls -l
NAME           SIZE PARENT FMT PROT LOCK
clusterdata   4096G          2
```

Map the block device:

```
$ sudo rbd map clusterdata --name client.admin
/dev/rbd0
```

Format the clusterdata device:

```
$ sudo mkfs -t ext4 /dev/rbd0
```

Mount the block device:

```
$ sudo mkdir /srv/clusterdata
$ sudo mount /dev/rbd0 /srv/clusterdata
```

Now we have a block device for data that is distributed among the 20 OSD nodes.
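Note that the mapping and the mount above do not survive a reboot. One way to make them persistent (a sketch; check the *rbdmap* and *fstab* man pages on your release) is to register the image in */etc/ceph/rbdmap*, which the *rbdmap* boot service reads, and to mount the udev-created */dev/rbd/&lt;pool&gt;/&lt;image&gt;* symlink from *fstab* with *noauto*, since the device does not exist until *rbdmap* has run:

```
# /etc/ceph/rbdmap
rbd/clusterdata id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

# /etc/fstab
/dev/rbd/rbd/clusterdata  /srv/clusterdata  ext4  noauto  0 0
```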

Here is a summary of some useful monitoring and troubleshooting commands for Ceph:

```
$ ceph health
$ ceph health detail
$ ceph status (ceph -s)
$ ceph osd stat
$ ceph osd tree
$ ceph mon dump
$ ceph mon stat
$ ceph -w
$ ceph quorum_status --format json-pretty
$ ceph mon_status --format json-pretty
$ ceph df
```

If you run into trouble, contact the awesome folks at the #ceph IRC channel, hosted on the Open and Free Technology Community (OFTC) IRC network.

## Start over

In case you messed up the procedure and you need to start over you can use the following commands:

```
$ ceph-deploy purge masternode node{01..24}
$ ceph-deploy purgedata masternode node{01..24}
$ ceph-deploy forgetkeys
$ for ID in {02..10} {12..14} {16..23}; do ssh node${ID} "sudo rm -fr /var/local/osd*"; done
$ rm ceph.conf ceph-deploy-ceph.log .cephdeploy.conf
```

**NOTE: this procedure will destroy your Ceph cluster along with all the data!**

## Conclusions

Using *ceph-deploy* may be an easy way to get started with Ceph, but it does not provide much room for customization. For a more finely tuned setup you may be better off with the [Manual Installation](http://docs.ceph.com/docs/master/install/), even though it has a steeper learning curve.

## References
* http://docs.ceph.com/docs/master/start/
* https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/1554806
* http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
* http://www.virtualtothecore.com/en/adventures-ceph-storage-part-1-introduction/
