To enable DNS resolution from the hosts file, install dnsmasq.
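For example, assuming three nodes named ceph-node1/2/3 (the host names and addresses below are placeholders), the hosts file and the dnsmasq install might look like this:
$cat /etc/hosts
192.168.1.101 ceph-node1
192.168.1.102 ceph-node2
192.168.1.103 ceph-node3
$sudo apt-get install dnsmasq   #dnsmasq serves the /etc/hosts entries over DNS by default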
When deploying Ceph Cuttlefish and beyond with ceph-deploy on Ubuntu, you may start and stop Ceph daemons on a Ceph Node using the event-based Upstart. Upstart does not require you to define daemon instances in the Ceph configuration file.
To list the Ceph Upstart jobs and instances on a node, execute:
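$sudo initctl list | grep ceph   #lists all Upstart jobs and filters the ceph ones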
STARTING A DAEMON
To start a specific daemon instance on a Ceph Node, execute one of the following:
$sudo start ceph-osd id={id}
$sudo start ceph-mon id={hostname}
$sudo start ceph-mds id={hostname}
Ceph deployment
A Ceph OSD stores all client data in the form of objects and serves that data back to clients on request. A Ceph cluster consists of multiple OSDs. For any read or write operation, the client first fetches the cluster maps from the monitors and can then interact directly with the OSDs for I/O, without any further monitor involvement. This keeps the data path fast: clients write straight to the OSDs that store their data, with no additional layer of data handling in between. This data storage and retrieval mechanism is relatively unique to Ceph compared with other storage solutions.
Usually, one OSD daemon is tied to one physical disk in your cluster, so in general the number of OSD daemons working underneath equals the number of physical disks in the Ceph cluster, with each daemon storing user data on its own disk.
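A quick way to see this one-OSD-per-disk layout on a running cluster (a standard Ceph CLI call):
$ceph osd tree   #shows every OSD grouped under the host that runs it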
A Ceph OSD operates on top of a physical disk drive that carries a valid Linux filesystem partition. The filesystem can be Btrfs (B-tree file system), XFS, or ext4, and this choice is one of the major factors in the performance of your Ceph cluster. With respect to Ceph, these filesystems differ from each other in various ways:
- Btrfs: An OSD backed by Btrfs delivers the best performance compared to XFS- and ext4-based OSDs. One of the major advantages of Btrfs is its support for copy-on-write and writable snapshots, which is very useful for VM provisioning and cloning. It also supports transparent compression and pervasive checksums, and incorporates multi-device management in the filesystem. Btrfs further supports efficient XATTRs and inline data for small files, provides integrated, SSD-aware volume management, and offers the much-demanded feature of online fsck. However, despite these features, Btrfs is currently not production ready; it is a good candidate for test deployments.
- XFS: A reliable, mature, and very stable filesystem, and hence recommended for production use in Ceph clusters. Since Btrfs is not production ready, XFS is the most commonly used filesystem in Ceph storage and the recommended choice for OSDs. However, XFS falls short of Btrfs in some respects: it has performance issues with metadata scaling, and it is a journaling filesystem, that is, each time a client sends data to a Ceph cluster, the data is first written to a journaling space and then to the XFS filesystem. This adds the overhead of writing the same data twice and makes XFS slower than Btrfs, which does not use journals.
- Ext4: The fourth extended filesystem is also a journaling filesystem and is production ready for Ceph OSDs; however, it is not as popular as XFS. From a performance point of view, ext4 is not on par with Btrfs. It also does not provide sufficient capacity for XATTRs, owing to its limit on the number of bytes that can be stored as XATTRs, which makes it the less popular filesystem choice; Btrfs and XFS have comparatively large XATTR limits. (A ceph.conf sketch covering these filesystem-related options follows this list.)
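The filesystem choice also shows up in ceph.conf. The snippet below is only a sketch for a filestore-based cluster; the option names come from the Ceph documentation of this era and the values are examples to adapt:
[osd]
osd mkfs type = xfs              #filesystem created when a disk is prepared as an OSD
osd mount options xfs = noatime
filestore xattr use omap = true  #needed on ext4, whose XATTR size limit is too small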
Ceph monitors (MONs) track the health of the entire cluster by maintaining a map of the cluster state, which includes the OSD, MON, PG, and CRUSH maps. All cluster nodes report to the monitors and share information about every change in their state. A monitor maintains a separate map for each component. Monitors do not store actual data; that is the job of the OSDs.
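Each of these maps can be inspected from any node that has the admin keyring; the following are standard Ceph CLI commands:
$ceph mon dump        #monitor map
$ceph osd dump        #OSD map
$ceph pg dump         #placement group map
$ceph osd crush dump  #CRUSH map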
1: Install setuptools and pip
#(download ez_setup.py and get-pip.py first if they are not already present)
$sudo -E python ez_setup.py   #installs setuptools; -E keeps the proxy environment variables
$sudo -E python get-pip.py    #installs pip
2: ssh-keygen && cat ~/.ssh/id_rsa.pub | ssh ceph-node1 'cat >> ~/.ssh/authorized_keys'   #repeat for ceph-node2
sudo cp .ssh/id_rsa /root/.ssh/
cat .ssh/config
Host kube
Hostname ceph-node1
User ceph
Host monit
Hostname ceph-node2
User ceph
cp .ssh/config /root/.ssh/
3: Create a deployment user and grant it passwordless sudo
sudo useradd -d /home/<username> -m <username>
sudo passwd <username>
echo "<username> ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/<username>
sudo chmod 0440 /etc/sudoers.d/<username>
#in /etc/sudoers (edit with visudo), preserve the proxy variables:
Defaults env_keep += "ftp_proxy http_proxy https_proxy no_proxy"
pip install ceph-deploy
Create a Ceph cluster
ceph-deploy new <hostname>   #generates ceph.conf and ceph.mon.keyring in the current directory
sudo mkdir /etc/ceph
Install Ceph
ceph-deploy install <ceph-node1-hostname> <ceph-node2-hostname>
#The disk zap subcommand will destroy the existing partition table and content of the disk
$ceph-deploy disk zap node1:/dev/vdb node2:/dev/vdb
#The osd create subcommand first prepares the disk, that is, creates a filesystem on it
#(xfs by default), then activates the first partition as the data partition and the
#second partition as the journal
$ceph-deploy osd create node1:/dev/vdb node2:/dev/vdb
Create the first monitor
$ceph-deploy mon create-initial
A Ceph storage cluster requires at least one monitor to run. For high availability, a Ceph storage cluster relies on an odd number of monitors greater than one, for example, 3 or 5, to form a quorum.
Add the second monitor
$cat ceph.conf
[global]
public network = 172.16.0.0/16
$sudo ceph-deploy --overwrite-conf mon create <node2>
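After the new monitor joins, you can verify quorum membership; both of the following are standard Ceph CLI commands:
$ceph mon stat
$ceph quorum_status --format json-pretty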
You might encounter warning messages related to clock skew on new monitor nodes. To resolve this, we need to set up Network Time Protocol (NTP) on new monitor nodes:
# chkconfig ntpd on
# ssh ceph-node2 chkconfig ntpd on
# ssh ceph-node3 chkconfig ntpd on
# ntpdate pool.ntp.org
# ssh ceph-node2 ntpdate pool.ntp.org
# ssh ceph-node3 ntpdate pool.ntp.org
# /etc/init.d/ntpd start
# ssh ceph-node2 /etc/init.d/ntpd start
# ssh ceph-node3 /etc/init.d/ntpd start
Storage provisioning
$rbd create ceph-client1-rbd1 --size 10240   #size is in MB, so this creates a 10 GB image
$rbd ls -l                                   #list images in the default rbd pool
$modinfo rbd                                 #confirm the rbd kernel module is available
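If the module is not yet loaded, load it before mapping any image (a standard modprobe call, not a Ceph-specific step):
$sudo modprobe rbd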
To grant clients permission to access the Ceph cluster, we need to add the keyring and the Ceph configuration file to them. Client-to-cluster authentication is based on this keyring.
Install Ceph binaries on ceph-client1 and push ceph.conf and ceph.client.admin.keyring to it:
# ceph-deploy install ceph-client1
# ceph-deploy admin ceph-client1
Map the RBD image ceph-client1-rbd1 to the ceph-client1 machine:
$rbd map rbd/ceph-client1-rbd1
$rbd showmapped
# fdisk -l /dev/<rbd#>
# mkfs.xfs /dev/<rbd#>   #(or mkfs.ext4 /dev/<rbd#> for ext4)
# mkdir /mnt/ceph-vol1
# mount /dev/rbd0 /mnt/ceph-vol1
Resizing Ceph RBD
Ceph supports thin-provisioned block devices, that is, physical storage space is not occupied until you actually begin storing data on the block device. Ceph RADOS block devices are very flexible; you can increase or decrease the size of an RBD image on the fly from the Ceph storage end. However, the underlying filesystem must support resizing. Advanced filesystems such as XFS, Btrfs, the ext family, and ZFS support filesystem resizing to a certain extent; follow the filesystem-specific documentation for details on resizing.
$rbd resize rbd/ceph-client1-rbd1 --size 20480   #grow the image to 20 GB
Now that the Ceph RBD image has been resized, grow the filesystem on top of it so that the kernel and the filesystem make use of the new size, by executing the following command:
# xfs_growfs -d /mnt/ceph-vol1   #(for ext4, use resize2fs /dev/rbd1 instead)
Ceph RBD snapshots
Ceph extends full support to snapshots, which are point-in-time, read-only copies of an RBD image. You can preserve the state of a Ceph RBD image by creating snapshots and restoring them later to recover the original data.
Assume our mounted filesystem now has two files. Let's create a snapshot of the Ceph RBD image using the rbd snap create <pool-name>/<image-name>@<snap-name> syntax, as follows:
# rbd snap create rbd/ceph-client1-rbd1@snap1
To test the snapshot, delete the files from the mounted filesystem, then roll the image back to the snapshot:
#rbd snap rollback rbd/ceph-client1-rbd1@snap1
Once the snapshot rollback operation is complete, remount the Ceph RBD filesystem to refresh the filesystem state. You should be able to get your deleted files back.
# umount /mnt/ceph-vol1
# mount /dev/rbd0 /mnt/ceph-vol1
Ceph RBD clones
The Ceph storage cluster is capable of creating copy-on-write (COW) clones from RBD snapshots. This is also known as snapshot layering in Ceph. This layering feature allows clients to create multiple instant clones of a Ceph RBD image. It is extremely useful for cloud and virtualization platforms such as OpenStack, CloudStack, and Qemu/KVM. These platforms usually protect a Ceph RBD image containing an OS/VM image in the form of a snapshot, and then clone this snapshot multiple times to spin up new virtual machines/instances. Snapshots are read-only, but COW clones are fully writable; this gives Ceph great flexibility and is extremely useful for cloud platforms.
The type of an RBD image determines the features it supports. In Ceph, RBD images come in two formats: format-1 and format-2. The snapshot feature is available on both, but the layering feature, that is, COW cloning, is available only for format-2 RBD images. Format-1 is the default RBD image format.
Create a format-2 RBD image:
# rbd create ceph-client1-rbd2 --size 10240 --image-format 2
Create a snapshot of this RBD image:
# rbd snap create rbd/ceph-client1-rbd2@snapshot_for_clone
To create a COW clone, protect the snapshot. This is an important step: if the snapshot gets deleted, all the attached COW clones will be destroyed.
# rbd snap protect rbd/ceph-client1-rbd2@snapshot_for_clone
$ sudo rbd --image ceph-client1-rbd2@snapshot_for_clone info
rbd image 'ceph-client1-rbd2':
size 10240 MB in 2560 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.105074b0dc51
format: 2
features: layering
flags:
protected: True
Cloning the snapshot
Cloning the snapshot requires the parent pool, the parent RBD image, and the snapshot name. The syntax for this is rbd clone <pool-name>/<parent-image>@<snap-name> <pool-name>/<child-image-name>. The command to be used is as follows:
# rbd clone rbd/ceph-client1-rbd2@snapshot_for_clone rbd/ceph-client1-rbd3
Flatten the image
At this point, you have a cloned RBD image that is dependent upon its parent image snapshot. To make the cloned RBD image independent of its parent, we need to flatten the image, which involves copying the data from the parent snapshot to the child image. The time it takes to complete the flattening process depends on the amount of data present in the parent snapshot. Once the flattening process is complete, there is no dependency between the cloned RBD image and its parent snapshot. To initiate the flattening process, use the following command:
# rbd flatten rbd/ceph-client1-rbd3
Remove the parent image snapshot
Remove the parent image snapshot if you no longer require it. Before removing the snapshot, you first have to unprotect it using the following command:
# rbd snap unprotect rbd/ceph-client1-rbd2@snapshot_for_clone
Once the snapshot is unprotected, you can remove it using the following command:
# rbd snap rm rbd/ceph-client1-rbd2@snapshot_for_clone
Object storage
In a production environment with a heavy Ceph object storage workload, you should configure the RADOS gateway on a dedicated physical machine; otherwise, you can use any of the monitor nodes as the RADOS gateway. We will now perform a basic RADOS gateway configuration to use the Ceph storage cluster as object storage. In a usual Ceph-based setup, the RADOS gateway is configured on a machine other than the MON and OSD nodes.
Installing the RADOS gateway
To run a Ceph Object Storage service, you must install Apache and the Ceph Object Gateway daemon on the host that is going to provide the gateway service, i.e., the gateway host. If you plan to run a Ceph Object Storage service with a federated architecture (multiple regions and zones), you must also install the synchronization agent.
Note: previous versions of Ceph shipped with mod_fastcgi; the current version ships with mod_proxy_fcgi instead.
Install Apache and load the mod_proxy_fcgi module:
$sudo apt-get install apache2
Set ServerName {fqdn} in /etc/apache2/apache2.conf
$sudo a2enmod proxy_fcgi
ENABLE SSL
Some REST clients use HTTPS by default. So you should consider enabling SSL for Apache. Use the following procedures to enable SSL.
$sudo apt-get install openssl ssl-cert
$sudo a2enmod ssl
$sudo mkdir /etc/apache2/ssl
The most important item requested is the line that reads "Common Name (e.g. server FQDN or YOUR name)". Enter the domain name you want to associate with the certificate, or the server's public IP address if you do not have a domain name.
$sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/apache2/ssl/apache.key -out /etc/apache2/ssl/apache.crt
/etc/apache2/sites-available/default-ssl.conf:
SSLCertificateFile /etc/apache2/ssl/apache.crt
SSLCertificateKeyFile /etc/apache2/ssl/apache.key
$sudo a2ensite default-ssl.conf
$sudo service apache2 restart
INSTALL CEPH OBJECT GATEWAY DAEMON
$sudo apt-get install radosgw
If you plan to run a Ceph Object Storage service with a federated architecture (multiple regions and zones), you must also install the synchronization agent.
$sudo apt-get install radosgw-agent
Once you have installed the Ceph Object Gateway packages, the next step is to configure your Ceph Object Gateway. There are two approaches:
- Simple: A simple Ceph Object Gateway configuration implies that you are running a Ceph Object Storage service in a single data center. So you can configure the Ceph Object Gateway without regard to regions and zones.
- Federated: A federated Ceph Object Gateway configuration implies that you are running a Ceph Object Storage service in a geographically distributed manner for fault tolerance and failover. This involves configuring your Ceph Object Gateway instances with regions and zones.
The Ceph Object Gateway is a client of the Ceph Storage Cluster. As a Ceph Storage Cluster client, it requires:
- A name for the gateway instance. We use gateway in this guide.
- A storage cluster user name with appropriate permissions in a keyring.
- Pools to store its data.
- A data directory for the gateway instance.
- An instance entry in the Ceph Configuration file.
- A configuration file for the web server to interact with FastCGI (a minimal sketch follows this list).
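The following is only a sketch of such a web-server configuration for Apache with mod_proxy_fcgi; the file name rgw.conf, the socket path, and the fallback port are assumptions and must match your radosgw configuration:
#/etc/apache2/sites-available/rgw.conf
<VirtualHost *:80>
    ServerName {fqdn}
    RewriteEngine On
    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    SetEnv proxy-nokeepalive 1
    ProxyPass / unix:///var/run/ceph/ceph.radosgw.gateway.fastcgi.sock|fcgi://localhost:9000/
</VirtualHost>
$sudo a2ensite rgw.conf
$sudo service apache2 reload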
CREATE A USER AND KEYRING
Each instance must have a user name and key to communicate with a Ceph Storage Cluster. In the following steps, we use an admin node to create a keyring. Then, we create a client user name and key. Next, we add the key to the Ceph Storage Cluster. Finally, we distribute the key ring to the node containing the gateway instance.
Create a keyring for the gateway
$sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
$sudo chmod +r /etc/ceph/ceph.client.radosgw.keyring
Generate a Ceph Object Gateway user name and key for each instance
$sudo ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.gateway --gen-key
When you provide CAPS to the key, you MUST provide read capability. However, you have the option of providing write capability for the monitor. This is an important choice. If you provide write capability to the key, the Ceph Object Gateway will have the ability to create pools automatically; however, it will create pools with either the default number of placement groups (not ideal) or the number of placement groups you specified in your Ceph configuration file. If you allow the Ceph Object Gateway to create pools automatically, ensure that you have reasonable defaults for the number of placement groups first. See Pool Configuration for details.
$sudo ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
$sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.gateway -i /etc/ceph/ceph.client.radosgw.keyring
Distribute the keyring to the node with the gateway instance
#append the following configuration to /etc/ceph/ceph.conf in your admin node
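A typical gateway section looks roughly like the following sketch (modeled on the Ceph documentation; the host name, socket path, and log file location are placeholders to adjust for your setup):
[client.radosgw.gateway]
host = {gateway-hostname}
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw print continue = false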
#on the deployer node
$ceph-deploy --overwrite-conf config push osd1 osd2 gateway   #push the updated ceph.conf to each node
#COPY CEPH.CLIENT.ADMIN.KEYRING FROM ADMIN NODE TO GATEWAY HOST
$sudo scp /etc/ceph/ceph.client.admin.keyring ceph@{hostname}:/home/ceph
$ssh gateway
$sudo mv ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring
$sudo mkdir -p /var/lib/ceph/radosgw/ceph-radosgw.gateway   #data directory for the gateway instance
$sudo radosgw-admin user create --uid="testuser" --display-name="First User"
$sudo radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full --key-type=swift --gen-secret
#(or, to set an explicit secret: radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full --secret=secretkey --key-type=swift)
#make sure the user can be queried
$sudo radosgw-admin user info --uid="testuser"
#copy the swift secret_key for testuser:swift from the user info output above, then export it:
$key=<swift secret_key from the output>
$swift -V 1.0 -A http://10.243.192.111/auth -U testuser:swift -K $key post c1   #create a container named c1
$swift -V 1.0 -A http://10.243.192.111/auth -U testuser:swift -K $key list
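To verify object storage end to end, you can also upload and list an object through the same Swift-compatible API; the file name below is just an example:
$echo 'hello ceph' > /tmp/hello.txt
$swift -V 1.0 -A http://10.243.192.111/auth -U testuser:swift -K $key upload c1 /tmp/hello.txt
$swift -V 1.0 -A http://10.243.192.111/auth -U testuser:swift -K $key list c1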