Tag Archives: Apache Mesos

Dell EMC World 2017 – Las Vegas, NV

It’s that time of year again: we are just days away from Dell EMC World 2017. The {code} team will once again be in attendance, presenting some interesting sessions (16 in total), a Hands-On Lab (I ran through it myself and it’s great!), and various materials at the show. The buffet (yes, we are in Vegas after all!) of information we have lined up is pretty dang awesome! You can find more information about what {code} has going on at our official {code} at Dell EMC World page.

Demos, Demos, Demos

What I want to talk about today are the two sessions that I will be presenting at Dell EMC World. The first session, called Demos Demos Demos! Containers & {code}, is happening on Wednesday, May 10 at 1:30 PM in room Zeno 4602. I will be co-presenting with Travis Rhoden and Vladimir Vivien. Just like the title says, this session will have a few slides to set up what is going on and talk about who we are… then it’s nothing but live demos. I think this will be a pretty amazing session that captures what is hot in the container and scheduler space while also giving you some practical, real-world information to take home with you. Definitely check this out!

I will be presenting the second session solo. It’s called Managing ScaleIO As Software On Mesos and takes place on Thursday, May 11 at 11:30 AM in room Zeno 4602. I floated this idea last year during a session at (the then) EMC World 2016, where I thought it would be cool to be able to treat storage as just another piece of software. Well, it’s now one year later and that idea is a reality, so we are going to talk about and demonstrate the ScaleIO Framework in this session. Many other container schedulers have implementations of this pattern, and this concept will change the way we consume software in the future.

Have fun, but not too much fun!

If you are heading down to Dell EMC World this year, stop by the sessions the {code} team will be presenting, and if you have any questions, feel free to linger after the presentations to chat. I think this is going to be an awesome conference. Do check out some of the social networking opportunities available to connect with new people and, as always, enjoy the show and have fun (but not too much… it’s Vegas after all)!

Applications that Fix Themselves

I know that in my last blog post I said I would be talking about (and probably announcing) the FaultSet functionality planned for the next release of the ScaleIO Framework. As with all things in the world of technology and software, things don’t always go as planned. So today I am here to talk about some Framework features that will be in my speaking session, entitled How Container Schedulers and Software Defined Storage will Change the Cloud, at SCaLE 15x this Saturday, March 4th at 3pm in Ballroom F of the Pasadena Convention Center.

At face value this new functionality seems straightforward, but the implications start to open the doors to next-level-thought kinda stuff. Ok ok ok. I may have oversold that a little, but the idea itself is still pretty cool and I am super excited to talk about it here.

Just make it happen. I don’t care how!

Just this week, I released the ScaleIO Framework version 0.3.1, which has a functionality preview **cough** experimental **cough** for a couple of features that I think are cool. The first feature, although not as interesting, will probably be the most immediately useful to people who want to use ScaleIO but were turned off by the installation instructions… starting from a bare Mesos cluster, you can provision the entire ScaleIO storage platform in a highly available 3-node configuration from scratch and have all the storage integrations, like REX-Ray and mesos-module-dvdi, installed automatically.

Easy Street

In case you missed it… without having to know anything about ScaleIO, you can deploy an entirely software-based storage platform that lets your Mesos workloads persist application data seamlessly, is globally accessible, and makes your apps highly available. This abstracts away the complexities of the entire storage platform and transforms it into a simple service from which you can consume storage. As far as any user is concerned, the storage platform came natively with Mesos, and the first app you deploy can consume ScaleIO volumes from day one. If you want more details on how to make that happen, please check out the documentation.

The Sky is Falling!! Do Something?!?!

I think the second functionality preview **cough** experimental **cough** in the 0.3.1 release has perhaps the most compelling story but may be less useful in practice (at least for now). I have always been fascinated by the idea that applications, when they run into trouble, can go and fix themselves. We often call this self-remediation. In reality, that has always been a pipe dream, but there is some really cool infrastructure in the form of Mesos Frameworks that makes this idea a possibility.

It's not going to happen

So this second feature comes from my days as both a storage and backup user… where I would get the dreaded “storage array is full” notification. This typically entails getting another expander shelf for your storage array (if you are lucky enough to have expansion capability), populating the expansion bay with disks, and then configuring the array to accept the new raw capacity. In the age of Clouds and DevOps, anything is possible and provisioning new resources is only an API call away.

Anything is possible

The idea is that as our ScaleIO storage pool starts to approach full, we can provision more raw disks in the form of EBS volumes to contribute to the storage pool. Since we exist in the cloud, or in this case AWS, that is only an API call away. That is exactly the idea behind this feature… to live in a world where applications can self-remediate and fix themselves. Sounds cool, yea?!?! If you are interested in more information about this feature, I urge you to check out the user guide, try it out, and provide input and feedback! And if you happen to be at SCaLE 15x this week, I will be doing this exact demo live! BONUS: There is also a video of this demo as it was performed at SCaLE.
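As a rough illustration of that "approach full, then make an API call" loop, here is a minimal shell sketch. It is not the Framework's actual code: the function names, the 80% threshold, the 500 GB size, and the availability zone are all hypothetical, and the AWS CLI command is only printed, never executed.

```shell
#!/bin/sh
# Hypothetical sketch of the self-remediation decision; not the Framework's code.

# needs_expansion USED_GB TOTAL_GB THRESHOLD_PERCENT
# Exits 0 when the pool usage has reached the threshold, non-zero otherwise.
needs_expansion() {
    used="$1"; total="$2"; threshold="$3"
    [ "$total" -gt 0 ] || return 2          # guard against division by zero
    percent=$(( used * 100 / total ))       # integer percentage of the pool in use
    [ "$percent" -ge "$threshold" ]
}

# plan_expansion USED_GB TOTAL_GB THRESHOLD_PERCENT NEW_SIZE_GB ZONE
# Dry run: prints the AWS CLI call that would create a new EBS volume.
plan_expansion() {
    used="$1"; total="$2"; threshold="$3"; size_gb="$4"; zone="$5"
    if needs_expansion "$used" "$total" "$threshold"; then
        echo "aws ec2 create-volume --size $size_gb --availability-zone $zone --volume-type gp2"
    else
        echo "pool below ${threshold}% full; nothing to do"
    fi
}

# Example values (assumed): 1700 GB used of 2000 GB, expand at 80% full.
plan_expansion 1700 2000 80 500 us-west-2a
```

In a real setup the new volume would still have to be attached to a node and contributed to the storage pool, which is the part the Framework automates; check the user guide for how it actually does that.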

Where to go next…

So I hope the FaultSet functionality is just around the corner, along with support for CoreOS, or what they are now calling Container Linux, since a lot of the stuff coming out of Mesos and DC/OS is now based on that platform. Let us know if you want more on Mesos and the ScaleIO Framework by hitting me up in the #mesos channel of our {code} community Slack. Additionally, if you are in the Los Angeles area this week, I would highly recommend stopping by SCaLE 15x in Pasadena, catching some of the sessions, and visiting the {code} booth in the expo hall to continue the conversation.

ScaleIO: Deep Dive on Imperative Deployment

By now you have probably read the blog post, ScaleIO Framework v0.3: Deploy This!, where we announced the new version of the ScaleIO Framework. (If you haven’t, I would definitely go check it out first.) That release unveiled a new feature called Imperative Deployment, which is the first structured method for deploying ScaleIO into your Apache Mesos cluster. In this blog post, we are going to do a deep dive on that feature and highlight some of the interesting and cool things that Imperative Deployment brings to this release.

Let’s Kick this Off

The first thing we should point out about ScaleIO is that you need a strategy for how you want to deploy it. ScaleIO is flexible and allows for countless possible combinations, each with its own pros and cons. It turns out that the marketing material that makes ScaleIO look super easy to use glosses over the fact that there is actually a set of best practices you need to adhere to in order to get the most out of ScaleIO.

We are going to tackle various ScaleIO deployment scenarios in a series of blog installments, and our first topic covers environments for demos, dev/test, and smaller configurations. In this type of configuration, a fully distributed or hyper-converged deployment might be the best thing to roll out, since you are dealing with a relatively small number of systems. Demo and dev/test environments are the trivial case: they “just need to work” and performance is an afterthought. So let’s take a look at a real-world hyper-converged configuration. It goes without saying that you want at least a 3-node Mesos master quorum to tolerate failure. For the ScaleIO MDM nodes (Primary, Secondary, and TieBreaker), we will make use of the 3 nodes used for the Mesos masters. Then for the compute, we will have 16 Mesos Agent nodes, each configured with a single 2TB drive. This configuration must already be in place before you deploy the ScaleIO Framework.

Mesos Configuration

To deploy ScaleIO using the Framework’s Imperative Deployment feature, you define Mesos Agent attributes similar to those mentioned in the “Deploy This!” blog article. Before we begin, it is important to understand what the scaleio-sds and scaleio-sdc attributes really mean. The scaleio-sds attributes represent the protection domains and storage pools that will be created on ScaleIO and which disks/devices will be contributed to each domain/pool combination. The scaleio-sdc attributes represent the protection domains and storage pools from which that particular node will provision and consume ScaleIO volumes. Put very simply, sds is the server-side configuration of the disks/devices offered up to ScaleIO, and sdc is the client-side configuration for consuming volumes from ScaleIO.

The Imperative Configuration

So in the configuration defined above (3 Mesos masters doubling as the 3 ScaleIO MDMs, plus 16 Mesos Agent nodes), if the 2TB drive is installed at /dev/xvdf on each node (this can be verified using fdisk), your Mesos Agent nodes’ attributes would look like the block below. Note that any change to your Mesos Agent attributes requires a restart of the Mesos Agent service before you deploy the Framework.

# cat /etc/mesos-slave/attributes/scaleio-sds-domains
mydomain

# cat /etc/mesos-slave/attributes/scaleio-sds-mydomain
mypool

# cat /etc/mesos-slave/attributes/scaleio-sds-mypool
/dev/xvdf

# cat /etc/mesos-slave/attributes/scaleio-sdc-domains
mydomain

# cat /etc/mesos-slave/attributes/scaleio-sdc-mydomain
mypool

Now a few things should be noted. It might be wise to use more meaningful names than mydomain or mypool. If this were for the Quality Engineering department, maybe mydomain could be replaced with engineering and mypool with qe. Next, this assumes all devices are configured at /dev/xvdf, but depending on your storage controller the drive might be at /dev/xvdg, for example, so replace it with the discovered or assigned value. Lastly, since REX-Ray currently only supports provisioning volumes from ScaleIO on a single protection domain and storage pool, we could omit the /etc/mesos-slave/attributes/scaleio-sdc attributes entirely. The Framework contains code such that the last defined scaleio-sds domain and pool are automatically used for the scaleio-sdc components. When REX-Ray implements multi-domain/pool capabilities, this code will likely be deprecated.
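If you would rather script those attribute files than create them by hand, a minimal sketch could look like the following. The function name and the ATTR_DIR variable are hypothetical; ATTR_DIR defaults to a scratch directory so you can preview the result safely, and on a real agent you would point it at /etc/mesos-slave/attributes as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: write the scaleio-sds attribute files for one Mesos Agent node.
# ATTR_DIR defaults to a scratch directory (an assumption made for safe previewing).
ATTR_DIR="${ATTR_DIR:-${TMPDIR:-/tmp}/scaleio-attributes-preview}"

# write_sds_attributes DOMAIN POOL DEVICE
write_sds_attributes() {
    domain="$1"; pool="$2"; device="$3"
    mkdir -p "$ATTR_DIR" || return 1
    # Which protection domain this node contributes to.
    printf '%s\n' "$domain" > "$ATTR_DIR/scaleio-sds-domains"
    # Which storage pool lives inside that domain.
    printf '%s\n' "$pool"   > "$ATTR_DIR/scaleio-sds-$domain"
    # Which disk/device is contributed to that pool.
    printf '%s\n' "$device" > "$ATTR_DIR/scaleio-sds-$pool"
}

# Example: the Quality Engineering naming suggested above.
write_sds_attributes engineering qe /dev/xvdf
```

The scaleio-sdc files are left out on purpose, since the Framework falls back to the last defined scaleio-sds domain and pool. Remember that the Mesos Agent service still needs a restart after any attribute change.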

Finally, let’s assume that we know for certain that all the disks/devices are attached at /dev/xvdf, because the initial setup was performed using your favorite DevOps tool or because you are in AWS (/dev/xvdf is the default when you add your first disk). In that case you could have deployed using the ScaleIO Framework’s Single Global Pool method, which automatically attaches all unused (i.e. without a filesystem) disks on the 16 Mesos Agent nodes. The default protection domain and storage pool names can be overridden with meaningful names using the configuration options -scaleio.protectiondomain=engineering and -scaleio.storagepool=qe. The end result of both methods in this particular case would be identical.

Huge Mistake

This appears to be simpler than Imperative Deployment, so why don’t we just use the Single Global Pool method all the time? First, keep in mind that only a single protection domain and a single storage pool can be created. You may want more than one, and in that case you must use Imperative Deployment (example below). Second, if you have disks without a partition that you want to allocate for some other function, like additional local storage, the Single Global Pool method will automatically consume them and contribute them to ScaleIO. Warning: This includes Agent nodes added to the cluster later for expansion! Defining these attributes on new nodes before they are on-boarded gives you an explicit configuration. Without the attributes, newly on-boarded nodes will contribute all disks/devices presented to them to the protection domain and storage pool named by the -scaleio.protectiondomain and -scaleio.storagepool configuration options.

An example of multiple StoragePools. Maybe Mesos Agent nodes 1-8 are defined like:

# cat /etc/mesos-slave/attributes/scaleio-sds-domains
engineering

# cat /etc/mesos-slave/attributes/scaleio-sds-engineering
qe

# cat /etc/mesos-slave/attributes/scaleio-sds-qe
/dev/xvdf

And Mesos Agent nodes 9-16 are defined like:

# cat /etc/mesos-slave/attributes/scaleio-sds-domains
engineering

# cat /etc/mesos-slave/attributes/scaleio-sds-engineering
development

# cat /etc/mesos-slave/attributes/scaleio-sds-development
/dev/xvdf
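With 16 nodes, it can help to derive each node's pool from its node number instead of typing the files out one at a time. The sketch below only prints what each node's attributes would contain. The function names and the idea of numbering nodes 1 through 16 are assumptions made for illustration, matching the two listings above.

```shell
#!/bin/sh
# Hypothetical helper: map a node number to its storage pool for the two-pool example.

# pool_for_node NODE_NUMBER
# Nodes 1-8 contribute to the qe pool, nodes 9-16 to the development pool.
pool_for_node() {
    node="$1"
    if [ "$node" -ge 1 ] && [ "$node" -le 8 ]; then
        echo qe
    elif [ "$node" -ge 9 ] && [ "$node" -le 16 ]; then
        echo development
    else
        echo "node $node is outside the 16-node example" >&2
        return 1
    fi
}

# print_sds_attributes NODE_NUMBER
# Prints "attribute file: content" for the three scaleio-sds files on that node.
print_sds_attributes() {
    pool="$(pool_for_node "$1")" || return 1
    printf 'scaleio-sds-domains: engineering\n'
    printf 'scaleio-sds-engineering: %s\n' "$pool"
    printf 'scaleio-sds-%s: /dev/xvdf\n' "$pool"
}

print_sds_attributes 3
print_sds_attributes 12
```

Both groups share the engineering protection domain and differ only in the pool name, which is what gives you two storage pools in a single domain.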

What’s next?

A piece of functionality that is currently being worked on is Fault Sets. This will allow one to specify which nodes can fail without data loss. This will naturally allow for advanced configurations for ScaleIO and happens to be the target for the next blog article in this series.

Further down the road, there are plans to work on a Declarative Deployment option which kind of sits between the simplicity of the Single Global Pool and the explicit Imperative Deployment methods. By providing more abstract constructs, your end result will yield deployment of bigger configurations without getting into the weeds of managing what devices belong to what protection domain or storage pool.

Be sure to check out the ScaleIO Framework project on GitHub and visit the {code} labs page to test drive this feature. All feedback is welcome!

David's Technobabble Blog

A blog about everything tech (and not)…