I'm so late to the party, all the food is gone and people are leaving...

11th December, 12:20

It's not like I've not seen, heard of, understood fully in principle, or known about any 'smart technology'.

Far from it.

In fact, my dad has made use of this particular tech quite a bit, and I've been entertained to see in place and see its operation...

But, up until this point, I haven't felt a need for this particular technology.

Except maybe this time last year... I thought about it, but I'll be honest - this time last year I was working such insane hours, any thought of "home" technology was rapdily dismissed by concerns over VMWare's upcoming licencing change, or as I recall specifically - having lots of woes trying to access vSAN at full speed over iSCSI.

(In other words, I had bigger problems)

As of right now I am very busy with freelance work, I have a huge amount of stuff to do, but it still is probably half of what I was doing 12 months ago.

So this year, when I realised that every single evening, I was going to have to walk out the back door, walk into the garage (potentially over wet ground, which meant getting wet socks or having to go to the effort of putting my shoes on) just to turn the Christmas lights on and off...

...I thought, "OK. There has to be a better way of doing this."

As it stood, the lights were fed from an outdoor plug socket, which doesn't have enough room for a smart plug. Also, the wifi would probably not reach that point, although I've been meaning to put a repeater in the garage... but the point was moot, I couldn't use a smart plug at the main socket.

The plug for the lights was in fact in an awkward spot, meaning the easier solution was to switch off the spur internally in the garage.

That also, was a lot more complex to make 'smart' - whilst possible, I was looking for rapid deployment. I'm all about rapid deployment.

So my attention went to the front of the house, where the box is for the plug board that powers the lights. But, because of the number of lights, and the width of the box, I have to have two plug boards, each with 4 sockets on, to make the required 7 I need.

(We like lights...)

But NOW the problem is, these plug boards are fed from an external extension cord, which again doesn't have room for a smart plug.

So to the solution:

1. An additional external box for the electrics

2. A 2 gang plug board

3. 2 Smart Plugs

In their location, the WiFi would be weak, but should suffice. I implemented this rapid deployment solution via Amazon Prime, and had the required products in less that 24 hours. I had also deployed said solution, within that 24 hours. Although with one modification.

The stupid smart plugs (I realise the irony there) were too wide to fit on the 2 gang plug board - and so with a deep sigh, I restructured the cable. There was very little power flowing through these lights, so I simply daisy chained the 4 gang plug boards, to give me 7 sockets, on one smart plug.

A hop, skip and a jump later, and I was giggling like a schoolgirl saying "Alexa, turn the Christmas lights off" and "Alexa, turn the Christmas lights on" and watching with glee as they were turning off and on.

The neighbours probably assumed we had some electrical issues....

These were the plugs I bought - I was a little apprehensive at first as they weren't actual natively integral with Alexa, but - the app that you get with it is pretty decent and as someone who's not a massive user/fan of voice assistants (I'll use Alexa, but Siri is just useless) - I found this good.

I also bought this electric box which - after a little cursing (I was trying to do it all in the dark, with only my phone light for visibility) I decided was much better than any other outdoor electric box that I've bought.

Anyway - a strong thumbs up for automation. Something I should have embraced long ago!

TTFN,

Ben

RIDICULOUS.

9th October, 19:30

A little while ago, I wrote about how I found some of my old uni work.

I say "found"...

It's not like it was in a box in the garage, and I was tidying and discovered it...

Although that is a good analogy, for the mess that is my "Old" folder.

In my "Old" folder, I have a bunch of files, and another folder called "Backup". Then one called "Old Stuff". Then one called "Stuff". Then one called "Backup".

You get the idea. There's probably in the region of 2TB of data, going all the way back to the 1990's.

YES, I have kept some crap from the 1990's.

Lots of stuff, but some ".mod" files, which were basically files containing music instructions which were then played through a piece of software synthesising the music. The better the synthesiser the better the quality output.

Why do I hang on to these files?

Well, because it's a good reminder of this:

Image obtained from https://hackaday.com/2014/09/29/the-lpt-dac/

I found that image online, searching for "parallel port ad lib sound card".

Because, I built one just like that... although that one's probably built much better than the one I built.

I remember building with a friend, and being super chuffed that we actually made something that made sound! I don't remember what year this was but it was definitely 1990-something.

It's amazing how far technology has come, over the last 30 years.

I don't remember when it was, but I'd say at a guess around 10 years ago... I bought a computer laptop, but it only had 128GB storage. So, to give myself a bit of extra space, I bought a 128GB USB Drive which was roughly the size of my thumbnail (obviously it was USB height...)

I gave it the label "RIDICULOUS".

Because to me, it genuinely felt absolutely ridiculous, that as a teenager, the THOUGHT of having something even the size of 128GB alone was crazy, but 128GB the size of my thumbnail?! Madness.

Now of course, we're at a whole new level - although I don't find it as 'ridiculous' any more.

We'll probably be saying the same thing about AI in 10 years time...!

I can't help you with AI, but if you do need any help with hosting, development or project management - get in touch here

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

Every day is a school day...

30th September, 16:36

I recently found some of my old Uni work.

This is the 'end' result of a step by step process where I built a calculator (theory only, not practically, although that would have been fun).

The work I did, built each component out before then 'encapsulating' the circuit into a 'chip'. For example, at the most basic level:

Simple Adder

The above is a "simple adder".

These are using logic gates (like 'AND' and 'OR'). In this instance, you have two inputs, and two outputs. "r" the result, and "o" the overflow". If x is 0, y is 0, then the result and overflow will be 0. If x OR y are 1, then the result will be 1 and the overflow 0.

But make x AND y 1, then the result AND overflow will be 1, and 1+1 = 2!

So how do we make that useful? Because if it can only do 1+1=2, that's... not helpful. Well, combining with other elements plus encapsulating it and repeating it... so for example, the below is known as a "1 Bit ALU" - ALU is Arithmetic Logic Unit

Circuit

(OK bear with me, I appreciate you're probably falling asleep)

But you take that ALU, and replicate it 8 times and you get this:

Circuit

Suddenly, you have an 8 Bit ALU, ok.. getting more handy... (and if you've ever looked at computer chips, you'll probably be able to see how that sort of looks like one (albeit you wouldn't see inside it, you'd just have black plastic over the top...).

So to save you from complete boredom (as much as I love this stuff, I know not everyone does) I'm going to fast forward to the end product now:

Circuit

I hand drew every single one of those lines, 1 pixel wide, in Microsoft Paint. There was probably an easier or more 'proper' way to do it, but this was one of those occasions where I remember just being captivated by the subject and just absolutely focused on the matter at hand.

I do remember scoring a reasonably high grade on this piece of work, but reading it now I want to jump back 20 odd years and give myself a slap. Because I know it could have been so much better.

But that's just human nature, I guess. We always strive to improve ourselves and I guess the consolation I should hold on to, is that the fact I can see so many ways to improve this whole piece of work I did, means I have myself improved...!

It's why I love the phrase 'Every day is a school day' - no matter what you're doing, you can always do something better tomorrow than you could yesterday.

In the last 25 years, I've learnt about development/programming, servers/server provisioning and management, project management and got pretty damn good with my networking skills... (computer networking, not people networking... I still suck at that) along with a million other things like marketing/sales and probably a dozen other things I can't remember right now.

So when it comes to developing a new app or system for your business, or streamlining your IT support, or maybe you've got a lot on already and need more general technical project management - I'm here to help... Reach out to me here.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

Got your new iPhone yet?

23rd September, 15:26

So the iPhones arrived on Friday... (not mine, I hasten to add)

My wife upgraded to a 16 Pro, as she was on a 13 Pro and it was "about time". The battery isn't lasting as long, and there are some pretty cool new features over the last few years that mean upgrading has been a big step up.

Being a tech nerd, and historically having been obsessed over always having the latest gadget (at one point I (pre-iPhone) I think I changed my phone every 6 months).

But humorously - I have just DOWNGRADED my phone.

I had a 13 Pro Max, you see... but for so long now, I have been frustrated with the "bigger phone".

It would fall out my shorts in Summer when I got in the car, or I simply wouldn't be able to effectively use it one handed all the time (#firstworldproblems I know).

So I made the decision a while ago, unless there was a really ultra compelling reason/feature - I would go with the Pro this year, not the Pro Max.

But I've made regretful decisions before when it comes to phones, and whilst I could buy it and see, then return it in a short period... for a start, that's hassle... secondly, it's probably more than a couple of weeks for the "niggles" to kick in.

So right now? I'm using my wife's old 13 Pro. My son was elated that he got my 13 Pro Max.

Things I've noticed instantly:

1. It's SO much nicer using it one handed, everything is a bit more reachable, and it just fits in the hand better.

2. I'm wearing shorts now that the Pro Max would fall out of, and ... time will tell - but I am sure this is a much better fit in the pocket too.

So, win/win all round, get the 16 Pro, right?

*Sharp inhale through the teeth*

I don't know.

I've had the Plus / Pro Max since the iPhone 7 Plus. That's a long time to be used to a bigger screen.

So right now, I've made the right decision: test this out for a few weeks, or a month or two, then decide.

It's handy having no "lock ins" or "contracts" that force me to make the decision. And that's the way I've always operated by hosting and IT services.

I know some companies will tie you in for 12, 24 or even 36 months or more!

None of that with me. I have said 1,000 times to my clients, "I never want you to leave, but if you decide you want to, I want it to be easy for you."

That's also why, I think, I've had clients return to me after leaving me for a few years. I've had clients since 2007 that I still look after today.

So whether it's project development you need help with, IT Support, or simply somewhere decent to host your website - you can reach out to me here.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

One of the most unusual things I've seen, and I'm sure there's a logical explanation...

18th September, 12:26

This was pretty weird.

And Google wasn't much help, either.

But the thing that stuck in my mind, were some words I learned at a relatively young age (i.e. young to me now, I was probably in my early 20s).

"Computers are not smart. Computers are dumb. Computers are just following your instructions."

The latter part of that can be open to interpretation, i.e. This was more referencing development at a lower level, not talking about Microsoft Outlook being a ****.

(I could have really put any piece of software there, but Outlook recently has been frustrating me)

But in this instance, the problem I had is relatively "low level" - in so much as - we're talking about the operating system.

I've recently been playing around installing some software on some new servers, and in building out the cluster of machines, I made an error.

Having done this 1,001 times - I thought - OK at this stage, with nothing on there, let's just reinstall the server that went wrong. Quickest solution.

(I have seen the issue I had before, and I've fixed it, but in theory what I was about to do was quicker)

I reinstalled, let it start up and ... oh, this is the original install?! "Ha, you idiot, what did you do wrong", I said to myself, cursing for wasting more time on something so simple.

I reinstalled again, made sure to select the right drives, made sure there were no major errors on the install, and let it start up again.

"Huh, this is... odd", I thought, whilst still looking at the "original" install I had put on the drives.

"OK, time to get a bit more brutal", I thought. I used the BIOS to format the drives, and reinstalled again.

At this point, whilst looking at the original install, I started to lose it just a little bit. "What the hell is going on?!?" I said to an empty room.

"OK, maybe the stupid BIOS did something stupid, and the stupid server is just stupidly booting into the wrong stupid drive because it's stupid."

(There were probably a few more swear words scattered around there)

Now, every time I have installed the server, I have selected ZFS RAID1, and specified the drives I wanted to install on.

I thought, OK, I don't trust the BIOS, I'll trash those drives using the command line... so I booted into a terminal using the install cd, did exactly that:

# dd if=/dev/zero of=/dev/sde bs=1M

(For the non-techies, that basically means "absolutely categorically make this drive blank")

It took a while, but I reinstalled satisfied there was no way it could boot into the original OS.

Until it did.

And it was then, and only then, I realised I had to have been being a complete prize idiot. Why this didn't occur to me sooner, I don't know, but... maybe I installed this onto a different drive by mistake?

Sure enough that was pretty much the problem. Something so simple, so obvious, but so easy to overlook.

That's the trouble with working in IT. Computers are stupid. Computers do not have a brain, like we do.

With "AI" - you might be fooled into believing they can "think" - but at the base level of it, it all comes down to zeros and ones.

I recently found a project I did from Uni, showing how from the most simple level, to build a calculator. When you look at the end result? Magic. When you look at the very first "layer", it's literally just comparing two bits to see if they're on or off.

All this to say, it's not uncommon to hear in frustration in the office, "Oh you stupid computer" (especially in my office), but remember this comforting thought: the computer's just doing what you told it to :)

If you do frequently say "stupid computer" - and want to chat about how I can help, you can reach out to me here.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

Are you still paying for a "dedicated server"?

16th September, 11:04

It actually surprises me this is still a thing....

...but in a world of "If it ain't broke, don't fix it" - and knowing how resilient enterprise hardware is... I can see why it is still a thing.

I am of course, talking about the good ol' fashioned "dedicated server". Not a VPS, or a "Cloud" Virtual Machine, a proper, bare-metal dedicated server.

The type where you have to have REALLY regular failsafe backups, because if that server dies, it's dead. There's no "high availability migration" - there is just death and destruction of your poor data.

This, I find even worse, in scenarios where people RELY on their dedicated server for their actual income, i.e. If that machine dies, for every minute you're offline, you're losing money.

In that scenario, in this day of the 16th September 2024, I find that NUTS.

Because, there are so much better solutions out there and *they don't cost the world*.

I know a couple of organisations that work in this way, and it terrifies me - and I have no interest in their actual business. I think the main reason people put up with it, is two fold. First, I think they probably don't really know much better, and it's what they've always used, and it's always been fine, and when you compare what spec they have to a "cloud" alternative, it's way cheaper.

Secondly, it's human nature, I think. The longer something stays a certain way, the longer you're more likely to forget about it. These people sleep at night because they just don't think about it. At all. It's working, what is there to think about?

But this is why I do what I do.

Building small, bespoke clusters of servers, that work in a way that means the "worst case" scenario is 2 minutes of downtime (well, OK, probably more like 3-4 factoring in time for the machine to boot properly and start all relevant services) but it's *automated*.

And you might think that this is going to be "far more" expensive than just buying one server and a backup server. But here's the thing: you only need 3 servers as a minimum, and you can 'pool' your resources across those servers, so you don't need 2 insanely high spec'd machines, you can get 3 machines that are more reasonably spec'd, and spread the load around them. This keeps the overall expense much lower, and the added benefit that in reality, you have a lot more resources actually available to you.

But what about storage? Well, the configurations I implement mean that you essentially share the hard drives across all three machines, so again, even with a physical server failure - whilst you need to get that server fixed ASAP - there's essentially no downtime.

There's just no way I could ever in good conscience implement a single dedicated server; it might never fail, it might work really, really well for you throughout its whole life - but it might not.

And speaking as someone who has seen all sorts of failures in servers including RAID cards, I know that you have to expect the unexpected.

If I've made you sufficiently worried about your hosting provider, I'm happy to have a chat and see what sort of setup you have.

You can reach out to me here.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

This isn't a blog all about Apple products, but this was really impressive...

11th September, 9:41

Before anyone else messages me to tell me I'm late to the party...

...I know.

I had heard about someone talking about this on a podcast quite a while ago, and not so much dismissed it, but just never really gave it any thought.

Then last night, my dad messaged asking for a little help. He'd offered to bring the laptop round (I was sorting dinner so was harder for me to go there), but it just popped into my head...

I remembered that Marco Arment from ATP had said he uses the Messages app for everything, including the 'find my' but also screen sharing and more.

If you're not a Mac user, the Messages app is a mirror of the Messages app on the phone, so you have access to all your text messages etc... So forgive me that this wasn't the first thing I thought of when it came to screen sharing.

But I went into it, clicked on my dad, clicked sharing, then "Ask to share screen" - and so far as I know, he just had to approve the request, then we were connected!

I couldn't believe how easy it was.

So, top tip, if you use a Mac, and someone you know needs help who also uses a Mac, this is an insanely easy way to do it.

Something I also discovered, was that it connected audio too; I didn't realise that until after as I'd had my speaker volume down, but that's another added win.

Remote support is a difficult one to get right in business, and I've used a ton of different packages over the years.

Ultimately, the solution I now implement is as simple as, you phone (or email) to say you're having trouble with something, I request to connect, you approve, I'm in.

It's as simple as the Mac solution, but on Windows.

And I can do IT support for just one person, 10 people, 100 people or more.... the key is all in the 'setup' at the start.

If you want to find out how I can help, you can reach out to me here.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

Bespoke, or off the shelf?

9th September, 13:29

This is one of those questions that can practically burn you out trying to decide.

For years, when I was building my previous platform, I wanted to use off the shelf and just customise it.

But the more I tried, the more I found it just wasn't possible.

I can relate to this with seeing how other businesses have been built. Several times over the years, my mind has been blown, seeing what people have used "because it works" instead of a "proper solution".

I mean literally running huge (to me) businesses, and utilising nothing but spreadsheets... or using off the shelf systems that do 80% of the job, but then they literally have to hire an entire employee to go through and input that data into another system for their billing.

"It ain't crazy if it works"

When it comes to building out my own hosting platform, I have been torn over exactly how I'm going to do it. I don't have the luxury I did when I first setup my last platform, where I had [a] spare time [b] staff with a bit of spare time and [c] no time pressures to implement it.

This time around, I need to deploy hosting solutions, fast, and reliably.

So I'm going the 80/20 way... I know of systems that will do about 80% of the job for what I want, but 20% will have to be manual. But the benefit is, the 80% will be automated and billing will also be automated...

Side note, I remember first hearing about 80/20 in about 2014 (wow, 10 years ago!) and even then I was considerably late to the party, but I was astounded to see how often the 80/20 thing held up (in summary, 80% of your results will come from 20% of your efforts, and 80% of your revenue will come from 20% of your customers, etc etc... there's a book by Richard Koch on it if you want to go look it up).

But the funny thing is, 80% of my hosting which is automated will probably produce 20% of my income, and 20% of my hosting which is manual will produce 80% of my income.

*Insert shrug emoji*

I don't know why in life, things work out to 80/20 all the time. Perhaps because we're all in a simulation and none of this is real...

But fundamentally I know this: I now have over 2 decades' experience building platforms where solutions didn't exist, and by taking solutions that exist and customising them.

If you need help with anything like that - you can reach out to me here.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

You probably don't need huge infrastructure, and you're probably overpaying...

6th September, 11:42

OK yes another over-dramatic headline... But... this is really important:

Are you paying more than £2,500 a year for hosting/IT requirements? If so, this might be relevant to you...

(If not, you might find this interesting, but you're probably just using regular hosting / have a "normal" website, so it doesn't apply to you).

Obviously, every website/company is different, and needs/use cases vary dramatically... but whether you're paying £3,000 a year, £30,000 a year or £300,000 a year or more - I feel compelled to urge you read this because it could make a big difference to you.

Something I've learned over the last 20+ years, is that if things are "set up right" - they don't need half the resources you think they do.

Very quick side note, I'll always remember building out one of my first hosting servers in 2004, and getting "loads" (to me) of websites on it, and noticing that the CPU on the server barely ever went above "idle". You really can achieve a lot with not much.

Anyway, over the years I've built systems for people and hosted them on tiny solutions, and built systems where there's HUGE data and hundreds of people accessing it 24/7 (call centre scenario). Then, I've seen people provisioned VAST resources, with auto-scaling/elastic storage on AWS and just every feature you could throw at it... for some relatively basic websites. The mind BOGGLES sometimes, when you see what effort people have gone to, to host something so simple.

I've reduced costs for people to 25% of what they were paying, or more.

To be clear on two elements here, because this is important:

1. That's NOT reduced their costs by 25%... that's reducing their costs to 25% of what they were paying.

2. This is not factoring in support/maintenance costs, i.e. If anything, because of my skill level/experience, I'm probably more expensive than your current company if you broke it down "per hour"... but that's not how I work, anyway (and this isn't a sales pitch, really).

Using platforms like AWS, Azure, etc... it's really easy to forget what your key requirements are, and what actual resources you need to power them.

Where this flips on its head, is if you're in the £100m+ spend for infrastructure, AWS really does start to pay for itself... but most people I know aren't in that scenario. This is why for all of my IT life, I have focused on running my own infrastructure - and where possible - doing so for clients too. Some of it is simple, some of it is not. But the truth that the person who's currently selling you your infrastructure probably doesn't want you to know is: it really doesn't take much power to run even a fairly complex website.

They may offset their "fees" by overselling you on hardware, or in the worst scenario, they oversell you on both.

As a non-technical person, you have literally nothing to go on but trust and recommendations when it comes to an IT supplier. And that's scary. I'm the same with my car; my 20+ year old skip on wheels I understand quite well, and I'll know the relative costs of labour and parts, so it's easy to know I'm not being screwed over. My 'family car' however, which is a much newer BMW, I have litte to no understanding on costs for the parts or labour.

And sure, I could learn, but we all could, right? i.e. You could learn all the ins and outs of the technical world - but you have better things to do, and probably no interest.

The problem within our industry is that over-provisioning is easy, and often a glitch or "slown down" that might be something really simple like a misconfiguration or tiny piece of badly written code, is easily solved by chucking more memory at it. And suddenly it all works great. And it just costs another £10/month. Or £100/month. Or £1000/month.... etc.

The problem's gone away, and only a tiny relative increase in cost, so who cares, right? But that happens over and over, and before you know it... you're spending more on infrastructure than staff.

(OK again I'm being overdramatic, but you get my point...)

The real 'wins' come from looking at why it's slow in the first place, and fixing that. In probably 80% of cases, it's something insanely simple. I've seen pages taking 30+ seconds to load, reduced to just a second or two, because a database table was missing an index (and for those that have no idea what an index is... I'm sure you don't care so I won't go into detail, but just know that it took about 10 seconds to write the query to make it happen, and a matter of minutes for the thing to actually execute... that was it.)

This is why who you place your platform with is so important.

So here's the question...

(Although again, the question only applies if you're spending more than £2,500 a year on hosting, up to that point, you're probably OK)

How much more money would you have available for your business, or YOU, or YOUR family, if your hosting cost was cut by 10%, 25%, or even 50%? (Savings of 75% that I mentioned above are astonishingly rare, that particular client had been deliberately over-sold. If you don't know, you don't know, right?)

However a saving of 50%? That's not THAT unusual.

I'm not really into volunteering my time for free, but what I would say is, if you're curious about whether your IT spend is "right" - get in touch and I'll have an initial free chat... beyond that I will undoubtedly have to charge to properly look at anything, but I can almost certainly get a good feeling for your situation from that first conversation as to whether it's worth either of our whiles.

You can reach out to me here.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

Keeping it nerdy, Dr Who at the proms was everything I hoped it would be...

4th September, 11:43

This is slightly off topic, but this blog is called 'nerds life' so I figure it fits.

I remember watching Dr Who as a child. It was one of those programs that my older brothers were into, and so I watched it becuase this was the 80s, we had one TV, and I had nothing else to do.

I remember loving it. But if you challenged me to tell you about a single episode I watched....... I can't.

I remember snippets, certainly involving Daleks (but that's an easy one, right?) but it wasn't until 2005 that I got interested in it again.

My wife and I very much enjoyed watching it, although I was far from 'obsessed' - I was practically angered by David Tennant's departure, and remember hating the first scene with Matt Smith.

But as fickle as I am, I ended up loving Matt Smith by the end of his first series, and was equally apprehensive of Peter Capaldi.

But I remembered how much I hated Matt Smith, then how much I loved Matt Smith, so figured - shut up and just watch the show.

Well, Peter Capaldi to this day, is my favourite Doctor by far. I think his first series is good, but his final series and specials are just unbeatable.

Jodie got a lot of stick, but that wasn't her fault, *gives evil eyes to Chris Chibnall* but I also respect the 'shake things up' a bit because it reminded us how good the earlier seasons were. (Sorry Mr Chibnall).

Ncuti so far is going okay, I have nothing significantly negative to say, but also nothing significantly positive. The jury's out, as they say.

Anyway, where's all this going? Well, Dr Who from 2005 to 2017, was made extra incredible by the amazing Murray Gold.

And many of the compositions are just definitive of "The Doctor".

By the late 2010's, I had really begun to become a strong Doctor Fan (NO I will not call myself a Whovian) and just searching around online, realised that there was a special Doctor Who at the Proms for the 50th anniversary.

How sad I was, that it was a one off and I'd long missed it (I know there were some others before it but I think that was the last one).

Last year I held my breath, hoping and hoping that there would be one for the 60th anniversary by no... until this year! I made sure I had three computers on the go ready to get the tickets, and got in the queue as soon as the Royal Albert Hall opened the online queue.

It was everything I had hoped it would be, some great music, "surprise" (it wasn't really a surprise was it?) monsters coming down the aisles, and a great mix of old and new music.

Will there be another one any time soon? I doubt it. But it'll be shown on TV / BBC IPlayer towards Christmas time, I believe.

Can I weave this into some reason why you should reach out to me about IT Support or development?

You know you've seen "those" sort of posts... where they find some "business lesson" or related reason from the story... not here, not today.

This was just to say, the Dr Who Proms were awesome.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

It really was an unreplicable environment, as much as I've tried to replicate it over the years...

3rd September, 09:14

I'd mentioned that in 2001, I started working at Demon.

Well, it was THUS Plc - a Scottish telecommunication company. They'd just bought Demon in 1999 I believe, for £66.6m.

The thing is, I look back at that time in my life fondly. Not just because I was in my early 20s (oh Lord to be 20 again).

But because that call centre, that whole company, had a team/ethos/spirit that I have never experienced since.

I was chatting to one of my clients yesterday, who is in fact, the person that was in charge of training back in 2001. I had mentioned the customer service training we had to take, as despite being tech support, we were trained to a very high standard for customer service.

One thing, for example, was that we didn't want to refer to the customer's issues as "problems" because you'd end up saying "What's your problem?" and that sounds confrontational. You want to keep the customer at ease at all times.

This meant that Demon for a long time, held a reputation for being a fantastic company (until they started cutting corners and costs a couple of years later).

The thing is, it's an unreplicable company/environment. As much as I've tried.

Part of the thing that made it GREAT, was the fact that we were dealing with emerging technology. And I don't mean broadband (although I did work there as ADSL came in), I simply mean the computers themselves.

Sure, in 2001, most people either had or where about to get a computer, and a significant portion of those would also get online, but I can tell you from first hand experience, many of those people barely knew how to use a mouse.

They'd probably gone to PC World, bought a computer, unpacked it, and then scratched their head and gone, "Right, how do I Internet?".

I remember having to coach someone through literally dragging an icon from one place to another, and then teaching them about right clicking... things that we just take for granted these days.

Plus, the vast majority of problems (sorry, I can't call them problems... customer service training... let's call it "reasons for getting in touch") were specifically around Windows issues. TCP/IP often needed reinstalling. Which hilariously, nowadays, if I suggested that as a resolution to a problem nowadays would get an equally confused look.

But alongside the great memories we made (especially on night shift) - I learnt more in that short time about troubleshooting and problem solving than I did anywhere else. Ever.

(OK, I did learn a lot about diagnostics with networks at Uni... but that detracts from my story so... we'll gloss over that...)

Either way it's all gone a long way to helping build a hosting environment. Because if there's one thing I know that'll hold true: stuff will not work like it's supposed to.

So whilst I'm still just starting out back on my own again, the one thing I can tell you is: when it comes to resolving technical problems (yes let's go with problems for now) or diagnosing faults, I can both do this rapidly, and do so with what I'd like to call "great customer service" (which in reality, should just be called "customer service" but that's another story for antoher day)

Find out more about how I can help your business / IT Department - visit www.spinningtheweb.co.uk.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

One of the most important questions, when it comes to building a tech company...

2nd September, 09:24

Clickbait is really annoying isn't it?

It's not as bad as it used to be, but I still see it from time to time, and roll my eyes at the post.

I suspect there's a large number of people that'll roll their eyes at this... the "most important questions" when coming to building a hosting company...

(Sorry, there's no 'number 3' will shock you)...

But in all seriousness, over the last 24 years I have built out a large number of hosting platforms.

It really properly started in 2001, after I'd got a job at Demon Internet doing tech support (literally answering the phones and helping people with their dialup modems, good times).

I had access to a Sun Solaris machine, which was labelled as 'sos.support.demon.net' (not exactly breaking any confidences now - plus I think that's probably been out of DNS for a long time).

It was where the emails for support@demon.net were directed; the majority of front line techs (which is what I was) would get a shell whereby you ssh in, and it loaded 10 mails from the mail queue, to which you could then respond.

Anyway, "keep on topic, Ben" - I have a habit of digressing.

Point being, the first servers I installed, I had to come up with a name for it. Initially, I didn't really put much thought into it... but a little while later, a friend and I had started to come up with ideas for our own business, and one of them we dubbed 'ChunkySystems' (I've still got the domain). It was partially due to a KitKat Chunky.

And so, one of the most important questions: What's the naming scheme for your servers?

In 2001, we went with snack food. I had two switches, called Kit and Kat, two routers called Monster and Munch, and then our servers had all a variety of names, Jammie Dodger, Hobnob... etc

In 2006, as part of a joint venture we built out a new platform and the naming scheme was 'Moons of Saturn' (I can't honestly tell you why).

That lasted quite a while, then the server names were mostly related around what the services were (Boooooooriiiiiiing....)

Then in 2019 when I started my previous venture, because our office NAS drive was called 'TARDIS' (because it was bigger on the inside... I'll wait for you to stop laughing...) I decided on a Dr Who naming scheme. Well, more than that, the company name was even based around Dr Who.

But as that scaled, it became impractical.

Naming servers fun things like 'TheDoctor' is all great whilst you're the only sysadmin, and whilst no-one else has any need or even access to the servers, but when you're sharing access with a team of people, "there's a problem on the mail cluster" - suddenly becomes more of a challenge to work out which of the silly names you're using.

So the naming scheme was mostly changed over the course of about 6 months, although I never finished doing it (long story).

So now the question is: as I build out new servers, what do I call them?

Well, it'll come to no surprise to my closer friends, yup - it'll be Dr Who related.

But I'll do it differently, in a way that 'scales' better. The physical servers themselves will have the character names (because that's a bit of fun), and the cluster names will be solar systems, but the individual virtual machines will all be very specifically named around their service.

This to me, seems a great way of keeping a bit of fun in the business, whilst keeping the day to day stuff serious.

I'll write again as/when I've got the next stages of our infrastructure ready (will be a little while yet) but I can still help with all sorts of things - take a look at www.spinningtheweb.co.uk.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

The 'return' to Mac (thank goodness)

29th August, 13:44

I'm pleased to say, it wasn't that long after my last post I gave in.

Don't get me wrong. I'm sure Windows is the right solution for many people. It was for me, for 20+ years. But I could not work efficiently on it.

My workflows, and the way I deal/interact with servers, and the software I am used to using now... it's all so much better on Mac.

That, and the M-series of Macbooks are just phenomenal. I opted for a Macbook Air this time round... as much as I love the Macbook Pro, if I was going to do it, the spec I'd want would not be the base model.

So for now, the Macbook Air is doing me proud. It's powering a 4K display with ease, and runs all the relevant software I need, from the Adobe CC suite, to XCode, and the usual stuff like VSCode etc...

The upside of this change-around, is that I can use the Windows PC I built for gaming. Ask me how many games I've played since I started my new venture? Go on... ask me...

(I know you know the answer: Zero.)

So here's the thing. I needed to build an amazing hosting platform, and having spent the last 24 years being involved in hosting, and the last 5+ years building a specific type of platform, what have I learned?

There are some 'must haves' and some 'nice to haves', and some 'really nice but really not needed for a while'.

The 'must haves' are pretty obvious.

Right?

....Servers.

Can't host much without a server.

So I'm focusing on Dell hardware again, specifically because I just love the iDrac software and I'm so used to it. I've looked at the solutions from HP etc, and I've used the SuperMicro IPMI stuff before, and iDrac is just... better.

(In my humble opinion, of course)

Key choices:

Proxmox: Hypervisor of choice; there are so many choices, and some might be right for you, but fundamentally, Proxmox is great. It's simple, it's fast, and it's based on Debian which between Debian and Ubuntu, are the main operating systems I've used this last few years.
Multiple Clusters: I've got an initial cluster which will be used for our own internal hosting requirements and very simple hosting customers. But then, rather than giving customers space on that cluster, I've decided to allocate hardware using MAAS and build clusters on demand. This means I've got more 'cold spares' right now, and I can run some 'hot spares' in the future too, but it means that each customer with complex requirements can have their very own cluster. Benefits to this are: more secure, less impact from hardware failures, easier to manage and having crunched the numbers - more profitable in the long run.
Deployment: This one I'm still a little hazy on, but basically WHMCS is going to be the system for me, I think. Again it's one I've used before, in fact my previous venture used it for quite a while at the start, and it can be configured to do all sorts of clever things. So the basic requirements (i.e. a single hosting account or single VPS / Dedi Server) can be deployed with ease.
Support: Again WHMCS has this one covered. I was building a bespoke support/ticketing system, I've looked at others off the shelf, and I also tried out Microsoft Dynamics, but many of them are overkill. A simple support solution from WHMCS is the winner.

So there'll be lots "coming soon" publicly, but if you need any help with your website/development or hosting, just give me a shout - details are at www.spinningtheweb.co.uk.

TTFN,

Ben

P.S. If you want to sign up and get these updates via email, signup here

The 'return' to Windows

19th June 2024, 08:24

Well, this was unexpected.

Without going into the real details of 'why' - the long story short is, I am starting out on a new venture, from scratch.

It all happened fairly quickly, and as a consequence, I had no 'personal' machine to use for work, really.

Well...

I have an old Windows laptop with an i5 and 8Gb of RAM. To say "insufficient" is a fairly big understatement, however, needs must.

It got me going for the first week or two, but then an old friend managed to find me a 2017 Macbook Air - woohoo!

Despite this probably overall being a lower spec than the laptop I was using, it was like 'coming home' again. MacOS really has come so far, over the last 5-10 years and makes working as a developer/sysadmin so much easier than Windows (for the sort of servers I deal with, obviously, I'm sure if you're a Windows Sysadmin MacOS is probably not your first choice...).

However, that laptop was plagued with issues, and is now cast aside, as it seemed to have some significant underlying fault which meant it rebooted randomly (not helpful in the middle of some code).

At that point, I remembered on the top of my wardrobe (I don't know why it's there, it was a convenient place to shove it?) was a motherboard complete with CPU and RAM (16GB thankfully) and it even had onboard graphics.

I had a spare 512GB nVME drive from my son's machine (we had to upgrade him recently) and despite the board being ancient it did support nVME (albeit I had to turn off two of the PCIe slots).

I went to a local computer shop, bought a case and power supply without even thinking of looking at the requirements of the board - and that afternoon built the PC and realised, the connector for the additional CPU power wasn't big enough - but... it booted. Marvellous.

Only, I needed to do some graphics video work, and it was horrendously slow. OK, not a problem, I thought... my youngest has been using a graphics card from 2015 and is long overdue an upgrade, I'll get him something reasonably cheap which should still be a good upgrade for him, and I'll take that one.

It arrived, and I popped it in his PC, and popped his old graphics card in my PC. I went to connect the power supply and oh... oh.... oh dear. This PSU had no support for PCI-e power. Sigh.

Back on the phone to my good ol' friend who sent me a new PSU. He sent me an 800w one, which was more suited to my youngest, especially with his new graphics card.

He was busy playing F1 2024, so I kicked him off and made him do the dishwasher whilst I swapped his PSU. I went to touch his case and it was as hot as lava. Oh dear.

Ok, fine. My case was a bit bigger, so I decided he could have that. I ripped out his components, ripped out my components, and built his PC back out.

It looked a lot better, a lot more room and airflow and I felt much more comfortable with that.

I went to put my motherboard in his case, and... you know where this is going don't you? The motherboard was too big.

*Insert loud depressed groaning Homer style*

But you know what...

Since the age of 13/14, I have used computers without a case. A case is just to make it look pretty, right?

So I found a side table that wasn't in use, and stacked it all very carefully, and I now have a working PC with a working decent GPU. And good airflow.

The downside is, I have to use Windows.

I'm lucky, in so much as I've used Windows since the early 90s, so whilst I prefer MacOS in every single way, I can at least use it in the same 'power user' way.

That said, I might start looking into a Mac Mini, because they're pretty damn decent and using my Windows laptop, I could then remote into the Mac so I'd always have it "on the go"... with another additional benefit...

For quite some time I've been wanting to sort out some sort of home server, for media and a variety of stuff, my router can do VPN obviously but I'd much rather have an environment like a Mac Mini which I can use.

Hopefully, it won't be too long before I'm writing about that setup.

TTFN,

Ben

Apple Macbook Pro M3 Pro: My thoughts...

7th March 2024, 19:24

This last month or so, I've been flat out at work.

Around about the last time I wrote a post, I was using my Macbook Pro M1 Pro, which I'd only bought last year, I think.

I had the 14" screen, and I forget completely but think I had one or two notches up on the CPU cores.

I was in the datacentre, and was sat cross-legged on the floor. This is not an uncommon position for me, with the laptop on my lap (living up to its name there). I was doing some relatively essential network maintenance, and really needed to pay attention, and so whilst I was focused on what I was doing, I wasn't so focused on the actual precarious positioning of the Macbook.

I actually can't remember why now, but I leant over to my left to get something from my bag. Possibly/probably a dongle; either way, the laptop slid off my lap. The screen was open, because I was using it, and as it slid it tipped backwards. It fell perfectly in between the bottom of the frame of the rack, and side wall of the rack. There was a gap not much bigger than the laptop screen itself.

Naturally, I kept perfectly calm. No, wait, I panicked. I grabbed the laptop reflexively and by doing so, caused the corner of the screen to crunch against the underside of the rack. The screen went black immediately. My heart sank.

Not only is this the first laptop I've "broken" in probably 20 years, I had literally just asked if I could upgrade from 14" to 16" because I was working so much more in the evenings now, I needed/wanted the bigger screen.

An apologetic note to the MD later, and we had agreed I'd go via the Apple store on my way home, and pick up a 16" replacement. I had decided I could have the screen replaced, and that could go to someone else at work, as an upgrade for them (as they're using the 2020 M1 Macbook Pro). It's a brilliant laptop, but - if you're getting old like me and eyesight is not what it once was - the 16" makes life a lot easier. More screen real estate but basically, bigger brighter and better.

Anyway... I have used a lot of different computer hardware over the last 4 decades.

There's not been that many occasions where I've sat and looked at something, and thought "this is the best thing I've ever owned".

Things I can think of (this is not an exhaustive list):

The Nokia 5110, which I got in around 1998, I think
The Hollywood Decoder card (that's as much of its name as I can remember). A PCI card which took input as VGA, and outputted VGA, and made it possible to play DVDs with its hardware decoding
A brand spanking new Toshiba Satellite Pro, which replaced a crappy old Toshiba Satellite Pro thanks to an insurance claim (I forget what I did, but I broked it)
My Sharp 903 flip phone, with a screen that flipped around so you could watch stuff with it folded back (the screen quality was amazing)
My first iPhone - the 3GS
My original Surface tablet in 2013
The iPhone X, it blew my mind - was 100% "the best phone I've ever had"
Subsequently, the iPhone 13, a phone I still own today because it just meets every need. (I will 100% upgrade to the 16 in September, but for now, this is ace)
My Macbook Pro M1 Pro 14"... hands down I had said this was the best piece of tech I'd ever owned at the time

The Macbook Pro M3 Pro, takes it to a whole new level. Yes, it costs a fortune, especially if you don't go for the base model (don't go for the base model.... :) but it is worth every. single. penny.

There are 1001 proper tech reviews that'll tell you all about this in infinite detail. So I'm not gonna do that, because [a] I cba and [b] well... see [a].

But what I will say is this:

Point 1: The hardware

The hardware is just lovely. The keyboard is one of the nicest I've ever typed on, the trackpad is just flawless, the screen is incredible, and given it's size, it's pretty lightweight.

In addition, whilst I'm talking hardware, ports. Magsafe, USB-C/Thunderbolt, headphone jack and HDMI; it's a perfect balance. There's enough of stuff that you can get by without having to carry 1000 dongles, but not so much it forces the laptop to be thicker or heavier.
Point 2: The screen

Yes I know I said this in point 1, but I'm raising it again because you didn't listen. This screen is incredible. I can't describe how good it is. Go to a store, and look at one, see for yourself. It's amazing.
Point 3: The battery

Battery life on this is incredible. I often use it on near max brightness and have elevety billion apps open, and still will get a full day's usage out of it without even blinking - but if I know I'm going to be using it over a more extended period, I'll make sure I keep the heavy duty apps (Adobe, XCode, etc) closed and only open on demand, and I'll use it at "normal" brightness, which is still ace. I have never come to the point where I have thought "Oh no, I'm running out of battery I need to charge it" - I have always run out of energy personally and then gone to stick it on to charge.
Point 4: The biggie - software

Right. Let's clear something up. I was a die-hard Windows fanbois. From the mid 90s, to my initial flirt with Apple in 2015 (which lasted until about 2016) - up to 2021, I was adamant Microsoft was better. I wasn't being biased on purpose...

The main reason I loved Windows so much was because, I had used it so long and was such a strong power user of it, I could do anything faster in Windows than I could do in any other platform.

So the natural conclusion I always made was: Windows is better.

The thing is, I still believe this was true for a large portion of my time as a Windows user. Come at me, I don't care. But I really think Apple didn't really care too much about the Mac, from the late 2000's when the iPhone was sky-rocketing, up to the late 2010s. And that makes sense, right? The iPhone absolutely had to be their primary and almost only concern.

But then, from 2018 ish onwards, I think you can see they really started to get serious. From 2020 with "Apple Silicon", well... both the operating system and hardware started to take a real turn for the better.

And as I write this today, on my lovely Macbook Pro, I can hand on heart say, the MacOS Operating system is easier, more intuitive, more secure and more stable than Windows ever has been, is, or probably will be.

So, there you have it. My highly detailed scientific analysis on why you should get a Macbook Pro *OR* Macbook Air - the Air's are awesome - and with the M3 Air just being released... to be honest - for even fairly moderate workloads, the Air is insanely light, looks slick - and I'm sure has the incredible battery life that we've come to expect of the M-series laptops.

TTFN,

Ben

Why we're now ditching VMWare, and moving to Proxmox and Ubuntu/OpenStack

23rd January 2024, 14:07

January has flown by. I blame VMWare.

Well, Broadcom.

This month has been mostly chaos. But I'm getting back to it now, so I'll be back to normal soon, and back to blogging more often.

I spent around 2 years learning VMWare, getting everything as I wanted, built out a new cluster and bought in new hardware, getting everything absolutely perfect.

By end of November, we had got everything fully running/transitioned to VMWare and vSAN... and we were happy. Then I read about the Broadcom/VMWare acquisition.

"Fear not!", I thought to myself... "There's no way they're going to ruin the Partner Program or Cloud Services stuff, that must be a huge part of their revenue."

In December, we got a rather odd email about the future of the partner program, but I thought, hey that's OK... they just want to get fresh agreements setup and everything will be fine.

Or... not...

I'd emailed our supplier asking for clarification, they emailed me back a standard response that smelled exactly like "We have no idea either. Sorry."

Come January, the announcements and realisation hit home. Oh, and the updated pricing. We were looking at a 10x increase on our usage, and given our plans for more hardware, it would probably be more like 20x.

I mean talk about a big "screw you" to all their existing customers. I find it really disgusting behaviour, and no matter what they come up with in the future/shortly for their cloud services equivalent, they have shown us they can't be trusted.

But here's the thing: we are fortunate. Yeah I'd put myself through hell at the end of 2023 getting migrated fully to VMWare, but you know what? There are companies in way deeper than we are/were, who have no choice but to pay the new horrific pricing.

Maybe it won't come to that. Maybe they'll negotiate. Maybe this is all a storm in a teacup and by the end of this month everything will be fine.... I can't take that risk, when I'm trying to build out the infrastructure for a solid, strong reputable hosting platform.

So after much consideration, I've decided to return to my favourite thing of all time: Ceph! Hurrah for Ceph. At least I know it well, eh.

I'll write more about our new structure as I deploy it, as I think it'll be really interesting. I'm taking the last 4 years knowledge and experience in Ceph/LXD, and taking what I've learnt from implementing stuff under VMWare, and building out the best platform I can possibly ever build.

In summary though, we're going to have two clusters. One is Proxmox/Ceph, and the other will be Ubuntu/OpenStack. Both will be powered by Ceph, although I'm going to run two Ceph clusters as it'll allow me greater flexibility.

I'll be really interested to see the performance, to be honest, especially as I think we're going to be making use of FreeBSD a lot... but more on that another day.

Ben

VMWare 8 -> Ubuntu 22.04 Ceph via iSCSI

2nd January 2024, 15:33

This will only be a short post, because I'm going to briefly explain why I stopped using VMWare/Ceph/iSCSI. I should preface this by saying a few things:

I am new to VMWare; I have been using it around 2 years, but previously was using 6.7 and 8.0 felt quite different in a number of ways
This was all done under a very limited timescale, and I plan to return to this configuration at a later date; in fact, as I see it, this is the only way for us to build our platform going forwards
I am reasonably experienced with Ceph, but had up to this point, only used it with Proxmox natively
I am writing this all from memory of what I had deployed, so if I have misremembered something... sorry, I guess?
But the "biggest issue" I faced, was that I had already started using the Ceph cluster in production, feeding storage to our Proxmox servers via RBD. This meant I had to go *very* careful when setting up the iSCSI.

Getting started...

So I know the outcome of what I wanted to achieve. Having not setup anything along these lines successfully before, I decided to turn to Google to spend some time researching. Along the way I found a lot of people talking about issues with speed/performance, but nevertheless persisted with my research on the best way to set it up.

I found a couple of guides, walked myself through them virtually (i.e. I didn't do what they were saying, just followed the steps so I understood the process...

I'm not a fan of doing stuff in the GUI, because you don't get to understand what's going on underneath, but in this instance I decided, in the interests of expedience, I would use the GUI control panel (Ceph Dashboard). It made it remarkably easy, and I followed the guides to set it up.

However, this probably shot me in the foot, because no less than 24 hours later I was pulling my hair out.

I'd got it functional, but the 'disk access time' was pathetic, i.e. I would be getting around 5MB/s write speed, compared to 2GB/s on Proxmox.

I ended up trashing the configs, and starting again, and followed this guide very closely. Same performance issues.

I was going crazy. In the end, because I had to migrate the stuff off our old VMWare 6.7 setup, I setup an NFS share and mounted that. I got roughly 200MB/s.

Still crap compared to the Proxmox RBD mounting, but that was good enough things would work. So I moved everything over, and we got rid of all our old hardware.

I then turned my focus back to iSCSI, and I tried all manner of tweaking/adjustments (you tell me what you think it is, I'll bet anything I tried it). The issue was, the NFS mount was not good enough for us to run on long term, and I took the view: We had to get up and running fully on the new hardware, we didn't have time to be diagnosing this, and I knew long term, my goal was to redeploy the Ceph using FreeBSD anyway, so... let's move to vSAN.

The process for this was reasonably simple. I formatted/installed ESXi on one of the Ceph nodes, brought some storage online, moved everything onto that storage, formatted the other 2 Ceph nodes, and setup vSAN, then migrated everything on vSAN. And hey presto, we were getting every faster write speeds than before.

For now, VMWare/vSAN remains our sole platform, but I know by the end of 2024 we'll have a FreeBSD / Ceph cluster setup. Now that I'm writing here on a reasonably regular basis, I will make sure I document the whole process, and I'll detail everything here, sharing the performance statistics we get along the way.

TTFN, and HNY,

Ben

Configing a Ceph Cluster with 4 x 10Gbps (LACP) using Ubuntu 22.04

28th December 2023, 13:53

So I had said I'd dig into the configuration I had setup with VMWare->Ceph cluster, but I think the better place to start is actually how I had setup the Ceph cluster in the first place.

I'll make one note first; this is not a beginners guide, there are a million of them. This is not a step by step guide. This is merely the ramblings of a person who has setup Ceph eleventy-billion times and both destroyed and fixed Ceph clusters just as many times.

My first introduction to Ceph was using it built into Proxmox. Having learnt about how great Ceph is, I had decided that in this scenario, I wanted it standalone / separate from the hypervisors, because that would allow me to use multiple types of hypervisor, and just mount the storage in a variety of ways.

For example, once you have an operational Ceph cluster (and I used Ubuntu 22.04) adding on of the extra services to provide access to the storage via iSCSI, NFS, and I am sure many other ways, was pretty easy.

The hardware

We are using Dell R650 servers, with 7 x 7.68TB nVME U.2 drives.
Each machine has 128GB of RAM, and 2 x Xeon Silver 4310 CPUs.
Each machine has 4 x 10Gbps SFP+ network ports, which were connected to two separate Cisco switches (2 ports in each).
The servers made use of the Dell BOSS (Boot Optimized Storage Solution) card / 2 x 240GB SSD boot drives in RAID 1.

Setting up...

Unfortunately because I'm writing this very much 'post-mortem' (i.e. this solution is long dead) I don't have a great record of exactly what I did. I will attempt to give you an accurate representation of what I did to have a very good, stable Ceph setup providing storage via RBD to Proxmox, albeit, I had issues with iSCSI to VMWare which is why this is now dead.

So the first thing to do, was get the physical setup sorted. Because LACP was required to utilise all 4 SFP+ ports, I set up the machine for ease using just one, with a straight forward configuration on the switch.

I always opt for the bare installation when setting up a server; I'm a big fan of only installing things as you absolutely need it, rather than just installing lots of crap you'll never use.

Now there are a million ways to deploy Ceph, I used this guide as my template to getting started. If you follow Canonical's guide, that uses a different process. So, don't get confused.

So I would have started with:

# apt -y install cephadm

Now before going too much further, you need to understand about the networking for Ceph. Getting the right network config is a big deal. I've messed this up 10,000 times. I'm not a big fan of linking to 3rd party stuff because links change over time but this is all I can do at this stage as I'm not rewriting all of it. So take a look at this to get an idea.

Basically the idea is you have a public, and a cluster network. Both despite the name, should be 'private', i.e. don't think of public as in Internet acessible. It's a way of seperating out the traffic for the Ceph cluster and the drive storage itslef. In an ideal world, you want to separate these in VLANs at the very least, but if possible a separate physical network is better.

That said, I have never had a separate physical network for it, because I haven't had a high enough quantity of network interfaces at the speeds I want. i.e. I would rather VLAN off a segment of the 40Gbps LACP bonded connection than I would do 2 seperate 20Gbps networks.

So essentially, you need to setup VLANs, and virtual interfaces on the Ubuntu machines, making sure they're all subnetted nice and differently for ease / identification purposes. If you're doing this at home / in a test scenario, you can get a vaguely reasonable performing test scenario using 1Gb ethernet, by the way, just don't expect it to handle any large throughput or deal with failures well. But for learning, it will work. So then you just need a switch that can handle VLANs and that can be bought on eBay super cheap (Cisco 2950 or similar is a good shout for learning).

So let's just say, my network was setup along the lines of, 10.10.0.0/24 for the public, and 192.168.0.0/24 for the cluster.

So then I would have done:

# cephadm bootstrap --mon-ip 10.10.0.10

I always tend to leave a gap from the start of the range to allow for any other networking gear, i.e. routers or whatever else might crop up/be needed, so my nodes would have been .10, .11 and .12

I then (following the guide) installed the ceph command line tool; this was quite important for me as while you can do everything using the cephadm tool, I was very much used to using the 'ceph' command tools, so finger memory was my main driver here.

# cephadm add-repo --release <insert current version>

The current version I used was quincy, but the current version as I write this appears to be reef.

# cephadm install ceph-common

So you should then be able to run 'ceph status' and get an output of some sort. I can't remember what it was exactly but obviously at this stage you don't have any other hosts or drives.

Now before you go much further, you need to get the root user's SSH key, and copy it to the root user on the other servers AND you need to enable root login via SSH if you're using Ubuntu 22.04, as it's disabled by default (as it should be if it's a public facing server!).

As I said at the start, this isn't a step by step, so I'll outline it as this:

Edit /etc/ssh/sshd_config on each machine and enable PermitRootLogin to prohibit-password (so keys will work, but not password).
Copy the contents of /root/.ssh/id_rsa.pub to all other servers authorized_keys file (/root/.ssh/authorized_keys), and likewise for the other servers into this one (i.e. so the root user from each machine can ssh to each machine)

I then installed Ceph Dashboard, as I wanted to do this all via CLI, but I wanted to be able to have a nice pretty web interface to look at everything. It also meant for other staff members who weren't so familiar with what I'd done via CLI were able to get information out or carry out basic tasks/maintenance.

According to the docs, you do:

# ceph mgr module enable dashboard

I vaguely recall I had problems setting this up, but I don't recall the problems specifically, so... good luck :)

Once enabled and I had a web interface, I then added the other hosts:

# ceph orch host label add 10.10.0.61 _admin

...and repeat for .62

I then did:

# ceph orch apply osd --all-available-devices

...and, voila. Ceph was (after a little time) showing as healthy.

All that remained, was to setup the storage pool, then work out how to share it. I'll go into all that next time!

TTFN,

Ben

Why we're using vSAN instead of Ceph, and why that will eventually change...

17th December 2023, 17:58

The overall idea for our hosting platform, is to provide containerised hosting; there are so many different ways to do it, and if I were to start again RIGHT NOW, I would do it a different way. However, I'm not starting again any time soon, and I have plans to slowly phase our way into my preferred setup, but it's not as easy as simply changing some packages/software and off you go. Especially when you're dealing with an ever increasing number of websites, email accounts, DNS zones and overall infrastructure.

How I planned our platform to go "for now"

When I first setup DataLords I had no real money to speak of, i.e. The entire platform was self-funded from my own business and/or personal income. That all changed in 2022, but I knew there was no point in going out and buying all the hardware then because by the time we would be ready to "go live" we'd be able to get more/better hardware for our money.

I wasn't wrong.

It wasn't until around April/May of this year (2023) that I was ready to say, "OK, we now REALLY need to upgrade our hardware"... we had to date, been running on Dell R630, R620 and R720 servers. The old configuration was:

VMWare servers, running vSAN, for our "VIP" and our 100% SLA hosting on the R630/R620 servers
Proxmox servers, running Ceph, for our "storage heavy" requirements, i.e. Mail, primarily, on the R720 servers

The Proxmox servers also ran any staging/dev stuff, and I did run some services split over both, i.e. DNS (1 on VMWare, 1 on Proxmox, we have 1 in Azure), but basically because with VMWare, we pay for RAM usage in GB and equally storage used on the vSAN in GB, it was better to put any stuff that was less "mission critical" on there. i.e. it was still HA hosting but it may have had up to 2 mins of downtime.

Anyway... there were things I wanted to do but when you're running a production platform, as I said a moment ago, ripping it all out and starting again is not a simple task.

But in July of this year I was fully in the flow of designing our new server infrastructure, and I wanted to make it easily scalable across 3 fronts.

First and foremost, I wanted the 'compute' to be scalable. Second, I wanted the storage to be scalable. Third, I wanted our network to be scalable.

So I planned:

VMWare Cluster: A 'compute' only cluster of machines, using 2 x Xeon Gold 6338N in each, with 256GB RAM initially, and 4 x 10Gbps SFP+ Network ports, allowing us to have 40Gbps connectivity across them.
Proxmox Cluster: A 'compute' only cluster of machines, using Xeon Gold 5317 CPUs, and 128GB RAM in each, and 2 x 10Gbps SFP+ Network ports, these would be used more for staging, and the mail platform.
Ceph Cluster: A 'storage' only cluster of machines, Xeon Silver 4310 CPUs and 128GB RAM in each, starting with a total of 21 x 7.68TB nVME U.2 drives, giving us 161TB of raw storage initially (with plenty of room for expansion). These machines also had 4 x 10Gbps SFP+ ports.
We had 4 switches in total with 96 x 10Gbps ports, and the routers were going to be run on Virtual Machines on VMWare using FRR as the routing software

The above setup meant that Ceph was going to be my storage platform of choice, and I could then mount that sort of 'natively' with Proxmox as that would use RBD (RADOS Block Device - and RADOS is Reliable Autonomic Distributed Object Store) and then I could mount storage to VMWare via iSCSI.

So in theory, we have VMWare machines, and Proxmox machines, all sharing storage from machines running Ceph (on Ubuntu 22), and everything pretty much was over 40Gbps so should be plenty fast for our current requirements. I could have looked at 25Gbps instead, but I felt given the cost increase for the necessary hardware and our current requirements, it was logical to setup using this, and once we get to even 20-30% utilisation of the new hardware, we can look at upgrades throughout. After all, the majority of the actual hardware would be fine, it would be things like Switches and Network Cards that need replacing.

Anyway... Ceph.

I have built a Ceph cluster what feels like 300 times.

I have destroyed Ceph clusters by accident. I have brought Ceph clusters back from the brink of destruction. I have had Ceph clusters running spectacularly slowly, and been unable to work out why, but have been able to get exceptionally fast performance and reliability from the same cluster after days, weeks, and months of working at it.

Benefits of Ceph

So Ceph, if you don't know, is a way of doing something a little like RAID 5 across multiple servers. The actual way it works, is a little like this:

Imagine you have 10 servers with 10 hard drives, so a total of 100 hard drives, and each hard drive is 1TB.
You setup a pool of storage and set a replication size on it; I have always done 3, I think you can get away with 2 but basically what you're saying is, "I have 1MB of data to write, how many times should I write it across different hosts/drives? And 3 copies is always better than 2 for redundancy.
Ceph then has managers and monitors which keep watch on the hosts, and OSDs (a hard drive/SSD basically) and should a drive fail, it will 'rebalance', where it uses data from the other hard drives to re-replicate the missing objects.
The issue you have to watch out for, is over-filling the raw storage, and then having a host fail, and getting to the point where data can't be re-replicated and is just plain unavailable, and in that case you will probably start having servers/virtual machines failing like crazy.
There's a lot more to it than that, but if you want to experiment, I would 100% recommend you do it natively/set it up manually rather than using Proxmox. Not because what Proxmox implement is no good, in fact - if you just need a rapid deployment and don't have the time to learn about it at all - it's a good quick fix... but I learnt more about Ceph by doing it manually over 3 months, than I ever did using Proxmox. Check out the Ceph documentation here.

I had spent a long time planning how the platform was going to work, and had set myself a timescale: 1 month.

1 month to implement the new servers, and transfer everything over. On one hand you might think "that sounds loads of time" but if you've ever dealt with moving hosting around you know that's pretty tight, but the simple reason was, we were going to literally double our power usage in the Datacenter, and I wanted to keep costs to a minimum, so by spiking our usage for only 1 month, it was the best use of time and money.

Servers physically co-located, we got things installed and I knew CEPH was going to be my biggest pain, but I got underway. I was having trouble getting the right performance from the VMWare iSCSI connector to the Ceph cluster, and spent a long time working on that. More time than I should have. Then disaster hit. Our old platform went offline, everything start paging down at about midnight on the 16th August. I can remember that date specifically because we were away in London with the kids for our wedding anniversary, and all I took with me was my iPad "in case of emergencies".

The old platform, which I knew was at its limit, had run out of space on the vSAN; something that was actually caused by a rogue backup script, but that didn't matter - what mattered was getting it back online, which after significant stress, I did by around 1am. I was on the verge of calling a taxi to drive me back home, and to then drive me to the datacentre, but fortunately, managed to sort it.

Anyway, this meant I had to accelerate our plans to move off the old platform, as I had semi-resigned to moving the servers over a 2 month period instead, but given this, we pushed hard to move everything fast. I wasn't getting the right performance from VMWare, so I loaded virtually everything on to Proxmox, removed 100% SLA from our services (fortunately we didn't have anyone that actually required that, and I gained consent from a couple of clients that had 100% SLA to move to 99.999% SLA).

Everything was running really fast, and was running swimmingly, but there was a definite problem with Ceph -> iSCSI.

I know I haven't quite got into why I'm using VMWare entirely, yet, but you're probably getting the picture.

I'll dig into a bit more next time, including some of the actual setup we had, the Ceph -> iSCSI configuration and the performance results we were getting, and will explain exactly why I chose to use VMWare vSAN instead of Ceph for now.

In the meantime, I'm off to cook dinner.

TTFN,

Ben

How I ended up using Proxmox and Ceph to get started...

27th November 2023, 20:49

Back in 2018, I ran a web agency where we dealt with hosting too. A pretty typical setup for 'web design' agencies.

I'd been dealing with hosting commercially since 2004 (but started playing around with it in 2001), but long story short in 2011/2012 following a "disaster" I decided to use our co-lo supplier's hosting solution. They used VMWare, and they provisioned us with dedicated machines as we needed, and this worked pretty nicely for a long time. Eventually, we simplified and simplified until we had just one big cPanel server dealing with everything.

It made use of the CloudLinux features which meant we could jail off individual users from each other (i.e. to prevent one user from taking down the server) and to secure it a bit better, as well live kernel updates, etc etc... All the stuff you'd probably expect as standard these days with cPanel.

The problem was I would find from time to time, despite settings being correct, you'd get sigifnicant slow down when a few of the sites were really busy... and as much as I wanted to fix/deal with that I was really trying to focus on building the agency.

So I did the "sensible" thing, and sold the hosting base to our supplier, on the basis they'd deal with the support. That way I could sell a website, and not worry about the ongoing actual hosting platform/maintenance.

Except I realised shortly after just how much "personal crap" I hosted, for myself and family/friends that I didn't charge for. It was a LOT.

So fast forward, I'd tried renting a dedi from a couple of companies; I didn't go to our main co-lo supplier for a couple of reasons but primarily I just wanted a cheap VPS and that wasn't really their business.

But I was continually frustrated because the hosting had loads of problems, and the control panels I was using were just damn horrible/ugly. I won't name names on the suppliers because that's not what this is about. But I hated them.

The frustration led to lots of conversations with my techies in the web design business, which eventually led to.... "OK, let's do this, let's start a proper hosting company."

I went to visit a datacentre over in Gatwick - 4D Datacentres. They're really quite good, the only reason I left was because I just couldn't deal with the journey... at first I didn't mind it, but when you need to literally go swap drive, or install some new RAM, a round trip of around 4 hours was really quite annoying.

Whilst I was having my initial tour there, they asked if I'd sourced the hardware yet, to which I said no... I'd explained this was very much a "proof of concept" because I had this idea on developing containerised hosting, but hadn't really got much further than that. They put me in touch with a company that did me some stellar pricing on some servers - I mean, I couldn't quite believe the deal I was getting.

So after doing some research, and testing, here was the idea to get us developing/off the ground:

Physical servers (4 of them)
- 32GB RAM on each server
- 2 x Xeon E5 CPUs on each server
- 4 x 240GB SSDs on each server*
- Quad 1GB Network cards on each server
Networking
- 2 x Cisco 2950 Catalyst Switches
- Routers to be FRR running on Debian, as virtual machines

*I only added these *AFTER* I had bought them; i.e. when I first bought them I had planned to buy a SAN from eBay or similar...

Bearing in mind this was going to host stuff that could get away with being powered by a potato, this was pretty overkill, but I felt like this was good.

Each server had 4 network cables going to the switches, which were stacked (which meant the two switches acted as one switch) so 2 cables to each switch. I then configured LACP, so that essentially, the servers had 1x4GB connection.

I then needed the software. Ideas I'd thrown around included using FreeBSD (my one true operating-system-love), Windows Hyper-V, VMWare, Proxmox, and quite a few other solutions too. To put it into perspective, I'd been looking at these solutions for around 6 months before we actually deployed.

But storage was a tricky one. I didn't want a single point of failure, anywhere, if I could help it, even for this crappy little test cluster (which by rights was still pretty powerful).

I started looking around for SANs, and in fact, I bought one. I remember thinking if this got to production level, we'd just make sure we have two of the same bits of hardware to give us the resilience, but it wasn't worth it just yet... I can't remember how much I paid for it, but I never got it working. I picked it up from a computer shop in London somewhere, and brought it back to the office, powered it up and got nothing; I don't think it was broken - I just didn't know what I was doing. You see, whilst I've been a techie geek all my life, from probably 2013 to 2018 I had solely focused on marketing/web design, and that seemed to rot my nerd-brain.

I was also doing this all as a 'spare time' project because I was still running the web agency.

So... I spoke to a friend who dealt with this sort of arhictecture all the time, and he said to me "Why don't you use some vSAN thing or something?" - and as he was saying that I was Googling it... and oh how my life changed at that moment.

I ended up giving my friend the SAN, I've no idea if he got it working, actually (I'd forgotten about giving it to him until I wrote this).

I had Googled for Proxmox and vSAN and found people talking about Ceph. Excited, I went to Currys (yes, Currys) and bought 16 x 240GB SSDs.

I was delighted to find Proxmox had Ceph built in, so much so, I could get it up and running using the GUI and doing point and click.

I very quickly found, however, that I couldn't rely on just the point and click. For two reasons.

1. Point and click masks too much of what's actually going on; if you're going to base a business on something like this, you need to know how it works.

2. I'm pretty sure it was within a matter of days I'd completely trashed my Ceph Configuration so much so, I had (for the time being) lost all the VMs on it.

I spent the next few days really working hard on it, understanding more and more as I went, but really becoming to understand just how important the /etc/pve directory was. This was very much a Proxmox specific thing, but fundamentally it comes down to how very careful you have to be with the configuration files relating to both Proxmox's cluster as well as Ceph configuration files.

A lot of problems I had, came from changing network IP addresses. As time went by over the coming years, I became very, very, very paranoid about changing ANYTHING about the Ceph configuration once it was deployed.

So now that background's all out the way... my next post will be all about Ceph in more detail, and why I stuck using it from 2019 until 2023 (October 2023 was the last time I used it, and it was an independent cluster to Proxmox and VMWare... but... spoilers...).

I'll talk more about that next time, and will go into more depth about the configs/setups (as best I can from memory) that I've deployed. (I've deployed Ceph probably, I think, about 6 different ways over the last 4 years).

TTFN,

Ben

"I’m so glad we’ve never had an outage of that scale. It’s scary to know it would be all on me to fix"

20th November 2023, 23:27

As a company, it's always easy to say, "we're dedicated to our customers", and "we strive to ensure zero downtime", etc, etc... blah blah..

I've always been insanely serious about uptime and availability. I think it stems from my time at Demon Internet back in the early 2000's... And I don't mean the lessons they taught me, merely that I was trying to setup a service that imitated them as much as possible (with regards to hosting; I didn't have a massive leased line and bank of modems to offer dialup).

Whilst working at Demon, I had created a service to provide a Unix shell to my colleagues. This was using Sun Solaris, because that's what Demon used, and if that's what Demon used, that's what I should use. That didn't last too long, I soon replaced it with my all-time favourite operating system: FreeBSD.

But even back then, when I needed to do maintenance, take the server down for a reboot, or do anything which disrupted people's use of the system - I scheduled the maintenance in advance, emailed out about it multiple times, and then did the maintenance between Midnight / 2am, or similar. I guess doing nightshifts at Demon kind of prepped me for that, but even so, I was more just keen to ensure that we NEVER appeared to be offline.

When I first got serious with DataLords, I had decided to try FreeBSD, but I'd really forgotten how much more "serious" it is... getting certain services up and running can take a lot longer, and more importantly, there are fewer people using it so when searching good ol' Google for issues, you can often come up against a brick wall, having to resort to documentation, log files, and in a few instances... RFC documents (and boy oh boy, are those a riveting read...

So for a large part of our infrastructure, I took the easy way, because whilst I want to build everything on FreeBSD and have lots of cool geeky setups (and for the record, I still maintain that certain services run significantly faster in FreeBSD, it's not just the geeky setup...) the main thing for me right now is to ensure that we have a well supported, well documented, and most importantly - well managed platform.

Building a hosting company is no small task. Especially when you're doing what we're doing: Everything.

That means we're building, from scratch:

Web servers
- Front end web servers/load balancers
- Containers/environments for the individual websites
- MySQL Servers
- Automating SSL Certificates
Email servers
- MX Records, including spam filters
- Back end mail store(s)
- Front end mail servers for pop/imap
- Webmail interfaces
- SSL Certificate management
DNS Servers
- Front end DNS servers
- Back end database/management for DNS
Backup services
- Onsite backups
- Offsite backups
- MORE Offsite backups
- Interface to manage these backups and restore with ease
Management system
- Front end customer portal
- Domain name management/registration system
- Back end management
- DNS Control Panel
- Mail Control Panel
- Support ticketing system

The above doesn't even cover it all. We have dozens of machines managing each part of our platform, and we've written hundreds of thousands of lines of code to make it work.

You might ask 'why'... and that's another story for another day, but right now, focusing on building a resilient platform, what's the one thing you should avoid?

SPOF.

Single Point Of Failure.

Our entire platform as I've designed and built it, has been created to ensure that there is never a SPOF. Each server has at minimum two physical network links, although most have four. They feed into different switches which are all stacked, so as to ensure speed as well as resilience. Each server has dual power supplies, fed from A/B sources, and each machine we use has 2 x SSD Boot drives in RAID 1.

As for the storage solution, we were using Ceph. I'm going to talk more about this another time. We decided to move to vSAN. I'll talk about why another time.

We're using vSAN ESA, a relatively new product from VMWare (and this is one of the reasons we switched).

It does basically RAID5 over the servers; similar to Ceph but I'm undecided if it's "better". Anyway, the mistake I made recently, was by thinking that by having this vSAN, which uses multiple servers and multiple drives, we had eliminated any worry about SPOF in this scenario.

How wrong I was.

Monday morning, 13th November, I woke to a storm that was not just brewing, but that had wreaked havoc and destruction on our very comfortable lives...

Long story short, the vSAN was "down" for one particular machine, and that particular machine was an NFS server powering a whole bunch of other (very important) machines. I had spent just over an hour trying to resolve the issue, and this is a difficult thing to do when you've got the whole World crashing around you. You have to remain focused, because everything depends on this coming back online, but you can't let that pressure break you - the more stressed you are in your mind, the less likely you are to find the issue and resolve it.

By the end of the hour, which was as long as I gave myself to fix it on my own, I had determined that the issue was specific to just this NFS server (something that took me a short while to realise) but more importantly, that it wasn't "down" - it was just very, very slow for certain parts of the storage. i.e. I could SSH into it fine, I could do disk writes in a good speed, I could search all the files on the server - it LOOKED fine... and that was what slowed me down.

Throw me some errors to Google at least, right?! But no, nothing...

It was around this time, a friend of mine (who I'd been giving practically live updates to) said "I’m so glad we’ve never had an outage of that scale. It’s scary to know it would be all on me to fix".

Fortunately, we have a great team in-house but also, I was able to utilise our support with VMWare to quickly identify and (temporarily) resolve the issue.

But the point here, was specifically that we thought we had a fully redundant storage solution, but really, it's not fully redundant. Do we need to replace it or add a secondary system now? No... but, plans are in hand to make this better. The likelihood of the same issue reocurring are miniscule, specifically because of the causes (...ahem... human error is a phrase that's often thrown around in situations like this...).

TTFN,

Ben

"Those will all be dead within a year"

11th November 2023, 17:11

The year was 2019.

The month, was April.

The day was.... yeah I don't know.

But this was roughly the time I placed the order for the initial 'proof of concept' cluster I setup for DataLords. To recap, I knew I wanted to offer web hosting, I knew I wanted it to be superior quality, and I had this sort of idea about containerised hosting, but I hadn't really got any real deep experience of it. The majority of the previous 12 years I had spent focusing most of my time selling websites and marketing. I had still been doing a lot of server related stuff, but the majority of it was managing CentOS / cPanel servers, not exactly... "high skill" stuff.

In the months that had led up to this, I had been researching and chatting to my friend Kris, a fellow nerd, over the way we can do these solutions. In between the typical sarcastic comments and suggestions, I got a lot of good information and advice. I had heard a lot about Proxmox, a hypervisor which ran on Debian and basically used LXC / QEMU for containers and virtualisation.

I didn't have much money to invest into this at this point, after all, it was just an idea, and fundamentally I *could* have just rented a dedicated server (which is what I was doing) and lived with it.

But before I really delved into the containerisation of everything, I wanted to make sure I had a way of handling storage safely. I planned to buy in servers from eBay, something I've done in the past, I wasn't happy with it but I couldn't afford brand new, and whilst I'd been out of the 'nerd circle' a while, I was confident after a couple of weeks of researching I'd found the of hardware I wanted. We visited the datacentre, 4D Datacentres in Gatwick, and I fell in love with the service, offering, everything. When we were chatting, I can't remember who it was, but one of the 4D people suggested I check out this company (LA Micro) for servers as they do Dell reconditioned servers and they come with warranty.

That sounded really good, and after a chat, I'd picked up 4 x Dell R620 servers with 10 x 2.5" drive bays, 32Gb RAM in each, and Dual Xeon E5 CPUs.

But here's the bit that'll probably make you go slightly wide eyed, and echo the words of my friend who said "Those will all be dead within a year"... I got literally the cheapest 240GB SSD drives money could buy. Literally think I spent £20 each on them.... they were PNY - and they ALL survived the testing period which I think lasted around a year... but they weren't meant to last more than a year anyway!. I ended up donating them to my old high school after I was done with them. I had just needed a proof of concept, I needed something to play around with that relatively accurately mimicked what I actually wanted to implement.

As it turns out I used consumer SSD drives going forward for quite some time, and you may be surprised to learn that overall, out of probably 100 drives, I only had a handful of failures.

Actually, that's not true...

Around a couple of years ago, I bought in 10 x 1TB drives from a friend of mine. I had previously only used Samsung EVO, Kingston, Crucial and PNY drives (I was really impressed by the PNY to be honest, read/write speeds weren't terrible (especially when you factor in the cost). But my friend convinced me to buy TEAM drives, because they come with a lifetime warranty. Sounded decent, as I knew they'd fail if I kept them in long enough, which with these I had planned to keep in a bit longer.

All but two had failed within a matter of months. Some literally failed in the first few months, some lasted up to 6 months, and a couple held out a bit longer, but I ended up switching to 2TB drives and went back to Samsung.

Interesting stuff, eh.

I also build a backup cluster, to run in Brentwood where our office is, and these I decided to use spinning disks for cost per GB. I got 9x5TB drives, I think Western Digital, and set them up using Ceph (I'll talk more about this another day)... but I found a fatal flaw with these. Their write speeds were so poor compared to that of SSDs, that given even a reasonable failure of just one drive, it too forever to rebalance, and whilst it did that, in many cases entire virtual machines would be unavailable.

I'm still using them, and I've had a few failures, and I regularly get SMART error emails, but they're basically forming just a nice big NAS for me to use as a data dump. We have a couple of suppliers for backup, the main one is Backblaze, they're awesome, you can check them out here for personal or here for business. We store obscene amounts of data with them and the charge is remarkably low. Also, it is incredibly easy to setup.

But anyway, the upshot I took from all this, was that SSD drive reliablility seems to be more aligned with luck than it does manufacturer. Sure, I wouldn't buy any more TEAM drives, but the fact that they give lifetime warranty as standard kind of makes me think they're generally pretty reliable, and I must have just got some drives from a dodgier batch (which was exexacerbated by the fact that when I broke Ceph - which I did a lot when I was playing with it - they were under huge read/write loads).

I'll talk more about Ceph next time...

TTFN,

Ben

Why our own infrastructure?

7th November 2023, 23:11

A few years ago, I was still busy running a web design/marketing agency, and I remember how busy life felt then. I had an amazing team that supported me, and helped us achieve great things (and I look back on those days fondly) but nothing has felt as manic as the last couple of months.

To clue you in (although if you're reading this right now you clearly know me), I work as the CTO of DataLords, a web hosting company with desires for World Domination (I mean, can't be hard building a company like AWS or Azure, right? .... right?!?)

So let me give you the short version, because the short version is still long enough for a tl;dr but what it comes down to is this: I could not find a hosting company I liked back in 2018. As a web agency owner, I was busy (by my standards back then, loser, ha!) and if I won a client, and wanted to get the hosting setup, I wanted a quick, easy solution to be able to either register or transfer a domain, and setup the web space and ideally install something like Wordpress or CodeIgniter (I'll come back to that) so we could get started. Then I'd want an easy way to give access - ideally fully access - for just that one website - to a developer.

There were/are various solutions, but I wanted (because I'm a picky bugger) to have this all done with my principles upheld, specifically, I never want to overload a server and in fact, I want the server load average to be as low as humanly (or... serverly...?!) possible.

I tried a variety of companies, which I won't list because I'm not trying to put anyone else down specifically - but I just found that other control panels were hideous, or - more importantly - I could easily work out what I needed to do but my "chief of staff" was not a techie, and so whilst I would have no problem giving her full access to something, I didn't want to give that same access to the actual techies (for a variety of reasons, not because I didn't trust them)..

So - DataLords was created as a concept, and I started fiddling around here and there building things, and testing out ideas, before buying in some servers and co-locating in May of 2019.

Which leads me (finally) onto this post's topic: Why our own infrastructure?

There are a million hosting companies I could build this off the back of, or I could do what all the cool kids are doing and use AWS, or worst case Azure. Surely I'd save myself a buttload of stress, hassle, pain, aggro, tears, blood, sweat, irritation, overhead, sleepless nights, moments of absolute fear, stress again, and more... by simply clicking a few buttons and having a mini hosting empire at my hands thanks to the elastic jungle, but....

Reason number one: the most obvious one. Control.

So let me (very briefly this time I really promise) rewind. I started out in this world in 2001, when got a job working at Demon Internet on their helpdesk. I was the monkey on the end of the phone that you'd call when your dialup wouldn't dialup. Fun fact: the guy that trained me at Demon is now a client of ours, and I couldn't be happier to help him (really, really nice guy).

Every front line agent got access to the 'sos support' server for replying to the support@ mailbox, but eventually I was given a shell to use for various tools and I remember thinking I wish I'd found this 5-6 years earlier, when I'd really started to get into coding/PHP and "computing in general".

Within what I think was a matter of days, but could have either been hours or months (it was 22 years ago as I write this, give me a break) I setup my first Solaris server at home.

Truth, it was hell.

But within a year, and a LOT of reading (no Google back then, I literally borrowed books from the library) I was getting proficient, and had FreeBSD servers up and running with Apache/PHP/MySQL, QMail and VPopmail and had started to give out accounts to people for beer money.

I still genuinely believe that FreeBSD is one of the best server platforms, the problem is, is it's HARD.

But that for the large part, doesn't mean we shouldn't use it. In some cases I took the easy options. I'll talk about Ceph/Ubuntu/FreeBSD, Proxmox/LXC, Ubuntu/LXC, and VMWare in separate posts.

But fundamentally, the reason I chose all our own infrastructure was so that we could absolutely customise every inch of the platform, and have full control over every piece of hardware/software design. Something that in the last 48 hours I have come to question (and the reason this blog exists is to partly let out some of the rants, but also, I really hope some of the stuff I've learned over the last few years will help out someone somewhere).

I'll talk more about our design later, and... I'll talk about why that design got thrown out the window in the last week, and we're taking a whole different approach right now... but for the minute I'll move on.

Reason number two: this might sound like 'control' but .... independence.

From the very first day I setup a Solaris server in my bedroom at my parents place in 2001, to when I was pitching this at investors in 2021, I wanted one thing: complete provider independence.

We're getting closer and closer to achieveing that, but quite simply I didn't want to be in a position where something bad goes wrong, and all I can say to my clients is "we're waiting for someone else to fix something, we have no idea how long it'll be down and there's nothing we can do".

But this extends much beyond just the servers. From a network perspective, this was one of the first things I wanted to have sorted. We joined RIPE, and got our own IPv6 allocation, and we're waiting for our own IPv4 allocation. There's a good chance we'll buy one at some point too but I've been quite spoiled by our suppliers Burstfire, where I've had the same /24 since 2004.

Anyway... the goal is:

A server platform designed to scale, fast, but have the resilience of Azure hosted by AWS backed by CloudFlare.... (ok... might be thinking a bit big there, but you get my drift).
A network fully independent of any one supplier; something many people really don't consider enough, in my opinion, but equally, this is not easy.
Multiple locations replicating to each other; I know in principle this isn't scalable, but equally I'm arguing with myself, "why not?" - I suppose technically it is perfectly feasible, but if the likes of AWS don't do it, does that mean it shouldn't be done? (And by that, I mean, that you don't just have one platform that automatically and transparently to the user syncs to multiple geographic locations). It's almost certainly not financially viable, but I'd like to try and achieve this somehow.
A really solid, REALLY solid, support service. I don't want front line, second line, third line and the engineers separately. I've just been dealing with some support issues and been dealing with a front line person who was very lovely but clearly didn't know what she was doing and was clearly just following a script, and because of that if anything came up (which it did) that deviated from that, it meant she didn't know what to do. It wasn't her fault. It was the suppliers fault for not training her enough/giving her the right resources, but I don't want to end up like this. I know there's no way to scale this and have full on senior engineers answering every support call, but I really want our support team to be amazing, like our core team currently is.

Finally reason number three - cost.

To achieve what we want to achieve, I coudldn't conceivably see a way to make it work whilst using AWS, Azure or any other off the shelf platform.

My goal was/is simple, to deliver fast, secure hosting to the masses. I'll talk about this in a bit more detail another time, because otherwise I'll be here for another three hours writing this, but I wanted every website to be in its own container. I wanted the control panel to be easy to use. I wanted email to be included and not be some ridiculous extra huge fee. I wanted DNS, SSL Certs and all the other gubbins that hosting companies often charge stupidly high extras for.

But I've learnt a lot here. I started out using Dell R620 and R630s; the R620 servers were first, and at point of purchase were just over 5 years old (ex-lease). I bought these for next to nothing and was able to do great things with them.

Even our latest purchases, which were significantly more than those - compared to renting the same from AWS is just night and day.

So by building our own infrastructure, it's harder, it's more complex, it's more stressful, it's more painful, but... it means we have a great offering for our customers.

Well, if you've made it this far.... thanks? I hope you found it interesting, I'll have a lot more to write another time.

TTFN,

Ben

P.S. I am currently writing this all in HTML, statically, via SSH. Because I don't want to use anything like Wordpress, and I don't want to delay getting some posts out before I've finished building my blog software - which is based on my own framework Rioja. I'll talk about all this another time too, but if you're wondering where "all the features are" including RSS feeds (do people still use RSS?) it'll all be coming soon. Any by soon, I mean, probably some point throughout 2024... but anyway, all this to say, excuse weird typos / grammar, it's not so easy to spot over a green terminal which keeps highlighting words like 'for', 'or' and 'and' because it thinks I'm writing a shell script...!

"Hello, world!"

7th November 2023, 15:30

Well, hello there. Welcome to my blog. Do people welcome people still? I don't know. I don't care, really (sorry?).

Except this page to fill out with bloggy goodness over the coming weeks. I have much to talk(/rant) about, including:

VMWare / VSan
DNS
Proxmox / Ceph
Linux, LXC, FreeBSD, Jails
DNS
Email, Spam, dealing with the "big boys"
Servers, storage solutions, and a whole buttload more
Oh, and IT'S ALWAYS DNS

Right now I just set this up so it works. More to come soon (including I'll talk about this framework I'm using too (PHP MVC framework, not front end, I'm using Bootstrap front end because I absolutely cannot even come close to caring, and this is the one I've used for eleventy-billion years.

TTFN,

Ben