Buying a server for a neural net

So my little startup has brought on board a Cambridge professor who specializes in AI and deep learning, specifically neural nets. He and I are going to build a few neural nets, one of which will be watching every movie ever made (or as big a data set as we can get our hands on), and then my AI will try to recreate it. Stuff like that.

Neural nets take a lot of processing power. I suggested outsourcing to cloud-based processing, but he thinks we should build our own server.

So now I have to do that. This will be a true dedicated server, with nothing but the neural net running on it.

I don’t think we will be using the multiple-GPU trick; we need raw processing power.

Not asking for a complete build, but rather advice on where to start. Anyone built a beefy processing server lately? What’s the scoop on suppliers and manufacturers?

This sort of thing is exclusively done on GPUs.

Cloud versus on premises depends on your usage model. If you think you’ll be feeding it data 24/7 on-premises is cheaper and faster. Otherwise the elasticity of the cloud (you can spin up 1000 VMs with GPUs and only pay when you need it) is very valuable.
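To make that concrete, here's a back-of-the-envelope break-even sketch. Both rates are assumptions for illustration, not real quotes:

```python
# Rough break-even between cloud and on-prem GPU compute.
# Both figures below are assumed, not actual pricing: a single-GPU
# cloud instance at ~$3/hr versus a ~$9,000 on-prem box.
CLOUD_RATE_PER_HOUR = 3.00   # assumed $/hr for one GPU instance
SERVER_COST = 9000.00        # assumed up-front hardware cost

breakeven_hours = SERVER_COST / CLOUD_RATE_PER_HOUR
print(f"Break-even after {breakeven_hours:.0f} GPU-hours")
print(f"= {breakeven_hours / 24:.0f} days of 24/7 use")
```

Below a few months of continuous use the cloud usually wins, and power, cooling, and admin time push the break-even out even further.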

My company frequently builds out vertically scaled servers for one application, with 40 CPUs and 256GB of RAM. We just buy 'em from Dell.

40 CPUs! Jesus Christ. :)

You mean connected in parallel, right? Or at least, distributing to multiple servers…

Amazon’s got some useful stuff you may find interesting.

I don’t think you really wanna bother investing in your own hardware at this point if you can avoid it.

No, actually. We do run a VM on it for portability but that VM gets all the hardware. Like I said, vertical scaling. It’s a… monolithic… application.

Actually we have at least three of them per cluster, and at least two clusters per install, one non-prod and one prod.

But it does parallel processing somehow, right? Forks, threads, green threads? Because, otherwise, scaling the number of CPUs might not help much or at all.

Oh yes, it’s massively threaded. It’s massively everything.

But can it run Crysis?

What stack is he using? TensorFlow? Keras? etc. Does he have a model already that can actually interpret video?

I would focus on setting up a test image using AWS or a local machine with a 1070 or similar before dropping a bunch of cash on a machine loaded with GPUs.
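Once that test image is up, a quick sanity check that the GPU is actually visible saves a lot of head-scratching. This assumes the stack turns out to be TensorFlow (for PyTorch, the equivalent check is `torch.cuda.is_available()`):

```python
# Check whether any GPU is visible to TensorFlow before launching
# long training runs. Wrapped in try/except so it reports a missing
# install instead of crashing on a fresh box.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
    print(f"GPUs visible to TensorFlow: {len(gpus)}")
    for gpu in gpus:
        print(" ", gpu.name)
except ImportError:
    print("TensorFlow is not installed in this environment yet")
```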

This forum has lots of great info for beginners, depending on the stack you are using anyways.

I made it partway through their free course for fun, but it’s a rapidly developing field, and setting up my own box was a minefield. Getting that initial development environment up and running can be a bear, especially if you are not a Linux guru.

Ultimately it’s a matter of scaling and reasonable speed. I can technically run a neural net on my laptop. I just have to then not use that laptop for a month. That’s how long the initial job might run on my work laptop. Crikey! A month!
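For scale, the arithmetic on that month looks like this. The 20x figure is an assumption; real speedups vary wildly by model and GPU:

```python
# Toy speedup arithmetic: if a mid-range GPU gives roughly 20x over
# a laptop CPU for conv-net training (assumed, not measured), a
# month-long job drops to about a day and a half.
laptop_days = 30
assumed_speedup = 20
gpu_days = laptop_days / assumed_speedup
print(f"{laptop_days}-day laptop job -> ~{gpu_days:.1f} days on a GPU")
```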

You can get started using the cloud. What you need to buy will be dictated by what specific software he’s using. If you really want speed you want to be utilizing GPUs, but it’s possible with one developer he’d have to port over whatever he’s using to GPUs and that might not be feasible.

AMD Threadripper comes out soon. Should be nice for some cheap machines that will be 16x your laptop anyway.

In that case I suggest MULTIVAC. Just make sure you feed it sufficient data, or you won’t get meaningful answers. ;)

But seriously… these days, you should consider using a cloud solution, unless you have huge amounts of data that will be easier to feed locally to the system (and/or bandwidth limitations).

Uh… the software. What does it run on? Windows, Linux, Solaris…?

What, my software? It runs on Linux. We do offer a cloud version also, in AWS. We don’t use neural nets, though.

Sorry, I mean @Guap 's. Replied to the wrong person. ;)

No trick about it. Multiple GPUs is a ton of raw processing power.

What you wrote here doesn’t make sense to me unless you are talking about building a true supercomputer that can get into the Top500, which is an enterprise undertaking. You want to learn CUDA and use GPUs.

We don’t know if the software he’ll use can use said GPUs, do we?

If he’s analyzing movie frames it would be a pretty big shock if it can’t use GPUs.

You are asking for advice on building a server for neural nets, but we don’t even know the NN type or, more importantly, what stack you plan on using, unless you plan to develop your own from the ground up and ignore the rocket-ship pace of the freely available stacks and models out there.

I would ask your professor what stack he uses, and then you can proceed with building your own server if you want. But I would set up AWS or a more specialized cloud hosting provider to start.

Right now you just want to start working on your model using small sample sets to tune and act as a proof of concept. Then you can get a feel for what you will need to do at scale.
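To illustrate the "small sample first" point, here's a stdlib-only toy: plain SGD fitting a one-dimensional linear model. Nothing here is specific to any real stack; the point is that the whole train-and-check loop can be validated on data this small before anyone pays for big hardware:

```python
# Minimal proof-of-concept flavour: fit y = 2x + 1 on a tiny sample
# with stochastic gradient descent, pure stdlib.
import random

random.seed(0)
data = [(x, 2.0 * x + 1.0) for x in range(10)]   # toy "dataset"
w, b, lr = 0.0, 0.0, 0.01

for _ in range(5000):
    x, y = random.choice(data)
    err = (w * x + b) - y
    w -= lr * err * x
    b -= lr * err

print(f"learned w={w:.2f}, b={b:.2f}  (true: 2.00, 1.00)")
```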

Agreed, but still…

(get it? still? Ok, I’ll show myself out)

Try cloud first. I routinely spin up 64-core servers for 4-5 hours. It costs $10 or so each time I do it. Try AWS GPU servers if you need that. Either way, try cloud before dropping cash.
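Those numbers check out roughly; the hourly rate below is assumed, and actual pricing varies by instance type and region:

```python
# Sanity-check the quoted figure: a 64-core instance at an assumed
# ~$2.20/hr for ~4.5 hours lands right around $10 per run.
hours = 4.5
rate_per_hour = 2.20   # assumed hourly rate, not a real quote
cost = hours * rate_per_hour
print(f"64-core box for {hours} h costs about ${cost:.2f}")
```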