Trying to figure out the best way to collect some model train data for analysis (SQL maybe)

This is a really geeky exercise in overanalyzing. Hey, I can only to me.

So, one of the fun things of being an analyst is using my day job to overanalyze my hobbies.

I am really into model trains and run them at a club, I track some data on what trains and locomotives I run. Mainly this is to prove my gut feelings I run certain locos and train more often than others. Collecting this data is easy: I have an Excel workbook with the dates and times I ran trains and locomotives. I looks kinda like this:

|12/7/2019 | UP |SD70ACE |UPFRT1 | Manifest |
Date, roadname, loco type, freight name, and an overall classification of the train. Classification isn’t completely important.

All of this gets sucked into Tableau, because I like the tool and use it for work.

Now, the main question at hand.
As in the previous example, the train named UPFRT1 is made up of about 24 cars. I swap cars in and out a lot, and I can tell you what was in UPFRT1 isn’t the same as is now. I have a tab in Excel that lists all of the rolling stock and the train it’s assigned to at that point. This is more so I try and avoid buying a car I already have.

So, what I am trying to figure out how to track how often a specific car ran. I don’t know how scalable my method of tracking locomotives will scale. A 4-line entry in Excel listing the dates I ran a locomotive isn’t bad. However, tracking 60 cars running is harder.

I’ve given serious thought to bar codes, but that’s too much.

On one hand, it’s not hard to create a different worksheet, and copy and paste the train rosters and assign them a date. There are some cars that may be part of two different trains (a shorter train I may run). I can create a second field for “secondary train name” and filter off that.

I want some way of saying “I ran train x,” and have it grab and record a date it ran. Maybe SQL is the best tool?

That is assuming there is a lot of variation in train consists this year. I’ve gone kinda nuts buying train stuff and may be cutting back on it.

Sounds to me like you need a relational model where cars are a fixed asset but trains vary in composition with each running. A SQL database would do it, or a tool like Airtable that implements a relational model in a spreadsheet like tool. If you use Airtable, you then would export your tables separately and join them in Tableau. Airtable has a free tier and has tools that makes it easy to convert a repeating field into another table. SQLite is the easiest SQL database to work with, but I don’t think Tableau connects to it.

Either way, you’ll need to do simple relational data modeling. Given your description, I think you need a cars table, a trains table, and a runnings table. But I’m sick in bed with a cold and might need to review when my head isn’t throbbing.

I think Clay is on the right path. You don’t want to track a train as a distinct entity of certain cars if you cary them. Instead your database should be built on each train running generates a distinct ‘train’ populated with the locomotive and car data. Maybe some basic constraints like all trains must contain one or more engines.

Anyhow it would be more work on a per run level than U(nion)P(acific)Freight1 as a collection, but even that may be resolvable. You could create a train object containing the load, and create a run object that loads a train object containers cars as they currently exist. That way you can have common setups, but change them over time (swapping in cars) but since the run is a distinct object it only loads what the consist for that train is at that specific moment, so changes to the load do not edit old runs.

Like Clay I am also in a fog due to a cold so may be completely bungling the explanation.

@Mark_Crump Here’s an example Airtable base that I set up. You can edit away, etc. Notice how some of the fields link to other tables. Try creating new fields, editing their types, etc. Airtable is a fantastic tool for this type of tracking. Not sure how much you track, but I think you can have up to 1000 entries in a table for free? Otherwise, it pretty much functions on the same concept as SQL.

Caveat: I know nothing about trains, so sorry for the poor example entries. :D

One thing you’ll notice is that in tables that are linked from other tables, there’s a column automatically created that shows what it links to. For example, the trains columns in the train_cars table was automatically generated. You can edit joined records, jump between them, etc. with ease.

p.s. Then you can export all of the tables, import them to Tableau, and join them for analysis.

It would be awesome to build an RFID scanner and scan your trains + carriages as they move around the track then automatically correlate the data in a database. I bet you could do this pretty easily. Might be a fun project.

Like this :)

I will sign up and take a look, thanks.

Probably and ideal world would be a web front end that I could pick:

  • Date
  • Loco 1
  • Loco 2
  • Train name (Pull down list)

And then have it create records based off the tables. So, cars in UPFRT it would create a record of it running.

I gave serious thought to how UPS does shipments. Barcode each car. Then barcode them into the storage bin i use. Then barcode the bin when I run the train and have it scan all the cars and create a record.

I will look at Airtable. It’s too bad it doesn’t work natively with Tableau, but there is a free Gsuite app that at least syncs between Airtable and Google Sheets. Which is kinda too bad.

Airtable will do all of that for you. I’m not trying to shill for them, but it is a very useful tool. I use it all the time at work for some business critical tasks. The API is self-documenting and awesome, too, if you’re into writing python scripts to pull/push data to databases.

I am going to take a look at it this weekend, for sure. So, if a car changes from UPFRT1 to UPFRT2, it will be able to count 1 run as FRT1 and one as FRT2?

As an aside, there are 3 trains I don’t track rosters to. All my passenger cars I don’t care about individual consisting of.

Haha this is cool :)

Yeah, it is.

Sitting on a telecon, so can’t dedicate full brain power to this, but on some level, it seems like tracking a “train” is pointless for what you’re after.

You just need to track each car, since that’s the real data, unless you are specifically interested in how many times you ran a given combination of cars.

Not being a train expert here, but it seems like you want to ignore the whole concept of a train to begin with and instead change it for a running. A running would be an individual one way execution of a train route. Cars may be added to the running at one stop and then removed at another stop while the “running” continues on the route. My understanding is that some runnings utilize two engine cars just in case one breaks down. So you’d have a Runnings table with a Cars table, and a CarsRunning table that notes when a car was added to a running, when a car was removed from a running, and when the car was the primary locomotive engine for the running.

And then the times the running reached each stop so you can calculate how long a car was a part of that running.

What is the goal of your data tracking? What are you hoping to do with the data?

You’re not far off, other than when I go to the club the train just runs laps around the layout. So, it doesn’t work as a traditional train. If I bring 25 cars in one storage bin to the club, those 25 cars run as one train, and get packed up the same bin and go home.

I have around 160 pieces of tracked rolling stock. I also have around 14 named trains. Of those, 4 of them I don’t track any consist records. My Erie Passenger train, when I run it, I just note “Erie Limited” date run, and the two locomotives. This is a perfect Airtable train. Nothing it about it ever changes. The same two locos pull a set of cars I don’t track. I usually just copy a previous Excel entry and change the date.

Having a named train is the best way for me to handle, well, everything. When I go to the club, I don’t decide what cars to run and put them in a storage bin. The storage bin is labeled UPFRT1, and I grab that and go.

Of the 160 cars, all but maybe 20 are firmly locked in their respective bins. I would have to go on a major buying spree to cause a big shift. I do want to build a big tank car train, but I doubt I am going to buy 20 more, causing the 3 tank cars in UPFRT1 to get moved to something like UPTANK1. My Erie Lackawanna version of UPFRT1, for example, I expect maybe one car to come out causing a shift.

I have a long blog post I am going to write about this, but this is the general goal: I want to see how often I run different trains and locomotives and track patterns. Now, a lot of this is wanting to understand Tableau and data analysis better, so I admit this is a solution in search of a problem. My true main goal is to get better at data collection and analysis, so tracking something I do a lot of makes it more enjoyable

But, it started from me thinking last summer that it seemed like I run my Erie passenger train a lot. So, when I started going to the club I just made a note that I ran certain trains. Now, in Tableau I can see what locomotives I run and the trains I run a lot, and, conversely the stuff I don’t. The report also tells me the last time I ran a train, which is also handy.

If I am honest with myself, tracking general trains and locomotives is enough. Last year I started modeling the Union Pacific in addition to the Erie Lackawanna, and that caused a lot of car purchases and reshuffling of cars into era-appropriate trains.

What I am curious about is how often does a car get moved from train a to train b, and does the corresponding move mean the car runs more, or less often. I will likely run a simplified report of:
|Car Info|Trains it ran on|Date Last run|

Edit: That said, the one thing I don’t have is a unique key field for locomotives and cars. I will probably use an Excel function to combine road name and reporting mark.

Ignore me. I read “model train” as in you wanting to model train data, not literally model trains hah.

No it’s not. You know it’s not. What you really need is a reader that looks up from beneath a piece of track so this data gets collected automatically.

You’ll get there someday. Why put it off?

Well, money. And the layout is club owned so I can’t really stick stuff to the tracks. When I get my own layout, sure.

I ended up doing a few things last night on this:

  • I created a unique ID for everything. It’s just an excel function that combines road name, number, and engine type; and road name and number for box cars.
  • I had several pairs of locomotives I tracked as a singular unit because they always ran together. On the roster and reporting sheets I broke them out to their own individual entries.
  • Since this will cause a lot of the reporting to go haywire, I also added a note to say if the train was a UP or Erie train.
  • For now, I am just going to copy and paste the train rosters into a “cars run” sheet. I keep them updated anyway, and it lets me adjust on the fly if I also run a car on two different trains on the same day. This happened recently.
  • Now that the joins are based off a unique ID, I also grabbed the current train a car is assigned to on the report. I have two non-running train names: unassigned, and bad. This way, it will be interesting to see where the car currently is, and if it bounces around a lot. Bad cars are cars that need more than a quick bench repair and are effectively out of service until I fix it.

If you don’t want to share the database online, you could try MS Access. Dunno however if you can feed data into it automatically from some other source using a script.

For those curious, I put the Tableau Viz here.

Note: I only started tracking the laps in 2020, so if you include the 2019 views the average laps is out out of whack. The main view is the first tab, where it’s the breakdown of trains run. The second one is just a report of the different cars I own, and the last one is just when the cars run.

Also, to close this out, I wrote a blog post about the whole thing here.