Heh… -I- thought my description was simpler.
Try this:
Think of a mathematical function. Any function. Say sin(x). You can represent sin(x) by adding other mathematical functions together. For example:
sin(x) = sin(x) (Trivial!)
sin(x) = x - (x^3)/6 + (x^5)/120 - (x^7)/5040 + … (Taylor series)
or, in general:
sin(x) = af1(x) + bf2(x) + cf3(x) + df4(x) + …
The -best- representation of a function is generally the one which has the smallest basis set (i.e. which has the fewest number of terms on the right hand side). So that’d be, in this case, sin(x). (Trivial result, right?)
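If it helps to see the "adding functions together" idea in action, here's a quick Python sketch (mine, just for illustration; the name taylor_sin is made up). It sums the first few Taylor terms, which alternate in sign, and compares the result to the real sin(x):

```python
import math

def taylor_sin(x, n_terms):
    """Partial sum of the Taylor series for sin(x) about 0, using n_terms terms."""
    total = 0.0
    for k in range(n_terms):
        # k-th term: (-1)^k * x^(2k+1) / (2k+1)!
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total

x = 1.0
for n in (1, 2, 3, 4, 5):
    approx = taylor_sin(x, n)
    # Print the number of terms, the approximation, and the absolute error.
    print(n, approx, abs(approx - math.sin(x)))
```

Each extra term shrinks the error at x = 1, but you need several terms to get what the one-term representation sin(x) gives you for free. That's the sense in which the smaller basis set is "better".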
In Exactive’s case, the left hand side is his mass spectrum (which is the output from an instrument that systematically takes fragments of stuff, in this case it sounds like biological samples, breaks them up under known conditions, and then produces some output based on the fragments these samples break into). The idea is that the process of breaking up the samples and collecting the data is reproducible: A 100% pure sample of substance A will always produce the same output function.
So then, the question becomes: What is actually in substance A? (He cares, because if he knows what is in it, he might know how to modify or otherwise interact with it. If substance A is the proteins generated inside a cancerous growth’s cells, this is probably something of interest, for example.)
What he’s trying to do, in this case, is figure out the equivalent of the right-hand side of the above equation, because knowing the pieces that go into his output spectrum tells him something about the proteins that were originally in the system. If he knows the proteins originally in the system, he can use that information to try to help understand what’s normal/different about the system and how to better interact with it.
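To make "figure out the right-hand side" concrete, here's a toy Python sketch. Big assumptions on my part: the measured signal is a plain weighted sum of just two known component spectra, all sampled at the same positions, and the numbers are invented. Real spectra are far messier, and this is not his method. The function name fit_two_components is hypothetical.

```python
def fit_two_components(mixture, comp_a, comp_b):
    """Least-squares weights (a, b) such that mixture ~ a*comp_a + b*comp_b."""
    # Build the 2x2 normal equations from dot products of the inputs.
    aa = sum(x * x for x in comp_a)
    bb = sum(y * y for y in comp_b)
    ab = sum(x * y for x, y in zip(comp_a, comp_b))
    am = sum(x * m for x, m in zip(comp_a, mixture))
    bm = sum(y * m for y, m in zip(comp_b, mixture))
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("components are linearly dependent; weights are not unique")
    # Solve the 2x2 system with Cramer's rule.
    a = (am * bb - bm * ab) / det
    b = (bm * aa - am * ab) / det
    return a, b

# Invented intensities at five shared positions (illustration only).
comp_a = [0, 4, 1, 0, 0]
comp_b = [0, 0, 2, 5, 0]
mixture = [2 * x + 3 * y for x, y in zip(comp_a, comp_b)]

a, b = fit_two_components(mixture, comp_a, comp_b)
print(a, b)
```

Here the mixture was built as 2 parts A plus 3 parts B, and the fit recovers those weights. The hard part in his case is that he doesn't know ahead of time which handful of components out of a giant book belong on the right hand side.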
So, anyway, back to the fitting. He has the left hand side above, and he basically has a huge book of potential functions for the right hand side. The problem is that the left hand side of his data is extremely complex: Spikes all over the place, with different heights, and often so close to each other that you get a function that may look like one shape, but is actually multiple other shapes added together. So the basic approach is to pick a potential function for the right hand side, and say “Do I see points on my data where this function shows there are points?” If so, that function from the book is a potential fit. So you note it, and move on to the next function. He’s flipping through a list of roughly 5k - 5m functions for the right hand side for each of 50k functions on the left hand side. Each comparison is relatively easy (relatively!), but the sum total of all the comparisons is not. But the basic idea is:
For each real spectrum (left hand side):
- Look at a model spectrum (right hand side).
- Does it fit at all? If so, give it a score based on how well it fits (a).
- Repeat from the first step for each theoretical spectrum (right hand side).
a) “How well it fits” is non-trivial, and is where the algorithm for fitting he linked comes in. I have issues with the link he gave, because it seemingly throws out peaks that should be there for the model but may not be in the real spectrum. However, there may be reasons for this couched in the chemistry that I’m ignorant of.
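The loop above can be sketched in Python roughly like this. To be clear, this is my own toy version and NOT the algorithm he linked: I'm assuming each spectrum is just a list of peak positions, that two peaks "match" if they're within a made-up tolerance tol, and that the score is simply the fraction of model peaks found in the real data. All names and numbers are hypothetical.

```python
def count_matches(observed, model, tol=0.01):
    """Count model peaks that have an observed peak within tol (assumed tolerance)."""
    hits = 0
    for m in model:
        if any(abs(m - o) <= tol for o in observed):
            hits += 1
    return hits

def score(observed, model, tol=0.01):
    """Toy score: fraction of the model's peaks found in the observed spectrum."""
    if not model:
        return 0.0
    return count_matches(observed, model, tol) / len(model)

def search(real_spectra, model_spectra, tol=0.01):
    """For each real spectrum, score every model and keep those that fit at all."""
    results = {}
    for real_name, observed in real_spectra.items():
        candidates = []
        for model_name, model in model_spectra.items():
            s = score(observed, model, tol)
            if s > 0:  # "Does it fit at all?"
                candidates.append((s, model_name))
        candidates.sort(reverse=True)  # best-scoring models first
        results[real_name] = candidates
    return results

# Invented peak positions (illustration only).
real_spectra = {"sample_1": [100.0, 150.5, 200.2, 310.7]}
model_spectra = {
    "model_A": [100.0, 200.2, 310.7],
    "model_B": [100.0, 250.0],
    "model_C": [400.0, 500.0],
}
print(search(real_spectra, model_spectra))
```

Note that this toy score penalizes a model for peaks that are missing from the real spectrum, which is different from how I read the linked method. The nested loop is also why the total cost blows up: every real spectrum gets compared against every model.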
Hopefully that helps some? I understand the generics of what he’s doing quite well (one of my tasks right now, albeit on the backburner, is to find better methods for functional decomposition of interaction functions for chemical simulation, which is a conceptually similar task to finding the base spectra from an aggregate spectrum), but not the specifics (since I don’t work in bio, and in general don’t have a good enough organic chemistry knowledge to understand how potential side reactions muck with the transition from theoretical to actual spectral components).
If not and you’re interested, holler, and I can try to clarify whichever part(s) you need more info on. If not, more cute dogs looking confused is a great signal! ;)
(Sorry, I’m a science geek of the worst kind, who assumes everyone else would also be if only they had complex things explained, so I’ll explain as long as folks would like. Unfortunately, I may not be good at breaking it down into appropriately sized pieces and assumptions for random audiences. And there’s no getting around at least a little bit of math knowledge; though if you’re not aware of functions in concept then that may be a hard limit. :) )