Code architecture advice needed

I am toying with an idea to change how a large subsystem of my application works, but I am not sure yet how to do what I want. This will be a bit long. Also for the XML sticklers, treat XML as XML-like, as in it is very close to XML but doesn’t strictly follow the ISO standard.

Let me first explain how it currently works:

I have these “Workers” which have a base class of “WorkerBase”. There is a factory which produces these workers from XML strings. These workers do all sorts of things like check to see if a registry entry exists, seeing if a process is running, seeing if a file’s modified date is less then N days old, etc…

Fundamentally these workers just answer true or false to the questions posed to them, although sometimes there is extra info which can be obtained from another interface, although this is rarely used.

WorkerBase defines all the public interfaces. The most commonly used interface is Run() which returns true or false. Another one, which is rarely used is GetValue() which returns a string. This string is incidental data that might be of interest as a result of the Run() operation. For the most part, this is not used, but is something I would like to re-engineer to be a method of much greater utility.

As the list of workers grows, redundancy started creeping in. NewWorker does what Oldworker does, but then does some additional processing. The following is a set of workers that show this problem.

WorkerReg: Returns true if the given registry key-value exists. GetValue() returns the value read by the registry operation.

WorkerRegFile: Returns true of the file exists, the file name is pointed to by a registry value. It also can accept a path from the registry and do some post-processing and look for a different file in the same path that another file pointed to by the registry is in.

For example, under the windows run key, there might be a key called Sophos, with a value of “C:\Program Files\Sophos\bin\Sophos.exe” and you want to see if a file called “defs” is in that directory, so it reads the reg value and converts it into a path like “C:\Program Files\Sophos\bin\defs”. Its GetValue() returns this path.

WorkerFileDate: Compares file dates to the current time and returns true of the date is less then or equal to the days specified in the XML that constructed the worker. IE: It can tell you if “C:\Program Files\Sophos\bin\defs” is 14 days old or newer.

WorkerVDef: Compares the dates of a pattern of files and returns true if any one of them satisfy the date requirement.

From a code implementation stand point it goes something like this:

WorkerFileDate has a WorkerRegFile, and WorkerRegFile has a WorkerReg.
WorkerVDef has a WorkerRegFile, and WorkerRegFile has a WorkerReg.

Actually these are very simplified examples, and there is much more complexity involved, which is why WorkerVDef doesn’t have a WorkerFileDate instead of a WorkerRegFile.

There are two things that make me unhappy about this. First is that there is redundancy in functionality, and secondly each type of functionality requires a new worker to be built. The second issue is my biggest concern, and solving the first would just be a bonus.

For example, I already have a WorkerProcess who’s GetValue() returns the executable’s path.

If someone asked me to check the date of the executable, I would have to write a new worker that used WorkerRegPath and WorkerFileDate. I already have that functionality in my code, it is just not assembled in that kind of relationship.

This brings me to the new, ideal way it would be work. I would like make this totally data driven, and be able to build new workers via an XML data stream. I would still have some basic workers, but they would be assembled like Lego blocks via XML to provide new functionality.

Looking at the above example of a WorkerProcessFileDate, here is the current XML example of the Process and FileDate worker data.

<WORKER_PROCESS>
<PROC_NAME>Sophos.exe</PROC_NAME>
</WORKER_PROCESS>

<WORKER_FILEDATE>
<PATH>%Programfiles%Sophos\Sophos.exe</PATH>
<DAYS>14</DAYS>
</WORKER_FILEDATE>

Then there would be some new XML if I made the WorkerProcessFileDate worker in my current system.

However, it could work like this if I re-engineered the entire worker system.

<WORKER_FILEDATE>
<PATH>
<WORKER>
<WORKER_PROCESS>
<PROC_NAME>Sophos.exe</PROC_NAME>
</WORKER_PROCESS>
</WORKER>
</PATH>
<DAYS>14</DAYS>
</WORKER_FILEDATE>

The implication being, that the Path value would be the result of another worker instead of a string literal providing there was a <WORKER> sub tag under <PATH>. This would mean that if we needed this functionality in the field, you just would change the XML stream, and Viola! new functionality right there on the spot.

To bad it isn’t quite that easy. These last few examples have stuck with workers that just use a single file path, so it seems to flow into this new pattern quite easily.

Even sticking with just file paths, there are problems that can crop up. Remember WorkerVDef? It looks at a sequence of files. I could have a WorkerFileList that takes a root path and a pattern, and then creates a list of matching files.

However, GetValue() returns a string, not a string list. I could create some kind of compound path string like the PATH environmental variable uses, or I could just change the type of GetValue to a StringList instead of just a string.

OK, so that problem is solved, but it goes on with other complications. I have a worker that enumerates registry keys and does something with the answers it finds. I could have a worker that simply enumerates registry keys and could return that list in the StringList for GetValue(), but there is a problem. Now I have key-value pair I need to return, the registry path, and the value under the enumeration. Actually I have 3 values. The root registry entry is not part of the path, IE: HKEY_LOCAL_MACHINE, HKEY_CURRENT_USER, etc… So now I need to return the right root key, the path “SOFTWARE/Microsoft/Windows/CurrentVersion/Services/TCPIP/Adapters/Interfaces” and the associated value to look for, “{SomeBigAssNumber}”.

Now what?

I have workers that do much more complicated things, who’s GetValue() might not make sense as a string, or has several pieces of data that might be useful.

Going back to WorkerProcess, maybe I could also return the User who owns the process, and if the process has admin privileges, etc… The fact that I return just the path is kind of one-sided thinking, only considering the needs of workers who just need path info.

Anyway, there is so much to consider, so many architectural mistakes that can be made. I am asking the people who have time or would simply find it a fun exercise, to look at the bigger picture here and give any advice you have.

tl;dr pay me and i’ll help
lol

qfmft

Seriously. I’ve tried to help with smaller bits, but design is always the hardest part of the project, and I struggle with it enough at my own job.

DeepT: You need a big whiteboard and someone else standing there.

It seems like you already know exactly where you want to go, you’re just having a hard time getting there due to the restricted output each worker has. I think you need expand that a bit.

You’re not going to get something Lego like, where everything fits into everything, but that shouldn’t be what you aim for anyway. As long as matching inputs and outputs fit, you should be able to build whatever you need.

99% of everything is a state machine

OK, serious attempt to answer to your question.

  1. I assume that each worker is an independent thread. You do know that creating threads is an expensive operation? You should only create a new thread if the amount of time it takes for the worker to return its result is going to exceed a certain threshold, say 1s. If you’ve got a parallel architectures, you can create your workers in advance, then dispatch data to process to each one. But in general, making a worker for each thing to do is silly; just process them sequentially.

  2. The answer to redundancy is always to separate the redundant code into a shared function. How you share those functions is up to you; you can create an abstract Worker class with all the shared functionality, or what have you. Just make sure that two workers can’t access the same data – internal or external.

  3. Just in general it seems you’re looking at this problem from the bottom up – seeing all of the little functions and wondering how to group them into big functions. Start from the top and work your way down instead. It sounds to me like the overarching design is crap; the good news is that rearranging the pieces to fit a new overarching design is at worst a matter of cut and paste.

Edit: You know, now that I re-read your post, I can’t really tell if my suggestion’s any good or not. Agreed with whomever said we need a whiteboard.

Okay, I stopped reading at this point (and skimmed the rest):

I may be completely off base in suggesting this, as I’m having trouble parsing your post. However, have you considered Decorators?

Also, not to be a dick, but do you actually name your classes/methods so nonsensically? Or are you simply changing the names to make your post easier to read/understand?

I notice you make a lot of these posts. Do you own a book on design patterns? This is another book you should absolutely own (and read). Again, I’m not trying to be a dick here.

How about using recursion in the XML to build out the workers, so each gathers the pieces you need? That way you could send the right workers at the right pieces without trying to make an inefficient generic worker. Unless you want to use less workers (Not sure of your goal here.) in which case, you could just build out the workers themselves to handle these conditions you are throwing at them.

K

Rimbo:

To answer your questions:

  1. No, workers are not on separate threads. I do not believe I mentioned threading anywhere.

  2. There isn’t really any “code” redundancy. Via inheritance or encapsulation, there is very little, if no redundant code. The redundancy is more along the lines of something like:

Worker B does what Worker A does, but a little bit more.
Worker C does what Worker B does, but a little bit more.
Worker D does what Worker C does, but a little bit more.
etc…

From an implementation point of view, Worker D might inherit from Worker C, or maybe contain an instance of Worker C. In either case, there isn’t any redundant code.

However, looking at this from the point of view of someone who creates the XML data stream, it can be a bit confusing. You have 10 workers that do similar things. So now I have this giant ass document with example XML, and still people get confused. I do not blame them. If I haven’t looked at it in a while, I have to go back and read the docs, or my source code.

Oddly, reading my source code and understanding what is going on is very easy compared to all these worker XML rules. It just that there are so many workers that are so similar, it all begins to blur.

  1. I am not sure how this applies.

Adam:

  1. I did not know what that was formally called, and yes, I do use that here and there with these workers. On some of the workers which are only slightly different I just wrap them into a new worker. Some run methods are just a call to a different worker and then one or two more operations.

  2. These class names are not quite the names I actually use, but are close. I probably do not have the best naming convention rules. Generally I prefix the class name with the subsystem they belong to (assuming these classes are not general classes used by many systems).

For example, all my workers do start with Worker in the name. From there I add their functional description, which is tempered by the XML name for the worker so it is easy to see what XML tag corresponds to what worker.

I have a “HSS” subsystem, and all these objects start with HSS. Under this subsystem there is a “Logical” set of objects logical operations, they all start with “HSSLogical”.

I am not sure what a better naming system would be like. From the class names I can instantly tell what sub-system they belong to and what exactly they do.

I have over 160 classes. If I were to start them with “C” for class, that would be useless to me. What would you suggest?

As for methods, there are two rules I try and follow.

If the class is a general class, used in a lot of places, the method names are descriptions of the functions. IE: NetInfo (Information about networks class) NetInfo::GetArpTable(). That method gets the ARP table. I would have no idea what a better naming convention would be.

If the class is a sub-system class, IE: A worker, a HSS, an Agent, an Inquisitor, etc… Then I have some standard method names I use which are abstractions certain kinds of functionality. For example, the Run method. Workers have a Run, HSS’s have a Run, Agents and all the rest have a Run method. Do they all do the same thing? Functionally, No. Abstractly, yes.

A Run method, as I use it simply means “Do whatever it is you do”, be it launching a web browser, drawing circles on the screen, or checking a file date. I do not need to remember or know what they do from a high level point of view. What would you suggest here as a better naming convention?

  1. Design patters, yes. I have the actual book called Design patterns. I also have and have read cover to cover Code Complete and Writing Solid code. I have only paged through design patterns, it seems more like a reference then a book you just read cover to cover.

Kevin:

That is a good idea, and that is part of why I want to make a change. Currently I can not support that. Workers that take other workers as XML are fixed. For example, the worker that checks the existence of a file also takes the XML for the worker. However, this worker knows that a certain parameter not only is a worker, but a particular kind of worker. It is fixed and can’t be changed without any code changes.

A great example of how useful recursion would be is registry indirection. That is, a registry key that tells you which registry key to find your data. I do not think there is more then one level of indirection in the registry, but you could theoretically nest any number of WorkerReg inside of each other.

As it is, I can’t do that, and that is another reason for the big change.

Not knowing the specifics it’s hard to say if this is crazy or not, but instead of having

Worker B does what Worker A does, but a little bit more.
Worker C does what Worker B does, but a little bit more.
Worker D does what Worker C does, but a little bit more.

Why not have Worker B take the output from worker A, then just do the little bit. Likewise with C and D. Maybe this would let you eliminate many redundant workers that confuse people. Of course, it might also cause massive bloat in your XML files, but maybe that could be cleared up by letting users define their own composite worker types that can be reused. Of course, maybe that would just put you back to where you are now.

That is what I have in my code now.

What I want to do is move that out of my code and move that into XML.

This will require significant changes and a lot of considerations, hence this thread.