in which we use qt3 to blog about our tech jobs

Your initial message seemed to define times when you were busy / doing focused work / otherwise not available. It did not seem to be limited to times you were actually in meetings.

Do you not mind being randomly huddled if you are not in a meeting? (I hope not!)

I worked in the age of the desk phone.

Nope.

From the annals of The Book Of Impossible IT Tasks…

My firm has several attorneys who have announced they are leaving (together, to another firm) at the end of the month. Two of them have worked here for 20+ years, another for 10+. Their main client is an insurance company for whom they have litigated several hundred cases over the years. They will be taking that client with them, which means exporting all of the documents from those cases from our cloud-based document management system (NetDocuments) and all of their email from Outlook.

The documents are all sorted in the NetDocuments system by Client-Matter identifiers, so the Client # is the same for all 700+ cases, but the matter number changes for each case. In the past, for much smaller exports, I would just navigate to the NetDocuments workspace page for the Client-Matter and download (export) a copy of all documents in the matter to a Zip file which I would name “Client#-Matter#.zip” and off you go. Unfortunately, NetDocuments support has informed me that they do not have a way to mass export all matters for a single client, so I am stuck exporting each matter individually. Hell, as far as I can tell they don’t even have a way to PRINT A LIST of all matters attached to a client so that I can at least create a checklist before diving in to 800+ export requests.
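For what it’s worth, the checklist part could be approximated outside NetDocuments. The Python sketch below is not anything NetDocuments provides; it assumes the matter numbers have been collected into a plain list by some other means, and that finished exports are saved in one folder under the “Client#-Matter#.zip” naming described above. The function names are made up for illustration.

```python
from pathlib import Path


def expected_zip_names(client_no, matter_nos):
    """Build the Client#-Matter#.zip name for every matter on the checklist."""
    return [f"{client_no}-{matter}.zip" for matter in matter_nos]


def remaining_exports(client_no, matter_nos, export_dir):
    """Return the expected zip names that are not yet present in export_dir."""
    # Names of the zip files already downloaded into the export folder.
    done = {p.name for p in Path(export_dir).glob("*.zip")}
    return [n for n in expected_zip_names(client_no, matter_nos) if n not in done]
```

Calling remaining_exports with a hypothetical client number such as "1234", the list of matter numbers and the download folder returns the exports still outstanding, which doubles as a progress report.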

That’s not the impossible task though. For email, one of these attorneys has 40,000 messages and another has close to 100,000 messages in their Inbox, all unsorted. I asked permission to simply export the entire Inbox to a PST (albeit a massive one) and was told that for legal/ethical reasons we can’t allow them to take any possible emails from clients who are staying at the firm or that might have sensitive firm information (like compensation and profit numbers, as these two were both equity partners at one time). When I asked how, exactly, they expected me to run a search on 100,000 emails to find only those associated with the clients these attorneys were taking with them, when I don’t know anything about the 800+ matters and there aren’t any keywords or combination of keywords that would produce successful results, I just got blank stares.

Oh, and I am the only IT person at the firm, so all of this is on me, and I have to still do my regular job every day supporting 40+ people and our systems while trying to figure out how to export all of this data.

Fuck.

It seems to me that the lawyers would be the ones determining which emails are legally/contractually/whateverly able to be released. So I’d just grant them access to the mailboxes and let them go to town.

You’d think if there was one thing a law firm would be set up to do, it would be searching archives for responsive documents.

I do not work for a law firm, but I am a pro at searching through emails in Outlook/Exchange.

If you are using Outlook in the cloud, and you are the global admin, you should have access to the compliance portal. You can use content search in there to export only emails that match search criteria (only from or to these people, including or excluding these terms, etc.) and have it export either a PST of all the emails that were found or individual message files.

I no longer have access to it at my company otherwise I’d give you some screenshots.

Here’s a starting point in the documentation.

This is the portal URL:
https://compliance.microsoft.com/

There’s a UI for selecting stuff and you can also use “KQL” and do things like (from:bob.smith AND to:lawfirm.com) to do partial string matches and get everything from bob to anyone at lawfirm.com.

Best thing to do would be to run searches on each attorney’s mailbox for messages where from and/or to (and maybe cc) are the exact email addresses of clients or, if you need a broader search, the clients’ domains.

Note that large result sets can take a very long time to process on the MS side before you can download anything. You might want to break up the exports by not just attorney, but attorney and a subset of their clients if you expect the whole thing will be over 10 GB.
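If you end up with a lot of client addresses or domains, you probably don’t want to type the queries by hand. Here’s a rough Python sketch, not anything official from Microsoft, that turns a list of client domains into one KQL string per attorney per batch, using only the from:/to:/cc: properties and OR. The attorney labels, the example domains and the default batch size of 25 are placeholders I made up; the mailbox itself is still picked as the search location in the portal, so the label is only there to name the search.

```python
from itertools import islice


def kql_for(addresses, fields=("from", "to", "cc")):
    """OR together field:address clauses for every address or domain given."""
    clauses = [f"{field}:{addr}" for addr in addresses for field in fields]
    if not clauses:
        raise ValueError("need at least one address or domain")
    return "(" + " OR ".join(clauses) + ")"


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_searches(attorneys, client_domains, batch_size=25):
    """Yield (search_name, kql) pairs: one per attorney per batch of clients."""
    for attorney in attorneys:
        for n, chunk in enumerate(batched(client_domains, batch_size), start=1):
            yield f"{attorney}-batch{n:02d}", kql_for(chunk)


if __name__ == "__main__":
    # Placeholder attorney labels and client domains.
    domains = ["insurer.example", "adjuster.example", "tpa.example"]
    for name, query in plan_searches(["attorney-a", "attorney-b"], domains, batch_size=2):
        print(name, query)
```

Smaller batches mean more searches to babysit, but each export finishes sooner and a failed one costs less to rerun.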

Let me know if you need any help.

You’d think, but I’ve never seen it be the case. I’m in IT, not directly for a law firm, but my consulting work involves a lot of being involved with law firms on software licensing disputes. Law firms are generally terrible at IT. Don’t know why, they just are.

My preferred solution would be “Select All…Delete” followed by retirement.

My first thought was to train a classifier (similar to a spam filter) where “spam” in this case is examples of documents you want them to have, and “not spam” is documents you don’t want them to have (but it’s a binary classifier, so you can reverse that). Label a bunch of exemplars, then apply the filter and pull the rest.

Now, it’s not gonna be perfect, but if nothing else works, that would be the “pretty close to right” way to go. If you’re throwing on top a few filters for people who are sending / receiving emails, that’ll help a ton and the rest will just be content.
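To make that concrete, here’s a toy version of the spam-filter idea: a bare-bones naive Bayes classifier in plain Python, no libraries. It’s a sketch, not a production tool; the labels “export” and “keep” and the example sentences are invented, and a real run would need a large pile of hand-labeled emails plus the sender/recipient filters on top.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training examples
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        counts = self.word_counts.setdefault(label, Counter())
        for word in tokens(text):
            counts[word] += 1
            self.vocab.add(word)

    def scores(self, text):
        """Return a log-probability score per label (higher is more likely)."""
        total_docs = sum(self.doc_counts.values())
        result = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokens(text):
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            result[label] = score
        return result

    def predict(self, text):
        scored = self.scores(text)
        return max(scored, key=scored.get)


if __name__ == "__main__":
    nb = NaiveBayes()
    # Invented training examples; real ones would be hand-labeled emails.
    nb.train("claim number adjuster deposition subrogation policy", "export")
    nb.train("insured policy claim settlement demand", "export")
    nb.train("partner compensation profit distribution", "keep")
    nb.train("estate planning trust draft will", "keep")
    print(nb.predict("deposition scheduled for the claim"))
```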

You could also imagine doing document classification with TF-IDF and identifying clusters of documents that should be exported. BERT might outperform TF-IDF, but I’m not sure if that’s more work or not, how much time you want to put in vs accuracy, etc.
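A minimal sketch of the TF-IDF idea, again in plain Python: it builds TF-IDF vectors with a smoothed IDF and ranks every message by cosine similarity to one seed message you already know should be exported. That’s nearest-neighbor ranking rather than true clustering, and the sample subjects are invented.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document."""
    tokenized = [tokens(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each word once per document
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: rarer words get a larger weight.
        vectors.append({w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(docs, seed_index):
    """Rank all other documents by similarity to docs[seed_index], best first."""
    vecs = tfidf_vectors(docs)
    seed = vecs[seed_index]
    return sorted(
        ((cosine(seed, v), i) for i, v in enumerate(vecs) if i != seed_index),
        reverse=True,
    )


if __name__ == "__main__":
    # Invented subject lines standing in for real messages.
    subjects = [
        "claim adjuster deposition notice",
        "deposition notice for the claim",
        "partner profit distribution memo",
    ]
    for score, index in most_similar(subjects, 0):
        print(round(score, 3), subjects[index])
```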

Most of these are “sit down for a day and write some python” things that aren’t hard to do (tons of examples on-line). If you need 100% accuracy, I’m not sure how I’d do it - even humans searching the docs are going to make mistakes, and I’d rather search clusters of documents.

Heh. I’m sure that’s their preferred solution too.

But you sweet summer child, you think deleting actually deletes anything? Even if you empty your deleted items folder, your company has a retention policy that will hold on to a copy of that email, or Teams message, likely for years. Even if the policy is to not hold on to things, they still don’t get deleted immediately and are just lazily swept up over time. And that’s just the substrate holds. If your account has been put on a “discovery hold” your emails will be searchable basically forever. Even various iterations of your drafts are still there. All the ways you tried to reword your inquiry into how you could get away with illegal activity without making it seem like you were trying to get away with illegal activity, those are all saved. Waiting for some skilled searcher to find them all! Even the ones you don’t remember writing!

Oh, sorry, I got a little carried away there.

Anyway, as I mentioned, due to certain life events I’ve become quite an expert at searching and exporting emails, if anyone needs any tips.

And we are, or rather, we were. We use an online eDiscovery service called Everlaw (www.everlaw.com) and it is brilliant. You can drop terabytes of native format discovery (emails, PDFs, MS Office documents, photos, TXT files, etc.) into it and it will ingest the metadata and index all the text, at which point you can do word cloud style searches and the AI will produce the best matching results at the top of a list, saving hundreds of hours of time.

Technically, if I were doing this for a case, I would drop the whole inbox into Everlaw, then run searches based on information the attorney gave me to produce results. Sadly, the reason we have Everlaw is because of the work this particular set of litigators does regularly (I was the person who brought in several of these companies to review their products and I and these attorneys chose Everlaw), and of course they’re taking it with them when they depart in 14 business days. There simply isn’t time to attempt any sort of mass searching and categorization of 20+ years worth of unsorted emails.

Our other litigators, our estate planning attorneys and our business mergers and acquisitions attorneys have little use for eDiscovery software, and it is expensive, so I will be sad to see Everlaw go. I highly recommend it to any firm that regularly does eDiscovery / large production work.

Sounds like you might be a little ahead of my earlier advice.

Wow, that sounds awesome! Unfortunately I build, administer and maintain file servers, Exchange servers, Active Directory networks, Windows systems, and all of the other software and hardware it takes to keep a business operational. What I do NOT do is code. Ever. Never liked it in college, haven’t touched it in the 30+ years since. Also I only have 14 business days until these folks are gone, and they’re going to be pretty busy in that time, so the chances of my getting search terms, even exclusionary ones, from them are slim to none. Thank you though!

We have an on-prem Exchange server, and thus I do not have the compliance portal. Exchange limits search results to 250 as well, which is a hindrance for this sort of work. We do have Mimecast though, which in addition to all the nice spam/virus protection and email tools also happens to archive every user’s mailbox in real time, so I CAN use Mimecast search (which is much better than Outlook anyway) to search the user’s archived online mailbox. However, I don’t think I will have time for that either given the scope of the project, need for additional information and limited time frame.

Thinking about it tonight I have come to the conclusion that the only solution here is a compromise. There is simply no way my firm has the resources to do the sort of complex and time-consuming search breakdown it would take to extract 800 different cases’ emails from among 100,000 unsorted emails and then export them by case. Our guidelines clearly state that all email communication is property of the firm, so technically we are within our rights to refuse any sort of export of emails to departing attorneys even if they are from clients going with those attorneys. However that is not a good look, and these attorneys are not departing on bad terms, so I need to make some sort of good faith effort on their behalf.

So I am going to suggest that we create an EXPORT folder in each of their Outlook clients, and that they (the attorney) move any emails and email folders from the last few months that relate to active cases ongoing for clients they are taking with them to said export folder over the next 14 business days. At the end of that time I will create a PST of the Export folder (and the Contacts and Calendar information) for each attorney, and they can at least take all recent client related emails, their contacts and their meeting/court/personal calendars with them to import into their new firm’s email system. The rest will remain with us, archived for all eternity as we always do in the event we are ever subpoenaed to produce communications related to a case we handled in the past.

I just don’t see any way I could realistically export only client related material from that much data when I won’t even have proper search terms for said material (and search terms likely wouldn’t exist anyway since no attorney is going to remember all the parties that took part in every case they worked over 20+ years). This is why I always instruct my new attorneys to make individual client folders in Outlook so that we can quickly and easily produce communication if it is demanded of us, and we can quickly and easily archive & delete old clients that are no longer active. Unfortunately these folks had already formed their bad habits long before my arrival as their IT Director.

Thanks to everyone who replied, I very much appreciate all of the suggestions and advice. I am still faced with the likely hundreds of hours needed to export a copy of each individual case’s electronic files from our document management system, as, unlike email, that is required when an attorney departs. If a client signs a release form stating they wish to go with the attorney, the firm is obligated to produce all paper and electronic documents related to that client if the attorney requests, and they always do. This is by far the biggest request we’ve ever had as a firm though. Never have multiple attorneys with nearly 60 years’ worth of cases combined left at once. I have asked them to make a priority list of active cases which we will export first, and the rest will trickle out over the next year or so as time permits. I’m still trying to grasp the enormity of the amount of data we’re looking at, and I can guarantee you that the partners at the firm do not understand it at all. I have a meeting scheduled for Wednesday to lay it out for them, assuming I don’t quit in frustration before then.

Sorry, just getting caught up on the thread, but no. No.