Copy and Paste from Excel in to a PDF - Naming Issue

Reemul · April 18, 2016, 2:40am

At present I am doing a job for a company that involves renaming PDF files from an Excel spread sheet and am getting a few error messages and ws wondering if anyone had any solutions.

Example below.

The excel sheet has four rows and each row has a specific word in it.

So it looks like this.

Company Name Insurance Company Policy Number Owner Name

I copy directly from the 4 rows trying to rename a PDF called 340 (named after the row in the spread sheet) so the PDF should now be called Company Name Insurance Company Policy Number Owner Name.pdf rather than 340.pdf. It is then uploaded in SharePoint for retrieval.

However many times I am now getting you cannot rename PDF as it cannot contain a /?"* etc (even though it does not contain them) and then have to rename section by section which takes 4 times as long or if I delete the spaces in between the names and add them back in it mostly works this again is very time consuming.

Anyone know what is causing this issue, is it to do with Excel and the sheet format etc. It’s just I have 14,000 of these to rename and quicker is better.

Does anyone know what is causing the error

Ginger_Yellow · April 18, 2016, 2:54am

There’s probably some hidden delimiting character being copied. Have you tried using an intermediary paste dump like Word/Notepad to clean it up?

Razgon · April 18, 2016, 2:59am

Or select insert special, or only insert the values could work as well?

Reemul · April 18, 2016, 3:34am

Would TRIM and CLEAN work?

Eventually I am looking at using some form of mail merge / bat file to match the entries in the spread sheet to the PDF and get it to auto rename rather than manually copy / paste.

The next spread sheet has 2,000 entries and automating as much as possible would be really helpful and getting the naming convention correct is key.

SergioBAM · April 18, 2016, 3:42am

Could you just run a macro over all of the names/cells and strip out non a-z characters? Something like:


dim aValue, finalValue as string
finalValue = vbnullstring
' Grab the cells unformatted value.
aValue = somecell.value2

' Iterate over each char, removing all non a-z characters.
for i = 1 to len(aValue)
  if Mid(aValue, i, 1) Like "[a-z, A-Z]" then
    finalValue = finalValue & Mid(aValue,i,1)
  end if
next i

somecell.value = finalValue

SergioBAM · April 18, 2016, 3:55am

Trim and Clean will not work. Trim removes leading and trailing white spaces. Clean removes any characters that your system cannot print; your system can print /?"*.

Ginger Yellow hit the nail on the head. There are probably hidden characters in the cell that Excel is formatting out. You will need to clean the data first or use copy/paste special.

Ginger_Yellow · April 18, 2016, 4:09am

I mean, I assume the most efficient way of doing it would be to export the entire Excel sheet to text, then run a Python (or whatever) script using the file as input. But if you’re doing it manually, I think pasting into notepad first and if necessary doing a find & replace for whatever the guilty character is (maybe tab?) should work.

Reemul · April 18, 2016, 5:57am

Well Tried copying a few which are causing us errors into Word and notepad but none show any sort of issues. Real pain, the text all looks fine.

fdsaion · April 18, 2016, 6:30am

The macro solution above is probably what you want.

Ginger_Yellow · April 18, 2016, 6:32am

And what happens when you copy the text from Word/Notepad into the filename?

ArmandoPenblade · April 18, 2016, 6:46am

Are you single-clicking the cell, hitting Ctrl-C/Copy, then going to filename entry box, and Pasting the cell’s contents in as a name?

IIRC, Excel cells seem to contain something very much like a Carriage Return except that it’s not quite. You only “pick it up” if you copy while having the cell itself selected. If you double-click the cell, then highlight its contents and copy them (or highlight them in the Formula line above and copy from there), it should not grab that character.

If doing that still fails to work, then yeah, the text itself contains some sorta hidden nasties you’ll need to automatically scrub, most likely.

Reemul · April 18, 2016, 7:09am

[QUOTE=Armando Penblade;3965193]Are you single-clicking the cell, hitting Ctrl-C/Copy, then going to filename entry box, and Pasting the cell’s contents in as a name?QUOTE]

Yes this is exactly what we are doing. The issue being we are actually highlighting 4 cells at a time horizontally and pasting them in to the file name entry box and say 1 in 5 times it errors out.

We want to automate the processing by exporting as a csv (ms dos file) and using a bat file to auto rename but if continues to error it will then fail.

Also when pasting out it is leaving big gaps, for example.

The pasted text from the cells is

This(Row A) File(Row b) is(Row c) Rubbish(Row D) and pasting it to the pdf it looks like this

This(BIG SPACE)File(BIG SPACE)is(BIG SPACE)Rubbish. The gap is big but it is only 1 space in real size, eg using backspace once removes the big gap. Maybe this causes the issue, but we cannot make it fail every time only say 1 in 10 times the other 9 work fine even if they do have the big gaps which we also have to manually remove.

For some quick context, I do not generally manage the IT side of a project like this but run the whole end to end process, but have been employed to take 3 million on documents, to scan and upload them in to share point under specific guidelines, searchable pdf’s and specific naming conventions based on 4 entries on a spread sheet.

Once I started and looked at their process it was going to take around 4 years to do. I have made major changes to the Prep and scan stages and we are now down to 18 months. Ideally I would like to get us down to 1 year. By out putting the scanned images against their relevant spread sheet line I hope to automate the final process from a manual one to a quicker simple rename bat file.

I just need to get the spread sheet clean. The current dept will be finished in 3 weeks and the next dept is the biggest. the companies IT depart is 1 guy who is pretty good but also run off his feet and I would like to be up running in a few weeks so all the given advice is really appreciated even if it’s just enough for me to understand the issue.

One point I may need to look at is how the spread sheet is generated, I think it’s pulled from the back end of the software they use so it could be they need to change how the sheet is generated.

ArmandoPenblade · April 18, 2016, 7:18am

Reemul:

Couple more questions (I work a lot in Excel, but am not some Uber-Guru, sadly):

Are there ever any hidden columns in the spreadsheets you’re working with? There’s a way to force Excel to only highlight/copy/paste “visibly selected” columns/rows, so if you have hidden ones, figuring that out could help.

But yeah, the big, one-character “spaces” are the funky cell-ending characters I was talking about. The idea of converting to CSV and using a Batch file or some sort of simple Python script, etc., to parse [and clean] the data and copy it elsewhere automatically is going to be monumentally faster in the long run, even if it involves a few days’ study of the relevant principles to get up to speed.

Ginger_Yellow · April 18, 2016, 7:27am

Have you tried “Replace all” on ^t (in Word)? Seems to work for me.

Edit: just tested on a five cell selection. Pasted into Word, then into the filename, I get the same error. Replacing ^t with a space and pasting the result works fine.

Scott123 · April 18, 2016, 7:27am

If you have Acrobat I think you can save as a MS Word file, and from there copy and paste should be more consistent.

Reemul · April 18, 2016, 8:05am

Fantastic, that worked. I copied the 4 columns x 400 rows in to Word ran the replace all ^t with space, it found 1838 replacements. Pasted it back in to Excel and even better it put it all in 1 row making things perfect.

Really really appreciated, could knock months off this.

barstein · April 18, 2016, 1:07pm

Glad you found a solution. Guess I’ll go ahead and share my own suggestion anyway.

Take four cells such as these:

A | B | C | D
Company Name | Insurance Company | Policy Number | Owner Name

And add a fifth column with the following:

=A2&" “&B2&” “&C2&” "&D2

or

=CONCATENATE(A2,B2,C2,D2)

Then paste down to the last row, and copy/paste from column #5 as needed.

Reemul · April 19, 2016, 1:27am

Thanks, we will also look at this one when we get the new Sheet with 2,000 entries on.

Appreciate all the help.

SergioBAM · April 19, 2016, 6:10am

Let me get this right. You already have a file on disk, it’s named after the row in the excel sheet. That row contains the real file name, but it first needs to be concatenated?

If this is the case, I would be checking out the FileSystemObject, specifically the CopyFile method https://msdn.microsoft.com/en-us/library/aa265015(v=vs.60).aspx.

Use that CONCATENATE method above to create your final filename. Then with some VBA, start at the top of the list then loop through until you get to the end of your list. Inside the loop, use:


Dim fileSys as FileSystemObject
Set fileSys = new FileSystemObject

fileSys.CopyFile "c:\" & currentRow & ".pdf", "c:\" & newFileName & ".pdf"

Add error handling like fileSys.FileExists(aFile).

If it’s already in Excel, leverage off VBA. If a problem can’t be solved in VBA within 15 minutes, then it’s a big problem!

Good luck.

Razgon · April 20, 2016, 12:35am

Okay, that has my interest. You are saying, that all these kind of issues can be solved by using VB in excel ? I need to take some courses on this then.

Say, for instance, I have this problem. I have a long list with a series of company names, adresses, but listed in one long line down, instead of in normal order. Like this :

I need to move them up so I can enter them into our CRM system, and use them with a mail merging in word as well.

Since there sometimes are an extra line for a company, since a town can be so small its added to the adress instead of having its own zip number, but all start with a phone number, I don’t think you can automate this. I move this kind of information manually, using CTRL-X all the time, but if there is ANY other way I’d really love to know.