Py - Clean illegal character in filename

Hi,
I’m building a production tool that copies files between folders in our pipeline, on Windows 10. My problem is that some files were renamed by people using copy/paste, which seems to have added illegal characters to the file names. Those characters are invisible in explorer.exe:

The illegal characters are between ‘0’ and ‘3’ (you should be able to copy them):
pixPath = u"E:/col_udim_test.100​​​​​3.jpg"

os.rename and shutil.copy give errors when trying to handle the files:
os.rename(path + f, path + unicode(f))
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect

shutil.copy(path + f, path + 'test/' + unicode(f).replace('?', ''))

    with open(src, 'rb') as fsrc:
IOError: [Errno 22] invalid mode ('rb') or filename: 'E:/col_udim_test.100?????3.jpg'

Trying to get the hex code of the illegal character n:
hex(ord(n)) gives me 0x200b (U+200B, the zero-width space)

Does anyone have an idea how to clean those file names? I was thinking about converting the unicode to hex, removing the unwanted hex values and converting back to unicode.

Hi @BenWall, looks like you can use the string decode/encode method:

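filename = u'foo'

# Encoding to ASCII with 'ignore' drops every non-ASCII character,
# including U+200B (it also drops accented letters, so use with care)
filename = filename.encode('ascii', 'ignore')

print filename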


Additionally, Windows has a set of illegal filename characters that you could remove:

invalid = r'<>:"/\|?*'

for char in invalid:
	filename = filename.replace(char, '')
	
print filename
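
Both ideas above can be combined into one helper. This is only a minimal sketch, written so it also runs on Python 3: the name clean_filename is made up for illustration, and it assumes the invisible characters all belong to the Unicode "Cf" (format) category, which is the case for U+200B. It works on a bare file name, not a full path, since it would also strip the slashes and the drive colon.

```python
import unicodedata

# Characters Windows does not allow in a file name (not a full path).
WINDOWS_RESERVED = u'<>:"/\\|?*'


def clean_filename(name):
    """Hypothetical helper: drop invisible format characters and
    Windows-reserved characters from a single file name."""
    kept = []
    for ch in name:
        # Category "Cf" covers invisible format characters such as U+200B.
        if unicodedata.category(ch) == "Cf":
            continue
        if ch in WINDOWS_RESERVED:
            continue
        kept.append(ch)
    return u"".join(kept)


print(clean_filename(u"col_udim_test.100\u200b\u200b3.jpg"))  # col_udim_test.1003.jpg
```

Unlike an ASCII-only filter, this keeps accented letters and other visible non-ASCII characters.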

Thanks for your quick answer Chalk.
That will allow me to clean the strings inside Python, but I still can’t handle the files with os or shutil. It seems they can’t work with file names like these.
Do you know of an obscure Python module which would do the trick?

Hey @BenWall,

Might have to brute force this one - I’m more of a bat file man myself, but PowerShell is all the rage these days - this may get you through. Please test with example data first. In PowerShell, cd to your folder location, then:

get-childitem *.* | foreach {rename-item $_ ($_.name.replace([string][char]0x200B, ""))}

*.* could be *.jpg

I think you can use regex patterns too, with the -replace operator (the .replace() method only matches literal text):

get-childitem *.* | foreach {rename-item $_ ($_.name -replace '[<>:"/\\|?*]', '')}
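
For a Python take on the same rename, here is a rough sketch. The '?????' in the IOError above suggests the names reached os as byte strings, so this is a guess at the cause rather than a confirmed diagnosis: in Python 2, os.listdir returns unicode names only when it is given a unicode path (for example u"E:/"), and those names can then be passed to os.rename as-is. The function names strip_invisible and rename_clean are made up for illustration, and the code is written so it also runs on Python 3. Please try it on a copy of the data first.

```python
import os
import unicodedata


def strip_invisible(name):
    """Drop invisible format characters (Unicode category Cf, e.g. U+200B)."""
    return u"".join(ch for ch in name if unicodedata.category(ch) != "Cf")


def rename_clean(folder):
    """Hypothetical helper: rename every entry in `folder` whose name
    contains invisible characters. Returns a list of (old, new) names.

    `folder` should be unicode text (u"E:/" in Python 2) so that
    os.listdir returns unicode names instead of lossy byte strings.
    """
    renamed = []
    for name in os.listdir(folder):
        cleaned = strip_invisible(name)
        if cleaned == name:
            continue  # nothing to fix
        src = os.path.join(folder, name)
        dst = os.path.join(folder, cleaned)
        if os.path.exists(dst):
            continue  # never overwrite an existing file
        os.rename(src, dst)
        renamed.append((name, cleaned))
    return renamed
```

Usage would be something like rename_clean(u"E:/"), then checking the returned list to see what was changed.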