Data does not match after encoding and decoding with Python 3

im3djoe · January 29, 2023, 11:42am

Hi, I need some help with encoding and decoding data. It worked fine with Python 2,
but I found it’s very buggy in Python 3, and I’m not sure why.

The decoded data differs by only a few characters, but one wrong character can stop the script from running.

Here is my current code.

import os
import random, base64
from hashlib import sha1
from base64 import encode,decode

def crypt2021(data, key):
    x = 0
    box = list(range(256))
    for i in range(256):
        x = (x + box[i] + ord(key[i % len(key)])) % 256
        box[i], box[x] = box[x], box[i]
    x = y = 0
    out = []
    for char in data:
        x = (x + 1) % 256
        y = (y + box[x]) % 256
        box[x], box[y] = box[y], box[x]
        out.append(chr(ord(char) ^ box[(box[x] + box[y]) % 256]))

    return ''.join(out)

def tencode2021(data, key, encode=base64.b64encode, salt_length=16):
    salt = ''
    for n in range(salt_length):
        salt += chr(random.randrange(256))
    data = salt + crypt2021(data, sha1((key + salt).encode('utf-8')).hexdigest())
    return data

def tdecode2021(data, key, decode=base64.b64decode, salt_length=16):
    salt = data[:salt_length]
    return crypt2021(data[salt_length:],sha1((key + salt).encode('utf-8')).hexdigest())

#read file
tempEncodeMel = 'D:/Tool2023/scripts/testData'
fdAAA = open(tempEncodeMel,encoding="utf-8")
readDataLines = fdAAA.read()
fdAAA.close()

password = str('1111')

#encode
encoded_data = tencode2021(data=readDataLines, key=password)

#write file
tempEncodeMel = 'D:/Tool2023/scripts/encodeMel'
fwBBB = open(tempEncodeMel,'w',encoding="utf-8")
fwBBB.write(encoded_data)
fwBBB.close()

#read encode file

decodeData = 'D:/Tool2023/scripts/encodeMel'
fdCCC = open(decodeData,encoding="utf-8")
loadDataLines = fdCCC.read()
fdCCC.close()

#decode
password = str('1111')
decoded_data = tdecode2021(data=loadDataLines, key=password)

#write decode data to file
writeOutData = 'D:/Tool2023/scripts/checkDecode.mel'
fwDDD = open(writeOutData,'w',encoding="utf-8")
fwDDD.write(decoded_data)
fwDDD.close()

#compare data

if decoded_data == readDataLines:
    print('success')
else:
    print('fail')

########################################################
Comparing the data: this is the original data.

/==========================
=============================/

This is the data after decoding.
/===============:==========
=============================/

Some of the lines are slightly different.
The funny thing is that if I encode and decode the data without writing it to a file and reading it back, everything works.
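A quick check of one possible explanation (an assumption, not something confirmed in the thread): the cipher XORs each character with a keystream byte, so if a ciphertext '\r' (13) is read back from the file as '\n' (10), the decrypted character changes by 13 ^ 10 = 7, which is exactly the difference between '=' and ':' in the decoded line above. The keystream byte in the sketch is derived for illustration only.

```python
# Assume the cipher happened to turn a '=' into '\r' at some position.
plain = '='
cipher = '\r'
keystream_byte = ord(plain) ^ ord(cipher)          # the byte that maps '=' to '\r'

# Decrypting the untouched ciphertext gives the original character back...
round_trip = chr(ord(cipher) ^ keystream_byte)
# ...but if the file layer returned '\n' instead of '\r', the result is off by 7.
after_file = chr(ord('\n') ^ keystream_byte)

print(repr(round_trip), repr(after_file))          # '=' ':'
print(ord('\r') ^ ord('\n'), ord('=') ^ ord(':'))  # 7 7
```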

Has anyone experienced something similar?
I cannot upload files here, so I saved them in my Dropbox.

VVVSLAVA · January 29, 2023, 7:27pm

Greetings!

Friendly advice: please post your code with formatting.
To format single words and phrases as code, put them between single backticks.
To format a block of text as program code, put it between triple backticks.
After the opening three backticks, write the name of the language (for example, python) to get syntax highlighting.

Porting Python 2 Code to Python 3

codecs — Codec registry and base classes

For your code to be compatible with both Python 2.7.11 and Python 3.7.7/3.9.7, you need at least a basic understanding of Unicode.
For example, see the Unicode HOWTO.

Then it will become obvious to you why some character combinations in your code are “interpreted in a strange way”.
Believe me, colleague, this is not snobbery on my part. I sincerely believe that you should not write commercial code until you have absolute clarity on this issue…

To ensure compatibility, you can use, for example, the unicodedata module (along with checking which version of Python you are using).

PS:
Calling open() and close() on a file manually is risky: if an exception is raised between them, close() is never called.
Always try to use constructs that guarantee the file is closed, such as the with statement.
It is also a good idea to add exception handling.

For example:

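import traceback

my_file = 'D:/Tool2023/scripts/testData'  # example path from the question
try:
    # 'with' closes the file automatically, even if read() raises
    with open(my_file, 'rb') as f:
        file_read = f.read()
except Exception:
    # Print the full error instead of failing silently
    traceback.print_exc()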

I think it is more versatile to read and write byte streams,
and to encode / decode / interpret this data separately.
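A minimal sketch of why byte streams help here. It assumes the encoded string contains a lone '\r' (plausible, since the cipher can produce any character from chr(0) to chr(255)) and uses a temporary file instead of the paths from the question. In text mode with the default newline handling, Python turns '\r' and '\r\n' into '\n' when reading; in binary mode the bytes come back unchanged.

```python
import os
import tempfile

# A string with a lone '\r' and a '\r\n' pair, like the cipher output can contain
text = 'ab\rcd\r\nef'

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Text mode, default newline handling: line endings are translated
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, encoding='utf-8') as f:
        text_mode = f.read()

    # Binary mode: encode to bytes ourselves, bytes are stored untouched
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))
    with open(path, 'rb') as f:
        binary_mode = f.read().decode('utf-8')
finally:
    os.remove(path)

print(text_mode == text)     # False
print(binary_mode == text)   # True
```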
NB: If you try to protect the software you develop in this way, then I have bad news for you…
Good luck and prosperity!

im3djoe · January 29, 2023, 8:44pm

Hi VVVSLAVA, thanks for the reply. I see you are always helping people here, and I want to say you are a nice person.

I am not writing any commercial code; if I were, it would be more like a plug-in (by the way, I only code for Maya). This is more of a learning/practice/fun task.

I work with Python 2 and am trying to understand what makes this fail with Python 3.
It almost works, except for this strange data mismatch.

As I mentioned, the data matches within the code and only fails after being written out to a file, so I think the “open/close” way of writing a file may not be accurate in some way.
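If the file layer is the culprit, the string should survive a round trip once newline translation is switched off. A sketch of an alternative to binary mode (not one suggested in this thread): passing newline='' to open() keeps text mode and UTF-8 but writes and reads line endings untranslated. The test string, containing every character from chr(0) to chr(255), is a hypothetical stand-in for encoded_data, and a temporary file replaces the question's paths.

```python
import os
import tempfile

# Stand-in for encoded_data: every character the salt/cipher can produce,
# including '\r' and '\n'
encoded_data = ''.join(chr(i) for i in range(256))

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # newline='' disables newline translation on write and on read
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(encoded_data)
    with open(path, encoding='utf-8', newline='') as f:
        loaded = f.read()
finally:
    os.remove(path)

print(loaded == encoded_data)   # True
```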

Testing format…

import os
import maya.mel as mel
import random, base64
import maya.cmds as mc
from hashlib import sha1
from base64 import encode,decode

def crypt2021(data, key):
    x = 0
    box = list(range(256))
    for i in range(256):
        x = (x + box[i] + ord(key[i % len(key)])) % 256
        box[i], box[x] = box[x], box[i]
    x = y = 0
    out = []
    for char in data:
        x = (x + 1) % 256
        y = (y + box[x]) % 256
        box[x], box[y] = box[y], box[x]
        out.append(chr(ord(char) ^ box[(box[x] + box[y]) % 256]))

    return ''.join(out)

def tencode2021(data, key, encode=base64.b64encode, salt_length=16):
    salt = ''
    for n in range(salt_length):
        salt += chr(random.randrange(256))
    data = salt + crypt2021(data, sha1((key + salt).encode('utf-8')).hexdigest())
    return data


def tdecode2021(data, key, decode=base64.b64decode, salt_length=16):
    salt = data[:salt_length]
    return crypt2021(data[salt_length:],sha1((key + salt).encode('utf-8')).hexdigest())


#read file
tempEncodeMel = 'D:/Tool2023/scripts/testData'
fdAAA = open(tempEncodeMel,encoding="utf-8")
readDataLines = fdAAA.read()
fdAAA.close() 

password = str('1111')

#encode
encoded_data = tencode2021(data=readDataLines, key=password)

#write file
tempEncodeMel = 'D:/Tool2023/scripts/encodeMel'
fwBBB = open(tempEncodeMel,'w',encoding="utf-8")
fwBBB.write(encoded_data)
fwBBB.close()



#read encode file

decodeData = 'D:/Tool2023/scripts/encodeMel'
fdCCC = open(decodeData,encoding="utf-8")
loadDataLines = fdCCC.read()
fdCCC.close()

#decode    
password = str('1111')
decoded_data = tdecode2021(data=loadDataLines, key=password)

#write decode data to file
writeOutData = 'D:/Tool2023/scripts/checkDecode.mel'
fwDDD = open(writeOutData,'w',encoding="utf-8")
fwDDD.write(decoded_data)
fwDDD.close()

       
#compare data
     
if decoded_data == readDataLines:
    print('success')
else:
    print('fail')

VVVSLAVA · January 30, 2023, 9:03pm

As I emphasized earlier, until you understand Strings and Character Data in Python, you shouldn’t try using more complex concepts in your code!
Especially since you view coding as a learning / practice / fun task…
Without going into too much detail: when you write bytes to a file or read those bytes from a file, you are dealing with data (this holds, with some exceptions).
When you write/read this data as strings, you do not get the data itself but an interpretation of it, determined by a specific software implementation. These implementations are also platform dependent and differ fundamentally in how they interpret the contents of strings and individual characters; for example, in text mode Python translates line endings, and the default line ending is '\r\n' on Windows but '\n' on Linux and macOS. Adding to the complexity, in Python 3 all strings are Unicode strings (type == class ‘str’), while in Python 2 byte strings, commonly used for ASCII text, are of type ‘str’ and Unicode strings are a separate type (‘unicode’).
The same applies to individual characters: Python 2 has chr (for byte values 0 to 255) and unichr (for Unicode code points), while Python 3 has only chr.
Your code uses a construct like chr(random.randrange(256)); most likely this was used in Python 2 to generate characters limited to the extended ASCII range. In Python 3 the same call returns Unicode characters U+0000 to U+00FF, and those from U+0080 upward take two bytes when encoded as UTF-8.
At the same time, some of these characters (or combinations of them, such as '\r\n') are control characters and line endings that are treated specially when a string is written to or read from a file in text mode! And so on and so forth…
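To see how often this matters, the sketch below reuses crypt2021 from the question and counts how many ciphertexts contain a carriage return. The sample text (which only resembles the banner shown earlier) and the salts built from str(n) are hypothetical stand-ins for the real file and the 16 random salt characters. Any ciphertext containing '\r' will not survive a default text-mode read unchanged.

```python
from hashlib import sha1


def crypt2021(data, key):
    # RC4-style keystream XOR, as posted in the question
    x = 0
    box = list(range(256))
    for i in range(256):
        x = (x + box[i] + ord(key[i % len(key)])) % 256
        box[i], box[x] = box[x], box[i]
    x = y = 0
    out = []
    for char in data:
        x = (x + 1) % 256
        y = (y + box[x]) % 256
        box[x], box[y] = box[y], box[x]
        out.append(chr(ord(char) ^ box[(box[x] + box[y]) % 256]))
    return ''.join(out)


# Hypothetical sample text in the style of a MEL comment banner
plain = '/' + '=' * 26 + '\n' + '=' * 29 + '/\n'

# Try 1,000 hypothetical salts with the question's password and count the
# ciphertexts that contain '\r' (text-mode reading turns it into '\n')
hits = 0
for n in range(1000):
    key = sha1(('1111' + str(n)).encode('utf-8')).hexdigest()
    if '\r' in crypt2021(plain, key):
        hits += 1
print(hits, 'of 1000 ciphertexts contain a carriage return')
```

Because XOR with the same keystream undoes itself, crypt2021 applied twice with the same key returns the original text; the damage comes only from the file round trip.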

Although it’s not pedagogical (but I’m not a real welder), I corrected and commented part of your code so that it works the way you want; in general, though, the code shows a lack of understanding of the basic concepts and rules of the language.
I sincerely wish you outstanding achievements on your path of self-improvement!
And please forgive my bad temper and my ugly English…

import os
import maya.mel as mel
import random, base64
import maya.cmds as mc
from hashlib import sha1
from base64 import encode,decode

def crypt2021(data, key):
    x = 0
    box = list(range(256))
    for i in range(256):
        x = (x + box[i] + ord(key[i % len(key)])) % 256
        box[i], box[x] = box[x], box[i]
    x = y = 0
    out = []
    for char in data:
        x = (x + 1) % 256
        y = (y + box[x]) % 256
        box[x], box[y] = box[y], box[x]
        out.append(chr(ord(char) ^ box[(box[x] + box[y]) % 256]))
    return ''.join(out)

def tencode2021(data, key, encode=base64.b64encode, salt_length=16):
    salt = ''
    for n in range(salt_length):
        salt += chr(random.randrange(256))
    data = salt + crypt2021(data, sha1((key + salt).encode('utf-8')).hexdigest())
    return data

def tdecode2021(data, key, decode=base64.b64decode, salt_length=16):
    salt = data[:salt_length]
    return crypt2021(data[salt_length:],sha1((key + salt).encode('utf-8')).hexdigest())


#read file
tempEncodeMel = 'D:/Tool2023/scripts/testData'
fdAAA = open(tempEncodeMel,encoding="utf-8")
readDataLines = fdAAA.read()
fdAAA.close()

password = str('1111')

#encode
encoded_data = tencode2021(data=readDataLines, key=password)


f_path = 'D:/Tool2023/scripts/encodeMel'

'''
# OLD VARIANT (Write/read strings):

# write file
fwBBB = open(f_path,'w',encoding="utf-8")
fwBBB.write(encoded_data)
fwBBB.close()
# read file
fdCCC = open(f_path,encoding="utf-8")
loadDataLines = fdCCC.read()
fdCCC.close()
# *  check
loadDataLines == encoded_data
# Result: False
'''


# *  NEW VARIANT (Write/read bytes):

# *  write file as bytes
with open(f_path, 'wb') as fwBBB:
    fwBBB.write(encoded_data.encode('utf-8'))
# *  read file as bytes
with open(f_path, 'rb') as fdCCC:
    loadDataLines = fdCCC.read().decode('utf-8')
# *  check
loadDataLines == encoded_data
# Result: True


# decode
password = '1111'
decoded_data = tdecode2021(data=loadDataLines, key=password)

# write decode data to file
# *  Even though the contents of the files visually match,
# *  you should not write/read files this way!
writeOutData = 'D:/Tool2023/scripts/checkDecode.mel'
fwDDD = open(writeOutData,'w',encoding="utf-8")
fwDDD.write(decoded_data)
fwDDD.close()

# compare data
if decoded_data == readDataLines:
    print('success')
else:
    print('fail')

# Result: 'success'
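Going one step further with the "bytes first, interpret separately" idea, here is a minimal sketch (not from the thread; crypt_bytes, make_key, tencode_b64 and tdecode_b64 are hypothetical names). It runs the same RC4-style keystream on bytes and then base64-encodes the salt and ciphertext, which the original functions already hint at through their unused encode/decode parameters. The stored text is plain ASCII without line breaks, so ordinary text-mode file I/O cannot alter it. Files written by the original tencode2021 are not readable by this version, because the salt is hashed differently, and as noted earlier in the thread this is obfuscation rather than real protection.

```python
import base64
import os
from hashlib import sha1


def crypt_bytes(data, key):
    # Same RC4-style keystream as crypt2021, but on bytes instead of str
    x = 0
    box = list(range(256))
    for i in range(256):
        x = (x + box[i] + key[i % len(key)]) % 256
        box[i], box[x] = box[x], box[i]
    x = y = 0
    out = bytearray()
    for b in data:
        x = (x + 1) % 256
        y = (y + box[x]) % 256
        box[x], box[y] = box[y], box[x]
        out.append(b ^ box[(box[x] + box[y]) % 256])
    return bytes(out)


def make_key(password, salt):
    # Hex digest of SHA-1(password + salt), as ASCII bytes
    return sha1(password.encode('utf-8') + salt).hexdigest().encode('ascii')


def tencode_b64(text, password, salt_length=16):
    # Random bytes salt + encrypted UTF-8 text, returned as base64 (ASCII) str
    salt = os.urandom(salt_length)
    cipher = crypt_bytes(text.encode('utf-8'), make_key(password, salt))
    return base64.b64encode(salt + cipher).decode('ascii')


def tdecode_b64(token, password, salt_length=16):
    # Reverse of tencode_b64
    raw = base64.b64decode(token)
    salt, cipher = raw[:salt_length], raw[salt_length:]
    return crypt_bytes(cipher, make_key(password, salt)).decode('utf-8')


sample = '/=====\n=====/\n'   # hypothetical sample text
token = tencode_b64(sample, '1111')
print('\r' in token or '\n' in token)        # False: base64 output has no line breaks
print(tdecode_b64(token, '1111') == sample)  # True
```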

im3djoe · January 31, 2023, 12:31am

Thanks VVVSLAVA. After your explanation and a bit more googling, I am starting to understand a bit more.
In Python 2, str could serve as both bytes and text (with unicode as a separate type); in Python 3, however, str is always Unicode text, and raw data is a separate bytes type. My daily tasks don’t involve anything Unicode-related, so I am very unfamiliar with this area. Thank you sooo much for your time. The way I learn to code is by making a lot of mistakes and googling until I cannot push any further; only then do I start bothering other people. There is nothing wrong with your temper or your English. I know a few other people like that: extremely smart and straight to the point, because their time is very valuable.

cheers
Joe