Our Python obfuscator is written for Python 2 and needs an update so it can work with our new Python 3 code.
So unfortunately things are a bit more problematic than just porting the syntax of pyobfuscate:

a) The module "compiler" is gone
b) The module "parser" is not gone yet, but marked as deprecated
c) The grammar has changed quite a bit

Fortunately we seem to have a way forward:

We can replace "compiler" with "ast", which makes things a lot cleaner. Upstream is also pointing at "ast" as a replacement for "parser", which would clean that up as well. However "ast" doesn't have line numbers for all nodes until 3.9, so we can't use it yet. We'll have to stick with "parser" for now and look at a switch to "ast" once "parser" actually gets removed.

As for c), it will just be a matter of rolling up our sleeves and adapting the code to the new grammar.
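For reference, a minimal sketch of what the "ast" route could look like. This is not pyobfuscate code, and the helper name `name_positions` is made up for illustration. It only covers plain `Name` nodes, which do carry `lineno` and `col_offset`:

```python
import ast


def name_positions(source):
    """Return sorted (identifier, line, column) tuples for every Name node."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # lineno is 1-based, col_offset is a 0-based UTF-8 byte offset
            found.append((node.id, node.lineno, node.col_offset))
    # ast.walk() gives no ordering guarantee, so sort by position
    return sorted(found, key=lambda item: (item[1], item[2]))


print(name_positions("x = 1\ny = x + 1\n"))
```

Names that are stored as plain strings rather than as nodes (for example the name in a `def` statement) do not show up here, so a real port would need more than this.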
Upstream threw another wrench into things by breaking Symbol.is_local() for us:

https://bugs.python.org/issue41840

An ugly workaround seems possible, but hopefully they'll fix this quickly and we can go back to normal.
Conversion to Python 3 done and sent upstream: https://github.com/astrand/pyobfuscate/pull/24
Should be done now. It passes all the included tests, and I couldn't see any meaningful difference when running the new and old version on code that is both Python 2 and Python 3 compatible. Since we need both, the new version has been given a suffix (i.e. "pyobfuscate3"). We could consider removing that once all Python 2 code is gone.
Looks good. I've looked through the code and verified the functionality by obfuscating hiveconf and running the unit tests (the only modification I had to make was to specify the public function names in "__all__ = []"). I ran both the python2 and python3 unit tests on the output from `cbrun x86_64 pyobfuscate3 hiveconf.py`. I also compared the output from pyobfuscate with that from pyobfuscate3. No problems found.
We get a traceback if we try to obfuscate a file containing non-ASCII characters. In my case this was an "Å" located in a comment in the handler_unbindports.py code. The traceback we get is:

>Traceback (most recent call last):
>  File "/usr/bin/pyobfuscate3", line 1154, in <module>
>    main()
>  File "/usr/bin/pyobfuscate3", line 1117, in main
>    source = open(conf.file, 'r').read()
>  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>    return codecs.ascii_decode(input, self.errors)[0]
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 67: ordinal not in range(128)
>make[3]: *** [install-agent] Error 254
>make[3]: Leaving directory `/home/nikle/dev/ctc/buildarea/BUILD/thinlinc-vsm'
>make[2]: *** [install-obfusc] Error 2
>make[2]: Leaving directory `/home/nikle/dev/ctc/buildarea/BUILD/thinlinc-vsm'
>error: Bad exit status from /var/tmp/rpm-tmp.0k2OhO (%install)

Our Python 3 obfuscator should respect the encoding declaration if one is given. In the absence of an encoding declaration we should instead use the Python 3 default, which is UTF-8. (https://docs.python.org/3/reference/lexical_analysis.html)
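One way to get that behaviour, sketched here with the standard library's `tokenize.detect_encoding()`. The helper name `read_source` is hypothetical, and this is not necessarily how the fix ends up being done in pyobfuscate3:

```python
import io
import tokenize


def read_source(path):
    """Return (text, encoding) for a Python source file.

    The encoding comes from a BOM or a coding declaration on the first
    two lines; without either, detect_encoding() falls back to UTF-8.
    """
    with open(path, "rb") as f:
        raw = f.read()
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding), encoding
```

Reading the file as bytes first means the result no longer depends on the locale of the build environment, which is what made `open(conf.file, 'r')` pick the 'ascii' codec in the traceback above.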
(In reply to Niko Lehto from comment #10)

I forgot to include the line causing this traceback:

> Calling /usr/bin/pyobfuscate3 modules/thinlinc/vsm/handler_unbindports.py
>/home/nikle/dev/ctc/buildarea/BUILDROOT/thinlinc-vsm-4.12.1post-6770.i386/opt/thinlinc/modules/thinlinc/vsm/handler_unbindports.py
We get the following traceback if we try to use the pyobfuscate script 'run_tests' in Python 3.9:

>Traceback (most recent call last):
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 1179, in <module>
>    main()
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 1154, in main
>    cw = CSTWalker(source, pae.pubapi)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 140, in __init__
>    self.walk(elements, [self.symtab])
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  [Previous line repeated 1 more time]
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 197, in walk
>    self.handle_classdef(elements, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 638, in handle_classdef
>    self.walk(node, symtabs + [classtab])
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  [Previous line repeated 2 more times]
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 203, in walk
>    self.handle_decorator(elements, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 709, in handle_decorator
>    assert name[0] == token.NAME
>AssertionError
When obfuscating an iso-8859-15 (Latin9) encoded file and piping this to python 3.6, we get the following error:

> SyntaxError: encoding problem: ISO-8859-15

This piping of output is what our 'run_tests' script does at the moment. The problem does not occur when either using Python 3.9 or saving the pyobfuscate output into a file first.

Note that this is a different error from the one you get if the encoding is unknown, which gives a traceback followed by:

> SyntaxError: unknown encoding: ISO-bad-15
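Since saving the output to a file first avoids the error, a test script could do that instead of piping. A rough sketch of the idea; `run_from_file` is a made-up helper and not part of 'run_tests':

```python
import os
import subprocess
import sys
import tempfile


def run_from_file(source_bytes, python=sys.executable):
    """Write source bytes to a temporary .py file and run it with `python`."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(source_bytes)
        # The interpreter reads the file itself and applies the
        # coding declaration, instead of receiving the code on stdin
        return subprocess.run([python, path],
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
    finally:
        os.unlink(path)
```

The returned object carries `returncode`, `stdout` and `stderr`, so a test can check the result much as it would with piped output.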
(In reply to Niko Lehto from comment #13)
> >  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 709, in handle_decorator
> >    assert name[0] == token.NAME
> >AssertionError

Python has changed the grammar in 3.9 so we need some tweaks.
Both issues are now fixed and a new package has been deployed.
A file with its encoding set to `latin-1' and containing non-ASCII letters (outside of comments) runs fine in Python 3.9.2 but crashes `pyobfuscate' with the following stack trace:

> Traceback (most recent call last):
>   File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 1168, in <module>
>     main()
>   File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 1143, in main
>     cw = CSTWalker(source, pae.pubapi)
>   File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 143, in __init__
>     cst = parser.suite(source.decode(encoding))
>   File "<string>", line 2
>     ä = 3
>     ^
> SyntaxError: invalid character '¤' (U+00A4)

Input file:

> # -*- coding: latin-1; -*-
> ä = 3
> print(ä)
(In reply to William Sjöblom from comment #18)
> >   File "<string>", line 2
> >     ä = 3
> >     ^
> > SyntaxError: invalid character '¤' (U+00A4)

AFAICT this is a bug in Python's parser module. It requires the input to be "str", not "bytes" like for compile(). However it needs to feed the lower layers a byte stream so it seems to always convert things to UTF-8. However if the file is tagged as something other than UTF-8, then the lower layers will get upset and complain.

Everything works just fine if the file is actually UTF-8, so in most cases this is not an issue.

No point in reporting this upstream as they have dropped the entire parser module for 3.10 (which is another issue, but we'll deal with that when we get there).
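To illustrate the difference, compile() can be given the raw bytes and then applies the coding declaration itself. A small sketch using the input file from comment #18 (the variable names are just for illustration):

```python
import ast

# The Latin-1 encoded input file from comment #18, as raw bytes
raw = "# -*- coding: latin-1; -*-\n\u00e4 = 3\nprint(\u00e4)\n".encode("latin-1")

# compile() decodes the bytes according to the coding declaration
code = compile(raw, "<example>", "exec")
namespace = {}
exec(code, namespace)

# ast.parse() goes through compile(), so it accepts bytes as well
tree = ast.parse(raw)
```

This only shows that the bytes path honours the declaration. It does not help while we depend on "parser", which has no such entry point.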
I could reproduce the issue, and handling of UTF-8 characters outside the ASCII range works as expected.

Tested by running `python3.9 pyobfuscate' on `cpython/Lib/test/*.py' (https://github.com/python/cpython/, 3.9 branch). These runs were successful with regard to encoding handling, apart from one character in `Lib/test/test_unicode_identifiers.py' with a trailing `VARIATION SELECTOR-17', which resulted in the following stack trace:

Traceback (most recent call last):
  File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 1154, in <module>
    main()
  File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 1141, in main
    ce = ColumnExtractor(source, cw.names)
  File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 809, in __init__
    self.parse(f)
  File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 840, in parse
    raise RuntimeError("Overlooked symbol '%s' on line %d column %d" % (t_string, srow, scol))
RuntimeError: Overlooked symbol 'x' on line 11 column 12

This is deemed out of scope as of now. Marking as closed.