Python "StringLike" object advice

Hi all,

I wanted to share with you something I’ve been researching, and ask if any of you have advice on how I should proceed with it (or even if I should).

I want to wrap a string in a class, which may freely modify the string, but which will seamlessly evaluate as that string.

Print, format, concatenation with “a” + obj, everything. As far as the interpreter is concerned, an instance of this StringLike object should be a string, but one which happens to have any user-defined functionality as well.

I’ve been working on this for quite a while now. Originally I did try subclassing str's metaclass (which is just type) and inheriting directly from str. That worked very well, apart from one small problem: the object was still actually a string, and therefore totally immutable. I could override all the magic methods to return modified strings, but print(myStringLike), or otherwise directly evaluating the object, would always give the original string from its initialisation.

I’ve since started again with this: https://gist.github.com/edart76/f805fb9200a3a529c5178f2413557712

It’s a different, cleaner approach, which allows more control through methods, total freedom to modify the string, a clear interface for doing so, and more conventional class/metaclass structure - however, as noted in the gist, the new object is not a string, which causes problems with concatenation and probably many more things I haven’t found yet.

Long story short, I need to make this random object very, very closely resemble a string, ideally close enough that isinstance(myStringLike, str) evaluates to True. I don’t know if I should pursue multiple inheritance in the metaclass, how any of this interacts with the special interpreter rules regarding strings themselves, or even if this is possible without ctypes. I don’t know anything really.

If anyone here can share a perspective on this, I would be very grateful to hear it; I’m literally going bald over it.
Thank you

Probably a massive bastardization of magic methods, but if you really want to turn something that is inherently immutable into something mutable, you’ll need to break a few eggs…

class SorryString(str):

    def __init__(self, value):
        # str.__new__ has already stored `value`; object.__init__ takes no extra arguments
        super(SorryString, self).__init__()
        self._value = value

    def set_value(self, value):
        self._value = value

    def __str__(self):
        return self._value

This will handle the str representation, but you’d potentially need to override all the other magic methods too, as you’re currently doing.

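BTW, for isinstance checks you can use ABCMeta.register to register your class as a virtual subclass of an abstract base class, so isinstance and issubclass against that ABC return True. Note that this only works against an ABC: str itself is a concrete built-in type and has no register method.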

https://docs.python.org/3/library/abc.html#abc.ABCMeta.register
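A minimal sketch of what register() does and doesn't buy you, assuming Python 3 and a hypothetical StringType ABC of your own (StringType and this StringLike stand-in are illustrative names, not from the gist). Registration makes isinstance/issubclass succeed against the ABC, but checks against the built-in str are unaffected:

```python
from abc import ABC


class StringType(ABC):
    """Hypothetical ABC: code would check isinstance(x, StringType), not str."""


class StringLike:
    """Hypothetical stand-in for a non-str wrapper class."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value


# register() makes both classes *virtual* subclasses of StringType
StringType.register(str)
StringType.register(StringLike)

s = StringLike("herp")
print(isinstance(s, StringType))           # True
print(issubclass(StringLike, StringType))  # True
print(isinstance(s, str))                  # False: str is not an ABC, so nothing changes there
```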

-c


So one option would be to do something like PyMel does with their ProxyUnicode class.
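Roughly, the idea is a wrapper that holds a plain string and forwards to it. The sketch below is a generic delegation pattern under that assumption, not PyMel's actual ProxyUnicode code; StringProxy and set_value are hypothetical names:

```python
class StringProxy:
    """Hypothetical mutable wrapper that delegates to an internal str."""

    def __init__(self, value=""):
        self._value = str(value)

    def set_value(self, value):
        self._value = str(value)

    def __str__(self):
        return self._value

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self._value)

    def __format__(self, spec):
        return format(self._value, spec)

    def __len__(self):
        return len(self._value)

    def __add__(self, other):  # proxy + "x"
        if isinstance(other, (str, StringProxy)):
            return self._value + str(other)
        return NotImplemented

    def __radd__(self, other):  # "x" + proxy: str.__add__ fails, so Python tries this
        if isinstance(other, str):
            return other + self._value
        return NotImplemented

    def __eq__(self, other):
        if isinstance(other, StringProxy):
            other = other._value
        return self._value == other

    def __getattr__(self, name):
        # Only called when normal lookup fails: forward str methods (upper, split, ...)
        if name == "_value":
            raise AttributeError(name)
        return getattr(self._value, name)


p = StringProxy("herp")
p.set_value("derp")
print("a" + p)           # aderp
print("{}".format(p))    # derp
print(p.upper())         # DERP
```

Implicit special-method lookups (len(), +, format()) go to the type and bypass __getattr__, which is why those are defined explicitly while ordinary str methods are forwarded. Defining __eq__ also leaves instances unhashable, which seems like the safer default for a value that can change.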


Thanks for the pointer to the register method. However, your solution is what I originally tried in terms of inheriting from str directly, although you actually need to override __new__, not __init__ (see http://blog.redturtle.it/2014/09/10/python-strings).

It doesn’t actually matter whether you override __str__ or any other magic method: while this does work for concatenation, formatting, etc., the core value represented by the object itself (i.e. in print x) can never change.
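For reference, a minimal sketch of the __new__ approach (UpperString is a hypothetical example): because str is immutable, the characters have to be decided in __new__, and after that the instance is a genuine str everywhere:

```python
class UpperString(str):
    """Hypothetical example: normalise the value once, at construction time."""

    def __new__(cls, value):
        # By the time __init__ runs, the str data is already fixed,
        # so any transformation has to happen here.
        return super().__new__(cls, str(value).upper())


u = UpperString("herp")
print(u)                              # HERP
print("a" + u)                        # aHERP
print(isinstance(u, str))             # True
print(issubclass(UpperString, str))   # True
```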


print should print whatever is returned by __str__, and if that doesn’t exist, then __repr__:

In [1]: class StupidString(str):
   ...:     def __str__(self):
   ...:         return 'derp'
   ...:

In [2]: ss = StupidString('herp')

In [3]: print(ss)
derp
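A quick Python 3 check (a variant of the SorryString idea above that stores the display value in __new__; set_value is the same hypothetical setter) suggests both replies are partly right: print() goes through __str__, but concatenation, comparison and inherited str methods all read the immutable underlying data:

```python
class SorryString(str):
    """str subclass whose __str__ can be redirected; the str data itself never changes."""

    def __new__(cls, value):
        obj = super().__new__(cls, value)
        obj._value = value  # the "display" value that set_value() can change
        return obj

    def set_value(self, value):
        self._value = value

    def __str__(self):
        return self._value


s = SorryString("herp")
s.set_value("derp")
print(s)             # derp   (print() calls str(), which uses __str__)
print("a" + s)       # aherp  (concatenation reads the underlying str data)
print(s == "herp")   # True   (so does comparison)
print(s.upper())     # HERP   (and every inherited str method)
```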

Another option is to use bytearray, which is in fact mutable, but you are restricted to bytes.
While it’s not an instance of a string, it can be converted to one fairly easily.
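A minimal sketch, assuming the text is UTF-8 (the encoding is an assumption): the buffer is edited in place, and decode() produces a real str when one is needed. Note that str(buf) gives the repr-style "bytearray(b'...')" form, not the text:

```python
buf = bytearray(b"herp")
buf[0:1] = b"d"              # modify in place; slice assignment accepts bytes
buf += b"y"                  # append in place
text = buf.decode("utf-8")   # convert to a real str when a string is needed
print(text)                  # derpy
print(isinstance(buf, str))  # False
```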

I got it working for now: https://gist.github.com/edart76/0eb79b1dadf8daf3b9a8490f7abf9c8b

I wasn’t able to reproduce the immutability problem with this new system, and the internal value string can be modified without issue - it’s likely there was some other problem in my old code. I also couldn’t find a way to use the ABCMeta register method without creating an inheritance cycle, so now it passes isinstance, but not issubclass. This is inconsequential for what I need, and I’ve spent too much time on it to throw out a working solution, but it’s still annoying.
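One mechanism that produces exactly this passes-isinstance-but-not-issubclass behaviour (not necessarily what the gist does) is overriding __class__: isinstance() falls back to obj.__class__ when type(obj) doesn't match, while issubclass() only looks at the real class hierarchy. A sketch with a hypothetical StringLike; treat it as a hack, since type(obj) still tells the truth:

```python
class StringLike:
    """Hypothetical wrapper that reports str as its __class__."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value

    @property
    def __class__(self):
        # isinstance() consults obj.__class__ when type(obj) is not a subclass
        return str


s = StringLike("herp")
print(isinstance(s, str))           # True
print(issubclass(StringLike, str))  # False
print(type(s) is StringLike)        # True
```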

Thanks for your help on this.

Generally you’ll want to be careful about mutable strings being used as strings, because calling code then has to worry if that “string” is potentially changing unexpectedly – it’ll make it harder for users to really use the code “as a string”.
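To make the risk concrete, a minimal sketch with a hypothetical MutableName key whose hash follows its current value: once it is mutated after being used as a dict key, the entry can no longer be found by either the old or the new value:

```python
class MutableName:
    """Hypothetical mutable string-like key, hashed by its current value."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __eq__(self, other):
        return self.value == str(other)

    def __hash__(self):
        return hash(self.value)


key = MutableName("herp")
lookup = {key: "some data"}
key.value = "derp"  # mutate after insertion: the stored hash is now stale

print(len(lookup))                    # 1: the entry is still there...
print(MutableName("herp") in lookup)  # False: the hash matches, but equality fails
print(MutableName("derp") in lookup)  # False: equality would pass, but the hash doesn't
```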

OTOH, emulating string behavior with an immutable class is not that hard, and can make some kinds of problems easier to handle. For example, here’s one I did a while back that does case-insensitive and always-right-slashed file name comparison. It’s pretty similar to what you were doing but with a narrower range (it derives from unicode rather than str, but in this case it does pass both the isinstance and issubclass checks).

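import os

try:
    unicode
except NameError:  # Python 3: str is already unicode
    unicode = str


class UPathError(ValueError):
    pass


class UPath(unicode):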
    '''
    Immutable path object with enforced right slash pathing
    On windows, equality tests done case-insensitively
    '''

    def __new__(cls, *args, **kwargs):
        """
        override __new__ so that we can pass multiple arguments in derived classes
        """
        return super(UPath, cls).__new__(cls, UPath.as_path(args[0]))

    def __init__(self, dta):
        if not dta:
            raise UPathError("No path provided")
        self._low = self.lower()
        if os.name == 'nt':
            self._hsh = hash(self._low)
        else:
            # use the base hash: UPath.__hash__ returns self._hsh, which isn't set yet
            self._hsh = unicode.__hash__(self)

    @classmethod
    def as_path(cls, data):
        if isinstance(data, UPath):
            return unicode(data)  # already normalized

        result = os.path.normpath(data)
        result = os.path.expandvars(result)
        result = result.replace('\\', '/')
        # preserve double right slash for perforce and UNC paths
        is_depot_path = result.startswith("//")
        while "//" in result:
            result = result.replace('//', '/')
        if is_depot_path: result = "/" + result
        return result

    def is_abs(self):
        if self[0] == '/': return True
        if ":" in self[:2]: return True
        return False

    def down(self):
        pieces = self.split('/')
        pieces.reverse()
        result = pieces.pop()
        while pieces:
            if result:  # skip the empty root piece of "/absolute/paths"
                yield UPath(result)
            result += "/"
            result += pieces.pop()
        yield self

    def up(self):
        pieces = self.split('/')
        while pieces:
            joined = "/".join(pieces)
            if joined:  # UPath("") would raise UPathError
                yield UPath(joined)
            pieces.pop()

    def __hash__(self):
        return self._hsh

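    def __eq__(self, other):
        if not other:
            return False
        if hasattr(other, '_hsh'):
            return self._hsh == other._hsh
        if os.name == 'nt':
            return self._low == self.as_path(other).lower()
        # elsewhere compare case-sensitively, to stay consistent with __hash__
        return unicode.__eq__(self, self.as_path(other))

    def __ne__(self, other):
        # Python 2 does not derive != from __eq__
        return not self.__eq__(other)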

    def __add__(self, other):
        if other.startswith("."):
            return UPath(unicode(self) + other)
        return UPath(unicode(self) + "/" + other)

    def __radd__(self, other):
        return UPath(other + "/" + unicode(self))

    def __div__(self, other):
        return self.__add__(other)

    def __rdiv__(self, other):
        return self.__radd__(other)

    # Python 3 uses __truediv__ for the / operator
    __truediv__ = __div__
    __rtruediv__ = __rdiv__

    def __contains__(self, item):
        if os.name != 'nt':
            # call the base method; self.__contains__ here would recurse forever
            return unicode.__contains__(self, item)
        cmp = item._low if hasattr(item, '_low') else unicode(item).lower()
        return self._low.__contains__(cmp)

    def contains(self, path):
        other = UPath(path)
        return other._low.startswith(self._low)

    def replace(self, old, new):
        repl = unicode(self).replace(old, new)
        return UPath(repl)