Is there a place for non-@dataclass classes in Python any more?

I have previously, and somewhat famously, written favorably about
@dataclass’s venerable progenitor, attrs, and how you should use it for pretty
much everything.

At the time, attrs was an additional dependency, a piece of technology that
you could bolt on to your Python stack to make your particular code better.
While I advocated for it strongly, there were all the usual implicit reasons
against using a new thing. It was an additional dependency, it might not
interoperate with other convenience mechanisms for type declarations that you
were already using (e.g. NamedTuple), it might look weird to other Python
programmers familiar with existing tools, and so on. I don’t think that any of
these were good counterpoints, but there was nevertheless a robust discussion
to be had in addressing them all.

But for many years now, dataclasses have been, and currently are, built in
to the language. They are increasingly integrated into the toolchain at a deep
level that is difficult for application code, or even other specialized
tools, to replicate. Everybody knows what they are. Few or none of those
reasons apply any longer.

For example, classes defined with @dataclass are now optimized as a C
structure might be when you compile them with mypyc, a trick that is extremely
useful in some circumstances, which even attrs itself now has trouble keeping
up with.

This all raises the question for me: beyond backwards compatibility, is there
any point to having non-@dataclass classes any more? Is there any
remaining justification for writing them in new code?

Consider my original example, translated from attrs to dataclasses. First, the
non-dataclass version:

class Point3D:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

And now the dataclass one:

from dataclasses import dataclass

@dataclass
class Point3D:
    x: int
    y: int
    z: int

Many of my original points still stand. It’s still less repetitive. In fewer
characters, we’ve expressed considerably more information, and we get more
functionality (a repr and equality out of the box, with sorting and hashing
one decorator argument away). There doesn’t seem to be much
of a downside besides the strictness of the types, and if typing.Any were a
builtin, x: any would be fine for those who don’t want to unduly constrain
their code.
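
To make that list of conveniences concrete, here is a minimal sketch of what the decorator provides by default and what needs an argument. It uses only the standard library; OrderedPoint3D is a name made up for this illustration, to show the order=True and frozen=True arguments.

```python
from dataclasses import dataclass


@dataclass
class Point3D:
    x: int
    y: int
    z: int


# Hypothetical variant for illustration: order=True generates the comparison
# methods, and frozen=True (with the default eq=True) generates __hash__.
@dataclass(order=True, frozen=True)
class OrderedPoint3D:
    x: int
    y: int
    z: int


p = Point3D(1, 2, 3)
print(p)                      # a readable repr listing each field
print(p == Point3D(1, 2, 3))  # field-by-field equality

# A plain @dataclass with eq=True sets __hash__ to None, so instances
# are unhashable unless you opt in.
print(Point3D.__hash__ is None)

# Ordered instances compare as tuples of their fields, and frozen ones
# can be used as set members or dict keys.
ordered = sorted([OrderedPoint3D(2, 0, 0), OrderedPoint3D(1, 5, 5)])
unique = {OrderedPoint3D(1, 5, 5), OrderedPoint3D(1, 5, 5)}
print(ordered[0], len(unique))
```

The split between the two classes is only there to show the defaults next to the opt-ins; in real code you would pick one set of arguments.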

The one real downside of the latter over the former right now is the need for
an import. Which, at this point, just seems… confusing? Wouldn’t it be
nicer to be able to just write this:

class Point3D:
    x: int
    y: int
    z: int

and not need to faff around with decorator semantics and fudging the difference
between Mypy (or Pyright or Pyre) type-check-time and Mypyc or Cython compile
time? Or even better, to not need to explain the complexity of all these
weird little distinctions to new learners of Python, or to have to cover
import before class?
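
For contrast, this is roughly what the bare, undecorated spelling does in Python today. It is a small sketch of current behavior, not of any proposal: the annotations are recorded, but nothing is generated from them.

```python
class Point3D:
    x: int
    y: int
    z: int


# The annotations are stored on the class...
print(Point3D.__annotations__)

# ...but they create no class attributes and no __init__.
print(hasattr(Point3D, "x"))

try:
    Point3D(1, 2, 3)
except TypeError as error:
    # object.__init__ is still in charge, and it accepts no arguments.
    constructor_error = error
else:
    constructor_error = None
print(type(constructor_error).__name__)
```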

These tools all already treat the @dataclass decorator as a totally special
language construct, not really like a decorator at all, so to really explore it
you have to explain a special case and then a special case of a special case.
The extension hook for this special case of the special case notwithstanding.
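
Assuming the hook in question is typing.dataclass_transform (PEP 681, in the standard library since Python 3.11), a minimal sketch of it looks like the following. The my_model decorator is a hypothetical name for illustration, and the no-op fallback exists only so the snippet still runs on older interpreters.

```python
import dataclasses
import sys

if sys.version_info >= (3, 11):
    from typing import dataclass_transform
else:
    # Fallback for older interpreters: a do-nothing stand-in, since the real
    # decorator only matters to static type checkers anyway.
    def dataclass_transform(**kwargs):
        def decorator(obj):
            return obj
        return decorator


# Hypothetical project-specific decorator. The dataclass_transform marker
# tells type checkers to treat classes decorated with my_model the way they
# treat classes decorated with @dataclass.
@dataclass_transform()
def my_model(cls):
    return dataclasses.dataclass(cls)


@my_model
class Point3D:
    x: int
    y: int
    z: int


p = Point3D(1, 2, 3)
print(p)
```

At runtime the marker does nothing of consequence; all the behavior still comes from dataclasses.dataclass, which is rather the point about special cases.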

If we didn’t want any new syntax, we would need a from __future__ import
dataclassification or some such for a while, but this doesn’t seem like an
impossible bar to clear.


There are still some folks who don’t like type annotations at all,
and there’s still the possibility of awkward implicit changes in meaning when
transplanting code from a place with dataclassification enabled to one
without, so perhaps an entirely new unambiguous syntax could be provided. One
that more closely mirrors the meaning of parentheses in def, moving
inheritance (a feature which, whether you like it or not, is clearly far less
central to class definitions than ‘what fields do I have’) off to its own part
of the syntax:

data Point3D(x: int, y: int, z: int) from Vector:
    def method(self):
        ...

which, for the “I don’t like types” contingent, could reduce to this in the
minimal case:

data Point3D(x, y, z):
    pass

Just thinking pedagogically, I find it super compelling to imagine moving from
teaching def foo(x, y, z):... to data Foo(x, y, z):... as opposed to
@dataclass class Foo: x: int....

I don’t have any desire for semantic changes to accompany this, just to make it
possible for newcomers to ignore the circuitous historical route of the
@dataclass syntax and get straight into defining their own types with legible
reprs from the very beginning of their Python journey.

(And make it possible for me to skip a couple of lines of boilerplate in short
examples, as a bonus.)
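
Since no semantic change is intended, the proposed spelling can be read as sugar for something the dataclasses module already offers. Here is a rough sketch using dataclasses.make_dataclass, under the assumption that the parenthesized names become fields and the from clause becomes the base class. Vector, method, and UntypedPoint3D are placeholder names for this illustration.

```python
from dataclasses import fields, make_dataclass


class Vector:
    """Hypothetical base class standing in for the `from Vector` clause."""


def method(self):
    # Placeholder body for the method in the proposed syntax.
    return (self.x, self.y, self.z)


# Roughly: data Point3D(x: int, y: int, z: int) from Vector: ...
Point3D = make_dataclass(
    "Point3D",
    [("x", int), ("y", int), ("z", int)],
    bases=(Vector,),
    namespace={"method": method},
)

# Roughly: data Point3D(x, y, z): pass
# A bare field name needs no type; the documentation says it is treated
# as typing.Any.
UntypedPoint3D = make_dataclass("UntypedPoint3D", ["x", "y", "z"])

typed = Point3D(1, 2, 3)
untyped = UntypedPoint3D("a", 2.5, None)
print(typed, typed.method(), isinstance(typed, Vector))
print([f.name for f in fields(UntypedPoint3D)], untyped)
```

The string-and-list spelling here is exactly the kind of thing a dedicated syntax would hide from a newcomer.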


I’m curious to know what y’all think, though. Shoot me an email or a toot
and let me know.

In particular:

  1. Do you think there’s some reason I’m missing why Python’s current method for
    defining classes via a bunch of dunder methods is still better than
    dataclasses, or should stick around into the future for reasons beyond
    “compatibility”?
  2. Do you think “compatibility” is sufficient reason to keep the syntax the way
    it is forever, and I’m underestimating the cost of adding a keyword like
    this?
  3. If you do think that a change should be made, would you prefer:
    1. changing the meaning of class itself via a __future__ import,
    2. a new data keyword like the one I’ve proposed,
    3. a new keyword that functions exactly like the one I have proposed, but
      you really want to bikeshed the word data a bunch,
    4. something more incremental like just putting dataclass and field in
      builtins,
    5. or an option I haven’t even contemplated here?

If I find I’m not alone in this perhaps I will wander over to the Python
discussion boards to have a more substantive conversation…


Thank you to my patrons who are helping
me while I try to turn… whatever this is… along with open source maintenance
and application development, into a real job. Do you want to see me pursue
ideas like this one further? If so, you can support me on Patreon as well!
