Article: Python data analysis example
Author: Brook
00
Preface
This article introduces the meanings and naming conventions of single and double underscores ("dunder") in Python, how name mangling works, and how it affects your own Python classes.
Single and double underscores have their own meanings in Python variable and method names. Some of these meanings are merely conventions, intended as hints to the programmer, while others are enforced by the Python interpreter.
In this article, I will discuss the following five underscore patterns and naming conventions, and how they affect the behavior of Python programs:
- Single leading underscore: _var
- Single trailing underscore: var_
- Double leading underscore: __var
- Double leading and trailing underscore: __var__
- Single underscore: _
At the end of the article, you will find a short quick-reference table that summarizes the five underscore naming conventions and their meanings. Let's get started!
01
Single leading underscore: _var
When it comes to variable and method names, a single underscore prefix has a conventional meaning. It is a hint to programmers: it means that the Python community agrees on what it should mean, but the behavior of the program is not affected.
The underscore prefix tells other programmers that a variable or method starting with a single underscore is intended for internal use only. This convention is defined in PEP 8.
Python does not enforce this. It does not have the strong distinction between "private" and "public" variables that Java does. It's as if someone put up a tiny underscore warning sign that says:
"Hey, this isn't really meant to be part of the public interface of the class. Best to leave it alone."
Take the following example:
class Test:
    def __init__(self):
        self.foo = 11
        self._bar = 23
If you instantiate this class and try to access the foo and _bar attributes defined in its __init__ constructor, what happens? Let's take a look:
>>> t = Test()
>>> t.foo
11
>>> t._bar
23
You can see that the leading single underscore in _bar did not prevent us from "reaching into" the class and accessing the value of that variable.
This is because a single underscore prefix in Python is just a convention - at least when it comes to variable and method names.
However, leading underscores do affect the way names are imported from modules.
Suppose you have a module called my_module that contains the following code:
# This is my_module.py:

def external_func():
    return 23

def _internal_func():
    return 42
Now, if you import all names from the module using a wildcard import, Python does not import names with a leading underscore (unless the module defines an __all__ list that overrides this behavior):
>>> from my_module import *
>>> external_func()
23
>>> _internal_func()
NameError: "name '_internal_func' is not defined"
By the way, wildcard imports should be avoided because they make it unclear which names are present in the namespace. For the sake of clarity, it is better to stick to regular imports.
Unlike wildcard imports, regular imports are not affected by the naming convention of leading single underscores:
>>> import my_module
>>> my_module.external_func()
23
>>> my_module._internal_func()
42
I know this may be a little confusing. If you follow the PEP 8 recommendation to avoid wildcard imports, the only thing you really need to remember is this:
A single leading underscore is a Python naming convention indicating that a name is meant for internal use. It is generally not enforced by the Python interpreter and serves only as a hint to the programmer.
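The wildcard-import rule described in this section, including the __all__ override, can be tried out without creating any files. The following is only a minimal sketch and not part of the original example: the helper names make_module and star_import and the module names demo_plain and demo_with_all are made up for illustration, and the modules are built in memory with types.ModuleType instead of living in .py files.

```python
import sys
import types


def make_module(name, source):
    """Build a throwaway in-memory module (hypothetical helper)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module  # make it importable by name
    return module


def star_import(name):
    """Run 'from <name> import *' in a fresh namespace and list what arrived."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    return sorted(key for key in namespace if key != "__builtins__")


SOURCE = """
def external_func():
    return 23

def _internal_func():
    return 42
"""

# Same functions as my_module, once without and once with an __all__ list.
make_module("demo_plain", SOURCE)
make_module("demo_with_all", "__all__ = ['external_func', '_internal_func']\n" + SOURCE)

plain_names = star_import("demo_plain")
all_names = star_import("demo_with_all")

print(plain_names)  # only the public name is imported
print(all_names)    # __all__ explicitly exports the underscore name too
```

Without __all__, the wildcard import skips _internal_func. With __all__, exactly the listed names are imported, underscore or not.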
02
Single trailing underscore: var_
Sometimes the most fitting name for a variable is already taken by a keyword. Names like class or def therefore cannot be used as variable names in Python. In this case, you can append a single underscore to resolve the naming conflict:
>>> def make_object(name, class):
SyntaxError: "invalid syntax"

>>> def make_object(name, class_):
...     pass
In short, a single trailing underscore (suffix) is a convention for avoiding naming conflicts with Python keywords. This convention is explained in PEP 8.
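If you are unsure whether a name collides with a keyword, the standard library's keyword module can tell you. The sketch below is an illustration only; safe_name is a hypothetical helper, not something PEP 8 or the standard library defines.

```python
import keyword


def safe_name(name):
    """Append a trailing underscore if the name is a reserved keyword."""
    return name + "_" if keyword.iskeyword(name) else name


for candidate in ("class", "def", "color"):
    print(candidate, "->", safe_name(candidate))
```

Here class and def receive the trailing underscore, while color is left unchanged because it is not a keyword.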
03
Double leading underscore: __var
So far, the meaning of every naming pattern we have covered comes from agreed-upon conventions. Things are a little different for Python class attributes (variables and methods) whose names start with a double underscore.
A double underscore prefix causes the Python interpreter to rewrite the attribute name in order to avoid naming conflicts in subclasses.
This is also called name mangling - the interpreter changes the name of the variable so that it is less likely to collide when the class is later extended.
I know this sounds rather abstract, so I put together a small code example to illustrate:
class Test:
    def __init__(self):
        self.foo = 11
        self._bar = 23
        self.__baz = 23
Let's use the built-in dir() function to look at the attributes of this object:
>>> t = Test()
>>> dir(t)
['_Test__baz', '__class__', '__delattr__', '__dict__', '__dir__',
 '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
 '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__',
 '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
 '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
 '__weakref__', '_bar', 'foo']
The above is the list of this object's attributes. Let's go through it and look for our original variable names foo, _bar, and __baz. I'm sure you'll notice some interesting changes.
- The self.foo variable appears unmodified as foo in the attribute list.
- self._bar behaves the same way: it shows up on the class as _bar. As I said before, the leading underscore is just a convention in this case, a hint for the programmer.
- With self.__baz, however, things look a little different. When you search the list for __baz, you won't find a variable with that name.
What happened to __baz?
If you look closely, you will see that there is an attribute called _Test__baz on this object. This is the name mangling that the Python interpreter applies. It does this to protect the variable from being overridden in subclasses.
Let's create another class that extends the Test class and tries to override the existing attributes added in the constructor:
class ExtendedTest(Test):
    def __init__(self):
        super().__init__()
        self.foo = 'overridden'
        self._bar = 'overridden'
        self.__baz = 'overridden'
Now, what do you think the values of foo, _bar, and __baz will be on instances of this ExtendedTest class? Let's take a look:
>>> t2 = ExtendedTest()
>>> t2.foo
'overridden'
>>> t2._bar
'overridden'
>>> t2.__baz
AttributeError: "'ExtendedTest' object has no attribute '__baz'"
Wait, why did we get an AttributeError when we tried to inspect the value of t2.__baz? Name mangling has struck again! It turns out this object doesn't even have a __baz attribute:
>>> dir(t2)
['_ExtendedTest__baz', '_Test__baz', '__class__', '__delattr__',
 '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
 '__getattribute__', '__gt__', '__hash__', '__init__', '__le__',
 '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
 '__subclasshook__', '__weakref__', '_bar', 'foo']
As you can see, __baz was turned into _ExtendedTest__baz to prevent accidental modification:
>>> t2._ExtendedTest__baz
'overridden'
But the original _Test__baz is still around:
>>> t2._Test__baz
23
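To see why this protection is useful in practice, here is a small sketch under made-up names (Account and AuditedAccount are hypothetical and not part of the example above). The base class relies on its own double-underscore attribute, and a subclass that happens to reuse the same name does not break it.

```python
class Account:
    def __init__(self):
        self.__balance = 100  # stored as _Account__balance

    def balance(self):
        # Inside Account, __balance always refers to _Account__balance.
        return self.__balance


class AuditedAccount(Account):
    def __init__(self):
        super().__init__()
        self.__balance = "not a number"  # stored as _AuditedAccount__balance

    def audit_note(self):
        return self.__balance


acct = AuditedAccount()
print(acct.balance())      # the base class still sees its own value
print(acct.audit_note())   # the subclass sees its own, separate value
print(sorted(vars(acct)))  # both mangled names live side by side
```

Had both classes used a single underscore (_balance), the subclass assignment would have silently replaced the value that Account.balance() depends on.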
Double underscore name mangling is fully transparent to the programmer. The following example confirms this:
class ManglingTest:
    def __init__(self):
        self.__mangled = 'hello'

    def get_mangled(self):
        return self.__mangled

>>> ManglingTest().get_mangled()
'hello'
>>> ManglingTest().__mangled
AttributeError: "'ManglingTest' object has no attribute '__mangled'"
Does name mangling also apply to method names? Yes, it does. Name mangling affects all names that start with two underscore characters ("dunders") in the context of a class:
class MangledMethod:
    def __method(self):
        return 42

    def call_it(self):
        return self.__method()

>>> MangledMethod().__method()
AttributeError: "'MangledMethod' object has no attribute '__method'"
>>> MangledMethod().call_it()
42
Here is another, perhaps surprising, example of name mangling in action:
_MangledGlobal__mangled = 23

class MangledGlobal:
    def test(self):
        return __mangled

>>> MangledGlobal().test()
23
In this example, I declared a global variable called _MangledGlobal__mangled. I then accessed the variable inside the context of a class named MangledGlobal. Because of name mangling, I was able to reference the _MangledGlobal__mangled global variable as just __mangled inside the test() method of the class.
The Python interpreter automatically expanded the name __mangled to _MangledGlobal__mangled because it begins with two underscore characters. This demonstrates that name mangling is not tied to class attributes specifically. It applies to any name starting with two underscore characters used in a class context.
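A related detail worth trying out: mangling rewrites identifiers that appear in the class body's source code, so an attribute name passed as a plain string, for example to getattr(), is left as written. The sketch below uses a hypothetical Vault class to illustrate this; it is an illustration under that assumption, not an example from the text above.

```python
class Vault:
    def __init__(self):
        self.__secret = "s3cret"  # stored as _Vault__secret

    def by_identifier(self):
        # The identifier __secret is mangled to _Vault__secret.
        return self.__secret

    def by_string(self):
        # The string "__secret" is not mangled, so the lookup fails
        # and the default value is returned instead.
        return getattr(self, "__secret", "missing")

    def by_mangled_string(self):
        # Spelling out the mangled name by hand works.
        return getattr(self, "_Vault__secret")


vault = Vault()
print(vault.by_identifier())
print(vault.by_string())
print(vault.by_mangled_string())
```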
That's a lot to absorb.
To be honest, these examples and explanations didn't just pop out of my head. It took some research and editing to put them together. I've been using Python for many years, but rules and special cases like this aren't constantly on my mind.
Sometimes the most important skills for a programmer are "pattern recognition" and knowing where to look things up. If you feel a little overwhelmed at this point, please don't worry. Take your time and try some of the examples in this article.
Let these concepts sink in so that you recognize the general idea of name mangling and the other behaviors I have shown you. If you run into them one day, you will know what to look for in the documentation.
04
Double leading and trailing underscore: __var__
Perhaps surprisingly, name mangling is not applied if a name both starts and ends with double underscores. Variables surrounded by a double underscore prefix and suffix are left untouched by the Python interpreter:
class PrefixPostfixTest:
    def __init__(self):
        self.__bam__ = 42

>>> PrefixPostfixTest().__bam__
42
However, Python reserves names with both leading and trailing double underscores for special purposes. Examples include __init__ for object constructors, or __call__, which makes an object callable.
These dunder methods are often called magic methods - but many people in the Python community (including myself) don't like that term.
It is best to avoid names that start and end with double underscores ("dunders") in your own programs, to prevent collisions with future changes to the Python language.
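Implementing the dunder methods that Python itself documents is a different matter from inventing new dunder names, and it is how the special behavior mentioned above is wired up. As a minimal sketch (the Multiplier class is made up for illustration), __init__ sets up the object and __call__ lets the instance be called like a function:

```python
class Multiplier:
    def __init__(self, factor):
        # Runs when Multiplier(...) is instantiated.
        self.factor = factor

    def __call__(self, value):
        # Runs when the instance itself is called: instance(value).
        return value * self.factor


double = Multiplier(2)
result = double(21)

print(callable(double))
print(result)
```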
05
Single underscore: _
By convention, sometimes a single independent underscore is used as a name to indicate that a variable is temporary or irrelevant.
For example, in the following loop we don't need access to the running index, so we can use "_" to indicate that it is just a temporary value:
>>> for _ in range(32):
...     print('Hello, World.')
You can also use a single underscore as a "don't care" variable in unpacking expressions to ignore particular values. Again, this meaning is "by convention" only and does not trigger any special behavior in the Python interpreter. The single underscore is simply a valid variable name that is sometimes used for this purpose.
In the following code example, I unpack a car tuple into separate variables, but I am only interested in the color and mileage values. For the unpacking expression to succeed, however, I need to assign every value in the tuple to a variable. This is where "_" is useful as a placeholder variable:
>>> car = ('red', 'auto', 12, 3812.4)
>>> color, _, _, mileage = car

>>> color
'red'
>>> mileage
3812.4
>>> _
12
Besides its use as a temporary variable, "_" is a special variable in most Python REPLs that represents the result of the last expression evaluated by the interpreter.
This is handy if, for example, you want to access the result of a previous calculation in an interpreter session, or if you are building and interacting with objects on the fly without assigning them names first:
>>> 20 + 3
23
>>> _
23
>>> print(_)
23

>>> list()
[]
>>> _.append(1)
>>> _.append(2)
>>> _.append(3)
>>> _
[1, 2, 3]
06
Summary
The following is a brief summary of the five Python underscore patterns I covered in this article: