Python datatypes

Revision as of 01:22, 6 February 2022 by Will (talk | contribs) (→‎dict)

concepts

mutable vs immutable

MUTABLE: list, dict, set. The object holding your data can be changed in place.
IMMUTABLE: bool, str, int, float, tuple, frozenset (and long in py2). The object holding your data cannot be changed. (for new data, python creates a separate new object)

Problem
If two variables refer to the same mutable object, a change made through either variable is visible through the other.

listA = ['a','b','c']
listB = listA  # listA and listB now refer to the same list object

listA[2] = 'z'
listA, listB  # (['a', 'b', 'z'], ['a', 'b', 'z'])
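
A minimal sketch of the difference, using `is` to check whether two names refer to the same object. The variable names are illustrative only and not part of the notes above.

```python
# mutable: both names refer to one list object
list_a = ['a', 'b', 'c']
list_b = list_a
list_b.append('d')                # visible through list_a as well
same_list = list_a is list_b      # True, only one list exists

# immutable: "changing" a string rebinds the name to a new object
text_a = 'abc'
text_b = text_a
text_a += 'd'                     # builds a new str, text_b is untouched
same_text = text_a is text_b      # False
```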


Non-Nested Datatypes
The solution is to copy your original data into an entirely new object.

listA = ['a','b','c']
listB = listA[:]  # a full slice creates a new list (a shallow copy, same as listA.copy())

Nested Datatypes

import copy

listA = [ ['a','b'], ['c','d'] ]
listB = copy.deepcopy(listA)
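
Why nested data needs deepcopy: a slice only copies the outer list, so the inner lists are still shared. A small sketch, with made-up variable names:

```python
import copy

nested = [['a', 'b'], ['c', 'd']]
shallow = nested[:]              # new outer list, inner lists are shared
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0][0] = 'z'

shallow_first = shallow[0]       # ['z', 'b'], the change leaked through
deep_first = deep[0]             # ['a', 'b'], fully independent
```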

datatypes

text

text = 'text'  # avoid naming a variable str, it shadows the builtin

text[0]       # first character of the string
text[2:5]     # characters from the 3rd through the 5th (index 5 is excluded)
text[2:]      # characters from the 3rd onward
text * 2      # string repeated two times
text.upper()  # string as all uppercase
text.lower()  # string as all lowercase

string formatting

'{} {}-items found'.format(3, 'list')                                 # substitute each {} with an item
'{found} {type}-items found'.format(**{'found':3, 'type':'list'})     # dict-substitution
sys.stdout.write("{:<7}{:<51}{:<25}\n".format(code, name, industry))  # left-aligned fixed-width columns (use > to right-align; needs import sys)

See str.format
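
On Python 3.6+, f-strings accept the same format spec as str.format. A short sketch of the alignment options; the values below are made up for illustration.

```python
found, kind, code = 3, 'list', 'ABC'   # hypothetical values

count_line = f'{found} {kind}-items found'
row_left = f'{code:<7}|'     # left-aligned, padded to width 7
row_right = f'{code:>7}|'    # right-aligned, padded to width 7
row_center = f'{code:^7}|'   # centered, padded to width 7
```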

string templating

# less strict than formatting: safe_substitute() leaves missing keys in place instead of raising KeyError
import string

template = string.Template('my name is ${firstname} ${lastname}')
template.safe_substitute(firstname='will')
>>> 'my name is will ${lastname}'
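
For comparison, a sketch of the strict variant, substitute(), which raises KeyError when a placeholder has no value:

```python
import string

template = string.Template('my name is ${firstname} ${lastname}')

partial = template.safe_substitute(firstname='will')   # missing key left as-is

try:
    template.substitute(firstname='will')              # strict variant
    strict_error = None
except KeyError as exc:
    strict_error = exc.args[0]                         # name of the missing key
```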

split/join

'/a/b/c.txt'.split('/')
>>> ['', 'a', 'b', 'c.txt']

'/'.join(['', 'a', 'b', 'c.txt'])
>>> '/a/b/c.txt'

zfill

'15'.zfill(4)
>>> '0015'

strip common leading whitespace in text blocks (like ruby's <<~ squiggly heredoc)

import textwrap

def foo():
    return textwrap.dedent(
    """
    def foo():
        bar
    """)

numbers

literals

integer = 123     # int (arbitrary precision in py3)
# long  = 123L    # long (very big integers - py2 only, a SyntaxError in py3)
real    = 123.0   # float (a C double object)
octal   = 0o755   # octal  (493)
binary  = 0b010   # binary (2)

conversion between types

bin(10)    # '0b1010' (a str)
hex(10)    # '0xa'    (a str)
oct(10)    # '0o12'   (a str)
int(10.0)  # 10
float(10)  # 10.0

# parse a number written in another base (the first argument must be a str)
int('1010', 2)  # 10 -- '1010' read as base-2 (binary)
int('12', 8)    # 10 -- '12' read as base-8 (octal)
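
A sketch of the round trip between an int and its text form in another base, using format() when the 0b/0x prefix is not wanted:

```python
number = 10

as_binary = format(number, 'b')      # '1010' (no 0b prefix)
as_hex = format(number, 'x')         # 'a'
padded = format(number, '08b')       # '00001010', zero-padded to 8 digits

from_binary = int('1010', 2)         # 10
from_hex = int('ff', 16)             # 255
from_prefixed = int('0b1010', 0)     # 10, base 0 infers the base from the prefix
```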

collections

list

Mutable, resizable collection of items (of any type).

items = ['abc', 'efg']               # avoid naming a variable list, it shadows the builtin
tinylist = [1, 2]

items                               # the complete list
items[0]                            # first element of the list
items[1:3]                          # elements from the 2nd through the 3rd
items[2:]                           # elements from the 3rd element onward
items * 2                           # list repeated two times
items + tinylist                    # concatenated lists

items.extend([4,5])                 # adds each item to the end    ex: ['abc', 'efg', 4, 5]
items.append([4,5])                 # adds the list as one entry   ex: ['abc', 'efg', [4, 5]]
items.remove(4)                     # removes first occurrence of 4 (ValueError if absent)
items.index('abc')                  # returns index of listItem 'abc'
items.pop(0)                        # removes and returns listItem at index 0
",".join(['a', 'b'])                # join list items (must be strings) by a comma

['cat','dog','fish'].index('dog')   # returns index of entry 'dog' in list (1)
'a' in ['a','b','c']                # True if item exists in list, otherwise False
max([1, 36, 2])                     # returns highest number in list
min([1, 36, 2])                     # returns lowest number in list
len([1, 36, 2])                     # returns number of entries in list

[x for x in ('a','b','c')]              # list-comprehension. produces a list ['a','b','c']
[x for x in ('a','b','c') if x == 'a']  # list-comprehension with if
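
A few more comprehension forms, as a sketch. The word list is made up, and the dict and set variants go slightly beyond the list examples above.

```python
words = ['cat', 'dog', 'fish']   # hypothetical data

lengths = [len(w) for w in words]                        # transform each item
short_upper = [w.upper() for w in words if len(w) == 3]  # filter, then transform
by_length = {w: len(w) for w in words}                   # dict-comprehension
unique_lengths = {len(w) for w in words}                 # set-comprehension
```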

tuple

immutable collection of items (of any type).

tup = tuple()          # empty tuple
tup = ()               # empty tuple
tup = (1,)             # single-item tuple (trailing comma required)

tup = (1,2,3)          # tuple with items 1,2,3
tup[1]                 # refers to 2

tup[1:]                # items with index 1-onwards
tup[:-1]               # items before last index
tup[1:-1]              # items between 1st/last item
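
A short sketch of unpacking, and of what "immutable" means in practice (item assignment raises TypeError). Names are illustrative.

```python
point = (1, 2, 3)

x, y, z = point              # unpacking into three names
first, *rest = point         # starred unpacking, rest becomes a list

try:
    point[0] = 99            # tuples do not support item assignment
    assignment_error = None
except TypeError as exc:
    assignment_error = type(exc).__name__
```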

set

mutable, unordered collection of unique items.

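nums = set()                         # empty set ({} would be an empty dict)
nums = {1,2,3}

nums.issubset({1,2})                 # False -- is every item of nums also in {1,2} ?
nums.issuperset({1,2,3,4})           # False -- does nums contain every item of {1,2,3,4} ?
nums.union({4,5})                    # {1,2,3,4,5} -- items from both {4,5} and nums
nums.difference({3,4,5})             # {1,2} -- nums with items of {3,4,5} removed from it

nums.add(4)                          # add item to set (ignored if already exists)
nums.remove(4)                       # remove item from set (KeyError if missing, discard() ignores)
nums.pop()                           # remove and return an arbitrary item (takes no argument)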
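
The same operations are also available as operators. A minimal sketch with two made-up sets:

```python
a = {1, 2, 3}
b = {3, 4, 5}

union = a | b                    # items in either set
intersection = a & b             # items in both sets
difference = a - b               # items of a that are not in b
symmetric = a ^ b                # items in exactly one of the sets

subset = {1, 2} <= a             # same as {1, 2}.issubset(a)
superset = a >= {1, 2, 3, 4}     # same as a.issuperset({1, 2, 3, 4})

a.discard(99)                    # no error even though 99 is absent
```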

dict

mutable collection of key-value pairs. Can be nested.

dict = {}
dict = {'a':123, 'b':456}
dict['a']

del dict['a']                                # remove item 'a' from dict
dict.pop('a')                                # remove/return item 'a' from dict
dict.get('a',None)                           # retrieve value of 'a', default to None

from functools import reduce                 # py3: reduce lives in functools
reduce(lambda d, key: d.get(key) if d else None, keys, dictionary)  # walk a list of keys into a nested dict, None if a key is missing (fetch)
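
A runnable sketch of the nested fetch. The helper name `fetch` and the `config` data are hypothetical.

```python
from functools import reduce

def fetch(dictionary, keys):
    """Return the value at the nested key path, or None if any key is missing."""
    return reduce(lambda d, key: d.get(key) if d else None, keys, dictionary)

config = {'db': {'primary': {'port': 5432}}}   # hypothetical data

port = fetch(config, ['db', 'primary', 'port'])     # value found at the path
missing = fetch(config, ['db', 'replica', 'port'])  # 'replica' does not exist
```

Note that this assumes every intermediate value is a dict (or falsy). A truthy non-dict value partway down the path would raise AttributeError.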

collection variations

namedtuple

Defines a lightweight tuple subclass whose fields can be accessed by name, a self-documenting tuple. Even the repr is updated so that you can see the type name and the value of each field. Think of it as an immutable record with attribute access, lighter than a dictionary.

import collections

cartype = collections.namedtuple('car',['model','license','year'])
car = cartype('honda', 'BHXZ921', 2018)
car = cartype(model='honda', license='BHXZ921', year=2018)

print(car.year)                #> 2018
print(car.model)               #> honda
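
A sketch of the self-documenting repr and of the helper methods namedtuple provides. The type is named `Car` here for illustration.

```python
import collections

Car = collections.namedtuple('Car', ['model', 'license', 'year'])
car = Car(model='honda', license='BHXZ921', year=2018)

text = repr(car)                     # shows the type name and every field
model, license_plate, year = car     # unpacks like a normal tuple
newer = car._replace(year=2020)      # returns a new tuple, car is unchanged
as_dict = car._asdict()              # field name -> value mapping
```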

defaultdict

TODO:
todo
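
A minimal sketch of the usual defaultdict pattern, assuming the goal is grouping and counting. A missing key is created on first access by calling the factory (list, int, ...). The word list is made up.

```python
import collections

words = ['apple', 'avocado', 'banana']   # hypothetical data

by_letter = collections.defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)      # no KeyError on first access

counts = collections.defaultdict(int)
for word in words:
    counts[word[0]] += 1                 # missing keys start at int() == 0
```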

OrderedDict

TODO:
todo
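
A minimal sketch of what OrderedDict still offers now that plain dicts keep insertion order (py3.7+): reordering methods and order-sensitive equality.

```python
import collections

od = collections.OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3

od.move_to_end('a')                  # 'a' becomes the last key
keys_after_move = list(od)
last = od.popitem()                  # pops from the end by default
first = od.popitem(last=False)       # pops from the front

# unlike plain dicts, equality between OrderedDicts is order-sensitive
same = collections.OrderedDict(x=1, y=2) == collections.OrderedDict(y=2, x=1)
```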

Misc

dataclass

Essentially a struct or value-object. See https://docs.python.org/3/library/dataclasses.html

import dataclasses

@dataclasses.dataclass
class Employee:
    firstname: str
    lastname:  str
    employment_duration: int = 0

    def fullname(self) -> str:
        return '{} {}'.format(self.firstname, self.lastname)
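
A usage sketch of what the decorator generates (__init__, __repr__, __eq__). The class is repeated so the block runs on its own, and the employee data is made up.

```python
import dataclasses

@dataclasses.dataclass
class Employee:
    firstname: str
    lastname: str
    employment_duration: int = 0

    def fullname(self) -> str:
        return '{} {}'.format(self.firstname, self.lastname)

alex = Employee('alex', 'smith')                     # hypothetical employee
text = repr(alex)                                    # generated __repr__
same = alex == Employee('alex', 'smith')             # generated __eq__, field by field
promoted = dataclasses.replace(alex, employment_duration=5)   # copy with one field changed
as_dict = dataclasses.asdict(alex)
```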

You may customize the dataclass (equality, ...) and/or the fields (defaults, ...).

You can add custom validation using the __post_init__() method.

import dataclasses
from typing import Tuple

@dataclasses.dataclass
class Carpool:
    passengers: Tuple[str, ...]  # variable-length tuple of str

    def __post_init__(self):
        if len(self.passengers) > 4:
            raise ValueError("Max passengers is 4")
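
A sketch of the validation in action. __post_init__() runs right after the generated __init__(), so an invalid object is never returned to the caller. ValueError is used here as an assumption, since python has no builtin ArgumentError.

```python
import dataclasses
from typing import Tuple

@dataclasses.dataclass
class Carpool:
    passengers: Tuple[str, ...]

    def __post_init__(self):
        if len(self.passengers) > 4:
            raise ValueError("Max passengers is 4")

ok = Carpool(passengers=('a', 'b'))                  # passes validation

try:
    Carpool(passengers=('a', 'b', 'c', 'd', 'e'))    # five passengers
    error = None
except ValueError as exc:
    error = str(exc)
```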