Data Structure#

  • A quick overview of important data structures in Python

General Functions#

## Check object types
type(23)
type('some texts')
c = [1, 2, 'some text']
type(c)
list
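When code has to branch on a type rather than just display it, `isinstance()` is the usual companion to `type()`. The following is a minimal sketch; the variable names are illustrative and do not come from the original notes.

```python
# Illustrative values; the names are made up for this sketch.
c = [1, 2, 'some text']

# type() returns the exact class of an object
exact_type = type(c)

# isinstance() accepts a single class or a tuple of classes
is_list = isinstance(c, list)
is_number = isinstance(23, (int, float))

# isinstance() respects inheritance: bool is a subclass of int
bool_is_int = isinstance(True, int)

print(exact_type, is_list, is_number, bool_is_int)
```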

Basic Structures#

## Factory functions
int(4.0)
str(4)
list()
tuple('international')
dict(one=1, two=2)
{'one': 1, 'two': 2}
## Operations

## Modulus
15 % 4
## Exponentiation
4 ** 3
-4 ** 2 # minus applies to the result
(-4)**2

## Random numbers
import random
random.randint(1,10)
7
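Because `random.randint()` returns a different value on each run, the `7` above is only one possible result. If reproducible numbers are needed, seeding the generator is one option. This is a small sketch; the seed value 42 is an arbitrary choice.

```python
import random

random.seed(42)   # 42 is an arbitrary seed chosen for this sketch
first = [random.randint(1, 10) for _ in range(5)]

random.seed(42)   # re-seeding restarts the same sequence
second = [random.randint(1, 10) for _ in range(5)]

print(first == second)   # both runs produce the same numbers

# random.choice() picks one element from a non-empty sequence
pick = random.choice(['a', 'b', 'c'])
print(pick)
```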

str Object#

## Sequences

## Membership
'a' in 'track'
9 in [1, 2, 3, 4]
9 not in [1, 2, 3, 4]

## Concatenation
'this' + 'is'
' '.join(['this','is'])

## Subsetting Sequences
mylist = ['a', 'b','c','a word','e']
mylist[1]
mylist[0]
mylist[-1]
mylist[:3]
mylist[3][2:6]

## Strings
mystr = '   This is a sentence sample.  '
mystr.capitalize()
mystr.title()
mystr.upper()
mystr.lower()
mystr.rstrip()
mystr.lstrip()
mystr.strip()
mystr.find('is')
mystr.replace('is','was')

## String Comparison
## ordering by code point: space < digits < uppercase < lowercase
' ' > '0'
'0' > 'a'
'z' > 'm'
'm' > 'M'
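String comparison in Python goes by Unicode code point, which `ord()` makes visible. The sketch below uses a made-up word list to show how this affects sorting and how a `key` function can make the order case-insensitive.

```python
# ord() gives the Unicode code point that string comparison relies on
samples = [' ', '0', 'M', 'a', 'm', 'z']
code_points = {ch: ord(ch) for ch in samples}
print(code_points)

# Made-up words for illustration
words = ['banana', 'Cherry', 'apple', '42']

# Default sorting puts digits first, then uppercase, then lowercase
default_order = sorted(words)

# Lowercasing the sort key makes the order case-insensitive
case_insensitive = sorted(words, key=str.lower)

print(default_order)
print(case_insensitive)
```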

## Special Characters
print('\O') # not a recognized escape sequence, so the backslash is printed as-is
print('\t')
print('\n')
print('\'')
print('\"')
print('\\')

### Triple Quotes

multiline = """This is the first sentence
This is a second.
And A third. """

print(multiline)

### Format Strings
##`formatstring % (arguments to format)`
ntoys = 4
myname = 'Alvin'
length = 1234.5678
'%s has %d types' % (myname, ntoys)
'The toy is %.3f meters long' % (length)
\O
	


'
"
\
This is the first sentence
This is a second.
And A third. 
'The toy is 1234.568 meters long'

List#

  • A list is an ordered sequence of objects of arbitrary length, typically all of the same type.

  • It is a mutable data structure.

  • You can always append elements to a list, and it grows automatically.

  • A list can nevertheless hold elements of different object types.

## Lists
list_empty = []
list_strings = ['Amy', 'Emma','Jane']
list_mixed = ['Amy','Emma', 5, 6]
list_embedded = [1, 3, 4, [99, 100, 101]]
len(list_empty)
len(list_mixed)
len(list_embedded)

## List operations
list_empty.append('Tom')
list_empty.append('John')
print(list_empty)
print(list_empty[1])
del list_empty[1]
print(list_empty)
list_mixed.index('Amy')
# list_mixed.index('Alvin')

## Other functions
## max(), min(), sum(), sorted(), reversed()
['Tom', 'John']
John
['Tom']
0
  • Python lists are zero-indexed (i.e., the index of the first element of the list is 0).

  • Negative indices mean counting elements from the back.

  • Syntax for a slice of a list:

    • x[-2:]: the last two elements of x

    • x[2:]: all elements starting from the third element

    • x[start:end:step]: every step-th element from start up to, but not including, end

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
odd = x[::2]
even = x[1::2]
print(odd)
print(even)
[1, 3, 5, 7, 9]
[2, 4, 6, 8, 10]
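The slice forms listed above can be tried on the same list. This is a short sketch; the variable names on the left are illustrative.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

last_two = x[-2:]      # negative start counts from the back
from_third = x[2:]     # everything from index 2 onwards
middle = x[2:8:3]      # indices 2 and 5; index 8 is excluded
reversed_x = x[::-1]   # a negative step walks the list backwards
copy_x = x[:]          # a full slice makes a shallow copy

print(last_two)
print(from_third)
print(middle)
print(reversed_x)
print(copy_x is x)     # the copy is a new list object
```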

Tuples#

  • A tuple is typically a fixed-length collection of objects, which may be of different types.

  • It is immutable: a tuple is a read-only data structure and cannot be modified after it is created.

## Tuples

tuple_numbers = (1,2,3,4,5,6,7)
tuple_strings = ('mon','tue','wed','thu','fri','sat','sun')
tuple_mixed = (1, 'mon', ['feb', 2])
print(tuple_mixed)
len(tuple_mixed)
(1, 'mon', ['feb', 2])
3
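Immutability applies to the tuple itself, not necessarily to the objects it holds. The sketch below reuses `tuple_mixed` to show that assigning to an index fails, while the list stored inside the tuple can still be changed.

```python
tuple_mixed = (1, 'mon', ['feb', 2])

# Assigning to a position of a tuple raises TypeError
try:
    tuple_mixed[0] = 99
except TypeError as err:
    error_message = str(err)
    print('Cannot assign:', error_message)

# The list inside the tuple is still a mutable object
tuple_mixed[2].append('mar')
print(tuple_mixed)
```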
## unpacking with tuples
def powers(n):
    return n, n**2, n**3
x = powers(2)
print(x)

a,b,c = powers(2)
print(a, b, c)
(2, 4, 8)
2 4 8

Dictionary#

  • Square brackets [] for list and curly brackets {} for dict.

  • A dict is for key-value mapping, and the key must be hashable.

Note

From Official Python Documentation

An object is hashable if it has a hash value which never changes during its lifetime, and can be compared to other objects. Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().
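A quick way to see hashability in practice is to try different objects as dictionary keys. This is a minimal sketch with made-up keys. It also shows that a tuple is only hashable when every element in it is hashable.

```python
lookup = {}

# A tuple of immutable values works as a key
lookup[('a', 1)] = 'tuple key'

# A list cannot be used as a key
try:
    lookup[['a', 1]] = 'list key'
except TypeError as err:
    list_error = str(err)
    print('Rejected:', list_error)

# A tuple that contains a list is not hashable either
try:
    hash((1, [2, 3]))
except TypeError:
    nested_hashable = False
else:
    nested_hashable = True

print(lookup, nested_hashable)
```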

## Dictionary

dict_days = {'M': 'Monday', 'T':'Tuesday', 'W':'Wednesday'}
dict_days['M']
#dict_days['S']
dict_days['S']='Sunday'
dict_days['S']
'A' in dict_days
dict_days.keys()
dict_days.values()
dict_days.get('A','NA')
'NA'
wordfreq = {
    "the":100,
    "name": 10,
    "banana": 50
}

print(wordfreq["the"])

w = "hay"

if w in wordfreq:
    print(w)
    print("Its freq is ", wordfreq[w])
else:
    print("It is not observed in the corpus")
    
## set default values   
print(wordfreq.get(w, 0))

## Use keys or values

list(wordfreq.keys())
list(wordfreq.values())

## items()
list(wordfreq.items())

## combine two dictionaries

wordfreq1 = {
    "hay":20
}

# on duplicate keys, the later dictionary wins
newwordfreq = {**wordfreq, **wordfreq1}
print(newwordfreq)
100
It is not observed in the corpus
0
{'the': 100, 'name': 10, 'banana': 50, 'hay': 20}
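The `get()` default shown above is also a common way to build a frequency dictionary from scratch. The sketch below counts a made-up token list (the tokens are illustrative, not corpus data) and then sorts the result by frequency.

```python
# Made-up tokens for illustration
tokens = ['the', 'name', 'the', 'banana', 'the', 'name']

wordfreq = {}
for w in tokens:
    # get() returns 0 the first time a word is seen
    wordfreq[w] = wordfreq.get(w, 0) + 1

print(wordfreq)

# Sort the (word, count) pairs by count, highest first
by_freq = sorted(wordfreq.items(), key=lambda item: item[1], reverse=True)
print(by_freq)
```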

String Formatting#

## Format strings
print('Hello, {}! This is your No.{} book!'.format('Alvin',5))
print('An {0} a day keeps the {1} away.'.format('apple','doctor'))
print('An {1} a day keeps the {0} away.'.format('apple','doctor'))
print('The {noun} is {adj}!'.format(noun='book',adj='difficult'))

## Format strings with Dictionary
table = {'John': 98, 'Mary': 30, 'Jessica': 78, 'George': 89, 'Jack': 45}
print('Jack: {0[Jack]:d}'.format(table))
print('Jack: {Jack:d}Jessica: {Jessica:d}'.format(**table))
Hello, Alvin! This is your No.5 book!
An apple a day keeps the doctor away.
An doctor a day keeps the apple away.
The book is difficult!
Jack: 45
Jack: 45Jessica: 78
# wrapping strings
import textwrap
sentence= '''
美國大選首場總統辯論今晚登場,辯論會上總統川普頻頻插話,並與對手拜登互相人身攻擊,兩人鬥嘴不斷。美國有線電視新聞網(CNN)主持人直呼,這是史上最混亂總統辯論。
總統辯論一向被視為美國大選最重要環節之一,不少選民專心聆聽候選人政見,並為他們颱風及口條打分數。不過,今晚在俄亥俄州克里夫蘭市(Cleveland)登場的首場總統辯論,恐怕讓許多民眾直搖頭。
90分鐘辯論開始沒多久,總統川普與民主黨總統候選人拜登(Joe Biden)就吵個不停。川普頻頻插話並對拜登展開人身攻擊,不只酸拜登造勢活動只有兩三隻小貓,並指他一點都不聰明;拜登則多次面露不耐要川普「閉嘴」,並稱他是個「小丑」(clown)。'''

print(textwrap.fill(sentence, 20))
 美國大選首場總統辯論今晚登場,辯論會上
總統川普頻頻插話,並與對手拜登互相人身攻
擊,兩人鬥嘴不斷。美國有線電視新聞網(C
NN)主持人直呼,這是史上最混亂總統辯論
。 總統辯論一向被視為美國大選最重要環節
之一,不少選民專心聆聽候選人政見,並為他
們颱風及口條打分數。不過,今晚在俄亥俄州
克里夫蘭市(Cleveland)登場的首
場總統辯論,恐怕讓許多民眾直搖頭。 90
分鐘辯論開始沒多久,總統川普與民主黨總統
候選人拜登(Joe Biden)就吵個不
停。川普頻頻插話並對拜登展開人身攻擊,不
只酸拜登造勢活動只有兩三隻小貓,並指他一
點都不聰明;拜登則多次面露不耐要川普「閉
嘴」,並稱他是個「小丑」(clown)。
## Old string formatting
import math
print('The value of pi is %1.2f' % math.pi) # minimum field width of 1, two digits after the decimal point
The value of pi is 3.14
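For comparison, the same results can be written with f-strings (Python 3.6 and later), which put the expressions directly inside the string. This is a sketch that mirrors the examples above; the variable names are illustrative.

```python
import math

name, n = 'Alvin', 5
table = {'Jack': 45, 'Jessica': 78}

# Expressions go inside the braces; format specs follow a colon
s1 = f'Hello, {name}! This is your No.{n} book!'
s2 = f'The value of pi is {math.pi:.2f}'
s3 = f"Jack: {table['Jack']:d}"

print(s1)
print(s2)
print(s3)
```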

List Comprehension#

  • A classic Pythonic way to create a list on the fly.

mul3 = [n for n in range(1,101) if n%3 == 0]
mul3
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33,
 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66,
 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99]
table = [[m*n for n in range(1,11)] for m in range(1,11)]
for row in table:
    print(row)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
[4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
[5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
[6, 12, 18, 24, 30, 36, 42, 48, 54, 60]
[7, 14, 21, 28, 35, 42, 49, 56, 63, 70]
[8, 16, 24, 32, 40, 48, 56, 64, 72, 80]
[9, 18, 27, 36, 45, 54, 63, 72, 81, 90]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
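A list comprehension is shorthand for a loop that appends to a list. The sketch below writes out the equivalent loop for `mul3` and shows the same syntax producing a dictionary. The names `mul3_loop` and `squares` are illustrative.

```python
# The comprehension from above
mul3 = [n for n in range(1, 101) if n % 3 == 0]

# The equivalent explicit loop
mul3_loop = []
for n in range(1, 101):
    if n % 3 == 0:
        mul3_loop.append(n)

print(mul3 == mul3_loop)

# Curly brackets with key: value give a dictionary comprehension
squares = {n: n**2 for n in range(1, 6)}
print(squares)
```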

Enumerate and Zip#

  • enumerate(): A handy function for loops. It yields the loop index and the element at the same time, as a tuple of the counter (starting from zero by default) and the element of the list.

  • zip(): It takes elements from different lists and puts them side by side.

x = ["alpha", "beta", "gamma", "delta"]
for n,string in enumerate(x):
    print("{}: {}".format(n, string))
0: alpha
1: beta
2: gamma
3: delta
x = ["blue", "red", "green", "yellow"]
y = ["cheese", "apple", "pea", "mustard"]
for a, b in zip(x, y):
    print("{} {}".format(a, b))
blue cheese
red apple
green pea
yellow mustard
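Two details are worth trying out: `enumerate()` accepts a `start` argument, and `zip()` stops at the shortest input. The sketch below deliberately shortens `y` by one item to show the second point.

```python
x = ["blue", "red", "green", "yellow"]
y = ["cheese", "apple", "pea"]   # one item shorter on purpose

# start=1 makes the counter begin at 1 instead of 0
numbered = list(enumerate(x, start=1))
print(numbered)

# zip() stops when the shortest input runs out, so "yellow" is dropped
pairs = list(zip(x, y))
print(pairs)

# Zipped pairs can be turned directly into a dictionary
lookup = dict(zip(x, y))
print(lookup)
```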

Map, Filter, and Reduce#

  • map(): transforms each element of a list with a function.

  • filter(): keeps only the elements that meet a given criterion.

  • reduce(): scans the elements of a list and combines them into a single value using a function.

from functools import reduce

def maximum(a,b):
    if a > b:
        return a
    else:
        return b
 
x = [-3, 10, 2, 5, -6, 12, 0, 1]
max_x = reduce(maximum, x)
print(max_x)



## use reduce to
## sum all positive numbers from a list
def concat_num(a,b):
    # keep only the positive operands, then add them
    def pos(i):
        return i > 0

    out = filter(pos, [a,b])
    return sum(out)

reduce(concat_num, x)
12
30
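The cells above only exercise `reduce()`, so here is a short sketch of `map()` and `filter()` on the same list. In Python 3 both return lazy iterators, which is why the results are wrapped in `list()`. The lambda functions are illustrative choices.

```python
from functools import reduce

x = [-3, 10, 2, 5, -6, 12, 0, 1]

# map() applies a function to every element
squared = list(map(lambda n: n**2, x))

# filter() keeps the elements for which the function returns True
positives = list(filter(lambda n: n > 0, x))

# reduce() folds the filtered list into one value; 0 is the initial value
total_pos = reduce(lambda a, b: a + b, positives, 0)

print(squared)
print(positives)
print(total_pos)
```

Filtering first and reducing afterwards gives the same total as `concat_num` while keeping each step to a single job.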

Requirements#

numpy==1.18.1