Introduction to NumPy

NumPy for Data Science

NumPy is a foundational Python library for numerical computing and a core tool in data science.

Key Advantages

Efficient arrays: faster than Python lists because elements of one type are stored contiguously in memory (locality of reference)

Optimized for modern CPU architectures: array operations run in compiled loops instead of the Python interpreter

Versatile mathematical functions

Key Functions and Concepts

Array Creation

import numpy as np

# 1D array
arr_1d = np.array([1, 2, 3, 4, 5])
print(arr_1d)
# Output: [1 2 3 4 5]

# 2D array (matrix)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
# Output:
# [[1 2 3]
#  [4 5 6]]

# Array with an explicit data type
arr_float = np.array([1, 2, 3], dtype=float)
print(arr_float)
# Output: [1. 2. 3.]

# Special arrays
zeros = np.zeros((3, 3))
print(zeros)
# Output:
# [[0. 0. 0.]
#  [0. 0. 0.]
#  [0. 0. 0.]]

ones = np.ones((2, 2))
print(ones)
# Output:
# [[1. 1.]
#  [1. 1.]]

identity = np.eye(3)
print(identity)
# Output:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
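
Besides literal lists and the constant-filled constructors above, arrays are often generated from a range or a grid. The following is a minimal sketch using np.arange, np.linspace and np.full; the variable names are illustrative only.

```python
import numpy as np

# Evenly spaced integers: start is inclusive, stop is exclusive
steps = np.arange(0, 10, 2)
print(steps)  # [0 2 4 6 8]

# A fixed number of evenly spaced points, both endpoints included
grid = np.linspace(0.0, 1.0, 5)
print(grid)

# Constant-filled array with an explicit dtype
filled = np.full((2, 3), 7, dtype=np.int64)
print(filled.shape, filled.dtype)  # (2, 3) int64
```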

Array Operations

# Element-wise operations
arr = np.array([1, 2, 3, 4])
print(arr * 2)
# Output: [2 4 6 8]

# Matrix multiplication
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.dot(a, b))
# Output:
# [[19 22]
#  [43 50]]

# Transpose
print(a.T)
# Output:
# [[1 3]
#  [2 4]]
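
A common source of confusion is that `*` between two arrays is element-wise, not a matrix product. The sketch below contrasts the two on the same 2x2 matrices; for 2D arrays the `@` operator gives the same result as np.dot.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b   # multiplies matching entries
matmul = a @ b        # matrix product (rows of a times columns of b)

print(elementwise)
# [[ 5 12]
#  [21 32]]
print(matmul)
# [[19 22]
#  [43 50]]
```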

Indexing and Slicing

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr[0, 1])    # row 0, column 1: 2
print(arr[:, 1])    # all rows, column 1: [2 5 8]
print(arr[1:, 1:])  # rows and columns from index 1 onward
# Output:
# [[5 6]
#  [8 9]]
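
Basic slices like the ones above return views on the original data rather than copies, so writing through a slice changes the source array. A small sketch of the difference (the names `view` and `independent` are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

view = arr[1:, 1:]                 # basic slice: shares memory with arr
view[0, 0] = 50                    # also changes arr[1, 1]
print(arr[1, 1])                   # 50

independent = arr[1:, 1:].copy()   # explicit copy: separate memory
independent[0, 0] = -1
print(arr[1, 1])                   # still 50

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```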

Statistical Functions

arr = np.array([1, 2, 3, 4, 5])

print(np.mean(arr))   # mean
# Output: 3.0

print(np.median(arr)) # median
# Output: 3.0

print(np.std(arr))    # standard deviation (population, ddof=0)
# Output: 1.4142135623730951

print(np.var(arr))    # variance (population, ddof=0)
# Output: 2.0
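
By default np.std and np.var divide by n (population statistics). When the data is a sample, the ddof=1 argument switches the divisor to n - 1. A short sketch on the same array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

population_var = np.var(arr)       # sum of squared deviations / n
sample_var = np.var(arr, ddof=1)   # sum of squared deviations / (n - 1)
sample_std = np.std(arr, ddof=1)   # square root of the sample variance

print(population_var)  # 2.0
print(sample_var)      # 2.5
print(sample_std)
```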

Linear Algebra

# Eigenvalues and eigenvectors (each column of eigenvectors is one eigenvector)
eigenvalues, eigenvectors = np.linalg.eig(np.array([[1, 2], [2, 3]]))
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:")
print(eigenvectors)
# Output (eigenvectors are only defined up to sign):
# Eigenvalues: [-0.23606798  4.23606798]
# Eigenvectors:
# [[-0.85065081 -0.52573111]
#  [ 0.52573111 -0.85065081]]

# Matrix inversion
inv_matrix = np.linalg.inv(np.array([[1, 2], [3, 4]]))
print(inv_matrix)
# Output:
# [[-2.   1. ]
#  [ 1.5 -0.5]]

# Solving a linear system A x = b
x = np.linalg.solve(np.array([[1, 2], [3, 4]]), np.array([5, 6]))
print(x)
# Output: [-4.  4.5]
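
A solution can be checked by substituting it back into the system. The sketch below does that and also compares np.linalg.solve with multiplying by the explicit inverse; both agree here, though solving directly is generally the preferred route because it avoids forming the inverse. The names `A` and `rhs` are illustrative.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
rhs = np.array([5.0, 6.0])

x = np.linalg.solve(A, rhs)
print(np.allclose(A @ x, rhs))        # True: x satisfies A x = rhs

x_via_inverse = np.linalg.inv(A) @ rhs
print(np.allclose(x, x_via_inverse))  # True: same solution up to rounding
```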

Random Numbers

# Uniformly distributed random numbers in [0, 1)
random_array = np.random.rand(3, 3)
print(random_array)
# Output (example, values differ on every run):
# [[0.12345678 0.23456789 0.34567890]
#  [0.45678901 0.56789012 0.67890123]
#  [0.78901234 0.89012345 0.90123456]]

# Normally distributed random numbers (mean 0, standard deviation 1)
normal_array = np.random.randn(3, 3)
print(normal_array)
# Output (example, values differ on every run):
# [[-0.12345678  0.23456789 -0.34567890]
#  [ 0.45678901 -0.56789012  0.67890123]
#  [-0.78901234  0.89012345 -0.90123456]]

# Random integers from 0 (inclusive) to 10 (exclusive)
random_integers = np.random.randint(0, 10, size=(3, 3))
print(random_integers)
# Output (example, values differ on every run):
# [[3 7 2]
#  [9 0 1]
#  [5 4 8]]
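
np.random.rand, np.random.randn and np.random.randint belong to NumPy's legacy random interface; the NumPy documentation recommends the Generator API for new code. A minimal sketch of the equivalent calls, with a seed chosen arbitrarily to make the results reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is an arbitrary choice

uniform = rng.random((3, 3))                   # uniform on [0, 1)
normal = rng.standard_normal((3, 3))           # mean 0, standard deviation 1
integers = rng.integers(0, 10, size=(3, 3))    # 0 inclusive, 10 exclusive

# Re-creating the generator with the same seed reproduces the same numbers
repeat = np.random.default_rng(seed=42).random((3, 3))
print(np.array_equal(uniform, repeat))  # True
```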

Further Important NumPy Functions for Data Science

Reshaping and Changing Dimensions

# Reshape an array
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped = arr.reshape(2, 3)
print(reshaped)
# Output:
# [[1 2 3]
#  [4 5 6]]

# Add a new axis
new_axis = arr[np.newaxis, :]
print(new_axis.shape)
# Output: (1, 6)

# Collapse all dimensions into one (flatten)
flattened = reshaped.flatten()
print(flattened)
# Output: [1 2 3 4 5 6]
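
Two details are worth knowing when reshaping: a dimension given as -1 is inferred from the array size, and flatten always copies while ravel returns a view when the memory layout allows it. A small sketch (names are illustrative):

```python
import numpy as np

arr = np.arange(1, 7)            # [1 2 3 4 5 6]
two_rows = arr.reshape(2, -1)    # -1 lets NumPy infer the column count
print(two_rows.shape)            # (2, 3)

flat_copy = two_rows.flatten()   # always a copy
flat_view = two_rows.ravel()     # a view here, because the data is contiguous

print(np.shares_memory(two_rows, flat_copy))  # False
print(np.shares_memory(two_rows, flat_view))  # True
```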

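Broadcasting

Broadcasting enables arithmetic between arrays of different shapes: shapes are compared from the trailing dimension, and dimensions of size 1 are stretched to match.

a = np.array([1, 2, 3])        # shape (3,)
b = np.array([[1], [2], [3]])  # shape (3, 1)
print(a + b)                   # result has shape (3, 3)
# Output: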
# [[2 3 4]
#  [3 4 5]
#  [4 5 6]]
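
A typical data science use of broadcasting is centering the columns of a data matrix without an explicit loop. The sketch below uses made-up data and assumes NumPy 1.20 or newer for np.broadcast_shapes, which reports the shape that two operands would broadcast to.

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])     # shape (3, 2), illustrative values

col_means = data.mean(axis=0)      # shape (2,)
centered = data - col_means        # (3, 2) - (2,) broadcasts to (3, 2)

print(centered.mean(axis=0))       # each column now has mean 0
print(np.broadcast_shapes((3,), (3, 1)))  # (3, 3)
```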

Masking and Advanced Indexing

arr = np.array([1, 2, 3, 4, 5])
mask = arr > 2          # boolean array: [False False  True  True  True]
print(arr[mask])
# Output: [3 4 5]

# Fancy indexing (selecting by a list of positions)
indices = [0, 2, 4]
print(arr[indices])
# Output: [1 3 5]
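
Masks can also be combined, used for conditional replacement, and assigned through. A short sketch of these three patterns on the same kind of array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Combine conditions with & and |; each condition needs its own parentheses
combined = arr[(arr > 1) & (arr < 5)]
print(combined)   # [2 3 4]

# np.where picks from two alternatives element by element
capped = np.where(arr > 3, 3, arr)
print(capped)     # [1 2 3 3 3]

# Assigning through a mask modifies the array in place
arr[arr % 2 == 0] = 0
print(arr)        # [1 0 3 0 5]
```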

Aggregations and Axis Operations

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr_2d, axis=0))  # axis=0 collapses the rows: one sum per column
# Output: [5 7 9]
print(np.mean(arr_2d, axis=1))  # axis=1 collapses the columns: one mean per row
# Output: [2. 5.]
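
Aggregating along an axis removes that axis from the result. Passing keepdims=True keeps it with length 1, which makes the result broadcast cleanly against the original array. A sketch that normalizes each row to sum to 1:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

row_sums = arr_2d.sum(axis=1)                      # shape (2,)
row_sums_kept = arr_2d.sum(axis=1, keepdims=True)  # shape (2, 1)

shares = arr_2d / row_sums_kept   # (2, 3) / (2, 1) broadcasts row by row
print(row_sums.shape, row_sums_kept.shape)  # (2,) (2, 1)
print(shares.sum(axis=1))                   # every row sums to 1 (up to rounding)
```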

Universal Functions (ufuncs)

NumPy's ufuncs apply an operation element-wise to arrays and are highly optimized.

arr = np.array([0, np.pi/2, np.pi])
print(np.sin(arr))
# Output: [0.0000000e+00 1.0000000e+00 1.2246468e-16]
# sin(pi) is not exactly 0 because pi cannot be represented exactly in floating point

arr = np.array([1, 2, 3, 4])
print(np.exp(arr))
# Output: [ 2.71828183  7.3890561  20.08553692 54.59815003]
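
To make the element-wise behavior concrete, the sketch below compares a ufunc with the equivalent Python loop and shows two ufunc methods, reduce and accumulate, that fold an operation over an array.

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# The same computation as an explicit Python loop and as a single ufunc call
looped = np.array([math.sqrt(v) for v in values])
vectorized = np.sqrt(values)
print(np.array_equal(looped, vectorized))  # True

# Binary ufuncs such as np.add also provide reduce and accumulate
print(np.add.reduce(values))       # 30.0, same as values.sum()
print(np.add.accumulate(values))   # running totals, same as values.cumsum()
```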

Polynomials

coeffs = [3, 2, 1]  # np.polyval expects the highest degree first: 3x^2 + 2x + 1
x = np.array([0, 1, 2])
y = np.polyval(coeffs, x)
print(y)
# Output: [ 1  6 17]
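
Coefficient order is easy to get wrong: np.polyval reads coefficients from the highest degree down, while the newer numpy.polynomial.Polynomial class reads them from the constant term up. The sketch below evaluates the same polynomial, 1 + 2x + 3x^2, both ways.

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.array([0, 1, 2])

# np.polyval: highest degree first, so [3, 2, 1] means 3x^2 + 2x + 1
y_polyval = np.polyval([3, 2, 1], x)

# Polynomial: lowest degree first, so [1, 2, 3] means 1 + 2x + 3x^2
p = Polynomial([1, 2, 3])
y_polynomial = p(x)

print(y_polyval)                              # [ 1  6 17]
print(np.allclose(y_polyval, y_polynomial))   # True
```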

File I/O with NumPy

# Save an array (np.save appends the .npy extension if it is missing)
arr = np.array([1, 2, 3, 4, 5])
np.save('my_array', arr)

# Load an array
loaded_arr = np.load('my_array.npy')
print(loaded_arr)
# Output: [1 2 3 4 5]

# Read a CSV file (missing values become nan)
data = np.genfromtxt('data.csv', delimiter=',')
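
The following sketch shows a full round trip for both text and binary formats. It writes into a temporary directory so it can run anywhere; the file names, the header "a,b" and the array contents are made up for illustration.

```python
import os
import tempfile
import numpy as np

table = np.array([[1.5, 2.0], [3.0, 4.5]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Text round trip: write a CSV with a header line, then read it back
    csv_path = os.path.join(tmp_dir, "table.csv")
    np.savetxt(csv_path, table, delimiter=",", header="a,b", comments="")
    from_csv = np.genfromtxt(csv_path, delimiter=",", skip_header=1)

    # Binary round trip: store several named arrays in one .npz archive
    npz_path = os.path.join(tmp_dir, "arrays.npz")
    np.savez(npz_path, table=table, ids=np.array([10, 20]))
    with np.load(npz_path) as archive:
        from_npz = archive["table"]
        ids = archive["ids"]

print(np.array_equal(table, from_csv))  # True
print(np.array_equal(table, from_npz))  # True
```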

Practical Tips for NumPy in Data Science

Use NumPy's broadcasting to simplify code and improve performance.

Use advanced indexing and masking for efficient data filtering and selection.

Pay attention to the axis argument in multidimensional operations, especially aggregations.

Use universal functions (ufuncs) for fast element-wise operations.

Use NumPy's I/O functions to read and write large datasets efficiently.

Keep the difference between copies and views of arrays in mind to avoid unexpected modifications.

Use np.einsum() for complex tensor operations and contractions.
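
As a small illustration of the last tip, np.einsum describes an operation by labeling the axes of its inputs and output; repeated labels are multiplied together and labels missing from the output are summed over. A minimal sketch on 2x2 matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

matmul = np.einsum("ij,jk->ik", a, b)    # matrix product, same as a @ b
trace = np.einsum("ii->", a)             # sum of the diagonal
row_dots = np.einsum("ij,ij->i", a, b)   # dot product of matching rows

print(matmul)
# [[19 22]
#  [43 50]]
print(trace)      # 5
print(row_dots)   # [17 53]
```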

Further Resources

Official NumPy documentation

NumPy cheat sheet

100 NumPy Exercises