Documentation (English)

Using Existing Architectures

How to choose pre-built model architectures

Most of the time, we do not build models from scratch.

Instead, we use existing model architectures created by others. We choose them based on:

  1. the type of data (e.g. images, text, numbers)
  2. the task (e.g. classification, prediction, generation)

The task defines the goal. The goal defines how we measure success (loss). The data type defines which layers are useful.
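The mapping from task to loss and from data type to layer can be sketched as two small lookup tables. This is only an illustration: the names `LOSS_BY_TASK`, `LAYER_BY_DATA` and `make_loss` are hypothetical, and the entries are common first choices in PyTorch, not the only valid ones.

```python
import torch
import torch.nn as nn

# Hypothetical lookup tables, for illustration only.
# Task -> a loss that is commonly used to measure success for it.
LOSS_BY_TASK = {
    "binary_classification": nn.BCELoss,               # expects probabilities in [0, 1]
    "multiclass_classification": nn.CrossEntropyLoss,  # expects raw scores (logits)
    "regression": nn.MSELoss,                          # expects real-valued predictions
}

# Data type -> a layer type that is a typical first choice for it.
LAYER_BY_DATA = {
    "images": nn.Conv2d,
    "text": nn.Embedding,
    "numbers": nn.Linear,
}


def make_loss(task):
    """Create the loss function registered for a task name."""
    return LOSS_BY_TASK[task]()


loss_fn = make_loss("regression")
prediction = torch.tensor([2.0, 4.0])
target = torch.tensor([1.0, 2.0])
loss = loss_fn(prediction, target)
print(loss.item())  # mean of (1^2, 2^2) = 2.5
```

In practice the choice is rarely a pure lookup, but the table shows the direction of the reasoning: first the task, then the loss, then the layers that fit the data.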

That is why most people who work in AI are AI engineers: they select and adapt existing architectures rather than invent new ones.

Example: Using a Pretrained Model

The easiest way is to use a model that is already trained:

import torch
import torch.nn as nn
from torchvision import models

class ResNetClassifier(nn.Module):
    def __init__(self):
        super().__init__()

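        # Load a ResNet-18 pretrained on ImageNet
        # (torchvision >= 0.13; older versions used pretrained=True)
        self.model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

        # Replace the last layer (the final fully connected layer)
        self.model.fc = nn.Sequential(
            nn.Linear(self.model.fc.in_features, 1),
            nn.Sigmoid()  # outputs a probability for binary classification
        )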

    def forward(self, x):
        return self.model(x)

This uses a model that someone else already trained on ImageNet, a dataset of more than a million images. We only change the last layer to fit our specific task.

Building from Scratch

A developer who builds a model completely from scratch still relies on ideas that someone else published about how the model works mathematically, and then implements them.

Here is a ResNet-style architecture built from scratch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ConvBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 convolution so the shortcut matches the new channel count and stride
        self.downsample = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels))

    def forward(self, x):
        identity = self.downsample(x)
        out = self.conv(x)
        out = self.bn(out)
        out += identity
        out = self.relu(out)
        return out

class IdentityBlock(nn.Module):
    def __init__(self, channels):
        super(IdentityBlock, self).__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x  # Shortcut connection
        out = self.conv(x)
        out = self.bn(out)
        out += identity  # Element-wise addition
        out = self.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet, self).__init__()

        self.stage1 = nn.Sequential(
            nn.ZeroPad2d(1),
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.stage2 = nn.Sequential(
            ConvBlock(64, 128, stride=2),
            IdentityBlock(128)
        )

        self.stage3 = nn.Sequential(
            ConvBlock(128, 256, stride=2),
            IdentityBlock(256)
        )

        self.stage4 = nn.Sequential(
            ConvBlock(256, 512, stride=2),
            IdentityBlock(512)
        )

        self.stage5 = nn.Sequential(
            ConvBlock(512, 1024, stride=2),
            IdentityBlock(1024)
        )

        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((1,1)),
            nn.Flatten(),
            nn.Linear(1024, num_classes)
        )

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.stage5(x)
        x = self.classifier(x)
        return x

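This implements a ResNet-style architecture from scratch. It follows the same idea as the pretrained example above, stages of convolutional blocks with shortcut connections, but it is more code to write and maintain. It is not identical to resnet18: it starts with random weights, and it returns raw scores (logits) for num_classes classes instead of a single probability.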
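The shortcut connection is the core idea of the blocks above: the input of a block is added to its output, so both must have the same shape. The following minimal sketch uses a hypothetical `TinyResidualBlock` (not one of the classes above) to show why a change in stride or channel count needs a 1x1 convolution on the shortcut.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Hypothetical minimal block: conv -> batch norm -> add shortcut -> ReLU."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        # If the shape changes, project the input with a 1x1 convolution;
        # otherwise pass it through unchanged.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels,
                                      kernel_size=1, stride=stride, bias=False)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        return torch.relu(self.bn(self.conv(x)) + self.shortcut(x))


x = torch.randn(2, 64, 16, 16)            # batch of 2, 64 channels, 16x16 pixels
same = TinyResidualBlock(64, 64)(x)       # shape is preserved
down = TinyResidualBlock(64, 128, 2)(x)   # channels double, height and width halve
print(same.shape, down.shape)
```

Without the projection, the addition in `forward` would fail for the second block, because a tensor with 64 channels and 16x16 pixels cannot be added to one with 128 channels and 8x8 pixels.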

As a rough estimate, fewer than 1% of all AI developers create their own AI architectures. So even when someone builds a model from scratch, they mostly rebuild an architecture that already exists and change a couple of layers so that it suits their purpose.

It is like a window builder who uses an existing drill to build windows. It would be unreasonable for them to build the drill themselves just to be able to build windows; building windows is already hard enough to master on its own. That is also why AI has many subfields, each with its own purpose.

Building from scratch requires understanding the mathematical principles. Using pretrained models is much faster and easier.

