
Tabular Data Tasks

ML tasks involving structured tables and time-series data

Tabular data is structured information organized in rows and columns, where each row represents an observation and each column represents a feature. It is the most common data format in business, science, and analytics: think spreadsheets, databases, and CSV files.

Supervised Learning Tasks

Classification

Predict which category an observation belongs to based on its features. The target variable is discrete.

Examples: Is this email spam? Will this customer churn? Which product category does this belong to?

Common models: Logistic Regression, Random Forest, K-Nearest Neighbors
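As an illustration of classification, here is a minimal k-nearest neighbors sketch in plain Python, assuming Euclidean distance and a simple majority vote among the k closest rows. The function `knn_predict` and the small churn dataset are hypothetical, made up for this example; in practice a library implementation would normally be used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict a class label for `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training observation
    distances = [(math.dist(row, query), label) for row, label in zip(train_X, train_y)]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels (the target is discrete)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical churn data: [monthly_spend, support_tickets]
X = [[20, 5], [25, 4], [22, 6], [80, 0], [90, 1], [85, 0]]
y = ["churn", "churn", "churn", "stay", "stay", "stay"]

print(knn_predict(X, y, [24, 5]))  # churn
print(knn_predict(X, y, [88, 1]))  # stay
```

Because k-NN compares raw distances, features on very different scales should usually be scaled first.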

Regression

Predict a continuous numerical value based on input features. The target variable is a real number.

Examples: What will the house price be? How much will sales be next month? What's the expected temperature?

Common models: Linear Regression, Polynomial Regression
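A minimal regression sketch, assuming a single feature and ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the fitted line pass through the two means. The function name and the house-price numbers are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form ordinary least squares for one feature
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (square meters) vs. price (thousands)
sizes = [50, 70, 90, 110]
prices = [150, 190, 230, 270]

slope, intercept = fit_simple_linear_regression(sizes, prices)
print(slope, intercept)         # 2.0 50.0
print(slope * 100 + intercept)  # 250.0 (predicted price for 100 square meters)
```

The prediction is a real number rather than a category, which is what distinguishes regression from classification.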

Unsupervised Learning Tasks

Clustering

Group similar observations together without predefined labels. Discovers natural structure in data.

Examples: Customer segmentation, anomaly detection, organizing documents

Common methods: K-Means, hierarchical clustering, DBSCAN
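A minimal K-Means sketch (Lloyd's algorithm) in plain Python, assuming random initialization from the data points and Euclidean distance. `kmeans` and the toy points are hypothetical; library implementations typically add smarter initialization (such as k-means++) and multiple restarts.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct observations chosen at random
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers described by (visits_per_month, avg_basket_size)
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No labels are supplied; the two groups emerge purely from distances between observations. The cluster indices themselves are arbitrary.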

Dimensionality Reduction

Reduce the number of features while preserving essential information. Makes data easier to visualize and process.

Examples: Compress high-dimensional data, visualize in 2D/3D, remove redundant features

Common methods: PCA, t-SNE, UMAP
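A rough PCA sketch, assuming only the first principal component is needed: it centers the data, builds the covariance matrix, and uses power iteration to find its dominant eigenvector. Libraries usually compute PCA via singular value decomposition instead; power iteration is swapped in here only to stay dependency-free. The function name and data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (unit direction, projected scores) for the top principal component."""
    n, d = len(rows), len(rows[0])
    # Center each feature at zero mean
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    # Assumes the all-ones start vector is not orthogonal to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the component: d features become 1 score
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data: two strongly correlated features
rows = [[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.8]]
direction, scores = first_principal_component(rows)
print(direction)  # second entry is about twice the first
print(scores)     # one number per row instead of two
```

Because the two features are nearly redundant, a single score per row keeps almost all of the variance.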

Sequential Data Tasks

Time Series Forecasting

Predict future values based on past observations ordered in time. Unlike standard tabular tasks, the sequence matters.

Examples: Stock price prediction, demand forecasting, weather prediction

Common methods: ARIMA, Prophet, exponential smoothing, LSTMs
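Simple exponential smoothing makes the point that order matters concrete: each new observation is blended with a running level, so the same values in a different order give a different forecast. This is a minimal sketch assuming a fixed smoothing factor `alpha` and a level initialized from the first observation; the function name and demand values are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing `series` with alpha in (0, 1]."""
    level = series[0]  # initialize the level with the first observation
    for value in series[1:]:
        # New level = weighted blend of the latest observation and the previous level
        level = alpha * value + (1 - alpha) * level
    return level  # the forecast for the next period is the final level

demand = [10, 12, 11, 13]
print(simple_exponential_smoothing(demand, alpha=0.5))        # 12.0
print(simple_exponential_smoothing(demand[::-1], alpha=0.5))  # 11.0
```

Larger `alpha` values weight recent observations more heavily; smaller values produce a smoother, slower-reacting forecast.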

Model Families for Tabular Data

Algorithms in the same family share core principles, and many can be adapted to multiple tasks (for example, decision trees handle both classification and regression). See Model Families for details on:

Decision Trees: Recursive splitting based on feature values. Interpretable, handles mixed data types, prone to overfitting.

Linear Models: Linear combinations of features. Fast, interpretable, assumes linear relationships. Includes linear regression, logistic regression, Ridge, Lasso.

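Support Vector Machines: Maximize the margin between classes or fit predictions within an ε-insensitive error tube. Effective in high-dimensional spaces, and kernels let them model nonlinear relationships. Includes SVC (classification) and SVR (regression).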

Tree Ensembles: Combine multiple decision trees. More robust than single trees. Includes Random Forest and Gradient Boosting (with popular implementations such as XGBoost).

Recommendation Systems: Predict user preferences and rank items. Approaches include collaborative filtering, content-based filtering, and matrix factorization.
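The recursive splitting behind decision trees can be illustrated with a single split search. This sketch assumes the Gini impurity criterion (entropy is a common alternative) and a single numeric feature; a full tree would apply `best_split` recursively to each child until a stopping rule is met. Function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Find the threshold on one numeric feature that minimizes weighted child impurity."""
    best = (None, float("inf"))
    # Skip the largest value so the right child is never empty
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: customer age vs. whether they bought the product
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, bought))  # (30, 0.0): "age <= 30" separates the classes perfectly
```

Splitting until every leaf is pure, as a full tree can, is exactly what makes single trees prone to overfitting; ensembles average many such trees to reduce that variance.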

Key Characteristics of Tabular Data

Structured format: Clear rows (observations) and columns (features). Each cell has a specific meaning.

Mixed feature types: Can include continuous numbers, discrete categories, ordinal values, dates, and text.

Feature engineering: Often requires creating new features, handling missing values, encoding categories, and scaling.

Interpretability: Many tabular models (linear models, decision trees) are interpretable, which is valuable in business and scientific applications.

Small to medium datasets: Unlike images or text, tabular datasets are often smaller (thousands to millions of rows rather than billions). This makes tree-based models and classical ML very competitive with neural networks.
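The feature-engineering steps mentioned above (handling missing values, encoding categories, scaling) can be sketched in plain Python. This assumes missing values are represented as `None`, mean imputation, one-hot encoding, and z-score standardization; the helper names and the income/city columns are hypothetical.

```python
import statistics

def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in column]

def one_hot(column):
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(set(column))
    return {cat: [1 if v == cat else 0 for v in column] for cat in categories}

def standardize(column):
    """Scale to zero mean and unit (population) standard deviation; assumes a non-constant column."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

income = [40.0, None, 60.0, 50.0]
city = ["Berlin", "Paris", "Berlin", "Rome"]

income_filled = impute_mean(income)  # [40.0, 50.0, 60.0, 50.0]
print(one_hot(city))  # {'Berlin': [1, 0, 1, 0], 'Paris': [0, 1, 0, 0], 'Rome': [0, 0, 0, 1]}
print(standardize(income_filled))
```

In a real pipeline, the means, categories, and standard deviations should be computed on the training data only and then reused on validation and test data, to avoid leaking information from the evaluation set.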

Practical Workflow

  1. Explore the data: Understand distributions, missing values, correlations, outliers
  2. Clean and preprocess: Handle missing values, encode categories, scale features
  3. Feature engineering: Create new features, transform existing ones, select relevant features
  4. Choose the appropriate task: Classification, regression, clustering, etc.
  5. Select models: Start simple (linear models, decision trees), then try ensembles
  6. Evaluate: Use appropriate metrics, cross-validation, check for overfitting
  7. Interpret: Understand which features matter and check that predictions make sense
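The evaluation step usually relies on k-fold cross-validation: every row is held out exactly once, and the scores are averaged across folds. A minimal sketch, assuming contiguous unshuffled folds and a hypothetical mean-predicting baseline (in the spirit of "start simple") scored with mean absolute error; `k_fold_indices` and the targets are made up for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so that every sample is tested exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Hypothetical regression targets; the "model" is a mean-predicting baseline
ys = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0]
fold_errors = []
for train_idx, test_idx in k_fold_indices(len(ys), k=3):
    # "Fit" on the training fold only: predict the mean training target
    prediction = sum(ys[i] for i in train_idx) / len(train_idx)
    # Evaluate on the held-out fold with mean absolute error
    fold_errors.append(sum(abs(ys[i] - prediction) for i in test_idx) / len(test_idx))
cv_mae = sum(fold_errors) / len(fold_errors)
print(cv_mae)
```

Any real model should beat this baseline's cross-validated error to be worth its complexity. Shuffling before splitting is common for independent rows, while time-ordered data needs splits that always train on the past and test on the future. Libraries such as scikit-learn provide this out of the box (`KFold`, `TimeSeriesSplit`, `cross_val_score`).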
