Data Code 101
@datacode101.bsky.social
Data / Software Engineering
datacode101.bsky.social
Because you tell the system what you want via SQL, its clauses are the “verbs” that describe the action you want performed on the data. This is the logical order in which those clauses are evaluated behind the scenes; the optimizer may still pick a different physical plan.
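A minimal sketch of that clause order, assuming the commonly described logical sequence FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. The `orders` table and its rows are hypothetical. The same query is run through SQLite and then replayed by hand, one clause at a time, to show that both give the same rows.

```python
import sqlite3

# Hypothetical sample rows (customer, amount), for illustration only.
ORDERS = [("alice", 30), ("alice", 50), ("bob", 20), ("bob", 5), ("carol", 90)]

def run_sql():
    """Run the query as written, letting SQLite decide how to execute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
    rows = conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM orders
        WHERE amount >= 10
        GROUP BY customer
        HAVING SUM(amount) > 25
        ORDER BY total DESC
        LIMIT 2
        """
    ).fetchall()
    conn.close()
    return rows

def run_by_hand():
    """Replay the same query in the logical clause order."""
    rows = list(ORDERS)                                         # FROM
    rows = [r for r in rows if r[1] >= 10]                      # WHERE
    groups = {}
    for customer, amount in rows:                               # GROUP BY
        groups.setdefault(customer, []).append(amount)
    groups = {c: a for c, a in groups.items() if sum(a) > 25}   # HAVING
    result = [(c, sum(a)) for c, a in groups.items()]           # SELECT
    result.sort(key=lambda r: r[1], reverse=True)               # ORDER BY
    return result[:2]                                           # LIMIT

if __name__ == "__main__":
    print(run_sql())      # [('carol', 90), ('alice', 80)]
    print(run_by_hand())  # same rows
```

Note that SELECT is written first but evaluated late, which is why a column alias such as `total` can be used in ORDER BY but generally not in WHERE.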
datacode101.bsky.social
SQL is declarative. It’s cool. However, the database still needs to translate the SQL query into “procedural steps” (e.g., reading these tables, selecting this field, …). The mathematical framework for this step is Relational Algebra. It comprises a set of operators that take relations as input and produce relations as output.
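A rough sketch of three relational algebra operators (selection, projection, natural join), modelling a relation as a list of dicts. The function names and the `employees` / `departments` data are made up for illustration; a real engine works on query plans, not Python lists.

```python
def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection (pi): keep only the named attributes, dropping duplicate rows."""
    seen, out = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attributes, key)))
    return out

def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute name."""
    out = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                out.append({**l_row, **r_row})
    return out

# Hypothetical relations.
employees = [
    {"name": "Ana", "dept_id": 1},
    {"name": "Ben", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "dept_name": "Data"},
    {"dept_id": 2, "dept_name": "Sales"},
]

def names_in_data_dept():
    """Roughly: SELECT name FROM employees NATURAL JOIN departments
    WHERE dept_name = 'Data'."""
    joined = natural_join(employees, departments)
    filtered = select(joined, lambda row: row["dept_name"] == "Data")
    return project(filtered, ["name"])

if __name__ == "__main__":
    print(names_in_data_dept())  # [{'name': 'Ana'}, {'name': 'Cy'}]
```

Each operator takes relations in and returns a relation, so they compose freely; that closure property is what lets a planner rearrange them.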
datacode101.bsky.social
Edgar F. Codd. In 1970, while working at IBM, Codd published his paper, “A Relational Model of Data for Large Shared Data Banks.” It introduced the relational model for managing data, now the dominant foundation of Relational Database Management Systems (RDBMS).
www.seas.upenn.edu/~zives/03f/c...
#sql
datacode101.bsky.social
Autogen

Microsoft open-source framework for building cooperating multi-agent systems.

Extensible, Python-based, modular abstractions for agent loops, memory policies, dynamic routing, and tool APIs.

Use for: programmable building blocks for multi-agent without committing to a vendor runtime.
datacode101.bsky.social
IBM Bee

IBM’s open-source, no-code plus Python/TypeScript framework.

Enterprise-grade orientation, multi-agent collaboration across heterogeneous implementations, focus on operational automation and ROI. Steeper learning curve.

Use for: Alignment to IBM’s stack, deep governance.
datacode101.bsky.social
CrewAI

Role-based multi-agent framework with “crews” and flows, aiming to simplify production coordination.

Large community and fast iteration pace. Autonomous crews and deterministic flows, cost-aware patterns, self-hosted options.

Use for: Pragmatic multi-agent teams with clear roles and tasks
datacode101.bsky.social
AWS Agent Squad

AWS Labs open-source orchestrator for multi-agent teams.

Deploy collaborative agent systems on AWS. Intent classification, streaming support, parallel team coordination. Aligns with Bedrock Agents/Flows.

Use for: Teams on AWS wanting scalable, open-source orchestration
datacode101.bsky.social
LlamaIndex

Centered on retrieval-augmented generation (RAG) with agent capabilities, event-driven, multi-step workflows.

Data connectors, indexing strategies. Flexible data handling, but requires developer setup.

Use for: Retrieval-heavy agents needing answers over heterogeneous enterprise data
datacode101.bsky.social
OpenAI Agents SDK

SDK that supersedes Swarm for agents on OpenAI's platform.

Tight integration with GPT models for fast prototyping, but operates within a closed ecosystem. Built-in external tools (web search, local file search, operator)

Use for: Rapidly building tool-using agents powered by GPT.
datacode101.bsky.social
Google ADK

Google’s open-source Agent Development Kit for building single-agent and multi-agent apps.

Develop reasoning AI agents with powerful cloud-scale tools. Optimized for Gemini and Vertex AI but model- and deployment-agnostic by design.

Use for: Enterprises building scalable agent teams
datacode101.bsky.social
LangGraph

A graph/state-machine-first orchestration layer for agents built on the LangChain ecosystem.

Build custom multi-agent workflows using graph structures.

Highly flexible for complex, flow-controlled tasks but requires coding skills.

Use for: Controlled, step-by-step agent interactions.
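To make the graph/state-machine idea concrete, here is a plain-Python sketch. It deliberately does not use the LangGraph API; the node names (`draft`, `review`), the `attempts` counter, and the routing rule are all hypothetical. Nodes are functions that return an updated state, and edges decide which node runs next.

```python
def draft(state):
    """Node: produce (or redo) a draft and count the attempt."""
    attempts = state.get("attempts", 0) + 1
    return {**state, "draft": f"Answer to: {state['question']}", "attempts": attempts}

def review(state):
    """Node: approve only from the second attempt on (a stand-in for a real check)."""
    return {**state, "approved": state["attempts"] >= 2}

def route_after_review(state):
    """Conditional edge: finish when approved, otherwise loop back to draft."""
    return "END" if state["approved"] else "draft"

NODES = {"draft": draft, "review": review}
EDGES = {"draft": lambda state: "review", "review": route_after_review}

def run_graph(state, start="draft", max_steps=10):
    """Walk the graph from `start` until END or until max_steps nodes have run."""
    current, path = start, []
    for _ in range(max_steps):
        if current == "END":
            break
        path.append(current)
        state = NODES[current](state)
        current = EDGES[current](state)
    return state, path

if __name__ == "__main__":
    final_state, path = run_graph({"question": "What is OLAP?"})
    print(path)  # ['draft', 'review', 'draft', 'review']
```

Because every transition is an explicit edge, the loop back from `review` to `draft` is visible and bounded, which is the kind of control the post is pointing at.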
datacode101.bsky.social
AI Agent Frameworks

The framework shapes how your agent thinks, acts, and connects to tools and data. LLMs are the brain; frameworks are the wiring that connects the different parts.

Image by /in/rakeshgohel01
datacode101.bsky.social
Data Lakehouse

Model: Hybrid architecture combining a data lake's low-cost, flexible storage with a data warehouse's robust management (ACID transactions, schema enforcement).

Best For: BI and ML on a single platform. Analytics and data science workloads.

Examples: Databricks, Snowflake, Dremio.
datacode101.bsky.social
Data Lakehouses: Unified Analytics Platform #Lakehouse

Lakehouses combine the scalability and flexibility of data lakes with the reliability and governance of data warehouses. They eliminate data silos and duplication. Choose one when you prioritize long-term scalability, data variety, massive datasets, and real-time content.
datacode101.bsky.social
Data Warehouse

Model: Highly structured repository for filtered and transformed data (schema-on-write). Fast querying, single source of truth for historical analysis.

Best For: BI and reporting. Answering predefined business questions quickly and reliably.

Examples: Snowflake, BigQuery, Redshift.
datacode101.bsky.social
Data Warehouses: Structured Analytics Powerhouse #DWH

Data warehouses excel when your primary focus is business intelligence and structured reporting. They use a schema-on-write approach, meaning data must be cleaned and structured before storage. Best when you need strict data lineage and governance and your team has SQL skills.
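A tiny sketch of what schema-on-write means in practice: records are checked against a fixed schema before they are stored, and anything that does not conform is rejected. The `SCHEMA` fields and the `load` helper are hypothetical, not any warehouse's API.

```python
# Hypothetical target schema for an "orders" table.
SCHEMA = {"order_id": int, "customer": str, "amount": float}

def conforms(record):
    """Return True when the record has exactly the schema's fields and types."""
    if set(record) != set(SCHEMA):
        return False
    return all(isinstance(record[name], kind) for name, kind in SCHEMA.items())

def load(table, records):
    """Schema-on-write: store only conforming records; return the rejected ones."""
    rejected = []
    for record in records:
        if conforms(record):
            table.append(record)
        else:
            rejected.append(record)
    return rejected

if __name__ == "__main__":
    table = []
    rejected = load(table, [
        {"order_id": 1, "customer": "ana", "amount": 9.5},
        {"order_id": "2", "customer": "ben", "amount": 3.0},  # wrong type
        {"order_id": 3, "customer": "cy"},                    # missing field
    ])
    print(len(table), len(rejected))  # 1 2
```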
datacode101.bsky.social
Data Lake

Model: A vast, centralized repository that stores enormous volumes of raw data in its native format (schema-on-read).

Best For: Data science, ML model training, exploratory analysis where questions are not yet defined.

Examples: Amazon S3, Azure Data Lake Storage (ADLS), Google Cloud Storage.
datacode101.bsky.social
Data Lakes: Flexible Data Repository #DataLake

Data lakes store raw data in its native format, supporting structured, semi-structured, and unstructured data.

Choose them when you need flexibility for machine learning and data science and cost-effective storage, and when real-time BI and reporting aren't critical requirements.
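A small sketch of schema-on-read, assuming raw events land in the lake as JSON lines with no enforced structure. The sample lines and the `read_clicks` function are hypothetical; the point is that the schema is applied by the reader, not at write time.

```python
import json

# Raw records stored as-is; shapes differ from line to line.
RAW_LINES = [
    '{"user": "ana", "clicks": 3}',
    '{"user": "ben", "clicks": "7", "device": "mobile"}',
    '{"sensor": "t1", "temp_c": 21.5}',
]

def read_clicks(raw_lines):
    """Apply a schema while reading: keep click events and coerce clicks to int."""
    out = []
    for line in raw_lines:
        record = json.loads(line)
        if "user" in record and "clicks" in record:
            out.append({"user": record["user"], "clicks": int(record["clicks"])})
    return out

if __name__ == "__main__":
    print(read_clicks(RAW_LINES))
    # [{'user': 'ana', 'clicks': 3}, {'user': 'ben', 'clicks': 7}]
```

A different reader could pull the sensor records out of the same raw lines with its own schema, which is where the flexibility comes from.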
datacode101.bsky.social
While operational databases are the engines running your day-to-day applications, large-scale analytical systems are designed not for rapid, small transactions, but for complex, large-scale queries and aggregations unlocking insights from vast amounts of historical information.

#DataEngineer #OLAP
Data Lake, Warehouse, Lakehouse
datacode101.bsky.social
Graph Databases #GraphDatabases

Model: Nodes, edges, and properties to represent and query relationships.

Best For: Social networks, fraud detection, and recommendation engines.

Examples: Neo4j, Amazon Neptune, TigerGraph.
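A toy version of the node/edge/property model, with a friend-of-a-friend lookup of the sort a recommendation engine might start from. The people, the `FOLLOWS` relationship, and the helper names are hypothetical; real graph databases expose this through query languages such as Cypher or Gremlin rather than Python lists.

```python
# Nodes carry properties; edges are (source, relationship, target, properties).
nodes = {
    "ana": {"label": "Person", "city": "Lima"},
    "ben": {"label": "Person", "city": "Oslo"},
    "cy": {"label": "Person", "city": "Kyoto"},
    "dee": {"label": "Person", "city": "Lima"},
}
edges = [
    ("ana", "FOLLOWS", "ben", {"since": 2021}),
    ("ana", "FOLLOWS", "dee", {"since": 2023}),
    ("ben", "FOLLOWS", "cy", {"since": 2020}),
    ("ben", "FOLLOWS", "dee", {"since": 2022}),
    ("cy", "FOLLOWS", "ana", {"since": 2019}),
]

def neighbors(node_id, rel_type):
    """Targets reachable from node_id over one edge of the given type."""
    return [dst for src, rel, dst, _props in edges if src == node_id and rel == rel_type]

def suggestions(node_id):
    """Friend-of-a-friend: followed by someone I follow, but not by me yet."""
    direct = set(neighbors(node_id, "FOLLOWS"))
    second = {n for d in direct for n in neighbors(d, "FOLLOWS")}
    return sorted(second - direct - {node_id})

if __name__ == "__main__":
    print(suggestions("ana"))  # ['cy']
```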
datacode101.bsky.social
Time-Series Databases #TimeSeriesDatabases

Model: Optimized for time-stamped data points.

Best For: Monitoring systems, IoT sensor data, and financial market data.

Examples: InfluxDB, TimescaleDB, Prometheus.
datacode101.bsky.social
NoSQL Databases #NoSQL

Model: Varies by type—Key-Value, Document, Column-Family, or Graph.

Best For: Big data applications, real-time systems, and use cases needing high scalability and flexible schemas.

Examples: MongoDB, Apache Cassandra, Redis, DynamoDB, Couchbase.
datacode101.bsky.social
Relational Databases #RDBMS

Model: Structured tables with rows and columns (schema-on-write).

Best For: Transactional systems (OLTP), ERPs, and CRMs requiring ACID compliance.

Examples: PostgreSQL, MySQL, Microsoft SQL Server, Oracle.
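A short sketch of the atomicity part of ACID, using Python's built-in `sqlite3` as a stand-in for a full RDBMS. The `accounts` table, the balances, and the `transfer` helper are hypothetical. The transfer credits one account and debits another inside a single transaction; if the debit would violate the CHECK constraint, the credit is rolled back too.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two accounts and a non-negative balance rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
    except sqlite3.IntegrityError:
        return False
    return True

def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

if __name__ == "__main__":
    conn = make_db()
    print(transfer(conn, "a", "b", 30), balances(conn))   # True {'a': 70, 'b': 80}
    print(transfer(conn, "a", "b", 500), balances(conn))  # False {'a': 70, 'b': 80}
```

In the failed transfer the credit to `b` did execute before the debit was rejected, yet the final balances are unchanged: either both updates land or neither does.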
datacode101.bsky.social
Understanding the Types of Databases

Choosing the right database is a critical architectural decision. Each type is a specialized tool designed for a specific job.

Here’s a breakdown of the essentials:
datacode101.bsky.social
By stage:

- Ingestion → Kinesis (AWS), Event Hubs (Azure), Pub/Sub (GCP)
- Computation → EMR (AWS), Databricks (Azure), Dataproc/Dataflow (GCP)
- Data Warehouse → Redshift (AWS), Synapse/SQL (Azure), BigQuery (GCP)
- Presentation → QuickSight (AWS), Power BI (Azure), Colab/Looker (GCP)