Lightnews — Scholar-powered news

Data Code 101 @datacode101.bsky.social · 8d

Because SQL lets you tell the system what you want, it provides clauses, the “verbs” that describe the action to perform on the data. This is the order in which the clauses are processed behind the scenes.
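
The processing order can be sketched in plain Python. This is a minimal illustration, assuming the commonly taught logical order (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT); the `orders` data and the `run_query` helper are hypothetical, and a real optimizer is free to reorder the physical work as long as the result is the same.

```python
from collections import defaultdict

# Hypothetical sample data standing in for an "orders" table.
orders = [
    {"customer": "ana", "amount": 30},
    {"customer": "bob", "amount": 5},
    {"customer": "ana", "amount": 20},
    {"customer": "cy", "amount": 60},
    {"customer": "bob", "amount": 40},
]


def run_query(rows):
    """Emulate, step by step:

    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    HAVING SUM(amount) >= 40
    ORDER BY total DESC
    LIMIT 2
    """
    # 1. FROM: pick the source rows.
    source = list(rows)
    # 2. WHERE: filter individual rows before any grouping.
    filtered = [r for r in source if r["amount"] >= 10]
    # 3. GROUP BY: bucket the surviving rows by customer.
    groups = defaultdict(list)
    for r in filtered:
        groups[r["customer"]].append(r["amount"])
    # 4. HAVING: filter whole groups using an aggregate.
    kept = {c: amounts for c, amounts in groups.items() if sum(amounts) >= 40}
    # 5. SELECT: compute the output columns.
    projected = [{"customer": c, "total": sum(a)} for c, a in kept.items()]
    # 6. ORDER BY: sort the projected rows.
    ordered = sorted(projected, key=lambda r: r["total"], reverse=True)
    # 7. LIMIT: keep only the first rows.
    return ordered[:2]


result = run_query(orders)
print(result)
```

Note that the `total` alias only exists after step 5, which is why many databases reject it in WHERE but accept it in ORDER BY.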

Data Code 101 @datacode101.bsky.social · 8d

SQL is declarative. It’s cool. However, the database still needs to translate the SQL query into “procedural steps” (e.g., reading these tables, selecting this field, …). The mathematical framework for this step is relational algebra, a set of operators that act on relations.
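
A few relational algebra operators can be sketched in Python to show how a declarative query becomes a composition of steps. This is a toy model, assuming relations are lists of dicts; the function names and sample tables are hypothetical and do not reflect how any particular engine implements its operators.

```python
def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep only the named attributes, dropping duplicates."""
    seen = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in seen:
            seen.append(reduced)
    return seen


def natural_join(left, right):
    """Natural join: pair rows that agree on all shared attribute names."""
    out = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[k] == r_row[k] for k in shared):
                out.append({**l_row, **r_row})
    return out


# Hypothetical sample relations.
employees = [
    {"name": "Ada", "dept_id": 1},
    {"name": "Lin", "dept_id": 2},
    {"name": "Sam", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "dept": "Data"},
    {"dept_id": 2, "dept": "Ops"},
]

# SELECT name FROM employees NATURAL JOIN departments WHERE dept = 'Data'
plan = project(
    select(natural_join(employees, departments), lambda r: r["dept"] == "Data"),
    ["name"],
)
print(plan)
```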

Data Code 101 @datacode101.bsky.social · 8d

Edgar F. Codd. In 1970, while working at IBM, Codd published his paper “A Relational Model of Data for Large Shared Data Banks.” It introduced a new model for managing data, now accepted as the foundation of relational database management systems (RDBMS).
www.seas.upenn.edu/~zives/03f/c...
#sql

Data Code 101 @datacode101.bsky.social · 29d

AutoGen

Microsoft’s open-source framework for building cooperating multi-agent systems.

Extensible, Python-based, modular abstractions for agent loops, memory policies, dynamic routing, and tool APIs.

Use for: Programmable building blocks for multi-agent systems without committing to a vendor runtime.

Data Code 101 @datacode101.bsky.social · 29d

IBM Bee

IBM’s open-source, no-code plus Python/TypeScript framework.

Enterprise-grade orientation, open-source, multi-agent collaboration across heterogeneous implementations, focus on operational automation and ROI, steeper learning curve.

Use for: Alignment to IBM’s stack, deep governance.

Data Code 101 @datacode101.bsky.social · 29d

CrewAI

Role-based multi-agent framework with “crews” and flows, aiming to simplify production coordination.

Large community and fast iteration pace. Autonomous crews and deterministic flows, cost-aware patterns, self-hosted options.

Use for: Pragmatic multi-agent teams with clear roles and tasks

Data Code 101 @datacode101.bsky.social · 29d

AWS Agent Squad

AWS Labs open-source orchestrator for multi-agent teams.

Deploy collaborative agent systems on AWS. Intent classification, streaming support, parallel team coordination. Aligns with Bedrock Agents/Flows.

Use for: Teams on AWS wanting scalable, open-source orchestration

Data Code 101 @datacode101.bsky.social · 29d

LlamaIndex

Centered on retrieval-augmented generation (RAG) with agent capabilities, event-driven, multi-step workflows.

Data connectors, indexing strategies. Flexible data handling, but requires developer setup.

Use for: Retrieval-heavy agents needing answers over heterogeneous enterprise data
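
The retrieval step behind RAG can be sketched without any framework. This is a toy illustration that ranks documents by word overlap; it does not use LlamaIndex's API, and the `documents`, `retrieve`, and `build_prompt` names are hypothetical. Real systems typically use embeddings and a vector index instead of word overlap.

```python
# Hypothetical mini-corpus standing in for enterprise data sources.
documents = {
    "hr.txt": "vacation policy: employees get 25 days of paid leave",
    "it.txt": "vpn setup guide: install the client and sign in with sso",
    "fin.txt": "expense policy: submit receipts within 30 days",
}


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(":", " ").replace("?", " ").split())


def retrieve(question, k=1):
    """Return the names of the k documents sharing the most words with the question."""
    q = tokenize(question)
    scored = sorted(
        documents.items(),
        key=lambda item: len(q & tokenize(item[1])),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


def build_prompt(question):
    """Assemble the text that would be sent to a language model."""
    context = "\n".join(documents[name] for name in retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


top = retrieve("How many days of paid leave do employees get?")
print(top)
print(build_prompt("How many days of paid leave do employees get?"))
```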

Data Code 101 @datacode101.bsky.social · 29d

OpenAI Agents SDK

SDK that supersedes Swarm for agents on OpenAI's platform.

Tight integration with GPT models for fast prototyping, but operates within a closed ecosystem. Built-in external tools (web search, local file search, operator).

Use for: Rapidly building tool-using agents powered by GPT.

Data Code 101 @datacode101.bsky.social · 29d

Google ADK

Google’s open-source Agent Development Kit for building single- and multi-agent apps.

Develop reasoning AI agents with powerful cloud-scale tools. Optimized for Gemini and Vertex AI but model- and deployment-agnostic by design.

Use for: Enterprises building scalable agent teams

Data Code 101 @datacode101.bsky.social · 29d

LangGraph

A graph/state-machine-first orchestration layer for agents built on the LangChain ecosystem.

Build custom multi-agent workflows using graph structures.

Highly flexible for complex, flow-controlled tasks but requires coding skills.

Use for: Controlled, step-by-step agent interactions.
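
The graph/state-machine idea can be sketched in plain Python. This is not LangGraph's API; it is a minimal illustration, assuming each node is a function that updates a shared state and returns the name of the next node. All node names and the `run_graph` helper are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]


def draft(state: State) -> Optional[str]:
    """Hypothetical node: produce a draft, then hand over to review."""
    state["draft"] = f"draft about {state['topic']}"
    return "review"


def review(state: State) -> Optional[str]:
    """Hypothetical node: loop back to draft until two reviews are done."""
    state["reviews"] = state.get("reviews", 0) + 1
    return "finish" if state["reviews"] >= 2 else "draft"


def finish(state: State) -> Optional[str]:
    """Terminal node: mark the work as done and stop the graph."""
    state["done"] = True
    return None


NODES: Dict[str, Callable[[State], Optional[str]]] = {
    "draft": draft,
    "review": review,
    "finish": finish,
}


def run_graph(start: str, state: State, max_steps: int = 20) -> List[str]:
    """Walk the graph from `start`, returning the sequence of visited nodes."""
    node: Optional[str] = start
    path: List[str] = []
    while node is not None and len(path) < max_steps:
        path.append(node)
        node = NODES[node](state)
    return path


final_state: State = {"topic": "lakehouses"}
path = run_graph("draft", final_state)
print(path)
```

The explicit edges and the `max_steps` guard are what make the flow controllable: every transition is visible, and a cycle cannot run forever.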

Data Code 101 @datacode101.bsky.social · 29d

AI Agent Frameworks

The framework shapes how your agent thinks, acts, and connects to tools and data. LLMs are the brain, frameworks are the wiring connecting different parts.

Image by /in/rakeshgohel01

Data Code 101 @datacode101.bsky.social · Sep 15

Data Lakehouse

Model: Hybrid architecture combining a data lake's low-cost, flexible storage with a data warehouse's robust management (ACID transactions, schema enforcement).

Best For: BI and ML on a single platform. Analytics and data science workloads.

Examples: Databricks, Snowflake, Dremio.

Data Code 101 @datacode101.bsky.social · Sep 15

Data Lakehouses: Unified Analytics Platform #Lakehouse

Lakehouses combine the scalability and flexibility of data lakes with the reliability and governance of data warehouses. They eliminate data silos and duplication. Choose one when you prioritize long-term scalability, data variety, massive datasets, and real-time content.

Data Code 101 @datacode101.bsky.social · Sep 15

Data Warehouse

Model: Highly structured repository for filtered and transformed data (schema-on-write). Fast querying, single source of truth for historical analysis.

Best For: BI and reporting. Answering predefined business questions quickly and reliably.

Examples: Snowflake, BigQuery, Redshift.

Data Code 101 @datacode101.bsky.social · Sep 15

Data Warehouses: Structured Analytics Powerhouse #DWH

Data warehouses excel when your primary focus is business intelligence and structured reporting. They use a schema-on-write approach, meaning data must be cleaned and structured before storage. Choose one when you need strict data lineage and governance and your team has SQL skills.
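
Schema-on-write can be sketched as a validation gate in front of storage. This is a minimal illustration, assuming a fixed schema checked on every load; the `SCHEMA` columns and the `load` helper are hypothetical, and real warehouses enforce this through typed table definitions.

```python
# Hypothetical target schema: column name -> required Python type.
SCHEMA = {"order_id": int, "amount": float, "country": str}


def load(table, record):
    """Append the record only if it matches the schema exactly."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"columns {sorted(record)} do not match the schema")
    for column, expected in SCHEMA.items():
        if not isinstance(record[column], expected):
            raise TypeError(f"{column} must be {expected.__name__}")
    table.append(record)


warehouse = []
load(warehouse, {"order_id": 1, "amount": 9.5, "country": "PT"})

rejected = []
bad_records = [
    {"order_id": "2", "amount": 3.0, "country": "ES"},  # wrong type
    {"order_id": 3, "amount": 1.0},                     # missing column
]
for bad in bad_records:
    try:
        load(warehouse, bad)
    except (TypeError, ValueError) as exc:
        rejected.append(type(exc).__name__)

print(len(warehouse), rejected)
```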

Data Code 101 @datacode101.bsky.social · Sep 15

Data Lake

Model: A vast, centralized repository that stores enormous volumes of raw data in its native format (schema-on-read).

Best For: Data science, ML model training, exploratory analysis where questions are not yet defined.

Examples: Amazon S3, Azure ADLS, Google Cloud Storage.
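
Schema-on-read can be sketched as storing raw lines untouched and interpreting them only when a question is asked. This is a toy illustration; the `raw_lake` contents and the `read_temperatures` helper are hypothetical.

```python
import json

# Raw records kept exactly as they arrived, including inconsistent ones.
raw_lake = [
    '{"sensor": "a1", "temp_c": 21.5}',
    '{"sensor": "b2", "temperature": "19"}',
    "not json at all",
]


def read_temperatures(raw_lines):
    """Apply a schema at read time: tolerate both field names, skip junk."""
    readings = []
    for line in raw_lines:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # unreadable for this question, but still kept in the lake
        if not isinstance(doc, dict):
            continue
        value = doc.get("temp_c", doc.get("temperature"))
        if value is not None:
            readings.append((doc.get("sensor"), float(value)))
    return readings


readings = read_temperatures(raw_lake)
print(readings)
```

Nothing was rejected at write time; the cost of cleaning moves to whoever reads the data.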

Data Code 101 @datacode101.bsky.social · Sep 15

Data Lakes: Flexible Data Repository #DataLake

Data lakes store raw data in its native format, supporting structured, semi-structured, and unstructured data.

Choose one when you need flexibility for machine learning and data science and cost-effective storage, and when real-time BI and reporting aren't critical requirements.

Data Code 101 @datacode101.bsky.social · Sep 15

While operational databases are the engines running your day-to-day applications, large-scale analytical systems are designed not for rapid, small transactions, but for complex, large-scale queries and aggregations unlocking insights from vast amounts of historical information.

#DataEngineer #OLAP
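
The difference between the two workloads can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the query shapes, assuming a hypothetical `sales` table; SQLite is an embedded row store, not an analytical system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 25.0), ("north", 15.0), ("south", 5.0)],
)

# Operational (OLTP) shape: touch one row by key, quickly.
conn.execute("UPDATE sales SET amount = 12.0 WHERE id = 1")
one_row = conn.execute(
    "SELECT region, amount FROM sales WHERE id = 1"
).fetchone()

# Analytical (OLAP) shape: scan many rows and aggregate them.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(one_row, totals)
```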

Data Code 101 @datacode101.bsky.social · Sep 15

Graph Databases #GraphDatabases

Model: Nodes, edges, and properties to represent and query relationships.

Best For: Social networks, fraud detection, and recommendation engines.

Examples: Neo4j, Amazon Neptune, TigerGraph.
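
A relationship query can be sketched with an adjacency map and a breadth-first search. This is a toy friend-of-a-friend recommendation, assuming an undirected graph; the `edges` data and the `within_hops` helper are hypothetical, and a graph database would express the same traversal in its own query language.

```python
from collections import deque

# Hypothetical "knows" relationships between people.
edges = [("ana", "bob"), ("bob", "cy"), ("cy", "dee"), ("ana", "eve")]

# Build an undirected adjacency map: node -> set of neighbours.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def within_hops(start, max_hops):
    """Breadth-first search returning each reachable node's hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# People exactly two hops from ana: friends of friends she does not know yet.
suggestions = sorted(n for n, d in within_hops("ana", 2).items() if d == 2)
print(suggestions)
```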

Data Code 101 @datacode101.bsky.social · Sep 15

Time-Series Databases #TimeSeriesDatabases

Model: Optimized for time-stamped data points.

Best For: Monitoring systems, IoT sensor data, and financial market data.

Examples: InfluxDB, TimescaleDB, Prometheus.
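
A typical time-series operation, downsampling into fixed time buckets, can be sketched as follows. This is a minimal illustration, assuming points are `(epoch_seconds, value)` pairs; the sample data and the `downsample` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sensor readings: (epoch seconds, value).
points = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 23.0), (150, 30.0)]


def downsample(points, bucket_seconds):
    """Average the values that fall into each fixed-width time bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


per_minute = downsample(points, 60)
print(per_minute)
```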

Data Code 101 @datacode101.bsky.social · Sep 15

NoSQL Databases #NoSQL

Model: Varies by type—Key-Value, Document, Column-Family, or Graph.

Best For: Big data applications, real-time systems, and use cases needing high scalability and flexible schemas.

Examples: MongoDB, Apache Cassandra, Redis, DynamoDB, Couchbase.
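
Two of these models, key-value and document, can be sketched with plain Python structures. This is a toy illustration only; the `kv_store`, `documents`, and `find` names are hypothetical and do not correspond to any product's API.

```python
# Key-value model: opaque values looked up by a single key.
kv_store = {}
kv_store["session:42"] = "user=ana;ttl=300"

# Document model: self-describing records whose fields may differ.
documents = [
    {"_id": 1, "name": "Ana", "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Porto"}},
]


def find(docs, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        d for d in docs
        if all(d.get(field) == value for field, value in criteria.items())
    ]


match = find(documents, name="Bob")
print(kv_store["session:42"], match)
```

The flexible schema shows up in the data itself: the two documents share only `_id` and `name`, and no table definition had to change to store either one.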

Data Code 101 @datacode101.bsky.social · Sep 15

Relational Databases #RDBMS

Model: Structured tables with rows and columns (schema-on-write).

Best For: Transactional systems (OLTP), ERPs, and CRMs requiring ACID compliance.

Examples: PostgreSQL, MySQL, Microsoft SQL Server, Oracle.
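
The atomicity part of ACID can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming a hypothetical `accounts` table with a CHECK constraint; the `transfer` helper is not from the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("ana", 100), ("bob", 20)])
conn.commit()


def transfer(src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


ok = transfer("ana", "bob", 30)
failed = transfer("bob", "ana", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer, the credit to `ana` had already been applied when the debit violated the constraint; the rollback undoes it, so no money appears from nowhere.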

Data Code 101 @datacode101.bsky.social · Sep 15

Understanding the Types of Databases

Choosing the right database is a critical architectural decision. Each type is a specialized tool designed for a specific job.

Here’s a breakdown of the essentials:

Data Code 101 @datacode101.bsky.social · Sep 15

By stage:

- Ingestion → Kinesis (AWS), Event Hubs (Azure), Pub/Sub (GCP)
- Computation → EMR (AWS), Databricks (Azure), Dataproc/Dataflow (GCP)
- Data Warehouse → Redshift (AWS), Synapse/SQL (Azure), BigQuery (GCP)
- Presentation → QuickSight (AWS), Power BI (Azure), Colab/Looker (GCP)
