Lightnews — Scholar-powered news

Data Code 101

@datacode101.bsky.social

- Typically 30–60% fewer tokens than JSON
- Explicit lengths and fields enable validation
- Removes redundant punctuation (braces, brackets, most quotes)
- Indentation-based structure, like YAML, uses whitespace instead of braces
- Tabular arrays: declare keys once, stream data as rows

November 6, 2025 at 6:01 AM

Data Code 101

@datacode101.bsky.social

JSON:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
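
A minimal Python sketch of the tabular form shown above, for readers who want to try the conversion. The function name `to_toon_table` is hypothetical, and the two-space row indentation is an assumption about the layout. It only handles flat rows with identical keys and does no quoting or escaping, so it is an illustration rather than a full TOON encoder.

```python
def to_toon_table(name, rows):
    """Encode a non-empty list of flat dicts with identical keys as a TOON-style table."""
    if not rows:
        raise ValueError("this sketch only handles non-empty arrays")
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("tabular form needs the same keys, in the same order, in every row")
    # Header declares the array length and the field names once.
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    # Each row carries values only; indentation marks it as part of the array.
    lines = ["  " + ",".join(str(row[field]) for field in fields) for row in rows]
    return "\n".join([header] + lines)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(to_toon_table("users", users))
```

The declared length and field list in the header are what make validation possible: a reader can check that exactly two rows follow and that each has three values.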

November 6, 2025 at 6:01 AM

Data Code 101

@datacode101.bsky.social

RAG is not just an integration problem. It’s a design problem. Each layer of this stack requires deliberate choices that impact latency, quality, explainability, and cost.

If you're serious about GenAI, it's time to think in terms of stacks—not just models.

October 27, 2025 at 10:36 AM

Data Code 101

@datacode101.bsky.social

Evaluation

Tools like Ragas, TruLens, and Giskard bring much-needed observability: they measure hallucinations, relevance, grounding, and model behavior under pressure.

October 27, 2025 at 10:36 AM

Data Code 101

@datacode101.bsky.social

Text Embeddings

The quality of retrieval starts here. Open-source models (Nomic, SBERT, BGE) are gaining ground, but proprietary offerings (OpenAI, Google, Cohere) still dominate enterprise use.

October 27, 2025 at 10:36 AM

Data Code 101

@datacode101.bsky.social

Open LLM Access

Platforms like Hugging Face, Ollama, Groq, and Together AI abstract away infra complexity and speed up experimentation across models.

October 27, 2025 at 10:36 AM

Data Code 101

@datacode101.bsky.social

Data Extraction (Web + Docs)

Whether you're crawling the web (Crawl4AI, Firecrawl) or parsing PDFs (LlamaParse, Docling), raw data access is non-negotiable. No context means no quality answers.

October 27, 2025 at 10:36 AM

Data Code 101

@datacode101.bsky.social

Vector Database

Chroma, Qdrant, Weaviate, Milvus, and others power the retrieval engine behind every RAG system. Low-latency search, hybrid scoring, and scalable indexing are key to relevance.
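
To make the retrieval step concrete, here is a rough sketch of what a vector search does at its core: score every stored vector against the query and return the closest ones. This is brute-force cosine similarity in plain Python with hand-made vectors. Production vector databases typically use approximate nearest-neighbor indexes instead, and the names `cosine`, `top_k`, and the sample `index` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query, vector), doc_id) for doc_id, vector in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings"; a real system would get these from an embedding model.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))
```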

October 27, 2025 at 10:36 AM

Data Code 101

@datacode101.bsky.social

Frameworks

LangChain, LlamaIndex, Haystack, and txtai are now essential for building orchestrated, multi-step AI workflows. These tools handle chaining, memory, routing, and tool-use logic behind the scenes.

October 27, 2025 at 10:36 AM

Data Code 101

@datacode101.bsky.social

LLMs (Open vs Closed)

Open models like Llama 3, Phi-4, and Mistral offer control and customization. Closed models (OpenAI's GPT models, Claude, Gemini) bring strong performance with less overhead. Your tradeoff: flexibility vs convenience.

October 27, 2025 at 10:36 AM

Data Code 101

@datacode101.bsky.social

EtLT (Extract, transform, Load, Transform) (2/2)

Best for scenarios requiring strict data security/compliance (pre-load masking) while still benefiting from the speed and flexibility of cloud data warehouse transformations.

October 19, 2025 at 4:45 AM

Data Code 101

@datacode101.bsky.social

EtLT (Extract, transform, Load, Transform) (1/2)

EtLT attempts to balance the data governance of ETL with the speed and flexibility of ELT. A minimal transformation step is performed before loading, covering essential tasks such as data cleaning, basic formatting, and masking sensitive data for immediate compliance.
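
A small sketch of the EtLT shape, assuming an in-memory SQLite database as a stand-in for the warehouse. The helper names `mask` and `run_etlt` are hypothetical. The light "t" step replaces each email with a token before anything is stored, and the heavier "T" step then aggregates inside the database. A truncated unsalted hash is used only to keep the example short and should not be read as a recommended masking scheme.

```python
import hashlib
import sqlite3


def mask(value):
    """Replace a sensitive value with a short, stable token (illustrative only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def run_etlt(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_token TEXT, amount REAL)")
    # Small "t": mask before load, so raw emails never reach the warehouse.
    masked = [(mask(email), amount) for email, amount in rows]
    conn.executemany("INSERT INTO events VALUES (?, ?)", masked)
    # Big "T": aggregate with SQL inside the target system.
    query = (
        "SELECT user_token, SUM(amount) FROM events "
        "GROUP BY user_token ORDER BY user_token"
    )
    return conn.execute(query).fetchall()


print(run_etlt([("a@example.com", 5.0), ("a@example.com", 7.0), ("b@example.com", 1.0)]))
```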

October 19, 2025 at 4:45 AM

Data Code 101

@datacode101.bsky.social

ELT (Extract, Load, Transform) (2/2)

Transformation is implemented inside the target system (e.g., a modern cloud data warehouse like Snowflake or BigQuery, or a data lake). Highly scalable for massive and diverse (structured/unstructured) datasets.
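
As a rough illustration of transforming inside the target system, the sketch below loads messy rows untouched and then cleans and aggregates them with SQL. An in-memory SQLite database stands in for a cloud warehouse here, and the table names and the `run_elt` helper are hypothetical.

```python
import sqlite3


def run_elt(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    # Load: raw values go in exactly as received, untrimmed and untyped.
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    # Transform: cleaning and aggregation run as SQL in the target system.
    conn.execute(
        """
        CREATE TABLE order_totals AS
        SELECT LOWER(TRIM(customer)) AS customer,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw_orders
        GROUP BY LOWER(TRIM(customer))
        """
    )
    return conn.execute(
        "SELECT customer, total FROM order_totals ORDER BY customer"
    ).fetchall()


print(run_elt([(" Alice", "10.5"), ("alice ", "4.5"), ("Bob", "3")]))
```

Because the raw table is kept, the transformation can be rewritten and rerun later without extracting the data again.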

October 19, 2025 at 4:45 AM

Data Code 101

@datacode101.bsky.social

ELT (Extract, Load, Transform) (1/2)

Modern Approach: Became popular with the rise of cloud-native data warehouses offering cheap storage and elastic compute. Raw, unprepared data is loaded immediately, offering faster data ingestion and near real-time analytics.

October 19, 2025 at 4:45 AM

Data Code 101

@datacode101.bsky.social

ETL (Extract, Transform, Load) (2/2)

Transformation runs on a dedicated, separate staging server or processing engine outside the target data warehouse. Latency is typically higher, because data must wait for the transformation to complete before it is loaded.
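
For contrast, a minimal sketch of the ETL ordering: rows are cleaned in application code first, and only the finished result is written to the warehouse. Plain Python plays the role of the staging engine and an in-memory SQLite database plays the warehouse. The `transform` and `run_etl` names are hypothetical.

```python
import sqlite3


def transform(row):
    """Standardize one record outside the warehouse."""
    name, email = row
    return (name.strip().title(), email.strip().lower())


def run_etl(rows):
    # Transform: every row is cleaned before the warehouse sees it.
    clean = [transform(row) for row in rows]
    # Load: only the cleaned rows are inserted.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", clean)
    return conn.execute(
        "SELECT name, email FROM customers ORDER BY name"
    ).fetchall()


print(run_etl([("  Alice ", "Alice@Example.com"), ("BOB", "bob@example.com")]))
```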

October 19, 2025 at 4:45 AM

Data Code 101

@datacode101.bsky.social

ETL (Extract, Transform, Load) (1/2)

Traditional Approach: Older methodology common with on-premises data warehouses where compute was limited and expensive. Data is cleaned, standardized, and sensitive information can be masked before it enters the final warehouse.

October 19, 2025 at 4:45 AM
