Lightnews — Scholar-powered news

Diomidis Spinellis @coolsweng.bsky.social · 1d

Real backups must be automatically maintained as multiple encrypted, integrity-checked, tested, monitored, and documented historical copies, on different media, offsite, and offline or immutable.
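As an illustration of the "integrity-checked" property only, here is a minimal Python sketch that records SHA-256 digests of a source tree and later checks a backup copy against them. The function names (`sha256_of`, `build_manifest`, `verify_copy`) are hypothetical and not taken from the post or from any particular backup tool. Encryption, rotation, monitoring, and offsite storage are out of scope here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return the relative paths that are missing or altered in the copy."""
    problems = []
    for rel, expected in manifest.items():
        candidate = copy_root / rel
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel)
    return problems
```

A scheduled job could call `verify_copy` on each backup and raise an alert whenever the returned list is not empty, which also touches on the "tested" and "monitored" properties.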


Diomidis Spinellis @coolsweng.bsky.social · 1d

RAID, snapshots, Apple Time Machine, Windows Volume Shadow, cloud storage, availability zone mirroring, database replication, Git/GitHub, DRBD. These are all useful, but NOT true backups.


Diomidis Spinellis @coolsweng.bsky.social · 1d

Following the permanent loss of 858 TB of data due to a fire in a South Korean government datacenter, here's a reminder of several technologies that are NOT backup solutions:

koreajoongangdaily.joins.com/news/2025-10...

NIRS fire destroys government's cloud storage system, no backups available

A fire at the National Information Resources Service (NIRS) Daejeon headquarters destroyed the government’s G-Drive cloud storage system, erasing work files saved individually by some 750,000 civil se...

koreajoongangdaily.joins.com


Diomidis Spinellis @coolsweng.bsky.social · 5d

Saw the following hidden deep inside the Linux kernel arm compilation commands. I don't think that the arguments were arbitrarily ordered 😃

ar mPiT init/main.o vmlinux.a arch/arm/kernel/head.o


Diomidis Spinellis @coolsweng.bsky.social · 12d

Why I Choose Email Over Messaging

My colleagues and friends know that I prefer to communicate with them via email rather than chat messaging. There are many benefits in such a choice. You may want to consider them and adopt the same stance.

www.spinellis.gr/blog/2025092...


Diomidis Spinellis @coolsweng.bsky.social · 15d

The proposed changes are well reasoned and documented. More thought could have been given to asking for shorter proposals. A good proposal takes months to write, which is time not devoted to research.


Diomidis Spinellis @coolsweng.bsky.social · 19d

With all the GenAI buzz we tend to forget the value of 100% correct deterministic tools. Thank you Rust clippy!

Checking uu_sed v0.0.1 (/xfs/dds/sed/src/uu/sed)
error: manual implementation of `.is_multiple_of()`
   --> src/uu/sed/src/fast_io.rs:869:8
    |
869 |     if (out_off + head_align) % block_size != 0 {
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace with: `!(out_off + head_align).is_multiple_of(block_size)`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_is_multiple_of
    = note: `-D clippy::manual-is-multiple-of` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::manual_is_multiple_of)]`


Diomidis Spinellis @coolsweng.bsky.social · Sep 5

TIL that setting Turnitin checks to *include* bibliographic references (a check typically disabled, as it generates false-positive matches) can help identify hallucinated generative AI references (as these are less likely to appear elsewhere).


Diomidis Spinellis @coolsweng.bsky.social · Aug 30

A special place in Hell is reserved for those who create custom editing areas in their programs where the operating system's keyboard editing bindings, copy, paste, and our macros stop working.


Reposted by Diomidis Spinellis

Hannah Shelley, MLIS (Metadata, Lattes & Impostor Syndrome) @hannahshelley.bsky.social · Aug 13

Since search is dead, how soon do you think Google Scholar is headed for the Google Graveyard? I'm betting it's soon, and academia is NOT prepared

Google Scholar Is Doomed

Academia built entire careers on a free Google service with zero guarantees. What could go wrong?

hannahshelley.neocities.org


Diomidis Spinellis @coolsweng.bsky.social · Aug 27

Google Scholar's lack of an API hinders automation and scientometric studies. In common with all proprietary academic databases, searches are opaque and non-replicable. With the Alexandria3k Python package you can run sophisticated queries on your laptop. dspinellis.github.io/alexandria3k/

Logos of Crossref, DataCite, ORCID, ROR, PubMed, USPTO


Diomidis Spinellis @coolsweng.bsky.social · Aug 24

That's a first! As I write a program's error message, I catch myself thinking whether GenAI will be able to understand it and produce a useful suggestion. (It does.)

That message means your script `fetch-uspto.sh` is expecting an environment variable named **`MYODP_KEY`**, which hasn’t been set in your shell.

This usually happens when a script needs an API key to authenticate against the USPTO Open Data Portal (ODP).

To fix it, you need to:

1. **Obtain the API key**

* Go to the [USPTO Open Data Portal](https://developer.uspto.gov/data/bulk-search) and sign in (or register).
* Generate an API key.

2. **Set the environment variable** before running the script:

```bash
export MYODP_KEY="your_api_key_here"
./fetch-uspto.sh
```

Or prepend it inline:

```bash
MYODP_KEY="your_api_key_here" ./fetch-uspto.sh
```

3. If you want it to persist across sessions, add the line to your shell config (e.g., `~/.bashrc`, `~/.zshrc`):

```bash
export MYODP_KEY="your_api_key_here"
```

---

Do you want me to show you how to modify `fetch-uspto.sh` so that it gives clearer instructions (like printing how to set the variable), instead of just failing?
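The clearer-instructions idea raised in the response can be illustrated with a small Python sketch that fails early and tells the user how to set the missing variable. This is a hypothetical analog, not the contents of `fetch-uspto.sh`. The names `require_env` and `MissingKeyError` are invented for the example, and only the variable name `MYODP_KEY` comes from the text above.

```python
import os


class MissingKeyError(RuntimeError):
    """Raised when a required environment variable is not set."""


def require_env(name: str) -> str:
    """Return the variable's value, or raise with instructions for setting it."""
    value = os.environ.get(name, "")
    if not value:
        # The message names the variable and shows a command that would set it.
        raise MissingKeyError(
            f"Environment variable {name} is not set.\n"
            f'Set it before running, e.g.: export {name}="your_api_key_here"'
        )
    return value
```

A script would call `require_env("MYODP_KEY")` once at startup, so the failure appears before any network request is attempted.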


Diomidis Spinellis @coolsweng.bsky.social · Aug 14

Many years ago I contributed to #GraphViz code for several new node shapes, including UML notes. Now #ChatGPT advises me how to use them! 😃

Prompt: In UML create a comment note.

Response: In UML diagrams, a **comment** (or “note”) is shown as a rectangle with a dog-eared corner. Graphviz can mimic this by using `shape=note`.

Example in DOT:

```dot
digraph UML_Comment {
    rankdir=LR;
    node [fontname="Arial", fontsize=12];

    // Regular UML component
    A [label="«component»\nOrder Service", shape=component];

    // UML comment
    Cmt [label="Processes orders\nand handles payments", shape=note, style=filled, fillcolor="#ffffcc"];

    // Dashed association from comment to element
    Cmt -> A [style=dashed, arrowhead=none];
}
```

**Key points:**

* `shape=note` → dog-eared corner rectangle.
* `fillcolor` → makes the note background match UML style (light yellow is common).
* `style=dashed, arrowhead=none` → matches UML’s “annotates” relationship from a comment to a model element.

---

If you want, I can insert UML comment nodes into your **O1/O2 Graphviz diagram** to annotate entire clusters or specific tasks, so they look like sticky notes connected to the relevant part. Would you like me to do that?


Reposted by Diomidis Spinellis

Peter Hilton @hilton.org.uk · Aug 9

This kind of thing looks like an opportunity for a new generation of developers to make their mark in open source

Diomidis Spinellis @coolsweng.bsky.social · Aug 8

What are the challenges and gains when porting legacy systems software written in C into #Rust? Here's my own experience porting the BSD Unix / macOS stream editor sed into Rust as part of #uutils.
IEEE Software article: doi.org/10.1109/MS.2...
Source code: github.com/uutils/sed/

GitHub - uutils/sed: Rewrite of sed in Rust

Rewrite of sed in Rust. Contribute to uutils/sed development by creating an account on GitHub.

github.com



Diomidis Spinellis @coolsweng.bsky.social · Aug 7

I'm sure psychologists have a diagnostic name for this disorder.


Diomidis Spinellis @coolsweng.bsky.social · Jul 9

Ever wondered why car ergonomics have gone down the drain as touch screens replaced buttons to drive down costs? This reader comment on an @financialtimes.com article on the topic says it all.

https://www.ft.com/content/b46311ba-a5a1-4754-8d40-eaae475f01eb

I’m an industrial designer and I used to work in the design department at a major car manufacturer. I’m a petrol head, and have been my whole life. By now, I’ve probably driven over a hundred different vehicles across various generations of interface technology.

Here’s the thing: the user experience designers in my team — the ones tasked with designing the digital interfaces now frequently deployed and rightly deplored in this article — had little-to-no interest in cars. In many cases, they were only working for a car maker because they couldn’t get the jobs they really wanted in tech companies. But the kicker was that many of them didn’t have drivers’ licenses, let alone any interest in how cars really work. All they saw was a screen, abstracted from the broader reality of operating a 2-tonne projectile in a world of sometimes moving, sometimes fleshy, and sometimes immovable targets.


Diomidis Spinellis @coolsweng.bsky.social · Jun 27

My coding interactions with LLMs taught me that benchmark results vastly overstate their capabilities. The paper by M. Mancoridis et al. “Potemkin Understanding in LLMs” explains and formalizes my feeling. Benchmarks derived from human tests are unsuitable for LLM assessment. doi.org/10.48550/arX...


Diomidis Spinellis @coolsweng.bsky.social · Jun 26

This week two US courts ruled in favor of AI's use of copyrighted works to train LLMs. How is such a use legal? Should it be legal? Two questions with very different answers. More (and an explanation of the audio cassette's relevance) at www.spinellis.gr/blog/20250626/

A Sony compact cassette. https://en.wikipedia.org/wiki/Cassette_(format)#/media/File:Sony_C60_Low_Noise_Compact_Cassettte.jpg by Mikus


Reposted by Diomidis Spinellis

Dorothy Bishop @deevybee.bsky.social · Jun 19

Predatory journals should not mess with Diomidis Spinellis (@coolsweng.bsky.social)
doi.org/10.1186/s410...

N.B. this journal, full of AI-generated crap, has DOIs and is listed in Crossref, but not DOAJ.

False authorship: an explorative case study around an AI-generated article published under my name - Research Integrity and Peer Review

Background The proliferation of generative artificial intelligence (AI) has facilitated the creation and publication of fraudulent scientific articles, often in predatory journals. This study investig...

doi.org


Diomidis Spinellis @coolsweng.bsky.social · Jun 13

The fascinating story of how Microsoft Office developers moved from a proprietary version control system to Git. danielsada.tech/blog/carreer...

Part 7: Office Migration from Source Depot to Git, or how I learned to love DevEx. | Daniel Sada Caraveo | Developer Productivity & Culture

Part 7 of my software journey, getting to know developer experience.

danielsada.tech


Diomidis Spinellis @coolsweng.bsky.social · Jun 11

Here's how machine learning can be applied to detect risky software changes and thereby increase the reliability of a large bank's IT systems. medium.com/ing-blog/lev...

Leveraging AIOps for Enhanced Change Reliability at ING

From an Idea to Moving to Production

medium.com


Diomidis Spinellis @coolsweng.bsky.social · May 29

Wow! The Rust compiler correctly guessed the method I should be calling. Helpful and very impressive error handling!


Diomidis Spinellis @coolsweng.bsky.social · May 27

In 2023 a fake AI-generated scientific article was falsely attributed to me. I was not the scheme's only victim. My study, just published in the Research Integrity and Peer Review journal, analyzes what is going on and who profited from the mass publication of AI-generated articles. rdcu.be/enQqG

Title and abstract page: False authorship: an explorative case study around an AI-generated article published under my name


Reposted by Diomidis Spinellis

Adolfo Neto @adolfont.github.io · May 23

Manifesto on Technical Debt published🚀.
It was the result of a Schloss Dagstuhl – Leibniz-Zentrum für Informatik (LZI) workshop.

We value:

Psychological safety and trust between technical and business stakeholders.
Simple, actionable, value-based communication of Technical Debt to all stakeholders over excessive, minute, overwhelming details.
Transparency, explainability, and replicability in the identification, measurement, and prioritization of Technical Debt.
Both objective and subjective data collection and research methods to identify, measure, and prioritize Technical Debt over single-method approaches.
Architecture understanding across the team.

We believe that:

Sustainable software delivery requires proactive and continuous Technical Debt Management.
The most effective Technical Debt Management is as automatic as possible and as manual as needed.
Technical Debt must be managed irrespective of how the software system is created, including generation of code and other artifacts by AI-assisted tools.
Items that developers observe as Technical Debt should be addressed, even when they are not supported by metrics.
Not all issues identified by stakeholders or tools are Technical Debt.

And we adhere to the following principles:

Share Responsibility for Technical Debt Management.
Manage Technical Debt in Alignment with its Context.
Collect Comprehensive Data for Technical Debt Management.
Avoid a One-Size-Fits-All Technical Debt Metric.
Build Seamless and Integrated Technical Debt Management Toolchains with Human Oversight.
Make Technical Debt Visible.
Elevate the Role of Architecture in Technical Debt Management.
Develop Fit-For-Purpose Technical Debt Benchmarks.
Make Intentional Technical Debt Trade-off Decisions.
