Sean Pedrick-Case
seanpedrickcase.bsky.social
Data science and AI to benefit people and society. Data scientist in local government in the UK.
Works with PDF, images, and tabular data as CSV/XLSX files. You can also chat over the documentation with a free chatbot here @hf.co: huggingface.co/spaces/seanp...
Light PDF web QA chatbot - a Hugging Face Space by seanpedrickcase
Ask questions based on content from uploaded PDFs, web pages, or data files (.csv / .xlsx). The app retrieves and processes relevant text to provide answers, allowing you to switch between differen...
May 1, 2025 at 9:19 AM
The app has comprehensive features to review and modify redactions. You can also fuzzy search through extracted text, identify duplicate pages in your documents, or export to Adobe Acrobat to modify suggested redactions there. GitHub repo and user guide: github.com/seanpedrick-...
May 1, 2025 at 9:19 AM
This augmented topics table is then presented to the LLM with the next batch, so the table grows iteratively as the app works through the dataset. You can see the prompts used and other settings on the LLM Settings tab. 7/7
December 12, 2024 at 2:43 PM
How does it work? The LLM is called iteratively on a 'batch' of open text responses from the data. The model assigns each response to an existing topic, or to a new topic if none are relevant. The LLM returns a markdown table of the topics alongside the relevant response rows. 6/7
December 12, 2024 at 2:43 PM
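The loop described in posts 6/7 and 7/7 can be sketched roughly as below. This is a minimal illustration and not the app's actual code: the function names (`parse_markdown_table`, `extract_topics`, `fake_llm`), the prompt wording, and the two-column table layout are all assumptions, and `fake_llm` is a keyword-matching stand-in so the example runs without a real model.

```python
from typing import Callable, Dict, List, Optional


def parse_markdown_table(table: str) -> Dict[str, List[int]]:
    """Parse a two-column markdown table (topic | response rows) into a dict."""
    topics: Dict[str, List[int]] = {}
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip malformed lines, the header row, and the |---|---| separator row.
        if len(cells) != 2 or cells[0].lower() == "topic" or set(cells[0]) <= set("-: "):
            continue
        rows = [int(r) for r in cells[1].split(",") if r.strip().isdigit()]
        topics.setdefault(cells[0], []).extend(rows)
    return topics


def extract_topics(
    responses: List[str],
    llm: Callable[[str], str],
    batch_size: int = 5,
    seed_topics: Optional[List[str]] = None,
) -> Dict[str, List[int]]:
    """Call the LLM batch by batch, carrying the growing topic list forward."""
    # Optional user-supplied topics (the 'zero shot' list) start with no rows.
    topics: Dict[str, List[int]] = {t: [] for t in (seed_topics or [])}
    for start in range(0, len(responses), batch_size):
        batch = responses[start:start + batch_size]
        topic_list = "\n".join(f"- {t}" for t in topics) or "(none yet)"
        numbered = "\n".join(f"{start + i}: {text}" for i, text in enumerate(batch))
        # Hypothetical prompt wording; the real prompts are shown in the app.
        prompt = (
            f"Existing topics:\n{topic_list}\n\n"
            "Assign each response to an existing topic, or create a new one "
            "if none are relevant. Return a markdown table.\n\n"
            f"Responses:\n{numbered}"
        )
        # Merge this batch's table into the running topics table.
        for topic, rows in parse_markdown_table(llm(prompt)).items():
            topics.setdefault(topic, []).extend(rows)
    return topics


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: tags rows by keyword so the loop runs offline."""
    lines = ["| Topic | Response rows |", "|---|---|"]
    for line in prompt.split("Responses:\n", 1)[1].splitlines():
        row, text = line.split(": ", 1)
        topic = "Parking" if "parking" in text.lower() else "Other"
        lines.append(f"| {topic} | {row} |")
    return "\n".join(lines)


responses = [
    "Not enough parking on Main Street",
    "The flats are too tall",
    "Parking will get worse",
]
result = extract_topics(responses, fake_llm, batch_size=2)
print(result)
```

In real use, `fake_llm` would be swapped for a call to a local or API-served model, and the parser would need to cope with whatever table layout that model actually returns.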
This app is best for small(ish) open text datasets with hundreds to a few thousand rows, where 'traditional' topic modelling struggles to find enough data to create useful topics. You can compare with my BERTopic-based topic modelling app here: huggingface.co/spaces/seanp... 5/7
Topic modelling - a Hugging Face Space by seanpedrickcase
December 12, 2024 at 2:43 PM
Here are a couple of suggested datasets to test it with: a dummy consultation for flats to be built on Main Street, along with an example 'zero shot' topics file to test: huggingface.co/datasets/sea.... And some dummy social care case notes: huggingface.co/datasets/sea... 4/7
seanpedrickcase/dummy_development_consultation at main
December 12, 2024 at 2:43 PM
You can also give the app your own list of topics (zero shot). The LLM assigns responses to these by default, unless it finds novel topics. The app uses Gemma 2B Instruct locally, or Google Gemini models / Claude models served on AWS via API. 3/7
December 12, 2024 at 2:43 PM
By judging sentiment and producing summaries for each topic, the app can pick up on more nuanced aspects of topics than 'traditional' topic modelling approaches based on clustering (such as the excellent BERTopic by @maartengr.bsky.social). 2/7
December 12, 2024 at 2:43 PM