Diensttagebuch

Day 2446 (12 Sep 2025)

Downloading stuff from HF hub through huggingface cli

 pip install -U "huggingface_hub[cli]"
 
 #either of
 hf auth login 
 hf auth login --token $HF_TOKEN
 
 # models
 hf download adept/fuyu-8b --cache-dir ./path/to/cache

// TODO — vllm — will it be VLLM_CACHE_ROOT or HF_HOME?

Also: Troubleshooting - vLLM — they literally recommend getting it first via hf cli and passing the full path

Day 2445 (11 Sep 2025)

pydantic validation and fields and assignments

zc
zc/it

Lost cumulatively hours on these things this months.

MODEL_CONFIG = ConfigDict(
    serialize_by_alias=True,  # why doesn't this, alone, work?
)

Guess why? Because I have pydantic 2.10, the config above was introduced in 2.11, and it just quietly allows me to set this config value. (Configuration - Pydantic) (Ty ty for picking up on this)

Next. Configuration - Pydantic

ConfigDict(
    arbitrary_types_allowed=False,  # disallow obj.invalid_field = "whatever"
)

For my own models as well. Setting obj.name='test' when you want obj.step_name is almost never a good idea.

And again about serialize_by_alias: will be default in pydantic v3, which I welcome, because if you forget to model_dump(by_alias=True) then the model will be dumped with unexpected names, which will then be quietly deleted when you try to initialize a new model from that dict through e.g. NewModel(**old_model.model_dump()). (Should’ve validated anyway, but…)

Day 2444 (10 Sep 2025)

pdb and pdbpp aliases and configs

~/.pdbrc gets read by both of them, and can import stuff and use aliases!

# ! makes it python code to be executed
!import rich


# # alternative if not !importing it above in one comman:d
# alias P !import rich; rpprint(%1)
alias I rich.inspect(%1)
alias ppp rpprint(%1)

print("Custom commands:")
print("\t i $thing — rich inspect $thing")
print("\t ppp $thing — rich pretty print $thing")

EDIT: the above works only if rich is already imported.

Day 2438 (04 Sep 2025)

More uv things

I found out about uv self upgrade which took me from 0.5 to 0.8, read the CHANGELOGs etc. and many neat things exist.

Upgrading packages:
- .. is confusing: Add option to upgrade all packages in the environment, e.g., upgrade --all · Issue #1419 · astral-sh/uv
- change uv.lock only, not pyproject.toml
  - uv add -U pydantic one package
  - uv lock --upgrade all of them?
Testing packages not yet on pypi/…:
- uv build package you want to use
- in the other project: uv add ../other/dist/package-2.4.0-py3-none-any.whl
- when doen, just delete from pyproject lines like:
```
[tool.uv.sources]
coral-ucf = { path = "../other/dist/package-2.4.0-py3-none-any.whl" }
```
- (see also: 250804-1444 Using a private gitlab repository with uv)

Day 2437 (03 Sep 2025)

Effective presentations

Copied from the old Semantic Wiki. Half of the old links are gone, yet another reminder of bit rot and memento mori.

Day 2436 (02 Sep 2025)

Jless as less for jq json data

Since forever I have some variations of this:

function lq
    command jq . "$argv" -C | less -r
end

jless - A Command-Line JSON Viewer is this, but much better. Intuitive CLI viewer for json data, works seamlessly for large JSONs (which sometimes breaks my lq command, especially for large lines)

Now:

function lq
	echo "not again, Serhii!!! Use jless/jl"
    command jless -r -m line "$argv"
    # command jq . "$argv" -C | less -r
end

Jless

m toggles the mode between line-mode (jq-like) and the (default) data mode
-r does relative numbers! -N disables any line numbers
h opens the help, the usual vim things do what I expect
- H focuses the parent
Yank/copy (all below can be pX to just print the value)
- yy !!! copy the value, including "
- yk copy the KEY under cursor
- yp copy the path from root to current object

It does yaml as well!

GPU memory requirements rules of thumb

zc
zc/it

Long overdue, will update this page as I find better options.

Tutorials

Transformer Math 101 | EleutherAI Blog AWESOME — TODO

Links

LLaMA 7B GPU Memory Requirement - 🤗Transformers - Hugging Face Forums
- 7B full precision => $7*4=28gb$ of GPU RAM
  - quantization: torch_dtype=torch.float16 etc. to use half the memory
  - this is for inference, training requires a bit more.
- Why *4? storing weights+gradient, better explanation at that link.
- based on optimizer, might be *8 etc.
HF: GPU, more about training but less about mental models

Calculators

Model Memory Utility - a Hugging Face Space by hf-accelerate
- manually select ’transformers'
- results for llama-2.3-1B are 4.6GB total size, 18gb for training
  - doesn’t match info above, but matches e.g. Hardware requirements for Llama 3.2 3B with full context 128k? : r/LocalLLaMA — LLM Model VRAM Calculator - a Hugging Face Space by NyxKrage

Day 2432 (29 Aug 2025)

Old code to generate Ukrainian eval task for feminitives

Old langchain code for generating pairs of questions about feminitive usage, didn’t use it in my thesis but don’t want to lose it

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.pydantic_v1 import BaseModel, Field, validator
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate
from langchain.prompts import ChatPromptTemplate
from langchain.schema import BaseOutputParser
from langchain.output_parsers.json import SimpleJsonOutputParser

from tqdm import tqdm

#  from json import loads
from typing import List

from rich import print, inspect

b = breakpoint

# https://openai.com/pricing?ref=ghostcms.tenten.co
MODEL = "text-ada-001"  # cheap, bad
MODEL = "text-davinci-003"  # acceptable
MODEL = "gpt-4"
MODEL = "gpt-3.5-turbo-1106"  # 'capable, cost-effective'

WOMEN_VARIANTS: str = "дівчина, моя сестра, моя жінка, колишня однокласниця, дочка маминої подруги, імена (Марія, Марія Петрівна, Кассандра, та ін.)"
COMPLETE_PROMPT: str = """Наведи будь-ласка {N_PROFS} однозначні короткі дефініції цій професії або слову, так, щоб по ним було однозначно очевидно про яку саме професію йде мова.

Зміни дефініції так, щоб вони стали фразами, де мова однозначно йде про жінку.  Придумай різні варіанти жінок, про яких йде мова, умовно: {WOMEN_VARIANTS}. Але придумай і свої різноманітніші приклади. 

Уникай використання самого слова чи поняття у визначеннях. Уникай слів 'фахівецька' чи 'спеціалістка'.

Наприклад:
Актор: "Моя жінка виконує ролі на сцені чи екрані" 
Акушерка: "Марія Петрівна допомагає при пологах"
Автор: "Я знаю дівчину, яка пише твори та книжки". 

Будь творчим. Але професія, про яку іде мова, має все рівно бути однозначно зрозумілою.
"""

FORMAT_INSTRUCTIONS = """
Формат виводу - JSON. Обʼєкт виглядати таким чином:
{
    "profession": "", 
    "description_f": ["", ..., ""] 
}

В полі description_f список всіх згенерованих дефініцій для цієї професії.

Виводь тільки код JSON, без ніяких додаткових даних до чи після.
"""

INSTRUCTIONS_GENDER_CHANGE = """Я писатиму речення про професію про жінку. Зміни 
речення так, щоб мова йшла про чоловіка, а не жінку, не міняючи сам опис професії.
Імʼя чи опис жінки можеш міняти як завгодно, головне щоб на виході було речення
про чоловіка. """

def get_model(model_name = None):
    model = OpenAI(model_name=model_name, temperature=0.0)
    return model

def run_and_parse(model, profession: str, n_profs: int | str = 3, women: str = WOMEN_VARIANTS):
    prompt = PromptTemplate(
        template="{complete_prompt}\n{format_instructions}\n Професія, яку потрібно описати: {query}\n",
        input_variables=["query"],
        partial_variables={
            "format_instructions": FORMAT_INSTRUCTIONS,
            "complete_prompt": COMPLETE_PROMPT.format(
                N_PROFS=n_profs, WOMEN_VARIANTS=women
            ),
        },
    )

    json_parser = SimpleJsonOutputParser()
    #  prompt_and_model = prompt | model | json_parser
    prompt_and_model = prompt | model
    model_output = prompt_and_model.invoke({"query": profession})
    output = json_parser.parse(model_output)
    return output

def run_and_parse_gender_change(model, profession_description: str):
    prompt = PromptTemplate(
        template="{complete_prompt}\n Речення наступне: {query}\n",
        input_variables=["query"],
        partial_variables={
            #  "format_instructions": FORMAT_INSTRUCTIONS,
            "complete_prompt": INSTRUCTIONS_GENDER_CHANGE
            },
    )

    #  json_parser = SimpleJsonOutputParser()
    #  prompt_and_model = prompt | model | json_parser
    prompt_and_model = prompt | model
    model_output = prompt_and_model.invoke({"query": profession_description})
    output =model_output
    #  b()
    #  output = json_parser.parse(model_output)
    return output

def generate_descriptions(model, profession: str, n_profs: int | str = 3, women: str = WOMEN_VARIANTS, do_male_version:bool = False):
    desc = run_and_parse(model=model, profession=profession, n_profs =n_profs, women=women)
    if do_male_version:
        description_male = list()
        for d in desc['description_f']:
            changed = run_and_parse_gender_change(model=model, profession_description=d)
            description_male.append(changed)
        desc['description_m'] = description_male
    return desc


def run():
    professions_raw = """
    абстракціоністка
    автомобілістка
    авторка
    """
    """
    агрономка
    адвокатка
    анархіст
    англієць
    антрополог
    асистентка
    астронавт
    аптекар
    """
    profs = [x.strip() for x in professions_raw.splitlines()]

    model=get_model(MODEL)

    results = list()
    for p in tqdm(profs):
        r = generate_descriptions(model=model, profession=p, n_profs=2)
        print(r)
        results.append(r)
    print(results)


if __name__ == "__main__":
    run()

    {
        'profession': 'лікар',
        'description_f': [
            'Моя сестра працює в лікарні та лікує хворих',
            'Дочка маминої подруги є лікарем та допомагає людям'
        ]
    },
    {
        'profession': 'абстракціоністка',
        'description_f': ['Моя сестра створює картини, які відображають абстрактні ідеї та почуття',
        'Дівчина, яку я знаю, малює абстракціоністські полотна'
        ]
    },    {
        'profession': 'автомобілістка',
        'description_f': [
            'Моя сестра вміє водити автомобіль',
            'Дочка маминої подруги працює водієм'
        ]
    },
    {
        'profession': 'авторка',
        'description_f': [
            'Моя сестра пише книги та статті',
            'Дочка маминої подруги є відомою письменницею'
        ]
    },
    {
        'profession': 'Вчитель',
        'description_f': [
            'Моя сестра працює в школі та навчає дітей',
            'Дочка маминої подруги викладає університетські предмети'
        ]
    }
]

Day 2421 (18 Aug 2025)

Active training and active testing in ML

[2103.05331] Active Testing: Sample-Efficient Model Evaluation
- <_(@kossenactivetesting2021) “Active Testing: Sample-Efficient Model Evaluation” (2021) / Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth: z / http://arxiv.org/abs/2103.05331 / 10.48550/arXiv.2103.05331 _>¹
- Introduces active testing
[2101.11665] On Statistical Bias In Active Learning: How and When To Fix It the paper describing the estimator for AL in detail, the one that removes bias when sampling
[2508.09093] Scaling Up Active Testing to Large Language Models
- <_(@berradascalingactive2025) “Scaling Up Active Testing to Large Language Models” (2025) / Gabrielle Berrada, Jannik Kossen, Muhammed Razzak, Freddie Bickford Smith, Yarin Gal, Tom Rainforth: z / http://arxiv.org/abs/2508.09093 / 10.48550/arXiv.2508.09093 _> ²
- adapts it to LLMs to LLMs

OK, so.

Active testing¹

Active learning: picking the most useful training instances to the model (e.g. because annotation is expensive, and we pick what to annotate)
Active testing: pick the most useful subset of testing instances that approximates the score of the model on the full testing set.
We can test(=I use label below) only a subset of the full test set, $D_{test}^{observed}$ from $D_{test}$
We decide to label only a subset $N>M$ of the test instances, and not all together, but one at a time, because then we can use the information of the already labeled test instances to pick which next one to label
It calls these (estimated or not) test scores test risk, $R$ .

Strategies

Naive: we uniformly sample the full test dataset
- BUT if $M«N$, its variance will be large, so though it’s uniformly/unbiasedly sampled, it won’t necessarily approximate the real score well
Actively sampling, to reduce the variance of the estimator.
- Actively selecting introduces biases as well if not done right
  - e.g. active learning you label hard points to make it informative, but in active testing testing on hard points will inflate the risk

Active sampling

sth monte carlo etc. etc.
TODO grok the math and stats behind all of this.
- follows my understanding that might be wrong
- The [2101.11665] On Statistical Bias In Active Learning: How and When To Fix It paper that introduces $R_{lure}$ is might be a starting point but at first sight it’s math-heavy
TL;DR pick the most useful points, and get rid of the biases this introduces (e.g. what if they are all complex?) by using smart auto-correcting mechanisms (=‘unbiased estimator’)
You have some $q(i_m)$ strategy that picks interesting samples, though they might bias the result.
- $q(i_m)$ is actually shorthand for $q(i_m, i_{1:m-1}, D_{test}, D_{train})$ — so we have access to the the datasets and the already aquired test data.
$R_{LURE}$: An unbiased estimator that auto-corrects the discribution to be closer to the one expected if you iid-uniformly sample the test dataset.
ChatGPT on the intuition behind this:
- also:
This is the classic promise of importance sampling: move probability mass toward where the integrand is large/volatile, then down-weight appropriately.
- also:
Bottom line: active testing = (i) pick informative test points stochastically with $q$; (ii) compute a weighted Monte Carlo estimate with $v_m$ to remove selection bias; (iii) enjoy lower variance if $q$ is well-chosen.

<_(@kossenactivetesting2021) “Active Testing: Sample-Efficient Model Evaluation” (2021) / Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth: z / http://arxiv.org/abs/2103.05331 / 10.48550/arXiv.2103.05331 _> ↩︎ ↩︎
<_(@berradascalingactive2025) “Scaling Up Active Testing to Large Language Models” (2025) / Gabrielle Berrada, Jannik Kossen, Muhammed Razzak, Freddie Bickford Smith, Yarin Gal, Tom Rainforth: z / http://arxiv.org/abs/2508.09093 / 10.48550/arXiv.2508.09093 _> ↩︎

Day 2410 (07 Aug 2025)

fish arguments and functions

Below:

h/help is a flag
o/output-dir gets+requires a value
or return fails on unknown -a/--args
If found, the value is removed from $argv and put into _flag_argname
set -query (tests if var is set) -local

argparse h/help o/output-dir= -- $argv 
or return

if set -ql _flag_help
	echo "Usage: script.fish [-o/--output-dir <dir>] <input_files_or_directory>"
	echo "Example: script.fish --output-dir /path/to/output file1.png directory_with_pngs/ dir/*.png"
	exit 0
end

if set -ql _flag_output_dir
	set output_dir $_flag_output_dir
end


for arg in $argv
	# do something with file $arg
end