In the middle of the desert you can say anything you want
When using pandoc to convert (in my case) markdown to LaTeX, it generates only the body text, without the preamble (\documentclass, \usepackage etc.), and the result fails when compiled with pdflatex.
To generate a standalone file there’s the -s/--standalone option:
pandoc -f markdown -t latex -o bench.tex -s 2022-11-20-221120-1419-benchmark-tasks-for-evaluation-of-language-models.md
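With -s, pandoc wraps the body in its template, producing a compilable file. A minimal sketch of the kind of structure the standalone output has (the actual template pulls in many more packages and varies by pandoc version):

```latex
\documentclass{article}
% pandoc's real template adds many more packages (hyperref etc.);
% this is only a sketch of the standalone structure
\usepackage[utf8]{inputenc}
\begin{document}
% ...the body that a non-standalone conversion would emit...
\end{document}
```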
Conclusions should synthesize the results of your paper and separate what is significant from what is not.
Had a discussion with a friend about this: I once didn't want to set up a more complex solution because I didn't feel like learning it, but I did want to understand what I'm running - especially what I consider my core infrastructure.
So I ended up using a sub-optimal solution that I understand.
Stumbled upon this bit that phrases the concept in a better way:
I would recommend gitea to anyone looking at gitlab and vice versa. These two are very similar. I think that blindly running either of them in a container just because you can is asking for trouble though. Go through the manual installation and know how to set things up from scratch. If you can’t do that, you shouldn’t run it, because you won’t be able to fix it when things go wrong. You want a sysadmin that knows how to set these up and how to manage them, back them up, and fix problems along the way.1
Previously: 221119-2306 LM paper garden has more context about such metrics, 221204-2349 Interesting block with explanations of ML stuff has the compression angle for it.
Dumping these here for now.
The GPT21 paper puts it like this:
“Results on language modeling datasets are commonly reported in a quantity which is a scaled or exponentiated version of the average negative log probability per canonical prediction unit - usually a character, a byte, or a word.”
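As a toy illustration of that quote (all numbers made up), turning an average negative log-likelihood into perplexity and bits-per-character:

```python
import math

# Toy number (made up): average negative log-likelihood per token, in nats.
nll_per_token = 3.0

# Perplexity: the exponentiated average negative log probability.
ppl = math.exp(nll_per_token)

# Bits-per-character: the same quantity rescaled to bits and to a
# per-character unit (assuming, say, 4 characters per token on average).
chars_per_token = 4
bpc = nll_per_token / math.log(2) / chars_per_token

print(round(ppl, 2), round(bpc, 3))
```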
GPT-2 (Metrics : PPL, BPB, BPC) led me to:
Evaluation Metrics for Language Modeling is really detailed.
Vaclav Kosar’s Software & Machine Learning Blog, sample: OpenAI’s DALL-E 2 and DALL-E 1 Explained. Found it originally through Bits-Per-Byte and Bits-Per-Character.
Software engineering, ML, Thinkpad P52 Disassembly - Categories. Often with nice graphics.
Close in spirit, randomness, and citing-your-sources to this/my DTB, but way more in depth. The most brilliant part is the big “Ask or report a mistake” button.
I should do in-depth stuff more often.
…And resurrect my link wiki, and go back to the pre-war tradition of reading RSS feeds :(
The GPT31 paper mentioned that it’s 10x bigger than any previous non-sparse LM.
So - sparse LMs are LMs with A LOT of params where only a subset is used for each incoming example.2
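A toy sketch of that "only a subset of params per example" idea, in the style of a mixture-of-experts layer with top-1 routing (hypothetical illustration, not from the GPT-3 paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d = 4, 8
# Each "expert" has its own weight matrix; total params grow with n_experts
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate = rng.standard_normal((d, n_experts))  # gating weights

def sparse_forward(x):
    scores = x @ gate                 # one score per expert
    chosen = int(np.argmax(scores))   # top-1 routing: only one expert runs
    return chosen, x @ experts[chosen]

x = rng.standard_normal(d)
chosen, y = sparse_forward(x)
# Only 1 of the 4 expert matrices was touched for this example
print(chosen, y.shape)
```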
To pass a custom dockerfile, add -f custom_filename:
docker build . -f custom.Dockerfile -t tag:latest ....
Dockerfile naming conventions exist: Dockerfile Naming Convention and Organization – mohitgoyal.co, quoting options from there:
myapp.azure.dev.Dockerfile
myapp.gcp.dev.Dockerfile
myapp.aws.dev.Dockerfile
-
Dockerfile.myapp.azure.dev
Dockerfile.myapp.i386.azure.dev
Dockerfile.myapp.amd.azure.Dev
From that article I learned that Dockerfiles don’t have to be inside build context anymore! Link: Allow Dockerfile from outside build-context by thaJeztah · Pull Request #886 · docker/cli · GitHub
TL;DR from there
$ docker build --no-cache -f $PWD/dockerfiles/Dockerfile $PWD/context
redis-cli set test 1 etc. immediately work - did it start a server in the background?
systemctl disable redis-cli etc.!
redis-cli starts in interactive mode!
fish shell:
> r
127.0.0.1:6379> multi
OK
127.0.0.1:6379> get google
QUEUED
127.0.0.1:6379> incr google_accesses
QUEUED
127.0.0.1:6379> exec
1) "http://google.com"
2) (integer) 1
127.0.0.1:6379>
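The MULTI / EXEC pattern above, mimicked with a plain command queue in Python (illustration only: real Redis queues commands server-side and runs them atomically on EXEC; this just reproduces the queue-then-run shape):

```python
store = {"google": "http://google.com", "google_accesses": 0}
queue = []

def incr(key):
    store[key] += 1
    return store[key]

def multi():
    queue.clear()              # start a fresh transaction

def queue_cmd(fn, *args):
    queue.append((fn, args))   # commands are queued, not executed yet
    return "QUEUED"

def exec_():
    return [fn(*args) for fn, args in queue]

multi()
queue_cmd(store.get, "google")
queue_cmd(incr, "google_accesses")
result = exec_()
print(result)  # ['http://google.com', 1]
```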
help <Tab> autocompletes
help @hash
# Create a hashset that has field f1 w/ value v1 etc.:
127.0.0.1:6379> hmset myhash f1 v1 f2 v2
OK
127.0.0.1:6379> hgetall myhash
1) "f1"
2) "v1"
3) "f2"
4) "v2"
127.0.0.1:6379> hget myhash f1
"v1"
Operations on hashes:
# We create a hset s_google that has an url and accesses counter
127.0.0.1:6379> hset s_google url url_google accesses 0
(integer) 2
127.0.0.1:6379> hmget s_google url accesses
1) "url_google"
2) "0"
# Increase accesses by 1
127.0.0.1:6379> HINCRBY s_google accesses 1
(integer) 1
127.0.0.1:6379> hmget s_google url accesses
1) "url_google"
2) "1"
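The same url/accesses hash pattern, sketched in plain Python just to show the data model (no Redis involved; key and field names mirror the transcript above):

```python
# s_google modeled as a plain dict; in Redis the hash lives server-side
store = {}
store["s_google"] = {"url": "url_google", "accesses": 0}  # hset s_google url url_google accesses 0

def hincrby(key, field, amount):
    # mirrors HINCRBY (atomic in Redis, plain addition here)
    store[key][field] += amount
    return store[key][field]

hincrby("s_google", "accesses", 1)
print(store["s_google"]["url"], store["s_google"]["accesses"])  # url_google 1
```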
DEL key
FLUSHALL to delete everything
cat file.txt | redis-cli --pipe
127.0.0.1:6379> zadd myss 1 'one' 2 'two'
(integer) 2
127.0.0.1:6379> ZSCORE myss 'one'
"1"
127.0.0.1:6379> get B
"https://www.wikipedia.org"
127.0.0.1:6379> get A
"http://www.openstreetmap.org"
127.0.0.1:6379> ZCARD accesses
(integer) 2
127.0.0.1:6379> ZRANGE accesses 0 40
1) "A"
2) "B"
127.0.0.1:6379> ZRANGE accesses 0 40 withscores
1) "A"
2) "1"
3) "B"
4) "1"
127.0.0.1:6379>
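The access-counter pattern from the transcript, sketched with a plain dict standing in for the sorted set (illustration only; Redis additionally breaks score ties lexicographically, while this sketch keeps insertion order):

```python
accesses = {}

def zincrby(member, amount=1):
    # like ZINCRBY: bump the member's score
    accesses[member] = accesses.get(member, 0) + amount

def zrange_withscores(lo, hi):
    # members in ascending score order, like ZRANGE ... withscores
    ordered = sorted(accesses.items(), key=lambda kv: kv[1])
    return ordered[lo:hi + 1]

zincrby("A")
zincrby("B")
print(zrange_withscores(0, 40))  # [('A', 1), ('B', 1)]
```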
You can comment on commits, but those comments are limited; comments on a merge request give much more functionality, incl. closing threads etc.!
Google Scholar, in the default search interface, showed only papers written after 2016. I can't reproduce it anymore, but it's important to keep in mind when looking for older papers (e.g. from 2011).