App-Bin4TSV-8
# README.md
- What is not covered here:
- Character encodings and line-ending codes are not explained.
- `cpanm` is not explained in much detail (`cpanm` is an installer for CPAN modules).
- Common UNIX commands, and how to install them, are not explained either (nkf, shuf, and so on).
- What a shell is (bash, zsh, and so on) is not explained. The same goes for pipes, redirection, and process substitution.
- Reading `man hier` may help you understand things such as `/etc/passwd`.
- This document is probably easy to read for people who already analyze data with UNIX commands, but that may not hold for others. Running the programs written here one by one...
# Notes on the command lines shown here
- They work in both bash and zsh.
- The commands below are meant to be run one line at a time. **Copying and pasting several lines at once may cause problems.**
- You do not have to run everything in order from the beginning. Starting from a suitable break partway through works in most cases.
- The commands are not explained in detail. Running them yourself and working out what they do is a good way to understand them.
- In shell programming, commands are connected with a **pipe** (the single character "|"); this is used often below.
- Process substitution is also used often below.
- **Example of process substitution**: `tail <(seq 20)`. Here the output of `seq 20` is passed to `tail` as if it were a file.
- Shell functions are also used. They are defined like `newfunc (){ .... }`.
- If zsh does not treat text after `#` as a comment, please run the following command.
```
setopt interactivecomments
# ^ Running the single command line above makes zsh treat everything after # as a comment, as bash does.
# ^ To undo this later, run: unsetopt interactivecomments
```
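The three shell features mentioned in the notes above (pipes, process substitution, and functions) can be tried with standard commands alone. The following is a minimal sketch; the function name `last_three` is made up for this illustration and is not part of this repository.

```shell
# A shell function. bash needs ";" (or a newline) before the closing "}".
last_three() { tail -n 3 "$@"; }

# Pipe: the output of "seq 20" becomes the standard input of last_three.
seq 20 | last_three

# Process substitution: the output of "seq 20" is handed over as if it were a file.
last_three <(seq 20)
```

Both invocations print the lines 18, 19, and 20.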
### How to install the commands of this repository individually
```
cpanm App::csv2tsv      # <- depends on Text::CSV; takes about 20 seconds
cpanm App::expandtab    # <- depends on Text::VisualWidth
cpanm App::colsummary   # the rest each finish in about 2 seconds
cpanm App::venn         # the module name is App::<command name>
cpanm App::csel         # to uninstall, use -U (example on the next line)
cpanm App::crosstable   # e.g. cpanm -U App::csel
cpanm App::freq         # cpanm -v shows installation details
cpanm App::digitdemog
```
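Since each module name is `App::` followed by the command name, the list of installs above can also be generated in a loop. This is only a sketch: the function name `bin4tsv_modules` is hypothetical, and the actual install line is left commented out so that nothing is installed by accident.

```shell
# Print the CPAN module name for each command listed above.
bin4tsv_modules() {
  for cmd in csv2tsv expandtab colsummary venn csel crosstable freq digitdemog; do
    echo "App::$cmd"
  done
}

bin4tsv_modules
# To install all of them in one go (xargs runs cpanm once per module):
# bin4tsv_modules | xargs -n 1 cpanm
```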
### Converting CSV to TSV (csv2tsv)
Here, as examples of CSV data, we fetch the "**Number of people tested by PCR**" data from the "Open Data" page of the Ministry of Health, Labour and Welfare website, and "**About national holidays**" (**national holidays from Showa 30 (1955) to Reiwa 4 (2022)**) from the Cabinet Office website, and convert them to TSV.
1. About the data used here:
- Source: Ministry of Health, Labour and Welfare website https://www.mhlw.go.jp/stf/covid-19/open-data.html
- The wording of this source citation follows https://www.mhlw.go.jp/chosakuken/index.html
- The information is as of Sunday, June 20, 2021.
```
cpanm App::csv2tsv
wget https://www.mhlw.go.jp/content/pcr_case_daily.csv
less pcr_case_daily.csv
csv2tsv < pcr_case_daily.csv > pcr_case_daily.tsv
# (Currently, omitting the "<" redirection may trigger a bug where the command never finishes.)
less -x25 pcr_case_daily.tsv
# The -x option of less adjusts the column spacing used for tab-separated fields.
# Even while less is running, you can type the two characters "-x" followed by a number to adjust it.
```
2. About the data used here:
- Source: Cabinet Office website https://www8.cao.go.jp/chosei/shukujitsu/gaiyou.html
- The wording of this source citation follows https://www.cao.go.jp/notice.html
- The information is as of Sunday, June 20, 2021.
```
cpanm App::csv2tsv
wget https://www8.cao.go.jp/chosei/shukujitsu/syukujitsu.csv
file syukujitsu.csv # -> Non-ISO extended-ASCII text, with CRLF line terminators
nkf syukujitsu.csv | less
nkf syukujitsu.csv | csv2tsv > syukujitsu.tsv
less -x25 syukujitsu.tsv
```
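To see what the conversion amounts to for the simplest kind of input, the sketch below turns commas into tabs and drops carriage returns using `tr` only. It is not how `csv2tsv` works internally: the function name `naive_csv2tsv` is hypothetical, and it assumes that no field contains a quoted comma, a tab, or a line break, which are the cases where a real CSV parser such as Text::CSV is needed.

```shell
# Naive CSV-to-TSV conversion for input without quoted fields (an assumption).
naive_csv2tsv() {
  tr -d '\r' | tr ',' '\t'    # strip CR from CRLF line ends, then swap the delimiter
}

printf 'date,count\r\n2021/6/9,42\r\n' | naive_csv2tsv
```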
### Displaying columns vertically aligned (expandtab)
```
cpanm App::expandtab # about 8 seconds; depends on Text::VisualWidth
tr ":" "\t" < /etc/passwd | expandtab | less -NS
sed 1,10d /etc/passwd | expandtab -i: -b. | less -NS
```
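The general idea of aligning tab-separated columns can be sketched with `awk`: read everything once to find the widest cell in each column, then pad each cell to that width. This is an illustration under stated assumptions, not the implementation of `expandtab`; the function name `align_tsv` is hypothetical, and the sketch pads by what `length()` returns, so full-width characters are not aligned correctly.

```shell
# Pad every tab-separated column to the width of its longest cell plus one space.
align_tsv() {
  awk -F'\t' '
    {
      nf[NR] = NF
      for (i = 1; i <= NF; i++) {
        cell[NR, i] = $i
        if (length($i) > width[i]) width[i] = length($i)
      }
    }
    END {
      for (r = 1; r <= NR; r++) {
        line = ""
        for (i = 1; i <= nf[r]; i++)
          # the last cell of a row is left unpadded
          line = line (i < nf[r] ? sprintf("%-" (width[i] + 1) "s", cell[r, i]) : cell[r, i])
        print line
      }
    }'
}

printf 'a\tbb\nccc\td\n' | align_tsv
```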
### Compactly displaying the characteristics of all columns (colsummary)
1. Example using /etc/passwd
```
colsummary -i: <( grep -v -e '^#' /etc/passwd )
```
2. Download data from "TRC New Books Open Data" (TRC新刊図書オープンデータ) on the Internet and run command lines like the ones below. Parts of the file name, such as the date, will probably differ. The site stated that "this bibliographic information may be used freely, for commercial or non-commercial purposes, without any usage procedure" (as of Sunday, June 20, 2021).
```
cpanm App::colsummary # about 2 seconds
unzip TRCOpenBibData_20210605.zip # extract the zip file
ln -s TRCOpenBibData_20210605.txt 0605.txt
colsummary -v9 -g3 -m0 0605.txt | expandtab -s30
```
3. Example using the national holiday data
```
nkf syukujitsu.csv | tr "/," "\t\t" | sed 1d | ~/bin4tsv/*/colsummary | expandtab
```
The output of the command above is as follows:
```
cpos  diff  ave.      range                frequent                                               frequency~lower(multi)                                        digits
1     68    1992.323  1955~2022            2019|2018|2001|2007|1996|1990                          22|20|19(4)|18(3)|17(11)|16(8)~14(7)|13(6)|12(8)|11|10|9(10)  4
2     12    6.104     1~12                 5|11|1|9|4|3                                           185|151|147|138|78|74|67|60|35|30|9|1                         1~2
3     25    13.618    1~30                 23|3|15|11|1|5                                         161|136|87|70|69|68(2)~10|9|8(3)|7|6(3)|5                     1~2
4     23    0.000     こどもの日~結婚の儀  休日|こどもの日|勤労感謝の日|文化の日|憲法記念日|元日  104|68(8)|67|57|56|53~27|16|7|3|2(2)|1(3)                     2~12
```
4. Example using the PCR test count data
```
colsummary -= -i, pcr_case_daily.csv | expandtab
# -= treats the first line as the list of column names, not as data values.
# -i, changes the input delimiter from the tab character to the comma character (,).
```
The output of the command above is as follows:
```
cpos  diff  ave.       name                    range               frequent                                                       frequency~lower(multi)              digits
1     487   2020.347   日付                    2020/10/1~2021/6/9  2020/8/21|2020/7/19|2020/7/24|2020/5/10|2020/10/26|2020/11/11  1(487)                              8~10
2     85    24.988     国立感染症研究所        0|1~517             0|20|8|3|18|36                                                 354|6(2)|5|3(6)|2(23)|1(52)         1~3
3     178   222.912    検疫所                  0|1~1733            0|1|13|4|11|3                                                  257|12|5(2)|4(2)|3(3)|2(22)|1(147)  1~4
4     474   3910.355   地方衛生研究所・保健所  398~11856           2425|867|4678|2502|6314|1326                                   2(13)|1(461)                        3~5
5     471   20647.676  民間検査会社            ~0|2~89005          0|37|28|3651|11|5                                              9|2(8)|1(462)                       0|1~5
6     453   1974.018   大学等                  ~0|4~6367           0|1012|4766|558|650|3924                                       14|2(21)|1(431)                     0|1~4
7     455   7567.082   医療機関                |9~35210            |430|543|1808|795|10649                                        18|2(15)|1(439)                     0|1~5
```
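The ave. column that appears in the `colsummary` output examples above is the mean of the values taken as numbers (non-numeric input values are counted as 0). This output column can be suppressed with the option `-m 0`. With `expandtab`, using half-width space characters...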
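Judging from the sample output above, the `diff` column appears to be the number of distinct values in each column (for example, 487 for column 1 of the PCR data, whose frequency entry reads `1(487)`). That reading is an inference from the output, not a statement taken from the `colsummary` documentation. Under that assumption, the same count can be sketched with `awk`; the function name `distinct_per_column` is hypothetical.

```shell
# For tab-separated input, print "column number <TAB> number of distinct values".
distinct_per_column() {
  awk -F'\t' '
    {
      if (NF > maxnf) maxnf = NF
      for (i = 1; i <= NF; i++)
        if (!seen[i, $i]++) distinct[i]++   # first time this value is seen in column i
    }
    END {
      for (i = 1; i <= maxnf; i++) print i "\t" (distinct[i] + 0)
    }'
}

printf 'a\tx\na\ty\nb\tx\n' | distinct_per_column
```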
### Seeing how the lines of several files overlap (venn)
A demo using four process substitutions:
```
cpanm App::venn
function y(){ echo -n $* | perl -pe's/./$&\n/g' ; }
perldoc List::Util # quit with the q key; see minstr and maxstr
venn <(y ABCCCC) <(y AABDEF) <(y ABBDEF) <(y DEEEF)
```
Output of the above:
```
cardi.  file1  file2  file3  file4  minstr  maxstr
2.      2      3      3      0      'A'     'B'
1.      4      0      0      0      'C'
3.      0      3      3      5      'D'     'F'
```
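The grouping shown in this output can be imitated roughly with `awk`. The sketch below prints, for each combination of files, a 0/1 membership pattern (one digit per file) and the number of distinct lines that occur in exactly that combination, which seems to correspond to the `cardi.` column above. It is an illustration only, not the implementation of `venn`; the function name `membership_patterns` is hypothetical, and empty input files are not handled.

```shell
# Usage: membership_patterns FILE1 FILE2 ...
membership_patterns() {
  awk '
    FNR == 1 { nfiles++ }                       # a new input file starts here
    { present[$0, nfiles] = 1; all[$0] = 1 }    # remember which file each line occurs in
    END {
      for (s in all) {
        p = ""
        for (f = 1; f <= nfiles; f++) p = p (((s, f) in present) ? "1" : "0")
        count[p]++
      }
      for (p in count) print p "\t" count[p]
    }' "$@" | LC_ALL=C sort
}

# With the demo above (bash or zsh):
# membership_patterns <(y ABCCCC) <(y AABDEF) <(y ABBDEF) <(y DEEEF)
```

For the four demo inputs this gives the patterns `0111`, `1000`, and `1110` with counts 3, 1, and 2.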
### Manipulating columns in various ways (csel)
```
cpanm App::csel
# ^ installs the csel command.
perl -E'say join"\t",1..12' | tee c12.tsv
# 1 2 3 4 5 6 7 8 9 10 11 12
awk '{print$3,$4,$5,$6,$7}' c12.tsv # AWK example
# 3 4 5 6 7
csel -p 3..7 c12.tsv # csel is simpler than AWK
# 3 4 5 6 7
csel -d3..7,10,12 c12.tsv # -d drops columns (hard to do in AWK)
# 1 2 8 9 11
csel -h12..10 c12.tsv # -h moves columns to the head (left)
# 12 11 10 1 2 3 4 5 6 7 8 9
csel -t9,6 c12.tsv # -t moves columns to the tail (right)
# 1 2 3 4 5 7 8 10 11 12 9 6
```
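For comparison, simple column ranges can also be taken with the standard `cut` command, which splits on tabs by default. `cut` always prints columns in input order, and POSIX `cut` has no option for excluding columns, so the columns to keep must be listed by hand; this is where `csel -d`, `-h`, and `-t` are more convenient. The helper name `row12` below is made up for this sketch.

```shell
# Build the same 12-column row as c12.tsv using standard tools only.
row12() { seq 1 12 | paste -s -; }

row12 | cut -f 3-7          # columns 3 to 7
row12 | cut -f 1,2,8,9,11   # what remains after dropping 3..7, 10 and 12
```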
### Cross tabulation (crosstable)
```
cpanm App::crosstable
```
1. Example using the national holidays
```
awk -F/ 'NR>1{print $2"\t"$1}' syukujitsu.csv | crosstable | csel -p1,-43..-1 | expandtab
```
Output of the above (June normally has no holidays, but there was one in 1993):
```
X1*X2 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
1     2    2    2    2    4    2    2    2    2    4    2    2    2    2    2    4    2    2    2    2    2    2    2    2    2    2    3    2    2    2    2    2    3    2    2    2    2    3    2    2    2    2    2
2     1    1    1    1    1    1    1    1    1    2    2    1    1    1    1    1    2    1    1    1    1    2    1    1    1    1    1    2    1    1    1    1    1    1    1    1    1    1    2    1    3    2    2
3     1    1    2    1    1    1    1    1    2    1    1    1    1    1    1    1    1    1    1    2    1    1    1    1    1    2    1    1    1    1    2    1    1    1    1    1    2    1    1    1    1    1    1
4     1    1    1    1    2    1    1    1    1    1    2    1    1    1    1    1    1    1    1    1    1    2    1    1    1    1    1    2    1    1    1    1    2    1    1    1    1    1    2    2    1    1    1
5     2    3    2    2    2    3    2    3    3    3    3    4    3    3    3    3    4    2    3    3    3    3    4    2    3    3    3    3    4    4    3    3    3    4    4    4    3    3    3    6    4    3    3
6     0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
7     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    2    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    2    2    1
8     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    1    1    2    1    2    1
9     2    2    2    2    3    3    2    2    2    2    3    3    2    2    2    2    3    2    2    2    2    3    3    2    2    2    2    3    2    3    2    2    2    2    2    3    2    2    3    2    2    2    2
10    1    1    2    1    1    1    1    1    1    1    1    1    1    2    1    1    1    1    1    2    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    2    0    0    1
11    3    2    2    2    2    3    3    2    2    2    3    3    2    2    2    2    3    3    2    2    2    2    3    3    2    2    2    2    3    2    2    2    2    3    3    2    2    2    2    3    2    2    2
12    0    0    0    0    0    0    0    0    0    1    2    1    1    1    1    1    1    1    1    1    1    2    1    1    1    1    1    2    1    1    1    1    2    1    1    1    1    1    2    0    0    0    0
```
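The counting behind a cross tabulation can be sketched with `sort` and `uniq -c`. The version below prints one line per (row value, column value) pair in long form instead of laying the counts out as a matrix, so it is only an approximation of what `crosstable` shows. The function name `pair_counts` is hypothetical, and the sketch assumes two tab-separated fields that contain no spaces.

```shell
# Input: two tab-separated fields per line. Output: field1 <TAB> field2 <TAB> count.
pair_counts() {
  LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $3 "\t" $1 }'
}

printf '1\t1980\n1\t1980\n5\t1980\n1\t1981\n' | pair_counts
```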
### Making a frequency table (freq)
```
cpanm App::freq # if it is not installed yet
# cpanm -U App::freq uninstalls it again.
```
1. Example using the holiday data:
```
awk -F/ 'NR>1{print $1}' syukujitsu.csv | freq -~ -a | head
# By default, this freq prints its output in dictionary order of the values.
# Specifying -~ reverses that order (shown from 2022 toward 1955).
# -a also shows the cumulative sum of the frequencies.
# The head command shows only the first 10 lines.
```
Output of the above:
```
Freq*  AccSum  LinStr
16     16      2022
17     33      2021
18     51      2020
22     73      2019
20     93      2018
17     110     2017
17     127     2016
17     144     2015
17     161     2014
```
2. Holiday data (shown in order of the years with the most holidays)
```
awk -F/ 'NR>1{print $1}' syukujitsu.csv | freq -f -~ -a | head -7
# By default, this freq prints its output in dictionary order of the values.
# Adding -f sorts by ascending frequency, and -~ reverses that order.
# -a also shows the cumulative sum of the frequencies. The order of these options does not affect the computation.
```
Output of the above:
```
Freq*  AccSum  LinStr
16     16      2022
17     33      2021
18     51      2020
22     73      2019
20     93      2018
17     110     2017
```
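A frequency table with a cumulative sum, in descending order of the values, can be approximated with standard tools. This is a sketch for comparison rather than the implementation of `freq`; the function name `freq_desc_cumsum` is hypothetical, and the input values are assumed to contain no whitespace.

```shell
# Output: frequency <TAB> cumulative sum <TAB> value, with values in descending order.
freq_desc_cumsum() {
  LC_ALL=C sort -r | uniq -c | awk '{ acc += $1; print $1 "\t" acc "\t" $2 }'
}

printf '2021\n2022\n2022\n2021\n2021\n' | freq_desc_cumsum
```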
### How many of each character appear at each position, over all lines (digitdemog)
```
digitdemog <( csel -p11 TRCOpenBibData_20210605.txt ) | expandtab
# The result of the above looks like this:
     1     2     3     4     5     6     7     8     9
'.'  0     0     0     0     3     0     0     0     0
'0'  0     137   0     0     3     0     0     0     0
'1'  385   313   0     4     0     0     0     0     0
'2'  666   59    0     10    4     0     0     0     0
'3'  118   16    0     4     1     0     0     0     0
'4'  3     6     0     0     0     0     0     0     0
'5'  0     90    0     0     2     3     0     0     0
'6'  0     208   0     0     4     0     0     0     0
'7'  0     21    0     0     1     0     0     0     0
'8'  0     69    0     3     0     0     0     0     0
'9'  0     253   0     0     3     0     0     0     0
'c'  0     0     1151  0     0     18    3     0     0
'm'  0     0     0     1151  0     0     18    3     0
'×'  0     0     21    0     0     0     0     0     0
end  80    0     0     0     1151  0     0     18    3
```
`digitdemog -L2` also extracts concrete examples.
```
digitdemog -L2 <( csel -p11 TRCOpenBibData_20210605.txt ) | expandtab
# The result of the above looks like this:
length  freq  minstr         maxstr
0       80    ''             <-- same
4       1151  '11cm'(2)      '43cm'(3)
7       18    '12×12cm'(2)   '27×39cm'(2)
8       3     '18×8.5cm'     <-- same
```
Using `digitdemog -.`:
```
digitdemog -. <( csel -p11 TRCOpenBibData_20210605.txt ) | expandtab
# The result of the above looks like this:
     1     2     3     4     5     6     7     8     9
'.'  0     0     0     0     3.    0     0     0     0
'0'  0     137   0     0     3     0     0     0     0
'1'  385   313   0     4     0     0     0     0     0
'2'  666   59    0     10    4     0     0     0     0
'3'  118   16    0     4     1     0     0     0     0
'4'  3     6     0     0     0     0     0     0     0
'5'  0     90    0     0     2     3.    0     0     0
'6'  0     208   0     0     4     0     0     0     0
'7'  0     21    0     0     1     0     0     0     0
'8'  0     69    0     3.    0     0     0     0     0
'9'  0     253   0     0     3     0     0     0     0
'c'  0     0     1151. 0     0     18.   3.    0     0
'm'  0     0     0     1151. 0     0     18.   3.    0
'×'  0     0     21    0     0     0     0     0     0
end  80    0     0     0     1151. 0     0     18.   3.
```
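The table that `digitdemog` prints counts, for each character, how often it occurs at each position in a line, with `end` marking the position just after the last character. A long-form version of the same counts can be sketched with `awk`. This is an illustration, not the implementation of `digitdemog`: the function name `char_position_counts` is hypothetical, and some `awk` implementations count bytes rather than characters, so multibyte characters such as `×` may be split.

```shell
# Output: position <TAB> character (or "end") <TAB> count, sorted by position.
char_position_counts() {
  awk '
    {
      n = length($0)
      for (i = 1; i <= n; i++) count[i "\t" substr($0, i, 1)]++
      count[(n + 1) "\t" "end"]++        # one past the last character of the line
    }
    END { for (k in count) print k "\t" count[k] }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2,2
}

printf '11cm\n12cm\n' | char_position_counts
```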