Channel: Ethio Helix ኢትዮ:ሒሊክስ

Cross Validating and K Selection

There are two ways of choosing a K value for a dataset on which one wishes to run ADMIXTURE. One is to throw a dart at a random set of numbers and hope it works out for the best. The other is to run ADMIXTURE at different values of K while computing a cross-validation error for each one using the --cv flag. I did this with the studentized global dataset that I discussed earlier in this post. The cross-validation error values for K=1-14 for that dataset can be seen in the graphs below:

Close-up:
The CV error values do not start flattening out until about K=10 and do not start inflecting until K=13, which makes K=13 the appropriate choice for this dataset.

Cross validation can take a considerable amount of time to run, as each consecutive K has to be evaluated separately along with its error, unless of course one has access to a very fast machine.

As a reference, the Bash shell code to run cross validation in ADMIXTURE for up to K=14 is:

# -j2 runs on two threads; --cv=14 requests 14-fold cross validation
for K in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
  ./admixture32 -j2 --cv=14 "filename.bed" "$K" | tee "log${K}.out"
done

The CV error value for each K is recorded in the corresponding log${K}.out file.
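To compare the errors across runs, the relevant lines can be pulled out of the log files. The following is only a sketch: it assumes the logs are named log<K>.out as in the loop above and that each one contains a line of the form "CV error (K=3): 0.51234". The function names cv_table and best_k are made up for illustration, and picking the smallest error is one common heuristic, not necessarily the criterion used for the graphs in this post.

```bash
#!/usr/bin/env bash
# Sketch: tabulate ADMIXTURE cross-validation errors from per-K log files.
# Assumption: logs are named log<K>.out and contain lines such as
#   CV error (K=3): 0.51234

# Print "K error" pairs, sorted by K, for the logs in directory $1 (default: .).
cv_table() {
  local dir="${1:-.}"
  grep -h "CV error" "$dir"/log*.out 2>/dev/null |
    sed -E 's/.*\(K=([0-9]+)\): *([0-9.]+).*/\1 \2/' |
    sort -n
}

# Print the K whose CV error is smallest (one heuristic among several).
best_k() {
  cv_table "$1" |
    awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; k = $1 } END { if (NR) print k }'
}
```

Once the loop has finished, `cv_table .` lists one "K error" pair per line and `best_k .` prints the K with the lowest error. Plotting the table is still the better way to judge where the curve flattens out.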

Peaking populations for each cluster for K=2-13

K=2
Cluster1: pygmy,mbutipygmy,sotho/tswana,biakapygmy,fang

Cluster2: chinese-americans,tujia,miao,hezhen,han

East Asians and Africans split. West Asians and Europeans come out as roughly 1/3 African and 2/3 East Asian; the reverse is seen with Ethiopians, at 2/3 African and 1/3 East Asian.



K=3
Cluster1: sardinian,basque,tuscans,italian,spaniards

Cluster2: pygmy,mbutipygmy,sotho/tswana,biakapygmy,bantusouthafrica

Cluster3: she,chinese-americans,han,singapore-chinese,chinese
West Asians split off.

 
K=4
Cluster1: sardinian,basque,tuscans,italian,cypriots

Cluster2: pygmy,mbutipygmy,sotho/tswana,biakapygmy,bantusouthafrica

Cluster3: colombian,karitiana,surui,pima,totonac

Cluster4: she,han,singapore-chinese,chinese,miao
Native Americans split off.

K=5
Cluster1: she,han,chinese-americans,chinese,singapore-chinese

Cluster2: surui,karitiana,colombian,pima,totonac

Cluster3: sardinian,basque,spaniards,italian,tuscans

Cluster4: pygmy,mbutipygmy,biakapygmy,bantusouthafrica,sotho/tswana

Cluster5: papuan,irula,tn-dalit,ap-mala,malayan
Oceanians and South Asians split off together.

K=6
Cluster1: papuan,melanesian,tongan,samoan,paniya

Cluster2: pygmy,mbutipygmy,biakapygmy,bantusouthafrica,sotho/tswana

Cluster3: karitiana,colombian,surui,pima,totonac

Cluster4: she,han,chinese-americans,singapore-chinese,chinese

Cluster5: sardinian,basque,spaniards,italian,tuscans

Cluster6: irula,tn-dalit,ap-madiga,ap-mala,north-kannadi
Oceanians and South Asians split off from each other.

K=7
Cluster1: sardinian,basque,spaniards,italian,tuscans

Cluster2: dogon,yoruba,bambaran,hausa,igbo

Cluster3: irula,tn-dalit,ap-mala,ap-madiga,north-kannadi

Cluster4: san-nb,san,!kung,pygmy,mbutipygmy

Cluster5: papuan,melanesian,tongan,samoan,paniya

Cluster6: colombian,surui,karitiana,pima,totonac

Cluster7: she,han,chinese-americans,singapore-chinese,chinese
San split off from the African component.

K=8
Cluster1: dogon,yoruba,bambaran,hausa,igbo

Cluster2: irula,tn-dalit,ap-mala,ap-madiga,north-kannadi

Cluster3: papuan,melanesian,tongan,samoan,paniya

Cluster4: koryaks,nganassans,chukchis,evenkis,yakut

Cluster5: dai,vietnamese,singapore-chinese,she,han

Cluster6: sardinian,basque,spaniards,italian,tuscans

Cluster7: san-nb,san,!kung,pygmy,mbutipygmy

Cluster8: surui,karitiana,colombian,pima,totonac
Siberians split off from the East Asian component.

K=9
Cluster1: papuan,melanesian,tongan,samoan,paniya

Cluster2: iban,samoan,tongan,singapore-malay,dai

Cluster3: japanese,hezhen,han-nchina,xibo,beijing-chinese

Cluster4: sardinian,basque,spaniards,italian,tuscans

Cluster5: san-nb,san,!kung,pygmy,mbutipygmy

Cluster6: dogon,yoruba,bambaran,hausa,igbo

Cluster7: surui,karitiana,colombian,pima,totonac

Cluster8: irula,tn-dalit,ap-mala,ap-madiga,north-kannadi

Cluster9: koryaks,chukchis,nganassans,east-greenlanders,kets
A South East Asian component forms.

K=10
Cluster1: saudis,bedouin,yemen-jews,samaritians,tunisia

Cluster2: papuan,melanesian,tongan,samoan,paniya

Cluster3: dai,vietnamese,iban,singapore-chinese,she

Cluster4: hadza,maasai,ethiopians,ethiopian-jews,bulala

Cluster5: irula,tn-dalit,ap-madiga,ap-mala,north-kannadi

Cluster6: surui,karitiana,colombian,pima,totonac

Cluster7: koryaks,nganassans,chukchis,evenkis,yakut

Cluster8: dogon,yoruba,brong,igbo,bambaran

Cluster9: san-nb,san,!kung,pygmy,mbutipygmy

Cluster10: lithuanians,belorussian,orcadian,n-european,utahn-whites
The West Asian component splits into two components: North European, and Middle East & North African (MENA). An East African component that was previously concealed by the West Asian and African components forms. The previous South East Asian component disappears.

K=11
Cluster1: dai,vietnamese,singapore-chinese,she,han

Cluster2: koryaks,nganassans,chukchis,evenkis,yakut

Cluster3: surui,karitiana,colombian,pima,totonac

Cluster4: tunisia,bedouin,saudis,sahara-occ,yemen-jews

Cluster5: dogon,yoruba,brong,igbo,bambaran

Cluster6: lithuanians,belorussian,orcadian,n-european,utahn-whites

Cluster7: papuan,melanesian,tongan,samoan,paniya

Cluster8: san-nb,san,!kung,pygmy,mbutipygmy

Cluster9: irula,malayan,tn-dalit,ap-mala,ap-madiga

Cluster10: hadza,maasai,ethiopians,sandawe,bulala

Cluster11: kalash,brahui,balochi,makrani,georgians
A Central Asian component forms.

K=12
Cluster1: surui,karitiana,colombian,pima,totonac

Cluster2: lithuanians,belorussian,orcadian,n-european,utahn-whites

Cluster3: san-nb,san,!kung,pygmy,mbutipygmy

Cluster4: iban,samoan,tongan,singapore-malay,cambodian

Cluster5: bedouin,saudis,yemen-jews,samaritians,tunisia

Cluster6: papuan,melanesian,tongan,samoan,paniya

Cluster7: japanese,beijing-chinese,han-nchina,chinese-americans,xibo

Cluster8: koryaks,chukchis,east-greenlanders,west-greenlanders,kets

Cluster9: irula,tn-dalit,ap-madiga,ap-mala,north-kannadi

Cluster10: dogon,yoruba,brong,igbo,bambaran

Cluster11: nganassans,evenkis,yakut,dolgans,kets

Cluster12: hadza,maasai,ethiopians,ethiopian-jews,bulala
The Central Asian component disappears, a second Siberian component forms, and the South East Asian component reappears.

 
K=13
Cluster1: san-nb,san,!kung,xhosa,bantusouthafrica

Cluster2: surui,karitiana,colombian,pima,totonac

Cluster3: papuan,melanesian,tongan,samoan,paniya

Cluster4: japanese,han-nchina,beijing-chinese,xibo,hezhen

Cluster5: hadza,maasai,ethiopians,sandawe,bulala

Cluster6: lithuanians,belorussian,orcadian,n-european,utahn-whites

Cluster7: koryaks,chukchis,nganassans,evenkis,east-greenlanders

Cluster8: tunisia,bedouin,saudis,yemen-jews,sahara-occ

Cluster9: kalash,brahui,balochi,makrani,georgians

Cluster10: pygmy,mbutipygmy,biakapygmy,alur,fang

Cluster11: irula,malayan,tn-dalit,ap-mala,ap-madiga

Cluster12: dogon,yoruba,brong,bambaran,igbo

Cluster13: iban,samoan,tongan,singapore-malay,dai

The Central Asian component reappears, a new Pygmy component forms, and the second Siberian component disappears.
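A cluster key like the lists above can be generated mechanically once there is one summary value per population and cluster. The sketch below assumes a three-column input of population, cluster index, and value (for example a median membership fraction). Both the input layout and the function name top_pops are illustrative assumptions, and this post does not say which statistic the peaking populations were ranked by.

```bash
#!/usr/bin/env bash
# Sketch: list the top-N populations per cluster.
# stdin:  "<population> <cluster index> <value>" per line (assumed layout)
# stdout: "Cluster<i>: pop1,pop2,..." with populations in descending value order
top_pops() {
  local n="${1:-5}"   # how many populations to keep per cluster
  # Sort by cluster (ascending), then value (descending), then name for ties.
  LC_ALL=C sort -k2,2n -k3,3nr -k1,1 |
    awk -v n="$n" '
      NR == 1 || $2 != cl {
        if (NR > 1) print line          # finish the previous cluster
        cl = $2; c = 0; line = "Cluster" cl ": "
      }
      c < n { line = line (c ? "," : "") $1; c++ }
      END { if (NR) print line }
    '
}
```

Piping such a table through `top_pops 5` prints one line per cluster in the same "Cluster1: a,b,c,d,e" layout used in this post.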

Fst for K=13.

UPDATE: Median cluster % for all populations, K13.
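ADMIXTURE, Global K13 | N | San | N. American | Oceanian | E. Asian | E. African | N. European | Siberian | MENA | Central Asian | Pygmy | S. Asian | W. African | S.E. Asian
!kung | 8 | 78% | 0% | 0% | 0% | 2% | 0% | 0% | 0% | 0% | 2% | 0% | 16% | 0%
adygei | 11 | 0% | 1% | 0% | 3% | 0% | 32% | 3% | 20% | 42% | 0% | 1% | 0% | 0%
african-americans | 37 | 2% | 1% | 0% | 0% | 1% | 13% | 0% | 1% | 3% | 3% | 0% | 72% | 0%
algeria | 12 | 0% | 0% | 0% | 0% | 5% | 22% | 1% | 48% | 5% | 0% | 3% | 13% | 0%
altaians | 8 | 0% | 2% | 0% | 37% | 0% | 12% | 31% | 0% | 12% | 0% | 0% | 0% | 0%
alur | 7 | 0% | 0% | 0% | 0% | 34% | 0% | 0% | 0% | 0% | 17% | 0% | 50% | 0%
ap-brahmin | 14 | 0% | 1% | 2% | 1% | 0% | 8% | 2% | 1% | 36% | 0% | 48% | 0% | 2%
ap-madiga | 5 | 0% | 0% | 2% | 2% | 0% | 0% | 0% | 0% | 24% | 0% | 66% | 0% | 5%
ap-mala | 8 | 0% | 0% | 2% | 2% | 0% | 0% | 0% | 0% | 22% | 0% | 67% | 0% | 5%
armenians | 11 | 0% | 0% | 0% | 0% | 0% | 19% | 0% | 34% | 43% | 0% | 2% | 0% | 0%
armenians-b | 3 | 0% | 0% | 1% | 0% | 0% | 48% | 4% | 17% | 26% | 0% | 1% | 0% | 0%
ashkenazy-jews | 15 | 0% | 0% | 0% | 1% | 0% | 37% | 0% | 34% | 24% | 0% | 1% | 0% | 0%
azerbaijan-jews | 6 | 0% | 1% | 0% | 0% | 0% | 15% | 0% | 37% | 44% | 0% | 0% | 0% | 1%
balochi | 18 | 0% | 1% | 0% | 1% | 0% | 7% | 1% | 13% | 53% | 0% | 20% | 0% | 0%
bambaran | 14 | 3% | 1% | 0% | 0% | 1% | 0% | 0% | 1% | 0% | 1% | 0% | 91% | 0%
bamoun | 10 | 3% | 0% | 0% | 0% | 4% | 0% | 0% | 0% | 0% | 7% | 0% | 85% | 0%
bantukenya | 5 | 3% | 0% | 0% | 0% | 20% | 0% | 0% | 2% | 0% | 5% | 0% | 67% | 0%
bantusouthafrica | 3 | 24% | 0% | 0% | 1% | 6% | 0% | 0% | 0% | 0% | 4% | 0% | 65% | 0%
basque | 24 | 0% | 0% | 1% | 0% | 0% | 75% | 0% | 16% | 6% | 0% | 1% | 0% | 0%
bedouin | 33 | 0% | 0% | 0% | 0% | 3% | 0% | 0% | 65% | 27% | 0% | 0% | 2% | 0%
beijing-chinese | 91 | 0% | 0% | 0% | 68% | 0% | 0% | 2% | 0% | 0% | 0% | 0% | 0% | 28%
belorussian | 4 | 0% | 1% | 1% | 0% | 0% | 77% | 4% | 3% | 15% | 0% | 1% | 0% | 0%
biakapygmy | 12 | 17% | 0% | 0% | 0% | 1% | 0% | 0% | 0% | 0% | 33% | 0% | 45% | 0%
bnei-menashe-jews | 4 | 0% | 0% | 2% | 1% | 0% | 7% | 0% | 16% | 34% | 0% | 34% | 0% | 3%
bolivian | 17 | 0% | 95% | 0% | 1% | 0% | 1% | 3% | 0% | 0% | 0% | 0% | 0% | 0%
brahui | 18 | 0% | 1% | 0% | 0% | 0% | 8% | 1% | 13% | 55% | 0% | 20% | 0% | 0%
brong | 4 | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% | 3% | 0% | 91% | 0%
bulala | 12 | 0% | 0% | 0% | 0% | 38% | 0% | 0% | 3% | 0% | 0% | 0% | 57% | 0%
burusho | 17 | 0% | 2% | 1% | 7% | 0% | 13% | 4% | 2% | 41% | 0% | 27% | 0% | 2%
buryat | 16 | 0% | 0% | 1% | 49% | 0% | 5% | 38% | 1% | 5% | 0% | 0% | 0% | 1%
buryats | 13 | 0% | 0% | 1% | 47% | 0% | 5% | 38% | 0% | 5% | 0% | 1% | 0% | 0%
cambodian | 5 | 0% | 0% | 1% | 31% | 0% | 0% | 0% | 0% | 1% | 0% | 11% | 0% | 57%
chinese | 5 | 0% | 0% | 0% | 60% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 38%
chinese-americans | 73 | 0% | 0% | 0% | 63% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 36%
chukchis | 11 | 0% | 17% | 0% | 0% | 0% | 0% | 80% | 0% | 0% | 0% | 0% | 0% | 2%
chuvashs | 12 | 0% | 2% | 0% | 6% | 0% | 54% | 19% | 1% | 15% | 0% | 2% | 0% | 0%
cochin-jews | 4 | 0% | 2% | 2% | 0% | 1% | 5% | 2% | 8% | 34% | 0% | 46% | 0% | 1%
colombian | 6 | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
cypriots | 7 | 0% | 0% | 1% | 1% | 0% | 29% | 0% | 39% | 30% | 0% | 0% | 0% | 0%
dai | 6 | 0% | 0% | 0% | 36% | 0% | 0% | 0% | 0% | 0% | 0% | 3% | 0% | 62%
daur | 8 | 0% | 1% | 1% | 63% | 0% | 1% | 25% | 0% | 1% | 0% | 0% | 0% | 8%
dogon | 24 | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 1% | 0% | 0% | 0% | 94% | 0%
dolgans | 5 | 0% | 0% | 0% | 28% | 0% | 10% | 56% | 0% | 3% | 0% | 2% | 0% | 0%
druze | 30 | 0% | 0% | 0% | 0% | 0% | 17% | 0% | 42% | 38% | 0% | 0% | 0% | 0%
east-greenlanders | 6 | 0% | 35% | 0% | 0% | 0% | 4% | 60% | 0% | 0% | 0% | 0% | 0% | 0%
egypt | 12 | 0% | 0% | 0% | 0% | 7% | 11% | 0% | 47% | 24% | 0% | 0% | 7% | 0%
egyptans | 7 | 0% | 0% | 0% | 0% | 8% | 10% | 0% | 49% | 23% | 0% | 0% | 7% | 0%
ethiopian-jews | 12 | 1% | 0% | 1% | 0% | 37% | 0% | 0% | 38% | 8% | 0% | 0% | 11% | 0%
ethiopians | 12 | 1% | 0% | 0% | 1% | 36% | 0% | 1% | 39% | 7% | 0% | 0% | 11% | 0%
evenkis | 11 | 0% | 0% | 0% | 34% | 0% | 3% | 61% | 0% | 2% | 0% | 0% | 0% | 0%
fang | 7 | 6% | 0% | 0% | 0% | 5% | 0% | 0% | 0% | 0% | 7% | 0% | 80% | 0%
french | 22 | 0% | 1% | 0% | 0% | 0% | 70% | 0% | 14% | 12% | 0% | 1% | 0% | 0%
fulani | 7 | 2% | 0% | 0% | 1% | 5% | 7% | 1% | 25% | 0% | 0% | 2% | 58% | 0%
georgia-jews | 4 | 0% | 0% | 0% | 1% | 0% | 16% | 0% | 37% | 43% | 0% | 0% | 0% | 0%
georgians | 17 | 0% | 0% | 0% | 0% | 0% | 23% | 0% | 28% | 46% | 0% | 0% | 0% | 0%
gujaratis | 53 | 0% | 1% | 1% | 1% | 0% | 2% | 0% | 0% | 37% | 0% | 55% | 0% | 2%
gujaratis-b | 14 | 0% | 2% | 1% | 0% | 0% | 13% | 2% | 0% | 40% | 0% | 40% | 0% | 1%
hadza | 11 | 19% | 0% | 0% | 0% | 80% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
han | 24 | 0% | 0% | 0% | 60% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 39%
han-nchina | 6 | 0% | 0% | 0% | 68% | 0% | 0% | 4% | 0% | 2% | 0% | 0% | 0% | 24%
hausa | 9 | 1% | 0% | 0% | 0% | 2% | 0% | 0% | 0% | 0% | 3% | 0% | 90% | 0%
hazara | 16 | 0% | 1% | 0% | 31% | 0% | 14% | 16% | 6% | 23% | 0% | 8% | 0% | 4%
hema | 11 | 3% | 0% | 1% | 0% | 31% | 0% | 1% | 10% | 2% | 4% | 0% | 46% | 0%
hezhen | 4 | 0% | 1% | 0% | 66% | 0% | 0% | 28% | 0% | 0% | 0% | 0% | 0% | 6%
hungarians | 9 | 0% | 2% | 0% | 0% | 0% | 69% | 2% | 10% | 15% | 0% | 1% | 0% | 0%
iban | 15 | 0% | 0% | 2% | 11% | 0% | 0% | 2% | 0% | 0% | 0% | 7% | 0% | 77%
igbo | 10 | 3% | 0% | 0% | 0% | 1% | 0% | 0% | 0% | 0% | 2% | 0% | 90% | 0%
iranian-jews | 4 | 0% | 0% | 0% | 1% | 0% | 12% | 1% | 39% | 44% | 0% | 2% | 0% | 0%
iranians | 12 | 0% | 1% | 1% | 0% | 0% | 16% | 1% | 28% | 45% | 1% | 7% | 1% | 0%
iraq-jews | 8 | 0% | 0% | 1% | 0% | 0% | 14% | 0% | 41% | 40% | 1% | 1% | 0% | 1%
irula | 24 | 0% | 0% | 0% | 0% | 0% | 1% | 0% | 2% | 1% | 0% | 89% | 0% | 0%
italian | 8 | 0% | 0% | 1% | 0% | 0% | 60% | 0% | 23% | 14% | 0% | 0% | 0% | 1%
japanese | 154 | 0% | 0% | 1% | 91% | 0% | 0% | 1% | 0% | 0% | 0% | 0% | 0% | 6%
jordanians | 14 | 1% | 0% | 0% | 0% | 3% | 16% | 1% | 42% | 33% | 0% | 1% | 3% | 1%
kaba | 9 | 2% | 0% | 0% | 1% | 10% | 0% | 0% | 0% | 0% | 4% | 0% | 80% | 0%
kalash | 16 | 0% | 2% | 1% | 0% | 0% | 10% | 3% | 0% | 65% | 0% | 16% | 0% | 2%
karitiana | 14 | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
kets | 2 | 0% | 5% | 0% | 13% | 0% | 19% | 54% | 0% | 8% | 0% | 1% | 0% | 0%
khmer-cambodian | 3 | 0% | 0% | 3% | 27% | 0% | 0% | 0% | 0% | 0% | 0% | 13% | 0% | 55%
kongo | 5 | 3% | 0% | 0% | 0% | 5% | 0% | 0% | 0% | 0% | 6% | 0% | 83% | 0%
koryaks | 13 | 0% | 7% | 0% | 0% | 0% | 0% | 93% | 0% | 0% | 0% | 0% | 0% | 0%
kurd | 16 | 0% | 1% | 1% | 0% | 0% | 19% | 0% | 29% | 46% | 0% | 3% | 0% | 0%
kyrgyzstani | 15 | 0% | 1% | 0% | 40% | 0% | 13% | 24% | 3% | 12% | 0% | 2% | 0% | 3%
lahu | 5 | 0% | 0% | 1% | 42% | 0% | 0% | 1% | 0% | 0% | 0% | 3% | 0% | 52%
lebanese | 3 | 0% | 1% | 2% | 0% | 1% | 20% | 0% | 40% | 33% | 0% | 2% | 2% | 0%
lezgins | 13 | 0% | 2% | 0% | 0% | 0% | 32% | 2% | 16% | 45% | 0% | 1% | 0% | 0%
libya | 9 | 0% | 1% | 1% | 0% | 7% | 17% | 0% | 50% | 10% | 0% | 2% | 9% | 0%
lithuanians | 6 | 0% | 1% | 0% | 0% | 0% | 80% | 2% | 0% | 12% | 0% | 3% | 0% | 0%
luhya | 73 | 2% | 0% | 0% | 0% | 22% | 0% | 0% | 0% | 0% | 6% | 0% | 67% | 0%
maasai | 100 | 2% | 0% | 0% | 0% | 55% | 0% | 0% | 14% | 0% | 1% | 0% | 24% | 0%
mada | 8 | 0% | 1% | 0% | 0% | 22% | 0% | 0% | 0% | 0% | 3% | 0% | 73% | 0%
makrani | 19 | 0% | 1% | 0% | 0% | 0% | 7% | 0% | 15% | 54% | 0% | 18% | 3% | 0%
malayan | 2 | 0% | 1% | 5% | 3% | 0% | 1% | 2% | 0% | 12% | 1% | 70% | 0% | 6%
mandenka | 13 | 3% | 0% | 0% | 0% | 2% | 0% | 0% | 3% | 0% | 1% | 0% | 88% | 0%
maya | 12 | 0% | 86% | 0% | 1% | 0% | 3% | 3% | 2% | 1% | 0% | 0% | 0% | 0%
mbutipygmy | 13 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 0% | 0%
melanesian | 7 | 0% | 0% | 74% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 25%
mexicans | 38 | 0% | 44% | 0% | 1% | 0% | 27% | 2% | 12% | 6% | 0% | 1% | 3% | 0%
miao | 6 | 0% | 0% | 0% | 56% | 0% | 0% | 1% | 0% | 0% | 0% | 0% | 0% | 42%
mongola | 6 | 0% | 1% | 0% | 64% | 0% | 4% | 14% | 1% | 1% | 0% | 0% | 0% | 13%
mongolians | 8 | 0% | 2% | 1% | 46% | 0% | 10% | 30% | 2% | 7% | 0% | 0% | 0% | 2%
moroccans | 5 | 1% | 0% | 0% | 0% | 3% | 18% | 1% | 54% | 0% | 1% | 3% | 15% | 0%
morocco-jews | 7 | 0% | 0% | 0% | 0% | 1% | 32% | 0% | 39% | 23% | 0% | 1% | 2% | 1%
morocco-n | 12 | 0% | 1% | 0% | 0% | 3% | 27% | 0% | 49% | 1% | 0% | 4% | 12% | 0%
morocco-s | 13 | 0% | 0% | 0% | 0% | 5% | 18% | 0% | 50% | 0% | 1% | 3% | 16% | 0%
mozabite | 21 | 0% | 0% | 0% | 0% | 3% | 20% | 0% | 53% | 0% | 0% | 4% | 16% | 0%
n-european | 14 | 0% | 1% | 0% | 0% | 0% | 74% | 1% | 8% | 13% | 0% | 0% | 0% | 0%
naxi | 5 | 0% | 0% | 1% | 63% | 0% | 0% | 6% | 0% | 0% | 0% | 4% | 0% | 26%
nepalese | 17 | 0% | 1% | 1% | 7% | 0% | 11% | 3% | 0% | 35% | 0% | 35% | 0% | 4%
nganassans | 15 | 0% | 0% | 0% | 11% | 0% | 0% | 88% | 0% | 0% | 0% | 0% | 0% | 0%
nguni | 4 | 18% | 0% | 1% | 0% | 6% | 0% | 0% | 0% | 0% | 4% | 0% | 71% | 0%
north-kannadi | 6 | 0% | 0% | 3% | 3% | 0% | 0% | 0% | 0% | 23% | 0% | 65% | 0% | 3%
orcadian | 9 | 0% | 1% | 0% | 0% | 0% | 75% | 2% | 7% | 14% | 0% | 0% | 0% | 0%
oroqen | 7 | 0% | 0% | 0% | 52% | 0% | 0% | 40% | 0% | 0% | 0% | 0% | 0% | 5%
palestinian | 27 | 0% | 1% | 1% | 0% | 3% | 14% | 0% | 46% | 32% | 0% | 1% | 2% | 0%
paniya | 4 | 0% | 0% | 13% | 16% | 0% | 0% | 1% | 0% | 0% | 1% | 14% | 1% | 48%
papuan | 17 | 0% | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
pathan | 14 | 0% | 2% | 0% | 1% | 0% | 17% | 1% | 6% | 44% | 0% | 26% | 0% | 1%
pedi | 8 | 18% | 0% | 0% | 0% | 5% | 0% | 0% | 0% | 1% | 4% | 0% | 71% | 0%
pima | 11 | 0% | 95% | 0% | 0% | 0% | 0% | 5% | 0% | 0% | 0% | 0% | 0% | 0%
punjabi-arain | 15 | 0% | 2% | 1% | 0% | 0% | 10% | 1% | 4% | 45% | 0% | 34% | 0% | 0%
pygmy | 17 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 0% | 0%
romanians | 9 | 0% | 0% | 0% | 0% | 0% | 55% | 3% | 19% | 19% | 0% | 0% | 0% | 0%
russian | 20 | 0% | 2% | 0% | 0% | 0% | 70% | 9% | 1% | 14% | 0% | 2% | 0% | 1%
sahara-occ | 10 | 0% | 0% | 0% | 0% | 6% | 16% | 1% | 57% | 0% | 0% | 3% | 15% | 0%
sakilli | 4 | 0% | 0% | 3% | 3% | 0% | 1% | 0% | 0% | 25% | 0% | 64% | 0% | 2%
samaritians | 3 | 1% | 0% | 2% | 0% | 0% | 11% | 0% | 49% | 35% | 0% | 1% | 0% | 0%
samoan | 11 | 0% | 0% | 25% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 74%
san | 24 | 88% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
san-nb | 12 | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
sandawe | 17 | 12% | 1% | 0% | 0% | 38% | 0% | 0% | 13% | 1% | 5% | 0% | 29% | 0%
sardinian | 22 | 0% | 0% | 0% | 0% | 0% | 59% | 0% | 35% | 4% | 0% | 0% | 0% | 0%
saudis | 15 | 0% | 0% | 0% | 0% | 4% | 0% | 0% | 63% | 30% | 0% | 0% | 0% | 0%
selkups | 7 | 0% | 5% | 0% | 9% | 0% | 26% | 47% | 0% | 10% | 0% | 1% | 0% | 0%
sephardic-jews | 13 | 0% | 0% | 0% | 0% | 0% | 33% | 0% | 37% | 26% | 0% | 1% | 0% | 0%
she | 9 | 0% | 0% | 0% | 59% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 40%
sindhi | 15 | 0% | 2% | 1% | 0% | 0% | 11% | 1% | 5% | 44% | 0% | 35% | 0% | 0%
singapore-chinese | 70 | 0% | 0% | 0% | 60% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 40%
singapore-indians | 53 | 0% | 1% | 2% | 1% | 0% | 2% | 1% | 1% | 32% | 0% | 54% | 0% | 3%
singapore-malay | 59 | 0% | 1% | 4% | 15% | 0% | 0% | 1% | 0% | 1% | 0% | 10% | 0% | 65%
slovenian | 17 | 0% | 1% | 0% | 0% | 0% | 70% | 2% | 9% | 15% | 0% | 1% | 0% | 0%
sotho/tswana | 5 | 25% | 0% | 0% | 0% | 3% | 0% | 0% | 0% | 0% | 4% | 0% | 67% | 0%
spaniards | 5 | 0% | 0% | 0% | 0% | 0% | 68% | 1% | 19% | 10% | 0% | 0% | 1% | 1%
stalskoe | 5 | 0% | 2% | 0% | 2% | 0% | 34% | 3% | 16% | 39% | 0% | 2% | 0% | 0%
surui | 7 | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
syrians | 10 | 0% | 1% | 0% | 0% | 1% | 16% | 0% | 40% | 35% | 0% | 3% | 2% | 0%
thai | 17 | 0% | 1% | 2% | 15% | 0% | 1% | 2% | 1% | 3% | 0% | 16% | 0% | 57%
tn-brahmin | 9 | 0% | 2% | 2% | 0% | 0% | 8% | 2% | 0% | 36% | 0% | 48% | 0% | 1%
tn-dalit | 7 | 0% | 0% | 3% | 0% | 0% | 0% | 1% | 0% | 23% | 0% | 67% | 0% | 5%
tongan | 11 | 0% | 0% | 30% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 70%
totonac | 15 | 0% | 91% | 0% | 1% | 0% | 3% | 5% | 0% | 0% | 0% | 0% | 0% | 0%
tu | 7 | 0% | 1% | 1% | 63% | 0% | 3% | 8% | 1% | 3% | 0% | 1% | 0% | 18%
tujia | 5 | 0% | 0% | 0% | 62% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 36%
tunisia | 11 | 0% | 0% | 0% | 0% | 1% | 20% | 0% | 59% | 0% | 0% | 4% | 13% | 0%
turks | 13 | 0% | 1% | 0% | 4% | 0% | 26% | 3% | 28% | 35% | 0% | 2% | 0% | 0%
tuscans | 79 | 0% | 0% | 0% | 0% | 0% | 53% | 0% | 26% | 18% | 0% | 0% | 0% | 0%
tuvinians | 11 | 0% | 1% | 1% | 41% | 0% | 9% | 40% | 0% | 6% | 0% | 0% | 0% | 1%
urkarah | 11 | 0% | 2% | 0% | 0% | 0% | 36% | 2% | 11% | 45% | 0% | 0% | 0% | 0%
utahn-whites | 72 | 0% | 1% | 0% | 0% | 0% | 75% | 1% | 7% | 12% | 0% | 1% | 0% | 0%
uygur | 7 | 0% | 2% | 0% | 29% | 0% | 17% | 12% | 5% | 22% | 0% | 7% | 0% | 6%
uzbekistan-jews | 2 | 0% | 1% | 1% | 0% | 0% | 18% | 1% | 35% | 42% | 0% | 2% | 0% | 1%
uzbeks | 10 | 0% | 1% | 0% | 27% | 0% | 21% | 17% | 6% | 20% | 0% | 6% | 0% | 1%
vietnamese | 4 | 0% | 0% | 1% | 42% | 0% | 0% | 0% | 0% | 0% | 0% | 4% | 0% | 52%
west-greenlanders | 8 | 0% | 26% | 0% | 0% | 0% | 23% | 45% | 1% | 2% | 0% | 2% | 0% | 0%
xhosa | 3 | 27% | 0% | 0% | 0% | 7% | 0% | 0% | 1% | 0% | 2% | 0% | 61% | 0%
xibo | 6 | 0% | 0% | 1% | 67% | 0% | 1% | 15% | 0% | 2% | 0% | 0% | 0% | 13%
yakut | 18 | 0% | 0% | 1% | 37% | 0% | 3% | 53% | 1% | 4% | 0% | 0% | 0% | 0%
yemen-jews | 12 | 0% | 0% | 1% | 0% | 4% | 3% | 0% | 58% | 31% | 0% | 1% | 0% | 0%
yemenese | 7 | 1% | 0% | 1% | 1% | 5% | 3% | 1% | 42% | 28% | 1% | 3% | 7% | 1%
yi | 6 | 0% | 0% | 1% | 62% | 0% | 0% | 7% | 0% | 0% | 0% | 3% | 0% | 26%
yoruba | 92 | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% | 0% | 93% | 0%
yukaghirs | 6 | 0% | 0% | 0% | 16% | 0% | 31% | 42% | 0% | 6% | 0% | 1% | 0% | 0%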
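Per-population medians like those in the table above can be computed from the ADMIXTURE .Q output with standard shell tools. This is a minimal sketch, not the processing actually used for this post. It assumes a whitespace-separated input with one individual per line, the population label in the first column, and the K membership fractions written as plain decimals after it. The function name median_by_pop is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: median cluster membership per population.
# stdin:  "<population> <q1> <q2> ... <qK>" per individual (assumed layout)
# stdout: "<population> <cluster index> <median>" with the median to 4 decimals
median_by_pop() {
  # 1. Reshape to long format: population, cluster index, value.
  awk '{ for (i = 2; i <= NF; i++) print $1, i - 1, $i }' |
    # 2. Group by population and cluster, values in ascending order.
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n |
    # 3. Take the middle value (or the mean of the two middle values) per group.
    awk '
      function emit(   m) {
        if (n == 0) return
        if (n % 2) m = v[(n + 1) / 2]
        else m = (v[n / 2] + v[n / 2 + 1]) / 2
        printf "%s %d %.4f\n", pop, cl, m
      }
      NR == 1 || $1 != pop || $2 != cl { emit(); pop = $1; cl = $2; n = 0 }
      { v[++n] = $3 }
      END { emit() }
    '
}
```

The input can be built by pasting one population label per sample next to the rows of the .Q file, which ADMIXTURE writes in the same sample order as the .fam file. Which column or file holds the population label depends on how the dataset was prepared.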

All results can be downloaded here: ADMIXTURE_K1-14.tar.gz
which contains:
PLINK formatted *.bed, *.bim, *.fam files
*.txt file with complete list of samples
K folders containing:
    *.P and *.Q ADMIXTURE output files
    log file, with Fst distances and CV errors
Processed Output folder containing:
    Median Cluster %
    Average Cluster %
    Standard Deviations
    Cluster Key: Top five populations in each cluster
    list of Unique Populations
    GNU Octave variable loading file, *.mat
