A commenter asked me to post the script I use to convert 23andMe raw data to the PLINK format required for ADMIXTURE computations. There are several ways to do this with different kinds of script, but since the one I use and am most familiar with runs in GNU Octave, that is the one I will post here.
I am assuming readers will be using a Linux platform, e.g. Ubuntu.
In addition, the script requires PLINK to be already installed on your machine.
- Download and install GNU Octave. You can do this from Ubuntu's 'Software Centre' by searching for Octave; it takes less than 5 minutes to download and install.
- Create a new folder to use for converting raw data and give it a name. For instance, create a folder on your desktop and rename it “Convert_23andME”.
- Download this file, then copy and paste it into the “Convert_23andME” folder that you just created.
- Download your raw data from 23andMe, unzip it, and copy and paste the .txt file into the “Convert_23andME” folder. You should now have only 2 files in that folder.
- Start the Terminal window in Ubuntu. Change directory to the Desktop/Convert_23andME folder you created by typing this in the command line of the terminal window: cd Desktop/Convert_23andME/
- Start Octave by typing “octave” in the command line of the terminal window.
- Next, type: Raw_Convert ("My_Rawdata.txt") where the string argument between the quotation marks, i.e. My_Rawdata.txt, is EXACTLY the name of the raw-data file you placed into the Convert_23andME folder in step 4.
- Avoid any spaces when answering the questions*, press enter after each answer, and allow the program to process your raw data. V2 data takes about 22 minutes on my machine and V3 will take longer; the speed will depend on your machine.
- When it is done, you will see 3 additional folders inside your “Convert_23andME” folder. The first folder (_conversion) holds the three files converted by the script, with extensions .tped, .tfam and .nocall. The .tped and .tfam files are the PLINK-formatted transposed pedigree files of your raw data, while the .nocall file lists the chromosome number, assigned reference SNP ID and position of each raw data point that was not successfully genotyped; it is just for your records. The second folder (_binaryPED) contains the .bed, .bim and .fam files created by PLINK. These are the binary PED and associated files of your raw data, which can then be merged with other data-sets for ADMIXTURE, MDS and various other genome-wide analyses. The last folder (_misc) contains miscellaneous files created by PLINK during the conversion from tped to binary ped, which may include lists of heterozygous haploid genotypes and so forth; consult the PLINK manual for details.
- Exit Octave by typing “exit”.
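For readers curious about what the conversion involves, here is a minimal Python sketch of the same idea. It is not the Octave script from this post: the function names, the placeholder SNP rows, and the choice to leave no-calls out of the .tped (rather than writing them as PLINK's missing genotype, 0 0) are all assumptions made for illustration. It relies on the 23andMe raw-data layout (tab-separated rsid, chromosome, position and genotype, with comment lines starting with #) and on the TPED column order (chromosome, SNP ID, genetic distance, position, then the two alleles).

```python
def convert_23andme_to_tped(raw_lines):
    """Split 23andMe raw-data lines into TPED rows and a no-call list.

    Hypothetical helper for illustration; not the author's Octave code.
    """
    tped_rows, no_calls = [], []
    for line in raw_lines:
        line = line.strip()
        # Skip blank lines and the '#' comment header of the raw file.
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        # 23andMe marks a failed genotype with '--'.
        if "-" in genotype:
            # Assumption: record it in the no-call list and leave it
            # out of the TPED rows.
            no_calls.append((chrom, rsid, pos))
            continue
        # Haploid calls (e.g. male X, Y, MT) are a single letter;
        # TPED needs two alleles, so the letter is repeated.
        if len(genotype) == 1:
            genotype = genotype * 2
        # Genetic distance is unknown here, so it is written as 0.
        tped_rows.append(
            f"{chrom} {rsid} 0 {pos} {genotype[0]} {genotype[1]}"
        )
    return tped_rows, no_calls


def plink_make_bed_args(prefix):
    """Argument list asking PLINK to turn prefix.tped/.tfam into
    prefix.bed/.bim/.fam (hypothetical helper name)."""
    return ["plink", "--tfile", prefix, "--make-bed", "--out", prefix]


# Placeholder rows, not real SNP data.
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs0000002\t1\t2000\tAG",
    "rs0000003\t1\t3000\t--",
    "rs0000004\tX\t4000\tG",
]
rows, missing = convert_23andme_to_tped(sample)
for row in rows:
    print(row)
print(missing)
print(" ".join(plink_make_bed_args("My_Output")))
```

The last line prints the PLINK command for converting a transposed file set to binary PED files. Whether the script in this post calls PLINK with exactly these options is an assumption; My_Output stands in for whatever you answer to “Output File Name?”.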
*The questions the conversion program asks you in Octave:
“Output File Name?”
This is the name you want to give your converted raw data file. The necessary extensions are appended automatically, so enter just the name without any extension.
“Family ID?”
This will be the family ID PLINK identifies your raw data with.
“Individual ID?”
This will be the individual ID PLINK identifies your raw data with. The combination of family ID and individual ID should uniquely identify a person.
“Paternal ID (Default=0)?”
You can just leave this at 0.
“Maternal ID (Default=0)?”
You can also leave this at 0.
“SEX (1=male; 2=female; other=unknown)?”
Enter 1 for male and 2 for female.
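The answers above end up as the single line of the .tfam file, which has six whitespace-separated columns: family ID, individual ID, paternal ID, maternal ID, sex and phenotype. Because PLINK splits that line on whitespace, a space inside any answer would shift the columns, which is why spaces must be avoided. The following Python sketch shows how such a line could be assembled; the function name and the use of -9 (PLINK's missing-phenotype code) for the sixth column are assumptions, not taken from the Octave script.

```python
def tfam_line(family_id, individual_id, paternal_id="0",
              maternal_id="0", sex="0", phenotype="-9"):
    """Build one .tfam line from the answers to the prompts.

    Hypothetical helper for illustration only.
    """
    # PLINK sex codes: 1 = male, 2 = female, anything else = unknown.
    if sex not in ("1", "2"):
        sex = "0"
    fields = [family_id, individual_id, paternal_id,
              maternal_id, sex, phenotype]
    for field in fields:
        # A blank or space-containing answer would break the
        # whitespace-delimited column layout.
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"invalid field: {field!r}")
    return " ".join(fields)


print(tfam_line("FAM1", "ME", sex="1"))
```

With family ID FAM1, individual ID ME, both parental IDs left at 0 and sex entered as 1, the resulting line reads FAM1 ME 0 0 1 -9.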
--------------------------------------------------------
Edit_Rev2: Converted Program into function, included Chromosome # and Position fields in No call list.
Edit_Rev3: Segregated No calls between Mitochondria, X and Y, included total-passed SNPs for PLINK in summary.
Edit_Rev4: Automated binary PED file creation.