Blood DNAs (and, in a few rare cases, EBV-immortalized DNAs) from nearly 1000 families (of the 3000 planned) were sent to our group for processing and analysis. Approximately one-tenth of the families we analyzed are not yet officially in the SSC databases. DNA samples were shipped to NimbleGen’s Icelandic facility, where two-color hybridizations using a single reference male genome were performed. SSC samples were labeled with Cy3, and the reference was labeled with Cy5. Ninety-seven percent
of families passed gender and pedigree checks for all members and are called “valid” herein. Those are the only families considered in this report. We define a trio as consisting of a mother, a father, and a child, either affected or unaffected. If each member of a trio has a hybridization that passes minimum quality thresholds (see Experimental Procedures), that trio and its associated hybridizations are called “high quality” (or “HQ”). Out of 1721 valid trios from 887 families, 1475 (86%) are HQ. For convenience, throughout this report we refer to the children with diagnosed ASDs as “probands” and to the children who do not have ASDs as “sibs.” For purposes of statistical evaluation, we establish the “HQ quads,” a subset of 510 HQ families with exactly one proband and one sib each. The composition of the children and families
for the various subpopulations under study is summarized in Table 1. There are roughly equal proportions of probands and sibs. The male-to-female ratio among the probands is 7:1, typical of high-functioning ASDs (Newschaffer et al., 2007). We mention here the observation (to be discussed later) that there are fewer male sibs than female sibs. Hybridization data underwent extensive
processing before we determined segments of altered copy number (Experimental Procedures, Supplemental Experimental Procedures, and Figure 1). We extracted signal and noise parameters from each hybridization and used these for quality control and to model integer copy-number states (Figure 2). For partitioning the genome into intervals of constant copy number, we used KS segmentation (Grubor et al., 2009). We also employed a trio-based Hidden Markov Model (HMM) to build databases of high-confidence events and transmissions. High-confidence events in 1500 parents were used to compile a frequency table of copy-number variation for all probes. We searched for de novo events in the 1475 HQ trios, initially restricting evidence to autosomal probes that did not have known extra mappings to the human genome (hg18 build) outside the event region and that were rarely polymorphic in the high-confidence parental database (i.e., present in no more than 5/1500 parents). We compiled those events with high statistical significance of being de novo (p value < 10⁻⁹), creating a “stringent” automated list of 70 de novo events (Table S1, “stringent”). Figure 3 illustrates the family probe ratio data for two typical de novo events, a duplication and a deletion.
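
The probe restrictions and significance cutoff described above can be illustrated with a minimal sketch. This is not the pipeline used in the study. The `Probe` record, its field names, and the helper functions are hypothetical and introduced only for illustration. The two numeric thresholds (at most 5 of 1500 parents, p value below 10⁻⁹) are the only values taken from the text, and the sketch assumes a per-event p value has already been computed by some upstream model.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
MAX_POLYMORPHIC_PARENTS = 5   # probe variant seen in no more than 5 of 1500 parents
P_VALUE_CUTOFF = 1e-9         # "stringent" de novo list requires p < 10^-9

# Autosomes only; sex chromosomes are excluded from the initial evidence.
AUTOSOMES = {str(i) for i in range(1, 23)}


@dataclass(frozen=True)
class Probe:
    """Hypothetical probe record; field names are assumptions, not the study's schema."""
    probe_id: str
    chromosome: str               # "1".."22", "X", or "Y"
    has_extra_mappings: bool      # known extra hg18 mappings outside the event region
    parents_with_variation: int   # count in the high-confidence parental database


def is_eligible(probe: Probe) -> bool:
    """Return True if the probe may contribute evidence for a de novo event."""
    return (
        probe.chromosome in AUTOSOMES
        and not probe.has_extra_mappings
        and probe.parents_with_variation <= MAX_POLYMORPHIC_PARENTS
    )


def eligible_probes(probes):
    """Keep only the probes that satisfy all three restrictions."""
    return [p for p in probes if is_eligible(p)]


def passes_stringent_cutoff(p_value: float) -> bool:
    """Return True if an event's de novo p value is below the stringent cutoff."""
    return p_value < P_VALUE_CUTOFF
```

In this sketch a probe must satisfy all three conditions at once, which matches the reading that the restrictions are applied jointly rather than as alternatives.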