National educational organizations have called upon scientists to become involved in K–12 education reform. From sporadic interaction with students to more sustained partnerships with teachers, the engagement of scientists takes many forms. In this case, scientists from the American Society of Human Genetics (ASHG), the Genetics Society of America (GSA), and the National Society of Genetic Counselors (NSGC) have partnered to organize an essay contest for high school students as part of the activities surrounding National DNA Day. We describe a systematic analysis of 500 of 2443 total essays submitted in response to this contest over 2 years. Our analysis reveals the nature of student misconceptions in genetics, the possible sources of these misconceptions, and potential ways to galvanize genetics education.
The rapid advances in genetic research, the popularity of the topic in the news and in current popular television shows (e.g., CSI: Crime Scene Investigation), and the direct role that genetics plays in human health and reproduction make it a scientific discipline that everyone needs to understand. Yet, several studies reveal that students fail to critically understand the genetics knowledge taught in the classroom, and this lack of understanding translates to an inability to apply basic knowledge to their everyday lives (Lewis and Wood-Robinson 2000; Lewis and Kattmann 2004).
State science standards reflect the important role that genetic advances are playing in our lives. More than 80% of middle and high school science standards adopted since 2003 include terminology on the Human Genome Project, bioethics, cloning, stem cells, and/or other biotechnology terminology that did not exist in previous versions of the standards. However, even the adoption of national science standards, which include the coverage of genetics concepts, does not guarantee understanding of the concepts. The compulsory science education standards in England and Wales, for example, failed to yield deep conceptual understanding in genetics for their students (Lewis and Wood-Robinson 2000). The important role genetics plays in society, human health, and our responses to the environment makes these deficiencies in genetics content knowledge revealed by state, national, and international standardized tests even more troubling. Therefore, a strategic effort to improve secondary genetics education is especially needed.
MISCONCEPTIONS AND CRITICAL THINKING
One strategy that can have an impact on student understanding of a specific discipline is to encourage deep, critical thinking about that discipline. In an age where at least superficial information is at our fingertips on a limitless number of topics including genetics, we must find methods of ensuring an enduring understanding of this information. Because students often learn only passively through lectures, reading assignments, or cursory searching of the Internet, developing critical thinking skills is necessary to ensure a level of literacy and the eventual ability to apply the knowledge (Connally and Vilardi 1989; Rivard 1994; Keys 1999). Providing students with an opportunity to explore challenging areas in genetics through writing is one manner of achieving this goal.
Research on student learning suggests that student misconceptions serve as barriers to student achievement. These misconceptions are often based on personal experiences and are difficult to bypass en route to meaningful understanding in any content area (Gelman and Gallistel 1986; Wellman 1990). Even after instruction designed to address scientific content in an area where misconceptions are held, many students do not reconstruct their thinking. Only those students able to deconstruct their knowledge and reconstruct it using critical thinking and logical reasoning appear to have fewer misconceptions even after high-quality instruction (Lawson and Thompson 1988). Similarly, conceptual change generally occurs only if a learning experience can demonstrate both that a student's explanation is insufficient and that an alternative explanation is more applicable (Posner et al. 1982).
The National Assessment of Educational Progress (NAEP) assesses proficiency of U.S. students in a variety of content areas, including science, using a random sampling of students from the 4th, 8th, and 12th grades. The last NAEP tests in science were administered in 1996, 2000, and 2005. Unfortunately, the data from the 2005 test are still not completely accessible to the public. However, an analysis of the 2000 NAEP test results reveals dramatic deficiencies in genetics content knowledge at both 8th and 12th grades. Mastery of 12 concepts from earth, physical, and life sciences is required for students to demonstrate proficient or advanced knowledge in the sciences; one-quarter of these concepts are in the field of genetics (O'Sullivan et al. 2003). The NAEP test results reveal specific deficits in student understanding of classification, evolution, mutation, and DNA technology as shown in Table 1. Publicly available data on the 2000 NAEP science assessment (at http://nces.ed.gov/) provide sample questions and answers from students, as well as the criteria for scoring answers as “complete or essential, partial, or unsatisfactory.” We specifically examined the subset of data regarding the broad category of molecular and human genetics (footnote a in Table 1).
All questions referring to genes, mutation, cell differentiation, genetic disease, and recombinant DNA usage for 12th grade students had a difficulty of “hard” and required a written response. This type of question enables investigators to explore student thinking in more depth. The example for the year 2000 provided an adapted text that was taken from an article in the March 1990 issue of Discover magazine. This article was based on the work of Richard Mulligan and other geneticists who are currently examining the use of viruses as vehicles for introducing genes into human cells as a form of therapy for genetic diseases in humans. A majority of students were not able to describe a gene, its structure, or its function. It was very rare for students to have a thorough understanding of the types of mutations that occur, the causes of those mutations, and the physiological effect of gene alterations. Moreover, few were able to transfer the knowledge from the article to the information they had learned in class about inherited diseases.
Therefore, to encourage a transformation from passive knowledge in genetics gained via classroom lectures, the National DNA Day Essay Contest (http://www.genednet.org/pages/k12_dnaday08.shtml) was established by K. R. Mills Shaw, Director of Education at the American Society of Human Genetics (ASHG), to provide a distinct opportunity for students to think critically and articulate scientific arguments related to genetics. Teachers from across the country were invited to participate through listservs, blast e-mails, and the ASHG education website, http://www.genednet.org. Each year two questions have been provided: one to allow students to explore the methods and research that genetics entails and the second to explore the ethical, legal, and social issues influenced by genetics (see Table 2). Table 3 summarizes the number of essays submitted during each year of the contest. The students who wrote the top three essays for each question were declared first, second, and third place winners through the judging process described in methods. These students were awarded $350, $250, and $150, respectively. The monetary awards were made possible by the sponsorship of Applied Biosystems (Foster City, CA). While many essays demonstrated a clear understanding of genetics and its implications, a significant number of submitted essays revealed firmly held misinformation and misconceptions by U.S. students in grades 9–12. This article examines those misconceptions, provides possible explanations for their origins, and suggests ways that scientists, professors, and teachers can collaborate to improve genetics education at the K–12 level.
Table 2. Essay contest questions in 2006 and 2007
Table 3. Essay contest participation in 2006 and 2007
Judging of essays:
All aspects of the National DNA Day Essay Contest were managed online from initial advertisement to final judging. Information technology specialists from ASHG and the Genetics Society of America (GSA) were able to adapt existing society resources to facilitate essay acceptance, cataloging, and scoring. Judges were recruited from the active membership of ASHG, GSA, and the National Society of Genetic Counselors (NSGC). Three groups of judges were utilized. Each year students were given a choice between two essay questions. The questions from 2006 and 2007 are highlighted in Table 2. The first group of judges read large groups of essays on either of the two essay topics, scanning these essays to ensure that they met the contest rules and addressed all aspects of the judging criteria. The criteria were slightly different for each of the two questions and were all published online for all students and teachers. The scoring criteria for the 2007 questions are documented in Table 4. Essays not fulfilling these criteria after being reviewed by at least two judges were removed from more detailed consideration. Each judge in the second group scored ∼10–15 essays in depth, providing a score (from 1 to 10) in each of the five categories. Each essay was scored by at least three independent judges. Scores were tabulated and the 10 essays with the highest scores for each topic were named as finalists. The last set of judges reviewed and scored each of the finalist essays, with the highest-scoring essays being chosen as winners. One hundred ten members of ASHG, GSA, or NSGC served as judges each year. The entire adjudication process is reviewed in Figure 1. This system allowed us to perform all judging anonymously and ensure that each essay was read and scored by multiple independent reviewers while simultaneously investigating each essay for scientific accuracy.
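The second-round tabulation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the societies' actual scoring system: the names `essay_total` and `pick_finalists`, the data layout, and the choice to average across judges (the text says only that scores were "tabulated") are all assumptions.

```python
from statistics import mean

MIN_JUDGES = 3        # each essay was scored by at least three independent judges
NUM_CATEGORIES = 5    # five scoring categories, each scored from 1 to 10
NUM_FINALISTS = 10    # ten finalists were named per topic


def essay_total(judge_scores):
    """Return the mean, across judges, of each judge's summed category scores.

    judge_scores: list of per-judge lists, one 1-10 score per category.
    Averaging across judges is an assumption made for this sketch.
    """
    if len(judge_scores) < MIN_JUDGES:
        raise ValueError("essay needs at least three independent judges")
    for scores in judge_scores:
        if len(scores) != NUM_CATEGORIES or not all(1 <= s <= 10 for s in scores):
            raise ValueError("expected five category scores between 1 and 10")
    return mean(sum(scores) for scores in judge_scores)


def pick_finalists(scores_by_essay, n=NUM_FINALISTS):
    """Rank essay IDs for one topic by tabulated score, highest first."""
    totals = {essay_id: essay_total(js) for essay_id, js in scores_by_essay.items()}
    return sorted(totals, key=totals.get, reverse=True)[:n]
```

Averaging rather than summing keeps essays comparable when some were read by more than three judges, which the "at least three" wording allows.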
Table 4. Essay contest scoring guidelines for 2007
Identification of misconceptions:
All judges, along with scientists on ASHG staff, were asked to examine each essay for misconceptions or incorrect statements and forward this information along with their scores. All misconceptions were collected over 2 years from individual judges but were placed into categories by two individual coders (K. R. Mills Shaw and K. Van Horne) on the basis of the genetic topic that the misconception addressed (Table 5). These topics were generated de novo after reviewing all the misconceptions submitted from judges and after K. R. Mills Shaw and K. Van Horne had each independently evaluated 125 randomly selected essays from each year (2006 and 2007). All misconceptions were then cataloged under these specific topic areas to better characterize the areas where misconceptions are most common (see Table 6). Five hundred essays, or 20%, were randomly selected for this level of systematic review. Specifically, every fourth essay was analyzed in detail. If, however, essays were deemed completely unsatisfactory for review (e.g., too short, too poorly defined, too poorly written), the essay was not included in the systematically reviewed sample of 500. A misconception/misunderstanding was identified as any clearly written statement that did not accurately reflect the nature of genetic science, technology, or research as defined by K. R. Mills Shaw and J. A. Boughman, both Ph.D. scientists with a background in genetics. Essays where language or communication barriers were obvious (due to vocabulary, grammatical, and spelling errors) were not included as part of this review. Once misconceptions were identified, coders both independently and in communication with each other cataloged misconceptions according to topic to ensure consistency in grouping. The quantitation of the examples revealed in this article reflects observations from analysis of the critical writing from 500 high school essays (9th–12th grade submissions).
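The "every fourth essay" selection with exclusion of unreviewable essays can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the study: the function name `systematic_sample`, the `is_reviewable` predicate, and the decision to stop once the target is reached are hypothetical.

```python
def systematic_sample(essays, is_reviewable, step=4, target=500):
    """Take every `step`-th essay, skip unreviewable ones, stop at `target`.

    essays: submissions in a fixed order (e.g., order of receipt; assumed).
    is_reviewable: hypothetical predicate that rejects essays that are too
        short, too poorly defined, or too poorly written to review.
    """
    sample = []
    for index, essay in enumerate(essays, start=1):
        if index % step != 0:
            continue              # keep only every fourth essay
        if not is_reviewable(essay):
            continue              # excluded from the systematic sample
        sample.append(essay)
        if len(sample) == target:
            break
    return sample
```

Taking every fourth essay from a pool yields about a quarter of it as candidates, which leaves room for exclusions before reaching a sample of roughly 20%.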
Table 5. Key ideas/subtopics used to categorize misconceptions
Table 6. Common misconceptions revealed in student essays
Essays collected represent data from multiple states, grades, and classroom teachers:
All essays were submitted online. In the online submission form we collected demographic data on all students and their teachers, including their grade, city, state, and school. In both years of the contest we included a rule that stated only three essays per teacher for each question, for a total of six essays per teacher, would be accepted. However, this rule was often overlooked, and teachers would submit essays from their entire classrooms. Thus, while we collected more essays in 2006, this total number of essays reflects a representation of fewer classrooms. In 2007 we rectified this problem by adding an algorithm that blocked any more than three essays from the same teacher. The data presented in Table 3 show that the essay contest grew between years 1 and 2 in the overall number of classrooms reached and that the essays collected represent a wide geographical distribution. In 2007, we did not receive essays from Alaska, Hawaii, Vermont, South Dakota, Wyoming, Maine, Washington, DC, Nebraska, or Mississippi despite sending out multiple e-mail solicitations to teacher contacts in those states.
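A submission limit of the kind added in 2007 could look like the sketch below. The contest's actual implementation is not described, so everything here is hypothetical: the `SubmissionGate` class, the identifiers, and the reading of the rule as three essays per teacher for each question.

```python
from collections import Counter

MAX_PER_TEACHER_PER_QUESTION = 3   # contest rule: three essays per teacher per question


class SubmissionGate:
    """Accept an essay only while its teacher is under the per-question limit."""

    def __init__(self, limit=MAX_PER_TEACHER_PER_QUESTION):
        self.limit = limit
        self.counts = Counter()    # (teacher_id, question_id) -> essays accepted

    def try_submit(self, teacher_id, question_id):
        """Return True and record the essay, or False if the limit is reached."""
        key = (teacher_id, question_id)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

Keying the counter on both teacher and question allows up to six essays per teacher across the two questions, matching the stated rule.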
Identification of misconceptions and misinformation from student essays:
During the process of reading and scoring essays, judges were asked to identify and document examples of misconceptions in their essays. Additionally, all essays were cursorily scanned by either K. R. Mills Shaw or K. Van Horne. Tables 5 and 6 provide an overview of the topics where misconceptions are common as well as sample statements taken directly from student essays. While several hundred individual misconceptions were identified during the course of judging and review, many of the individual misconceptions could be categorized under broad topics in genetics (summarized in Table 5). To quantify the frequency of these common misconceptions we reanalyzed 500 (∼20%) of the essays, which included 250 essays chosen at random from each year's submissions. Individual misconceptions were identified and cataloged. After cataloging each misconception in the 500 essays and defining the categories of genetics in which they fell, “common” categories were defined by those being present in at least 5% of the essays examined. Of the 500 systematically reviewed essays, 278 (55.6%) revealed at least one obvious misconception. Another 101 essays (20.2%) were recognized for having two or more misconceptions. Misconceptions that were linked to essays with obvious language or writing barriers were excluded from the quantitative analysis to avoid overrepresentation. The prevalence of misconceptions per topic area is summarized in Figure 2.
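The prevalence calculation and the 5% cutoff for "common" categories can be expressed as a short sketch. The function names and data layout are assumptions made for illustration; the topic labels in any real run would come from Table 5.

```python
from collections import Counter

COMMON_THRESHOLD = 0.05   # "common" = present in at least 5% of essays examined


def topic_prevalence(essay_topics, n_essays):
    """Return the share of all reviewed essays showing each misconception topic.

    essay_topics: one collection of topic labels per essay; an essay counts
        at most once per topic, however many statements it contains.
    n_essays: total number of essays reviewed (the denominator).
    """
    counts = Counter(topic for topics in essay_topics for topic in set(topics))
    return {topic: count / n_essays for topic, count in counts.items()}


def common_topics(prevalence, threshold=COMMON_THRESHOLD):
    """List the topics whose prevalence meets the threshold, alphabetically."""
    return sorted(topic for topic, share in prevalence.items() if share >= threshold)
```

The same arithmetic reproduces the reported shares: 278 of 500 essays is 55.6%, and 101 of 500 is 20.2%.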
Figure 2. Prevalence of misconceptions by genetics topic. A total of 500 essays were chosen at random (20% of total submitted) and were systematically reviewed for misconceptions. Frequently observed topics of misconceptions were identified and essays were cataloged...
Standards and common areas of misconceptions:
Misconceptions were identified and categorized into a general topic area. We then examined how standards were related to these main topic areas, specifically patterns of inheritance and the deterministic nature of genes. We selected 20 sets of state biology standards at random and analyzed them to determine the nature of the standards in patterns of inheritance at the introductory biology or life science level in high school. Supplemental Table 1 at http://www.genetics.org/supplemental/ highlights four sets of these standards that provide a range of coverage of patterns of inheritance. A majority of these basic genetics/cell biology standards (15/20) included an examination of Mendel's laws of inheritance, some specifically describing the requirement to understand probability, Punnett squares, and the differences between autosomal dominant, autosomal recessive, and sex-linked traits. Other states included only broader descriptions where a student would, for example, “Explain current scientific ideas and information about the molecular and genetic basis of heredity” (see supplemental Table 1). These are important data because they reflect the highly diverse nature of the level of detail required of students in U.S. high schools. While standards that fail to provide comprehensive detail allow talented teachers to provide creative and challenging learning opportunities for students, they can often also result in learning experiences that fail to effectively teach students even the most basic concepts in biology.
Genetic technologies:
The single greatest number of misconceptions identified from student essays could be broadly defined as falling into the category of “genetic technologies.” When answering the question “If you were a genetics researcher, what would you study and why?” students often expressed their goal of curing multiple unrelated diseases. The reality is that most genetics researchers are several steps removed from work on specific cures and instead devote their efforts to improving the molecular understanding of disease with the ultimate goal of improved treatments. Moreover, scientists generally study only one specific illness or class of related diseases. The work scientists currently perform to identify a disease-causing mutation is prominent in student essays with the common extrapolation to the “curing” of disease through gene replacement. Often, student essays also suggested that genetic engineering allows us to put a gene from any species into another species to have that trait expressed in exactly the same manner as in the original species. Students do not understand the complexity of biotechnology and genetic engineering. They make broad leaps without demonstrating an understanding of the multiple genetic and epigenetic (or environmental) factors that play a role in genetic regulation and manipulation of genetic materials in the laboratory setting. Moreover, there is a disconnect between observed characteristics and the physiological function of genes:
We could eliminate all the premature deaths of people dying around the world from thirst if we genetically modified people to inherit some of the characteristics of the camel, allowing them to go for months at a time without drinking water.
Finally, we note the prevalence of essays that included information on the importance of stem cell research. While clearly a prevalent topic in the popular literature and press, students often discussed stem cell biology without ever discussing the genetics of stem cells. We did not include essays that included information on stem cells in our quantitative analysis. However, we note that scanning our database from 2006 for all references to “stem cells” revealed that almost one-quarter of essay submissions included this terminology without actively exploring the genetic nature of these cells despite the clear genetics-oriented nature of the essay questions.
Deterministic nature of genes:
Another common misconception we observed is that one gene is always responsible for one trait or one gene with one mutation always causes one disease. The discovery of genes that convey and determine a specific phenotype is often displayed and hyped in the media. A cursory search of online news outlets yielded example headlines that could easily be misinterpreted, adding credibility to students' misconceptions. Some examples include the following titles: “Turning Off Suspect Gene Makes Mice Smarter” (nytimes.com, May 29, 2007) and “Researchers narrow search for longevity gene” (cnn.com, August 28, 2001). It is important for students to understand that it is rare that a single gene has complete control over an exhibited phenotype. Instead, multiple factors contribute to phenotype. Multiple genes often work together, with the environment, to determine ultimate phenotype. Our examination of standards revealed that only 3 of 20 state standards specifically mentioned that students should learn about polygenic inheritance (that more than one gene can contribute to a specific phenotype) and only 2 described the role of the environment in controlling phenotype. Thus, it is not surprising that we would see a common misconception that single genes are the cause of most traits and inherited diseases. Compared to the general nature of genetic inheritance, far fewer students would have necessarily been exposed to the concepts of non-Mendelian and polygenic inheritance.
Patterns of inheritance:
Patterns of inheritance was another topic that revealed numerous misconceptions and misunderstandings among students. Not only were students often unable to correctly describe the nature of simple dominant and recessive patterns of inheritance, but they were also unable to go into any level of depth regarding genes or alleles, the physiological function of genes (proteins), or non-Mendelian patterns of inheritance. Some students even described genetic technologies as being able to “prevent the inheritance” of disease genes. Students focused primarily on simple Mendelian inheritance that could be analyzed with a Punnett square. All students described only monogenic traits that followed simple autosomal dominant, autosomal recessive, or X-linked inheritance. Students were often unable to adequately describe sources of abnormal chromosome numbers. Essays did not mention errors during meiotic cell division and generation of gametes as the source of monosomies or trisomies. Our review of state science standards for high school students in biology suggests, not surprisingly, that the majority of states provide specific, detailed standards that mandate teaching students, even at the earliest levels of their life science education in high school, the basic biology of inheritance patterns. Although 15 of the 20 biology standards included basic patterns of inheritance knowledge, when we reviewed the essays that were cataloged as having an error or a misconception falling under “patterns of inheritance,” 80% of those essays inaccurately described a basic tenet of Mendelian inheritance, despite their expected coverage of this material at their current grade level or in previous biology courses.
Nature of genes and genetic material:
All 20 state standards examined require coverage of the nature of DNA as the hereditary material in living things. Nevertheless, students suggested that lower organisms, including bacteria and fungi, often do not carry DNA. We also noted student confusion regarding the hierarchical organization of genetic material. Notably, students were frequently unable to accurately define DNA, genes, and chromosomes. Often, these terms were instead used interchangeably. In 2007, <1% of essays included any information on additional genetic material in the genome. Students did not mention gene expression control elements, repetitive sequences (unless discussing Huntington's disease), or other nongene elements in the genome. Finally, students often described specific protein-encoding segments, or genes, as discrete elements that could easily be removed from one context and added in a separate context. While this view likely extends from students learning basic biotechnology and bacterial transformation techniques (for example, adding the green fluorescent protein from jellyfish to bacterial strains), likening this process to the adding of a chemical to a solution is an oversimplification, at best.
Genetic basis of disease:
One of the principal errors observed in this category was the confusion of “hereditary” and “genetic” when describing diseases. In a small subset of cases, ∼10% of the total essays categorized as having a misconception in this topic, students completely misrepresented the genetic nature of a specific illness (e.g., calling HIV-1 an inherited disorder). While most illnesses have a genetic component, this does not make them hereditary. Moreover, while even infectious diseases can be considered to have a genetic component, whether in the genetics of the virus itself or in how individual genetics could result in different manifestations of the same illness, students must learn to clarify these differences. Cancer is a genetic disease. Only rare cancers, however, are hereditary. Nevertheless, students often described breast and ovarian cancer as hereditary because of mutations in BRCA1 or BRCA2. While mutations in these genes often do result in a cancer predisposition that appears to be inherited in a dominant-like fashion, the majority of breast cancer cases are not due to mutations in these genes.
A large number of student essays focused on the promise of genetic engineering in human health and reproduction (see also the Reproductive technologies section). While superficially this reflects that students recognize the positive influence that the study of genetics can have in their lives, numerous misconceptions suggest that students still fail to truly understand the nature and limitations of genetic research. An examination of state science standards, briefly described above, reveals that while state and national process (not necessarily content) standards require coverage of the nature of scientific research, inquiry, and discovery, this does not necessarily equate to students learning about how scientists actively perform research. Instead, these process standards reflect the fact that teachers are expected to provide students with opportunities for inquiry and investigation in the context of their own classroom laboratory experiments and activities. In short, state science standards do not require students to learn about the nature of scientific research.
Reproductive technologies:
Misconceptions falling under the category of “reproductive technologies” could have been accurately cataloged under genetic technologies. However, this class of misconceptions was frequent enough to require special treatment. In these cases students continued to explore their ideas of genetic engineering and cloning to describe the future of reproductive control where prospective parents would “improve” and “design” their offspring, with the ultimate goal of having the “perfect” child. Eugenics, either in specific use of terminology or in concept, appeared in 15% of essays collected in 2007. This percentage is not reflective of the goals or ongoing work of genetics research. Unfortunately, its prevalence in student essays is likely due to both its historical role in research and the “genohype.” Interestingly, the idea behind eugenics is not overtly described in any of the state science standards that we explored. The frequency of students describing using genetics to improve genotypes and design human beings, however, suggests that this is either the hook that teachers are using or the message that students are hearing from the media. More research would be required to determine which of these options is most prevalent.
Role of standards:
State science standards are not the only source of direction for what is taught in public schools; textbooks, laboratories, statewide assessments, and teacher quality also play significant roles. These benchmarks serve as the cornerstone of “standards-based reform,” which has become prominent since the adoption of the “No Child Left Behind” legislation's requirement for stricter accountability of student achievement. After examining multiple state standards, it became clear that there is extreme variation between the levels of breadth and depth that individual states require of students at the same educational level (supplemental Table 1). Teachers rightly demand a balance between rigorous standards and flexibility, allowing them to establish creative and effective teaching methods. However, the current teaching environment makes it difficult for teachers to include information in the classroom not explicitly included in their state standards and therefore presumably their state content assessments. Thus, as states consider revisions of their next content standards in genetics, they should reevaluate their requirements in light of the body of literature that suggests that neither current standards nor current pedagogical methods being employed for conveying genetics to students are sufficient to produce enduring understanding of the material.
Misconceptions, scientific literacy, and genetic citizenry:
Interestingly, many of the errors observed in the NAEP questions were also observed in our essays, reaffirming the wide-scale deficiency in genetics knowledge of high school students. While the data sets cannot be directly compared, it is of some concern that students in 2007 hold the same misconceptions as students in 2000 despite the rapid pace of advances in genetics technology and knowledge that occurred during that same period.
In genetics, anecdotal evidence from practitioners of high school life science (teacher e-mails and listserv communications) and direct evidence collected through these 2 years of ASHG-sponsored nationwide essay contests suggest that genetics is an area where many high school students harbor multiple misconceptions and significant misinformation. Some of this is likely due to the exaggeration of the benefits and risks of genetics research and health information (Loo et al. 1998; Moynihan et al. 2000; Ransohoff and Ransohoff 2001). Students are clearly getting significant quantities of information from the Internet (most student essays referenced stories from a variety of different websites, but not from scientific references); students often rely on their teachers for ultimate validation of their information through discussions and grading. Scientists must work proactively with professional science writers to ensure that information in their field is accurately represented in the press. One study compared the text of original scientific articles with news reports about them (Ransohoff and Ransohoff 2001). Interestingly, these authors reported that when “hype” was identified in the popular press, it was the result of the original article and the scientists' own interpretations of their results.
Due to advances in genetic screening, genetic technology, the promise of individual genome sequencing, and other progress in the field of genetic research, it is more important than ever for the public to have a critical understanding of basic genetic information. This understanding will be vital for individuals to be informed advocates for their own health care when it comes to providing consent for testing and treatment as well as for being able to understand and interpret test results accurately. This will become an even greater need as private companies begin to provide genetic tests through mail order such that individuals can test themselves at home without the consultation of a physician (Advisory Committee on Genetic Testing 1998). For patients to understand the tests and results, and their own risk, they must be able to understand the biological underpinning of the tests themselves. Furthermore, as genetic research becomes more firmly embedded in medical practice and care, the public must be able to make informed decisions regarding specific pieces of legislation. Multiple studies, including this one, demonstrate that the current classroom methods for genetics instruction are not developing a citizenry with accurate mental models of inheritance and the genetic basis of disease (Henderson and Maguire 2000).
Similar to our analysis, work from Lewis and Kattmann (2004) reveals that students equate genotype and phenotype. Their work suggests that this is at least in part due to an incomplete understanding of genetic terminology. Other work suggests that acceptance of genetic determinism might negatively influence individual behavior and lifestyle choices. Believing a genetic illness is “something that is inherited that nothing can be done about,” individuals may not heed the advice of clinicians to alter diet or behavior (Henderson and Maguire 2000).
Lack of precision in student writing results in difficulty in differentiating misconception from poor writing skills:
Another significant observation we made after reviewing 2446 student essays is that students need to be instructed in writing with precision. In science, terminology and specialized vocabularies are important and can be problematic for students. Words used in everyday language can carry different meanings in science. Simply put, it is clear that students are not being taught to write using technical language and appear to approach scientific writing in the same way they might approach an essay for an English or Social Studies assignment. For example, students often related that people “carry obesity” or that they have the “disease gene.” But neither of these is a precise description of the biological concept. Perhaps it is reasonable to infer that the first student meant that people “carry alleles that predispose them toward obesity” and the second student meant that a person with a genetic disease inherited a “mutation in a gene that caused a disease.” But these inaccuracies leave us with the perception that students do not understand these intricacies in the language of science. While we can infer that a student understood a topic but was ineffective in the communication of that knowledge, this might be a leap that is ultimately damaging. Investigators have demonstrated that precise language usage appears even more important in scientific fields because it is not merely a vehicle for communicating understanding, but itself actively facilitates learning and comprehension (Connally and Vilardi 1989; Halliday and Martin 1993; Roth and Roychoudhury 1992, 1993; Rivard 1994). It is reasonable to suggest that writing-to-learn strategies that are successful in other disciplines and levels should be considered for inclusion in the high school science curriculum.
Unfortunately, some work suggests that the adoption of writing across the curriculum programs—specifically those that engage students in scientific written discourse—is not widely used in the United States, despite success in Australia and the United Kingdom (Keys 1999).
Implications for undergraduate biology education:
Another interesting observation from our work is that individual teacher knowledge, interest, and bias were clearly observed in student essays. Up to six students from a particular teacher could submit an essay to the contest. Frequently, similar themes were seen to run through many of the essays from a shared teacher–sponsor. For example, 3 essays in 2006 from the same teacher noted an interest in studying “gene doping.” Yet, of the 2443 other essays collected over 2 years, not a single other essay mentioned this as a topic. Examples such as this reflect the critical influence individual teachers have on student interest, knowledge, comprehension, and possible misconception. Moreover, it is important to note that teachers were asked to submit their “top” essays. While it is impossible to determine if teachers vetted each essay prior to submission, it is reasonable to assume that many of the essays that were submitted were reviewed. Yet, 55.6% of essays reviewed exhibited a major misconception. This, in combination with the observation that student writing often clearly reflected specific information learned in the classroom, implies that student writing might be indicative of misconceptions held and perpetuated by the teachers.
This conclusion has important implications for instructors of undergraduate biology and genetics courses. Most high school biology educators receive their training in genetics through their undergraduate coursework in biology. Therefore, students are likely entering their undergraduate courses with these misconceptions and leaving with the same misconceptions. Our work should provide undergraduate science educators with the information they need to begin to eliminate the perpetuation of these misconceptions.
Responsibility of scientists for marshaling change:
In addition to scientists recognizing these misconceptions when they direct their classroom agenda at the undergraduate level, this research calls on those practicing genetic research to adopt other changes in their communication about their research. Scientists must model accurate language and terminology usage when communicating to their peers, the press, and the community about their own work and the work of others. Genetics has a precise vocabulary (e.g., it is a mutation in the cystic fibrosis gene, not the “cystic fibrosis gene,” that results in a disease phenotype), and scientists must ensure that misconceptions are not perpetuated through their own misuse of these terms.
Another potential way for scientists to make significant inroads into correcting misconceptions at the K–12 level is to dedicate themselves to spending time in the classroom with teachers and students. Multiple programs offer the opportunity for scientists to mentor classroom teachers and students through either long- or short-term experiences. Indeed, many different scientific disciplinary societies foster such programs. Descriptions and information about these programs can be found at http://hub.mspnet.org/. Scientists can use these opportunities to promote accurate information to students and teachers about the nature of their discipline and scientific research. Unfortunately, this type of work is often viewed as secondary to professors' main responsibilities in their departments, especially in research-oriented departments. New programs must be developed to encourage, promote, and provide infrastructural support to scientists who dedicate themselves to this type of work. Indeed, the National Science Foundation has recently funded one such program currently sponsored by a scientific society (http://www.genednet.org/pages/GENA_about.shtml).
Challenges for change—Is it time to switch the paradigm?
Gregor Mendel's work is clearly among the most important in genetics. However, the relatively simple view of one gene, one trait has yielded generations of students who can predict that “tt” will result in a small plant and “TT” will result in a tall plant. Unfortunately, this monogenic view of the world, while accurate for a small subset of characteristics, is clearly a limited one. While students that have an understanding of genetics consistent with a “Mendelian model” reflect a certain depth of understanding of genetic disease, can describe dominant and recessive patterns, and can grasp the concept of carrier vs. affected status (Henderson and Maguire 2000), the reality is that the nature of most traits and human disease is more complicated than this model. Only a minority of state standards require the coverage of alleles. Even in cases where dihybrid crosses are required per their inclusion in state standards, this still represents only monogenic inheritance of two separate traits (the tall, yellow pea plant vs. the short, green pea plant). But in the case of both traits, a single gene contributes wholly to the observed (height or color) phenotype.
Additionally, the requirement of teaching basic Mendelian genetics likely is a factor contributing to student confusion regarding the deterministic nature of a single gene in phenotype control. For example, multiple students specifically selected human height as a character to explain how genetics is involved in phenotype. One example, also shown above, is “If everyone on both sides of your family is tall, you are going to be tall.” Students take concepts of true-breeding plants (everyone being tall) and extrapolate them to human development. While we teach students that the genetic material is common between plants, animals, and humans, we must be careful to also teach them that genes and phenotypes are often under distinctly different molecular and biochemical controls in various organisms.
Despite multiple studies that have enumerated student misconceptions in genetics, no studies have shown that high school curricula have been altered to address these concerns. Additionally, little specific work has been done to determine the classroom curriculum that will most effectively address misconceptions in genetics. Some of the work currently being done in this area is through a program called the Geneticist–Educator Network of Alliances (GENA), a National Science Foundation-funded Math and Science Partnership program of the ASHG (http://www.genednet.org/pages/GENA_about.shtml). Several groups have performed extensive analyses of genetics curricula for the K–12 classroom. Reviews of genetics curricula can be found at http://genetics-education-partnership.mbt.washington.edu/rev/revres.html and http://genednet.org/pages/GENA_CCRC.shtml. A number of challenges remain. The first is the question of how to reconcile data from a limited number of research studies that suggest that students do not retain information when taught in a traditional manner, relying on lecture to a prescribed curriculum (Kaufman et al. 1989). Then, once data are collected, compiled, and compared, how does one use those data to alter curriculum and textbooks to achieve better student understanding? Finally, while a number of individuals suggest that scientific education would benefit from a retooling that includes the requirement for students to learn information as it applies to their lives today and in the future, as well as the ability to evaluate scientific claims and information, this change is impeded by the need for districts, states, and even entire nations to demonstrate scientific content knowledge instead of deep conceptual understanding (Allen and Tanner 2003). Until significant research is performed by scientists and their educator colleagues that demonstrates which methods adequately teach both content and concepts, school systems are unlikely to change their methods.
The authors thank Jane Nelson, Lauren Lum, Sophia Patel, and Dennis Gilbert for their assistance with the programmatic and public relations aspects related to the DNA Day Essay Contest. The essay contest would not have been possible without the generous support of the American Society of Human Genetics, the Genetics Society of America, and the National Society of Genetic Counselors members that volunteered to read and score essays or the financial support of Applied Biosystems. This material is based upon work supported by the National Science Foundation under grant no. 0634296.
- Advisory Committee on Genetic Testing (Editors), 1998. Genetic testing for late onset disorders. Health Departments of the United Kingdom, London.
- Allen, D., and K. Tanner, 2003. Approaches to cell biology teaching: learning content in context–problem-based learning. Cell. Biol. Educ. 2: 73–81.
- Connally, P., and T. Vilardi, 1989. Writing to Learn Mathematics and Science. Teachers College Press, New York.
- Gelman, R., and C. R. Gallistel, 1986. The Child's Understanding of Number. Harvard University Press, Cambridge, MA.
- Halliday, M. A. K., and J. R. Martin, 1993. Writing Science: Literacy and Discursive Power. University of Pittsburgh Press, Pittsburgh.
- Henderson, B. J., and B. T. Maguire, 2000. Three lay mental models of disease inheritance. Soc. Sci. Med. 50: 293–301.
- Kaufman, A., S. Mennin, R. Waterman, S. Duban, C. Hansbarger et al., 1989. The New Mexico experiment: educational innovation and institutional change. Acad. Med. 64: 285–294.
- Keys, C. W., 1999. Revitalizing instruction in scientific genres: connecting knowledge productions with writing to learn in science. Sci. Educ. 83: 115–130.
- Lawson, A. E., and L. D. Thompson, 1988. Formal reasoning ability and misconceptions concerning genetics and natural selection. J. Res. Sci. Teach. 25: 733–746.
- Lewis, J., and U. Kattmann, 2004. Traits, genes, particles and information: re-visiting students' understandings of genetics. Int. J. Sci. Educ. 26: 195–206.
- Lewis, J., and C. Wood-Robinson, 2000. Genes, chromosomes, cell division and inheritance: Do students see any relationship? Int. J. Sci. Educ. 22: 177–195.
- Loo, L. K., J. M. Byrne, S. B. Hardin, D. Castro and F. P. Fisher, 1998. Reporting medical information: Does the lay press get it right? J. Gen. Intern. Med. 13(Suppl. 1): 60.
- Moynihan, R., L. Bero, D. Ross-Degnan, D. Henry, K. Lee et al., 2000. Coverage by the news media of the benefits and risks of medications. N. Engl. J. Med. 342: 1645–1650.
- O'Sullivan, C. Y., M. A. Lauko, W. S. Grigg, J. Qian and J. Zhang, 2003. The nation's report card: Science 2000. National Center for Education Statistics. http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2003453.
- Posner, G. J., K. A. Strike, P. W. Hewson and W. Gertzog, 1982. Accommodation of a scientific conception: toward a theory of conceptual change. Sci. Educ. 66: 211–227.
- Ransohoff, D., and R. Ransohoff, 2001. Sensationalism in the media: when scientists and journalists may be complicit collaborators. Eff. Clin. Pract. 4: 185–188.
- Rivard, L. P., 1994. A review of writing to learn in science: implications for practice and research. J. Res. Sci. Teach. 31: 969–983.
- Roth, W., and A. Roychoudhury, 1992. The social construction of scientific concepts or the concept map as conscription device and tool for social thinking in high school science. Sci. Educ. 76: 531–557.
- Roth, W., and A. Roychoudhury, 1993. The development of science process skills in authentic contexts. J. Res. Sci. Teach. 30: 127–152.
- Webby, R., D. Perez, J. Coleman, Y. Guan, J. Knight et al., 2004. Responsiveness to a pandemic alert: use of reverse genetics for rapid development of influenza vaccines. Lancet 363: 1099–1103.
- Wellman, H. M., 1990. The Child's Theory of Mind. MIT Press, Cambridge, MA.
Although there are many possible causes of human disease, family history is often one of the strongest risk factors for common disease complexes such as cancer, cardiovascular disease (CVD), diabetes, autoimmune disorders, and psychiatric illnesses. A person inherits a complete set of genes from each parent, as well as a vast array of cultural and socioeconomic experiences from his/her family. Family history is thought to be a good predictor of an individual’s disease risk because family members most closely represent the unique genomic and environmental interactions that an individual experiences (Kardia et al., 2003). Inherited genetic variation within families clearly contributes both directly and indirectly to the pathogenesis of disease. This chapter focuses on what is known or theorized about the direct link between genes and health and what still must be explored in order to understand the environmental interactions and relative roles among genes that contribute to health and illness.
For more than 100 years, human geneticists have been studying how variations in genes contribute to variations in disease risk. These studies have taken two approaches. The first approach focuses on identifying the individual genes with variations that give rise to simple Mendelian patterns of disease inheritance (e.g., autosomal dominant, autosomal recessive, and X-linked) (see Table 3-1; Mendelian Inheritance in Man). The second approach seeks to understand the genetic susceptibility to disease as the consequence of the joint effects of many genes. Each of these approaches will be discussed below.
Online Mendelian Inheritance in Man (OMIM) Statistics (as of May 15, 2006), Number of Entries.
In general, diseases with simple Mendelian patterns of inheritance tend to be relatively uncommon or frequently rare, with early ages of onset, such as phenylketonuria, sickle cell anemia, Tay-Sachs disease, and cystic fibrosis. In addition, some of these genes have been associated with extreme forms of common diseases, such as familial hypercholesterolemia, which is caused by mutations in the low-density lipoprotein (LDL) receptor that predispose individuals to early onset of heart disease (Brown and Goldstein, 1981).
Another example of Mendelian inheritance is familial forms of breast cancer associated with mutations in the BRCA1 and BRCA2 genes that predispose women to early onset breast cancer and often ovarian cancer. The genes identified have mutations that often are highly penetrant—that is, the probability of developing the disease in someone carrying the disease susceptibility genotype is relatively high (greater than 50 percent). These genetic diseases often exhibit a genetic phenomenon known as allelic heterogeneity, in which multiple mutations within the same gene (i.e., alleles) are found to be associated with the same disease. This allelic heterogeneity often is population specific and can represent the unique demographic and mutational history of the population.
In some cases, genetic diseases also are associated with locus heterogeneity, meaning that a deleterious mutation in any one of several genes can give rise to an increased risk of the disease. This is a finding common to many human diseases including Alzheimer’s disease and polycystic kidney disease. Both allelic heterogeneity and locus heterogeneity are sources of variation in these disease phenotypes since they can have varying effects on the disease initiation, progression, and clinical severity.
Environmental factors also vary across individuals and the combined effect of environmental and genetic heterogeneity is etiologic heterogeneity. Etiologic heterogeneity refers to a phenomenon that occurs in the general population when multiple groups of disease cases, such as breast cancer clusters, exhibit similar clinical features, but are in fact the result of differing events or exposures. Insight into the etiology of specific diseases as well as identification of possible causative agents is facilitated by discovery and examination of disease cases demonstrating etiologic heterogeneity. The results of these studies may also highlight possible gene-gene interactions and gene-environment interactions important in the disease process. Identifying etiologic heterogeneity can be an important step toward analysis of diseases using molecular epidemiology techniques and may eventually lead to improved disease prevention strategies (Rebbeck et al., 1997).
As opposed to the Mendelian approach, the second approach to studying how variations in genes contribute to variations in disease risk focuses on understanding the genetic susceptibility to diseases as the consequence of the joint effects of many genes, each with small to moderate effects (i.e., polygenic models of disease) and often interacting among themselves and with the environment to give rise to the distribution of disease risk seen in a population (i.e., multifactorial models of disease). This approach has been used primarily for understanding the genetics of birth defects and common diseases and their risk factors. As described below, several steps are involved in developing such an understanding.
As a first step, study participants are asked to provide a detailed family history to assess the presence of familial aggregation. If individuals with the disease in question have more relatives affected by the disease than individuals without the disease, familial aggregation is identified. While familial aggregation may be accounted for through genetic etiology, it may also represent an exposure (e.g., pesticides, contaminated drinking water, or diet) common to all family members due to the likelihood of shared environment.
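The comparison described above can be sketched as a simple odds ratio: the odds that a person with the disease reports an affected relative, divided by the same odds for a person without the disease. This is only one common way to summarize familial aggregation and is not taken from the text; the function name and all counts below are hypothetical. A value above 1 is consistent with aggregation, but, as noted above, it cannot by itself separate shared genes from shared environment.

```python
def familial_aggregation_odds_ratio(cases_with_relative, cases_without_relative,
                                    controls_with_relative, controls_without_relative):
    """Odds ratio for reporting an affected relative, cases versus controls.

    Hypothetical helper for illustration: a value above 1 means affected
    individuals report affected relatives more often than unaffected ones.
    """
    if cases_without_relative == 0 or controls_with_relative == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (cases_with_relative * controls_without_relative) / (
        cases_without_relative * controls_with_relative
    )


# Invented counts, for illustration only (not data from any study).
odds_ratio = familial_aggregation_odds_ratio(
    cases_with_relative=60, cases_without_relative=140,
    controls_with_relative=30, controls_without_relative=170,
)
print(f"Odds ratio for an affected relative: {odds_ratio:.2f}")
```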
When there is evidence of familial aggregation, the second step is to focus research studies on estimating the heritability of the disease and/or its risk factors. Heritability is defined as the proportion of variation in disease risk in a population that is attributable to unmeasured genetic variations inferred through familial patterns of disease. It is a broad population-based measure of genetic influence that is used to determine whether further genetic studies are warranted, since it allows investigators to test the overarching null hypothesis that no genes are involved in determining disease risk. Twin studies and family studies are frequently used in the study of heritability.
Twin studies comparing the disease and risk factor variability of monozygotic and dizygotic twins have been a common study design used to easily estimate both genetic and cultural inheritance. Studies of monozygotic twins reared together versus those reared apart also have been important in estimating both genetic and environmental contributions to patterns of inheritance. The modeling of the sources of phenotypic variation using family studies has become quite sophisticated, allowing the inclusion of model parameters to represent the additive genetic component (i.e., polygenes), the nonadditive genetic component (i.e., genetic dominance, as well as gene-environment and gene-gene interactions), shared family environment, and individual environments. The contributions of these factors have been shown to vary by age and population.
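As a rough illustration of how twin correlations are turned into variance components, the sketch below uses the classic Falconer approximation, which the text does not name. It assumes purely additive genetic effects and equally shared family environments for monozygotic and dizygotic pairs, so it ignores the nonadditive and interaction terms that the more sophisticated models described above include. The function name and the input correlations are hypothetical.

```python
def falconer_estimates(r_mz, r_dz):
    """Approximate variance components from twin correlations.

    r_mz, r_dz: trait correlations in monozygotic and dizygotic twin pairs.
    Assumes additive genetic effects only (an assumption, not the text's model).
    """
    heritability = 2 * (r_mz - r_dz)    # additive genetic component
    shared_env = 2 * r_dz - r_mz        # shared family environment
    individual_env = 1 - r_mz           # individual environment and error
    return {
        "heritability": heritability,
        "shared_environment": shared_env,
        "individual_environment": individual_env,
    }


# Invented correlations, for illustration only.
estimates = falconer_estimates(r_mz=0.8, r_dz=0.5)
for component, value in estimates.items():
    print(f"{component}: {value:.2f}")
```

Estimates outside the range 0 to 1 are possible with this approximation and usually signal that its assumptions do not hold for the trait in question.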
When significant evidence of genetic involvement is established, the next step is to identify the responsible genes and the mutations that are associated with increased or decreased risk, using either genetic linkage analysis or genetic association studies. For example, in the study of birth defects, this often involves the search for chromosomal deletions, insertions, duplications, or translocations.
GENETIC LINKAGE ANALYSIS AND GENETIC ASSOCIATION STUDIES
The human genome is made up of tens of thousands of genes. With approximately 30,000 genes to choose from, assigning a specific gene or group of genes to a corresponding human disease demands a methodical approach consisting of many steps. Traditionally, the process of gene discovery begins with a linkage analysis that assesses disease within families. Linkage analyses are typically followed by genetic association studies that assess disease across families or across unrelated individuals.
Genetic Linkage Analysis
The term linkage refers to the tendency of genes proximally located on the same chromosome to be inherited together. Linkage analysis is one step in the search for a disease susceptibility gene. The goal of this analysis is to approximate the location of the disease gene in relation to a known genetic marker, applying an understanding of the patterns of linkage. Traditional linkage analysis that traces patterns of heredity of both the disease phenotype and genetic markers in large, high-risk families has been used to locate disease-causing gene mutations such as the breast cancer gene (BRCA1) on chromosome 17 (Hall et al., 1990).
Because the mode of inheritance is often not clear for common diseases, an alternative approach to classic linkage analysis was developed to capitalize on the basic genetic principle that siblings share half of their alleles on average. By investigating the degree of allelic sharing across their genomes, pairs of affected siblings (i.e., two or more siblings with the same disease) can be used to identify chromosomal regions that may contain genes whose variations are related to the disease being studied. If numerous sibling pairs affected by the disease of interest exhibit a greater than expected sharing of the known alleles of the polymorphic genetic marker being used, then the genetic marker is likely to be linked (that is, within close proximity along the chromosome) to the susceptibility gene responsible for the disease being studied. To find chromosomal regions that show evidence for linkage using this affected sibling pair method typically requires typing numerous affected sibships with hundreds of highly polymorphic markers uniformly positioned along the human genome (Mathew, 2001).
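The "greater than expected sharing" idea can be sketched with a simple mean test at a single marker. It assumes, for illustration only, that the number of alleles each affected pair shares identical by descent (0, 1, or 2) is known without ambiguity; real analyses must estimate this from marker data. Under no linkage, pairs share 0, 1, or 2 alleles with probabilities 1/4, 1/2, and 1/4, so the expected proportion shared is 0.5 with a per-pair variance of 0.125. The function name and the counts are hypothetical.

```python
import math


def asp_mean_sharing_test(n_share0, n_share1, n_share2):
    """One-sided mean test for excess allele sharing in affected sibling pairs.

    Arguments are counts of pairs sharing 0, 1, or 2 alleles identical by
    descent at a marker. Returns (mean sharing proportion, z, p-value).
    """
    n_pairs = n_share0 + n_share1 + n_share2
    if n_pairs == 0:
        raise ValueError("at least one sibling pair is required")
    # Proportion of alleles shared, averaged over all pairs.
    mean_sharing = (n_share1 + 2 * n_share2) / (2 * n_pairs)
    # Null variance of one pair's sharing proportion (values 0, 0.5, 1).
    null_variance = 0.125
    z = (mean_sharing - 0.5) / math.sqrt(null_variance / n_pairs)
    # Upper-tail probability under a normal approximation.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_sharing, z, p_value


# Invented counts for 100 affected sibling pairs, for illustration only.
mean_sharing, z, p_value = asp_mean_sharing_test(15, 50, 35)
print(f"mean sharing = {mean_sharing:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

A genome scan repeats a test of this kind at hundreds of markers, which is one reason false positives and failed replications, discussed below, are a practical concern.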
This approach has been widely used to identify regions of the genome thought to contribute to common chronic diseases. However, results of linkage analyses have not been consistently replicated. The inability to successfully replicate linkage findings may be a result of insufficient statistical power (that is, including an inadequate number of sibling pairs with the disease of interest) or results that included false positives in the original study. An alternate explanation could be that different populations are affected by different susceptibility genes than those that were studied originally (Mathew, 2001). Without consistent replication of results it is premature to draw conclusions about the contribution of a gene locus to a specific disease.
Upon the confirmation of a linkage, researchers can begin to search the region for the candidate susceptibility gene. The search for a single susceptibility gene for common diseases often involves examination of very large linkage regions, containing 20 to 30 million base pairs and potentially hundreds of genes (Mathew, 2001). It is also important to note, however, that while linkage mapping is a powerful tool for finding Mendelian disease genes, it often produces weak and sometimes inconsistent signals in studies of complex diseases that may be multifactorial. Linkage studies perform best when there is a single susceptibility allele at any given disease locus and generally perform poorly when there is substantial genetic heterogeneity.
Genetic Association Studies
Technological advances in high-throughput genotyping have allowed the direct examination of specific genetic differences among sizable numbers of people. Genetic association techniques are often the most efficient approach for assessing how specific genetic variation can affect disease risk. Genetic association studies, which have been used for decades, have progressed steadily in terms of the development of new study designs (such as case-only and family-based association designs), new genotyping systems (such as array-based genotyping and multiplexing assays), and new methods used for addressing biases such as population stratification (Haines and Pericak-Vance, 1998).
Analysis of the effects of genetic variation typically involves first the discovery of single nucleotide polymorphisms (SNPs) and then the analysis of these variations in samples from populations. SNPs occur on average approximately every 500 to 2,000 bases in the human genome. The most common approach to SNP discovery is to sequence the gene of interest in a representative sample of individuals. Currently, sequencing of entire genes on small numbers of individuals (~25 to 50) can detect polymorphisms occurring in 1 to 3 percent of the population with approximately 95 percent confidence. The Human DNA Polymorphism Discovery Program of the National Institute of Environmental Health Sciences’ Environmental Genome Project is one example of the application of automated DNA sequencing technologies to identify SNPs in human genes that may be associated with disease susceptibility and response to environment (Livingston et al., 2004). The National Heart, Lung, and Blood Institute’s Programs in Genomic Applications also has led to important increases in our knowledge about the distribution of SNPs in key genes thought to be already biologically implicated in disease risk (i.e., biological candidate genes).
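The sample-size statement above can be explored with a short calculation. The sketch assumes that the quoted percentage is a minor allele frequency and that each sequenced person contributes two independently sampled chromosomes; the text does not say how its figure was derived, so this is an illustration and not the authors' calculation. Under these assumptions, the chance of seeing a variant at least once is one minus the chance that every sampled chromosome lacks it. The function name is hypothetical.

```python
def detection_probability(allele_frequency, n_individuals):
    """Probability that a variant appears at least once among sequenced people.

    Assumes two independently sampled chromosomes per individual.
    """
    n_chromosomes = 2 * n_individuals
    return 1 - (1 - allele_frequency) ** n_chromosomes


# Sample sizes and frequencies spanning the ranges quoted in the text.
for n_individuals in (25, 50):
    for allele_frequency in (0.01, 0.03):
        probability = detection_probability(allele_frequency, n_individuals)
        print(f"n = {n_individuals}, frequency = {allele_frequency:.2f}: "
              f"P(detect) = {probability:.3f}")
```

Under these assumptions only the upper end of both ranges (50 individuals, 3 percent frequency) gives a detection probability close to 95 percent; rarer variants or smaller samples give noticeably lower values.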
Rapid advances in SNP analysis technology are redefining the scope of SNP discovery, mapping, and genotyping. New array-based genotyping technology enables “whole genome association” analyses of SNPs between individuals or between strains of laboratory animal species (Syvanen, 2005). Arrays used for these analyses can represent hundreds of thousands of SNPs mapped across a genome (Klein et al., 2005; Hinds et al., 2005; Gunderson et al., 2005). This approach allows rapid identification of SNPs associated with disease and susceptibility to environmental factors. The strength of this technology is the massive amount of easily measurable genetic variation it puts in the hands of researchers in a cost-effective manner ($500 to $1,000 per chip). The criteria for the selection of SNPs to be included on these arrays are a critical consideration, since they affect the inferences that can be drawn from using these platforms. Of course, the ultimate tool for SNP discovery and genotyping is individual whole genome sequencing. Although not currently feasible, the rapid advancement of technology now being stimulated by the National Human Genome Research Institute’s “$1,000 genome” project likely will make this approach the optimal one for SNP discovery and genotyping in the future.
With the ability to examine large quantities of genetic variations, researchers are moving from investigations of single genes, one at a time, to consideration of entire pathways or physiological systems that include information from genomic, transcriptomic, proteomic, and metabonomic levels that are all subject to different environmental factors (Haines and Pericak-Vance, 1998). However, these genome- and pathway-driven study designs and analytic techniques are still in the early stages of development and will require the joint efforts of multiple disciplines, ranging from molecular biologists to clinicians to social scientists to bioinformaticians, in order to make the most effective use of these vast amounts of data.
GENE-ENVIRONMENT AND GENE-GENE INTERACTIONS
The study of gene-environment and gene-gene interactions represents a broad class of genetic association studies focused on understanding how human genetic variability is associated with differential responses to environmental exposures and with differential effects depending on variations in other genes. To illustrate the concept of gene-environment interactions, recent studies that identify genetic mutations that appear to be associated with differential response to cigarette smoke and its association with lung cancer are reviewed below. Tobacco smoke contains a broad array of chemical carcinogens that may cause DNA damage. There are several DNA repair pathways that operate to repair this damage, and the genes within these pathways are prime biological candidates for understanding why some smokers develop lung cancers but others do not. In a study by Zhou et al. (2003), variations in two genes responsible for DNA repair were examined for their potential interaction with the level of cigarette smoking and concomitant association with lung cancer. Briefly, one putatively functional mutation in the XRCC1 (X-ray cross-complementing group 1) gene and two putatively functional mutations in the ERCC2 (excision repair cross-complementing group 2) gene were genotyped in 1,091 lung cancer cases and 1,240 controls. When the cases and controls were stratified into heavy smokers versus nonsmokers, Zhou et al. (2003) found that nonsmokers with the mutant XRCC1 genotype had a 2.4 times greater risk of lung cancer than nonsmokers with the normal genotype. In contrast, heavy smokers with the mutant XRCC1 genotype had a 50 percent reduction in lung cancer risk compared to their counterparts with the more frequent normal genotype.
When the three mutations from these two genes were examined together in the extreme genotype combination (individuals with five or six mutations present in their genotype), there was a 5.2 times greater risk of lung cancer in nonsmokers and a 70 percent reduction of risk in the heavy smokers compared to individuals with no mutations. The protective effect of these genetic variations in heavy smokers may be caused by the differential increase in the activity of these protective genes stimulated by heavy smoking. Similar types of gene-smoking interactions also have been found for other genes in this pathway, such as ERCC1. These studies illustrate the importance of identifying the genetic variations that are associated with the differential risk of disease related to human behaviors. Note that this type of research also raises many different kinds of ethical and social issues, since it identifies susceptible subgroups and protected subgroups of subjects by both genetic and human behavior strata (see Chapter 10).
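The stratified comparison described above can be sketched by computing a genotype odds ratio separately within each smoking stratum and then comparing the two. The counts below are invented so that the odds ratios come out near the single-gene values reported above (2.4 and 0.5); they are not the data of Zhou et al. (2003), and the function and variable names are hypothetical. An unadjusted odds ratio is also only a simplification of the analysis a real study would use.

```python
def odds_ratio(cases_mutant, cases_normal, controls_mutant, controls_normal):
    """Odds ratio for disease, mutant versus normal genotype, from a 2x2 table."""
    if cases_normal == 0 or controls_mutant == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (cases_mutant * controls_normal) / (cases_normal * controls_mutant)


# Invented counts per smoking stratum, for illustration only.
strata = {
    "nonsmokers": dict(cases_mutant=24, cases_normal=100,
                       controls_mutant=10, controls_normal=100),
    "heavy smokers": dict(cases_mutant=20, cases_normal=200,
                          controls_mutant=20, controls_normal=100),
}

stratum_odds_ratios = {name: odds_ratio(**table) for name, table in strata.items()}
for name, value in stratum_odds_ratios.items():
    print(f"{name}: genotype odds ratio = {value:.2f}")

# A ratio far from 1 means the genotype effect differs between strata,
# which is the signature of a gene-environment interaction.
interaction_ratio = (stratum_odds_ratios["heavy smokers"]
                     / stratum_odds_ratios["nonsmokers"])
print(f"ratio of odds ratios (heavy smokers / nonsmokers) = {interaction_ratio:.2f}")
```

Pooling the two strata into a single table would average a harmful and a protective association together and could hide both, which is why the stratified view matters here.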
The study by Zhou et al. (2003) also demonstrates the increased information provided by jointly examining the effects of multiple mutations on toxicity-related disease. Other studies of mutations in genes involved in the Phase II metabolism (GSTM1, GSTT1, GSTP1) also have demonstrated the importance of investigating the joint effects of mutations (Miller et al., 2002) on cancer risk. Although these two studies focused on the additive effects of multiple genes, gene-gene interactions are another important component to develop a better understanding of human susceptibility to disease and to interactions with the environment.
To adequately understand the continuum of genomic susceptibility to environmental agents that influences the public’s health, more studies of the joint effects of multiple mutations need to be conducted. Advances in bioinformatics can play a key role in this endeavor. For example, methods to screen SNP databases for mutations in transcriptional regulatory regions can be used for both discovery and functional validation of polymorphic regulatory elements, such as the antioxidant regulatory element found in the promoter regions of many genes encoding antioxidative and Phase II detoxification enzymes (Wang et al., 2005). Comparative sequence analysis methods also are becoming increasingly valuable to human genetic studies, because they provide a means to rank order SNPs in terms of their potential deleterious effects on protein function or gene regulation (Wang et al., 2004). Methods of performing large-scale analysis of nonsynonymous SNPs to predict whether a particular mutation impairs protein function (Clifford et al., 2004) can help in SNP selection for genetic epidemiological studies and can be used to streamline functional analysis of mutations that are found to be statistically associated with differential response to environmental factors such as diet, stress, and socioeconomic factors.
MECHANISMS OF GENE EXPRESSION
Identifying genes whose variations are associated with disease is just the first step in linking genetics and health. Understanding the mechanisms by which the gene is expressed and how it is influenced by other genes, proteins, and the environment is becoming increasingly important to the development of preventive, diagnostic, and therapeutic strategies.
When genes are expressed, the chromosomal DNA must be transcribed into RNA and the RNA is then processed and transported to be translated into protein. Regulating the expression of genes is a vital process in the cell and involves the organization of the chromosomal DNA into an appropriate higher-order chromatin structure. It also involves the action of a host of specific protein factors (to either encourage or suppress gene expression), which can act at different steps in the gene expression pathway.
In all organisms, networks of biochemical reactions and feedback signals organize developmental pathways, cellular metabolism, and progression through the cell cycle. Overall coordination of the cell cycle and cellular metabolism results from feed-forward and feedback controls arising from sets of dependent pathways in which the initiation of events is dependent on earlier events. Within these networks, gene expression is controlled by molecular signals that regulate when, where, and how often a given gene is transcribed. These signals often are stimulated by environmental influences or by signals from other cells that affect the gene expression of many genes through a single regulatory pathway. Since a regulatory gene can act in combination with other signals to control many other genes, complex branching networks of interactions are possible (McAdams and Arkin, 1997).
Gene regulation is critical because by switching genes on or off when needed, cells can be responsive to changes in environment (e.g., changes in diet or activity) and can prevent resources from being wasted. Variations in the DNA sequences associated with the regulation of a gene’s expression are therefore likely candidates for understanding gene-environment interactions at the molecular level, since these variations will affect whether an environmental signal transduced to the nucleus will successfully bind to the promoter sequence in the gene and stimulate or repress gene expression. Combining genomic technologies for SNP genotyping with high-density gene expression arrays in human studies has only recently elucidated the extent to which this type of molecular gene-environment interaction may be occurring.
Cells also regulate gene expression by post-transcriptional modification; by allowing only a subset of the mRNAs to go on to translation; or by restricting translation of specific mRNAs to only when and where the product is needed. The genetic factors that influence post-transcriptional control are much more difficult to study because they often involve multiprotein complexes not easily retrieved or assayed from cells. At other levels, cells regulate gene expression through epigenetic mechanisms, including DNA folding, histone acetylation, and methylation (i.e., chemical modification) of the nucleotide bases. These mechanisms are likely to be influenced by genetic variations in the target genes as well as variations manifested in translated cellular regulatory proteins. Gene regulation occurs throughout life at all levels of organismal development and aging.
A classic example of developmental control of gene expression is the differential expression of embryonic, fetal, and adult hemoglobin genes (see Box 3-1). The regulation of the epsilon, delta, gamma, alpha, and beta genes occurs through DNA methylation that is tightly controlled through developmental signals. During development a large number of genes are turned on and off through epigenetic regulation. One of the fastest growing fields in genetics is the study of the developmental consequences of environmental exposures on gene expression patterns and the impact of genetic variations on these developmental trajectories.
Box 3-1. Gene Expression and Globin. The production of hemoglobin is regulated by a number of transcriptional controls, such as switching, that dictate the expression of a different set of globin genes in different parts of the body throughout the various stages ...
An Example of a Single-Gene Disorder with Significant Clinical Variability: Sickle Cell Disease
Sickle cell disease refers to an autosomal recessive blood disorder caused by a variant of the β-globin gene called sickle hemoglobin (Hb S). A single nucleotide substitution (A→T) in the sixth codon of the β-globin gene results in the substitution of valine for glutamic acid (GAG→GTG), which can cause Hb S to polymerize (form long chains) when deoxygenated (Stuart and Nagel, 2004). An individual inheriting two copies of Hb S (Hb SS) is considered to have sickle cell anemia, while an individual inheriting one copy of Hb S plus another deleterious β-globin variant (e.g., Hb C or Hb β-thalassemia) is considered to have sickle cell disease. An individual is considered to be a carrier of the sickle cell trait if he/she has one copy of the normal β-globin gene and one copy of the sickle variant (Hb AS) (Ashley-Koch et al., 2000).
Four major β-globin gene haplotypes have been identified. Three are named for the regions in Africa where the mutations first appeared: BEN (Benin), SEN (Senegal), and CAR (Central African Republic). The fourth haplotype, Arab-Indian, occurs in India and the Arabian Peninsula (Quinn and Miller, 2004).
Disease severity is associated with several genetic factors (Ashley-Koch et al., 2000). The highest degree of severity is associated with Hb SS, followed by Hb S/β⁰-thalassemia and Hb SC. Hb S/β⁺-thalassemia is associated with a more benign course of the disease (Ashley-Koch et al., 2000). Disease severity also is related to β-globin haplotypes, probably due to variations in hemoglobin level and fetal hemoglobin concentrations. The Senegal haplotype is the most benign form, followed by the Benin haplotype; the Central African Republic haplotype is the most severe form (Ashley-Koch et al., 2000).
Thus, although sickle cell disease is a monogenic disorder, its phenotypic expression is multigenic (see Appendix D). There are two cardinal pathophysiologic features of sickle cell disease: chronic hemolytic anemia and vasoocclusion. Two primary consequences of hypoxia secondary to vasoocclusive crisis are pain and damage to organ systems. The organs at greatest risk are those in which blood flow is slow, such as the spleen and bone marrow, or those that have a limited terminal arterial blood supply, including the eye, the head of the femur and the humerus, and the lung as the recipient of deoxygenated sickle cells that escape the spleen or bone marrow. Major clinical manifestations of sickle cell disease include painful events, acute chest syndrome, splenic dysfunction, and cerebrovascular accidents.
Efforts to enhance clinical care are focusing on increasing our understanding of the pathophysiology of sickle cell disease in order to facilitate a precise prognosis and individualized treatment. Required is knowledge about which genes are associated with the hemolytic and vascular complications of sickle cell disease and how variants of these genes interact among themselves and with their environment (Steinberg, 2005).
ASPECTS OF HEALTH INFLUENCED BY GENETICS
Because every cell in the body, with rare exception, carries an entire genome full of variation as the template for the development of its protein machinery, it can be argued that genetic variation impacts all cellular, biochemical, physiological, and morphological aspects of a human being. How that genetic variation is associated with particular disease risk is the focus of much current research. For common diseases such as CVD, hypertension, cancer, diabetes, and many mental illnesses, there is a growing appreciation that different genes and different genetic variations can be involved in different aspects of their natural history. For example, there are likely to be genes whose variations are associated with a predisposition toward the initiation of disease and other genes or gene variations that are involved in the progression of a disease to a clinically defined endpoint. Furthermore, an entirely different set of genes may be involved in how an individual responds to pharmaceutical treatments for that disease. There also are likely to be genes whose variability controls how much or how little a person is likely to be responsive to the environmental risk factors that are associated with disease risk. Finally, there are thought to be genes that affect a person’s overall longevity that may counteract or interact with genes that may otherwise predispose that person to a particular disease outcome and thus may have an additional impact on survivorship.
In many ways, we are only at the beginning of the process of developing a true understanding of how genomic variations give rise to disease susceptibility. Indeed, many would argue that, without incorporating the equally important role of the environment, we will never fully understand the role of genetics in health. As progress is made through utilizing the new technologies for measuring biological variation in the genome, transcriptome, proteome, and metabonome, we are likely to have to make large shifts in our conceptual frameworks about the roles of genes in disease. Global patterns of genomic susceptibility are likely to emerge only when we consider the influence of the many interacting components working simultaneously that are dependent on contexts such as age, sex, diet, and physical activity that modify the relationship with risk. For the most part, we are still at the stage of documenting the complexity, finding examples and types of genetic susceptibility genes, understanding disease heterogeneity, and postulating ways to develop models of risk that use the totality of what we know about human biology, from our genomes to our ecologies.
Cardiovascular Disease (CVD)
The study of CVD can be used to illustrate the issues that are encountered in using genetic information in order to understand the etiology of the most common chronic diseases as well as in identifying those at highest risk of developing these diseases. The majority of CVD cases have a complex multifactorial etiology, and even full knowledge of an individual’s genetic makeup cannot predict with certainty the onset, progression, or severity of disease (Sing et al., 2003). Disease develops as a consequence of interactions between a person’s genotype and exposures to environmental agents, which influence cardiovascular phenotypes beginning at conception and continuing throughout adulthood. CVD research has found many high-risk environmental agents and hundreds of genes, each with many variations that are thought to influence disease risk. As the number of interacting agents involved increases, a smaller number of cases of disease will be found to have the same etiology and be associated with a particular genotype (Sing et al., 2003). The many feedback mechanisms and interactions of agents from the genome through intermediate biochemical and physiological subsystems with exposure to environmental agents contribute to the emergence of a given individual’s clinical phenotype. In attempting to sort out the relative contributions of genes and environment to CVD, a large array of factors must be considered, from the influence of genes on cholesterol (e.g., LDL levels) to psychosocial factors such as stress and anger. Although hundreds of genes have been implicated in the initiation, progression, and clinical manifestation of CVD, relatively little is known about how a person’s environment interacts with these genes to tip the balance between the atherogenic and anti-atherogenic processes that result in clinically manifested CVD. Please see Chapters 4 and 6 for further discussion of effects of social environment on CVD.
It is well known that many social and behavioral factors ranging from socioeconomic status, job stress, and depression, to smoking, exercise, and diet affect cardiovascular disease risk (see Chapters 2, 3, and 6 for more detailed discussion of these factors). As more studies of gene-environment interaction consider these factors as part of the “environment,” which are examined in conjunction with genetic variations, multiple intellectual and methodological challenges arise. First, how are the social factors embodied such that an interaction with a particular genotype can be associated with differential risk? Second, how can we handle complex interactions to address questions, such as how does an individual’s genotype influence his/her behavior? For example, one’s genetic susceptibility to nicotine addiction is actually a risk factor for CVD and its effect on CVD risk may be contingent on interactions with other genetic factors.
It has been well established that individuals often respond differently to the same drug therapy. The drug disposition process is a complex set of physiological reactions that begin immediately upon administration. The drug is absorbed and distributed to the targeted areas of the body where it interacts with cellular components, such as receptors and enzymes, that further metabolize the drug, and ultimately the drug is excreted from the body (Weinshilboum, 2003). At any point during this process, genetic variation may alter the therapeutic response of an individual and cause an adverse drug reaction (ADR) (Evans and McLeod, 2003). It has been estimated that 20 to 95 percent of variations in drug disposition, such as ADRs, can be attributed to genetic variation (Kalow et al., 1998; Evans and McLeod, 2003).
Sensitivity to both dose-dependent and dose-independent ADRs can have roots in genetic variation. Polymorphisms in kinetic and dynamic factors, such as cytochrome P450 and specific drug targets, can cause these individual susceptibilities to ADRs. While the characteristics of the ADR dictate the true significance of these factors, in most cases, multiple genes are involved (Pirmohamed and Park, 2001). Future analyses using genome-wide SNP profiling could provide a technique for assessing several genetic susceptibility factors for ADRs and ascertaining their joint effects. One of the challenges to the study of the relationship between genetic variation and ADRs is an inadequate number of patient samples. To remedy this problem, Pirmohamed and Park (2001) have proposed that prospective randomized controlled clinical trials become a part of standardized practice to ultimately prove the clinical utility of genotyping all patients as a measure to prevent ADRs.
Here we review some of the current work in pharmacogenetics as an example of what might be expected to arise from rigorous study of the interaction between social, behavioral, and genetic factors. Researchers have provided a few well-established examples of differences in individual drug response that have been ascribed to genetic variations in a variety of cellular drug disposition machinery, such as drug transporters or enzymes responsible for drug metabolism (Evans and McLeod, 2003). For example:
With the knowledge that the HER2 gene is overexpressed in approximately one fourth of breast cancer cases, researchers developed a humanized monoclonal antibody against the HER2 receptor in hopes of inhibiting the tumor growth associated with the receptor. Genotyping advanced breast cancer patients to identify those with tumors that overexpress the HER2 receptor has produced promising results in improving the clinical outcomes for these breast cancer patients (Cobleigh et al., 1999).
A therapeutic class of drugs called thiopurines is used as part of the treatment regimen for childhood acute lymphoblastic leukemia. One in 300 Caucasians has a genetic variation that results in low or nonexistent levels of thiopurine methyltransferase (TPMT), an enzyme that is responsible for the metabolism of the thiopurine drugs. If patients with this genetic variation are given thiopurines, the drug accumulates to toxic levels in their body causing life-threatening myelosuppression. Assessing the TPMT phenotype and genotype of the patient can be used to determine the individualized dosage of the drug (Armstrong et al., 2004).
The family of liver enzymes called cytochrome P450s plays a major role in the metabolism of as many as 40 different types of drugs. Genetic variants in these enzymes may diminish their ability to effectively break down certain drugs, thus creating the potential for overdose in patients with less active or inactive forms of the cytochrome P450 enzyme. Varying levels of reduced cytochrome P450 activity are also a concern for patients taking multiple drugs that may interact if they are not properly metabolized by well-functioning enzymes. Strategies to evaluate the activity level of cytochrome P450 enzymes have been devised and are valuable in planning and monitoring successful drug therapy. Some pharmaceutical drug trials are now incorporating early tests that evaluate the ability of differing forms of cytochrome P450 to metabolize the new drug compound (Obach et al., 2006).
Some pharmacogenetics research has focused on the treatment of psychiatric disorders. With the introduction of a class of drugs known as selective serotonin re-uptake inhibitors (SSRIs), pharmacological treatment of many psychiatric disorders changed drastically. SSRIs offer significant improvements over the previous generation of treatments, including improved efficacy and tolerance for many patients. However, not all patients respond positively to SSRI treatment and many experience ADRs. New pharmacogenetic studies have indicated that these ADRs may be the result of genetic variations in serotonin transporter genes and cytochrome P450 genes. Further study and replication of these findings are necessary. If the characterization of the genetic variations is completed and is fully understood it would be possible to screen and monitor patients using genotyping techniques to create individualized drug therapies similar to those discussed above (Mancama and Kerwin, 2003).
A significant challenge to the development of individualized drug therapies is the often polygenic or multifactorial inherited component of drug responses. Isolating the polygenic determinants of the drug responses is a sizable task. A good understanding of the drug’s mechanism of action and metabolic and disposition pathways should be the basis of all investigations. This knowledge can aid in directing genome-wide searches for gene variations associated with drug effects and subsequent candidate-gene approaches of investigation. Additionally, proteomic and gene-expression profiling studies are also important ways to substantiate and understand the pathways by which the gene of interest operates to affect the individual’s response to the drug (Evans and McLeod, 2003). It is not enough to show an association; characterization of the underlying biological mechanisms is an essential component of moving genetic findings into the area of risk reduction. Another key component of utilizing genetics to improve prevention and reduce disease is an understanding of the distribution of the genetic variations in the populations being served.
GENETICS OF POPULATIONS AS RELATED TO HEALTH AND DISEASE
Human populations differ in their distribution of genetic variations. This is a consequence of their historical patterns of mutation, migration, reproduction, mating, selection, and genetic drift. Inherited mutations typically occur during gametogenesis within a single individual and then can be passed on to offspring for many generations. Whether that mutation goes on to become a prevalent polymorphism (i.e., a mutation with a population frequency of greater than 1 percent) is determined by both evolutionary forces and chance events. For example, it depends on whether the original child who inherited the mutation survives to adulthood and reproduces and whether that child’s children survive to reproduce, and so on. The number of children in a family also influences the prevalence of the mutation, and this is often tied to environmental factors that impact fertility and mating patterns that influence the speed with which a private mutation becomes a public polymorphism. There are well-known examples of what are called founder mutations in which this trajectory can be documented. For example, one particular district in what is today Quebec (Canada) was originally founded by only a few families from a particular French province. One of the founding fathers carried a 10kb deletion in his LDL receptor (LDL-R) gene that was passed down through the generations quickly and today is carried by 1 in 154 French Canadians in northeastern Quebec. This mutation is associated with familial hypercholesterolemia, and French Canadians have one of the highest prevalences of this disease in the world because of the small founding populations followed by population expansion (Moorjani et al., 1989).
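The relationship between a carrier frequency and the underlying allele frequency can be sketched numerically. The example below is illustrative only and rests on assumptions that are ours, not the source's: that "carried by 1 in 154" refers to heterozygous carriers, and that genotype frequencies follow Hardy-Weinberg proportions, so that the carrier frequency equals 2q(1 - q) for allele frequency q. The function name is hypothetical.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq):
    """Solve 2q(1 - q) = carrier_freq for the smaller root q.

    Assumes Hardy-Weinberg proportions and that carriers are heterozygotes.
    """
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


carrier_freq = 1 / 154            # carrier frequency quoted in the text
q = allele_freq_from_carrier_freq(carrier_freq)
homozygote_freq = q ** 2          # expected two-copy frequency under the same assumptions
is_polymorphism = q > 0.01        # the "greater than 1 percent" threshold from the text

print(f"allele frequency q         = {q:.5f}")
print(f"expected homozygote freq   = {homozygote_freq:.2e}")
print(f"meets 1 percent threshold? = {is_polymorphism}")
```

Under these assumptions the deletion, although unusually common for a disease-associated allele, would still fall below the 1 percent allele-frequency threshold used above to define a polymorphism.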
There are also a number of examples where mutations that arise in an individual become more prevalent because of the selective advantage they impart on their carriers. The best known example is the mutation associated with sickle cell anemia. The geographical pattern of this mutation strongly mirrors the geographical pattern of malarial infection. It has been molecularly demonstrated that individuals carrying the sickle cell mutation have a resistance to malarial infection. Because many of the selection pressures that may have given rise to the current distribution of mutations in particular populations are in our evolutionary past, it is difficult to assess how much variation within or among populations is due to these types of selection forces.
Another major force in determining the distribution of genetic variations within and among human populations is their migration and reproductive isolation. According to our best knowledge, one of the most important periods in human evolution occurred approximately 100,000 years ago, when some humans migrated to other continents from the African basin and established new communities with relative reproductive isolation. Genetic differences among people in different geographical areas have been associated with the concept of race for hundreds of years. Although race is still used as a label, the original concept of race as genetically distinct subspecies of humans has been rejected through modern genetic information. For numerous reasons, discussed in the section below, it is more appropriate to reconceptualize the old genetics of race into a more accurate genetics of ancestry.
In addition to distant evolutionary patterns of migration, more modern migration patterns also have had a profound effect on the genetics of populations. For example, the current population of the United States and much of North America is very diverse genetically as a consequence of the mixing of many people from many different countries and continents.
A central reason for studying the origins and nature of human genetic variation is that the similarities and differences in the type and frequencies of genetic variations within and among populations can have a profound impact on studies that attempt to understand the influence of genes on disease risk. For example, some genetic variations, such as the apolipoprotein E protein polymorphisms, are found in every population and have very similar genotype frequencies around the world (Wu et al., 2002; Deniz Naranjo et al., 2004). The variation’s association with increased heart disease and Alzheimer’s disease could be and has been tested in many of the world’s populations. Other mutations such as the 10kb deletion in the LDL-R gene described above are more population-specific variations.
Furthermore, from a statistical point of view, the effect of a genetic variation on the continuum of risk found in any population is correlated with its frequency. For example, common genetic polymorphisms with frequencies near 50 percent cannot be associated with large phenotypic effects within a population because the genotype classes each represent a large fraction of the population and, since most risk is normally distributed, the average risk for a highly prevalent genotype class cannot deviate from the overall risk of the population to any large degree. This correlation between genotype frequency and effect does not mean that common variations cannot be significant in their effects. The statistical significance of an association between a genetic variant and a disease is a joint function of sample size and the size of the effect. In addition, genetic research among populations that differ in their genotype frequencies can differ in their inferences about which polymorphisms have significant effects even if the absolute phenotypic effect is the same. See Cheverud and Routman (1995) for a more formal statistical explanation of this phenomenon and its impact on assessing gene-gene interactions.
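The frequency argument above can be written out in a few lines. This is a sketch under simplifying assumptions that are ours rather than the source's: the population is split into one genotype class $g$ with frequency $f_g$ and mean risk $\mu_g$, and everyone else, with mean risk $\mu_{\bar g}$; $\mu$ is the overall population mean.

```latex
\begin{align}
\mu &= f_g\,\mu_g + (1 - f_g)\,\mu_{\bar g} \\
\mu_g - \mu &= (1 - f_g)\,\bigl(\mu_g - \mu_{\bar g}\bigr) \\
\sigma^2_{\text{between}} &= f_g\,(1 - f_g)\,\bigl(\mu_g - \mu_{\bar g}\bigr)^2
\end{align}
```

For a fixed difference between the two classes, the deviation of class $g$ from the population mean shrinks as $f_g$ grows (it is halved at $f_g = 0.5$), while the variance attributable to the genotype is largest at $f_g = 0.5$. This is one way to read the statement that a highly prevalent genotype class cannot sit far from the overall mean even though common variations can still have significant effects.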
Another key consideration in understanding the relationship between genetic variations and measures of disease risk is the population differences in the correlations between genotype frequencies at different SNP locations. There are two common reasons why the frequency of an allele or genotype at a particular SNP could be correlated with the frequency of an allele or genotype for a different SNP. First, a phenomenon known as linkage disequilibrium creates correlations among SNPs as a consequence of the mutation’s history. When mutations arise, they occur on a particular genetic background, which creates a correlation with the other SNPs on the chromosome. Second, the mixing of populations known as admixture that occurs typically through migration means that SNPs with population-specific frequencies will be correlated in a larger mixed sample. In this case, population stratification is the cause of the correlation, and there has been much genetic epidemiological research on this phenomenon and how to control for it. Population stratification is thought to be a possible source of spurious genetic associations with disease (see Box 3-2).
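Both sources of correlation between SNPs can be expressed with the standard linkage disequilibrium statistics D and r². The following is a minimal sketch with hypothetical allele frequencies: two source populations are each assumed to be in linkage equilibrium (no correlation between the two SNPs), and an equal-weight mixture of them is then examined. The function name `ld_stats` and all numbers are ours.

```python
def ld_stats(hap_ab, p_a, p_b):
    """Return (D, r^2) for two biallelic SNPs.

    hap_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: marginal frequencies of allele A and allele B
    """
    d = hap_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Hypothetical (p_A, p_B) in two source populations, each in linkage equilibrium,
# so the A-B haplotype frequency within a population is simply p_A * p_B.
pops = [(0.9, 0.9), (0.1, 0.1)]
weights = [0.5, 0.5]  # equal-sized contributions to the mixed sample

within = [ld_stats(pa * pb, pa, pb) for pa, pb in pops]

# Frequencies in the pooled (admixed or stratified) sample.
mix_pa = sum(w * pa for w, (pa, _) in zip(weights, pops))
mix_pb = sum(w * pb for w, (_, pb) in zip(weights, pops))
mix_hap = sum(w * pa * pb for w, (pa, pb) in zip(weights, pops))
mix_d, mix_r2 = ld_stats(mix_hap, mix_pa, mix_pb)

print("within-population (D, r^2):", within)
print(f"pooled sample: D = {mix_d:.3f}, r^2 = {mix_r2:.4f}")
```

In this sketch the two SNPs are uncorrelated within each population but clearly correlated in the pooled sample, purely because their allele frequencies differ between the populations.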
Box 3-2. Population Stratification (Confounding). When the risk of disease varies between two ethnic groups, any genetic or environmental factor that also varies between the groups will appear to be related to disease. This phenomenon is called “population ...
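The confounding described in Box 3-2 can be illustrated with a small numerical sketch. The counts below are hypothetical and constructed so that, within each group, the variant is unrelated to disease (odds ratio of exactly 1), while the groups differ in both variant frequency and disease risk. The function name and table layout are ours.

```python
def odds_ratio(table):
    """Odds ratio from (case_variant, case_normal, control_variant, control_normal)."""
    case_var, case_norm, ctrl_var, ctrl_norm = table
    return (case_var * ctrl_norm) / (case_norm * ctrl_var)


# Hypothetical groups of 1,000 people each; genotype is independent of disease
# within each group.
# Group A: variant frequency 0.8, disease risk 0.20
group_a = (160, 40, 640, 160)
# Group B: variant frequency 0.2, disease risk 0.05
group_b = (10, 40, 190, 760)

# Ignoring group membership simply adds the two tables cell by cell.
pooled = tuple(a + b for a, b in zip(group_a, group_b))

or_a = odds_ratio(group_a)
or_b = odds_ratio(group_b)
or_pooled = odds_ratio(pooled)

print(f"group A OR = {or_a:.2f}, group B OR = {or_b:.2f}")
print(f"pooled OR  = {or_pooled:.2f}")
```

The pooled odds ratio is well above 1 even though neither group shows any association, which is why analyses typically stratify on, or adjust for, ancestry before interpreting a genetic association.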
In large part, the twentieth century was dominated by studies of human health and disease that focused on identifying single genetic and environmental agents that could explain variation in disease susceptibility. This new century has been characterized by huge advances in our understanding of Mendelian disorders with severe clinical outcomes. However, the Mendelian paradigm has failed to elucidate the genetic contribution to susceptibility to most common chronic diseases, which researchers know have a substantial genetic component because of their familial aggregation and studies that demonstrate significant heritabilities for these diseases. Likewise, environmental and social epidemiological studies have been wildly successful in illuminating the role of many environmental factors such as diet, exercise, and stress on disease risk. However, these environmental factors still do not, by themselves, fully explain the variance in the prevalence of several diseases in different populations. Researchers are only now beginning to study in earnest the potential interactions between the genetic and environmental factors that are likely to be contributing to a large fraction of disease in most populations. There is much that can be done to incorporate measures of social environment into genetic studies and to also incorporate genetic measures into social epidemiological studies.
Over the last two decades, progress in identifying specific genes and mutations that explain genetic susceptibility to common conditions has been relatively slow, for a variety of reasons. First, the diseases being studied tend to be complex in their etiology, meaning that different people in a population will develop disease for different genetic and/or environmental reasons. Any single genetic or environmental factor is expected to explain only a very small fraction of disease risk in a population. Moreover, these factors are expected to interact, and other biological processes (e.g., epigenetic modifications) are likely to be contributors to the complex puzzle of susceptibility. An accurate phenotypic definition of disease and its subtypes is crucial to identifying and understanding the complexities of disease-specific genetic and environmental causes.
Second, geneticists only recently have developed the knowledge base or methods needed to measure genetic variations and their metabolic consequences with sufficient ease and cost-effectiveness so that the large number of genes thought to be involved can be studied. With the completion of the Human Genome Project in 2003, many different scientific entities (e.g., the Environmental Genome Project and the International HapMap Consortium) have been working to identify the mutational spectra in human populations, and genetic epidemiologists are just now beginning to understand the extensive nature of common variations (>1 percent population frequency) within the human genome that could be affecting people’s risk of disease. The SNP data generated by these initiatives are now centrally located in a number of public databases, including the National Center for Biotechnology Information’s dbSNP database, the National Cancer Institute’s CGAP Genetic Annotation Initiative SNP Database, and the Karolinska Institute Human Genic Bi-Allelic Sequences Database. At present, the largest dataset on human variation is being generated by the International HapMap Project, which is genotyping millions of SNPs on 270 individuals from four geographically separated sites around the world. The International HapMap Project has greatly increased the number of validated SNPs available to the research community to be used to study human variation and is producing a map of genomic haplotypes in four populations with ancestry from parts of Africa, Asia, and Europe. In addition, high-throughput methods of genotyping large numbers of SNPs (thousands) in large epidemiological cohorts are only now becoming available (see above). Unfortunately, high-throughput methods of measuring the environment have not kept a similar pace. For many studies of common disease, a rate-limiting step to increasing our understanding will continue to be the difficult and costly measurement of environmental factors.
Finally, progress also has been hampered by a lack of adequate investment in developing new methods of analysis that can incorporate the high-dimensional biological reality that we can now measure. The complex genetic and environmental architecture of multifactorial diseases is not easily detected or deciphered using the traditional statistical modeling methods that are focused on the estimation of a single overall model of disease for a population. For example, with traditional logistic regression methods, it would simply be impossible to enter all the hundreds of genetic variations that are thought to be involved in CVD risk or in any of the other common disease complexes currently being studied. Beyond the obvious issues of power and overdetermination in such a large-scale model, we also do not know how to model or interpret interactions among many factors simultaneously or how to incorporate the rare, large effects of some genes relative to the common, small effects of others. New modeling strategies that take advantage of advances in pattern recognition, machine learning, and systems analysis (e.g., scale-free networks, Bayesian belief networks, random forest methods) will be needed to build more comprehensive, predictive models of these etiologically heterogeneous diseases.
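To give a rough sense of why a single regression model becomes unwieldy, the sketch below counts how many terms such a model would need if every interaction up to a given order were included. The figure of 200 variants is an illustrative assumption standing in for "hundreds", and the function name is hypothetical. The count ignores the intercept and any environmental covariates.

```python
from math import comb


def count_model_terms(n_variants, max_order):
    """Count main effects plus all interaction terms up to max_order.

    Order 1 gives main effects only, order 2 adds every pairwise
    interaction, order 3 adds every three-way interaction, and so on.
    """
    return sum(comb(n_variants, k) for k in range(1, max_order + 1))


if __name__ == "__main__":
    n_variants = 200  # assumed stand-in for "hundreds of genetic variations"
    for order in (1, 2, 3):
        print(order, count_model_terms(n_variants, order))
```

Under this assumption the model needs 200 terms for main effects alone, 20,100 once pairwise interactions are added, and 1,333,500 with three-way interactions. That total far exceeds the number of participants in most epidemiological cohorts.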
The field of human genetics, like many other disciplines, is in transition, and there is much to be gained by joining forces with a wide range of other disciplines that are focused on improving prevention and reducing the disease burden in our populations.
Altshuler D, Kruglyak L, Lander E. Genetic polymorphisms and disease. New England Journal of Medicine. 1998;338(22):1626. [PubMed: 9606122]
Ardlie KG, Lunetta KL, Seielstad M. Testing for population subdivision and association in four case-control studies. American Journal of Human Genetics. 2002;71(2):304–311. [PMC free article: PMC379163] [PubMed: 12096349]
Armstrong VW, Shipkova M, von Ahsen N, Oellerich M. Analytic aspects of monitoring therapy with thiopurine medications. Therapeutic Drug Monitoring. 2004;26(2):220–226. [PubMed: 15228169]
Ashley-Koch A, Yang Q, Olney R. Sickle hemoglobin (Hb S) allele and sickle cell disease: A HuGE review. American Journal of Epidemiology. 2000;151(9):839–845. [PubMed: 10791557]
Bridges K. Hemoglobinopathies (Hemoglobin Disorders). 2002. [accessed May 15, 2006]. [Online]. Available: sickle.bwh.harvard.edu/hemoglobinopathy.html.
Brown MS, Goldstein JL. Lowering plasma cholesterol by raising LDL receptors. New England Journal of Medicine. 1981;305(9):515–517. [PubMed: 6265781]
Cardon LR, Bell JI. Association study designs for complex diseases. Nature Reviews Genetics. 2001;2(2):91–99. [PubMed: 11253062]
Cheverud JM, Routman EJ. Epistasis and its contribution to genetic variance components. Genetics. 1995;139(3):1455–1461. [PMC free article: PMC1206471] [PubMed: 7768453]
Clifford RJ, Edmonson MN, Nguyen C, Buetow KH. Large-scale analysis of non-synonymous coding region single nucleotide polymorphisms. Bioinformatics. 2004;20(7):1006–1014. [PubMed: 14751981]
Cobleigh MA, Vogel CL, Tripathy D, Robert NJ, Scholl S, Fehrenbacher L, Wolter JM, Paton V, Shak S, Lieberman G, Slamon DJ. Multinational study of the efficacy and safety of humanized anti-HER2 monoclonal antibody in women who have HER2-overexpressing metastatic breast cancer that has progressed after chemotherapy for metastatic disease. Journal of Clinical Oncology. 1999;17(9):2639–2648. [PubMed: 10561337]
Deniz Naranjo MC, Munoz Fernandez C, Alemany Rodriguez MJ, Perez Vieitez MC, Irurita Latasa J, Suarez Armas R, Suarez Valentin MP, Sanchez Garcia F. Gender has a strong modulating effect on the risk of Alzheimer’s disease conferred by the apolipoprotein E gene in the population of the Canary Islands, Spain. Revista de Neurologia. 2004;38(7):615–618. [PubMed: 15098180]
Evans WE, McLeod HL. Pharmacogenomics—drug disposition, drug targets, and side effects. New England Journal of Medicine. 2003;348(6):538–549. [PubMed: 12571262]
Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS. A genome-wide scalable SNP genotyping assay using microarray technology. Nature Genetics. 2005;37(5):549–554. [PubMed: 15838508]
Haines JL, Pericak-Vance MA. Approaches to Gene Mapping in Complex Human Diseases. New York: Wiley-Liss; 1998.
Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, King MC. Linkage of early-onset familial breast cancer to chromosome 17q21. Science. 1990;250(4988):1684–1689. [PubMed: 2270482]
Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science. 2005;307(5712):1072–1079. [PubMed: 15718463]
IOM (Institute of Medicine). Implications of Genomics for Public Health. Washington, DC: The National Academies Press; 2005.
Kalow W, Tang BK, Endrenyi L. Hypothesis: Comparisons of inter- and intra-individual variations can substitute for twin studies in drug research. Pharmacogenetics. 1998;8(4):283–289. [PubMed: 9731714]
Kardia SL, Modell SM, Peyser PA. Family-centered approaches to understanding and preventing coronary heart disease. American Journal of Preventive Medicine. 2003;24(2):143–151. [PubMed: 12568820]
Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable C, Hoh J. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308(5720):385–389. [PMC free article: PMC1512523] [PubMed: 15761122]
Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265(5181):2037–2048. [PubMed: 8091226]
Livingston RJ, von Niederhausern A, Jegga AG, Crawford DC, Carlson CS, Rieder MJ, Gowrisankar S, Aronow BJ, Weiss RB, Nickerson DA. Pattern of sequence variation across 213 environmental response genes. Genome Research. 2004;14(10A):1821–1831. [PMC free article: PMC524406] [PubMed: 15364900]
Mancama D, Kerwin RW. Role of pharmacogenomics in individualising treatment with SSRIs. CNS Drugs. 2003;17(3):143–151. [PubMed: 12617694]
Mathew C. Science medicine and the future—postgenomic technologies: Hunting the genes for common disorders. British Medical Journal. 2001;322(7293):1031–1034. [PMC free article: PMC1120184] [PubMed: 11325769]
McAdams HH, Arkin A. Stochastic mechanisms in gene expression. Proceedings of the National Academy of Sciences of the United States of America. 1997;94(3):814–819. [PMC free article: PMC19596] [PubMed: 9023339]
Miller DP, Liu G, De Vivo I, Lynch TJ, Wain JC, Su L, Christiani DC. Combinations of the variant genotypes of GSTP1, GSTM1, and p53 are associated with an increased lung cancer risk. Cancer Research. 2002;62(10):2819–2823. [PubMed: 12019159]
Moorjani S, Roy M, Gagne C, Davignon J, Brun D, Toussaint M, Lambert M, Campeau L, Blaichman S, Lupien P. Homozygous familial hypercholesterolemia among French Canadians in Quebec Province. Arteriosclerosis. 1989;9(2):211–216. [PubMed: 2923577]
Obach RS, Walsky RL, Venkatakrishnan K, Gaman EA, Houston JB, Tremaine LM. The utility of in vitro cytochrome P450 inhibition data in the prediction of drug-drug interactions. Journal of Pharmacology and Experimental Therapeutics. 2006;316(1):336–348. [PubMed: 16192315]
Pirmohamed M, Park BK. Genetic susceptibility to adverse drug reactions. Trends in Pharmacological Sciences. 2001;22(6):298–305. [PubMed: 11395158]
Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. American Journal of Human Genetics. 1999;65(1):220–228. [PMC free article: PMC1378093] [PubMed: 10364535]
Quinn CT, Miller ST. Risk factors and prediction of outcomes in children and adolescents who have sickle cell anemia. Hematology/Oncology Clinics of North America. 2004;18(6 SPEC.ISS):1339–1354. [PubMed: 15511619]
Rebbeck TR, Walker AH, Phelan CM, Godwin AK, Buetow KH, Garber JE, Narod SA, Weber BL. Defining etiologic heterogeneity in breast cancer using genetic biomarkers. Progress in Clinical and Biological Research. 1997;396:53–61. [PubMed: 9108589]
Rimoin DL, Connor JM, Pyeritz RE, Korf BR, editors. Emery and Rimoin’s Principles and Practice of Medical Genetics. 4th edition. Vol. 2. New York: Churchill Livingstone; 2002.
Risch NJ. Searching for genetic determinants in the new millennium. Nature. 2000;405(6788):847–856. [PubMed: 10866211]
Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. American Journal of Human Genetics. 2001;68(2):466–477. [PMC free article: PMC1235279] [PubMed: 11170894]
Sing CF, Stengard JH, Kardia SLR. Genes, environment, and cardiovascular disease. Arteriosclerosis, Thrombosis, and Vascular Biology. 2003;23:1190–1196. [PubMed: 12730090]
Smith G. The Genomics Age: How DNA Technology Is Transforming the Way We Live and Who We Are. New York: AMACOM; 2005.
Steinberg MH. Predicting clinical severity in sickle cell anaemia. British Journal of Haematology. 2005;129(4):465–481. [PubMed: 15877729]
Stuart MJ, Nagel RL. Sickle-cell disease. Lancet. 2004;364(9442):1343–1360. [PubMed: 15474138]
Syvanen AC. Toward genome-wide SNP genotyping. Nature Genetics. 2005;37(Suppl):S5–S10. [PubMed: 15920530]
Thompson MW, McInnes RR, Willard HF, editors. Thompson & Thompson Genetics in Medicine. 5th edition. Philadelphia, PA: W.B. Saunders Company; 1991.
Wacholder S, Rothman N, Caporaso N. Population stratification in epidemiologic studies of common genetic variants and cancer: Quantification of bias. Journal of the National Cancer Institute. 2000;92(14):1151–1158. [PubMed: 10904088]
Wang X, Tomso DJ, Liu X, Bell DA. Single nucleotide polymorphism in transcriptional regulatory regions and expression of environmentally responsive genes. Toxicology and Applied Pharmacology. 2005;207(2 Suppl):84–90. [PubMed: 16002116]
Wang Z, Fan H, Yang HH, Hu Y, Buetow KH, Lee MP. Comparative sequence analysis of imprinted genes between human and mouse to reveal imprinting signatures. Genomics. 2004;83(3):395–401. [PubMed: 14962665]
Weinshilboum R. Inheritance and drug response. New England Journal of Medicine. 2003;348(6):529–537. [PubMed: 12571261]
Wu JH, Lo SK, Wen MS, Kao JT. Characterization of apolipoprotein E genetic variations in Taiwanese association with coronary heart disease and plasma lipid levels. Human Biology. 2002;74(1):25–31. [PubMed: 11931577]
Zhou W, Liu G, Miller DP, Thurston SW, Xu LL, Wain JC, Lynch TJ, Su L, Christiani DC. Polymorphisms in the DNA repair genes XRCC1 and ERCC2, smoking, and lung cancer risk. Cancer Epidemiology, Biomarkers and Prevention. 2003;12(4):359–365. [PubMed: 12692111]
An SNP is the DNA sequence variation that occurs when a single nucleotide (A, T, C, or G) in the genome sequence is altered (Smith, 2005).
A candidate gene is a gene whose protein product is involved in the metabolic or physiological pathways associated with a particular disease (IOM, 2005).
The sickle cell example is abstracted from a commissioned paper prepared by Robert J. Thompson, Jr., Ph.D. (Appendix D).