Chemical formula
from Wikipedia

Chemical formula for aluminium sulfate
Structural formula for butane

A chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule, using chemical element symbols, numbers, and sometimes also other symbols, such as parentheses, dashes, brackets, commas and plus (+) and minus (−) signs. These are limited to a single typographic line of symbols, which may include subscripts and superscripts. A chemical formula is not a chemical name since it does not contain any words. Although a chemical formula may imply certain simple chemical structures, it is not the same as a full chemical structural formula. Chemical formulae can fully specify the structure of only the simplest of molecules and chemical substances, and are generally more limited in power than chemical names and structural formulae.

The simplest types of chemical formulae are called empirical formulae, which use letters and numbers indicating the numerical proportions of atoms of each type. Molecular formulae indicate the simple numbers of each type of atom in a molecule, with no information on structure. For example, the empirical formula for glucose is CH2O (twice as many hydrogen atoms as carbon and oxygen), while its molecular formula is C6H12O6 (12 hydrogen atoms, six carbon and oxygen atoms).

Sometimes a chemical formula is complicated by being written as a condensed formula (or condensed molecular formula, occasionally called a "semi-structural formula"), which conveys additional information about the particular ways in which the atoms are chemically bonded together, either in covalent bonds, ionic bonds, or various combinations of these types. This is possible if the relevant bonding is easy to show in one dimension. An example is the condensed molecular/chemical formula for ethanol, which is CH3−CH2−OH or CH3CH2OH. However, even a condensed chemical formula is necessarily limited in its ability to show complex bonding relationships between atoms, especially atoms that have bonds to four or more different substituents.

Since a chemical formula must be expressed as a single line of chemical element symbols, it often cannot be as informative as a true structural formula, which is a graphical representation of the spatial relationship between atoms in chemical compounds (see for example the figure for butane structural and chemical formulae, at right). For reasons of structural complexity, a single condensed chemical formula (or semi-structural formula) may correspond to different molecules, known as isomers. For example, glucose shares its molecular formula C6H12O6 with a number of other sugars, including fructose, galactose and mannose. Linear equivalent chemical names exist that can and do specify uniquely any complex structural formula (see chemical nomenclature), but such names must use many terms (words), rather than the simple element symbols, numbers, and simple typographical symbols that define a chemical formula.

Chemical formulae may be used in chemical equations to describe chemical reactions and other chemical transformations, such as the dissolving of ionic compounds into solution. While, as noted, chemical formulae do not have the full power of structural formulae to show chemical relationships between atoms, they are sufficient to keep track of numbers of atoms and numbers of electrical charges in chemical reactions, thus balancing chemical equations so that these equations can be used in chemical problems involving conservation of atoms, and conservation of electric charge.

Overview


A chemical formula identifies each constituent element by its chemical symbol and indicates the proportionate number of atoms of each element. In empirical formulae, these proportions begin with a key element and then assign numbers of atoms of the other elements in the compound, by ratios to the key element. For molecular compounds, these ratio numbers can all be expressed as whole numbers. For example, the empirical formula of ethanol may be written C2H6O because the molecules of ethanol all contain two carbon atoms, six hydrogen atoms, and one oxygen atom. Some types of ionic compounds, however, cannot be written with entirely whole-number empirical formulae. An example is boron carbide, whose formula of CBn is a variable non-whole number ratio with n ranging from over 4 to more than 6.5.

When the chemical compound of the formula consists of simple molecules, chemical formulae often employ ways to suggest the structure of the molecule. These types of formulae are variously known as molecular formulae and condensed formulae. A molecular formula enumerates the number of atoms to reflect those in the molecule, so that the molecular formula for glucose is C6H12O6 rather than the glucose empirical formula, which is CH2O. However, except for very simple substances, molecular chemical formulae lack needed structural information, and are ambiguous.

For simple molecules, a condensed (or semi-structural) formula is a type of chemical formula that may fully imply a correct structural formula. For example, ethanol may be represented by the condensed chemical formula CH3CH2OH, and dimethyl ether by the condensed formula CH3OCH3. These two molecules have the same empirical and molecular formulae (C2H6O), but may be differentiated by the condensed formulae shown, which are sufficient to represent the full structure of these simple organic compounds.

Condensed chemical formulae may also be used to represent ionic compounds that do not exist as discrete molecules, but nonetheless do contain covalently bound clusters within them. These polyatomic ions are groups of atoms that are covalently bound together and have an overall ionic charge, such as the sulfate [SO4]2− ion. Each polyatomic ion in a compound is written individually in order to illustrate the separate groupings. For example, the compound dichlorine hexoxide has an empirical formula ClO3, and molecular formula Cl2O6, but in liquid or solid forms, this compound is more correctly shown by an ionic condensed formula [ClO2]+[ClO4]−, which illustrates that this compound consists of [ClO2]+ ions and [ClO4]− ions. In such cases, the condensed formula only need be complex enough to show at least one of each ionic species.

Chemical formulae as described here are distinct from the far more complex chemical systematic names that are used in various systems of chemical nomenclature. For example, one systematic name for glucose is (2R,3S,4R,5R)-2,3,4,5,6-pentahydroxyhexanal. This name, interpreted by the rules behind it, fully specifies glucose's structural formula, but the name is not a chemical formula as usually understood, and uses terms and words not used in chemical formulae. Such names, unlike basic formulae, may be able to represent full structural formulae without graphs.

Types


Empirical formula


In chemistry, the empirical formula of a chemical is a simple expression of the relative number of each type of atom or ratio of the elements in the compound. Empirical formulae are the standard for ionic compounds, such as CaCl2, and for macromolecules, such as SiO2. An empirical formula makes no reference to isomerism, structure, or absolute number of atoms. The term empirical refers to the process of elemental analysis, a technique of analytical chemistry used to determine the relative percent composition of a pure chemical substance by element.

For example, hexane has a molecular formula of C6H14, and (for one of its isomers, n-hexane) a structural formula CH3CH2CH2CH2CH2CH3, implying that it has a chain structure of 6 carbon atoms, and 14 hydrogen atoms. However, the empirical formula for hexane is C3H7. Likewise the empirical formula for hydrogen peroxide, H2O2, is simply HO, expressing the 1:1 ratio of component elements. Formaldehyde and acetic acid have the same empirical formula, CH2O. This is also the molecular formula for formaldehyde, but acetic acid has double the number of atoms.
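The reduction from a molecular formula to an empirical formula described above can be sketched in Python. This is a minimal illustration, not a general formula parser: the function names `parse_counts` and `empirical` are hypothetical, and the input is assumed to be a plain formula with no brackets, charges, or isotope labels.

```python
import re
from functools import reduce
from math import gcd

def parse_counts(formula):
    """Count atoms in a simple formula such as 'C6H14' (no brackets or charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts

def empirical(formula):
    """Divide every atom count by the greatest common divisor of all counts."""
    counts = parse_counts(formula)
    divisor = reduce(gcd, counts.values())
    return "".join(
        symbol + (str(n // divisor) if n // divisor > 1 else "")
        for symbol, n in counts.items()
    )

print(empirical("C6H14"))   # hexane -> C3H7
print(empirical("H2O2"))    # hydrogen peroxide -> HO
print(empirical("C2H4O2"))  # acetic acid -> CH2O
```

Elements are written in the order in which they first appear in the input, so the sketch makes no attempt to impose a standard element ordering.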

Like the other formula types detailed below, an empirical formula shows the number of elements in a molecule, and determines whether it is a binary compound, ternary compound, quaternary compound, or has even more elements.

Molecular formula

Isobutane structural formula
Molecular formula: C4H10
Condensed formula: (CH3)3CH
n-Butane structural formula
Molecular formula: C4H10
Condensed formula: CH3CH2CH2CH3

Molecular formulae simply indicate the numbers of each type of atom in a molecule of a molecular substance. They are the same as empirical formulae for molecules that only have one atom of a particular type, but otherwise may have larger numbers. An example of the difference is the empirical formula for glucose, which is CH2O (ratio 1:2:1), while its molecular formula is C6H12O6 (number of atoms 6:12:6). For water, both formulae are H2O. A molecular formula provides more information about a molecule than its empirical formula, but is more difficult to establish.

Structural formula


In addition to indicating the number of atoms of each element in a molecule, a structural formula indicates how the atoms are organized, and shows (or implies) the chemical bonds between the atoms. There are multiple types of structural formulas focused on different aspects of the molecular structure.

The two diagrams show two molecules which are structural isomers of each other, since they both have the same molecular formula C4H10, but they have different structural formulas as shown.

Condensed formula


The connectivity of a molecule often has a strong influence on its physical and chemical properties and behavior. Two molecules composed of the same numbers of the same types of atoms (i.e. a pair of isomers) might have completely different chemical and/or physical properties if the atoms are connected differently or in different positions. In such cases, a structural formula is useful, as it illustrates which atoms are bonded to which other ones. From the connectivity, it is often possible to deduce the approximate shape of the molecule.

A condensed (or semi-structural) formula may represent the types and spatial arrangement of bonds in a simple chemical substance, though it does not necessarily specify isomers or complex structures. For example, ethane consists of two carbon atoms single-bonded to each other, with each carbon atom having three hydrogen atoms bonded to it. Its chemical formula can be rendered as CH3CH3. In ethylene there is a double bond between the carbon atoms (and thus each carbon only has two hydrogens), therefore the chemical formula may be written: CH2CH2, and the fact that there is a double bond between the carbons is implicit because carbon has a valence of four. However, a more explicit method is to write H2C=CH2 or less commonly H2C::CH2. The two lines (or two pairs of dots) indicate that a double bond connects the atoms on either side of them.

A triple bond may be expressed with three lines (HC≡CH) or three pairs of dots (HC:::CH), and if there may be ambiguity, a single line or pair of dots may be used to indicate a single bond.

Molecules with multiple functional groups that are the same may be expressed by enclosing the repeated group in round brackets. For example, isobutane may be written (CH3)3CH. This condensed structural formula implies a different connectivity from other molecules that can be formed using the same atoms in the same proportions (isomers). The formula (CH3)3CH implies a central carbon atom connected to one hydrogen atom and three methyl groups (CH3). The same number of atoms of each element (10 hydrogens and 4 carbons, or C4H10) may be used to make a straight chain molecule, n-butane: CH3CH2CH2CH3.
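To illustrate how round brackets multiply a repeated group, the following Python sketch expands a condensed formula into atom counts. The function name `count_atoms` is hypothetical, and the sketch assumes ASCII input with round brackets only; bond symbols such as "=" are simply skipped.

```python
import re
from collections import Counter

TOKEN = re.compile(r"([A-Z][a-z]?)|(\()|(\))|(\d+)")

def count_atoms(formula):
    """Count atoms in a condensed formula with round brackets, e.g. '(CH3)3CH'."""
    stack = [Counter()]  # one Counter per open bracket level
    last = None          # the most recent element or closed group
    for match in TOKEN.finditer(formula):
        element, open_b, close_b, number = match.groups()
        if element:
            last = Counter({element: 1})
            stack[-1] += last
        elif open_b:
            stack.append(Counter())
        elif close_b:
            last = stack.pop()
            stack[-1] += last
        else:
            # A number multiplies the preceding element or group,
            # which has already been added once.
            for symbol, n in last.items():
                stack[-1][symbol] += n * (int(number) - 1)
    return dict(stack[0])

print(count_atoms("(CH3)3CH"))      # isobutane
print(count_atoms("CH3CH2CH2CH3"))  # n-butane
```

Both calls give four carbon and ten hydrogen atoms, which shows that the two condensed formulae describe isomers with the same molecular formula, C4H10.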

Chemical names in answer to limitations of chemical formulae


The alkene called but-2-ene has two isomers, which the chemical formula CH3CH=CHCH3 does not identify. The relative position of the two methyl groups must be indicated by additional notation denoting whether the methyl groups are on the same side of the double bond (cis or Z) or on the opposite sides from each other (trans or E).[1]

As noted above, in order to represent the full structural formulae of many complex organic and inorganic compounds, chemical nomenclature may be needed which goes well beyond the available resources used above in simple condensed formulae. See IUPAC nomenclature of organic chemistry and IUPAC nomenclature of inorganic chemistry 2005 for examples. In addition, linear naming systems such as International Chemical Identifier (InChI) allow a computer to construct a structural formula, and simplified molecular-input line-entry system (SMILES) allows a more human-readable ASCII input. However, all these nomenclature systems go beyond the standards of chemical formulae, and technically are chemical naming systems, not formula systems.[2]

Polymers in condensed formulae


For polymers in condensed chemical formulae, parentheses are placed around the repeating unit. For example, the hydrocarbon CH3(CH2)50CH3 is a molecule with fifty repeating units. If the number of repeating units is unknown or variable, the letter n may be used instead: CH3(CH2)nCH3.

Ions in condensed formulae


For ions, the charge on a particular atom may be denoted with a right-hand superscript, for example Na+ or Cu2+. The total charge on a charged molecule or a polyatomic ion may also be shown in this way, such as for hydronium, H3O+, or sulfate, SO₄²⁻. Here + and − are used in place of +1 and −1, respectively.

For more complex ions, brackets [ ] are often used to enclose the ionic formula, as in [B12H12]2−, which is found in compounds such as caesium dodecaborate, Cs2[B12H12]. Parentheses ( ) can be nested inside brackets to indicate a repeating unit, as in hexamminecobalt(III) chloride, [Co(NH3)6]3+(Cl−)3. Here, (NH3)6 indicates that the ion contains six ammine groups (NH3) bonded to cobalt, and [ ] encloses the entire formula of the ion with charge +3.

This is strictly optional; a chemical formula is valid with or without ionization information, and hexamminecobalt(III) chloride may be written as [Co(NH3)6]3+(Cl−)3 or [Co(NH3)6]Cl3. Brackets, like parentheses, behave in chemistry as they do in mathematics, grouping terms together; they are not employed only for ionization states. In the latter case here, the parentheses indicate 6 groups all of the same shape, bonded to another group of size 1 (the cobalt atom), and then the entire bundle, as a group, is bonded to 3 chlorine atoms. In the former case, it is clearer that the bond connecting the chlorines is ionic, rather than covalent.

Isotopes


Although isotopes are more relevant to nuclear chemistry or stable isotope chemistry than to conventional chemistry, different isotopes may be indicated with a prefixed superscript in a chemical formula. For example, the phosphate ion containing radioactive phosphorus-32 is [32PO4]3−. Also a study involving stable isotope ratios might include the molecule 18O16O.

A left-hand subscript is sometimes used redundantly to indicate the atomic number. For example, ₈O₂ for dioxygen, and ¹⁶₈O₂ for the most abundant isotopic species of dioxygen. This is convenient when writing equations for nuclear reactions, in order to show the balance of charge more clearly.

Trapped atoms

Traditional formula: MC60
The "@" notation: M@C60

The @ symbol (at sign) indicates an atom or molecule trapped inside a cage but not chemically bound to it. For example, a buckminsterfullerene (C60) with an atom (M) would simply be represented as MC60 regardless of whether M was inside the fullerene without chemical bonding or outside, bound to one of the carbon atoms. Using the @ symbol, this would be denoted M@C60 if M was inside the carbon network. A non-fullerene example is [As@Ni12As20]3−, an ion in which one arsenic (As) atom is trapped in a cage formed by the other 32 atoms.

This notation was proposed in 1991[3] with the discovery of fullerene cages (endohedral fullerenes), which can trap atoms such as La to form, for example, La@C60 or La@C82. The choice of the symbol has been explained by the authors as being concise, readily printed and transmitted electronically (the at sign is included in ASCII, which most modern character encoding schemes are based on), and the visual aspects suggesting the structure of an endohedral fullerene.

Non-stoichiometric chemical formulae


Chemical formulae most often use integers for each element. However, there is a class of compounds, called non-stoichiometric compounds, that cannot be represented by small integers. Such a formula might be written using decimal fractions, as in Fe0.95O, or it might include a variable part represented by a letter, as in Fe1−xO, where x is normally much less than 1.

General forms for organic compounds


A chemical formula used for a series of compounds that differ from each other by a constant unit is called a general formula. It generates a homologous series of chemical formulae. For example, alcohols may be represented by the formula CnH2n+1OH (n ≥ 1), giving the homologs methanol, ethanol, propanol for 1 ≤ n ≤ 3.
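As a small illustration of how a general formula generates a homologous series, the sketch below substitutes values of n into CnH2n+1OH. The function name `alcohol_formula` is hypothetical and the output is plain ASCII text without subscripts.

```python
def alcohol_formula(n):
    """Return the formula CnH2n+1OH for a given n >= 1 as plain text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    carbon = "C" if n == 1 else f"C{n}"  # a count of one is not written
    return f"{carbon}H{2 * n + 1}OH"

for n in range(1, 4):
    print(n, alcohol_formula(n))
# 1 CH3OH   (methanol)
# 2 C2H5OH  (ethanol)
# 3 C3H7OH  (propanol)
```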

Hill system


The Hill system (or Hill notation) is a system of writing empirical chemical formulae, molecular chemical formulae and components of a condensed formula such that the number of carbon atoms in a molecule is indicated first, the number of hydrogen atoms next, and then the number of all other chemical elements subsequently, in alphabetical order of the chemical symbols. When the formula contains no carbon, all the elements, including hydrogen, are listed alphabetically.

By sorting formulae according to the number of atoms of each element present in the formula according to these rules, with differences in earlier elements or numbers being treated as more significant than differences in any later element or number—like sorting text strings into lexicographical order—it is possible to collate chemical formulae into what is known as Hill system order.

The Hill system was first published by Edwin A. Hill of the United States Patent and Trademark Office in 1900.[4] It is the most commonly used system in chemical databases and printed indexes to sort lists of compounds.[5]

A list of formulae in Hill system order is arranged alphabetically, as above, with single-letter elements coming before two-letter symbols when the symbols begin with the same letter (so "B" comes before "Be", which comes before "Br").[5]

The following example formulae are written using the Hill system, and listed in Hill order:

  • BrClH2Si
  • BrI
  • CCl4
  • CH3I
  • C2H5Br
  • H2O4S
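The ordering rules above can be sketched in Python. The helper names `hill_formula` and `hill_sort_key` are hypothetical, and the sort key is a simplified reading of the collation rule (compare element symbols first, then counts, position by position); it is not taken from any particular database implementation.

```python
import re

def hill_formula(counts):
    """Write a dict of atom counts as a Hill-system formula string."""
    symbols = sorted(counts)  # alphabetical; "B" sorts before "Be" and "Br"
    if "C" in counts:
        # Carbon first, then hydrogen, then everything else alphabetically.
        rest = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if "H" in counts else []) + rest
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in symbols)

def hill_sort_key(formula):
    """Compare formulae element by element, then count by count."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return [(symbol, int(number or 1)) for symbol, number in pairs]

print(hill_formula({"H": 2, "S": 1, "O": 4}))   # H2O4S
print(hill_formula({"Br": 1, "C": 2, "H": 5}))  # C2H5Br

unsorted = ["H2O4S", "C2H5Br", "BrI", "CH3I", "CCl4", "BrClH2Si"]
print(sorted(unsorted, key=hill_sort_key))
```

Sorting the shuffled list with this key reproduces the order of the example list above.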

from Grokipedia
A chemical formula is a concise symbolic representation of the composition of a compound or molecule, indicating the types and numbers of atoms present using element symbols and numerical subscripts. This notation serves as the fundamental language of chemistry, enabling precise communication of molecular composition and supporting calculations involving reactions and material properties.

Chemical formulas are categorized into several types based on the level of detail they provide. The empirical formula expresses the simplest whole-number ratio of atoms in a compound, such as CH₂O, which does not specify the total number of atoms but only their relative proportions. In contrast, the molecular formula provides the exact count of each atom in a single molecule, for example, H₂O for water (two hydrogen atoms and one oxygen atom) or C₆H₁₂O₆ for glucose. The structural formula goes further by illustrating the connectivity and bonding between atoms, such as showing the two hydrogen atoms bonded to a central oxygen in water.

Notation conventions in chemical formulas follow standardized rules to ensure clarity and universality. Element symbols are derived from the periodic table, with single-letter symbols in uppercase (e.g., H for hydrogen) and two-letter symbols having the first letter uppercase and the second lowercase (e.g., He for helium). Subscripts appear after symbols to denote atom counts and are omitted when equal to one (e.g., CO₂ for carbon dioxide, indicating one carbon and two oxygens), while coefficients before formulas represent multiples of the entire unit in equations but not in single-molecule representations. For complex compounds, parentheses group atoms with external subscripts applying to the enclosed unit, as in Ca(OH)₂ for calcium hydroxide. These formulas are used throughout chemistry, including industrial chemistry, and underpin the description of both simple diatomic molecules like O₂ and intricate polymers.

Introduction

Definition and Purpose

A chemical formula is a symbolic notation that uses elemental symbols from the periodic table, combined with numerical subscripts, to represent the types and relative numbers of atoms present in a molecule or compound. This representation provides a concise way to denote the elemental composition of substances, distinguishing it from linguistic names that describe the same entities in verbal form; for instance, the formula H₂O corresponds to the name "water". According to IUPAC recommendations, such formulae serve as a simple and clear method for designating compounds, essential for unambiguous communication in chemical contexts.

The primary purpose of a chemical formula is to convey the precise makeup of a substance, including its atomic ratios, which supports reasoning about chemical behavior and reactivity. By specifying atom counts, formulas enable chemists to balance chemical equations, ensuring that atoms are conserved during reactions, as seen in the balanced equation for water formation: 2H₂ + O₂ → 2H₂O. This foundational role extends to practical applications, such as planning syntheses and calculating reaction yields, where accurate formulas underpin quantitative analysis and experimental design.

Basic examples illustrate these principles: the formula H₂O indicates two hydrogen atoms bonded to one oxygen atom in a water molecule, while NaCl represents sodium and chlorine in a 1:1 ratio in sodium chloride. These notations sometimes hint at structural arrangements, though full structural details are reserved for more advanced representations. In essence, chemical formulas bridge abstract atomic theory with tangible chemical processes, standardizing information across scientific disciplines.
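The role of formulas in balancing equations can be illustrated with a short Python check that counts atoms on each side of a reaction. This is a minimal sketch: the names `atoms`, `side_total`, and `is_balanced` are hypothetical, formulas are written with ASCII digits instead of subscripts, and brackets and charges are not handled.

```python
import re
from collections import Counter

def atoms(formula):
    """Count atoms in a simple formula such as 'H2O' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number or 1)
    return counts

def side_total(terms):
    """Sum atom counts over (coefficient, formula) pairs on one side."""
    total = Counter()
    for coefficient, formula in terms:
        for symbol, n in atoms(formula).items():
            total[symbol] += coefficient * n
    return total

def is_balanced(reactants, products):
    """An equation is balanced when both sides contain the same atoms."""
    return side_total(reactants) == side_total(products)

# 2 H2 + O2 -> 2 H2O
print(is_balanced([(2, "H2"), (1, "O2")], [(2, "H2O")]))  # True
# H2 + O2 -> H2O leaves one oxygen atom unaccounted for
print(is_balanced([(1, "H2"), (1, "O2")], [(1, "H2O")]))  # False
```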

Historical Development

The development of chemical formulas began in the early 19th century, building on foundational atomic concepts. In 1808, John Dalton proposed his atomic theory, which posited that elements consist of indivisible atoms and that compounds form from atoms combining in simple whole-number ratios, laying the groundwork for representing quantities of atoms in chemical notations through stacked symbols or early subscript-like indications. This theory influenced subsequent notations by emphasizing the need to denote atomic proportions explicitly. Three years later, in 1811, Amedeo Avogadro's hypothesis that equal volumes of gases at the same temperature and pressure contain equal numbers of molecules enabled the distinction between atomic and molecular compositions, facilitating the shift toward molecular formulas that accurately reflected actual particle counts rather than just empirical ratios.

A pivotal advancement came in 1813–1814 when Swedish chemist Jöns Jacob Berzelius introduced the modern system of elemental symbols, replacing alchemical signs and Dalton's pictorial representations with abbreviated Latin-derived letters, such as H for hydrogen and O for oxygen. Berzelius further refined this by using superscripts to indicate the number of atoms, as in H²O for water, providing a concise and systematic way to denote compound compositions that became the basis for contemporary formulas. Subsequently, in the mid-19th century, German chemists replaced these superscripts with subscripts, establishing the notation still used today.

In the mid-19th century, structural representations emerged; August Kekulé, in 1858, proposed the theory of chemical structure, using lines to depict atomic connections in organic molecules, such as the tetravalent carbon atom in chains, which evolved into structural formulas by the 1860s. This innovation, developed alongside contributions from chemists like Archibald Couper, allowed formulas to convey not just composition but also bonding arrangements.

The periodic table, published by Dmitri Mendeleev in 1869, further aided systematic formula representation by organizing elements by atomic weight and properties, predicting valences and thus common bonding patterns in compounds. Standardization efforts intensified with the 1892 Geneva Nomenclature Congress, the first international conference on chemical naming and notation, which established rules for organic formulas and influenced inorganic representations, though full consensus on symbols awaited later refinements. The International Union of Pure and Applied Chemistry (IUPAC), founded in 1919, played a central role in formalizing notation thereafter, endorsing condensed formulas (e.g., CH₃COOH for acetic acid) and line-angle formulas for complex molecules by mid-century to enhance clarity and efficiency in scientific communication.

Standard Types of Formulas

Empirical Formula

The empirical formula of a compound represents the simplest whole-number ratio of atoms of each element present in the compound, without indicating the actual number of atoms or their arrangement. This formula is derived from experimental data such as percentage composition by mass or results from combustion analysis, focusing solely on relative proportions rather than molecular structure.

To calculate the empirical formula from percentage composition, assume a 100 g sample to convert percentages directly to grams for each element, then divide by the respective atomic masses to obtain moles. Next, divide each mole value by the smallest number of moles to find the ratio. If the ratios are close to whole numbers, round them; otherwise, multiply all of them by the smallest factor that yields whole numbers. In combustion analysis, a known mass of the compound is burned in oxygen to measure the masses of products like CO₂ and H₂O, from which the moles of carbon and hydrogen are calculated, with oxygen determined by difference, followed by the same ratio-finding steps.

A classic example is a compound with 40% carbon, 6.7% hydrogen, and 53.3% oxygen by mass: assuming 100 g, this yields 3.33 mol C, 6.7 mol H, and 3.33 mol O; dividing by 3.33 gives the ratio 1:2:1, so the empirical formula is CH₂O. Glucose, analyzed similarly, also has the empirical formula CH₂O, though its molecular formula is C₆H₁₂O₆, illustrating that the molecular formula is a whole-number multiple of the empirical formula.

Empirical formulas are applied in qualitative chemical analysis to identify elemental ratios and classify compounds, such as recognizing carbohydrates by their general empirical formula CH₂O, originally termed "hydrates of carbon" due to this ratio. They provide essential compositional insights without needing structural details, aiding in preliminary compound identification across organic and inorganic chemistry.
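The percentage-composition procedure described in this section can be sketched in Python. The function name `empirical_from_percent`, the rounding tolerance, and the cap on the multiplier are assumptions made for illustration; the atomic masses are standard rounded values for the three elements used.

```python
# Rounded standard atomic masses in g/mol (only the elements used here).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

def empirical_from_percent(percent, max_multiplier=6, tolerance=0.05):
    """Turn mass percentages into the smallest whole-number atom ratio."""
    # Step 1: treat the percentages as grams in a 100 g sample, convert to moles.
    moles = {el: pct / ATOMIC_MASS[el] for el, pct in percent.items()}
    # Step 2: divide by the smallest mole value.
    smallest = min(moles.values())
    ratios = {el: m / smallest for el, m in moles.items()}
    # Step 3: find the smallest multiplier that makes every ratio near-integer.
    for k in range(1, max_multiplier + 1):
        scaled = {el: r * k for el, r in ratios.items()}
        if all(abs(v - round(v)) <= tolerance for v in scaled.values()):
            return {el: round(v) for el, v in scaled.items()}
    raise ValueError("no small whole-number ratio found")

print(empirical_from_percent({"C": 40.0, "H": 6.7, "O": 53.3}))
# {'C': 1, 'H': 2, 'O': 1}, i.e. CH2O
```

The tolerance reflects the fact that measured percentages are rounded, so the computed ratios are rarely exact integers.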

Molecular Formula

The molecular formula of a compound represents the exact number and type of atoms present in a single molecule of that substance, providing a precise composition for covalent compounds. It is typically expressed using chemical symbols with subscripts indicating the count of each atom, such as H₂O for water, where two hydrogen atoms and one oxygen atom form the molecule. The molecular formula is a whole-number multiple of the empirical formula, which only gives the simplest atom ratio.

To determine the molecular formula, chemists first obtain the empirical formula through elemental analysis and then use the compound's molar mass to find the multiplier n. The value of n is calculated by dividing the experimentally determined molar mass by the empirical formula mass; the subscripts in the empirical formula are then multiplied by n to yield the molecular formula. For instance, if the empirical formula is CH₂O (empirical mass approximately 30 g/mol) and the molar mass is 180 g/mol, then n = 180 / 30 = 6, resulting in C₆H₁₂O₆ for glucose. Techniques like mass spectrometry are commonly employed to measure the molar mass accurately, often providing the molecular ion peak that confirms the formula by identifying the exact mass.

Representative examples illustrate this concept clearly. Benzene has the molecular formula C₆H₆, indicating six carbon and six hydrogen atoms per molecule, which is six times its empirical formula CH. Water's molecular formula H₂O matches its empirical formula, as the simplest ratio is already the actual composition. This notation applies primarily to covalent compounds that form discrete molecules, whereas ionic compounds lack true molecules and are instead represented by formula units, such as NaCl for sodium chloride, which denotes the ratio in the crystal lattice rather than a single entity.
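The multiplier calculation described in this section can be sketched in Python. The function name `molecular_from_empirical` is hypothetical, and the atomic masses are standard rounded values for the elements in the examples.

```python
# Rounded standard atomic masses in g/mol (only the elements used here).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

def molecular_from_empirical(empirical_counts, molar_mass):
    """Scale an empirical formula by n = molar mass / empirical formula mass."""
    empirical_mass = sum(ATOMIC_MASS[el] * k for el, k in empirical_counts.items())
    n = round(molar_mass / empirical_mass)
    if n < 1:
        raise ValueError("molar mass is smaller than the empirical formula mass")
    return {el: k * n for el, k in empirical_counts.items()}

# CH2O with a molar mass of about 180 g/mol gives n = 6 (glucose).
print(molecular_from_empirical({"C": 1, "H": 2, "O": 1}, 180.0))
# CH with a molar mass of about 78 g/mol gives n = 6 (benzene).
print(molecular_from_empirical({"C": 1, "H": 1}, 78.11))
```

Rounding n to the nearest integer absorbs small experimental error in the measured molar mass.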

Structural Formula

A structural formula is a graphical representation of a chemical compound that illustrates the connectivity between atoms, using lines to denote chemical bonds and often indicating bond types such as single, double, or triple bonds. Unlike the molecular formula, which only specifies the types and numbers of atoms, the structural formula provides insight into the molecule's architecture, essential for understanding reactivity and properties. This representation emerged as a key tool in organic chemistry to depict how atoms are arranged and linked.

One primary type is the Lewis structure, which shows all atoms, bonds, and lone pairs of electrons. For example, the Lewis structure of water depicts two hydrogen atoms each bonded to a central oxygen atom with a single bond, represented as H–O–H, alongside two lone pairs on the oxygen to account for its octet. These structures emphasize valence electrons and are crucial for predicting molecular geometry and polarity.

Another common variant is the skeletal formula, or bond-line notation, which simplifies representation by omitting hydrogen atoms bonded to carbon and leaving carbon atoms implicit at line ends and vertices, assuming standard valences. In skeletal formulas, carbon chains are depicted as zig-zag lines, with bonds implied at angles. This type is widely used in organic chemistry for its efficiency in illustrating complex molecules without clutter. For instance, the skeletal formula of ethanol is depicted as a single line segment with an –OH group attached to one end, implying the two carbon atoms connected by a single bond, with hydrogens filling the valences.

Structural formulas are vital for distinguishing isomers, compounds with the same molecular formula but different connectivity. n-Butane (C₄H₁₀) has a straight-chain structure, CH₃–CH₂–CH₂–CH₃, while its isomer isobutane (2-methylpropane) branches as (CH₃)₂CH–CH₃, leading to distinct physical properties like boiling points. Such depictions highlight how bond arrangements affect molecular behavior.
The evolution of structural formulas began in the mid-19th century with Friedrich August Kekulé's development of structural theory in 1858, introducing line notations to represent carbon-carbon bonds and valences, as seen in his 1865 proposal of a benzene ring with alternating double bonds. This "Kekulé structure" revolutionized chemistry by enabling the visualization of molecular skeletons. Over time, advancements led to perspective drawings incorporating 3D angles, such as wedge-dash notations for tetrahedral geometry.

In modern computational chemistry, structural formulas are represented digitally using graph-based data structures, where atoms are nodes and bonds are edges, facilitating simulations and predictions in software. These computational models build on traditional notations to generate 3D visualizations and analyze large datasets.
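The graph view of a structural formula, with atoms as nodes and bonds as edges, can be sketched in Python using plain lists. The helper names are hypothetical, and the sketch assumes a saturated hydrocarbon in which every bond is single and each carbon is filled to a valence of four with hydrogen; it is not how any particular chemistry package stores molecules.

```python
from collections import Counter

def saturated_alkane(n_carbons, carbon_bonds):
    """Return (atoms, bonds) for an alkane given only its carbon skeleton."""
    atoms = ["C"] * n_carbons          # nodes, identified by list index
    bonds = list(carbon_bonds)         # edges, as pairs of node indices
    for c in range(n_carbons):
        used = sum(1 for a, b in carbon_bonds if c in (a, b))
        for _ in range(4 - used):      # fill the remaining valences with H
            atoms.append("H")
            bonds.append((c, len(atoms) - 1))
    return atoms, bonds

def molecular_formula(atoms):
    """Collapse the node list into a formula string such as 'C4H10'."""
    counts = Counter(atoms)
    return "".join(f"{el}{counts[el]}" for el in sorted(counts))

def carbon_neighbours(atoms, bonds):
    """For each carbon, count the carbons bonded to it; return sorted counts."""
    degree = Counter()
    for a, b in bonds:
        if atoms[a] == "C" and atoms[b] == "C":
            degree[a] += 1
            degree[b] += 1
    return sorted(degree[i] for i, el in enumerate(atoms) if el == "C")

n_butane = saturated_alkane(4, [(0, 1), (1, 2), (2, 3)])
isobutane = saturated_alkane(4, [(0, 1), (0, 2), (0, 3)])

print(molecular_formula(n_butane[0]), molecular_formula(isobutane[0]))
print(carbon_neighbours(*n_butane), carbon_neighbours(*isobutane))
```

Both graphs collapse to the same molecular formula, C4H10, but their carbon connectivity differs ([1, 1, 2, 2] against [1, 1, 1, 3]), which is the information a molecular formula alone cannot carry.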

Condensed Formula

A condensed formula is a compact linear notation used in chemistry to represent the structure of a molecule by grouping atoms and implying bonds, particularly for organic compounds, without the need for a full graphical structural formula. This method abbreviates the representation by writing atoms in sequence along the carbon chain, with subscripts indicating the number of identical atoms or groups, as in ethanol, written CH₃CH₂OH, where the single bonds between the carbons and to the hydroxyl group are understood rather than explicitly drawn.

Conventions for condensed formulas include the use of parentheses to denote branches or substituents off the main chain, as seen in isopropanol written as (CH₃)₂CHOH, where the two methyl groups are attached to the central carbon. Dashes are optional for single bonds and often omitted to maintain brevity, while double or triple bonds may be indicated with "=" or "≡" symbols when necessary. This notation follows established guidelines, prioritizing readability in textual contexts such as databases.

The primary advantages of condensed formulas lie in their balance of structural detail and conciseness, making them well suited to representing organic molecules in naming conventions and synthetic planning without the space demands of expanded structural formulas. For instance, acetic acid is succinctly shown as CH₃COOH, capturing the carboxyl group efficiently. However, this notation has limitations, as it cannot adequately convey complex stereochemistry, such as cis-trans isomerism, requiring supplementary descriptors or diagrams for precise three-dimensional information.

Specialized Formulas

Formulas for Ions and Polymers

Chemical formulas for ions incorporate charge notation to indicate the ionic state, distinguishing them from neutral molecules. For monoatomic ions, the element symbol is followed by a superscript giving the magnitude and sign of the charge, such as Na⁺ for the sodium ion or Cl⁻ for the chloride ion. Polyatomic ions, composed of multiple atoms, use the same notation; examples include the sulfate ion SO₄²⁻ and the ammonium ion NH₄⁺. In ionic compounds, the formula represents the simplest neutral ratio of ions, known as the formula unit, with cations listed first, followed by anions, and subscripts chosen to balance the charges. For instance, calcium chloride is written as CaCl₂, indicating one Ca²⁺ ion and two Cl⁻ ions, while ammonium sulfate is (NH₄)₂SO₄, showing two NH₄⁺ ions and one SO₄²⁻ ion. Parentheses are placed around a polyatomic ion whenever it carries a subscript greater than one.

Polymer formulas emphasize the repeating unit rather than the full molecular size, using brackets to enclose the constitutional repeating unit (CRU), followed by a subscript n that indicates the number of repeating units. The polyethylene chain, for example, is represented as [−CH₂−CH₂−]ₙ, where the repeating unit is the ethane-1,2-diyl unit. Similarly, polystyrene is represented by the molecular formula (C₈H₈)ₙ or, more precisely, [−CH₂−CH(C₆H₅)−]ₙ to capture the repeating styrene-derived unit.

Representing polymers presents challenges, particularly for end groups and copolymers, because standard notation treats the chain as effectively infinite. End groups, which terminate the chain, are denoted using α- for the initiating end and ω- for the terminating end, as in α-methyl-ω-hydroxy-polystyrene, though they are often omitted in simplified formulas unless they are critical to the properties. Copolymers, which contain more than one kind of repeating unit, require the sequence (for example, random or block) to be specified alongside the units, which goes beyond basic repeating-unit notation.
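The charge-balancing rule for ionic formula units can be sketched in a few lines. The function names here are hypothetical, subscripts are written as ASCII digits, and the test for "polyatomic" (more than one capital letter, or any digit, in the ion's formula) is a simplifying assumption rather than a general rule.

```python
from math import gcd


def ion_counts(cation_charge: int, anion_charge: int) -> tuple:
    """Smallest numbers of cations and anions that give a neutral unit."""
    if cation_charge <= 0 or anion_charge >= 0:
        raise ValueError("expected a positive cation and a negative anion charge")
    magnitude = -anion_charge
    # The least common multiple of the two charge magnitudes is the total
    # positive (and negative) charge in one formula unit.
    total = cation_charge * magnitude // gcd(cation_charge, magnitude)
    return total // cation_charge, total // magnitude


def _with_subscript(ion: str, count: int) -> str:
    if count == 1:
        return ion
    # Assumption: several capitals or a digit means a polyatomic ion.
    polyatomic = sum(ch.isupper() for ch in ion) > 1 or any(ch.isdigit() for ch in ion)
    return f"({ion}){count}" if polyatomic else f"{ion}{count}"


def formula_unit(cation: str, cation_charge: int, anion: str, anion_charge: int) -> str:
    """Write the formula unit with the cation first, then the anion."""
    n_cation, n_anion = ion_counts(cation_charge, anion_charge)
    return _with_subscript(cation, n_cation) + _with_subscript(anion, n_anion)


print(formula_unit("Ca", 2, "Cl", -1))    # CaCl2
print(formula_unit("NH4", 1, "SO4", -2))  # (NH4)2SO4
print(formula_unit("Al", 3, "SO4", -2))   # Al2(SO4)3
```

Parentheses appear only when a polyatomic ion needs a subscript greater than one, which matches the convention described above.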

Isotopic Formulas

Isotopic formulas extend standard molecular formulas by specifying the isotopic composition of atoms, allowing precise identification of nuclide variants within a compound. These notations are essential for distinguishing isotopes, which share the same atomic number but differ in neutron count and therefore in atomic mass, with little effect on chemical reactivity. The primary notation uses a left superscript to give the mass number before the atomic symbol, such as ¹⁴C for carbon-14 or ²H for deuterium. For hydrogen isotopes, the symbols D (deuterium) and T (tritium, ³H) are permitted in formulas when no other nuclides require disambiguation, as in heavy water, written D₂O. According to IUPAC guidelines, nuclide symbols are placed before the relevant structural part of the formula, often in parentheses for isotopically substituted compounds (e.g., (²H₂)O) or in square brackets for specifically labeled ones (e.g., [¹⁴C]methane), ordered by element symbol or by increasing mass number.

In applications, isotopic formulas support tracer studies in nuclear chemistry and biochemical pathways, where the labeled atoms can be detected without altering reaction mechanisms. For instance, tritium-labeled water (T₂O) is used to track hydrological cycles by means of its beta emission, while CH₃¹⁸OH (methanol labeled with oxygen-18) elucidates oxygen transfer in synthetic reactions such as dimethyl carbonate formation. These tools provide insights into metabolic processes and environmental dynamics, using the mass differences for precise quantification.

Formulas for Trapped Atoms and Radicals

Formulas for radicals denote the presence of an unpaired electron using a superscripted dot (•), distinguishing these highly reactive species from stable molecules with paired electrons. This notation is standard in Lewis structures and molecular representations, where the dot is placed adjacent to the atom bearing the unpaired electron. For instance, the methyl radical is represented as •CH₃ and the hydroxyl radical as •OH.

Trapped atoms and radicals, being unstable under normal conditions, are often studied using matrix isolation techniques at low temperatures (typically 4–20 K) to prevent recombination or reaction. In this method, the reactive species are embedded in an inert solid host matrix, such as solid argon, which isolates them for spectroscopic analysis. The notation for such trapped species incorporates the host matrix to specify the environment, for example [Ar]H for a hydrogen atom trapped in an argon matrix. These notations are important in low-temperature studies that characterize the electronic and vibrational properties of otherwise fleeting intermediates. For example, ozone (O₃) is frequently trapped in matrices to probe its resonance structures, which delocalize electrons across the three oxygen atoms and contribute to its bent geometry and reactivity.

Persistent radicals, which are unusually stable because of steric hindrance or delocalization, provide key examples in this context. The nitroxide radical TEMPO, known chemically as (2,2,6,6-tetramethylpiperidin-1-yl)oxyl, is denoted C₉H₁₈NO•, where the dot marks the unpaired electron on the nitrogen-oxygen group; it is often studied in matrices for applications such as spin trapping.

Non-Stoichiometric and Variable Formulas

Non-Stoichiometric Formulas

Non-stoichiometric formulas represent chemical compounds or phases that exhibit variable composition, deviating from fixed atomic ratios because of structural defects, partial site occupancies, or solid solutions within their lattices. These formulas use subscripts, variables, or ranges to denote the variability, such as Fe_{1-x}O, where x is the fraction of iron sites left vacant, distinguishing them from stoichiometric "line compounds" that maintain precise ratios across narrow homogeneity ranges. Non-stoichiometric behavior arises from mechanisms such as cation or anion vacancies, interstitial atoms, or crystallographic shear planes, which allow the material to adapt while preserving an ordered structure over a composition range.

Such compounds are prevalent among oxides and sulfides, where defect structures permit wide homogeneity ranges under varying temperature, pressure, or synthesis conditions. For instance, wüstite, an iron oxide phase, is denoted Fe_{1-x}O with 0.04 < x < 0.16, reflecting iron deficiencies that stabilize the rock-salt structure and influence its magnetic and electrical properties. Similarly, titanium monoxide exists over a broad range as TiO_y (0.7 ≤ y ≤ 1.25), resulting from vacancies on both the titanium and oxygen sublattices, which affects its metallic conductivity and potential superconducting applications. In hydrogen-storage materials, non-stoichiometric notation captures variable hydrogen uptake, as in LaNi_5H_x (0 ≤ x ≤ 6.7), where the alloy absorbs hydrogen interstitially, forming a hydride phase whose composition tunes storage capacity and reversibility. This variability extends the idea of a formula by admitting continuous rather than discrete ratios, which is essential for describing real-world defect-driven behavior.

The composition range in non-stoichiometric formulas strongly affects material properties, particularly electronic conductivity in superconductors. For example, in yttrium barium copper oxide, denoted YBa_2Cu_3O_{7-δ} (0.1 ≤ δ ≤ 0.3), oxygen deficiencies modulate hole doping in the CuO_2 planes, optimizing the critical temperature for high-T_c superconductivity, which reaches up to 93 K.
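A short charge-balance calculation shows what a variable subscript implies for oxidation states. The sketch below rests on simplifying assumptions that are not stated in the text: oxygen is counted as O²⁻, yttrium as Y³⁺, barium as Ba²⁺, and iron in Fe_{1-x}O is split only between Fe²⁺ and Fe³⁺. The function names are hypothetical.

```python
def wustite_iron_split(x: float) -> dict:
    """Fe2+/Fe3+ content per formula unit of Fe(1-x)O, assuming O is 2-.

    Solving  n2 + n3 = 1 - x  (iron sites)  and  2*n2 + 3*n3 = 2  (charge)
    gives    n3 = 2x  and  n2 = 1 - 3x.
    """
    if not 0 <= x < 1 / 3:
        raise ValueError("x must satisfy 0 <= x < 1/3 for this simple model")
    return {
        "Fe2+": 1 - 3 * x,
        "Fe3+": 2 * x,
        "mean_oxidation_state": 2 / (1 - x),
    }


def ybco_mean_copper_state(delta: float) -> float:
    """Mean Cu oxidation state in YBa2Cu3O(7-delta), assuming Y3+, Ba2+, O2-."""
    oxygen_charge = 2 * (7 - delta)    # total negative charge to balance
    other_cations = 3 + 2 * 2          # one Y3+ and two Ba2+
    return (oxygen_charge - other_cations) / 3


for x in (0.05, 0.10, 0.15):
    print(x, wustite_iron_split(x))
for delta in (0.1, 0.3):
    print(delta, round(ybco_mean_copper_state(delta), 3))
```

In this simple model each missing iron atom is compensated by two Fe³⁺ ions, and lowering δ raises the mean copper oxidation state, which is one way to picture the hole doping mentioned above.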

General Forms for Organic Compounds

General forms for organic compounds provide a concise notation for entire families or classes of molecules, particularly homologous series, in which successive members differ by a -CH₂- unit. These formulas, akin to empirical representations but tailored to structural patterns, allow chemists to describe the composition of related compounds without specifying individual members. For instance, they capture the repeating carbon backbone and associated functional groups in carbon-based structures.

In hydrocarbons, the simplest organic class, alkanes follow the general formula CₙH₂ₙ₊₂, where n is the number of carbon atoms, representing saturated chains with only single bonds (e.g., ethane, C₂H₆). Alkenes containing one double bond follow CₙH₂ₙ (e.g., ethene, C₂H₄), while alcohols are denoted R-OH, with R an alkyl group (e.g., methanol, CH₃OH). Carboxylic acids are written R-COOH, with a carbonyl and a hydroxyl group on the same carbon (e.g., acetic acid, CH₃COOH). These notations emphasize the carbon-centric nature of organic chemistry and enable systematic classification.

Such general forms are used to predict physical properties within a homologous series, where trends such as boiling point correlate with increasing n because longer chains have stronger van der Waals forces. For example, boiling points rise from -162°C for methane (n = 1) to 69°C for hexane (n = 6), which aids material selection for fuels and solvents. They also underpin nomenclature systems, in which prefixes (e.g., meth-, eth-) derive from n, standardizing names such as propanol for C₃H₇OH. Additionally, the degree of unsaturation, calculated as (2C + 2 + N − H − X)/2 (with C = carbons, H = hydrogens, N = nitrogens, X = halogens), compares a compound's formula with its saturated analog, revealing rings or multiple bonds (e.g., C₆H₆ has 4 degrees, consistent with the one ring and three double bonds of benzene). This metric supports structural elucidation in synthesis and analysis.
These notations primarily apply to carbon-based compounds, leveraging the versatility of C-C and C-H bonds to form extensible chains and rings, in contrast to the often fixed, ionic stoichiometries prevalent in inorganic materials. This flexibility facilitates the broad applicability of general forms in organic contexts, from pharmaceuticals to polymers.
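The general formulas CₙH₂ₙ₊₂ and CₙH₂ₙ and the degree-of-unsaturation expression from this section translate directly into code. The following is a minimal sketch; the function names are illustrative and subscripts are written as ASCII digits.

```python
def _term(symbol: str, count: int) -> str:
    """Element symbol with its subscript; a subscript of 1 is left implicit."""
    return symbol if count == 1 else f"{symbol}{count}"


def alkane_formula(n: int) -> str:
    """Molecular formula of the alkane with n carbons: CnH(2n+2)."""
    if n < 1:
        raise ValueError("an alkane needs at least one carbon")
    return _term("C", n) + _term("H", 2 * n + 2)


def alkene_formula(n: int) -> str:
    """Molecular formula of an alkene with one double bond: CnH2n."""
    if n < 2:
        raise ValueError("an alkene needs at least two carbons")
    return _term("C", n) + _term("H", 2 * n)


def degrees_of_unsaturation(c: int, h: int, n: int = 0, x: int = 0) -> float:
    """(2C + 2 + N - H - X) / 2 for carbons, hydrogens, nitrogens, halogens."""
    return (2 * c + 2 + n - h - x) / 2


print(alkane_formula(1), alkane_formula(2))   # CH4 C2H6
print(alkene_formula(2))                      # C2H4
print(degrees_of_unsaturation(c=6, h=6))      # 4.0 (benzene)
print(degrees_of_unsaturation(c=2, h=6))      # 0.0 (ethane, fully saturated)
```

Oxygen does not appear in the expression, so ethanol (C₂H₆O) evaluates to zero just as ethane does.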

Notation and Indexing Systems

Hill System

The Hill system is a standardized notation and sorting convention for writing and indexing molecular chemical formulas, used primarily in chemical indexes and databases to support systematic organization. In this system, formulas for compounds that contain carbon are written with the carbon (C) atoms first, followed by the hydrogen (H) atoms, and then all other elements in alphabetical order of their chemical symbols; for compounds lacking carbon, all elements, including hydrogen, are simply listed alphabetically. This gives a unique, canonical way of writing each molecular formula, ignoring structural details and isomerism. The subscripts do not influence the element order, although they are included in the written form to give the atom counts.

The system was developed by Edwin A. Hill of the United States Patent Office and was first proposed in 1900 as a method for indexing chemical literature to streamline searches in growing patent and scientific collections. Hill's approach addressed the varied formula notations in use at the time by giving priority to carbon and hydrogen, the key elements of organic chemistry, while using alphabetical sequencing for the others so that formulas could be sorted in a straightforward way. For sorting purposes, formulas are ordered first by the number of carbon atoms in ascending order, then by the number of hydrogen atoms, and then by the other elements in alphabetical order, allowing indexes to group related compounds efficiently.

The system remains a cornerstone of modern chemical information retrieval, notably in large chemical databases, where molecular formulas are indexed in Hill order to support searches across millions of substances. For example, methane is represented as CH₄, ethanol as C₂H₆O, and benzene as C₆H₆; in an index, these would appear in the sequence CH₄, C₂H₆O, C₆H₆ because they are ordered by increasing number of carbon atoms.
This ordering promotes logical progression from simpler to more complex carbon-containing compounds, enhancing usability in bibliographic and registry systems while applying uniformly to both empirical and molecular notations.
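The ordering rules of the Hill system are simple enough to sketch directly. In the example below, the function names and the representation of a formula as a mapping from element symbol to atom count are assumptions made for illustration, and the sort key is a simplified reading of the index ordering described above (carbon count, then hydrogen count, then the remaining elements alphabetically).

```python
from typing import Mapping


def hill_formula(counts: Mapping[str, int]) -> str:
    """Write a molecular formula in Hill order, using ASCII digits."""
    if "C" in counts:
        # Carbon first, hydrogen second, everything else alphabetically.
        rest = sorted(sym for sym in counts if sym not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        order = sorted(counts)
    return "".join(
        sym if counts[sym] == 1 else f"{sym}{counts[sym]}" for sym in order
    )


def hill_sort_key(counts: Mapping[str, int]) -> tuple:
    """Simplified index key: C count, H count, then other elements A-Z."""
    rest = tuple(
        (sym, counts[sym]) for sym in sorted(counts) if sym not in ("C", "H")
    )
    return (counts.get("C", 0), counts.get("H", 0), rest)


benzene = {"H": 6, "C": 6}
ethanol = {"O": 1, "H": 6, "C": 2}
methane = {"H": 4, "C": 1}

index = sorted([benzene, ethanol, methane], key=hill_sort_key)
print([hill_formula(f) for f in index])   # ['CH4', 'C2H6O', 'C6H6']
print(hill_formula({"Br": 1, "H": 3, "C": 1}))  # CH3Br
print(hill_formula({"H": 1, "Cl": 1}))          # ClH
```

The last line shows the carbon-free case: hydrogen chloride becomes ClH in Hill order, because without carbon the hydrogen is sorted alphabetically like any other element.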

Limitations and Alternatives to Formulas

Chemical formulas, while essential for denoting the composition of compounds, have significant limitations in representing structural and spatial details, particularly for isomers. A molecular formula such as C₃H₆O fails to distinguish between constitutional isomers like propanal (CH₃CH₂CHO) and propan-2-one ((CH₃)₂CO), because it gives only the atom counts and no connectivity information. Similarly, these formulas cannot convey stereochemistry, such as the configuration around chiral centers or double bonds, which leaves enantiomers and diastereomers that share the same composition but differ in spatial arrangement indistinguishable. This inadequacy becomes pronounced in complex organic molecules, where multiple stereoisomers may exist and may differ in biological activity or reactivity.

Historically, the proliferation of organic compounds in the 19th century exposed these shortcomings, as early empirical and molecular formulas proved insufficient for uniquely identifying structures amid growing synthetic diversity. Chemists recognized that formulas alone could not capture bond arrangements or substitutions, which prompted the development of systematic naming conventions to supplement or replace them. This led to international efforts that culminated in the establishment of IUPAC nomenclature rules in the early 20th century, aimed at resolving the ambiguities that arose from ad hoc naming practices.

To address these limitations, IUPAC-recommended names incorporate stereochemical descriptors that identify isomers precisely; for instance, (2S)-butan-2-ol specifies the S configuration at the chiral carbon, distinguishing it from its R enantiomer. For computational and database applications, line notations like SMILES (Simplified Molecular Input Line Entry System) offer a compact, machine-readable alternative that represents structures as ASCII strings (e.g., CC(O)CC for butan-2-ol) while supporting basic stereochemistry through symbols like @ for chirality.

The IUPAC International Chemical Identifier (InChI) provides a more standardized solution, generating unique, layered strings that encode full structural and stereochemical information, such as InChI=1S/C4H10O/c1-3-4(2)5/h4-5H,3H2,1-2H3/t4-/m0/s1 for (S)-butan-2-ol, which supports unambiguous global exchange of chemical data. Where three-dimensional arrangements must be visualized, as in molecular modeling software, 3D representations serve as visual alternatives, depicting bond angles, conformations, and dynamics beyond what planar formulas can show.
