
INSTITUT NATIONAL POLYTECHNIQUE DE GRENOBLE 

THESIS

submitted to obtain the degree of

DOCTOR OF THE INPG

No. assigned by the library

Specialty: "MATERIALS SCIENCE AND ENGINEERING"

prepared at the laboratory Génie Physique et Mécanique des Matériaux (GPM2)

within the Doctoral School "MATERIALS AND PROCESS ENGINEERING"

publicly presented and defended

by

Chansun SHIN

on 25 November 2004

Title:

3D DISCRETE DISLOCATION DYNAMICS APPLIED TO

DISLOCATION-PRECIPITATE INTERACTIONS

Thesis advisor:

Marc FIVEL

JURY

Mr. A. PINEAU, President, Reviewer

Mr. F. LOUCHET, Examiner

Mr. M. FIVEL, Thesis advisor

Mr. K. H. OH, Co-advisor

Mr. H. N. HAN, Reviewer

Mr. C. ROBERTSON, Invited member

Mr. M. VERDIER, Invited member

3D DISCRETE DISLOCATION DYNAMICS APPLIED TO 

DISLOCATION-PRECIPITATE INTERACTIONS 

The 3D Discrete Dislocation Dynamics (DDD) method has been applied to investigate the effects of precipitates on the plasticity of FCC single crystals.

A method is proposed to represent internal interfaces by a series of facets with a pre-defined strength. To fully account for the mutual elastic interactions between dislocations and second-phase particles, the coupling with a finite element method (FEM) is extended. To reduce the computing time, the serial 3D DDD algorithm has been improved by revisiting the 'box method', and a new parallel code has been developed using the standard Message Passing Interface (MPI).

The image stresses due to a three-dimensional particle were computed using the FEM/DDD coupling code, and the numerical results were compared with the corresponding analytical solutions. The effect of the elastic modulus mismatch on the flow stress and the subsequent hardening behavior was then analyzed. The image stresses were found to significantly affect the work hardening and local events such as cross slip and climb. Finally, the fatigue of precipitate-hardened materials was simulated using the new parallel DDD code. The effects of shearable and non-shearable particles on the fatigue properties were well reproduced by the simulations, and the numerical results agree qualitatively with the available experimental observations. A mechanism for the formation of intense slip bands is proposed from the observation of the simulated dislocation microstructure.

KEY WORDS: DISLOCATION, PRECIPITATE, PLASTICITY, FATIGUE, IMAGE FORCES, 

DAMAGE, DYNAMICS, PARALLELIZATION 

DISCRETE DISLOCATION DYNAMICS APPLIED TO

INTERACTIONS BETWEEN DISLOCATIONS AND PRECIPITATES

Discrete dislocation dynamics (DDD) has been applied to examine the effects of precipitates on the plasticity of single crystals with the FCC structure.

The precipitates are modeled as an assembly of facets that can be crossed at a given stress. To account for the elastic interactions between dislocations and particles, a coupling with the finite element method (FEM) was used. To reduce computing times, the 'box method' was revisited and a parallel version of the code was developed using the Message Passing Interface (MPI) programming standard.

First, the image stresses created by a 3D particle were computed through a coupling between the FEM and the DDD code. The numerical results were compared with the corresponding analytical solutions. The effect of the difference in Young's moduli on the yield stress and on the resulting hardening behavior was then studied numerically. We showed that the image stresses have a significant effect on hardening and on local events such as cross slip and climb. Finally, the fatigue of materials hardened by shearable and non-shearable precipitates was simulated with the new parallel DDD code. The results obtained from our simulations agree with our experimental observations and with data from the literature. A mechanism for the formation of intense slip bands was proposed from the observation of the microstructures obtained by simulation.

KEY WORDS: DISLOCATION, PRECIPITATE, PLASTICITY, FATIGUE, IMAGE FORCES, DAMAGE, DYNAMICS, PARALLELIZATION

Laboratoire Génie Physique et Mécanique des Matériaux (GPM2), ESA5010, 

ENSPG, 101 Rue de la Physique, BP46, 38402 Saint Martin d’Hères Cedex

Acknowledgements 

First of all, I would like to express my sincere thanks to my advisor Marc Fivel. Five years ago, he kindly replied to my audacious e-mail, which could easily have been ignored considering its content, and gave me an opportunity to visit him. This short visit led to the three-year Ph.D. program between INP Grenoble and Seoul National University, and from the moment we shook hands for the first time to the moment we shook hands after the thesis defence, he has been my mentor in both work and life.

I am also grateful to Professor Kyu Hwan Oh, with whom I have been working since I began my Master's studies eight years ago. He gave me many opportunities to gain research experience and has kept giving me much good advice.

I owe special thanks to Marc Verdier (LTPCM) and Christian Robertson (CEA Saclay). They guided and advised me as unofficial co-advisors, from the start of the thesis work to the rehearsal of the thesis presentation, with great patience and encouragement. I also cannot help attributing some of my work to the fantastic tools of Christophe Déprés, who started and finished the Ph.D. study with me.

I want to thank Professor André Pineau (ENS Mines Paris) for serving as both 'Président' and 'Rapporteur' for my thesis defence. From the moment I first met him at a meeting of the project 'FAMICRO'¹, which supported my work on fatigue simulations, I was fascinated by his enthusiasm for research and by his boundless memory: he is a walking library! I also thank Professors François Louchet (LGGE) and Heung Nam Han (SNU) for serving on my thesis committee and for their useful suggestions and critical assessment of my work.

My work has been supported by EGIDE², and I want to thank the CNOUS at Grenoble and the French Embassy in Korea for their efficient professional services.

I am very grateful to all the members of the GPM2 Laboratory for a pleasant daily life: in the blue room, Julien Chaussidon and Thomas Nogaret; in the computing room, David Rodney and Valérie Quatela; and on the playground with a soccer ball, Didier Bouvard, Rémy Dendievel, Luc Salvo, Charles Josserond, Franck Pelloux and Shigesato Genechi.

And finally, I want to express my thanks and love to my wife Suejung, who is both my great 

supporter and best friend, for her love and devotion to our family, and to my little daughter Yvine, 

who likes to play with my laptop, for laughter and happiness we all share in our growing family. 

1 Modeling of the fatigue life of structural metallic materials based on microscopic physical mechanisms

2 Pasteur scholarship (Bourse Pasteur) of the French Ministry of Foreign Affairs

Abstract 

The 3D Discrete Dislocation Dynamics (DDD) method has been applied to investigate the effects of precipitates on the plasticity of FCC single crystals.

A method is proposed to represent internal interfaces by a series of facets with a pre-defined strength. To fully account for the mutual elastic interactions between dislocations and second-phase particles, the coupling with a finite element method (FEM) is extended. To reduce the computing time, the serial 3D DDD algorithm has been improved by revisiting the 'box method', and a new parallel code has been developed using the standard Message Passing Interface (MPI).

The image stresses due to a three-dimensional particle were computed using the FEM/DDD coupling code, and the numerical results were compared with the corresponding analytical solutions. The effect of the elastic modulus mismatch on the flow stress and the subsequent hardening behavior was then analyzed. The image stresses were found to significantly affect the work hardening and local events such as cross slip and climb. Finally, the fatigue of precipitate-hardened materials was simulated using the new parallel DDD code. The effects of shearable and non-shearable particles on the fatigue properties were well reproduced by the simulations, and the numerical results agree qualitatively with the available experimental observations. A mechanism for the formation of intense slip bands is proposed from the observation of the simulated dislocation microstructure.


Contents 

Acknowledgements
Abstract

1 Introduction
1.1 Computational methods in plasticity
1.2 Dislocation dynamics
1.2.1 2D simulations of dislocation dynamics
1.2.2 3D simulations of dislocation dynamics
1.3 Scope of Thesis

2 Description of the simulation method
2.1 Representation of the dislocation lines in FCC metals
2.1.1 Preparation of the simulation space
2.1.2 Discretization of the dislocation lines
2.1.3 Existence of a subnetwork
2.1.4 Comments on other crystal structures and dislocation dynamics models
2.2 Computation of stresses and displacements of dislocations
2.2.1 Evaluation of the driving force
2.2.2 Computation of displacements
2.3 Motion of dislocations
2.3.1 Preliminaries
2.3.2 Dislocation mobility
2.3.3 Dislocation-dislocation interactions
2.3.4 Cross-slip of screw dislocation segments

2.3.5 Plastic strain due to dislocation movement
2.4 Boundary conditions
2.4.1 Periodic Boundary Conditions
2.4.2 Internal interfaces
2.5 Acceleration of the DDD code
2.5.1 Problem description and review of literatures
2.5.2 The Box method
2.5.3 Speedup and Error
2.5.4 Boxes and Periodic boundary conditions
2.6 Computation procedure of the DDD program

3 Parallelization of the Discrete Dislocation Dynamics method
3.1 An introduction to Supercomputing
3.1.1 Overview
3.1.2 Classification of hardware
3.1.3 Parallel programming models
3.1.4 Classification of parallel languages
3.1.5 Supercomputers in France and Korea
3.2 Towards a parallel DDD code
3.2.1 Basic Steps of Parallelization
3.2.2 Writing a parallel program
3.3 Parallelization of the serial DDD program
3.3.1 Initialization of parallel environments
3.3.2 Long-distance stresses computations
3.3.3 Short-distance stresses computation
3.3.4 Data structures for distributing and the gathering segments
3.3.5 Motion of segments
3.3.6 Summary and comments
3.4 Performance improvement
3.4.1 Measure of performance
3.4.2 Conditions for good performance
3.4.3 Performance tests

3.4.4 Load balancing
3.4.5 Comparison of simulation results between the serial and parallel DDD code
3.5 Application to Stage I-II transition simulation
3.5.1 Stress-strain curves of FCC single crystals
3.5.2 Simulation conditions
3.5.3 Simulation results

4 Dislocation-precipitate interactions
4.1 Image stresses due to a 3D particle
4.1.1 Motivations and review of the literature
4.1.2 Interaction of an edge dislocation with a circular cylindrical particle
4.1.3 Interaction of an edge dislocation with a spherical particle
4.1.4 Interaction of an edge and a screw dislocation with a cubical particle
4.1.5 Discussion
4.2 A simple case of dislocation-particle interaction
4.2.1 Motivation and review of literatures
4.2.2 Calculation procedures
4.2.3 Flow stress of impenetrable particles with a different shear modulus
4.2.4 Increment in hardening stress
4.2.5 Discussion
4.3 Fatigue simulations of materials hardened by particles
4.3.1 Motivation and review of literatures
4.3.2 Description of the simulation method
4.3.3 Evolution of the dislocation microstructure during the fatigue tests
4.3.4 Mechanical behavior
4.3.5 Surface slip markings
4.3.6 Fatigue properties of materials containing particles with a bimodal size distribution
4.3.7 Summary

5 Conclusions and perspectives

Chapter 1 

Introduction 

1.1 Computational methods in plasticity 

A dislocation is a line defect within a crystal, along which atoms deviate permanently from their original crystallographic periodicity. Dislocation glide gives rise to the macroscopic deformation of metals; a dislocation is thus a microscopic carrier of metallic plasticity.

Modeling the plasticity of metals involves understanding the nature of dislocations, which is defined at the atomistic scale, and also evaluating the deformation behavior at the macroscopic scale. Many models have been developed to understand the plasticity of metals. Since the features of plasticity vary widely in size and time, the models also span a wide range of length and time scales. Among these models, this section focuses on Molecular Dynamics (MD), Dislocation Dynamics (DD) and continuum mechanics.

Atoms are the basic constituent elements of MD simulations. They interact with each other through an interatomic potential. The temporal trajectory of an ensemble of atoms under an external loading is simulated by minimizing the total potential energy of the system. The deviations of the atomic positions from the lattice sites implicitly represent the dislocations, so the atomistic-scale topology of a dislocation line can be investigated by MD. Because of the constraints on the simulation size (< (200 nm)³), MD simulations are mostly employed to study the physical properties of a single or a few dislocation lines.

Figure 1.1: Length and time scales of each model, plotted as time (s, from 10⁻¹² to 10³) versus space (m, from 10⁻¹¹ to 1). Regions labeled in the figure: Molecular Dynamics, Dislocation Dynamics, Single crystal models, Polycrystal models, Continuum mechanics, Homogenization technique. Solid lines represent the limits imposed by the intrinsic physics of each model. Dashed lines represent the limits imposed by the available computing power.

In DD methods, dislocation lines are represented explicitly. The collective evolution of a large number of interacting dislocations is simulated under an external loading. Properties of dislocations such as mobility and junction strength are input parameters of DD simulations, and dislocation glide results in plastic strain in the simulation volume. The stress-strain behavior is thus an output of the DD simulations.
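As an illustration of how glide turns into plastic strain, the sketch below uses the standard relation Δγ = b ΔA / V for a dislocation with Burgers vector magnitude b sweeping an area ΔA on its glide plane in a volume V. The function name and the numerical values are illustrative assumptions, not parameters taken from this thesis.

```python
def plastic_shear_increment(burgers_m: float, swept_area_m2: float, volume_m3: float) -> float:
    """Plastic shear strain increment b * dA / V (dimensionless)."""
    if volume_m3 <= 0.0:
        raise ValueError("volume must be positive")
    return burgers_m * swept_area_m2 / volume_m3


# Assumed values for illustration only.
b = 2.5e-10            # Burgers vector magnitude in m (order of magnitude for FCC metals)
side = 1.0e-5          # edge length of a cubic simulation volume in m
area = side * side     # one dislocation sweeping a full cross-section of the cube

gamma = plastic_shear_increment(b, area, side ** 3)
print(f"shear strain increment: {gamma:.3e}")  # equals b / side for this geometry
```

With these assumed numbers, a single dislocation crossing the whole volume contributes a shear strain of b divided by the edge length, which shows why many dislocations must glide to accumulate a measurable strain.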

Continuum mechanics treats the behavior of a continuous medium through a set of equations and boundary conditions. A wide range of numerical techniques can solve these equations; finite difference and finite element methods are two broad families of such techniques. In these methods, the continuum domain of interest is subdivided into discrete cells or elements, in which the values of certain physical quantities are determined by solving a system of equations. In a typical application to metallic plasticity, the output is the deformation behavior of the simulated volume, for which a governing constitutive equation is assumed.

As briefly introduced above, MD, DD and continuum methods each have their own characteristic length and time scales. Fig. 1.1 shows these ranges for each method. As the performance of each numerical method improves, the volume and the physical time that can be simulated increase (top and right domain limits of each method in Fig. 1.1). Recently, the length and time scales of the various methods have begun to overlap. This gives a strong impetus to exchange information between methods in order to build a unified model of metallic plasticity, which would be able to predict the behavior of a material from its fundamental properties.


(a) Weak obstacles (b) Hard obstacles 

Figure 1.2: 2D simulations of dislocations moving through a random array of point obstacles: Effects 

of obstacles’ strength ([Foreman & Makin 66]) 

1.2 Dislocation dynamics 

1.2.1 2D simulations of dislocation dynamics 

Based on the well-understood elementary properties of a single dislocation, numerical DD methods were first developed in 2D.

Dislocation dynamics in 2D can be further divided according to the orientation of the simulation plane with respect to the dislocation lines: (i) parallel and (ii) perpendicular. In case (i), the simulation plane is parallel to the glide plane of the dislocation lines, so neither cross-slip nor climb of dislocations is allowed. This configuration was initially applied to study the line tension and the shape of a dislocation under stress ([Brown 64]). The dynamic motion of dislocations has also been simulated for a glide plane containing a random distribution of point obstacles ([Foreman & Makin 66]). The effect of the obstacle strength on the initial flow stress was studied, and some of the simulation results are shown in Fig. 1.2. This type of 2D simulation is still used to study the effect of particle parameters on the flow stress; see for example [Mohles & Nembach 01].

In case (ii), dislocations are perpendicular to the simulation plane, that is, they are infinite, parallel to each other and of the same character. This configuration can simulate the multiplication, annihilation, cross-slip and climb of dislocations. It is, however, difficult to include the line tension effect explicitly. This kind of configuration has been used to simulate spontaneous microstructure formation ([Lépinoux & Kubin 87]). Because of its simplicity, this 2D method can simulate dislocation motion up to relatively large strains. It is still under active development and is applied in several studies; see for example [Cleveringa et al. 97].

1.2.2 3D simulations of dislocation dynamics 

The motivation for 3D DD can be summarized as the need

• to include the 3D nature of dislocation behavior, such as cross-slip and junction formation

• to explain the formation of dislocation structures during plastic deformation

The first 3D simulation was proposed in [Canova & Kubin 91]. Since then, the method has been developed and applied to investigate the collective motion of dislocations under various conditions by two leading groups¹ in France. It is based on the representation of dislocation lines by segments in an integer space. Other versions of 3D DD have emerged since the end of the 1990s, as detailed in Sec. 2.1.4. Owing to the development of simulation methods and the increase in computing power, these simulation methods have strengthened their position in the field of crystal plasticity. The 3D discrete dislocation dynamics (DDD) method has proven to be a powerful tool for investigating the plasticity of metals and is expected to serve as a link between atomistic and continuum scale simulations (see Fig. 1.1).

1.3 Scope of Thesis 

This thesis aims at applying the 3D DDD method both to the rigorous computation of dislocation-precipitate interactions and to the study of the effects of precipitates on the fatigue properties of metals.

For the rigorous computations, we extended the code coupled with a finite element method ([Fivel 97]) in order to incorporate 3D precipitates with an elastic modulus different from that of the matrix. The interaction forces due to a second-phase particle are computed, and the effects of these forces on the flow stress and the subsequent hardening are investigated.

1 Génie Physique et Mécanique des Matériaux (GPM2) and Laboratoire d’Etude Métallurgique (LEM)


Recently, the 3D DDD method was successfully applied to the study of early fatigue crack initiation in 316L stainless steels ([Déprés 04]). The critical role of cross-slip was pointed out, which demonstrates the advantage of 3D DD simulations over 2D simulations. Inspired by this study, we applied the 3D DDD method to simulate the fatigue behavior of materials hardened by precipitates. It was found, however, that the feasible volume fraction of precipitates is quite small given the performance of currently available single-processor machines and the computing efficiency of the serial 3D DDD code. This is due to the additional computational load induced when many precipitates are introduced in 3D DDD simulations, which are already computationally demanding because of the long-range stress field of a dislocation segment and the need to handle dislocation interactions during segment motion. Because of this inherent computational load, the maximum strain that can be simulated in a reasonable time remains of the order of 10⁻³ in multislip conditions.

The easiest way to meet the computational demands of fatigue simulations of precipitation-hardened materials would be to wait until a faster single processor becomes available. Considering the relatively short duration of a doctorate, however, this is not a good option, even though the speed of single processors has improved tremendously².
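To put the waiting argument in numbers, the short sketch below assumes the 18-month doubling period quoted in the footnote and the three-year Ph.D. program mentioned in the acknowledgements. The function name and the idea of modeling the trend as a smooth exponential are assumptions made for illustration.

```python
def speedup_from_waiting(years: float, doubling_period_years: float = 1.5) -> float:
    """Single-processor speedup after `years`, assuming performance
    doubles every `doubling_period_years` (a smooth exponential trend)."""
    return 2.0 ** (years / doubling_period_years)


for years in (1.5, 3.0):
    print(f"after {years} years: x{speedup_from_waiting(years):.1f}")
```

Under this assumption, waiting for the full three years would yield only about a fourfold speedup of a serial code.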

The other way is to increase the computational capacity by gathering single processors and making them work together, that is, parallel computing. A parallel computer simply comprises a number of processors that solve a problem together to reduce the elapsed computation time. Parallel computing has in fact been widely adopted in many research fields to cope with growing computational demands, which arise for many reasons, e.g. sophisticated boundary conditions, nonlinear material behavior and large numbers of unknowns. Clear successes of parallel computing in the field of computational plasticity can be found in both MD methods ([Abraham 97]) and continuum mechanics ([Demmel et al. 93], [Fahrat & Roux 94]). Parallel codes have enabled each model to perform large-scale simulations in a reasonable time. In MD simulations, for example, a volume of 0.01 μm³ can be treated over a period of time of the order of 10⁻¹² seconds using massively parallel machines ([Abraham 97]).

As can be seen in many references, including the few examples cited above, parallel computing has been investigated extensively and is now a well-established field. From the success of atomistic and continuum parallel simulations, we concluded that parallel computation is the best, and indeed the only, way to include a relatively high volume fraction of precipitates in the simulation volume, because only parallel computing can provide the required dramatic increase in computational power.

2 Semiconductor technology has so far roughly doubled processor performance every 18 months. This trend is commonly associated with Moore's law, first published in 1965 ([Moore 65]), which still holds true today. Intel expects that it will continue at least through the end of this decade. Ultimately, the performance of a single-processor computing device will reach an upper limit due to the physical limits of semiconductor technology.

A parallel DDD code has thus been developed and applied to fatigue simulations containing a large number of particles. The effects of particles on the fatigue properties are studied, focusing on slip irreversibility and the formation of intense slip bands during cyclic deformation.

The parallel DDD code would benefit not only small-scale simulations that involve a large number of internal defects but also large-scale simulations that would make a comparison with macroscopic simulations possible. It would hence reinforce the role of the 3D DDD method in the chain of plasticity simulation methods.

This thesis is organized as follows. 

Chapter 1 Introduction (this chapter) 

Chapter 2 summarizes the theoretical background and methodology of the 3D discrete dislocation dynamics method. The computation of the displacement field of a dislocation loop is introduced. Several boundary conditions are explained, such as the implementation of internal interfaces and periodic boundary conditions. The numerical efficiency of the serial DDD algorithm is increased by revisiting the so-called box method ([Verdier et al. 98]).

Chapter 3 presents the parallel algorithm developed to parallelize the 3D DDD program used in this work. The parallel version of the DDD program aims at simulating fatigue tests of materials containing a large number of particles in a reasonable time using parallel computers. The new parallel DDD program is tested, its performance is improved by dynamic load balancing, and it is then applied to stage I-II simulations. This chapter also contains a general introduction to parallel computing.

Chapter 4 contains three applications of the methods developed and detailed in the preceding chapters. Image stresses due to a cylindrical, a spherical and a cubical particle are computed. The effects of image stresses on flow stress and hardening are investigated. The FEM/DDD coupling method presented in Sec. 2.4.2 is used for these applications. Finally, the new parallel program is used for fatigue simulations of precipitate-hardened metals.

Chapter 5 gives concluding remarks and perspectives.

Chapter 2 

Description of the simulation method 

The discrete dislocation dynamics (DDD) method initially proposed in [Canova & Kubin 91] has been greatly improved in its numerical precision and in its applicability to problems involving complex boundary conditions over the past 15 years. The purpose of this chapter is to review the theoretical background and methodology of the DDD method, and also to describe the author's contributions: the computation of displacement fields, the implementation of internal interfaces and periodic boundary conditions, and the acceleration of the code using the revised box method.

The DDD method used in this thesis deals only with perfect dislocations in face-centered cubic (FCC) metals. Sec. 2.1 introduces the simulation lattice and the discretization of a dislocation line in the DDD model, and compares the model with other dislocation dynamics models. Although the focus is on the FCC lattice, the methodology is quite general. The extension of the method to the other cubic crystals is also discussed briefly.

The computation of the effective stress on each dislocation segment is presented in Sec. 2.2. The method used for computing the displacement field of a dislocation loop is also detailed; its extension to more general dislocation structures can be found in Sec. 4.3.5. The stress and displacement solutions are all based on the theory of linear isotropic elasticity.

Sec. 2.3 introduces the motion of dislocation segments. This section includes a description of the several 

local rules needed to handle interactions between dislocations. 

New boundary conditions are explained in Sec. 2.4. Representation of internal interfaces is discussed 

in both a simple method using facets and a more rigorous way with full elastic interactions. The 

implementation of periodic boundary conditions is also detailed. 

The performance of the DDD code is improved by revising the box method, which was first described in [Verdier et al. 98]. The computational efficiency of the method is significantly increased by using a linked list of segments. The methodology and the performance of the box method are described in Sec. 2.5. The overall flowchart of the code is presented at the end of this chapter.

2.1 Representation of the dislocation lines in FCC metals 

2.1.1 Preparation of the simulation space 

The lattice of the simulation volume is homothetic to that of FCC metals. The spacing of the simulation lattice is taken from experimental measurements of the athermal critical self-annihilation distance between edge dislocations¹. The experiments of Essmann and Mughrabi ([Essmann & Mughrabi 79]), for example, show that no edge dislocations coexist within a distance of the order of 1.5 nm in their copper specimens at room temperature. The shortest distance between two edge dislocations in the simulation is therefore set to this critical distance.

The inter-planar distance between two adjacent {111} planes equals a/√3, where a is the lattice spacing (see Fig. 2.2(a)). If ye denotes the critical self-annihilation distance, equating a/√3 to 2ye gives Eq. 2.1:

$$ a = 2\sqrt{3}\, y_e \qquad (2.1) $$

A typical value of the simulation lattice spacing xl (= a/2) is about 2.598 nm for ye = 1.5 nm.

Note that xl is of the order of 10b, where b is the magnitude of the Burgers vector. This is considerably larger than the dislocation core radius (∼ 2b). Using a lattice spacing larger than the core radius has two consequences for the simulation method.

1. The linear elastic solutions for the stress and displacement of a dislocation are valid over the whole simulation network (Sec. 2.2).

2. The core properties of a dislocation have to be expressed in a phenomenological manner (Sec. 2.3.3 & 2.3.4).
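The arithmetic behind Eq. 2.1 and the "xl is of the order of 10b" remark can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and the Burgers vector magnitude of copper is an assumption taken as a_Cu/√2 with a_Cu = 0.3615 nm, which is not stated in the text.

```python
import math

def simulation_lattice(y_e_nm):
    """Return (a, xl) in nm from the critical annihilation distance y_e."""
    a = 2.0 * math.sqrt(3.0) * y_e_nm   # Eq. 2.1: a / sqrt(3) = 2 * y_e
    xl = a / 2.0                        # simulation lattice spacing xl = a / 2
    return a, xl

# Assumption (not from the text): Burgers vector magnitude of copper in nm.
B_CU_NM = 0.3615 / math.sqrt(2.0)

a, xl = simulation_lattice(1.5)
ratio = xl / B_CU_NM                    # how many Burgers vectors fit in one xl
print(f"a = {a:.3f} nm, xl = {xl:.3f} nm, xl/b = {ratio:.1f}")
```

With ye = 1.5 nm the sketch reproduces xl ≈ 2.598 nm, and the ratio xl/b stays close to 10 for the assumed copper value.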

2.1.2 Discretization of the dislocation lines 

Only perfect dislocations in FCC metals are considered in this work and no dissociation into partials is allowed. The separation of two partial dislocations is probably smaller than the lattice spacing used (∼ 10b): the stacking fault energy γ is about 140 mJ m⁻² for aluminium, 40 mJ m⁻² for copper and 20 mJ m⁻² for silver, which gives stacking-fault ribbon widths of √2b, 5√2b and 7√2b for aluminium, copper and silver respectively when Poisson's ratio is taken as zero ([Hull & Bacon 83]).

1 Screw dislocations annihilate more easily than edge ones by the cross-slip mechanism, thus the critical distance of edge dislocations defines the lattice spacing of the simulation lattice.

[Figure 2.1 (labels: directions [¯12¯1], [111] and [¯101]; edge segment; screw segment)]

Figure 2.1: Representation of a curved dislocation line with a link of pure edge and screw segments: The dots represent lattice points on the (111) slip plane. Unit lengths of the edge (√6 xl) and screw (√2 xl) segments are shown.

A curved dislocation line is represented as a connected set of discrete dislocation segments of pure edge and pure screw type, which is why the method is called the edge-screw model. Fig. 2.1 schematically shows the discretization of a dislocation line by a succession of orthogonal edge and screw segments of the same Burgers vector on the same slip plane².

The maximum length of a segment is set to the discretization length ld, and any segment with a length lseg longer than ld is subdivided further into lseg/ld segments.

The edge (<112> type) and screw (<110> type) vectors for each of the 12 slip systems used in the DDD simulations are shown in Table 2.1³. Each screw direction is associated with two edge directions, Edge1 and Edge2, defining the two glide systems (Screw, Edge1) and (Screw, Edge2), which share the same Burgers vector. The line directions of the 6 screw vectors (or Burgers vectors) are adopted from the Thompson tetrahedron given in [Hirth & Lothe 92], p319. The signs of the vectors are defined by the following 2 rules:

1. Edge × Screw = n, where n is the outgoing normal of the Thompson tetrahedron

2. Edge1 × Edge2 = b so that any prismatic loop is unambiguously defined⁴

2 Edge segments move along the screw vector direction and vice versa. Edge segments with the line vector [¯12¯1] in Fig. 2.1, for example, move along ±[¯101], and screw segments of [¯101] move along either the ±[¯12¯1] or the ±[¯1¯2¯1] direction (the cross-slip mechanism, see Sec. 2.3.4).

3 The notation of Schmid and Boas [Schmid & Boas 35] is written with the system number.

System        1 (B4)    2 (D4)    3 (D1)    4 (C1)    5 (B5)    6 (C5)
Screw         [¯101]    [¯101]    [011]     [011]     [1¯10]    [1¯10]
Edge          [¯12¯1]   [¯1¯2¯1]  [¯2¯11]   [2¯11]    [¯1¯12]   [¯1¯1¯2]
Plane normal  (111)     (¯11¯1)   (¯11¯1)   (¯1¯11)   (111)     (¯1¯11)

System        7 (D6)    8 (A6)    9 (A2)    10 (B2)   11 (C3)   12 (A3)
Screw         [¯1¯10]   [¯1¯10]   [0¯11]    [0¯11]    [101]     [101]
Edge          [1¯1¯2]   [1¯12]    [211]     [¯211]    [1¯2¯1]   [12¯1]
Plane normal  (¯11¯1)   (1¯1¯1)   (1¯1¯1)   (111)     (¯1¯11)   (1¯1¯1)

Table 2.1: Vectors of line and glide directions of dislocation segments used in the DDD code. Each screw vector is shared by two consecutive systems.
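The two sign rules stated for Table 2.1 can be checked mechanically. The following sketch transcribes the twelve systems as integer triples and verifies that each edge vector is orthogonal to its screw vector, that both lie in the listed plane, and that rules 1 and 2 hold up to a positive scale factor. The dictionary layout and helper names are illustrative assumptions and are not taken from the DDD code.

```python
# Table 2.1 transcribed as: system number -> (screw, edge, plane normal)
TABLE_2_1 = {
    1: ((-1, 0, 1), (-1, 2, -1), (1, 1, 1)),
    2: ((-1, 0, 1), (-1, -2, -1), (-1, 1, -1)),
    3: ((0, 1, 1), (-2, -1, 1), (-1, 1, -1)),
    4: ((0, 1, 1), (2, -1, 1), (-1, -1, 1)),
    5: ((1, -1, 0), (-1, -1, 2), (1, 1, 1)),
    6: ((1, -1, 0), (-1, -1, -2), (-1, -1, 1)),
    7: ((-1, -1, 0), (1, -1, -2), (-1, 1, -1)),
    8: ((-1, -1, 0), (1, -1, 2), (1, -1, -1)),
    9: ((0, -1, 1), (2, 1, 1), (1, -1, -1)),
    10: ((0, -1, 1), (-2, 1, 1), (1, 1, 1)),
    11: ((1, 0, 1), (1, -2, -1), (-1, -1, 1)),
    12: ((1, 0, 1), (1, 2, -1), (1, -1, -1)),
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def same_direction(u, v):
    """True if u is a positive multiple of v."""
    return cross(u, v) == (0, 0, 0) and dot(u, v) > 0

def check_table(table):
    """Raise AssertionError if any system violates the stated conventions."""
    for screw, edge, normal in table.values():
        assert dot(screw, edge) == 0                        # edge is orthogonal to screw
        assert dot(screw, normal) == 0 and dot(edge, normal) == 0  # both lie in the plane
        assert same_direction(cross(edge, screw), normal)   # rule 1: Edge x Screw = n
    for k in range(1, 12, 2):                               # systems (1,2), (3,4), ...
        screw, edge1, _ = table[k]
        _, edge2, _ = table[k + 1]
        assert same_direction(cross(edge1, edge2), screw)   # rule 2: Edge1 x Edge2 = b
    return True

print(check_table(TABLE_2_1))
```

The check treats Edge1 as the edge vector of the odd-numbered system and Edge2 as that of the following even-numbered system, which is an interpretation of the table layout.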

Each segment is represented numerically by a set of integers: the three coordinates of its starting point, its length, and the two indices of its line vector and its glide (moving) vector. The coordinates are expressed in units of the simulation lattice parameter xl. The length is in units of the norm of the line vector. The connectivity of a line is built through pointers to segment indices.
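A minimal sketch of such an integer segment record, together with the subdivision rule for segments longer than ld, might look as follows. The class, the field names and the two-entry vector table are hypothetical and serve only to illustrate the data layout described above; they are not the data structures of the actual code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vector table: index -> integer direction (two entries of Table 2.1).
VECTORS = {
    0: (-1, 0, 1),    # screw direction
    1: (-1, 2, -1),   # edge direction
}

@dataclass
class Segment:
    origin: tuple                 # integer coordinates of the starting point (units of xl)
    length: int                   # length in units of the norm of the line vector
    line: int                     # index of the line vector
    glide: int                    # index of the glide (moving) vector
    next: Optional[int] = None    # index of the next segment along the line

    def end(self):
        """Integer coordinates of the end point of the segment."""
        lx, ly, lz = VECTORS[self.line]
        x, y, z = self.origin
        return (x + self.length * lx, y + self.length * ly, z + self.length * lz)

def subdivide(seg, l_d):
    """Split a segment into consecutive pieces no longer than l_d."""
    pieces = []
    start, remaining = seg.origin, seg.length
    while remaining > 0:
        n = min(l_d, remaining)
        piece = Segment(start, n, seg.line, seg.glide)
        pieces.append(piece)
        start = piece.end()       # the next piece starts where this one ends
        remaining -= n
    return pieces

edge = Segment(origin=(0, 0, 0), length=5, line=1, glide=0)
print([p.length for p in subdivide(edge, 2)])
```

Because all quantities are integers, the end point of the last piece coincides exactly with the end point of the original segment.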

2.1.3 Existence of a subnetwork 

There exist certain sets of slip planes between which mutual dislocation interactions cannot be treated properly. We shall call each such set a subnetwork. This is due to the fact that, in the edge-screw model, the unit line vector of an edge dislocation is of the <112> type, whose length is √6 xl (Table 2.1).

An edge dislocation [11¯2] on a (111) plane, for example, is shown in Fig. 2.2(b). There are two (¯1¯11) slip planes which intersect the [11¯2] edge dislocation in a unit cell of the simulation volume, as illustrated in Fig. 2.2(a). The lattice points along the intersection lines are shown with filled and hollow points for each plane in Fig. 2.2(b). One of the planes cuts the unit edge segments in the middle, which is not permitted⁵.

4 The Right Hand Final to Start (RHFS) rule is adopted, which can be seen in Fig. 2.10.

5 Note that this improper intersection happens between two planes with the same Burgers vector.

[Figure 2.2: (a) The unit cell (axes [100], [010], [001]; spacings xl and a); (b) Subnetwork (directions xl[¯110] and xl[11¯2]; unit edge segment)]

Figure 2.2: The unit cell of the simulation space and the existence of subnetworks: The lattice is homothetic to that of the FCC crystal, where xl is usually taken as ∼ 10b. There exist subnetworks due to the definition of the dislocation line vectors in Tab. 2.1.

This indicates that there exist subnetworks which cannot be used simultaneously. Attention should 

thus be given to the initial dislocation configurations of the simulations so that segments in two 

slip planes of the same Burgers vector share a common point on one of the planes. In practice each 

starting point of the dislocation segments is described in the elementary basis (Screw, Edge1, Edge2) 

so that the origin point (0,0,0) is the same for the two involved slip systems. The subnetwork also 

imposes certain restrictions while applying periodic boundary conditions (see Sec.2.4.1). 

2.1.4 Comments on other crystal structures and dislocation dynamics models 

Although not treated in this thesis, dislocations in other cubic crystal structures can be represented in a similar manner. For example, in the body-centered cubic (BCC) crystal structure, slip occurs in the close-packed <111> directions. The crystallographic slip planes are {110}, {112} and {123}. By analogy with the construction of Table 2.1, the <111>{110} or <111>{112} slip systems can each be defined by 4 screw and 12 edge line vectors. The <111>{123}⁶ slip system involves 4 screw and 24 edge line vectors. Dislocation dynamics models using the BCC crystal structure can be found in [Devincre & Roberts 96] and [Tang et al. 98].

6 The <111>{123} slip system is less closely packed, thus at low temperatures it would be sufficient to take only the <111>{110} and <111>{112} slip systems into account.

There exist several dislocation dynamics models. The differences come mainly from how dislocation lines are discretized. Zbib et al. ([Zbib et al. 98]), for example, have approximated dislocation curves by a series of mixed straight segments of arbitrary length and orientation. This scheme, which parameterizes a dislocation line by a set of nodes, is often called 'nodal dislocation dynamics'. Some nodal dislocation dynamics codes can even treat the splitting of dislocations into partials ([Shenoy et al. 00], [Weygand et al. 01]). The nodal model has advantages in numerical precision. It is, however, much more complex in dealing with the topological aspects of segments, because it involves more degrees of freedom in segment types than the edge-screw model. The nodal model is therefore preferred for investigating phenomena that involve a small number of dislocations and require high precision in the dislocation topology ([Schwarz 99], [Ghoniem et al. 00]).

Recently, there has been an attempt to increase the numerical accuracy by introducing one more segment type into the edge-screw model. It is called the 'pure-mixed' model. This model incorporates additional line directions, i.e. ±60° characters. It aims at an accurate description of a curved dislocation line with a minimum number of segments ([Devincre et al. 01], [Madec 01]).

In Fig. 2.3, the discretization methods for a curved dislocation line used in the edge-screw, pure-mixed and nodal models are compared side by side.

[Figure 2.3: (a) Edge-screw model; (b) Pure-mixed model; (c) Nodal model]

Figure 2.3: Discretization of a curved dislocation line in the edge-screw, pure-mixed and nodal models

2.2 Computation of stresses and displacements of dislocations 

2.2.1 Evaluation of the driving force 

The velocity of each segment is governed by the effective stress τe acting on the segment. The 

effective stress is given by τe = fg/b, where b is the magnitude of the Burgers vector and fg is the 

magnitude of the glide force per unit length. fg is computed at the center of each segment and

includes four contributions: 

(i) the force due to the internal stress field produced by all the other dislocation segments in the 

simulation volume except by two neighboring segments and the considered segment itself 

(ii) the force due to applied stress fields 

(iii) the force due to the line tension 

(iv) the force due to the Peierls stress 

Forces due to atomistic-level interactions, such as drag forces caused by solute atoms or jogs, are not treated explicitly. They can be included implicitly, however, by modifying the mobility rule which defines the relation between the glide velocity and the effective shear stress of a segment (see Sec. 2.3).

Internal stresses 

To compute the internal stresses at the center of a segment, the expression of the stress field of a 

single finite straight segment is required. This problem has been addressed by Li ([Li 64]). Li has 

found an interesting fact from the stress solution of an angular dislocation made of two semi-infinite 

dislocations joined together at one point. According to Li, the stress field of an angular dislocation 

is the sum of the stress fields of each dislocation arm, i.e., a semi-infinite dislocation. Although the 

stress field of a semi-infinite dislocation does not obey the equations of equilibrium, the sum of the 

stress fields of two semi-infinite dislocations satisfies the equilibrium. 

If a semi-infinite dislocation lies along the positive z axis and runs into the origin O, the stress field produced at a point r(x,y,z) has the following components ([Li 64]).

$$
\begin{aligned}
\sigma_{xx}(\mathbf{r}) &= -\frac{b_x y + b_y x}{r(r-z)} - \frac{x^2 (b_x y - b_y x)(2r-z)}{r^3 (r-z)^2} \\
\sigma_{yy}(\mathbf{r}) &= \frac{b_x y + b_y x}{r(r-z)} - \frac{y^2 (b_x y - b_y x)(2r-z)}{r^3 (r-z)^2} \\
\sigma_{zz}(\mathbf{r}) &= \frac{z (b_x y - b_y x)}{r^3} - \frac{2\nu (b_x y - b_y x)}{r(r-z)} \\
\sigma_{yz}(\mathbf{r}) &= \frac{y (b_x y - b_y x)}{r^3} - \frac{\nu b_x}{r} + \frac{(1-\nu)\, b_z x}{r(r-z)} \\
\sigma_{zx}(\mathbf{r}) &= \frac{x (b_x y - b_y x)}{r^3} + \frac{\nu b_y}{r} - \frac{(1-\nu)\, b_z y}{r(r-z)} \\
\sigma_{xy}(\mathbf{r}) &= \frac{b_x x - b_y y}{r(r-z)} - \frac{x y (b_x y - b_y x)(2r-z)}{r^3 (r-z)^2}
\end{aligned}
\qquad (2.2)
$$

[Figure 2.4: a dislocation segment (z2-z1) on the Z axis with field point r(x,y,z), shown as the difference of two semi-infinite dislocations ending at z1 and z2]

Figure 2.4: A configuration of a semi-infinite dislocation and a calculation of a stress field of a dislocation segment

In Eq. 2.2, the stresses are given in units of µ/4π(1 − ν), with µ and ν being the shear modulus and the Poisson ratio respectively, and r is the distance from the origin to the point r(x,y,z), as shown in Fig. 2.4. The stress field of a dislocation segment lying on the z axis and running from z2 to z1 is obtained from that of two semi-infinite dislocations, as shown in Fig. 2.4. The stress field is constructed by using Eq. 2.2 twice, replacing z in the equation by z − z1 and z − z2 respectively.

$$ \sigma_{ij}(\mathbf{r}) = \sigma_{ij}(\mathbf{r})\big|_{z-z_1} - \sigma_{ij}(\mathbf{r})\big|_{z-z_2} \qquad (2.3) $$
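As an illustration of how Eq. 2.2 and Eq. 2.3 combine, the sketch below evaluates the six stress components of a semi-infinite dislocation and subtracts two such fields to obtain the field of a finite segment on the z axis. The function names are hypothetical, the stresses are returned in the reduced unit µ/4π(1 − ν), and the expressions are singular on the dislocation line itself (where r − z vanishes), which the sketch does not handle.

```python
import math

def li_semi_infinite(b, x, y, z, nu):
    """Stress components of Eq. 2.2 in units of mu / (4*pi*(1-nu)).

    Returns (sxx, syy, szz, syz, szx, sxy).
    """
    bx, by, bz = b
    r = math.sqrt(x * x + y * y + z * z)
    rz = r * (r - z)                              # recurring factor r (r - z)
    m = bx * y - by * x                           # recurring factor (b_x y - b_y x)
    k = m * (2.0 * r - z) / (r ** 3 * (r - z) ** 2)
    sxx = -(bx * y + by * x) / rz - x * x * k
    syy = (bx * y + by * x) / rz - y * y * k
    szz = z * m / r ** 3 - 2.0 * nu * m / rz
    syz = y * m / r ** 3 - nu * bx / r + (1.0 - nu) * bz * x / rz
    szx = x * m / r ** 3 + nu * by / r - (1.0 - nu) * bz * y / rz
    sxy = (bx * x - by * y) / rz - x * y * k
    return (sxx, syy, szz, syz, szx, sxy)

def li_segment(b, point, z1, z2, nu):
    """Eq. 2.3: field of a segment on the z axis between z1 and z2."""
    x, y, z = point
    s1 = li_semi_infinite(b, x, y, z - z1, nu)
    s2 = li_semi_infinite(b, x, y, z - z2, nu)
    return tuple(p - q for p, q in zip(s1, s2))

# Long screw segment (b along z) probed at unit distance from the line.
nu = 0.3
stress = li_segment((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), -1.0e4, 1.0e4, nu)
print(stress)
```

For a very long screw segment the only non-vanishing component at (x, 0, 0) is σyz, whose reduced value tends to 2(1 − ν)b/x. Multiplied by µ/4π(1 − ν) this gives µb/(2πx), the familiar magnitude for an infinite screw dislocation, which serves as a sanity check of the transcription.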

The expressions of Li (Eq. 2.2) are derived for a semi-infinite dislocation line lying on the z axis. For an arbitrary segment, a rotation of the stress tensor would be necessary in order to bring the segment into the reference coordinate system.

The compact formulae of de Wit [dewit 67], on the other hand, are given with respect to an arbitrary Cartesian coordinate system. Thus the expressions of de Wit can be used without any rotation of the coordinate system. The final form, derived by Devincre in [Devincre 95], is shown in Eq. 2.4.

$$ \sigma_{ij}(\mathbf{r}) = \frac{\mu}{\pi Y^2}\left\{ [\mathbf{b}\mathbf{Y}\mathbf{t}]^{s}_{ij} - \frac{1}{1-\nu}[\mathbf{b}\mathbf{t}\mathbf{Y}]^{s}_{ij} - \frac{(\mathbf{b},\mathbf{Y},\mathbf{t})}{2(1-\nu)}\left[\delta_{ij} + t_i t_j + \frac{2}{Y^2}\left(\rho_i Y_j + \rho_j Y_i + \frac{L}{R}\, Y_i Y_j\right)\right]\right\} \qquad (2.4) $$

The vectors in Eq. 2.4 are shown in Fig. 2.5 for a dislocation line with line vector t and Burgers vector b. The vectors and the scalars are defined as R = r − r′, L = R · t, ρ = R − Lt and Y = R + Rt, where the scalar R is the norm of the vector R. δij is the Kronecker delta and (b, Y, t) is the mixed product. [abc]^s_ij is defined as ½((a × b)_i c_j + (a × b)_j c_i). The stress field of a dislocation segment between two points A and B is determined by inserting Eq. 2.4 in Eq. 2.3, with r′ replaced by r′A and r′B respectively.


[Figure 2.5 (labels: axes X, Y, Z and origin O; vectors r, r′, R, t, Y, ρ; length L; infinite dislocation)]

Figure 2.5: Definitions of the geometry of Eq. 2.4

The formulae of both Li and de Wit are derived within the framework of isotropic elasticity. A numerical method for stress fields in anisotropic elasticity has been developed recently by Rhee et al. in [Rhee et al. 01]. The difference between the isotropic and anisotropic solutions was found to be important only within about 15b of the distorted hexagon used for their calculations. The difference becomes smaller as the distance from the hexagon increases, so the isotropic solution is sufficient for long-range interactions.

The stress field of a prismatic loop represented by successive straight segments has been shown to agree satisfactorily with the corresponding exact analytical solution ([Khraishi et al. 00a]). The computation of the segment stress fields shows no anomaly even near the joint of two orthogonal segments. The contour of the resolved shear stress on the (¯11¯1) plane is shown in Fig. 2.6(b) and the corresponding dislocation segments in Fig. 2.6(a).

The computation of internal stresses is the most computationally demanding part of the DDD algorithm. A method to increase the efficiency of this computation will be discussed in Sec. 2.5.

Applied stresses 

External stresses are applied in two ways, depending on the boundary conditions involved. 

In the first case, the simulation volume represents a small element in a single crystal or a grain of 

a polycrystal. In this case, the external stress field is assumed to be homogeneous throughout the 

simulation volume. The same stress tensor is applied to each segment in the volume. The magnitude 

of this tensor is updated according to a certain rule, constant stress or strain rate ([Fivel 97]). 

In the second case, the simulation volume represents a finite volume with free surfaces, thus external stresses produce inhomogeneous stress fields in the volume. This inhomogeneity of the applied stresses can be incorporated using a code coupled with a finite element method ([Fivel et al. 98]). More general cases which include internal interfaces, e.g. second-phase particles or multilayer films, are treated in Sec. 2.4.2.

[Figure 2.6: (a) Dislocation segments configuration (dimensions 1 µm, 1 µm and 3 µm; n=[111], b=[110]); (b) Contour of the resolved shear stress]

Figure 2.6: A planar set of dislocation segments and the contour of the resolved shear stress on the glide plane: The stress is computed at the corner where two orthogonal segments meet (shaded area of 1 µm × 1 µm). The resolved shear stress shows no anomaly.

Line tension 

The mutual effect between two adjacent segments, which is not considered in the internal stress computation, is accounted for by a local line tension computation. The line tension T(θ) creates a stress τlt = T(θ)/(bR) directed toward the center of curvature of a dislocation arc with a radius of curvature R. T(θ) is obtained from the energy per unit length of a dislocation line, E(θ), with θ being the angle that the Burgers vector makes with the dislocation line direction.

$$ T(\theta) = E(\theta) + \frac{d^2 E(\theta)}{d\theta^2} \qquad (2.5) $$

The simplest form of the line tension is obtained by assuming that edge, screw and mixed segments have the same energy per unit length, i.e., E = αµb². From Eq. 2.5, the line tension stress on an arc of dislocation then becomes τlt = αµb/R.

The energy of a dislocation depends on its character, however: a screw dislocation has a lower energy than an edge one. This explains why the shape of a dislocation loop is approximately elliptical, with its major axis parallel to the Burgers vector. To include the variation of the energy with the segment character, the analytical equation of line tension suggested by Foreman [Foreman 67] (Eq. 2.6) is used.

$$ \tau_{lt} = \frac{\mu b}{4\pi(1-\nu)R}\left[(1 - 2\nu + 3\nu\cos^2\theta)\ln\left(\frac{L}{2b}\right) - \nu\cos(2\theta)\right] \qquad (2.6) $$

[Figure 2.7 (labels: b, θ, L, τlt, τ′lt, dislocation line vector)]

Figure 2.7: Definition of the geometry of the line tension calculation.

µ and ν stand for the shear modulus and the Poisson ratio respectively. R is the radius of the circle defined by the center points of three segments. L is the length of a segment and θ is the angle between the Burgers vector b and the dislocation line vector. The dislocation line vector is taken parallel to the vector joining the center points of the two neighboring segments, as illustrated in Fig. 2.7. τlt is, in fact, the magnitude of the line tension along the direction toward the center of the circle. The projection of τlt on the glide direction of a segment is finally taken as the line tension acting on the segment.
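The line tension evaluation described above can be sketched in two small functions: one returning the radius of the circle through three segment centers, and one evaluating Eq. 2.6. The function names and the use of Heron's formula for the circumradius are illustrative choices, not the implementation of the DDD code, and the final projection on the glide direction is left out.

```python
import math

def circumradius(p1, p2, p3):
    """Radius of the circle through three points; infinite if they are collinear."""
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    s = 0.5 * (a + b + c)
    area = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))   # Heron's formula
    if area == 0.0:
        return math.inf
    return a * b * c / (4.0 * area)

def foreman_line_tension(mu, nu, b, R, L, theta):
    """Magnitude of the line tension stress of Eq. 2.6."""
    prefactor = mu * b / (4.0 * math.pi * (1.0 - nu) * R)
    character = (1.0 - 2.0 * nu + 3.0 * nu * math.cos(theta) ** 2) * math.log(L / (2.0 * b))
    return prefactor * (character - nu * math.cos(2.0 * theta))

# Three centers on a unit circle, then screw (theta = 0) versus edge (theta = pi/2).
R = circumradius((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0))
tau_screw = foreman_line_tension(1.0, 0.3, 1.0, R, 100.0, 0.0)
tau_edge = foreman_line_tension(1.0, 0.3, 1.0, R, 100.0, math.pi / 2.0)
print(R, tau_screw, tau_edge)
```

For collinear centers the radius is infinite and the line tension stress vanishes, as expected for a straight line. With ν = 0 the character dependence disappears and the expression reduces to µb ln(L/2b)/(4πR).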

The Peierls force 

The Peierls stress refers to the applied resolved shear stress required to make a dislocation glide in an otherwise perfect crystal. It arises as a direct consequence of the periodic structure of the crystal lattice and acts as a friction on the dislocation motion. The DDD cannot treat atomistic effects explicitly, because the lattice parameter xl is of the order of 10b (Sec. 2.1.1); the Peierls stress is therefore simply implemented as a frictional stress τp, which contributes to the effective stress as a back stress opposing the motion of a segment. In practice, the frictional stress τp lumps together all the chemical effects, impurities, solutes, etc. identified in experiments ([Déprés et al. 04]). In the case of FCC metals, τp is of the order of 10⁻⁵ µ and is thus expected to have only a minute effect on the simulation results.


Effective stresses 

After the internal (σint) and the applied stresses (σapp) are computed, the force on a segment of a given slip system is defined by the Peach-Koehler equation and projected along the glide direction g, as shown in Eq. 2.7,

$$ \tau_g\, b = \left\{\left[(\boldsymbol{\sigma}_{int} + \boldsymbol{\sigma}_{app})\cdot\mathbf{b}\right]\times\mathbf{l}\right\}\cdot\mathbf{g} \qquad (2.7) $$

where l is the unit vector tangent to the dislocation line.

It should be noted that σint and σapp are computed at the center of a given segment, on the assumption that the stress field variations are small over the segment length. The effective stress τe is then computed by summing all the contributions as τe = τg + τlt − τp. The velocity of the dislocation segment is then given by Eq. 2.13.
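Eq. 2.7 and the summation of the stress contributions can be written out in a few lines. In this sketch the stress tensors are plain 3×3 nested lists, l and g are assumed to be unit vectors, and the function names are hypothetical. The sign of the result depends on the line-sense convention, so the example only illustrates the mechanics of the formula.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def mat_vec(m, v):
    return tuple(dot(row, v) for row in m)

def resolved_glide_stress(sigma_int, sigma_app, b, l, g):
    """Eq. 2.7: tau_g = {[(sigma_int + sigma_app) . b] x l} . g / |b|."""
    sigma = [[sigma_int[i][j] + sigma_app[i][j] for j in range(3)] for i in range(3)]
    force = cross(mat_vec(sigma, b), l)     # Peach-Koehler force per unit length
    return dot(force, g) / math.sqrt(dot(b, b))

def effective_stress(tau_g, tau_lt, tau_p):
    """Sum of the contributions as written in the text: tau_e = tau_g + tau_lt - tau_p."""
    return tau_g + tau_lt - tau_p

# Edge segment with b along x and line along -y, loaded by a pure shear sigma_xz = 10.
zero = [[0.0] * 3 for _ in range(3)]
shear = [[0.0, 0.0, 10.0], [0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
tau_g = resolved_glide_stress(zero, shear, (2.0, 0.0, 0.0), (0.0, -1.0, 0.0), (1.0, 0.0, 0.0))
print(tau_g, effective_stress(tau_g, 1.0, 0.5))
```

In this configuration the resolved glide stress equals the applied shear stress, which is the expected result for an edge dislocation whose slip plane and Burgers vector are aligned with the shear.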

2.2.2 Computation of displacements 

The computation of the displacement field of dislocations is very useful not only in analyzing surface 

deformation induced by dislocations, but also in imposing displacement boundary conditions in a 

coupling method with a finite element method (Sec. 2.4.2). 

The displacement solution for any closed curved dislocation can be found from the Burgers formula in the framework of isotropic elasticity. The Burgers equation is given in terms of line and area integrals, as shown in vector form in Eq. 2.8.

$$ \mathbf{u}(\mathbf{r}) = -\frac{\mathbf{b}}{4\pi}\,\Omega - \frac{1}{4\pi}\oint_C \frac{\mathbf{b}\times d\mathbf{l}'}{R} + \frac{1}{8\pi(1-\nu)}\nabla\oint_C \frac{(\mathbf{b}\times\mathbf{R})\cdot d\mathbf{l}'}{R} \qquad (2.8) $$

b is the Burgers vector and ν is the Poisson ratio. Ω is the solid angle through which the positive side of the loop is seen, and is defined as follows.

$$ \Omega = -\int_A \frac{\mathbf{R}\cdot d\mathbf{A}}{R^3} \qquad (2.9) $$

The parameters used in Eq. 2.8 and Eq. 2.9 are shown for the configuration of a closed loop in Fig. 2.8.

[Figure 2.8 (labels: Burgers vector b, contour C, area A, field point, solid angle Ω, R, dl′, slip plane normal n, dislocation loop, triangular loop, dislocation segments)]

Figure 2.8: The parameters in the Burgers equation (Eq. 2.8) and decomposition of a dislocation loop by triangular dislocation loops

An analytical solution of the displacement field can be obtained using Eq. 2.8 for simple dislocation loops⁷. The solutions for complex dislocation loops are generally difficult to obtain analytically. The general way of computing the displacement field of an arbitrary dislocation loop is to decompose the loop into triangular loops, as illustrated in Fig. 2.8. The methodology to construct a displacement field from triangular loops was first presented by Hirth and Lothe (see [Hirth & Lothe 92]). Special care, however, should be taken in evaluating the inverse trigonometric functions, as the author experienced. Barnett ([Barnett 85]) has developed a formula more suitable for numerical computation, which is detailed below.

7 Khraishi et al. ([Khraishi et al. 00b]) have found a closed-form analytical solution for a circular dislocation loop.

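The displacement at a field point P(r) generated by a triangular dislocation loop with vertices A(rA), B(rB) and C(rC) is expressed as Eq. 2.10. The triangular dislocation loop ABC and a field point are shown in Fig. 2.9.

$$ \mathbf{u}(\mathbf{r}) = -\frac{\mathbf{b}}{4\pi}\,\Omega + \mathbf{F}_{AB} + \mathbf{F}_{BC} + \mathbf{F}_{CA} \qquad (2.10) $$

[Figure 2.9 (labels: vertices A(rA), B(rB), C(rC); field point P(r); vectors RA, RB, RC; normal n; triangular loop)]

Figure 2.9: A geometric configuration of a triangular loop and the parameters for the computation of displacements using Eq. 2.10

Ω is the solid angle associated with the triangle ABC, which generates a discontinuity of ∆u = b on traversing the cut surface ABC. Fij (i,j = A, B or C) is a displacement field term that is continuous except on the dislocation line. The solid angle Ω and the continuous terms Fij are given as follows ([Barnett 85]).

$$ \Omega = -\,\mathrm{sign}(\mathbf{R}_i\cdot\mathbf{n})\; 4\arctan\left[\sqrt{\tan\frac{s}{2}\,\tan\frac{s-a}{2}\,\tan\frac{s-b}{2}\,\tan\frac{s-c}{2}}\,\right] \qquad (2.11) $$

$$ \mathbf{F}_{ij} = -\frac{1-2\nu}{8\pi(1-\nu)}\,(\mathbf{b}\times\mathbf{t}_{ij})\,\ln\left[\frac{R_j + \mathbf{R}_j\cdot\mathbf{t}_{ij}}{R_i + \mathbf{R}_i\cdot\mathbf{t}_{ij}}\right] + \frac{1}{8\pi(1-\nu)}\,(\mathbf{b}\cdot\mathbf{n}_{ij})\left[\left(\frac{\mathbf{R}_j}{R_j} - \frac{\mathbf{R}_i}{R_i}\right)\times\mathbf{n}_{ij}\right] \qquad (2.12) $$

The vectors and the constants in Eq. 2.11 and Eq. 2.12 are listed below.

$$
\begin{aligned}
s &= \frac{a+b+c}{2} \\
a &= \arccos\frac{(\mathbf{r}_B-\mathbf{r})\cdot(\mathbf{r}_C-\mathbf{r})}{|\mathbf{r}_B-\mathbf{r}|\,|\mathbf{r}_C-\mathbf{r}|} \\
b &= \arccos\frac{(\mathbf{r}_A-\mathbf{r})\cdot(\mathbf{r}_C-\mathbf{r})}{|\mathbf{r}_A-\mathbf{r}|\,|\mathbf{r}_C-\mathbf{r}|} \\
c &= \arccos\frac{(\mathbf{r}_A-\mathbf{r})\cdot(\mathbf{r}_B-\mathbf{r})}{|\mathbf{r}_A-\mathbf{r}|\,|\mathbf{r}_B-\mathbf{r}|} \\
\mathbf{R}_i &= \mathbf{r}_i - \mathbf{r} \\
\mathbf{t}_{ij} &= \frac{\mathbf{r}_j-\mathbf{r}_i}{|\mathbf{r}_j-\mathbf{r}_i|} \\
\mathbf{n}_{ij} &= \frac{\mathbf{R}_i\times\mathbf{R}_j}{|\mathbf{R}_i\times\mathbf{R}_j|}
\end{aligned}
$$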
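The solid angle of Eq. 2.11 is the only term that involves inverse trigonometric functions, so it is a convenient piece to sketch in isolation. The code below assumes that the bracket in Eq. 2.11 is the square root of the product of the four tangents (the classical half-angle formula for a spherical triangle) and that n is the normal of the triangle; the function and helper names are hypothetical. Field points lying in the plane of the triangle are not treated.

```python
import math

def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _angle(u, v):
    """Angle between two vectors, with the cosine clamped to [-1, 1]."""
    c = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.acos(max(-1.0, min(1.0, c)))

def solid_angle(r, rA, rB, rC, n):
    """Signed solid angle of triangle ABC seen from r, following Eq. 2.11."""
    RA, RB, RC = _sub(rA, r), _sub(rB, r), _sub(rC, r)
    a = _angle(RB, RC)          # arc lengths of the spherical triangle
    b = _angle(RA, RC)
    c = _angle(RA, RB)
    s = 0.5 * (a + b + c)
    prod = (math.tan(0.5 * s) * math.tan(0.5 * (s - a))
            * math.tan(0.5 * (s - b)) * math.tan(0.5 * (s - c)))
    magnitude = 4.0 * math.atan(math.sqrt(max(prod, 0.0)))
    d = _dot(RA, n)
    sign = (d > 0) - (d < 0)    # sign(R_i . n); zero for an in-plane field point
    return -sign * magnitude

# A triangle spanning one octant, seen from the origin, subtends pi/2.
omega = solid_angle((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1))
print(omega)
```

The octant example gives a magnitude of π/2, one eighth of the full sphere, and reversing the normal n flips the sign.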

The displacement at any field point due to a dislocation loop is obtained by summing the displacements of the triangular loops which make up the dislocation loop. As an example, the displacement field of an interstitial prismatic loop (Fig. 2.10(a)) computed by Eq. 2.11 and Eq. 2.12 is shown in Fig. 2.10(b). It can be seen that the interstitial prismatic loop induces a maximum displacement of 0.5b on the plane just above the loop.

The displacement computation for more general dislocation loops will be presented in Sec. 4.3.5, where the method is applied to the analysis of surface deformation during fatigue tests.


[Figure 2.10: (a) Schematic of the deformation around an interstitial prismatic loop (labels: computation plane, b, e1, e2, probing line); (b) Computed displacement field around an interstitial prismatic loop (axes: probing line in µm, surface displacement in b)]

Figure 2.10: Computations of displacements induced by an interstitial prismatic loop using Eq. 2.10


2.3 Motion of dislocations 

2.3.1 Preliminaries 

The stress field of a moving dislocation is, in fact, not equivalent to that of a static dislocation. 

Under most dynamic conditions of practical interest, however, dislocations move in such a way 

that the dynamic stresses and displacements can be approximated quite accurately by the static 

solutions, e.g., the stress equations presented in Sec. 2.2. 

Only dislocation glide on a slip plane is considered in the current DDD code. No climb mechanisms⁸ are implemented here. Theoretically, diffusion theories could be incorporated in the DDD code to treat climb events properly, because climb involves interactions between dislocations and point defects (vacancies or interstitial atoms). Numerically, it would be necessary to add a new line vector and a glide direction to Tab. 2.1, because climb involves the nucleation and motion of jogs.

Dislocation mobility depends on the applied shear stress and the temperature. It also varies with the crystal purity and the dislocation type⁹. There are a number of forms for the relation between the glide velocity and the effective shear stress, including power-law forms and expressions with an activation term in an exponential function to represent the temperature dependence ([Hirth & Lothe 92], [Kocks et al. 75]). A simple power-law form is adopted in this work for convenience, but any other form of equation can be readily adopted.

2.3.2 Dislocation mobility 

The simple power law relation v ∝ τ^m is used to compute the dislocation velocity. The linear form of the equation, m = 1, is known to describe well the glide over the Peierls barrier in FCC metals.

The velocity of a dislocation segment is given by

$$ v_i = \frac{\tau_e\, |\mathbf{b}|}{B} \qquad (2.13) $$

with the effective stress on the segment (τe), the Burgers vector (b) and the phonon drag coefficient (B)¹⁰. At room temperature, the coefficient B is found to be of the order of 10⁻⁴ Pa·s for aluminium ([Mason 68]) and 1.5 · 10⁻⁴ Pa·s for copper ([Fusenig & Nembach 75]). The coefficient B, in fact, changes with the velocity of a dislocation as B = B0/(1 − v²/c²). For simplicity, a constant value of B is used and the velocity of dislocations is limited to vmax, so that v²/c² remains relatively small.

8 A process by which an edge dislocation can move out of its slip plane by diffusion.

9 In BCC single crystals, for example, a pure screw dislocation is more difficult to move than a mixed one at low temperature, since a screw dislocation has a complex core structure ([Urabe & Weertman 75]).

10 Damping forces, which oppose dislocation motion, arise from the scattering of lattice vibrations (phonons) or electrons.

Using the velocity of a segment given by Eq. 2.13, the next position of the segment is obtained by explicit integration, such that

$$ x_i^{t+\Delta t} = x_i^{t} + v_i\,\Delta t $$

where x_i^t is the position of the segment at time t and ∆t is the time step. As is typical of a forward explicit algorithm, the use of too large a value of ∆t causes numerical instability. In the DDD method, a dislocation segment may oscillate, because a large time increment causes a segment to move over too large a distance. This brings about a significant change in the local curvature and, in turn, an increase of the back stress (the line tension), so that the segment oscillates. A constant value of ∆t in the range from 0.5 × 10⁻⁹ to 1 × 10⁻⁹ has proved successful in practice, but ∆t has to be adapted for each simulation. The maximum velocity vmax is imposed so as to prevent the segments from gliding over too large a distance.
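The mobility law of Eq. 2.13, the velocity cap and the explicit position update fit in a short sketch. The function names are hypothetical and the numbers in the example are arbitrary, chosen only so that the arithmetic is easy to follow; they are not simulation parameters.

```python
import math

def segment_velocity(tau_e, b, B, v_max):
    """Eq. 2.13 with the velocity magnitude limited to v_max."""
    v = tau_e * b / B
    return math.copysign(min(abs(v), v_max), v)

def advance(x, tau_e, b, B, dt, v_max):
    """Forward explicit update x(t + dt) = x(t) + v * dt."""
    return x + segment_velocity(tau_e, b, B, v_max) * dt

# Arbitrary illustrative numbers: the uncapped velocity would be 2 * 3 / 4 = 1.5.
print(segment_velocity(2.0, 3.0, 4.0, v_max=10.0))   # below the cap
print(segment_velocity(2.0, 3.0, 4.0, v_max=1.0))    # limited by the cap
print(advance(0.0, 2.0, 3.0, 4.0, dt=0.5, v_max=10.0))
```

The cap bounds the free flight distance per step by vmax∆t, which is what keeps a segment from overshooting and oscillating under the line tension.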

2.3.3 Dislocation-dislocation interactions 

A segment can interact with other segments during glide. The task is then to search for any possible intersection with segments inside the virtual glide area of the gliding segment, which is defined by the length of the segment, Li, and the free flight distance, vi∆t. The nearest intersection point among the possible interaction events is found from the simple geometry of two finite lines (segments). The type of interaction is then determined by the relation between the Burgers vectors and the slip systems of the two intersecting segments.

The types of possible dislocation-dislocation interactions considered in the DDD model are catego- 

rized as follows: 

a. coplanar cases in which two dislocation segments glide on the same plane 

b. non-coplanar cases in which two dislocation segments glide on different planes 

(a) Coplanar cases 

The intersecting portion of two segments with the same Burgers vector but opposite line directions (opposite sign) is deleted, and the links between the remaining segments are rebuilt as shown in Fig. 2.11(a). In the case of the same sign, no interaction takes place, since the segments repel each other elastically. Only the discretization of the segment is done for the next step, as illustrated in Fig. 2.11(b).


[Figure 2.11: (a) Annihilation (opposite sign); (b) Repulsion (same sign, discretization)]

Figure 2.11: Interaction between two segments in the same glide plane: Segments are annihilated if the signs are opposite, and discretized if the signs are the same.

No explicit handling is done in the case of two different Burgers vectors in the same plane, which corresponds to the $a_1^{copla}$ case explained below.

(b) Non-coplanar cases

Before introducing the interaction handling schemes for the non-coplanar cases, dislocation junctions are presented, because such interactions result in the formation of junctions. In the framework of hardening theory, five different forms of dislocation junctions are usually considered:

(i) $a_1^{coli}$, for which b1 = b2 on different slip planes

(ii) $a_1^{ortho}$, for which b1 ⊥ b2 on different slip planes

(iii) $a_1^{copla}$, for which b1 ≠ b2 on the same slip plane

(iv) $a_2$, for which b1 + b2 is glissile on either of the planes

(v) $a_3$, for which b1 + b2 is sessile on either of the planes

The junctions formed between slip systems are tabulated in Table 2.2 for the 12 slip systems defined 

in Table 2.1. 

(i) $a_1^{coli}$ is represented in the DDD by exchanging the neighboring arms of the two interacting segments¹¹. Fig. 2.12 shows the intersection of two dislocation segments, which glide on a slip plane

11 Its role in dislocation-hardening can be found in [Madec et al. 03].


    A2     A3     A6     B2     B4     B5     C1     C3     C5     D1     D4     D6
A2  a0     copla  copla  coli   a2     a2     ortho  a2     a3     ortho  a3     a2
A3         a0     copla  a2     ortho  a3     a2     coli   a2     a3     ortho  a2
A6                a0     a2     a3     ortho  a3     a2     ortho  a2     a2     coli
B2                       a0     copla  copla  ortho  a3     a2     ortho  a2     a3
B4                              a0     copla  a3     ortho  a2     a2     coli   a2
B5                                     a0     a2     a2     coli   a3     a2     ortho
C1                                            a0     copla  copla  coli   a2     a2
C3                                                   a0     copla  a2     ortho  a3
C5                                                          a0     a2     a3     ortho
D1                                                                 a0     copla  copla
D4                                                                        a0     copla
D6                                                                               a0

Table 2.2: Hardening coefficients (copla, coli and ortho denote $a_1^{copla}$, $a_1^{coli}$ and $a_1^{ortho}$; the lower triangle is symmetric)

and its deviate plane respectively. The segments exchange their neighbors upon intersection and form an angular dislocation with θ = 70.53°.

(ii) No explicit treatment is done on $a_1^{ortho}$ (known as Hirth lock).

(iii) No explicit treatment is done on $a_1^{copla}$, as explained in the coplanar cases above.

(iv) & (v) $a_2$ (known as the glissile junction) and $a_3$ (known as the Lomer-Cottrell lock) are implicitly accounted for through the simple energy argument of Eq. 2.14. Two segments with Burgers vectors b1 and b2 are considered to form a junction if the simple energy criterion

$$\mathbf{b}_1^2 + \mathbf{b}_2^2 > (\mathbf{b}_1 + \mathbf{b}_2)^2 \qquad (2.14)$$

is satisfied. In this criterion, the energy of a dislocation is assumed to be proportional to $|\mathbf{b}|^2$ and independent of the line character. Once a junction is formed, it is given a certain breaking strength $\tau_{junc}$. The junction can be broken afterward only if the effective stress on a component segment of the junction is larger than $\tau_{junc}$; the junction thus acts as a pinning point for the dislocation motion.

Figure 2.12: Changing of neighbor arms between two segments in primary and deviate planes.

Figure 2.13: Cross-slip of a screw segment, panels (a) to (c).

Detailed studies of dislocation junctions ([Shin et al. 01], [Rodney & Phillips 99]) show that they are formed by the elastic fields of the two dislocation lines and that the junction strength is governed by the "unzipping" mechanism. The properties of junctions can thus be treated through the elastic stress fields of the dislocations involved. In so-called mass simulations, junctions with a local breaking strength serve as pinning points.
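The energy test of Eq. 2.14 reduces to a few lines of code. The following Python sketch is only an illustration under stated assumptions: the function name `forms_junction` is hypothetical, the Burgers vectors are passed as plain tuples, and their signs are assumed to be already consistent with the line directions of the two segments.

```python
def forms_junction(b1, b2):
    """Energy test of Eq. 2.14: |b1|^2 + |b2|^2 > |b1 + b2|^2.

    b1, b2: Burgers vectors as 3-component sequences (hypothetical interface).
    The dislocation energy is taken proportional to |b|^2, with no
    dependence on the line character, as in the text.
    """
    def squared_norm(v):
        return sum(c * c for c in v)

    b_sum = [x + y for x, y in zip(b1, b2)]
    return squared_norm(b1) + squared_norm(b2) > squared_norm(b_sum)
```

With integer 110-type vectors, `forms_junction((0, -1, 1), (-1, 1, 0))` is true, since 2 + 2 > 2, whereas two perpendicular vectors such as (0, 1, 1) and (0, -1, 1) give 2 + 2 = 4 on both sides and fail the strict inequality.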

2.3.4 Cross-slip of screw dislocation segments 

The cross-slip of a screw segment (Fig. 2.13) is implemented in a stochastic manner to account for its thermally activated character. A cross-slip probability P over each time step is first computed using

$$P = \beta \, \frac{l}{L_0} \, \frac{\delta t}{t_0} \, \exp\left( \frac{(\tau_d - \tau_{III}) \, V}{kT} \right) \qquad (2.15)$$

where β is a normalization coefficient, l is the length of the particular screw segment, $L_0$ is 1 µm, $t_0$ is 1 s, δt is the time step, V is the activation volume, kT is the thermal energy, $\tau_d$ is the resolved shear stress in the cross-slip system and $\tau_{III}$ is a threshold stress. A random number r is generated and cross-slip occurs only if r is lower than P. As an example, the values used for V and $\tau_{III}$ in copper are V = 350eV and $\tau_{III}$ = 32 MPa.
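The stochastic rule above can be sketched as follows. This is a minimal Python illustration and not the code of the thesis: the function names are hypothetical, SI units are assumed throughout (stresses in Pa, activation volume in m³, lengths in m, times in s), and the random number is assumed to be drawn uniformly in [0, 1).

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K


def cross_slip_probability(beta, length, dt, tau_d, tau_iii, v_act,
                           temperature, l0=1.0e-6, t0=1.0):
    """Cross-slip probability over one time step, following Eq. 2.15."""
    boltzmann = math.exp((tau_d - tau_iii) * v_act / (K_B * temperature))
    return beta * (length / l0) * (dt / t0) * boltzmann


def cross_slip_occurs(probability, rng=random):
    """Draw r uniformly in [0, 1); cross-slip occurs only if r < P."""
    return rng.random() < probability
```

When the resolved stress equals the threshold stress, the exponential factor is 1 and P reduces to β (l/L0)(δt/t0), which gives a quick consistency check of an implementation.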


2.3.5 Plastic strain due to dislocation movement 

Dislocation motion results in plastic strain. The plastic strain of a simulation volume is determined by summing the areas swept in each slip system. The slip $\gamma^{(s)}$ of a slip system s is computed as

$$\gamma^{(s)} = \frac{|\mathbf{b}| \, A^{(s)}}{V} \qquad (2.16)$$

with b being the Burgers vector, V the volume of the simulation box and $A^{(s)}$ the area swept by all the mobile dislocations of the slip system s over a time step. $A^{(s)}$ is defined as

$$A^{(s)} = \sum_i L_i v_i \Delta t \qquad (2.17)$$

where the summation runs over all the segments of the system s and $L_i v_i \Delta t$ is the area swept by a segment i of length $L_i$. The components of the plastic strain tensor are given by

$$\varepsilon_{ij} = \sum_{s=1}^{12} \frac{1}{2} \left( n_i^{(s)} b_j^{(s)} + n_j^{(s)} b_i^{(s)} \right) \gamma^{(s)} \qquad (2.18)$$

with $n_i^{(s)}$ and $b_i^{(s)}$ being the components of the slip plane normal and of the Burgers vector of the slip system s, respectively.
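Eqs. 2.16 to 2.18 can be illustrated with a short Python sketch. The function names and the data layout are hypothetical. One assumption is made explicit here: since γ already contains |b|, the vectors n and b entering Eq. 2.18 are taken as unit vectors in this sketch.

```python
def slip_increment(b_mag, segments, dt, volume):
    """Eqs. 2.16-2.17: gamma = |b| * sum_i(L_i * v_i * dt) / V.

    segments: iterable of (length, velocity) pairs of one slip system.
    """
    swept_area = sum(length * velocity * dt for length, velocity in segments)
    return b_mag * swept_area / volume


def plastic_strain(systems):
    """Eq. 2.18: symmetric plastic strain tensor as a 3x3 nested list.

    systems: iterable of (n, b, gamma), with n the unit slip plane normal
    and b the unit slip direction (assumption of this sketch).
    """
    eps = [[0.0] * 3 for _ in range(3)]
    for n, b, gamma in systems:
        for i in range(3):
            for j in range(3):
                eps[i][j] += 0.5 * (n[i] * b[j] + n[j] * b[i]) * gamma
    return eps
```

For a single system with n = (0, 0, 1), b = (1, 0, 0) and γ = 0.02, the only non-zero components are ε13 = ε31 = 0.01, and the trace vanishes because n is perpendicular to b.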

2.4 Boundary conditions 

2.4.1 Periodic Boundary Conditions 

Motivation and review of the literature

A so-called mass simulation addressing work hardening or dislocation cell formation typically uses a simulation volume of $10^3$ to $15^3$ µm³. In order to compare the simulations with experiments, it is desirable to build a simulation volume representative of a small element taken out of a single crystal or of a grain of a polycrystal.

Periodic boundary conditions (PBC), which force a segment crossing a boundary between two cells to emerge at the equivalent position on the opposite boundary in all cells, are extensively used to avoid undesirable size effects due to the finite dimensions of a simulation volume. PBC are easily applied in the 2D case, for example by subtracting the simulation volume size (Lx, Ly) from the coordinates of dislocation segments leaving the initial volume. The simulations of Gómez-García et al. [GG et al. 00] have shown many advantages of PBC over free boundary surfaces, which inevitably suffer from artificial dislocation losses and undesirable size effects.


Figure 2.14: Property of p.b.c (indices 0, 1, ..., L-2, L-1 arranged on a ring)

PBC in 3D were long considered difficult because of the complexity of maintaining the connectivity of the dislocation line segments after they exit through a boundary. Since [Bulatov et al. 01] demonstrated that PBC can be applied to dislocation dynamics, attention has been given to the stress calculation, the initial configuration of dislocations and the balance between the incoming and outgoing dislocation fluxes. Madec et al. ([Madec et al. 04]) have reported that portions of dislocation loops may self-annihilate with replicas that emerge after a certain number of boundary crossings. This spurious self-interaction reduces the mean free path of dislocations; a short effective mean free path affects the density of mobile dislocations and their storage rate and, hence, both the microstructure arrangement and the strain hardening properties. The artifact of self-annihilation can be avoided by using an orthorhombic simulation volume.

A numerical method that Madec et al. ([Madec et al. 04]) have used to apply PBC in 3D is to translate all the segments so that a selected segment is shifted to the center of the volume, and then to apply "MODULO" operations so that any segment coordinate x larger than the simulation size Lx is replaced by the remainder of x/Lx.

Numerical implementation 

A perpendicular line joining two opposite boundaries is actually equivalent to a line on a bracelet (Fig. 2.14). Under PBC, a quantity ic(i) with an integer argument i in the range [0 : L − 1] has the property given by Eq. (2.19), which is merely another expression of Fig. 2.14 in a mathematical form.

$$ic(L + i) = ic(i), \qquad ic(-i) = ic(L - i) \quad \text{for } i = 0, \ldots \qquad (2.19)$$

The array ic can be used to redirect the coordinates of a segment that has left the initial volume lattice to the equivalent position in the initial lattice. PBC can then be applied with a simple array reference. An orthorhombic simulation volume is readily realized by setting the range of periodicity along each axis according to the length of the simulation volume along that axis.

Because of the subnetwork in the simulation volume (see Sec. 2.1.3), the periodicity should be a multiple of 4xl along each axis.
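The array-reference idea can be sketched in Python as follows. The names `build_ic`, `wrap` and `wrap_point` are hypothetical, and the sketch assumes that a coordinate never leaves the initial volume by more than one period in a single step, so that a table covering the indices from −L to 2L − 1 is sufficient.

```python
def build_ic(period):
    """Lookup table satisfying Eq. 2.19 for indices in [-period, 2*period - 1].

    Entry k of the returned list holds ic(k - period).
    """
    return [i % period for i in range(-period, 2 * period)]


def wrap(i, ic, period):
    """Redirect an integer coordinate to its equivalent position in [0, period - 1]."""
    if not -period <= i < 2 * period:
        raise ValueError("coordinate left the range covered by the table")
    return ic[i + period]


def wrap_point(point, tables, periods):
    """Apply the lookup along each axis; the periods may differ (orthorhombic volume)."""
    return tuple(wrap(c, t, p) for c, t, p in zip(point, tables, periods))
```

For periods (8, 12, 16), the point (9, −1, 16) is redirected to (1, 11, 0). Using one table per axis is what makes an orthorhombic volume as easy to handle as a cubic one.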

2.4.2 Internal interfaces 

Motivations and review of the literature 

The collective behavior of dislocations in a single crystal can be simulated with the stress computation and motion treatments explained in the previous sections. More rigorous boundary conditions need to be implemented in order to treat more general cases, such as a crystal with free surfaces, a crystal containing particles of a second phase or a polycrystal with grain boundaries.

A dislocation experiences forces near an interface because the dislocation energy is different in the two media involved. The dislocation is attracted towards a free surface, for example, and repelled by a rigid surface layer. These image stresses can be treated by using the superposition principle.

The effects of free surfaces were treated by Fivel et al. ([Fivel & Canova 99]). The forces exerted on a free surface by dislocations are computed assuming that the dislocations are embedded in an infinite medium. These forces are then reversed and converted into the appropriate point forces to enforce the traction-free surface condition. Applications of this method can be found in [Fivel et al. 98].

The image stress due to a free surface is, in fact, a special case of the more general situation in which an interface separates two materials of differing elastic constants, e.g., oxide layers and particles. The image stresses on a dislocation in the presence of a second-phase particle can also be computed by the superposition principle. The formulation follows that of Van der Giessen and Needleman ([Giessen & Needleman 95]). Previous applications of this method to 2D cases can be found in Cleveringa et al. ([Cleveringa et al. 97]).


Figure 2.15: Geometries of a facet and construction of a sphere by square facets. (Labels: facet with strength $\tau_{facet}$, glide plane, intersection, segment with $\tau_e$.)

In this section, a complete method to treat internal interfaces is described. Firstly, an interface is represented by a set of facets, which have a certain strength and can thus act as barriers to dislocation motion. The application of this method can be found in Sec. 4.3. Secondly, a full account of the elastic interactions with dislocations, i.e. the image stresses, is presented using a coupling method with a finite element method. The method is applied to compute the image stresses in Sec. 4.1.

Internal interfaces represented by facets 

A curved 3D boundary is approximated by a series of facets, in the same way as a surface is represented by a finite element mesh. Each facet is defined by the indices of its nodes or vertices, whose coordinates are stored separately.

Intersection events between a segment and a facet are detected by determining the nearest intersection point between a facet and the virtual glide plane of a segment (Fig. 2.15).

Each facet is given a strength $\tau_{facet}$. Only segments whose effective stress is greater than $\tau_{facet}$ are allowed to cross the facet, i.e. a facet acts as a barrier to dislocation motion. The strength is further specified by $\tau^+_{facet}$ and $\tau^-_{facet}$, depending on the relation between the moving direction of a segment (g) and the normal direction of the facet (n), which makes the application of facets more flexible. A facet with $\tau^+_{facet}$ = 100 MPa and $\tau^-_{facet}$ = 0, for example, blocks all the segments with g · n > 0 whose effective stress is lower than 100 MPa, but is transparent to segments moving along the −n direction, that is, with g · n < 0.
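The direction-dependent facet strength can be sketched in a few lines of Python. The function names are hypothetical, and the treatment of the limiting case g · n = 0, which the text does not specify, is an assumption of this sketch (it is assigned the strength of the −n side).

```python
def facet_strength(g, n, tau_plus, tau_minus):
    """Select the facet strength from the sign of g . n.

    g: moving direction of the segment, n: facet normal.
    tau_plus applies when g . n > 0, tau_minus otherwise (assumption for g . n = 0).
    """
    dot = sum(gi * ni for gi, ni in zip(g, n))
    return tau_plus if dot > 0 else tau_minus


def can_cross(tau_eff, g, n, tau_plus, tau_minus):
    """A segment crosses the facet only if its effective stress exceeds the facet strength."""
    return tau_eff > facet_strength(g, n, tau_plus, tau_minus)
```

With the example of the text (100 MPa on the +n side, 0 on the −n side), a segment under 80 MPa moving along +n is blocked, a segment under 120 MPa crosses, and a segment under 80 MPa moving along −n passes freely.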

The facet method introduces simple geometrical barriers to dislocation motion in dislocation dynamics. When applied to dislocation-precipitate interactions, this method can be considered as a first-order approximation, in the sense that the elastic interactions between dislocations and interfaces are not considered. The application of the facet model to a particle-strengthened crystal is presented in Sec. 4.3, where the hypotheses used to derive the strength $\tau_{facet}$ are explained in detail for the unshearable and shearable particle cases.

Full account of elastic interactions 

The problem of a finite volume containing dislocations is decomposed into two problems: 

1. the problem of dislocations in an infinite elastic isotropic medium 

2. the complementary problem of a finite dislocation-free volume, which compensates for the 

proper boundary conditions. 

The problem decomposition is shown schematically in Fig. 2.16. The stress and strain fields of the current state of the body, σ and ɛ, are determined by the governing equations

$$\nabla \cdot \sigma = 0, \qquad \varepsilon = \nabla u$$

$$\sigma = L_M : \varepsilon \ \text{ in } V_M \quad \text{and} \quad \sigma = L_P : \varepsilon \ \text{ in } V_P$$

with $L_M$ and $L_P$ the moduli of the matrix and of the particle, respectively. The boundary conditions are

$$u = U_{ap} \ \text{ on } S_u, \qquad n \cdot \sigma = F_{ap} \ \text{ on } S_t.$$

After the decomposition, the stress and strain fields are written as the superposition of two fields:

$$\varepsilon = \varepsilon_1 + \varepsilon_2, \qquad \sigma = \sigma_1 + \sigma_2 \qquad (2.20)$$

In the first problem (denoted by 1), it is assumed that the dislocations are in an infinite elastic isotropic medium. The relationship between the stress ($\sigma_1$) and the strain ($\varepsilon_1$) is expressed as $\sigma_1 = L_M : \varepsilon_1$ in the whole volume. The forces $F_D$ and displacements $U_D$ on the virtual boundaries can be computed by the expressions presented in Sec. 2.2.

In the second problem (denoted by 2), the simulation volume contains no dislocations. Its fields are those needed to correct for the actual boundary conditions as well as for the presence of the inclusions. The governing equations for the complementary problem become:

$$\sigma_2 = L_M : \varepsilon_2 \ \text{ in } V_M$$

$$\sigma_2 = L_P : \varepsilon_2 + \sigma_{correc} \ \text{ in } V_P \qquad (2.21)$$


The complementary problem has a correction term $\sigma_{correc} = (L_P - L_M) : \varepsilon_1$ in $V_P$. With this correction term, the current stress field in the particle volume can be constructed by the superposition of the two fields ($\sigma = \sigma_1 + \sigma_2$ in $V_P$). By replacing $\varepsilon_1$ with $L_M^{-1} : \sigma_1$, the correction term becomes $(L_P : L_M^{-1} - I) : \sigma_1$, where I represents the fourth-order unit tensor. $(L_P : L_M^{-1} - I)$ is expressed in terms of the Young's moduli and the Poisson's ratios of the matrix (E, ν) and of the particle (E*, ν*) as follows.

$$(L_P : L_M^{-1} - I) = \begin{pmatrix}
a_1 - 1 & a_2 & a_2 & 0 & 0 & 0 \\
a_2 & a_1 - 1 & a_2 & 0 & 0 & 0 \\
a_2 & a_2 & a_1 - 1 & 0 & 0 & 0 \\
0 & 0 & 0 & a_3 - 1 & 0 & 0 \\
0 & 0 & 0 & 0 & a_3 - 1 & 0 \\
0 & 0 & 0 & 0 & 0 & a_3 - 1
\end{pmatrix}$$

with

$$a_1 = \frac{E^* (1 - \nu^* - 2 \nu \nu^*)}{E (1 + \nu^*)(1 - 2\nu^*)}, \qquad a_2 = \frac{E^* (\nu^* - \nu)}{E (1 + \nu^*)(1 - 2\nu^*)}, \qquad a_3 = \frac{E^* (1 + \nu)}{E (1 + \nu^*)}.$$
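The coefficients a1, a2 and a3 and the resulting correction stress can be evaluated with the following Python sketch. The function names are hypothetical, and the stress is assumed to be stored as six components with the three normal components first, followed by the three shear components.

```python
def mismatch_coefficients(e_m, nu_m, e_p, nu_p):
    """Coefficients a1, a2, a3 of (L_P : L_M^-1 - I).

    e_m, nu_m: Young's modulus and Poisson's ratio of the matrix (E, nu).
    e_p, nu_p: the same quantities for the particle (E*, nu*).
    """
    ratio = e_p / e_m
    denom = (1.0 + nu_p) * (1.0 - 2.0 * nu_p)
    a1 = ratio * (1.0 - nu_p - 2.0 * nu_m * nu_p) / denom
    a2 = ratio * (nu_p - nu_m) / denom
    a3 = ratio * (1.0 + nu_m) / (1.0 + nu_p)
    return a1, a2, a3


def correction_stress(sigma1, a1, a2, a3):
    """Apply (L_P : L_M^-1 - I) to sigma1 = (s11, s22, s33, shear, shear, shear)."""
    normal = [(a1 - 1.0) * sigma1[i]
              + a2 * (sigma1[(i + 1) % 3] + sigma1[(i + 2) % 3])
              for i in range(3)]
    shear = [(a3 - 1.0) * sigma1[k] for k in range(3, 6)]
    return normal + shear
```

Two limiting cases give a simple check. For identical elastic constants, a1 = a3 = 1 and a2 = 0, so that the correction vanishes. For equal Poisson's ratios and E* = 2E, a1 = a3 = 2 and a2 = 0, so that the correction stress equals σ1.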

The complementary problem is solved using CAST∃M¹², a finite element code developed in France by the Commissariat à l'Energie Atomique. With a strain-displacement matrix B, the FEM formulation in the particle volume of the second problem can be written as

$$\left[ \int_{V_P} B^T \cdot L \cdot B \, dV \right] \cdot u + \int_{V_P} B^T \cdot \left( L_P : L_M^{-1} - I \right) \cdot \sigma_1 \, dV = 0 \qquad (2.22)$$

In order to compute the body-force-like term (the second term of Eq. 2.22) within the precipitate volume, the stresses $\sigma_1$ due to the dislocations are first computed at points in $V_P$, e.g. at the stress integration Gauss points. The correction term is then computed within each finite element inside the particle. The computed stresses $\sigma_{correc}$ are then converted into a nodal body force field $f^b$, which is easily done with the 'BSIG' operator in CAST∃M. These forces are applied to $V_P$, and the FEM gives the solution of a two-phase boundary value problem in which the forces $F_{ap} - F_D$ and displacements $U_{ap} - U_D$ are imposed at the boundary and the body forces $f^b$ are applied inside the particle. The continuity of the displacements and normal stresses at the particle/matrix interface is enforced by the FEM.

12 Finite element code developed by Commissariat à l’Energie Atomique, CEA-DRN/DMT/SEMT


Figure 2.16: Decomposition of the problem into the problem of dislocations in an infinite medium and the complementary problem of an inhomogeneous finite volume without dislocations. Forces ($F_{ap}$) and displacements ($U_{ap}$) are applied on the boundary. In the complementary problem, the boundary conditions are modified with the forces ($F_D$), displacements ($U_D$) and nodal body force field ($f^b$) generated by the dislocations.

2.5 Acceleration of the DDD code 

2.5.1 Problem description and review of the literature

The internal stress computation is the most computationally intensive part of the DDD method. This is due to the fact that the stress field at a distance r from a dislocation line is proportional to 1/r; the stress field of a dislocation line is thus long-ranged. Another time-consuming part of the DDD method is the handling of the interactions of dislocation segments. Segment motion involves the examination of possible interactions, either between dislocations or between a dislocation and internal interfaces.

From a programming perspective, the two parts can be represented as follows in pseudo-code.

Nsegm: Number of segments
Nfacets: Number of facets

Internal stress computation

    DO I=1,Nsegm
      DO J=1,Nsegm
        if (J ≠ I and J is not I's neighbor) Compute σ int (I←J)
      ENDDO
    ENDDO

Segment motion

    DO I=1,Nsegm
      DO J=1,Nsegm
        Examine interaction with segment J
      ENDDO
      DO K=1,Nfacets
        Examine interaction with facet K
      ENDDO
      Move segment I
    ENDDO


Both parts require of the order of $N_{segm}^2$ computations, with $N_{segm}$ being the number of dislocation segments. The segment motion requires $N_{segm} \times N_{facets}$ additional computations. It should be noted that in 'Segment motion' each segment is treated and moved sequentially, and that each individual segment displacement generates a new dislocation configuration. In complex situations, changing the order in which the segments are processed may therefore slightly change the resulting dislocation structure.

In Molecular Dynamics simulations, the stress computation relies on a cut-off distance, beyond which the stress is regarded as a long-distance stress and neglected, because the interatomic stress field is short-ranged. This cut-off scheme reduces the cost of the stress computation at the price of a minor error. In Dislocation Dynamics simulations, however, a cut-off distance may cause a spurious formation of cells ([Gullouglu et al. 89]). The study of Devincre et al. ([Devincre et al. 01]) has nevertheless shown that neglecting the long-distance stresses does not much affect the yield stress and hardening properties of FCC single crystals. It should be noted that this study dealt with dislocation patterning in multislip conditions, where the cross-slip of dislocations, which is governed by short-distance stresses, is supposed to play an important role. It would therefore be difficult to generalize the observation of Devincre et al. to other situations, and it is generally required to take into account all the dislocations in the simulation volume when computing the internal stresses.

Reasonable approximations in the computation of the internal stresses have been made to overcome this severe computational limitation. In [Verdier et al. 98], the simulation volume is decomposed into boxes, and short- and long-distance stresses are classified by the topology of the boxes. The computational cost is reduced by updating the long-distance stresses less frequently. The concept of superdislocations has been adopted for the stress computation by [Zbib et al. 98]. The idea is to replace a large number of dislocation segments beyond a certain distance by a limited number of superdislocations, which have a modified Burgers vector magnitude. This method is based on the multipolar expansion of the elastic field of a 2D dislocation array and is extended to 3D by a simple 'projection-extension' method.

2.5.2 The Box method 

The box method proposed by [Verdier et al. 98] is based on the fact that a dislocation microstructure does not change rapidly compared with the short time step used in the simulation (of the order of $10^{-9}$ s). The stress fields of the long-distance segments can thus be updated only at a certain frequency, the previous values being used between updates with an acceptable error.


Figure 2.17: Decomposition of a simulation volume into boxes: (a) a typical dislocation structure in a cubical simulation volume; (b) the simulation volume divided into 10 × 10 × 10 boxes.

The simulation volume is first decomposed into boxes. To keep the computation scheme simple, each side of the simulation volume is divided into M boxes, so that the simulation volume comprises $M^3$ identical boxes. Fig. 2.17(a) shows a typical simulation volume with dislocation segments. Fig. 2.17(b) is an example of the same simulation volume decomposed into $10^3$ boxes.

To facilitate the identification of the segments in a box ib, linked lists of segments are constructed. As shown in Fig. 2.18, the mid-point of a segment (imid(i)) is used to determine the index ib of the box to which it belongs:

$$ib = 1 + \left\lfloor \frac{imid(1)}{N_1 / M} \right\rfloor + \left\lfloor \frac{imid(2)}{N_2 / M} \right\rfloor M + \left\lfloor \frac{imid(3)}{N_3 / M} \right\rfloor M^2 \qquad (2.23)$$

where N1, N2 and N3 are the sizes of the orthorhombic simulation volume along the x, y and z axes, and each quotient is truncated to an integer. The array indexb(ib) saves the number of segments belonging to the box ib. An array isbox(ib, 2) saves the index of the first segment in the box ib. The linked list of segments is implemented with the array isbox(is, 1 : 2): isbox(is, 1) indicates the index of the segment prior to is and isbox(is, 2) holds the index of the segment posterior to is. The identification of the segments in a box ib is shown schematically in Fig. 2.18. A segment can easily be added or removed by updating the array isbox.
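The box index and the linked lists can be sketched as follows in Python. This is a minimal illustration and not the data layout of the thesis: the class `BoxLists` and its attribute names are hypothetical, indices are 1-based with 0 meaning "none" in order to stay close to the Fortran-style arrays of the text, and the mid-point coordinates are assumed to be integers in the range [0, N − 1] along each axis.

```python
def box_index(imid, sizes, m):
    """Box index of Eq. 2.23 (1-based) from the integer mid-point of a segment."""
    ix, iy, iz = (imid[k] * m // sizes[k] for k in range(3))
    return 1 + ix + iy * m + iz * m * m


class BoxLists:
    """Doubly linked lists of segments, one list per box."""

    def __init__(self, n_boxes, n_segments):
        self.head = [0] * (n_boxes + 1)      # first segment of each box
        self.count = [0] * (n_boxes + 1)     # number of segments in each box
        self.prev = [0] * (n_segments + 1)   # segment prior to a segment
        self.next = [0] * (n_segments + 1)   # segment posterior to a segment
        self.box_of = [0] * (n_segments + 1)

    def add(self, seg, ib):
        """Insert a segment at the front of the list of box ib."""
        old = self.head[ib]
        self.prev[seg], self.next[seg] = 0, old
        if old:
            self.prev[old] = seg
        self.head[ib] = seg
        self.count[ib] += 1
        self.box_of[seg] = ib

    def remove(self, seg):
        """Unlink a segment from its box by reconnecting its two neighbors."""
        ib, p, n = self.box_of[seg], self.prev[seg], self.next[seg]
        if p:
            self.next[p] = n
        else:
            self.head[ib] = n
        if n:
            self.prev[n] = p
        self.count[ib] -= 1
        self.prev[seg] = self.next[seg] = self.box_of[seg] = 0

    def segments(self, ib):
        """Return the segments of box ib by walking the list."""
        out, seg = [], self.head[ib]
        while seg:
            out.append(seg)
            seg = self.next[seg]
        return out
```

When a segment moves from one box to another, a `remove` followed by an `add` costs a fixed number of array updates, independent of the number of segments in either box.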

Note that the number of boxes is limited, since the boxes must be large enough. The minimum box size is chosen so that the first neighboring boxes contain the maximum free-flight distance of a segment. The criterion defining the minimum edge length a of a box can be expressed as Eq. (2.24).

Figure 2.18: Linked list of segments

$$\min a: \qquad a \geq \frac{1}{\sqrt{2}} \, l_d, \qquad a \geq \frac{2}{\sqrt{6}} \, v_{max} \, \delta t \qquad (2.24)$$

The first term in Eq. (2.24) states that the neighbors of a segment are located inside the first neighboring boxes, hence it is set by the discretization length $l_d$. The second term states that a segment is not allowed to move beyond the first neighboring boxes in one step, thus it is a function of the maximum velocity $v_{max}$ (see Sec. 2.3.2) and of the time step ∆t. Fig. 2.19 illustrates the criterion for the minimum edge length of a box.

Using boxes that are not smaller than this minimum size reduces the computing cost of the 'Segment motion' part. This is because only the interactions within the maximum free-flight distance of a segment need to be considered, instead of all the segments and facets. The number of segments and facets to be inspected is thus reduced without any approximation.

The internal stress acting on a segment is divided into a long-distance stress ($\sigma^{LR}$), which varies rather slowly over the time steps, and a short-distance stress ($\sigma^{SR}$), which shows large fluctuations over single time steps. $\sigma^{SR}$ of a segment in the box ib is computed at every simulation step by taking into account all the segments within the L-th neighboring boxes. $\sigma^{LR}$ is approximated by the stress at the center point of ib due to all the segments outside the L-th neighboring boxes. All the segments in the same box therefore have the same $\sigma^{LR}$. This approximation is valid if $\sigma^{LR}$ has a wavelength larger than the box size. The computation of $\sigma^{LR}$ for all the boxes is updated every f steps.
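The set of boxes contributing to $\sigma^{SR}$ can be enumerated as in the following Python sketch. The function names are hypothetical, box coordinates are 0-based triplets (ix, iy, iz) rather than the single index ib, and periodic boundary conditions are assumed, so that the neighborhood wraps around the simulation volume.

```python
def short_range_boxes(ix, iy, iz, m, layers):
    """Boxes within `layers` layers of box (ix, iy, iz), with periodic wrapping.

    m: number of boxes along each side, layers: the parameter L of the text.
    """
    boxes = set()
    span = range(-layers, layers + 1)
    for dz in span:
        for dy in span:
            for dx in span:
                boxes.add(((ix + dx) % m, (iy + dy) % m, (iz + dz) % m))
    return boxes


def is_short_range(box_a, box_b, m, layers):
    """True if box_b contributes to the short-distance stress of segments in box_a."""
    return box_b in short_range_boxes(*box_a, m, layers)
```

As long as M is at least 2L + 1, the set contains (2L + 1)³ boxes, i.e. 27 boxes for L = 1 and 343 boxes for L = 3; all the other boxes contribute to $\sigma^{LR}$.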

The parameters involved in the box method are listed in Table 2.3. The maximum number of boxes M is given by the minimum box size (Eq. (2.24)) and the simulation volume size. The other parameters should be chosen on the basis of the numerical accuracy and the speedup, which is the subject of the following section.


Figure 2.19: Minimum box size

Parameter   Description
M           number of boxes along each side of a simulation volume
f           frequency of σ LR update
L           number of layers for σ SR

Table 2.3: Parameters of the box method

The pseudo-code of the internal stress computation in Sec. 2.5.1 is then replaced by the following pseudo-code.

Internal stress computation

    DO I=1,Nsegm
      Identify the box 'ib' of the segment 'I'
      Compute the stresses σ SR by segments within short-distance boxes
      Add σ LR (ib)
    ENDDO

Long-distance stress computation (every f steps)

    DO iz=1,M
      DO iy=1,M
        DO ix=1,M
          compute the box index 'ib'
          compute the long-distance stresses σ LR (ib) at the center of 'ib'
        ENDDO
      ENDDO
    ENDDO


The pseudo-code of the segment motion in Sec. 2.5.1 is replaced by

Segment motion

    DO I=1,Nsegm
      Identify the box 'ib' of the segment 'I'
      Examine interaction with segments and facets within short-distance boxes
      Move the segment 'I'
    ENDDO

2.5.3 Speedup and Error 

Optimum values of M, L and f in Table 2.3 should be chosen so as to minimize errors and maximize 

speedup. There exist two sources of errors in the internal stress computation, i.e. a spatial and a 

temporal error. The spatial error occurs because σ LR is computed at the center point of a box and 

assigned to all the segments in that box. The temporal error is induced by updating σ LR with a 

frequency f so that σ LR of the previous computation is used during f steps. 

Speedup 

The speedup is defined as the ratio of the execution time of the original method to that of the box method. It is used to measure the relative algorithm performance. To obtain an analytical relation, the execution time is assumed to be proportional to the number of computations.

The $N_{segm}$ segments are assumed to be homogeneously distributed over the simulation volume decomposed into $M^3$ boxes. Without the box method, the number of internal stress computations ($n_s^{orig}$) is $N_{segm}^2$ ¹³. With the box method, the number of computations is $(2L+1)^3 \frac{N_{segm}}{M^3} N_{segm}$ for $\sigma^{SR}$ and $\left( M^3 - (2L+1)^3 \right) \frac{N_{segm}}{M^3} \frac{M^3}{f}$ for $\sigma^{LR}$. The speedup of the box method is then given by Eq. 2.25 ¹⁴:

$$\text{Speedup} = \frac{n_s^{orig}}{n_s^{box}} = \frac{N_{segm}^2}{\dfrac{(2L+1)^3 N_{segm}^2}{M^3} + \dfrac{\left( M^3 - (2L+1)^3 \right) N_{segm}}{f}} \qquad (2.25)$$

Solid lines in Fig. 2.20 show Eq. 2.25 as a function of M for $N_{segm}$ = 10,000, 20,000 and 90,000. The number of layers L is set to 1 and the $\sigma^{LR}$ update frequency f is 20. There exist maxima in

13 The number of computations is precisely $N_{segm}(N_{segm} - 3)$, because the segment itself and its two neighbors are not considered in the internal stress computation. For simplicity, it is approximated by $N_{segm}^2$.

14 It should be noted that the equation is derived under the assumption that periodic boundary conditions are applied, as detailed in Sec. 2.5.4.


Figure 2.20: Evolution of the speedup of the domain decomposition method as a function of the number of boxes (M) and the number of segments (N), for L = 1 and f = 20. (Axes: speedup versus M; curves for Nsegm = 10k, 20k and 90k; measured points for Nsegm = 10k and 20k.)

speedup depending on the number of segments. Solid dots represent the actual data measured with a 3.0-GHz Intel Pentium 4 processor and 1 GB of memory. Only the elapsed time for computing the internal stress is measured. The measured data reflect well the behavior of Eq. 2.25, even though the segments are not distributed perfectly homogeneously.

The effect of f on the speedup is shown in Fig. 2.21(a), and that of increasing L is shown in Fig. 

2.21(b). The optimum value of M is dependent on the value of f and L. 
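The location of the maximum can be explored numerically with a short Python sketch of Eq. 2.25. The function names and the scanned range of M are choices made for this illustration only; the sketch relies on the same assumption as the equation, namely a homogeneous distribution of the segments.

```python
def stress_speedup(n_segm, m, layers=1, f=20):
    """Speedup of the internal stress computation according to Eq. 2.25."""
    near = (2 * layers + 1) ** 3
    n_box = near * n_segm ** 2 / m ** 3 + (m ** 3 - near) * n_segm / f
    return n_segm ** 2 / n_box


def best_m(n_segm, layers=1, f=20, m_values=range(3, 31)):
    """Number of boxes per side that maximizes Eq. 2.25 over the scanned range."""
    return max(m_values, key=lambda m: stress_speedup(n_segm, m, layers, f))
```

For L = 1 and f = 20, this sketch places the maximum at M = 13, 15 and 19 for 10,000, 20,000 and 90,000 segments respectively, with a speedup of about 61 for 20,000 segments. Treating $M^3$ as a continuous variable, the denominator of Eq. 2.25 is minimal for $M^3 = \sqrt{(2L+1)^3 N_{segm} f}$, which shows directly how the optimum shifts with f and L.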

There is always a gain in speedup for the segment motion when the number of boxes is increased. Assuming $N_{segm}$ segments and $N_{facets}$ facets, the speedup in examining the interactions can be expressed as Eq. 2.26.

$$\text{Speedup} = \frac{n_o^{orig}}{n_o^{box}} = \frac{N_{segm}^2 + N_{segm} N_{facets}}{\dfrac{N_{segm}^2}{M^3} + \dfrac{N_{segm} N_{facets}}{M^3}} = M^3 \qquad (2.26)$$


Figure 2.21: Effect of the number of layers (L) and frequency (f) on the speedup of stress computations: (a) effect of f (L = 1, Nsegm = 20,000; curves for f = 10, 20 and 30); (b) effect of L (f = 20, Nsegm = 20,000; curves for L = 1, 2 and 3). (Axes: speedup versus M.)

Spatial error 

A large box size, i.e. a small M, is bound to give a large spatial error at the ends of the box diagonal, because these points lie far from the position where $\sigma^{LR}$ is computed (the center of the box) (see Fig. 2.22(a)). A small box size, i.e. a large M, also gives a large spatial error (Fig. 2.22(b)), but in this case the reason is that the nearest boxes contributing to $\sigma^{LR}$ are too close: the segments in these boxes generate highly inhomogeneous stress fields.

In Fig. 2.23, the relative spatial error $\varepsilon_r$ along a diagonal of the central box is shown for several values of M. Here, the simulation volume is a cube with an edge length of 16.4 µm containing 22,210 segments (ρ ≈ $2.5 \times 10^{12}$ m⁻²), taken from a tensile simulation along [001]. $\varepsilon_r$ is defined by Eq. 2.27, with $\sigma^{exact}$ computed at each point on the diagonal and $\sigma^{approx}$ computed at the center point of the box.

$$\varepsilon_r = \frac{\sum_{i=1}^{6} \left| \sigma_i^{exact} - \sigma_i^{approx} \right|}{\sum_{i=1}^{6} \sigma_i^{exact}} \qquad (2.27)$$

From the figure, it can be seen that both small and large values of M increase the relative spatial error. To compare the curves, $\varepsilon_r$ is averaged and shown in Fig. 2.24 as a function of M. There is a certain value of M that minimizes the spatial error.

The most effective way to minimize the spatial error is to use the smallest box size together with a certain number of layers L for $\sigma^{LR}$, as shown in Fig. 2.22(c) with L = 3. Indeed, the mean spatial error


Figure 2.22: Effect of the number of boxes (M) and layers (L) on the accuracy of stress computations: (a) M=5, L=1; (b) M=15, L=1; (c) M=15, L=3. (Each panel shows the short-distance and long-distance stress regions around a segment is.)

Figure 2.23: Relative spatial error along a diagonal of the central box, plotted against the position on the diagonal (microns) for M = 7, 15 and 21. (M=number of boxes along each axis, L=1)

Figure 2.24: Mean relative spatial error of stress computation as a function of the number of boxes (M) and the number of layers (L)

could be decreased to 2% using L = 3, as shown in Fig. 2.24.

Temporal error 

It is difficult to evaluate the temporal error because it is strongly related to how fast a dislocation structure changes, which is governed by both the type of mechanical test simulated and the time step. It is therefore difficult to set f a priori. In order to evaluate the effect of f, a simple tensile test has been simulated at a constant strain rate ($\dot{\varepsilon} = 10^3$ s⁻¹) with 22,210 initial segments. The time step is set to $2 \times 10^{-10}$ s, with M = 21 and L = 3. During the test, the internal stress is recorded at the center point of the simulation volume. A relative temporal error is defined as in Eq. 2.27, where $\sigma^{exact}$ denotes the internal stress at the central point computed at every time step and $\sigma^{approx}$ is the stress obtained with $\sigma^{LR}$ updated at the frequency f. In Fig. 2.25, the relative temporal error is shown for f = 20, 40 and 60. For f = 60, the maximum error observed reaches 5%. An update frequency of 60, however, has a negligible effect on the overall stress-strain curve, as shown in Fig. 2.26.

In conclusion, the use of a maximum number of boxes is favorable, although the speedup analysis of the stress computation indicates that there exists an optimum number of boxes. One reason is that a large M always benefits segment motion by a factor of 27/M^3, since interactions need only be examined in a box and its first neighbors (27 boxes) out of the M^3 boxes. Another reason is related to the parallelization scheme, which will be detailed in Sec. 3.3.

[Figure 2.25: Effect of the frequency of long-distance stress computation on the relative temporal error (f = σ LR update frequency). Curves for f = 20, 40 and 60; x-axis: step number; y-axis: temporal error.]

[Figure 2.26: Stress-strain curves of simulations with the σ LR update frequency f. Curves: reference, f = 20, f = 40 and f = 60; x-axis: strain; y-axis: stress (MPa).]

[Figure 2.27: Computation of stresses under periodic boundary conditions and the box method. The sketch distinguishes the original boundaries from the shifted boundaries.]
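As a rough numerical illustration of the 27/M^3 factor, assuming that the segments are spread evenly over the boxes (an idealization, not a result of this work), the values of M used in this section give:

$$\frac{27}{5^3} = \frac{27}{125} \approx 0.22, \qquad \frac{27}{15^3} = \frac{27}{3375} = 0.008, \qquad \frac{27}{21^3} = \frac{27}{9261} \approx 0.0029$$

Under this assumption, going from M = 5 to M = 15 reduces the fraction of the volume examined for each segment by a factor of 27.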

2.5.4 Boxes and Periodic boundary conditions 

An efficient method to apply periodic boundary conditions (PBC) is presented in Sec. 2.4.1. When the simulation volume is divided into boxes and the internal stresses are decomposed into long- and short-distance stresses, attention should be paid to the segments in the boundary boxes. As shown in Fig. 2.27, some of the boxes (especially those along the boundaries) may need to account for segments inside so-called image boxes for internal stress computation and segment motion. The coordinates of the segments in the image boxes are obtained by translating the segment coordinates from the appropriate boxes. This operation can be performed by a simple array reference and an addition or subtraction.
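The array reference and translation mentioned above can be sketched as follows. This is a minimal illustration, not the implementation of the DDD code: the function name, the box indexing from 0 to M-1 and the assumption M ≥ 3 are all hypothetical choices made for the example.

```python
def neighbor_boxes_with_shift(box, m, edge):
    """Return the 27 boxes around `box` (itself included) under PBC.

    box  : (i, j, k) index of the box, each index in range(m)
    m    : number of boxes along each axis (assumed >= 3)
    edge : (lx, ly, lz) edge lengths of the simulation volume
    Each result is (source_box, shift): the real box that supplies the
    segments and the translation to add to their coordinates.
    """
    result = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                source = []
                shift = []
                for index, delta, length in zip(box, (di, dj, dk), edge):
                    raw = index + delta
                    wrapped = raw % m  # array reference into the real box
                    # raw < 0: image one period below; raw >= m: one period above.
                    shift.append((raw - wrapped) // m * length)
                    source.append(wrapped)
                result.append((tuple(source), tuple(shift)))
    return result


# Box on the lower x boundary and upper z boundary of a 15 x 15 x 15 grid.
images = neighbor_boxes_with_shift((0, 7, 14), 15, (16.4, 16.4, 16.4))
```

For the box (0, 7, 14), the neighbor at i = -1 is supplied by the real box (14, 7, 14) with its coordinates shifted by one period in the negative x direction. Because the three edge lengths are passed separately, the same sketch applies to an orthorhombic volume.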

Fig. 2.28 shows an example of the activation of a Frank-Read source in a cubic and an orthorhombic simulation volume, and the number of segments is recorded in Fig. 2.29. In the cubic simulation volume, self annihilation of segments occurs and the number of segments oscillates, as shown in Fig. 2.29, whereas the dislocation density increases in the orthorhombic simulation volume. It is therefore desirable to use the orthorhombic simulation volume to remove the artificial self annihilation of dislocations due to the periodic boundary conditions.


[Figure 2.28: Activation of a Frank-Read source in the cubic and the orthorhombic simulation volume under periodic boundary conditions. (a) The cubic and the orthorhombic simulation volume, with axes [100], [010] and [001]. (b) Dislocation structure seen at (110) direction in the cubic simulation volume. (c) Dislocation structure seen at (110) direction in the orthorhombic simulation volume.]

[Figure 2.29: Change of the number of segments with respect to the simulation steps. In the case of the cubic simulation volume, self annihilation of dislocations occurs. Curves: orthorhombic and cubic; x-axis: step number; y-axis: number of segments.]

2.6 Computation procedure of the DDD program 

The serial DDD program using the box method can be subdivided into the following tasks. 

a. Initialization 

b. Discretization of the segments 

c. Construction of the linked-lists 

d. Updating of the long-distance stresses every f steps 

e. Computation of the short-distance stresses 

f. Motion of segments 

g. Updating the external stresses 

h. Saving of outputs

The initialization (a) sets parameters such as the number of time steps, the number of boxes, the material property constants and the loading conditions. It also reads the initial segment configurations and the geometries of the simulation box and of the internal interfaces from external files.

Operations 'b' to 'h' are executed sequentially at each time step. Segments which are longer than a maximum length (defined explicitly at initialization) are further discretized in task 'b'. Linked-lists of the segments in each box are constructed ('c'). Long-distance stresses are computed every f steps at the center of each box ('d'), as described in Sec. 2.5.2. Short-distance stresses are computed using the linked-lists of segments ('e'). Once the stresses on each segment are known, the effective stresses are computed using Eq. 2.7, and all the segments are moved to their next positions after examination of all possible interactions ('f'). The external stresses are updated according to the loading conditions ('g'), and output data such as the current stresses, strains and dislocation configurations are saved in external files ('h'). This completes one time step, and the same procedure is repeated at the next time step.
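The control flow of tasks 'b' to 'h' can be summarized by the following skeleton. It is only a sketch of the sequencing: the task bodies are empty placeholders, the function and variable names are hypothetical, and the choice to refresh the long-distance stresses at step 0 and then every f steps is an assumption of the example.

```python
def run_simulation(n_steps, f, tasks):
    """Call tasks 'b' to 'h' in order; task 'd' only every f steps.

    `tasks` maps a task letter to a callable taking the step number.
    Returns the list of (step, letter) pairs that were executed.
    """
    executed = []
    for step in range(n_steps):
        for letter in "bcdefgh":
            if letter == "d" and step % f != 0:
                continue  # reuse the long-distance stresses of an earlier step
            tasks[letter](step)
            executed.append((step, letter))
    return executed


# Placeholder tasks that do nothing; a real code would do the physics here.
tasks = {letter: (lambda step: None) for letter in "bcdefgh"}
log = run_simulation(n_steps=40, f=20, tasks=tasks)
long_range_updates = [step for step, letter in log if letter == "d"]
```

With 40 steps and f = 20, the long-distance update 'd' runs at steps 0 and 20 only, while the six other tasks run at every step.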

Key points 

• The DDD method used in this work discretizes perfect dislocations into discrete dislocation 

segments of a pure edge and screw type in a volume homothetic to the FCC structure with 

the lattice spacing of ∼ 10b. 

• The effective stress acting on a segment is computed from the internal and applied stresses, the line tension and the Peierls stress, in the framework of linear isotropic elasticity. The displacements of dislocation loops are computed using Barnett's expressions.

• A linear relation between the effective stress and the velocity of segments is used. Dislocation 

interactions are taken into account by local rules. The cross-slip of a screw segment is 

implemented in a stochastic manner. 

• Periodic boundary conditions are applied in an orthorhombic simulation volume. 

• Internal interfaces are represented either by simple facets with certain strengths or by a 

coupled method with a finite element method. 

• The box method is revisited in order to increase the computing efficiency of the DDD code. A speedup of 50 with errors lower than 3% is obtained in the typical situation of 20,000 segments subjected to tensile loading (L=1, f=20, M=15).

Chapter 3 

Parallelization of the Discrete 

Dislocation Dynamics method 

Although the numerical efficiency of the serial DDD method has been improved by using the box method (see Sec. 2.5), the code is still insufficient to deal with a large density of dislocations or with dislocations interacting with thousands of precipitates. It is often said that there is a gulf between the desired problem size and the available computing power, since computational demands usually exceed the performance of currently available computing hardware.

A parallel version of the DDD program has been developed in an attempt to simulate the interactions between dislocations and a large number of precipitates within a reasonable time using parallel computers 1. The objective of this chapter is to present the development of the new parallel DDD program and its performance.

In Sec. 3.1, parallel computing hardware is listed, and the models and programming languages suitable for each type of hardware are reviewed. This section is intended to explain the choice of the parallel model used in this work.

In Sec. 3.2, the hot spots of the serial DDD program are analyzed, focusing on the data flow dependencies. Based on the flow dependencies, the existing parallel algorithms are reviewed to help in establishing a parallelization strategy.

Sec. 3.3 describes the parallelization of the serial DDD code from a programming perspective. The parallelization algorithm of each of the hot spots of the serial DDD code is presented using pseudo-codes.

An attempt to increase the performance of the new parallel DDD program is presented in Sec. 3.4. The performance of the program is quantified and issues such as the load balance are investigated.

Although the DDD code used here is the edge-screw model presented in Chapter 2, the parallelization scheme is quite general and may be applied to any DDD code or finite difference method with similar data dependencies.

1 A parallel computer refers to several computers that are interconnected to increase computing power.

3.1 An introduction to Supercomputing 

3.1.1 Overview 

The different types of parallel computers need to be reviewed before attempting to create a parallel program, since the programming model and the programming language should be chosen according to the selected architecture.

Parallel computers are in fact a subset of supercomputers, a supercomputer being defined as a computer that performs at or near the currently highest operational rate for computers. Computation using a supercomputer is often called supercomputing or high performance computing.

In the following sections, the technological trends of supercomputers are reviewed using data from the top 500 list 2. The top 500 list compiles information on the 500 fastest supercomputers in the world 3.

3.1.2 Classification of hardware 

All the current supercomputers use multiple processors and memories. There exist many classifi- 

cation methods according to the usage of multiple processors and memories and their interactions. 

Supercomputers are usually classified as follows: 

(1) by processor type: scalar and vector processor 

(2) by memory type: shared and distributed memory 

(1) by processor types 

Processor architectures can be divided into two principal types: scalar and vector processors. The main difference between the two types relates to the number of operations performed by a single instruction.

2 See www.top500.org

3 It is published twice a year, in June at the International Supercomputer Conference and in November at the ACM/IEEE Supercomputing Conference. The list has been compiled since 1993, when the first top 500 list was published at the International Supercomputer Conference, Mannheim.

Scalar processors perform a single operation for each single instruction. An addition instruction 

(a + b), for example, results in the addition of two numbers. This type refers to a general-purpose 

processor and is widely used 4 . 

In vector processors, a single instruction results in identical operations being performed on different data. This means that the addition of two arrays of data (A(i) + B(i)) can be performed in a single instruction. Vector processors are developed for high performance numerical computation on vectors or arrays and are relatively expensive compared to scalar processors 5.

Examples of hardware for each processor type are listed in Table 3.1. Vector processors are known to have an excellent effective performance and facilitate the development of parallelization algorithms. However, they are less frequently used due to their high cost and limited scalability. Fig. 3.1, which plots the share of each processor type over the past ten years, shows this trend clearly. In June 1993, vector processor architectures accounted for 66.8% of the top 500. That proportion decreased to 5% in June 2004, whereas the share of scalar processors increased to 95%. The advantages of scalar processors are their relatively low price and excellent scalability, though they have a poor effective performance. Because the majority of supercomputers use multiple scalar processors, the terms parallel computing and parallel computer are often used instead of supercomputing and supercomputer.

Processor classification    Example
Scalar processor            Intel x86, DEC Alpha, PowerPC, IBM Power
Vector processor            Cray vector, NEC vector, Fujitsu VPP

Table 3.1: Processor classification

4 Scalar processors are divided further into two groups: CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer). The CISC group comprises the Motorola 680x0 and Intel x86 processors, whereas the DEC Alpha, PowerPC and IBM POWER processors are within the RISC group.

5 Vector processors do perform parallel operations in a way that is sometimes described as 'data parallel', though they are not parallel computers in the sense of many machines working together.


[Figure 3.1: A change of processor types used in major supercomputers. Share (%) of scalar and vector processors in the top 500 list, from June 1993 to June 2004.]

(2) by memory types

Memory architectures can be classified into two principal types: shared and distributed memory.

In shared memory systems, memories and processors are typically all interconnected by a common bus or switching network. Each processor can access all the memories of the system and can directly load or store any shared address. In other words, the data movements are transparent to the user. This provides an easy and powerful model for creating and managing a parallel program. Shared memory systems can be further grouped into UMA (Uniform Memory Access) 6 and NUMA (Non Uniform Memory Access) 7 systems, depending on whether the main memory is a single physical memory or a logical one. A schematic diagram of the processors and memory is shown in Fig. 3.2(a) for a UMA system and in Fig. 3.2(b) for a NUMA system. The NUMA model has been developed to overcome a technical difficulty of UMA systems, which limits the possible number of processors. Because a NUMA system uses memories that are physically distributed over several systems as a single logically shared memory, the access time to a given memory can differ depending on whether that memory is local or remote to a specific processor.

In distributed memory systems or MPP (Massively Parallel Processor) systems, several computers, in each of which a single processor has its own memory resource, are interconnected by a bus or network, and processors access the distributed memories through the network. Fig. 3.3(a) shows a configuration of processors and memories of such an MPP system. In this model, parallel processing is facilitated by explicit message passing, since each processor has its own memory resource which cannot be directly accessed by the other processors in the MPP machine. Individual processors could all be of the same type, such as a network or cluster of workstations or PCs, which could work independently or in unison. A heterogeneous network of various platforms (vector processors, parallel supercomputers etc.) could also be assembled in principle. The IBM P690 architecture, which is used for some of the results presented in this work, consists of several UMA machines, as shown in Fig. 3.3(b) 8.

[Figure 3.2: (a) Schematics of UMA systems: several processors (P) attached to one memory. (b) Schematics of NUMA systems: groups of processors, each with its own memory, joined by a logical memory interconnect.]

[Figure 3.3: (a) Schematics of MPP systems: processor-memory pairs (P, M) linked by a communication network. (b) Schematics of SMP systems: an SMP cluster of UMA nodes linked by a communication network.]

6 Intel Dual CPU system, Compaq ES40, Sun E10000 and HP N-class belong to the UMA category.

7 Machines such as Compaq GS320, HP Superdome and SGI Origin 3000 belong to this category.

Various architectures of each memory classification are summarized in Table 3.2. In the early 1990s, most of the top 500 machines were shared memory architectures. However, the mainstream has shifted to distributed memory systems since the late 1990s (see Fig. 3.4).

Memory classification    Type
Shared memory            UMA, NUMA
Distributed memory       MPP, clusters of PCs, clusters of UMAs

Table 3.2: Memory classification

[Figure 3.4: The transition of the supercomputer structures for the past ten years. 'Cluster' and 'MPP' belong to the distributed memory systems and 'SMP', 'single processor', 'SIMD' and 'Constellation' belong to the shared memory systems. x-axis: top 500 list editions from June 1993 to June 2004; y-axis: share (%).]

8 The UMA cluster system looks similar to a NUMA system, but as memories are not shared between nodes, the user should explicitly assign data movement as in a distributed memory system.

Merits and demerits of each supercomputer type 

Supercomputers are classified by processor type and by memory type. Each type has merits and demerits which come from the different architectures used. They are summarized in Table 3.3. In the case of a shared memory system using vector processors (top left of Table 3.3), one can expect an excellent effective performance, and it is relatively easy to vectorize a code using a compiler. On the other hand, the system is relatively expensive and shows limited scalability.

A distributed memory system using scalar processors has a relatively low price and good scalability. It is often said, however, that parallelizing a code for such a system requires considerable skill and that the system generally shows poor effective performance.

Vector:                        ease of use, good effective performance; high cost, limited scalability
Scalar, shared memory:         ease of use; limited scalability
Scalar, distributed memory:    excellent cost/peak performance; poor effective performance

Table 3.3: Merits and demerits of each processor and memory type (in each entry, merits are listed before the semicolon and demerits after it)

3.1.3 Parallel programming models 

The main goal of parallel programming is to minimize the elapsed time of a program by utilizing several processors. Since there is no single programming model that can be used on every architecture, different programming models have to be adopted for the different architectures summarized in Sec. 3.1.2. This section presents the programming models used in shared and distributed memory systems.

Shared memory based 

A program is made of threads, each of which contains a work (computation) and memory data (the object of the work). A single-threaded program processes data sequentially. The main idea of shared memory based models is to create multiple threads and let each thread compute a portion of the data simultaneously. All the threads share the same address space, so it is easy to reference data that other threads have updated. Multi-thread programs are therefore best suited to the shared memory architecture, in which all the memory spaces are shared.

This model is often called the 'fork-join' model, as shown in Fig. 3.5(a). The single-thread program processes S1 through S2, where S1 and S2 are inherently sequential parts. In the multi-thread program, the first thread forks two more threads and the three threads process P1 through P3 in parallel. They are joined to the first thread once the work is finished. The compiler automatically parallelizes certain types of 'DO' loops, or else one can add directives to tell the compiler how to divide the work. OpenMP provides such directives and will be briefly reviewed in Sec. 3.1.4.
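The fork-join pattern described above can be sketched in a few lines. The example uses Python threads purely for illustration (it is not OpenMP, and Python threads do not generally speed up CPU-bound work); the function names and the summation task are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Parallel part (P1, P2 or P3): each thread sums its own chunk."""
    return sum(chunk)


def fork_join_sum(data, n_threads=3):
    # S1: sequential part, split the data into one chunk per thread.
    size = max(1, (len(data) + n_threads - 1) // n_threads)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Fork: worker threads share the address space and process the chunks.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Join: all workers have finished once the 'with' block exits.
    # S2: sequential part, combine the partial results.
    return sum(partials)


total = fork_join_sum(list(range(1, 101)))
```

The three chunks play the role of P1 to P3 in Fig. 3.5(a), while the splitting and the final summation correspond to the sequential parts S1 and S2.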

Distributed memory based 

If the address space is not shared among the different nodes, parallel processors have to transmit data over an interconnecting network in order to access data that other processors have updated. Fig. 3.5(b) illustrates how a message-passing program runs. Each processor computes its own part and the processors communicate with each other during the execution of the parallelizable part, P1-P3 (S1 and S2 are inherently sequential parts). The figure shows data passing only between processors adjacent to each other, but in general each processor communicates with all the other processors. Due to the communication overhead 9, the time spent processing each of P1-P3 is generally longer in the message-passing program than in the serial program. In practice, therefore, only a modest fraction of the capacity of several interconnected processors is achieved 10.

[Figure 3.5: Parallel programming models for the shared and distributed memory architectures. (a) Single-thread process and multi-thread process, with fork and join around the parallel parts P1-P3. (b) Message passing between processors: serial and parallel execution of S1, P1-P3 and S2, with communications between the processors.]

3.1.4 Classification of parallel languages 

Different types of parallel computing hardware and the corresponding parallel models have been reviewed in the preceding sections: the shared memory/fork-join model and the distributed memory/message-passing model. The choice of a parallel language is largely determined by the hardware type and the parallel model to be used. Table 3.4 shows possible programming languages for each hardware-model pair. In this section, two main parallel programming languages are outlined: OpenMP (Open Multi-Processing) and MPI (Message Passing Interface).

Hardware-Model                              Parallel programming languages
Shared memory-fork/join model               OpenMP, Pthread
Distributed memory-message passing model    MPI, PVM

Table 3.4: Parallel hardware-model pairs and corresponding languages

9 As well as work load imbalance and synchronization, as shown in Fig. 3.16.

10 Theoretically, the computational power should increase linearly with the number of interconnected processors.


OpenMP 

OpenMP is a set of compiler directives and callable runtime libraries that extend the Fortran and 

C languages to allow the development of scalable parallel programs on shared memory machines. 

OpenMP provides access to the strengths of the shared memory parallel computation without an 

excessive programming effort. For example, a single loop can be parallelized by simply inserting 

standard directives, ’!$OMP PARALLEL DO’, as follows. 

Serial loop:

    DO I=1, 100
      C(I)=A(I)+B(I)
    ENDDO

⇒ Loop parallelized with OpenMP:

    !$OMP PARALLEL DO
    DO I=1, 100
      C(I)=A(I)+B(I)
    ENDDO
    !$OMP END PARALLEL DO

The directive '!$OMP PARALLEL DO' creates multiple threads (fork), as schematically shown in Fig. 3.5(a). If four threads are created, for example, and the iterations are divided evenly into contiguous blocks, the second thread would perform the addition from I = 26 to 50 in the above code. '!$OMP END PARALLEL DO' collects the results to the master thread (join). Programming with OpenMP is relatively simple and shows good efficiency if most of the program execution time is dominated by a single, simple 'DO' loop. But the efficiency of this type of parallelization becomes poor as the data dependency inside the loops becomes complex. OpenMP is also bound by the limits of the shared memory architecture, such as the number of processors and the size of memory, and it lacks portability between different platforms.
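The division of the loop iterations among threads can be made explicit with a small sketch. It assumes an even split into contiguous blocks; the actual distribution in OpenMP depends on the scheduling in effect, and the function name is hypothetical.

```python
def static_chunks(first, last, n_threads):
    """Split the inclusive range first..last into contiguous blocks."""
    n_iter = last - first + 1
    base, extra = divmod(n_iter, n_threads)
    chunks = []
    start = first
    for thread in range(n_threads):
        # The first `extra` threads take one additional iteration each.
        length = base + (1 if thread < extra else 0)
        chunks.append((start, start + length - 1))
        start += length
    return chunks


chunks = static_chunks(1, 100, 4)
```

For the loop from I = 1 to 100 with four threads, the blocks are 1-25, 26-50, 51-75 and 76-100, so the second thread handles I = 26 to 50.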

MPI 

MPI enables the message-passing programming model on distributed memory architectures. As described in the previous sections, each processor of a distributed memory machine holds its variables in its local memory space. Work shared across different processors requires communication, and message passing is the context in which this communication takes place. MPI is a parallel language which facilitates message passing between separate processors. Some of the implementations of MPI are listed in Table 3.5. MPI will be reviewed in Sec. 3.2.1.


Acronym    Developers
MPI/Pro    MPI Software Technology
IBM MPI    IBM product implementation for the SP and RS/6000 workstation clusters
MPICH      Argonne National Lab and Mississippi State University
UNIFY      Mississippi State University
CHIMP      Edinburgh Parallel Computing Center
LAM        Ohio Supercomputer Center

Table 3.5: Various versions of MPI

3.1.5 Supercomputers in France and Korea 

Before finishing Sec. 3.1, the state of supercomputers in Korea and France in June 2004 is listed. Tables 3.6 and 3.7 show the rank in the top 500, the machine specifications, Rmax 11 and Rpeak 12 of the top 5 supercomputers in Korea and France, respectively. At the date of this thesis (summer 2004), Korea has 9 supercomputers in the top 500 and France has 16.

Rank    Site/Year     Computer (manufacturer)/processors            Rmax/Rpeak
48      KIST/2003     xSeries Cluster Xeon (IBM)/1024               3067/4915.2
113     KISTI/2004    xSeries Cluster Xeon (IBM)/512                1762/2867
115     KISTI/2003    pSeries 690 (IBM)/544                         1760/3699.2
233     SNU/2002      Pegasus P4 Xeon cluster (Self-made)/400       1011/1843
310     KT/2004       Integrity Superdome, HPlex (HP)/176           844/1056

Table 3.6: Top 5 supercomputers in June 2004, Korea

Rank    Site/Year           Computer (manufacturer)/processors    Rmax/Rpeak
28      CEA/2001            AlphaServer SC45 (HP)/2560            3980/5120
120     TotalFinaElf/2003   xSeries Cluster Xeon (IBM)/1024       1755/4915.2
124     SG SGBI/2003        xSeries Cluster Xeon (IBM)/968        1685.49/4646.4
132     CNRS/IDRIS/2004     eServer pSeries 690 (IBM)/384         1630/2611.2
149     CNRS/IDRIS/2004     eServer pSeries 655 (IBM)/384         1477/2611.2

Table 3.7: Top 5 supercomputers in June 2004, France

11 Maximal LINPACK performance achieved

12 Theoretical peak performance

3.2 Towards a parallel DDD code

3.2.1 Basic Steps of Parallelization

When parallelizing an existing serial program, the basic steps can be summarized as follows ([Aoyama & Nakano 99]).

1. Tune the serial program

The performance of a parallel program is bound by that of the serial program from which it is written. The first step is thus to tune the hot spots of the serial program and make the serial program as efficient as possible.

2. Consider the outline of the parallelization

The tuned serial program should be profiled to find out which part or parts consume most of the CPU time. It might be sufficient to parallelize only the most time-consuming parts. At the same time, it is necessary to select the hardware on which the program is to be parallelized.

3. Determine the strategy for the parallelization

Depending on the hardware chosen and the data dependencies of the program, a parallel algorithm should be devised. For this, an existing strategy can be adopted if the pattern of parallelization is similar; otherwise a new algorithm has to be created. It should then be decided which scalar variables and arrays must be transmitted.

4. Parallelize the program

The strategy chosen is then implemented using an appropriate parallel language.

The procedure of parallelization which has been selected in this work is summarized below according 

to the basic steps mentioned above. 

Step 1: Tune the serial program 

The numerical efficiency of the serial DDD code has been increased using the box method and the linked-list of segments (see Sec. 2.5). The internal stresses are approximated by the long- and short-distance stresses, and there is no approximation in handling the dislocation interactions. The speedup depends on the parameters chosen (see Table 2.3). A speedup of 50 is attained in the case of L = 1, M = 15 and f = 20 with N_segm = 20,000 (Fig. 2.20).

Step 2: Consider the outline of the parallelization 

In this work, a parallel DDD code has been written for distributed memory machines using MPI. The choice of distributed memory systems and MPI has several advantages, such as popularity, portability and extendability, even though they are neither the most efficient nor the easiest combination. As already shown in Fig. 3.4, most of the top 500 machines are distributed memory systems nowadays, and distributed memory systems are becoming more popular and widely used as individual laboratories purchase parallel computers made of several PCs and workstations connected through a network.

Computation of the internal stresses and handling of the dislocation interactions are still the most time-consuming parts.

Step 3: Determine the strategy for the parallelization 

Before developing a parallel algorithm suitable for the DDD method, a few characteristics of the method are summarized. First, the number of dislocation segments is not constant: dislocation segments can be created or annihilated with time. Next, the DDD method has a highly complex flow dependence, in that the movement of a segment modifies not only its own position and connections, but also the surrounding dislocation configuration. This is because dislocation lines are represented as connected sets of segments, and the connections between segments are often changed by cutting the dislocation lines.

The stress computation has no flow dependence. If the computation load is distributed over P processors, the elapsed time for the stress calculation will ideally decrease by a factor of P. To make full use of the box method described in the previous chapter, it is pertinent to distribute the stress computation in the boxes over several processors.

On the other hand, the updating of segment positions has a highly complex flow dependency, as mentioned before. This dependency can be shown as follows. Let a(i, j, k) represent a quantity of the segments (e.g. the number of segments) in the box (i, j, k) indexed along the x, y and z directions. In order to update a(i, j, k), all the information from the first neighboring boxes is needed, because the segment interactions need to take into account all the segments in the first neighboring boxes. In addition, any quantity inside the first neighbors is susceptible to modification by the motion of segments in the (i, j, k) box. This dependence is represented in Fig. 3.6 in a simple 2D configuration. Special attention should thus be paid in handling segment interactions, so that no boxes overlap between processors when updating the segment positions. A specific sequence is required to avoid updating adjacent boxes on two different processors concurrently.

[Figure 3.6: Dependence on neighbors. The center element a(i,j) is being computed. All of the surrounding elements, a(i-1,j-1) to a(i+1,j+1), are used in the computation and are also modified after computing the center element.]

Among the existing parallel strategies, that of molecular dynamics and of a finite difference method 

are of particular interest because inter-dislocation stress computation is similar to inter-atomic stress 

computation and the box method divides a simulation volume with 3D arrays of boxes, which is 

similar to a matrix in a finite difference method. 

In molecular dynamics programs, the computation of the forces on the atoms usually accounts for most
of the CPU time. For each atom i, the total force exerted by the other atoms is computed using a
double-nested loop in which both loop indices run from 1 to Natom, Natom being the total number of
atoms. These loops are often parallelized by, for example, distributing the atoms among the different
processors. Each processor then computes the forces on its resident atoms only. This method is
referred to as data decomposition.
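As a minimal serial sketch of this data decomposition, the listing below deals the atoms out to a few "processors" and lets each one compute the forces on its resident atoms only. The one-dimensional pair force and all function names are assumptions made for illustration; no MPI call is involved, and the sketch only checks that the decomposed loops reproduce the serial result.

```python
# Sketch of data decomposition: atoms are split among "processors" and each
# one computes forces on its resident atoms only. The pair force and all names
# here are illustrative assumptions, not taken from a real MD code.

def pair_force(xi, xj):
    """Hypothetical 1D pair force exerted on the atom at xi by the atom at xj."""
    r = xi - xj
    return r / abs(r) ** 3  # repulsive, inverse-square magnitude

def forces_on(atoms, positions):
    """Total force on each listed atom: the double-nested loop of the text."""
    result = {}
    for i in atoms:                      # outer loop: resident atoms only
        total = 0.0
        for j in range(len(positions)):  # inner loop: all atoms
            if j != i:
                total += pair_force(positions[i], positions[j])
        result[i] = total
    return result

def decompose(natom, nprocs):
    """Deal atom indices out to processors in a round-robin fashion."""
    return [list(range(rank, natom, nprocs)) for rank in range(nprocs)]

positions = [0.0, 1.0, 2.5, 4.0, 4.5, 7.0]
serial = forces_on(range(len(positions)), positions)

parallel = {}
for resident in decompose(len(positions), nprocs=3):
    parallel.update(forces_on(resident, positions))  # one "processor" each
```

Each "processor" still reads the positions of all the atoms; only the outer loop is shared out, which is why this strategy is called a decomposition of the data being updated rather than of the space.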

In a finite difference method, matrices often represent physical data at grid points. A parallel
program breaks up these matrices and distributes the parts across the processors. This method is
called domain decomposition, which simply refers to the subdivision or partitioning of a problem over
a number of processors in a parallel program. Various methods, such as red-black ordering and
multi-color schemes, have been proposed to deal with the data dependency between adjacent grid
points. Further information can be found in [Dongarra et al. 98]. The main point in domain
decomposition is how to specify the order of communication among the processors so that the necessary
data are provided. The number and the order of the inter-processor communications are determined by
the data dependencies of the problem.
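The red-black ordering mentioned above can be illustrated with a short sketch. It assumes a plain 2D grid with face-sharing neighbors only; the function names are hypothetical. Grid points are colored like a checkerboard, so that all the points of one color can be updated concurrently without touching each other.

```python
# Sketch of red-black ordering on a 2D grid: cells are coloured like a
# checkerboard so that no two face-adjacent cells share a colour. All cells of
# one colour can then be updated concurrently. Names are illustrative.

def colour(i, j):
    """Return 'red' or 'black' for grid point (i, j)."""
    return "red" if (i + j) % 2 == 0 else "black"

def sweep_order(nx, ny):
    """List the grid points in red-black order: all red, then all black."""
    points = [(i, j) for i in range(nx) for j in range(ny)]
    red = [p for p in points if colour(*p) == "red"]
    black = [p for p in points if colour(*p) == "black"]
    return red, black

def face_neighbours(i, j, nx, ny):
    """Face-sharing neighbours of (i, j) inside an nx-by-ny grid."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < nx and 0 <= b < ny]

red, black = sweep_order(4, 4)
```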

Step 4: Parallelize the program 

MPI is reviewed in more detail because it is the parallel library used in this work and it is used to 

explain the newly developed parallel code as presented in Sec. 3.3. 

The Message Passing Interface Forum (MPIF) was organized in 1992 to develop a standard library for
writing message-passing programs. The MPIF comprised more than 40 organizations and endeavored to
make the standard practical, efficient and flexible. Practicality means that the standard should
allow convenient C and Fortran bindings and define an interface not too different from the practice
at that time, e.g., the Parallel Virtual Machine (PVM). Efficiency means that the standard aims at
efficient communication over a reliable communication interface, so that the users need not struggle
with communication failures. Flexibility is guaranteed by defining an interface that can be
implemented on many vendors' platforms with no significant changes and that can be used in
heterogeneous environments. The first version of the standard was published in 1994 and revised in
1997 (MPI-2). The MPI standard describes the parallel tasks as subroutines in Fortran and as functions
in C. Only the Fortran version of MPI is presented here.

There exist around 192 subroutines in MPI. All of them facilitate the parallel tasks of an MPI
program, which can be summarized as i) specifying a group of processors, ii) extracting a rank or
processor ID and iii) defining the message passing between or among processors. It is not necessary
to know all of the subroutines, since only about a dozen of them are frequently used to parallelize a
program.

The MPI subroutines can be categorized into three main groups as follows:

• Environment Management Subroutines

This group controls the overall environment of an MPI program. It includes the initialization and
finalization of a parallel environment, as well as the creation of a communicator or a group of
processors.

A general MPI program looks as follows.


PROGRAM parallel
INCLUDE 'mpif.h'
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
! Computations here ...
CALL MPI_FINALIZE(ierr)
END

Line 2 includes 'mpif.h', which defines MPI-related parameters such as MPI_INTEGER and
MPI_COMM_WORLD. All Fortran procedures that use MPI subroutines have to include this file. Line 3
calls 'MPI_INIT' to initialize the MPI environment. 'MPI_INIT' must be called once, and only once,
before any other MPI subroutine. In Fortran, 'ierr' is the return code of every MPI subroutine: it is
'0' if the call succeeded and non-zero otherwise. The subroutine 'MPI_COMM_SIZE' in line 4 returns
the number of processors (nprocs) belonging to the communicator (MPI_COMM_WORLD). The value of
'nprocs' is fixed by the runtime environment when the parallel job is launched. 'MPI_COMM_WORLD' is
an identifier associated with a group of processors and represents the group consisting of all the
processors participating in the parallel job. Each processor in a communicator has a unique rank in
the range [0, nprocs-1]. The subroutine 'MPI_COMM_RANK' in line 5 returns the rank of the calling
process within the communicator. In line 6, each processor does some work on its data, and line 7
calls 'MPI_FINALIZE', which terminates MPI processing; no other MPI call can be made afterwards.

• Point-to-point Communication Subroutines 

This group specifies the data exchange between two processors in the communicator. Both blocking and
non-blocking communication subroutines exist. Details are not discussed here and the interested
reader is referred to [Aoyama & Nakano 99].

As an example of the non-blocking send/receive subroutines, which are used in the parallelization of
the DDD code by the author, let us consider two processors that need to exchange data with each
other.

IF (myrank==0) THEN
  ! rank 0 sends 'a' to rank 1 and receives 'b' from rank 1
  CALL MPI_ISEND(a, 1, MPI_REAL8, 1, itag, MPI_COMM_WORLD, ireq1, ierr)
  CALL MPI_IRECV(b, 1, MPI_REAL8, 1, itag, MPI_COMM_WORLD, ireq2, ierr)
ELSEIF (myrank==1) THEN
  ! rank 1 sends 'a' to rank 0 and receives 'b' from rank 0
  CALL MPI_ISEND(a, 1, MPI_REAL8, 0, itag, MPI_COMM_WORLD, ireq1, ierr)
  CALL MPI_IRECV(b, 1, MPI_REAL8, 0, itag, MPI_COMM_WORLD, ireq2, ierr)
ENDIF
CALL MPI_WAIT(ireq1, istatus, ierr)
CALL MPI_WAIT(ireq2, istatus, ierr)

In this example, the processor of rank 0 sends the variable 'a', a single real number, to the
processor of rank 1 in the communicator 'MPI_COMM_WORLD'. 'itag' is the message tag, 'ireq1' is the
request handle returned by the subroutine and 'ierr' is its return code. The syntax of the subroutine
'MPI_IRECV' can be understood in a similar way: the processor of rank 1 stores the data received from
rank 0 in the variable 'b', and vice versa. Since the calls are non-blocking, 'MPI_WAIT' is then used
to wait for the completion of each request.

• Collective Communication Subroutines 

This group allows the user to exchange data among a group of processors. It frequently happens that
data in one processor need to be shared with all the processors in the communicator or, inversely,
that data in each processor need to be collected in one processor. Using point-to-point communication
in this case would not be efficient, considering the communication latency of a network. The
arguments of the subroutines in this group comprise the data array to be sent and/or received, its
size and type, and the rank of the processor which sends data to, or receives data from, the
communicator. The subroutine MPI_BCAST, for example, has the following syntax,

CALL MPI_BCAST(buffer, count, datatype, root, MPI_COMM_WORLD, ierr)

where 'buffer' is broadcast from the processor of rank 'root' to all the processors in the
communicator MPI_COMM_WORLD.

In addition to the groups categorized above, there exist subroutines for managing processor groups,
defining data types and controlling input and output files. The MPI standard also provides support
for a profiling interface, file management, etc.


3.2.2 Writing a parallel program 

To limit the effort of parallelizing the serial DDD program, the serial subroutines are kept intact,
or only slightly modified, wherever possible. In the following list, which recalls the general
computation procedure of the serial code (Sec. 2.6), the parts that need to be modified for
parallelization are indicated in bold characters.

a. Initialization of parallel environments 

b. Discretization of the segments 

c. Construction of the linked-lists 

d. Updating of the long-distance stresses every f steps 

e. Computation of the short-distance stresses 

f. Motion of segments 

g. Updating the external stresses 

h. Saving of outputs

The modified step 'a' is needed to build the parallel environment, in which the computation is
partitioned among a selected number of concurrent processors. The internal stress computation steps
('d'-'e') need minor modifications. The computation step 'f' needs complex interactions between
processors because of the flow dependencies. In the following section, the parallelization scheme is
detailed from a programming perspective.

3.3 Parallelization of the serial DDD program 

3.3.1 Initialization of parallel environments 

The boxes which decompose a simulation volume (Fig. 3.7(a)) are partitioned into parallelepiped
subsystems (Fig. 3.7(b)). The processors in a parallel computer are then logically arranged according
to the topology of the physical subsystems, and one processor is assigned to each subsystem.

Figure 3.7: Domain decomposition of the simulation volume (a) into parallelepiped subsystems (b).
(a) Cubic simulation volume using the box method. (b) Parallelepiped subsystems, labeled Proc 0 to
Proc 3. The use of four processors is assumed and one parallelepiped is allocated to each processor.

A processor of rank p is assigned to a parallelepiped subsystem assuming that the subsystems are
arranged in a 3D array of dimensions P1, P2 and P3. The total number P of processors required is then
given by P1 × P2 × P3. The vector ID of each subsystem in this Cartesian arrangement is stored in the
array nid(:); nid(1), for example, is in the range [0 : P1 − 1]. A processor p is then assigned to
each subsystem as defined by Eq. 3.1. Remember that each processor is given a unique processor
identification number (rank) p in the range [0 : P − 1].

$$p = \mathrm{nid}(1) + \mathrm{nid}(2)\,P_1 + \mathrm{nid}(3)\,P_1 P_2 \qquad (3.1)$$

For each processor, the six face-sharing neighbor processors are identified by a sequential array,
nni(:), given automatically by the following equation.

$$\mathrm{nni}(k) = \mathrm{ick}(1) + \mathrm{ick}(2)\,P_1 + \mathrm{ick}(3)\,P_1 P_2, \quad k = 1, \dots, 6 \qquad (3.2)$$

nni(:) will be used to identify the neighbor processors for message passing.

In Eq. 3.2, ick(i) is the vector ID of the neighbor processor k. It is given by Eq. 3.3, using the
array iv(:, :) defined in Table 3.8 and the 'MODULO' operation. A torus connection between the
processors is assumed in Eq. 3.3 to enforce periodic boundary conditions.

$$\mathrm{ick}(i) = \mathrm{MOD}\bigl(\mathrm{nid}(i) + \mathrm{iv}(i,k) + P_i,\; P_i\bigr), \quad i = 1, \dots, 3,\; k = 1, \dots, 6 \qquad (3.3)$$
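The rank and neighbor formulas can be checked with a small sketch. It assumes that the third term of the rank formula is weighted by P1 P2, which is what keeps every rank unique within [0 : P − 1]; the neighbor shifts follow the order -x, +x, -y, +y, -z, +z of Table 3.8. The function names are hypothetical and ranks and vector IDs are 0-based, as in the text.

```python
# Sketch of Eqs. 3.1-3.3. Assumption: the third term of the rank formula is
# weighted by P1*P2, which keeps every rank unique within [0, P-1].

# Relative location of neighbour k = 1..6 (Table 3.8): -x, +x, -y, +y, -z, +z
IV = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]

def rank_of(nid, dims):
    """Rank of the subsystem with vector ID nid on a P1 x P2 x P3 array."""
    p1, p2, _ = dims
    return nid[0] + nid[1] * p1 + nid[2] * p1 * p2

def vector_id(rank, dims):
    """Inverse mapping: vector ID (nid) of a given rank."""
    p1, p2, _ = dims
    return (rank % p1, (rank // p1) % p2, rank // (p1 * p2))

def neighbours(rank, dims):
    """Ranks of the six face-sharing neighbours, with torus wrap-around."""
    nid = vector_id(rank, dims)
    nni = []
    for shift in IV:
        ick = tuple((nid[i] + shift[i] + dims[i]) % dims[i] for i in range(3))
        nni.append(rank_of(ick, dims))
    return nni

print(neighbours(6, (4, 4, 1)))
```

For a 4 × 4 × 1 array, the first four neighbors of processor 6 come out as 5, 7, 2 and 10. With P3 = 1 the torus connection folds the two z neighbors back onto the processor itself.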

Neighbor ID, k    1           2          3           4          5           6
iv(1:3, k)        (-1, 0, 0)  (1, 0, 0)  (0, -1, 0)  (0, 1, 0)  (0, 0, -1)  (0, 0, 1)

Table 3.8: The relative location of each neighbor processor

If there are M boxes along each axis, the boxes are distributed as follows. Suppose that dividing M by
Pi (the number of processors along axis i) gives the quotient q and the remainder r, that is,
M = q Pi + r. The processors whose vector ID nid(i) is smaller than r are assigned q + 1 boxes and the
other processors are assigned q boxes. The total number of boxes along axis i is preserved, since
(q + 1) r + q (Pi − r) equals M. This distribution method is useful when the number of boxes M is not
divisible by the number of processors Pi.

Figure 3.8: Top view of 20 × 20 × 20 subboxes being assigned to 4 × 4 × 1 processors. Numbers
represent processor identification. Panel (a) shows processors 0 to 15 in rows of four, numbered from
left to right along x and from bottom to top along y; panel (b) shows processors 0 to 8 in rows of
three, numbered in the same way.

Fig. 3.8(a) shows the decomposition of 20 × 20 boxes into 4 × 4 subsystems. For simplicity, the
configuration is considered in 2D, which is equivalent to 3D with P3 being 1. All the subsystems have
the same number of boxes. In Fig. 3.8(b), the number of boxes is not identical in each subsystem
because M = 20 is not divisible by P1 = P2 = 3.

The IDs of the boxes which bound the subsystem of processor p are stored in the array ibs(:): ibs(1),
ibs(3) and ibs(5) hold the first box number along the x, y and z directions respectively, and ibs(2),
ibs(4) and ibs(6) hold the last box number along each direction. Processor 6 in Fig. 3.8(a), for
example, is bounded by ibs(1) = 11, ibs(2) = 15, ibs(3) = 6, ibs(4) = 10, and its neighbor processors
are nni(1) = 5, nni(2) = 7, nni(3) = 2, nni(4) = 10.
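The box distribution rule and the resulting bounds can be sketched as follows. The function name is hypothetical; box numbers are 1-based, as in the ibs array, while the vector ID is 0-based. The sketch covers one axis at a time, so the bounds of a subsystem are obtained by calling it once per direction.

```python
# Sketch of the box distribution along one axis: M = q*P + r, the first r
# processors receive q+1 boxes and the others q. Box numbers are 1-based, as in
# the ibs array of the text; the function name is illustrative.

def box_range(m, p, nid):
    """First and last box number owned by the processor with vector ID nid."""
    q, r = divmod(m, p)
    if nid < r:
        first = nid * (q + 1) + 1
        count = q + 1
    else:
        first = r * (q + 1) + (nid - r) * q + 1
        count = q
    return first, first + count - 1

# 20 boxes over 4 processors (even split) and over 3 processors (uneven split)
even = [box_range(20, 4, n) for n in range(4)]
uneven = [box_range(20, 3, n) for n in range(3)]
print(even)
print(uneven)
```

With 4 processors per axis, vector IDs 2 and 1 give the ranges 11 to 15 and 6 to 10 quoted above for processor 6. With 3 processors per axis, the subsystems hold 7, 7 and 6 boxes.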



3.3.2 Long-distance stresses computations 

The serial version computes the long-distance stresses as follows. The boxes that are situated at long
distance from a given box are recognized by a topological relation. The stresses due to the segments
in these long-distance boxes are computed at the center point of the given box. In the serial program,
one processor thus scans all the boxes and computes the long-distance stresses.

In a parallel program, the work is divided among several processors, each processor being responsible
for a fraction of the boxes. The boxes of each processor are delimited by the array ibs(:), and each
processor computes the long-distance stresses of the boxes in its own subsystem only.

The serial and the parallel versions are compared in the following.

Serial version

DO iz=1, M
  DO iy=1, M
    DO ix=1, M
      compute the long-distance stresses of the box 'ib'
    ENDDO
  ENDDO
ENDDO

⇒ Parallel version

DO iz=ibs(5), ibs(6)
  DO iy=ibs(3), ibs(4)
    DO ix=ibs(1), ibs(2)
      compute the long-distance stresses of the box 'ib'
    ENDDO
  ENDDO
ENDDO

The parallel version reuses most of the serial code; only the ranges of the loops are modified. It
should be noted that every subsystem holds the information of all the segments at the time of
computing the long-distance stresses.

3.3.3 Short-distance stresses computation 

The following pseudo-code explains how the short-distance stresses are computed both in the serial 

and in the parallel DDD code.


Serial version

DO is=1, Nsegm
  Identify the box 'ib' containing the segment 'is'
  Compute the short-distance stresses due to the segments
  within the short-distance boxes
ENDDO

⇒ Parallel version

DO is=1, iscnt(p)
  Identify the box 'ib' containing the segment 'is'
  Compute the short-distance stresses due to the segments
  within the short-distance boxes
ENDDO

As for the computation of the long-distance stresses, only small modifications are made to the serial
code. In the serial program, all the segments (Nsegm) are processed by a single processor. In the
parallel program, on the contrary, the segments are distributed among several processors, and a
processor p computes the stresses of iscnt(p) segments only. The construction of iscnt(p) will be
discussed in Sec. 3.3.4.

Since the stress on a segment can be computed without regard to the stress on the other segments, all
the processors can work independently. The elapsed time for the stress computation is reduced by a
factor of P (the number of processors) if every processor holds the same number of segments.
Otherwise, the overall elapsed time for the stress computation is determined by the busiest
processor, because the other processors have to wait until the slowest processor finishes its
computation before moving the segments. For higher efficiency, the segments have to be distributed
uniformly over the different processors. This can be realized by shifting the subsystem boundaries,
which changes the ibs array and consequently iscnt. This load balancing issue will be addressed in
Sec. 3.4.4.
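The effect of the segment distribution on the stress step can be estimated with a very small sketch. It assumes that every segment costs the same amount of work, which is a simplification, and it ignores communication; the function name is hypothetical.

```python
# Sketch: the stress step finishes only when the busiest processor is done, so
# the achievable speedup is the total work divided by the largest share.
# Assumption: the cost per segment is the same for every segment.

def stress_speedup(iscnt):
    """Speedup of the stress computation for a given segment distribution."""
    return sum(iscnt) / max(iscnt)

balanced = stress_speedup([4, 4, 4, 4])     # 16 segments, evenly shared
unbalanced = stress_speedup([2, 2, 6, 6])   # same total, uneven shares
print(balanced, unbalanced)
```

With an even distribution the speedup equals the number of processors. With the uneven one, the two processors holding six segments set the pace and the others wait.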

3.3.4 Data structures for distributing and gathering the segments

The processors do not work entirely independently in a parallel program. At some points of the
program, all the information has to be collected in one processor, or the data have to be distributed
to all the processors. An obvious example of gathering information is the writing of data to external
files. One processor normally takes charge of writing the files, and the data to be written are sent
to that processor from the other processors.

In a parallel DDD program, the segments' information, including coordinates, neighbors, linked-lists
and the effective stress, needs to be communicated. The segments are identified by a vector of integer
numbers. To send segments' data to another processor, the list of segments to be sent should be shared
between the sender and the receiver processors. The arrays iswork(:,:) and iscnt(:) are constructed to
facilitate this process; they contain the list of segment identification numbers and the number of
segments in each processor, respectively. In Fig. 3.9, for example, four processors treat fourteen
segments in a 2D configuration. The values of the arrays iswork and iscnt are written in the figure
as an example.

Figure 3.9: List of segments. Four processors (Proc 0 to Proc 3) hold segments 1 to 14, with the
following array values:

p              0    1    2    3
iscnt(p)       2    2    5    5
iswork(1,p)    1    9    8   10
iswork(2,p)    7    2   14    4
iswork(3,p)              3   13
iswork(4,p)             11   12
iswork(5,p)              6    5

The segments in processor p can be recognized by scanning the boxes of the processor and using the
linked-lists indexb(ib) and isbox(ib, 2), as described in Sec. 2.5.2. The arrays iswork and iscnt can
be constructed as follows.

iscnt(p) = 0
DO iz = ibs(5), ibs(6)
  DO iy = ibs(3), ibs(4)
    DO ix = ibs(1), ibs(2)
      ! compute box number 'ib' from ix, iy and iz
      call Bliste(ib, isliste)
      DO is=1, indexb(ib)
        iscnt(p)=iscnt(p)+1
        iswork(iscnt(p),p)=isliste(is)
      ENDDO
    ENDDO
  ENDDO
ENDDO

The subroutine Bliste generates the list isliste of the segments in the box ib, and indexb(ib)
contains the number of segments inside this box. For a given processor p, the number of segments and
the list are saved in iscnt(p) and iswork(:,p) respectively. The arrays iscnt(p) and iswork(:,p) can
then be shared among all the processors by using the MPI_BCAST subroutine,

DO irank=0, nprocs-1
  call MPI_BCAST(iscnt(irank), 1, MPI_INTEGER, &
                 irank, MPI_COMM_WORLD, ierr)
ENDDO
DO irank=0, nprocs-1
  call MPI_BCAST(iswork(1,irank), iscnt(irank), MPI_INTEGER, &
                 irank, MPI_COMM_WORLD, ierr)
ENDDO

with nprocs being the total number of processors.


Now that all the processors in the MPI_COMM_WORLD communicator share the list of segments of each
processor, the gathering or distribution of the segments' information can be realized using these
lists.
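The construction of the two arrays can be mimicked in a few lines. This is a serial sketch in 2D with hypothetical box contents: the positions of the segments inside the boxes are invented, and only the resulting counts and lists are chosen to mirror the example of Fig. 3.9. Dictionaries stand in for the Fortran arrays and for the lists returned by Bliste.

```python
# Sketch of the construction of iscnt and iswork. The box contents below are
# hypothetical; only the resulting counts mirror the example of Fig. 3.9.

def build_lists(ibs, box_segments):
    """Scan the boxes of one subsystem (2D here) and list its segments.

    ibs = (first_x, last_x, first_y, last_y), 1-based and inclusive.
    box_segments maps a box (ix, iy) to the list of segment IDs it contains.
    """
    work = []
    for iy in range(ibs[2], ibs[3] + 1):
        for ix in range(ibs[0], ibs[1] + 1):
            work.extend(box_segments.get((ix, iy), []))
    return len(work), work

# 4 x 4 boxes shared by 2 x 2 processors (ranks 0..3, two boxes per axis each)
subsystems = {0: (1, 2, 1, 2), 1: (3, 4, 1, 2), 2: (1, 2, 3, 4), 3: (3, 4, 3, 4)}
box_segments = {
    (1, 1): [1], (2, 2): [7],                        # rank 0
    (3, 1): [9], (4, 1): [2],                        # rank 1
    (1, 3): [8, 14], (2, 3): [3], (2, 4): [11, 6],   # rank 2
    (3, 3): [10, 4], (4, 4): [13, 12, 5],            # rank 3
}

iscnt, iswork = {}, {}
for rank, ibs in subsystems.items():
    iscnt[rank], iswork[rank] = build_lists(ibs, box_segments)
```

In the parallel code each processor fills only its own column, and the broadcast loop then makes every column known to every processor.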

3.3.5 Motion of segments 

The motion of a segment induces interactions with the other dislocation segments. These dislocation
interactions involve complex dependencies, as shown in Fig. 3.6. The key idea for handling them is to
avoid any overlap between the neighbor boxes of concurrently updated boxes. To this end, the boxes
inside a processor p are first divided into three groups according to the topology of their
neighboring boxes: inner boxes (IB), boundary boxes (BB) and corner boxes (CB) (Fig. 3.10). It should
be noted that at least three boxes are required along each axis in each subsystem to categorize the
boxes into these three groups.

The inner boxes have all their neighboring boxes in the same processor, so the motion of the segments
in the inner boxes modifies segments located in the same processor only. Because all the information
needed to handle the dislocation interactions is stored in the local memory, and because the
neighboring boxes of adjacent processors do not overlap, the positions of the segments in the inner
boxes of the different processors can be updated simultaneously and independently, without any
message passing.

The boundary boxes and the corner boxes, on the other hand, lack some neighboring boxes in the same
processor. Message passing between processors is therefore needed to obtain the segments' information
from these neighboring boxes and to send back the information modified by the dislocation
interactions.

Figure 3.10: Three categories of boxes in a processor p. (a) Parallelepiped subsystems.
(b) Categorization of boxes: corner boxes (CB) at the four corners of the subsystem, boundary boxes
(BB) along its edges and inner boxes (IB) in its interior. Inner boxes have all their neighbor boxes
in the same processor and thus need no communication. Boundary boxes lack some neighbor boxes and
need communication with one neighbor processor. Corner boxes lack neighbor boxes located in three
different processors.

In the case of the boundary boxes, all the missing neighboring boxes are situated in one neighboring
processor; message passing with this adjacent processor alone is therefore sufficient to provide the
missing information. The corner boxes, however, have neighboring boxes scattered over four different
processors, including their own, in a 2D configuration (and up to eight in a 3D configuration), and
are thus bound to involve complex message passing.

The positions of the segments are updated in the following three steps.

• In the first step, all the segments in the inner boxes of each processor are updated independently
and simultaneously.

• In the second step, the segments in the boundary boxes are updated, which involves message passing
with the respective neighboring processors. The boundary boxes are treated direction by direction, in
the order x, y and z, the right (plus) side being updated before the left (minus) side (see
Fig. 3.11).

• In the final step, all the segment information is collected in one processor (the Master
processor), and the segment positions in the corner boxes are updated in that processor only. This
procedure at least avoids complex message passing between the different processors.


Figure 3.11: Overall procedure of motion of segments. Panels: (a) Inner boxes; (b) Boundary boxes x+;
(c) Boundary boxes x-; (d) Boundary boxes y+; (e) Boundary boxes y-; (f) Corner boxes.

The overall procedure is drawn in Fig. 3.11, in which the simulation volume is subdivided into nine
processors. In the first step, all the inner boxes of each processor are updated; the updated boxes
are shaded. The boundary boxes are then updated in the order of the x, y and z directions and, for
each direction, at the right (plus) and then the left (minus) position. After all the boundary boxes
have been updated, the corner boxes are treated by one processor exclusively. In what follows, the
details of each step and the corresponding message passing are discussed.

Inner boxes 

The segments in the inner boxes can be identified easily by performing the loops over [ibs(1) + 1, 

ibs(2) − 1], [ibs(3) + 1, ibs(4) − 1] and [ibs(5) + 1, ibs(6) − 1]. The same segment motion algorithm 

as in the serial program can be used to update the list of segments in the inner boxes.
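The three categories can be told apart from the ibs bounds alone. The sketch below does this for a 2D subsystem, under the assumption that a box touching the subsystem boundary along both axes is a corner box and a box touching it along one axis is a boundary box; the function names are hypothetical. The inner boxes it finds are exactly those covered by the loops over [ibs(1) + 1, ibs(2) − 1] and [ibs(3) + 1, ibs(4) − 1].

```python
# Sketch of the box categorization in a 2D subsystem. A box touching the
# subsystem boundary along both axes is a corner box (CB), along one axis a
# boundary box (BB), and along none an inner box (IB). Names are illustrative.

def classify(ix, iy, ibs):
    """Category of box (ix, iy) inside the subsystem bounded by ibs."""
    on_x_edge = ix in (ibs[0], ibs[1])
    on_y_edge = iy in (ibs[2], ibs[3])
    if on_x_edge and on_y_edge:
        return "CB"
    if on_x_edge or on_y_edge:
        return "BB"
    return "IB"

def count_categories(ibs):
    """Number of boxes of each category in the subsystem."""
    counts = {"IB": 0, "BB": 0, "CB": 0}
    for iy in range(ibs[2], ibs[3] + 1):
        for ix in range(ibs[0], ibs[1] + 1):
            counts[classify(ix, iy, ibs)] += 1
    return counts

# Processor 6 of Fig. 3.8(a): boxes 11..15 along x and 6..10 along y
print(count_categories((11, 15, 6, 10)))
```

For this 5 × 5 subsystem the sketch finds 9 inner, 12 boundary and 4 corner boxes.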


Special care should be taken with the label numbers assigned to the segments, in order to avoid
giving duplicate numbers to segments newly created in different processors. Duplicate labels would
generate confusion when all the segment information is gathered into one processor. A consistent new
label list can be generated by sending to each processor the number of new segments created inside
all the other processors. The procedure for renumbering the new segments is shown in Fig. 3.12 in the
case of four processors. The key point is that the label numbers of the newly created segments are
assigned on the fly, and that the new segments are then renumbered in ascending order of processor
rank.

For label renumbering, the counter isnewcnt(p) is increased by one whenever a new segment is created
inside a given processor p. After all the processors have finished treating the segment motion and
the related interactions, this array is synchronized and used to renumber the newly created segments
as follows.


DO irank=0, nprocs-1
  call MPI_BCAST(isnewcnt(irank), 1, MPI_INTEGER, &
                 irank, MPI_COMM_WORLD, ierr)
ENDDO
iadd=0
DO irank=0, p-1
  iadd=iadd+isnewcnt(irank)
ENDDO
DO is=nsegm, nsol+1, -1
  isnew=is+iadd
  ! shift the segment information from 'is' to 'isnew'
ENDDO

nsegm is the local number of segments of each processor, and nsol is the global number of segments
before the segment motion is treated. After renumbering, the global number of segments is obtained by
adding the sum of the array isnewcnt to nsol.
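The renumbering can be traced with the counts of Fig. 3.12. This serial sketch assumes, as in the figure, that every processor first labels its new segments locally from nsol + 1 upwards; the function names are hypothetical.

```python
# Sketch of the label renumbering, using the counts of Fig. 3.12. Each
# processor numbers its new segments locally as nsol+1, nsol+2, ...; after
# isnewcnt is synchronized, rank p shifts them by the number of new segments
# created on the lower ranks. Function names are illustrative.

def label_offset(isnewcnt, p):
    """iadd for rank p: new segments created on ranks 0 .. p-1."""
    return sum(isnewcnt[:p])

def renumber(local_new_labels, isnewcnt, p):
    """Global labels of the new segments of rank p."""
    iadd = label_offset(isnewcnt, p)
    return [label + iadd for label in local_new_labels]

nsol = 14                      # global number of segments before the motion
isnewcnt = [3, 3, 2, 3]        # new segments per rank (Fig. 3.12)

for p, count in enumerate(isnewcnt):
    local = list(range(nsol + 1, nsol + count + 1))
    print(p, renumber(local, isnewcnt, p))

total = nsol + sum(isnewcnt)   # global number of segments after renumbering
```

The offsets are 0, 3, 6 and 8, so the new labels run from 15 to 25 without gaps or duplicates, and the global number of segments becomes 25.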

Boundary boxes 

Fig. 3.13 shows the sequence of message passing used to update the segment positions inside the
boundary boxes in the +x direction. Before the information is sent, the array ibcnt, holding the
number of segments, and the array ibwork, holding the list of segments to be sent and received, are
synchronized between the sender and the receiver processors. The arrays ibcnt and ibwork are
constructed and synchronized in the same way as the arrays iscnt and iswork in Sec. 3.3.4.

Figure 3.12: Label assignment to the newly created segments. Four processors hold segments 1 to 14;
each processor first labels its new segments locally from 15 upwards, with isnewcnt(0)=3,
isnewcnt(1)=3, isnewcnt(2)=2 and isnewcnt(3)=3. After synchronization, the new segments are
renumbered 15 to 17 in Proc 0, 18 to 20 in Proc 1, 21 and 22 in Proc 2, and 23 to 25 in Proc 3.

The information of the segments, e.g. coordinates, neighbor segments and effective stresses, is
packed into one-dimensional buffer arrays of integer, real and logical types.

The buffer arrays are then sent to the next processor inext and received from the previous processor
iprev using the subroutines MPI_ISEND and MPI_IRECV. An example code is shown below.

call MPI_ISEND(bufsi(1), ibcnt(p)*11, MPI_INTEGER, &
               inext, itag, MPI_COMM_WORLD, ireqs, ierr)
call MPI_IRECV(bufri(1), ibcnt(iprev)*11, MPI_INTEGER, &
               iprev, itag, MPI_COMM_WORLD, ireqr, ierr)
call MPI_WAIT(ireqs, istatus, ierr)
call MPI_WAIT(ireqr, istatus, ierr)

The received buffer arrays, e.g. bufri in the above example, are then unpacked in the inverse order
of the packing, using the synchronized ibcnt and ibwork.

Figure 3.13: A sequence of message passing to update the positions in the boundary boxes located at
the +x position (dark grey). Two message passing steps are involved: (a) before updating the boundary
boxes, send the segment information of the leftmost column to the neighboring processor in the −x
direction, nni(1), and receive information from the neighboring processor in the +x direction,
nni(2); (b) after updating the boundary boxes, send the segment information modified by the update
back to the processor in the +x direction, nni(2), and receive from the processor in the −x
direction, nni(1).

When all the necessary information from the neighboring boxes has been collected, the segment
positions in the boundary boxes are updated. The segment motion in the boundary boxes also modifies
the segment configuration in the received boxes. In order to properly synchronize this modification,
the information of the received boxes is then repacked and sent back to the original processor. This
completes the update of the boundary boxes in the +x direction; the updates in the −x, ±y and ±z
directions are completed likewise.
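The packing and unpacking of the integer buffer can be sketched as a round trip. The record layout is hypothetical: each segment is reduced to a fixed number of integers, 11 being the factor that appears in the count of the MPI_ISEND example, and plain Python lists stand in for the buffers bufsi and bufri. What the sketch shows is that the receiver can rebuild the records only because it holds the same ibwork list as the sender.

```python
# Sketch of packing segment data into a one-dimensional buffer and unpacking it
# on the receiving side. The record layout is hypothetical: each segment is
# reduced to a fixed number of integers (11 in the MPI_ISEND example above).

NINT = 11  # integers per segment, taken from the count ibcnt(p)*11

def pack(ibwork, segment_data):
    """Concatenate the integer records of the listed segments."""
    buffer = []
    for label in ibwork:
        record = segment_data[label]
        assert len(record) == NINT
        buffer.extend(record)
    return buffer

def unpack(ibwork, buffer):
    """Inverse of pack: both sides must hold the same ibwork list."""
    return {label: buffer[k * NINT:(k + 1) * NINT]
            for k, label in enumerate(ibwork)}

# Two segments with made-up integer records
segment_data = {4: list(range(11)), 9: list(range(100, 111))}
ibwork = [9, 4]                       # synchronized list of segments to send
received = unpack(ibwork, pack(ibwork, segment_data))
```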

Corner boxes 

Once all the boundary boxes are updated, the segment information of each processor is sent to one
processor (the Master), and the segments in the corner boxes are treated by the Master processor
only.

After the motion of the segments in the corner boxes is finished, only the Master processor contains
the final configuration of the segments for the current time step. Before running the next time step,
the information concerning the dislocation segments is sent from the Master processor to all the
processors.


Figure 3.14: The overall flow chart of the new parallel DDD code. The chart lists, in order:
Initialization of parallel environments; Discretization of the segments; Linked-lists of the
segments; Computation of the long-distance stresses; Computation of the short-distance stresses;
Motion of the segments (Inner boxes, Boundary boxes, Corner boxes); Update external stresses and save
outputs. The markers (1), (2) and (3) and the message passing operations Send/Receive, Gather and
Broadcast are indicated on the chart.

3.3.6 Summary and comments 

The overall flow chart of the new parallel DDD code is shown in Fig. 3.14. The 'Motion of segments'
step is composed of three parts, which correspond to the inner, boundary and corner boxes. The
message passing operations are also indicated.

It should be noted that all the processors begin each time step with the same segment information
(marked as '(1)' in Fig. 3.14), although this is not stated explicitly in the previous sections.
After the segment discretization, each processor computes, and thus alters, its local segments' data
independently up to the 'Inner boxes' step ('(2)'). While updating the information in the boundary
boxes, two adjacent processors mutually send and receive data; each processor then sends its local
data to one processor (indicated as 'Gather' in Fig. 3.14). The Master processor then updates all the
information in the corner boxes and broadcasts the data to the other processors (indicated as
'Broadcast' in Fig. 3.14). Hence all the processors share the same segment information again.

Since every processor stores the information of all the segments, the present parallel version
brings no gain in memory usage when several processors are used. The parallel code can be further
improved by decomposing the data space, i.e. by making each processor use only the necessary and
sufficient amount of memory. This would save memory space for parallel computation, and would also
decrease the communication overhead and eventually increase the performance of the code.


3.4 Performance improvement

3.4.1 Measure of performance 

The performance of the new parallel program has to be measured in terms of the gain in elapsed time.
The following measure is often used,

$$\mathrm{Speedup}(P) = \frac{t_0}{t_p} \qquad (3.4)$$

where t0 is the elapsed time of the serial program and tp is that of the parallel program using P
processors. The speedup indicates the practical advantage of using the parallel program instead of
the serial program. Alternatively, t0 can be replaced by the elapsed time of the parallel program run
with one processor. The speedup then shows the advantage of using several processors, because both
the numerator and the denominator contain the overhead for initializing a parallel environment. This
is often called the algorithmic speedup ratio. The speedup results presented in the following section
are measured using the algorithmic speedup ratio, because it is difficult to compare directly the
serial and the parallel DDD programs, which are compiled using different compilers and run on
different platforms.

The efficiency of a parallel program is a measure of the effectiveness of the hardware usage. The
efficiency is defined as the ratio of the speedup on P processors to P, that is, Speedup(P)/P. An
efficiency close to 1 indicates excellent scalability.

3.4.2 Conditions for good performance 

In the ideal case, the speedup is a linear function of the number of processors P, i.e.
Speedup(P) = P. This case is hardly achievable in practice because of the limited fraction of an
algorithm that can be parallelized, the communication overhead and the load imbalance.

Suppose that a fraction fp of a serial program can be parallelized and that the remaining 1 − fp
cannot. Assuming perfect load balancing and no communication overhead, Speedup(P) can be written as

$$\mathrm{Speedup}(P) = \frac{1}{(1 - f_p) + f_p / P} \qquad (3.5)$$

Eq. 3.5 is plotted in Fig. 3.15 for fp = 1, 0.99, 0.9 and 0.5. The ideal case (Speedup(P) = P) is
only possible if fp equals 1, and the speedup cannot exceed 10 when fp = 0.9, however many processors
are used.

Figure 3.15: Ideal speedup of a program with the number of processors when only a fraction fp of the
program is parallelized. Axes: Speedup (0 to 50) versus Number of CPUs (5 to 50); curves for
fp = 1.00, 0.99, 0.90 and 0.50.

A parallel program involves a communication overhead to send and receive data, which does not exist
in a serial program. In general situations, the performance of a parallel program is further degraded
because the load is not perfectly balanced among the different processors. The performance of general
parallel programs is illustrated in Fig. 3.16. It is assumed that only 80% of a serial program is
parallelized, and the effects of the communication overhead and of the load imbalance are shown as
well.

From the figure, it is clear that good performance requires good load balancing among the processors, a minimal communication overhead and a large parallelizable fraction $f_p$ of the serial program. Note that the communication overhead can be decreased both by minimizing the amount of communication (good algorithm) and by using a fast network (good hardware).
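As an illustration, Eq. 3.5 and the efficiency Speedup(P)/P can be evaluated with a few lines of Python. This is only a sketch and not part of the DDD code: the function names amdahl_speedup, efficiency and speedup_limit are hypothetical, and the model assumes, as Eq. 3.5 does, perfect load balancing and no communication overhead.

```python
def amdahl_speedup(f_p: float, n_proc: int) -> float:
    """Eq. 3.5: speedup on n_proc processors when a fraction f_p is parallelized."""
    if not 0.0 <= f_p <= 1.0:
        raise ValueError("f_p must lie in [0, 1]")
    if n_proc < 1:
        raise ValueError("n_proc must be at least 1")
    return 1.0 / ((1.0 - f_p) + f_p / n_proc)


def efficiency(speedup: float, n_proc: int) -> float:
    """Efficiency as defined in the text: Speedup(P) / P."""
    return speedup / n_proc


def speedup_limit(f_p: float) -> float:
    """Limit of Eq. 3.5 for an unbounded number of processors: 1 / (1 - f_p)."""
    return float("inf") if f_p == 1.0 else 1.0 / (1.0 - f_p)


if __name__ == "__main__":
    for f_p in (1.0, 0.99, 0.9, 0.5):  # the four curves of Fig. 3.15
        s = amdahl_speedup(f_p, 50)
        print(f"f_p={f_p:4.2f}  speedup(50)={s:6.2f}  efficiency={efficiency(s, 50):.3f}")
```

The limit function makes the saturation explicit: for $f_p$ = 0.9 the serial 10% of the work bounds the speedup by 10 regardless of the number of processors.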

3.4.3 Performance tests 

A simple speedup model of our algorithm is built and compared to the actual timing results. It is assumed that the simulation volume is decomposed into $M \times M \times M$ boxes and that the total number of processors used is $P$, the boxes being divided into a 2D array of $P^{1/2} \times P^{1/2} \times 1$ processors or into a 3D array of $P^{1/3} \times P^{1/3} \times P^{1/3}$ processors. The elapsed time using a single processor is approximately the sum of the time needed for the stress computation ($t^s_{\mathrm{stress}}$) and that used to update the positions ($t^s_{\mathrm{update}}$). Assuming that $t^s_{\mathrm{update}}$ is a fraction of $t^s_{\mathrm{stress}}$ ($t^s_{\mathrm{update}} = \alpha\, t^s_{\mathrm{stress}}$), the total elapsed time $t^s$ is then written as Eq.(3.6).

\[ t^s = t^s_{\mathrm{stress}} + t^s_{\mathrm{update}} = (1 + \alpha)\, t^s_{\mathrm{stress}} \qquad (3.6) \]

Figure 3.16: Load imbalance and communication overhead of general parallel programs (schematic comparing a serial run with a parallel run on four CPUs; legend: unparallelizable part, parallelizable part, communications, load imbalance)

The number of inner boxes (BI) and boundary boxes (BB) of each processor and the total number of corner boxes (BC) can be expressed using M and P as Eq.(3.7) in the case of a 2D array and as Eq.(3.8) in the case of a 3D array of processors. It is assumed that every processor has the same number of boxes in its subsystem.

\[ BI = M \left( \frac{M}{P^{1/2}} - 2 \right)^{2} ; \quad BB = 4M \left( \frac{M}{P^{1/2}} - 2 \right) ; \quad BC = 4MP \qquad (3.7) \]

\[ BI = \left( \frac{M}{P^{1/3}} - 2 \right)^{3} ; \quad BB = 6 \left( \frac{M}{P^{1/3}} - 2 \right)^{2} ; \quad BC = P \left( 12\, \frac{M}{P^{1/3}} - 16 \right) \qquad (3.8) \]

If dislocation segments are homogeneously distributed over all the processors, the elapsed time for the stress computation of each processor ($t^p_{\mathrm{stress}}$) is merely $t^s_{\mathrm{stress}}$ divided by $P$. Considering that the elapsed time for updating the segment positions of one box is $t^s_{\mathrm{update}}/M^3$, the elapsed time of a processor ($t^p$) for both the stress computation and the segment motion can be expressed as Eq.(3.9).

\[ t^p = t^p_{\mathrm{stress}} + t^p_{\mathrm{update}} = \frac{t^s_{\mathrm{stress}}}{P} + \frac{t^s_{\mathrm{update}}}{M^3} \left( BI + BB + BC \right) + t_c \qquad (3.9) \]

The elapsed time for updating BC is included on each processor, because every processor waits until the updates of BC by the Master processor are finished. $t_c$ represents the time needed for message passing.

Figure 3.17: Speedup model of the algorithm (Eq.(3.9)) with M = 21, $t^s_{\mathrm{update}} = 0.02\, t^s_{\mathrm{stress}}$ for a 2D array of processors (2D) and a 3D array of processors (3D) (curves: ideal case, and 2D and 3D arrays with $t_c = 0$ and $t_c = 0.02\, t^s_{\mathrm{stress}}$)

The speedup ($t^s/t^p$) is plotted in Fig. 3.17 using Eq.(3.9) with M = 21, α = 0.02 and $t_c/t^s_{\mathrm{stress}}$ = 0 and 0.02. The curve is drawn up to P = 49 in the case of the 2D array of processors: a maximum of 49 processors can be used with M = 21 in the 2D array, since there should be at least three boxes along any coordinate axis.

The speedup of the algorithm depends strongly on the network speed. If the network is fast enough ($t_c \approx 0$), the speedup of the algorithm can be as high as 23 using 25 processors with the 2D array of processors.
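The speedup model of Eqs. (3.6) to (3.9) can be sketched in Python as follows. The function names box_counts and model_speedup are hypothetical; all times are expressed in units of the serial stress computation time, so that the stress term is 1/P, the update term is α(BI + BB + BC)/M³ and t_comm stands for the ratio of $t_c$ to the serial stress time. The check on three boxes per axis reflects the constraint stated in the text.

```python
import math


def box_counts(m: int, n_proc: int, layout: str = "2d"):
    """Return (BI, BB, BC) following Eq. 3.7 ('2d') or Eq. 3.8 ('3d')."""
    if layout == "2d":
        n = m / math.sqrt(n_proc)      # boxes per processor along x and y
        counts = ((n - 2.0) ** 2 * m, 4.0 * m * (n - 2.0), 4.0 * m * n_proc)
    elif layout == "3d":
        n = m / n_proc ** (1.0 / 3.0)  # boxes per processor along each axis
        counts = ((n - 2.0) ** 3, 6.0 * (n - 2.0) ** 2, n_proc * (12.0 * n - 16.0))
    else:
        raise ValueError("layout must be '2d' or '3d'")
    if n < 3.0 - 1e-9:
        raise ValueError("each subsystem needs at least three boxes per axis")
    return counts


def model_speedup(m, n_proc, layout="2d", alpha=0.02, t_comm=0.0):
    """t_s / t_p from Eqs. 3.6 and 3.9, with all times in units of the serial stress time."""
    inner, boundary, corner = box_counts(m, n_proc, layout)
    t_serial = 1.0 + alpha
    t_parallel = 1.0 / n_proc + alpha * (inner + boundary + corner) / m ** 3 + t_comm
    return t_serial / t_parallel


if __name__ == "__main__":
    for p in (4, 9, 16, 25, 36, 49):
        print(p, round(model_speedup(21, p, "2d"), 2),
              round(model_speedup(21, p, "2d", t_comm=0.02), 2))
```

With M = 21, α = 0.02 and no communication time, the sketch gives a speedup of about 22.6 for 25 processors in a 2D array, in line with the value of 23 quoted above.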

The 3D array of processors seems to have an advantage over the 2D array when the same number of processors is used. In reality this is debatable, because a 3D array of processors involves more message passing than a 2D array: a 3D array needs messages to be passed along all three coordinates, whereas a 2D array needs them along only two. The size of each message, however, is smaller in the case of a 3D array of processors.

Dislocation structures with 13185, 37182, 57605 and 77198 segments are extracted from a simple tensile test of a single crystal with M = 20. The execution time for 100 steps with zero applied stress is then measured, and the elapsed time per step is obtained by dividing the execution time by 100. Fig. 3.18 shows the average elapsed time required to complete one time step using up to 36 processors in a 2D array of processors on the IBM p690 architecture with 1.7GHz POWER4 processors (see footnote 13). Fig. 3.19 shows the speedup for each number of processors and compares the actual data with the speedup model. The measured data agree well with the model except in the 13185 segments case. The decrease of the speedup in the 13185 segments case for large values of P arises because the ratio of the computation time to the communication time decreases with the number of processors.

Figure 3.18: Elapsed time per step, in seconds, as a function of the number of processors for 13185, 37182, 57605 and 77198 segments.

3.4.4 Load balancing 

As pointed out in Sec. 3.4.2, good load balancing is crucial to achieve high performance in a parallel computation. In many cases, DDD simulations involve highly heterogeneous dislocation structures. An example is the formation of intense slip bands in fatigue simulations as shown in Fig. 3.21 (see Sec. 4.3). Fig. 3.21 shows the worst case for load balancing, due to the inherently highly heterogeneous dislocation microstructure in fatigue and the geometry of the simulation volume.

13 The author would like to acknowledge the support from KISTI (Korea Institute of Science and Technology Information) under ’the 5th Strategic Supercomputing Applications Support Program’ with Dr. Sangmin LEE as the technical supporter. The use of the computing system of the Supercomputing Center is also greatly appreciated.

Figure 3.19: Speedup by using P processors in 2D array for 13185, 37182, 57605 and 77198 segments (on IBM p690 architecture); the measured data are plotted together with the ideal case and the model curve for $t_c = 0.015\, t^s_{\mathrm{stress}}$

Figure 3.20: Efficiency by using P processors in 2D array for 13185, 37182, 57605 and 77198 segments (on IBM p690 architecture)

Figure 3.21: An example taken from fatigue tests of a cylindrical simulation volume containing particles of bimodal size distribution (see Sec. 4.3): (a) intense slip bands of the fatigue-tested volume containing bimodal-sized particles; (b) decomposition of the simulation volume by 3 × 3 × 1 processors (numbered 0 to 8 in the [100]-[010] plane). Load is highly unbalanced among processors due to the highly heterogeneous dislocation microstructure

If the simulation volume is decomposed into equal-sized subsystems as shown in Fig. 3.21(b), the number of segments, and consequently the computation time, differs widely between the processors. A load-balancing method is thus highly desirable to even out the processor loads.

One obvious way to better balance the loads is to shift the boundaries of each subsystem, i.e. the array ibs, so that each processor has approximately the same number of segments, since the computation time is usually proportional to the number of segments. In surface grain simulations, however, the number of segments may not be a good yardstick because some segments are treated as virtual ones and need no internal stress computation, as will be detailed in Sec. 4.3. Hence the actual elapsed time for stress computations is taken as the indicator for load balancing.

The elapsed time is measured by using, for example, the MPI timer function MPI_WTIME(). Load balancing is performed every fbalan steps. To minimize the overhead of load balancing, the processor with the minimum elapsed time is determined one step before the load balancing. This processor then takes charge of shifting the boundaries, so that the overhead of load balancing is hidden in the overall stress computation.


Figure 3.22: Shifting of subsystem boundaries to balance load among processors (initial and current boundaries shown in the simulation volume)

The elapsed time for the stress computation of each processor is gathered on the processor in charge. For each of the x, y and z axes of the processor array, this processor sums the elapsed times of the processors belonging to the same column, and calculates the average elapsed time of the columns. By comparing the elapsed time of a column to this average, the boundary is shifted by one box so that the size of the column increases or decreases. A boundary can move until the number of boxes reaches the minimum number of boxes of a subsystem along any axis (3 boxes).
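The boundary-shifting step described above can be sketched for one axis of the processor array as follows. This is a simplified illustration and not the routine of the DDD code: the names column_sums and shift_boundaries are hypothetical, and the rule used here (a column slower than the average gives one box to a neighbouring column that is not slower than the average) is an assumption about how the comparison with the average time is turned into a shift.

```python
def column_sums(proc_times):
    """Sum the elapsed times of the processors in each column.

    proc_times[j][i] is the stress computation time of the processor
    in row j and column i of a 2D processor array.
    """
    n_cols = len(proc_times[0])
    return [sum(row[i] for row in proc_times) for i in range(n_cols)]


def shift_boundaries(widths, column_times, min_boxes=3):
    """Return new column widths after moving each internal boundary by at most one box.

    widths[i] is the number of boxes of column i along the axis considered;
    the total number of boxes is conserved.
    """
    if len(widths) != len(column_times):
        raise ValueError("one elapsed time is needed per column")
    new_widths = list(widths)
    mean_time = sum(column_times) / len(column_times)
    for i in range(len(widths) - 1):
        left_slow = column_times[i] > mean_time
        right_slow = column_times[i + 1] > mean_time
        if left_slow and not right_slow and new_widths[i] > min_boxes:
            new_widths[i] -= 1          # shrink the overloaded left column
            new_widths[i + 1] += 1
        elif right_slow and not left_slow and new_widths[i + 1] > min_boxes:
            new_widths[i + 1] -= 1      # shrink the overloaded right column
            new_widths[i] += 1
    return new_widths


if __name__ == "__main__":
    times = column_sums([[4.0, 1.0, 1.5], [6.0, 1.0, 1.5]])
    print(shift_boundaries([7, 7, 7], times))
```

Because a boundary moves by a single box per call, several balancing steps are needed before a strongly localized load is spread out, and a column already at three boxes cannot shrink further.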

The boundary adjustment procedure during a fatigue test is shown in Fig. 3.22. Dislocations in 

the top-right part of the cubic simulation volume are ’virtual’, so less computation time is needed 

to treat them. The boundaries of each subsystem thus move toward bottom-left of the simulation 

volume until they share a comparable number of real dislocation segments. 

Load balancing with parallelepiped subsystems has the following limitations: (i) The different subsystems in the same column must have the same width, so the computing load is balanced among the columns, not among the processors. (ii) There must be at least three boxes along each axis of each subsystem, so a load concentration smaller than three boxes cannot be balanced further.

During simulations, the number of segments can change dramatically. It is not unusual that an initial Frank-Read source produces millions of segments. When the number of segments is small, the efficiency of the new parallel DDD program decreases quickly with the number of processors (Fig. 3.20), and the speedup can even drop when more processors are used (Fig. 3.19). One way to guarantee a maximum efficiency and to prevent this inverse speedup would be to change the number of processors dynamically based on the current number of segments. This can be done, for example, by creating a new communicator of n ∈ [1 : N] processors within the initial communicator of N processors.

3.4.5 Comparison of simulation results between the serial and parallel DDD code

The simulation results of a parallel program should not be significantly different from those of a serial program when addressing the same problem. There could be a slight difference, however, because parallelization may change the order of computations.

In DDD simulations, the segments are moved sequentially, and two different orderings of the segments can result in different dislocation configurations even though the applied stresses are the same. Nevertheless, the overall stress-strain relation and the dislocation density in the simulation should be consistent between the parallel and the serial DDD programs.

Fig. 3.23(a) shows the stress-strain curves of a simple tensile test along the [001] direction of a single crystal. The curve of the parallel program is consistent with that obtained using the serial program. The number of dislocation segments differs slightly between the two programs, but the difference is negligible compared to the overall evolution of the dislocations (Fig. 3.23(b)).

3.5 Application to Stage I-II transition simulation 

In this section, taking advantage of the performance of the new parallel DDD code, an attempt is made to simulate the transition from Stage I to Stage II in the stress-strain curves of single-crystal FCC metals subjected to uniaxial tension.

3.5.1 Stress-strain curves of FCC single crystals 

In the general case, when an FCC single crystal is subjected to a tensile test, the stress-strain curve exhibits three distinct stages, I, II and III. Fig. 3.24 shows stress-strain curves from experiments on copper crystals covering a wide range of orientations. Stage I or ’easy glide’ is a region of low linear hardening ($\theta = \partial\tau/\partial\gamma \sim G/300$) and is observed at the beginning of deformation. Stage II or ’linear hardening’ is a second linear region with a much greater rate of work hardening ($\theta \sim G/30$). It is followed by Stage III or ’parabolic hardening’, in which the hardening rate decreases. Fig. 3.24 also shows that the shape of the curves is strongly dependent on the crystal orientation, e.g. the orientations close to the [001] − [¯111] side show a short or no Stage I, whereas the orientations far from the boundaries of the standard triangle show a long Stage I.

Figure 3.23: Comparison of (a) stress-strain curves (stress in MPa versus strain) and (b) number of segments (versus step number) of the serial and the parallel DDD program in a tensile test simulation

The slip system on which the resolved shear stress is the highest is called the primary system, and the deformation commences on this system with a low work hardening rate (Stage I). The accumulated slip rotates the orientation of the crystal, and the resolved shear stress on the different slip systems is subsequently modified as the slip direction rotates towards the tension axis. When the tensile axis reaches the [001] − [¯111] side, another slip system (the conjugate system), which was initially inactive, becomes active, and the interactions between the two slip systems initiate Stage II.

3.5.2 Simulation conditions 

The initial dislocation configuration is made of 5.65 µm-long Frank-Read sources homogeneously spread over the 12 slip systems (see Fig. 3.25(a) and footnote 14). An orthorhombic simulation box is used with the ratio of the axis lengths close to 40 : 30 : 31, and periodic boundary conditions are applied along all the axes. The simulation volume is around $577\ \mu\mathrm{m}^3$ and the initial dislocation density is $8.82 \times 10^{11}\ \mathrm{m}^{-2}$ (90 sources).

14 Different colors represent dislocations on different slip systems

Figure 3.24: Resolved shear stress/shear strain curves of copper crystals as a function of orientation ([Diehl 56])

The material parameters of copper are used: G = 42000 MPa (shear modulus), ν = 0.324 (Poisson’s ratio), b = 2.56 Å (Burgers vector magnitude), $B = 10^{-5}$ Pa s (viscous drag coefficient), $V/b^3 = 350$ (activation volume) and $\tau_{III}$ = 32 MPa (threshold stress).
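As a quick consistency check of the quoted initial density, the total source length divided by the simulation volume can be computed directly. The helper name source_density is hypothetical, and the check assumes that all 90 sources have the same length of 5.65 µm and that the volume is exactly 577 µm³, whereas the text only gives it as approximate.

```python
def source_density(n_sources: int, source_length_um: float, volume_um3: float) -> float:
    """Dislocation density in m^-2 from the total line length per unit volume."""
    density_per_um2 = n_sources * source_length_um / volume_um3
    return density_per_um2 * 1.0e12  # 1 um^-2 = 1e12 m^-2


if __name__ == "__main__":
    print(f"{source_density(90, 5.65, 577.0):.3e} m^-2")
```

The result is close to the quoted $8.82 \times 10^{11}\ \mathrm{m}^{-2}$; the small difference is compatible with the rounding of the simulation volume.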

The initial tensile axis (T) has been chosen as [¯14 15 25], so that the initial configuration is in single glide close to the double glide axis. The resulting Schmid factors are shown in Fig. 3.25(b). The primary system is B4: (111)[¯101] and the conjugate system is C1: (¯1¯11)[011] (the notations of Schmid and Boas are recalled in Table 2.1). The simulation runs under a constant strain rate of $\dot{\varepsilon} = 100\ \mathrm{s}^{-1}$. The rotation tensor of the crystal is computed at every time step using Eq. 3.10, and the new tensile stress axis (T′) is updated as Eq. 3.11.

\[ dW = \sum_{s=1}^{12} \frac{1}{2} \left( m^s \otimes n^s - n^s \otimes m^s \right) \qquad (3.10) \]

\[ T' = (I + dW)\, T \qquad (3.11) \]

This way of updating the orientation of the tensile axis reproduces the experimental tensile tests 

where the crossheads of the tensile machine are unconstrained, i.e. the rotations of the crosshead 

are allowed.
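The axis update of Eqs. 3.10 and 3.11 can be sketched in plain Python as follows. Several points are assumptions made for this illustration and are not stated in the equations as printed: each slip system term is weighted by its slip increment during the step (d_gamma), m is taken as the unit slip direction and n as the unit slip-plane normal. The function names are hypothetical.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def spin_increment(systems, d_gamma):
    """dW = sum_s 0.5 * d_gamma_s * (m_s (x) n_s - n_s (x) m_s).

    systems is a list of (slip_direction, plane_normal) pairs; the weighting
    by the slip increment d_gamma_s is an assumption of this sketch.
    """
    dw = [[0.0] * 3 for _ in range(3)]
    for (m, n), dg in zip(systems, d_gamma):
        m, n = unit(m), unit(n)
        for i in range(3):
            for j in range(3):
                dw[i][j] += 0.5 * dg * (m[i] * n[j] - n[i] * m[j])
    return dw


def update_axis(t_axis, dw):
    """Eq. 3.11: T' = (I + dW) T."""
    return [t_axis[i] + sum(dw[i][j] * t_axis[j] for j in range(3)) for i in range(3)]


def angle_deg(a, b):
    """Angle between two vectors, in degrees."""
    a, b = unit(a), unit(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))


if __name__ == "__main__":
    b4 = ((-1, 0, 1), (1, 1, 1))     # slip direction and plane normal of B4
    t0 = [-14.0, 15.0, 25.0]
    t1 = update_axis(t0, spin_increment([b4], [0.01]))
    print(angle_deg(t0, b4[0]), angle_deg(t1, b4[0]))
```

With slip on the primary system only, the angle between the tensile axis and the slip direction [¯101] decreases, which is the rotation sense described in Sec. 3.5.1.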


Figure 3.25: Initial conditions of the simulation: (a) initial dislocation configuration; (b) initial distribution of the Schmid factors on the 12 slip systems (B4, D4, D1, C1, B5, C5, D6, A6, A2, B2, C3, A3)

3.5.3 Simulation results

In Fig. 3.26(a), the tensile stress-strain curve is plotted. The curve shows no significant hardening up to a strain level of 1.3%. The dislocation configuration at a cumulated tensile strain of about 1% is shown in Fig. 3.27. The dislocation structure shows that mainly the dislocations of the primary system are activated, so the simulation is still in Stage I at this point.

The stress-strain curve shows insignificant or even negative hardening after about 0.5% cumulated strain. This phenomenon seems to be related to the enhanced cross slip of the primary dislocations due to the spurious dipoles generated by the periodic boundary conditions. Indeed, the evolution of the dislocation densities plotted in Fig. 3.26(b) shows that the density of the cross-slipped dislocations of the primary system is significant even though the Schmid factor and the observed shear strain on the deviate (or cross-slip) system are negligible.

However, the rotation of the tensile axis is well accounted for in the simulation. Fig. 3.28(a) shows the rotation of the stress axis plotted within the standard stereographic triangle. The stress axis rotates toward the [001] − [¯111] boundary. Consequently, the Schmid factors are modified and the ratio of the resolved shear stress on the conjugate system to that on the primary system, τ(C1)/τ(B4), increases toward 1 as plotted in Fig. 3.28(b).
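The Schmid factors of the primary and conjugate systems, and their ratio, can be checked with a short Python sketch. The function name schmid_factor and the tuples B4 and C1 are hypothetical names introduced for this illustration; the slip systems themselves are those given in Sec. 3.5.2, and the Schmid factor is computed in the usual way as the product of the cosines of the angles between the tensile axis and the slip-plane normal and slip direction.

```python
import math


def schmid_factor(axis, plane_normal, slip_dir):
    """|cos(phi) * cos(lambda)| for uniaxial loading along `axis`."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def norm(a):
        return math.sqrt(dot(a, a))

    cos_phi = dot(axis, plane_normal) / (norm(axis) * norm(plane_normal))
    cos_lam = dot(axis, slip_dir) / (norm(axis) * norm(slip_dir))
    return abs(cos_phi * cos_lam)


# (plane normal, slip direction) of the primary and conjugate systems
B4 = ((1, 1, 1), (-1, 0, 1))
C1 = ((-1, -1, 1), (0, 1, 1))

if __name__ == "__main__":
    axis = (-14, 15, 25)                       # initial tensile axis
    sf_b4 = schmid_factor(axis, *B4)
    sf_c1 = schmid_factor(axis, *C1)
    print(f"B4: {sf_b4:.4f}  C1: {sf_c1:.4f}  ratio C1/B4: {sf_c1 / sf_b4:.4f}")
```

For the initial axis the ratio is about 0.947, which matches the starting value on the vertical axis of Fig. 3.28(b), and it equals 1 for any axis lying on the [001] − [¯111] boundary.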

Despite the spurious softening, the shear stress-strain curves of the primary and the conjugate system (see Fig. 3.29) demonstrate that the hardening is decreased in the primary system due to the rotation of the axis, whereas the hardening is more pronounced in the conjugate system.

This typical simulation of the behavior of a bulk copper crystal is a first attempt at a massive simulation using the new parallel DDD code. The cumulated shear strain on the primary system reaches 3% and the number of segments at the end of the simulation is close to 43,000. More investigations are needed to understand the origin of the softening observed during Stage I. The transition to Stage II should nevertheless be observed within a few more steps, when the Schmid factor on the conjugate system (C1) becomes high enough to multiply its dislocation density and hinder the dislocation motion on the primary system (B4).

Figure 3.26: The tensile stress-strain curve and the evolution of the dislocation densities: (a) tensile stress-strain curve ($\sigma_{11}$ versus $\varepsilon_{11}$); (b) evolution of the dislocation densities ($\rho$ in $10^{11}\ \mathrm{m}^{-2}$ versus $\varepsilon_{11}$) on the 12 slip systems

Figure 3.27: The dislocation configuration at a cumulated tensile strain of about 1%

Figure 3.28: The rotation of the tensile axis and the modified Schmid factors: (a) rotation of the tensile axis from [¯14 15 25] within the standard triangle [001], [011], [¯111]; (b) ratio between τ(C1) and τ(B4) as a function of the number of steps

Figure 3.29: The shear stress-strain curves of the primary and the conjugate system: (a) primary system (τ(B4) in MPa versus γ(B4)); (b) conjugate system (τ(D4) in MPa versus γ(D4))


Key points

• Parallel models and languages depend strongly on the type of parallel computer. For shared memory machines, the fork-join model is usually applied using OpenMP or Pthreads. The message passing model is adequate for distributed memory architectures, using MPI as the programming interface.

• A distributed memory system and MPI have been chosen here for the development of a parallel DDD code. Handling the dislocation interactions during the segment motion is the most complex part of the parallelization. Attention is focused on the fact that the 3D array of boxes physically dividing the cubic simulation volume is similar to a matrix in the computer memory space.

• The cubic simulation volume is decomposed into parallelepiped subsystems which are mapped to processors. The internal stress computation involves small modifications of the serial DDD code. The segment motion algorithms have been developed for the different box types, categorized into inner, boundary and corner boxes. Attention is paid to avoiding any overlap of the neighboring boxes between the processors.

• The performance of the new parallel DDD program is measured and compared to a simple speedup model. The boundaries of each parallelepiped subsystem are dynamically moved to balance the computing load among the processors. Simulation results of the parallel DDD version correspond well to those obtained with the serial version. A speedup of around 17 is found when using 25 CPUs to handle more than 30,000 segments.

• The new parallel code has been applied to simulate the Stage I-II transition of single-crystal FCC metals subjected to uniaxial tension. The simulation has so far run up to 3.2% cumulated shear strain and still remains in Stage I. However, the rotation of the axis is well accounted for, as measured by the evolution of the Schmid factors. The softening observed in the mechanical response is attributed to an avalanche of cross-slip events which may be induced by the periodic boundary conditions.

Chapter 4 

Dislocation-precipitate interactions 

4.1 Image stresses due to a 3D particle 

4.1.1 Motivations and review of the literature 

Image forces need to be considered when one wants to study the behavior of dislocations near a free surface. Another level of complexity arises when there are internal interfaces in metals, such as voids, second-phase particles and microcracks. At the very least, the magnitude of the interaction forces is required to understand the dislocation behavior around these internal interfaces. Many studies have sought to obtain this interaction both analytically and numerically.

Free surfaces 

The effect of a free surface can be easily treated in 2D by introducing the mirror images so that the 

traction on the free surface would be forced to zero. The problem is more complex in 3D because it is 

almost impossible to find analytically the image dislocation for a finite dislocation segment that is not 

parallel to the free surface. This problem can be solved using either the solution of the Boussinesq 

problem ([Fivel et al. 96]) or the superposition principles using FEM ([Fivel et al. 98]). The 

main idea of this method is to apply point forces on the free surface so that these forces nullify the 

surface stress field generated by a dislocation in an infinite medium on the free surface. Using this 

method, the dislocation depletion near a free surface can be computed and the direct comparison 

can be made between the dislocation structure calculated in dislocation dynamics simulation and 

the experimental observations, e.g. TEM (Transmission Electron Microscopy). 

The dislocation-free zone near a crack tip ([Kobashi & Ohr 80]) and the plastic zone yielding from


a crack ([Vitek 75]) are other fields where the image force acting on a dislocation is an important factor to consider. There are several analytical solutions for the interaction of a dislocation line with a hole or a rigid inclusion. To reduce the problem to a simple 2D case, it is generally assumed that an infinitely long dislocation line is interacting with an infinitely long cylindrical inclusion (hole or rigid) ([Santare & Keer 86], [Zhou & Lung 88], [Chen et al. 99]). The authors used a complex potential approach under plane strain conditions for an isotropic medium, and obtained the elastic solution satisfying the continuity of stresses and displacements at the interface. The application of these 2D solutions is rather limited to the case of fiber-strengthened composites or microcracks with a large aspect ratio.

Particles 

Even for a simple geometrical shape like a spherical particle, the calculation of the interaction force between a dislocation line and a particle satisfying the stress and displacement continuity across the interface is not an easy task. The elastic problem with the rigorous boundary conditions is too complex to solve analytically. Instead of the exact solution, approximate solutions have been obtained for 3D particle shapes. One method consists of using the interaction energy between a dislocation line and a particle. It is assumed that the interaction energy is equal to the change of the energy density of a dislocation line caused by the presence of the particle volume. The interaction force, or image force, is obtained by differentiating the interaction energy. This approach implies that the ratio of the image forces acting on an edge dislocation and on a screw dislocation is equal to the ratio of the energy densities of dislocations of the respective types. Using this method, an analytical equation is obtained for the force acting on a screw dislocation line near a cubical particle ([Melander & Persson 78]), and the forces on an edge and a screw dislocation near a spherical particle are calculated numerically ([Nembach 83]). The long-range interaction between a screw dislocation and a spherical inclusion has been treated assuming that a straight dislocation line is located far from the spherical particle, so that the particle disturbs the uniform stress field ([Weeks et al. 69], [Comninou & Dundurs 72]).

The interaction force due to a second-phase particle with an elastic modulus mismatch is considered negligible compared with the effects of the lattice mismatch and the stacking fault energy mismatch in the case of a penetrable particle ([Nembach 97]). The interaction forces due to an elastic modulus mismatch, however, increase with the number of dislocations around the particles (see footnote 1). The image stresses due to particles could thus be an appreciable factor in the computation of the energy state of dislocation structures around a particle and in phenomena involving several dislocations, such as the work hardening rate. Moreover, in dispersion-strengthened alloys at high temperature, the interaction force on a single dislocation line in the climb direction is essential to investigate high temperature properties, for example creep threshold stresses ([Marquis & Dunand 02]).

1 This type of interaction is referred to as the paraelastic interaction ([Nembach 97]).

Figure 4.1: Computation geometries of (a) a cylindrical particle and (b) a spherical particle (axes X: (1-10), Y: (111), Z: (-1-12); in (b), glide planes at y = 0, y = 0.5Rp and y = 0.86Rp)

Scope of this section 

Image forces on a long, straight dislocation line near a particle are computed using the decomposition method as detailed in Sec. 2.4.2. Three cases are considered: a cylindrical (Fig. 4.1(a)), a spherical (Fig. 4.1(b)) and a cubical particle. The cylindrical particle case can be compared with analytical solutions for 2D circular particles. Image forces along both the glide and the climb directions are considered.

4.1.2 Interaction of an edge dislocation with a circular cylindrical particle 

Image forces on an edge dislocation around a rigid particle have been solved analytically in 2D by Santare and Keer ([Santare & Keer 86]) and around a void by Vitek ([Vitek 75]) and Chen et al. ([Chen et al. 99]). The analytical solutions are obtained using complex potentials.

In the case of a rigid particle, the image force on an edge dislocation projected along the glide direction can be simplified as follows ([Santare & Keer 86]):

\[ \frac{F}{\mu_m b^2 / \left( 4\pi(1-\nu)R_p \right)} = x\, \frac{4x^4 + k^2x^2 + 2k^2x^2y^2 - 3x^2 - 2kx^2y^2}{(x^2+y^2)^3\, (x^2+y^2-1)\, k} + x\, \frac{2k^2y^4 + 2ky^2 - 4y^4 - k^2y^2 + 5y^2 - 2ky^4}{(x^2+y^2)^3\, (x^2+y^2-1)\, k} \qquad (4.1) \]

with $\mu_m$ and $\nu$ being the shear modulus and the Poisson’s ratio of the matrix, $R_p$ being the radius of the particle, $x$ and $y$ being the coordinates of the dislocation in units of $R_p$, and $k = 3 - 4\nu$ for a plane strain condition.

Figure 4.2: Case of a rigid particle (the image force is normalized by $\mu_m b^2/(4\pi(1-\nu)R_p)$): (a) analytical solution [Santare & Keer 86]; (b) numerical solution (FEM/DDD). Contours are plotted in the (x/Rp, y/Rp) plane.

There are two solutions for the image forces around a circular void. They are written in Eq. 4.2 ([Chen et al. 99]) and Eq. 4.3 ([Vitek 75]).

\[ \frac{F}{\mu_m b^2 / \left( 4\pi(1-\nu)R_p \right)} = -2x\, \frac{x^6 + x^4y^2 + 4x^2y^2 - x^2y^4 + 4y^4 - 2y^2 - y^6}{(x^2+y^2)^3\, (x^2+y^2-1)} \qquad (4.2) \]

\[ \frac{F}{\mu_m b^2 / \left( 4\pi(1-\nu)R_p \right)} = -2x\, \frac{2x^4 - x^2 + 2x^2y^2 + y^2}{(x^2+y^2)^3\, (x^2+y^2-1)} \qquad (4.3) \]

Image forces are computed numerically around a cylindrical particle whose axis is along [¯1¯12], parallel to the edge dislocation. The shear modulus of the cylinder is set to $10^{3} \mu_m$ for the rigid case and $10^{-3} \mu_m$ for the void. The contours of the image forces acting on an edge dislocation and projected along the glide direction are shown in Fig. 4.2 for the case of a rigid inclusion. The forces are normalized by $\mu_m b^2/(4\pi(1-\nu)R_p)$. The image force profiles are computed on three lines with a direction of [1¯10] and a stand-off distance of 0, 0.5Rp and 0.87Rp from the plane passing through the center of the cylinder, and are plotted in Fig. 4.3 and compared with the analytical solutions. The numerical solutions fit the analytical solution (Eq. 4.1) well in the rigid particle case. The computed image forces fall between the two analytical solutions (Eq. 4.2 and Eq. 4.3) in the case of the void.

Figure 4.3: Normalized image forces on an edge dislocation situated at x/Rp from the center of a circular hole or a rigid cylinder along three lines with different stand-off distances (Rp: radius of the cylindrical inclusion). Solid lines: analytical solutions (rigid case: [Santare & Keer 86]; hole case: [Vitek 75], [Chen et al. 99]). Points: calculated by FEM/DDD

From the 2D cylindrical circular case, it is thus validated that the image forces can be computed 

correctly using the FEM-DDD coupled method. 
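For reference, the three closed-form expressions (Eqs. 4.1 to 4.3) can be evaluated with the short Python sketch below. The function names are hypothetical; x and y are assumed to be the dislocation coordinates in units of Rp, and the returned values are normalized by µm b²/(4π(1 − ν)Rp), as in Fig. 4.3. The formulas are transcribed term by term from the equations of this section and have not been checked against the cited papers.

```python
def _radius_sq(x: float, y: float) -> float:
    """Squared distance from the cylinder axis; the formulas hold outside the particle."""
    r2 = x * x + y * y
    if r2 <= 1.0:
        raise ValueError("the dislocation must lie outside the particle (x^2 + y^2 > 1)")
    return r2


def rigid_cylinder_force(x: float, y: float, nu: float) -> float:
    """Eq. 4.1: normalized glide force near a rigid cylinder, with k = 3 - 4*nu."""
    k = 3.0 - 4.0 * nu
    r2 = _radius_sq(x, y)
    term_x = 4 * x**4 + k * k * x**2 + 2 * k * k * x**2 * y**2 - 3 * x**2 - 2 * k * x**2 * y**2
    term_y = 2 * k * k * y**4 + 2 * k * y**2 - 4 * y**4 - k * k * y**2 + 5 * y**2 - 2 * k * y**4
    return x * (term_x + term_y) / (r2**3 * (r2 - 1.0) * k)


def void_force_chen(x: float, y: float) -> float:
    """Eq. 4.2: normalized glide force near a circular void."""
    r2 = _radius_sq(x, y)
    num = x**6 + x**4 * y**2 + 4 * x**2 * y**2 - x**2 * y**4 + 4 * y**4 - 2 * y**2 - y**6
    return -2.0 * x * num / (r2**3 * (r2 - 1.0))


def void_force_vitek(x: float, y: float) -> float:
    """Eq. 4.3: normalized glide force near a circular void."""
    r2 = _radius_sq(x, y)
    num = 2 * x**4 - x**2 + 2 * x**2 * y**2 + y**2
    return -2.0 * x * num / (r2**3 * (r2 - 1.0))


if __name__ == "__main__":
    for x in (1.1, 1.3, 1.5, 1.9):   # glide plane through the cylinder center (y = 0)
        print(x, rigid_cylinder_force(x, 0.0, 0.324),
              void_force_chen(x, 0.0), void_force_vitek(x, 0.0))
```

On the plane y = 0 the rigid-cylinder force is positive and both void expressions are negative, consistent with the signs of the curves in Fig. 4.3.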

4.1.3 Interaction of an edge dislocation with a spherical particle 

There is no existing analytical solution of image forces in the case of a spherical particle. However, it 

is common to encounter a precipitate or an inclusion of a spherical shape. Thus, numerical solutions 

of the spherical particle case would be useful in practice. 

Before computing image forces, both the convergence and the accuracy of the numerical solutions have been addressed. Using 20-node 3D elements, it was verified that the numerical solutions converge as the number of elements increases, and the accuracy of the solutions was checked using the isobands method ([Bathe 96]). A mesh of 6656 20-node elements is found to represent the high stress gradients correctly, although the 3D spherical particle problem involves rough and discontinuous point load distributions on the elements of the particle volume.

Figure 4.4: Normalized image forces on an edge dislocation situated at x/Rp from the center of a spherical rigid particle (Rp: radius of the spherical particle). Solid lines: analytical solution for a cylindrical rigid inclusion [Santare & Keer 86]. Points: calculated by FEM/DDD (normalized force versus x/Rp for y = 0.0, 0.5Rp and 0.86Rp)

Image force profiles are obtained on three lines with a direction of [1¯10] and a stand-off distance 

of 0., 0.5Rp and 0.87Rp from the plane passing through the center of the sphere. In Fig. 4.4, the 

image force profile is compared with the corresponding 2D analytical solution (Eq. 4.1) for the case 

of a rigid particle and in Fig. 4.5 for the void particle with Eq. 4.2. 

The magnitude of the image force in the spherical particle case is lower than in the corresponding cylindrical case, and the difference increases with the stand-off distance of the glide plane of the dislocation. The difference is much more significant in the case of a spherical void. It can be deduced that the interaction volume is smaller for a spherical particle than for a cylindrical one.

Figure 4.5: Normalized image forces, $F/(\mu_m b^2/(4\pi(1-\nu)R_p))$, on an edge dislocation situated at x/Rp from the center of a spherical void (Rp: radius of the spherical particle). Solid lines: analytical solution for a cylindrical hole [Chen et al. 99]. Points: calculated by FEM/DDD.

The computed image force profiles are fitted to the form $\alpha/(x/R_p)^{\beta}$. The profiles are divided into two regions: one with a high gradient (up to x = 1.4Rp) and one with a moderate gradient. The parameter β is in the range of 6.42 to 8.89 in the high gradient region and 4.34 to 4.84 in the moderate gradient region. This result is consistent with the argument of Comninou and Dundurs ([Comninou & Dundurs 72]), which shows that the interaction force is proportional to $(x/R_p)^{-4}$ when a straight dislocation line is located far from a spherical particle. Although the authors derived the equation for a screw dislocation, the scheme can also be applied to an edge dislocation.
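As an illustration of the two-region fit described above, the following minimal sketch fits a profile of the form α/(x/Rp)^β by linear least squares in log-log space. The function name, the synthetic profile and the use of NumPy are assumptions made for this example only; they are not taken from the thesis code, and the data are not thesis results.

```python
import numpy as np

def fit_power_law(x_over_rp, force):
    """Fit force = alpha / (x/Rp)**beta by least squares on log-transformed data.

    Both inputs must be strictly positive; for a void (attractive force),
    fit the magnitude of the force instead.
    """
    x = np.asarray(x_over_rp, dtype=float)
    f = np.asarray(force, dtype=float)
    # log f = log alpha - beta * log x, i.e. a straight line in log-log space
    slope, intercept = np.polyfit(np.log(x), np.log(f), 1)
    return np.exp(intercept), -slope

# Synthetic (hypothetical) profile used only to exercise the fit
x = np.linspace(1.1, 1.8, 15)
f = 10.0 / x**6.5

# Separate fits for the high gradient region (x <= 1.4 Rp) and the rest
near = x <= 1.4
alpha_near, beta_near = fit_power_law(x[near], f[near])
alpha_far, beta_far = fit_power_law(x[~near], f[~near])
```

Fitting in log space weights relative rather than absolute errors, which seems a reasonable choice for a profile that varies over more than one order of magnitude.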

4.1.4 Interaction of an edge and a screw dislocation with a cubical particle 

A cubical particle lying on a {111} habit plane of the FCC matrix is now considered. This simplifies the problem because the dislocation line lies parallel to an edge and a face of the cube. The side length 2a of the cube is set to 1.612Rp so that the volume of the cube equals that of the spherical particle considered previously. Image forces are computed on three glide planes with stand-off distances y = 0, y = 0.5a and y = 0.87a. The shear modulus of the particle (µp) is set to twice that of the matrix, and the Poisson's ratio ν is 0.312 for both the particle and the matrix.
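The side length quoted above follows directly from equating the volume of the cube to that of the sphere. As a quick check:

```latex
(2a)^3 = \frac{4}{3}\pi R_p^3
\quad\Longrightarrow\quad
2a = \left(\frac{4\pi}{3}\right)^{1/3} R_p \approx 1.612\,R_p ,
\qquad a \approx 0.806\,R_p .
```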

This configuration was also proposed by Melander ([Melander & Persson 78]) using the energy density of a screw dislocation line. Image forces are obtained by differentiating the interaction energy. The image force of a screw dislocation is given by

$$F = \frac{(\mu_p-\mu_m)b^2}{8\pi^2 a}\left[\frac{\tan^{-1}\dfrac{Y-1}{|X-1|}-\tan^{-1}\dfrac{Y+1}{|X-1|}}{|X-1|}-\frac{\tan^{-1}\dfrac{Y-1}{|X+1|}-\tan^{-1}\dfrac{Y+1}{|X+1|}}{|X+1|}\right] \qquad (4.4)$$

where X, Y are coordinates normalized by the half side length a.

Computed image force profiles on an edge dislocation line are shown in Fig. 4.6. The image force decreases more slowly with the stand-off distance than in the spherical particle case, since the dislocation line is parallel to one face of the cubical particle and the glide plane is normal to that face. At a stand-off distance of 0, the image force is 20% higher for the cubical particle than for the spherical particle. The image forces computed on a screw dislocation show that the ratio between an edge and a screw dislocation is around 0.68, which is close to (1 − ν). However, Eq. 4.4 fits the edge dislocation case well, even though it was derived for a screw dislocation. It is not clear whether this discrepancy is due to the mesh size or to the approximations used to derive Eq. 4.4.

Figure 4.6: Image forces, normalized by $(\mu_p-\mu_m)b^2/(4\pi(1-\nu)a)$, on a dislocation interacting with a cubical particle, as a function of x/a. Symbols: edge dislocation at three stand-off distances and screw dislocation at y = 0. Solid lines: analytical solution for a cubical particle [Melander & Persson 78].

The image force magnitude along the climb direction of an edge dislocation is given by

$$\tau_{cl} = \frac{(\mathbf{b}\cdot\boldsymbol{\sigma})\times\mathbf{t}}{|\mathbf{b}|}\cdot\frac{\mathbf{b}\times\mathbf{t}}{|\mathbf{b}\times\mathbf{t}|} \qquad (4.5)$$

with b being the Burgers vector and t the dislocation line vector. The climb force deserves attention because it affects the local climb of a dislocation around a particle. Climb forces on an edge dislocation at the position x = 1.1a for three different stand-off distances are plotted in Fig. 4.7. The climb forces are negligible up to y = 0.5a. Even at y = 0.877a, the magnitude of the climb force is only around half that of the image force along the glide direction. The situation is quite different in the spherical particle case, where the magnitude of the climb force is 2 to 3 times higher than the force along the glide direction at y = 0.866Rp. It can be said that the configuration chosen for the cubical particle is more resistant to dislocation climb.
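The climb component defined in Eq. 4.5 can be evaluated numerically as in the minimal sketch below. It assumes a symmetric 3×3 stress tensor and a line direction that is normalized inside the function; the function name and the use of NumPy are choices made for this example and do not reflect the thesis code.

```python
import numpy as np

def climb_stress(sigma, b, t):
    """Climb component of the Peach-Koehler force per unit length, divided by |b|.

    sigma : 3x3 stress tensor (assumed symmetric)
    b     : Burgers vector
    t     : dislocation line direction (normalized here)
    """
    sigma = np.asarray(sigma, dtype=float)
    b = np.asarray(b, dtype=float)
    t = np.asarray(t, dtype=float)
    t = t / np.linalg.norm(t)
    pk_force = np.cross(b @ sigma, t)   # (b . sigma) x t
    n = np.cross(b, t)                  # normal to the glide plane
    norm_n = np.linalg.norm(n)
    if norm_n == 0.0:
        raise ValueError("b is parallel to t (pure screw): climb direction undefined")
    return float(pk_force @ n / (np.linalg.norm(b) * norm_n))

# Edge dislocation: b along x, line along z (illustrative values)
b = [0.25, 0.0, 0.0]
t = [0.0, 0.0, 1.0]
normal_stress = np.diag([100.0, 0.0, 0.0])   # sigma_xx only
shear_stress = np.array([[0.0, 50.0, 0.0],
                         [50.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0]])   # sigma_xy only
tau_cl_normal = climb_stress(normal_stress, b, t)
tau_cl_shear = climb_stress(shear_stress, b, t)
```

With these inputs, the normal stress along the Burgers vector contributes entirely to climb, whereas the shear stress resolved on the glide plane contributes nothing, as expected for an edge dislocation.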

4.1.5 Discussion 

The interaction force between a dislocation line and circular cylindrical, spherical and cubical particles with a differing elastic modulus was computed using the superposition principle. The complementary problem was solved using the FEM-DDD coupling code. The case of a long cylindrical inhomogeneity has attracted significant research interest. For a long edge dislocation line close to a long cylindrical particle, the image force calculated numerically was compared with the analytical solutions. This showed that good accuracy can be obtained with the superposition method.


Figure 4.7: Normalized climb forces, $F_{cl}/(\mu_m b^2/(4\pi(1-\nu)R_p))$, on an edge dislocation at the position x = 1.1Rp (sphere) or x = 1.1a (cube), as a function of the stand-off distance y/Rp. Curves: rigid sphere, spherical void and cube.

The same scheme was applied to a dislocation line interacting with a spherical particle, which is a more common shape for inhomogeneities observed in metals. The image force acting along the glide direction on a dislocation at x = 1.1Rp is found to be smaller than in the cylindrical case by a factor of 0.89 for the rigid particle and 0.58 for the hole. The interaction volume involved in the spherical case is thus much smaller than in the cylindrical case. Considerable forces are also found to act along the climb direction. The climb force acts on an edge dislocation in the direction that reduces the extra half-plane around a rigid particle, and in the opposite direction around a void. The climb force increases as the stand-off distance of the glide plane is increased. For a dislocation line parallel to one face and one edge of a cubical particle, the image force is 20% higher than for the spherical particle and the climb force is negligible. It can be said that a cubical particle is more resistant to dislocation climb.


4.2 A simple case of dislocation-particle interaction 

4.2.1 Motivation and review of the literature

The hardening of materials by a distribution of small particles of another phase is a well-known phenomenon and has been used to develop high strength structural materials. The impediment of dislocation glide by second phase particles is the basic mechanism of the increase in flow stress. In the case of impenetrable particles, the back stress exerted by trapped closed loops causes the subsequent increment of stress, i.e. the work hardening. In addition to the glissile closed loops left around the particles, prismatic loops have also been observed experimentally to form near a particle by the cross slip of dislocations ([Humphreys & Martin 67]). The particles in two-phase materials can be classified by their size, shape, volume fraction, spatial distribution (regular or random) and the character of the particle/matrix interface (coherent or incoherent). In addition to these morphological parameters, the stress fields around the particles are important factors needed to describe the mechanical properties of two-phase alloys. These fields arise from a difference in the lattice constant, from the stress incompatibility due to a difference in the shear modulus, or from the image stress caused by the change in the strain energy of a dislocation near a second phase particle.

A number of experimental and analytical studies have focused on finding the relevant parameters and their effects on the flow stress and work hardening properties of these alloys. For example, well-designed experiments, in which the volume fraction, size and spacing of particles can be varied, have been carried out to measure the effect of such parameters on the mechanical behavior of single crystals containing hard particles ([Ebeling & Ashby 66], [Humphreys & Martin 67]). Theoretical approaches accounting for the relevant parameters range from the Orowan mechanism, which considers the elementary dislocation-particle interaction and relates the flow stress to the obstacle spacing, to methods dealing with the anisotropy of the material and with complicated statistics such as a random distribution of particles. An overview of the analytical approaches can be found in the review article of Reppich ([Reppich 93]). To account fully for the realistic interaction of dislocations with many particles, computer simulations have also been developed. Foreman and Makin ([Foreman & Makin 66]) investigated the effect of a random arrangement of strong and weak point obstacles on the flow stress. Recent simulations treat more complex and accurate situations, e.g. distributions of finite particles including shape effects ([Zhu & Starke 99]) and particles with a mismatch stress field due to a difference in lattice constant ([Mohles & Nembach 01]). These simulations give more insight into the effect of each parameter on the flow stress. Note that in the simulations mentioned above, it was generally assumed that the dislocation moves on a single glide plane, so that no 3D events such as cross slip were allowed.


In this work, the effect of the image stress on the flow stress and work hardening has been studied using the 3D dislocation dynamics simulation coupled to a finite element code. The simulation method is detailed in section 4.2.2. The glide of a dislocation line through a channel between two incoherent, impenetrable spherical particles is considered. The dynamic interaction of a dislocation with the particles and the resulting image stresses are solved in 3D space. Several dislocations are forced to move between the two spherical particles one by one, and the resolved shear stress needed to bypass particles surrounded by trapped loops is monitored while the cross slip of dislocations is accounted for. By changing the shear modulus of the particle while keeping other parameters such as the particle radius and the inter-particle spacing fixed, the effect of the image stress induced by second phase particles is evaluated and the relation between the flow stress and the difference in shear modulus is established.

4.2.2 Calculation procedures 

The situation considered here consists of two impenetrable rigid spherical particles of radius Rp, inter-particle distance L and shear modulus Gp. A typical 3D simulation box is shown in figure 4.8(a). In the dislocation code, the spherical particles are modeled as a set of facets made of polyhedral surface elements. Each facet has a given strength and acts as an obstacle to dislocation motion, i.e. a dislocation is allowed to cross a facet only if the local effective resolved shear stress exceeds the particle strength. In this study, the particle strength has been chosen high enough to represent impenetrable hard obstacles. It is then possible to use a simplified version of the code in which no image stresses are computed. The interaction between dislocations and particles is then reduced to an obstacle effect. This situation is later referred to as the ∆G = 0 case. In order to include the effect of the image forces, one has to solve the complementary problem described in figure 2.16. To do so, a 3D box containing two spherical particles has been meshed in the code CAST∃M(2). To represent correctly the high stress gradient near the particles, the mesh has been refined near the periphery of the particles. A (111) section of the 3D mesh taken through the centers of the particles is shown in figure 4.8(b). The displacement of the bottom surface is set to zero in the direction normal to the surface, and two nodes located on this surface are fixed in suitable directions in order to remove the rigid body motion.

(2) Finite element code developed by the Commissariat à l'Energie Atomique, CEA-DRN/DMT/SEMT.

Figure 4.8: (a) Simulation box used in the 3D discrete dislocation simulation. Two particles of radius Rp and inter-particle distance L are shown with a dislocation line on the (111)[1-10] slip system. (b) Typical mesh of the simulation box, made of 4672 20-node 3D elements in the finite element code CAST∃M. The mesh is sectioned on the (111) plane containing the particle centres.

A dislocation line, initially a pure screw segment, is pinned at its two end points. The pinning points are placed at 0.9 of L/2 from the border of the particle, so that the portion of the dislocation line lying between the particles bypasses them by bowing out. It will be shown in section 4.2.3 that these fixed points can be used to obtain reliable results for the flow stress. Single slip loading conditions on the (111)[1-10] system are assumed. The resolved shear stress τ on the slip plane is increased step by step. After each load increment, the new positions of all segments are computed as a function of time until the shear strain increment ∆γ caused by the dislocation motion has fallen below a pre-selected value. Once the dislocation line has reached an equilibrium position, ∆γ is nearly equal to zero; τ is then increased and ∆γ is monitored again. After the dislocation line has completely bypassed the particles, leaving two Orowan loops, a new dislocation line is introduced and the subsequent increment of τ, i.e. the work hardening, is computed.
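The quasi-static loading procedure described above can be sketched as a pair of nested loops: an outer loop that raises τ, and an inner loop that relaxes the configuration until the strain increment per time step is negligible. The sketch below is only an illustration under stated assumptions. `quasi_static_loading` and `ToyBowOut` are hypothetical names, and the one-dimensional toy model (an overdamped bow-out coordinate resisted by a sinusoidal force) merely stands in for a real DDD time step; it is not the thesis code.

```python
import math

def quasi_static_loading(step_fn, d_tau, strain_tol,
                         max_load_steps=10_000, max_relax_steps=200_000):
    """Increase tau in increments d_tau; after each increment, relax at fixed
    tau until the strain increment per time step drops below strain_tol.

    step_fn(tau) advances the state by one time step and returns
    (d_gamma, bypassed). Returns the stress at which bypass occurs.
    """
    tau = 0.0
    for _ in range(max_load_steps):
        tau += d_tau
        for _ in range(max_relax_steps):
            d_gamma, bypassed = step_fn(tau)
            if bypassed:
                return tau
            if abs(d_gamma) < strain_tol:
                break  # equilibrium reached: apply the next load increment
        else:
            raise RuntimeError("relaxation did not converge at tau = %g" % tau)
    raise RuntimeError("no bypass within max_load_steps")


class ToyBowOut:
    """Hypothetical 1D stand-in for a DDD time step (not the thesis code):
    an overdamped bow-out coordinate x resisted by tau_c * sin(pi * x),
    so that no equilibrium exists once tau exceeds tau_c."""

    def __init__(self, tau_c, mobility=0.05):
        self.tau_c = tau_c
        self.mobility = mobility
        self.x = 0.0

    def step(self, tau):
        dx = self.mobility * (tau - self.tau_c * math.sin(math.pi * self.x))
        self.x += dx
        return dx, self.x >= 1.0


toy = ToyBowOut(tau_c=1.0)
tau_flow = quasi_static_loading(toy.step, d_tau=0.01, strain_tol=1e-8)
```

In this toy model the detected flow stress exceeds the critical value tau_c by at most one load increment, which shows how the size of the increment bounds the resolution of the flow stress obtained by such a procedure.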

4.2.3 Flow stress of impenetrable particles with a different shear modulus 

In order to validate the computation method used in this work, the flow stresses induced by impenetrable particles with no image stresses have been calculated and compared with the results of Bacon et al. ([Bacon et al. 73]). As explained in section 4.2.2, the finite element procedure is not needed here: the particles are modeled by the facet obstacles alone. The particle radii are set to 0.131 µm ($2^{9}b$), 0.262 µm ($2^{10}b$) and 0.524 µm ($2^{11}b$), with a fixed inter-particle distance of L = 2.59 µm. The situation considered here is similar to the models used by Bacon et al. ([Bacon et al. 73]) except for the periodic boundary condition. Figure 4.9 shows the dislocation configurations at the flow stress for the three particle radii. Near the particles, the dislocation branches lie parallel to each other because of the self-interaction, which pulls together the branches on opposite sides of a particle. The line between the particles is quite symmetric. The flow stresses have been analyzed according to the 'effective line tension' argument proposed by these authors. They argued that the effective line tension, which properly accounts for the interactions, can be taken as $A\left(\ln\left[(1/(2R_p)+1/L)^{-1}\right]+B\right)$, where A is 1/(2π) and 1/(2π(1 − ν)) for edges and screws respectively. Figure 4.10 shows the flow stresses normalized by Gmb/L plotted against $\ln\left[(1/(2R_p)+1/L)^{-1}\right]$. The linear relation is well reproduced and the slope of the fitted line is about 0.254, which is close to the expected value of 1/(2π(1 − ν)). Considering these observations, it can be said that the dislocation source pinned at 0.9 of L/2 from the periphery of the spherical particle correctly reproduces the periodic boundary condition used by Bacon et al. ([Bacon et al. 73]).
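As a rough numerical check of the quantities used in this comparison, the sketch below evaluates the abscissa ln[(1/(2Rp) + 1/L)^-1] for the three particle radii and the slope expected for a screw dislocation. It relies on two assumptions: lengths are expressed in units of b, with b inferred from Rp = 2^9 b = 0.131 µm, and the Poisson's ratio is the value 0.312 quoted in section 4.1.4. Under these assumptions the abscissae fall within the range 6.8 to 8 spanned by the axis of figure 4.10.

```python
import math

B_UM = 0.131 / 2**9   # Burgers vector in µm, inferred from Rp = 2^9 b = 0.131 µm
L_UM = 2.59           # inter-particle distance in µm
NU = 0.312            # Poisson's ratio from section 4.1.4 (assumed to apply here)

def log_harmonic_length(rp_um, l_um=L_UM, b_um=B_UM):
    """ln of the harmonic combination of 2Rp and L, both in units of b."""
    d = 2.0 * rp_um / b_um
    l = l_um / b_um
    return math.log(1.0 / (1.0 / d + 1.0 / l))

# Abscissae for Rp = 2^9 b, 2^10 b and 2^11 b
abscissae = [log_harmonic_length(2**n * B_UM) for n in (9, 10, 11)]

# Slope expected for an initially screw dislocation: A = 1 / (2 pi (1 - nu))
expected_screw_slope = 1.0 / (2.0 * math.pi * (1.0 - NU))
```

With ν = 0.312 the expected slope is about 0.23, to be compared with the fitted value of 0.254 reported above.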

Figure 4.9: Dislocation configurations at the flow stress. The particle radii are 0.131, 0.262 and 0.524 µm from top to bottom.

Figure 4.10: τys/(Gmb/L) vs. $\ln\left[(1/(2R_p)+1/L)^{-1}\right]$. The line is the fitted line, with a slope of 0.254.

To investigate the effect of a difference in shear modulus on the flow stress, we have simulated an alloy made of a copper matrix containing two spherical particles of radius Rp = 0.262 µm. The shear modulus ratio ∆G/Gm was set to 1, 3 and 5, where ∆G = Gp − Gm. The inter-particle distance is fixed at L = 2.59 µm. Figure 4.11 shows the increment in the flow stress as a function of ∆G/Gm. As the shear modulus of the particles increases, the flow stress increases, because the repulsive image stresses acting on the dislocation line require a higher resolved shear stress for the line to bypass the particles. The fitted curve shows that the increase in flow stress varies as $(\Delta G/G_m)^{0.6}$. The change in the flow stress is small even for particles with a shear modulus of 6Gm, for which the flow stress increases by only about 6 percent compared to the case with no image stress. This small effect of ∆G is expected from the short range of the image stresses. Indeed, calculations show that the image stress exerted on a dislocation line decreases as $|x-x_0|^{-\alpha}$, where x and x0 are the positions of the dislocation line and of the centre of a spherical particle respectively, and α is found to be around 6 to 7. So even in the case where ∆G/Gm = 5, the image stress falls below the flow stress of the hard obstacle (∆G = 0) at a distance of 1.4 × Rp from the centre of a particle. This means that the repulsive interaction between a dislocation and a particle reduces the inter-particle spacing by about 8 percent. This small reduction in the effective inter-particle spacing results in a small increase in the flow stress.

The effect of Rp on the flow stress has been calculated with a constant ∆G/Gm value of three. The results are shown in figure 4.12. The flow stress depends almost linearly on Rp.
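The reduction of about 8 percent in the inter-particle spacing quoted above can be recovered with simple arithmetic, under the assumption that each of the two particles behaves as an obstacle of effective radius 1.4Rp along the line joining them. The helper name below is hypothetical and the sketch is only a consistency check of the quoted figure.

```python
def spacing_reduction(rp, spacing, cutoff_factor):
    """Fractional loss of inter-particle spacing when each of the two
    particles acts as an obstacle of radius cutoff_factor * rp."""
    return 2.0 * (cutoff_factor - 1.0) * rp / spacing

# Values from the text: Rp = 0.262 µm, L = 2.59 µm, image stress range 1.4 Rp
reduction = spacing_reduction(rp=0.262, spacing=2.59, cutoff_factor=1.4)
```

The result is about 0.08, consistent with the value stated in the text.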

The effect of a difference in shear modulus on the flow stress can be summarized as $\tau_{ys} \propto R_p (\Delta G/G_m)^{\alpha}$, where α is lower than 1. Note that this result differs from the case of shearable particles. The effect of the shear modulus of coherent, penetrable particles can be found in the literature ([Nembach 83]). Nembach calculated the image force exerted by one spherical particle of modulus Gp on a straight, infinite dislocation line using the change in the strain energy density of the dislocation. The shear stress is found to be proportional to $\Delta G^{1.5}$ and $R_p^{0.22}$. Thus, ∆G has a weaker effect in the case of hard particles than in the case of shearable particles.


Figure 4.11: Increase in flow stress due to a difference in shear modulus. ∆τ/τ0 is plotted against ∆G/Gm = (Gp − Gm)/Gm, where τ0 is the flow stress of impenetrable obstacles with no image stress. Rp = 0.262 µm. The fitted curve shows $\Delta\tau \propto (\Delta G/G_m)^{0.6}$.

Figure 4.12: Normalized flow stress (τys/(Gmb/L)) vs. normalized particle radius (Rp/b). ∆G/Gm = 3.


4.2.4 Increment in hardening stress 

Although a difference in shear modulus has little effect on the flow stress, its effect on the hardening stress is much stronger. In this section, the effects of ∆G on the hardening stress are discussed. The stress required to force a dislocation to glide between particles surrounded by residual Orowan loops is plotted in figure 4.13, both for the case of Gp = 4Gm (filled symbols) and for no image stress (∆G = 0, open symbols). The change in the shear modulus of the particles results in an increased work hardening rate, and the effect of ∆G increases with the particle radius. The image stress field is the sum of the interactions between the particle and each dislocation present around the particles. The dislocation lines therefore have to overcome the additional image stress field arising from the interaction between the residual loops and the particles. This additional stress is directly related to the number of Orowan loops stored around the particles. As a result, compared to the case of no image stress, a higher shear stress is required to bypass the particles, and this effect becomes more pronounced as more dislocation lines pass, which leads to a higher material hardening. Since the range and the magnitude of the image stress increase with the radius Rp, the hardening rate also increases with Rp.

Fisher et al. ([Fisher et al. 53]) investigated the hardening of metal crystals induced by precipitate particles. They computed the back stress resulting from Orowan loops and calculated the effective critical stresses of the Frank-Read sources. Their argument is that the hardening stress (τh) is related to the number of loops (N) by τh = γN, where γ is a function of the particle radius Rp and the inter-particle distance L. They obtained

$$\gamma = 0.65\,c\,b\,G_m\left(1-\frac{\nu}{2(1-\nu)}\right)\frac{R_p^2}{(L+R_p)^3} \qquad (4.6)$$

where c is the parameter describing the closest distance between a source and a particle. We obtained the slope of each graph in figure 4.13 by linear fitting, and the dependence of these slopes on $b R_p^2/(L+R_p)^3$ is shown in figure 4.14. It is found that the back stress argument proposed by Fisher et al. ([Fisher et al. 53]) still holds for a dislocation line moving between two particles. The parameter c is around 3.45 for the case of no image stress and 4.43 when Gp = 4Gm, which means, based on their argument, that the effective distance of a stress source is shorter, or the back stress is higher, if the image stress is included.

Figure 4.13: Work hardening of an alloy containing two particles of radius 0.131 µm (1), 0.262 µm (2) and 0.524 µm (3). The shear stress increment τh − τys (MPa) is plotted against the number of Orowan loops around each particle, where τh and τys are the hardening stress and the flow stress respectively.

It is observed experimentally that the dislocations left around the particles by the gliding dislocations are not rigorously confined to a single glide plane, but are rather of prismatic form ([Humphreys & Martin 67]). If cross slip is easy, a dislocation may overcome an obstacle in its glide plane by slipping on another slip plane, with the formation of long jogs. The simulations presented above to investigate the change of hardening stress were carried out with cross slip prohibited by artificially changing the cross slip parameters. When the normal conditions for cross slip are used, cross slip events are observed. As an example, for particles of radius 0.262 µm, cross slip occurs when the number of Orowan loops reaches four in the case of no image stress and two in the case of Gp = 4Gm. Since the back stress on the primary slip plane becomes higher as Orowan loops accumulate, it becomes easier for the dislocation to cross slip onto the secondary slip plane and then to cross slip again (double cross slip) back onto the primary plane to bypass the particle. Figure 4.15 shows the bypassing of a dislocation line by double cross slip. If the shear modulus of the particle is higher than that of the matrix, a high local stress is generated near the particle and the local cross slip event becomes more probable due to the image force. This demonstrates the importance of including the image stress when investigating local events such as cross slip.


Figure 4.14: The slope of the fitted lines in figure 4.13 vs. $bR_p^2/(L+R_p)^3$. ∗ : ∆G = 0, × : ∆G = 3Gm.

Figure 4.15: Bypassing of particles by double cross slip of a dislocation line. The dislocation initially glides on the (111)[1-10] slip system, changes to the (11-1)[1-10] system and then comes back to the initial slip system.


4.2.5 Discussion 

In this work, we studied the effect of a difference in shear modulus on the flow stress and the subsequent hardening stress using the 3D discrete dislocation dynamics code. The effect of ∆G on the flow stress can be summarized by $\tau_{ys} \propto R_p(\Delta G/G_m)^{\alpha}$, where α is lower than 1 and Rp is the particle radius. Because the range of the image stress is short, the maximum increment in the flow stress is only 6 percent for ∆G/Gm = 5 compared to the case with no image stress. Nevertheless, the image stress increases as Orowan loops accumulate, resulting in a change of the work hardening rate. This effect is due to the fact that the image stress field is the sum of the interactions between a particle and each dislocation present around it. As slip accumulates, the dislocation line feels an additional image stress field arising from the interaction between the residual loops and the particles. The first order approximation of the work hardening by Fisher et al. ([Fisher et al. 53]) is found to be valid even in the simple configuration of two particles and one dislocation line. The effect of the image stress is that the effective distance of a stress source becomes shorter, or the back stress becomes larger. If dislocation cross slip is allowed in the code, a dislocation is observed to avoid an obstacle in its glide plane by cross slipping onto another slip plane. The back stresses on the glide plane due to the Orowan loops trigger the cross slip event. If the image stress is included, the cross slip probability increases. The image stress around the particle is large enough to affect local events such as cross slip.

However, the computation time and effort are too demanding to include the effect of the image stress in simulations of alloys containing a large number of particles. The mesh used here for the simple situation of two particles already contains about 5000 20-node elements. An approximate way to include the effect of the image stress is to introduce an effective radius which represents the difference in shear modulus. In that way, the facet obstacles alone can be used to reproduce the precipitate hardening, so that the finite element coupling is no longer needed. For example, in the case of ∆G = 3Gm, the average radius of the first trapped loop is 0.272 µm compared to 0.264 µm for the case of ∆G = 0. Hence, the difference in shear modulus changes the effective particle radius by only a few percent. Nevertheless, the stress field generated by a second phase particle increases as slip accumulates, and the image stress field turns out to be crucial for predicting the magnitude of work hardening. This problem could then be treated using an empirical solution for the image stress generated by the interaction between several dislocations and a particle.


4.3 Fatigue simulations of materials hardened by particles 

4.3.1 Motivation and review of the literature

Fatigue in single-phased metals 

Strain is usually localized in single-phased metals subjected to cyclic deformation. The imposed plastic strain amplitude is accommodated by high local strains in strain localization zones called persistent slip bands (PSBs).

The strain localization results in persistent slip markings (PSMs) at the specimen surface ([Man et al. 02]). The irreversible character of slip inside PSBs is known to generate permanent surface steps. Based on numerous experimental observations, it is generally agreed that fatigue cracks initiate at the PSMs ([Mughrabi 85], [Suresh 98]) along the individual PSBs. Fatigue life is thus largely dominated by the irreversibility of the slip in the PSBs and the associated surface steps.

Existing crack initiation models can be categorized into (i) crack initiation due to a surface step larger than a critical size, and (ii) crack initiation due to local decohesion of crystal planes. Understanding the intrinsic PSB microstructure is therefore crucial to establish such models.

Transmission electron microscopy (TEM) is used to examine the dislocation microstructure involved in PSBs, and to understand the specific role of dislocations in cyclic deformation. Surface step displacements can be measured by atomic force microscopy (see for example [Risbet et al. 03]).

Fatigue in precipitation-hardened materials 

Multi-phase materials which contain precipitates often show good static strength compared to single-phase materials. Under cyclic loading conditions, however, precipitation-hardened materials do not always ensure better fatigue properties.

Typical cyclic properties of materials containing shearable and non-shearable particles are shown in Fig. 4.16 ([Gerold & Steiner 82]). The cyclic hardening behavior is shown as a function of the cumulative plastic shear strain for various particle sizes. Fig. 4.16(a) demonstrates that specimens containing shearable particles often suffer severe softening and even early fatigue failure. Large cyclic softening is observed after an initial strong hardening up to a maximum shear stress: the softening rate increases with the particle size, and peak-aged alloys (74 Å) show the largest cyclic softening. In the case of non-shearable particles (Fig. 4.16(b)), the rate of hardening and the stress drop decrease as the particle radius increases, and the shear stresses saturate.

Figure 4.16: Cyclic hardening and softening of aged Cu-2at%Co single crystals. (a) Underaged specimens (shearable particles). (b) Overaged specimens (non-shearable particles).

A few of the characteristic fatigue properties observed in experiments are outlined below for each case ([Mughrabi 83]).

• Shearable particles 

After an initial hardening stage, the cyclic strain becomes localized into persistent slip bands. 

A drastic cyclic softening related to the destruction of the precipitation hardening in the PSBs 

leads to the early initiation of shear-type fatigue cracks at the PSBs surface intersection. 

• Non-shearable particles 

The cyclic softening is strongly reduced and the cyclic deformation behavior is much more 

stable. Non-shearable particles produce more homogeneous straining. 

Numerical simulations of fatigue tests 

The dynamical features of the dislocations inside the PSBs during the cyclic deformation are not easily accessible by experimental observations, e.g. TEM. This is because the stresses are removed during TEM observations, so that the observed dislocation microstructure differs from the microstructure under stress. Besides, the free surface effects are not negligible.

Another difficulty arises when relating the details of the formation of the surface steps directly 

with the dislocation microstructure inside the grains, since the two experiments are often performed 

independently, and sample preparation methods are usually destructive. Thus, a complete and 

comprehensive scheme for fatigue crack initiation is still missing. 

The development of the DDD method and the increase of computer capabilities enable simulations to provide crucial information concerning the formation of slip bands. Numerical simulations now make it possible to observe the details of the PSB formation, and to investigate the relation between surface steps and the corresponding dislocation microstructure. This knowledge would help to better understand crack initiation mechanisms and to build a more elaborate fatigue life model.


The performance of the new parallel DDD program (Chapter 3) makes it feasible to simulate dislocations interacting with thousands of precipitates. Fatigue tests of precipitation-hardened materials are simulated in 3D. The fatigue simulations are similar to the work of [Déprés 04] developed in the case of 316L stainless steel. The effects of shearable and non-shearable particles on the formation of PSBs are studied. Dislocation mechanisms for PSB formation are detailed and some of the numerical results are compared with experimental observations.

4.3.2 Description of the simulation method 

Simulation volume geometries 

Cylindrical grain geometry has been adopted for the shape of the simulated volume. The volume 

is assumed to be a surface grain of a fatigue tested specimen, i.e. the volume consists of one free 

surface and grain boundaries. The cylindrical volume is represented by 20 facets as shown in Fig. 

4.18(a). The free surface is represented by assigning zero strength to the top facets (see Sec. 2.4.2), so that dislocations can escape through that surface. All the other facets act as strong obstacles to dislocation motion, as highly disordered grain boundaries would.

A virtual volume is prepared on top of the free surface to keep track of the dislocations exiting 

the simulated crystal volume (see Fig. 3.21(b)). This virtual volume and virtual dislocations are 

introduced to compute deformations of the free surface, as will be detailed in Sec. 4.3.5. The virtual dislocations are set to have no effect on dislocations inside the simulation volume, and they are not allowed to return into the crystal simulation volume.

The normal vector of the top facet is taken as [110]. The diameter of the cylinder is 10 µm and the 

height is 5 µm. The image forces due to the free surface are not taken into account in this work. 

Materials parameters 

The material parameters used in these simulations are those of nickel, as listed in Table 4.1.


Parameter                    Symbol (unit)       Value
Poisson’s ratio              ν                   0.276
Shear modulus                G (GPa)             94.7
Burgers vector magnitude     b (Å)               2.5
Activation volume            V/b^3               2117
Viscous drag coefficient     B (10^-5 Pa s)      1.06
Threshold stress             τIII (MPa)          51.2

Table 4.1: Mechanical and microscopic parameters of nickel

3D particle arrangement

A cylindrical volume containing randomly distributed particles is constructed as follows. For simplicity, the particles are assumed to have the same radius rp, and the associated volume fraction is vf.

Step 1 Preparing close-packed spheres (Fig. 4.17(a))

Close-packed spheres of an arbitrary radius r are constructed in a larger sphere (radius R). The center of each sphere of radius r is assumed to be the nucleation site of a particle, and each particle draws its material from the volume of its sphere of radius r during the Ostwald ripening process.

Step 2 Adjusting the volume fraction 

The radii of all spheres are reduced by a common factor while their locations remain un- 

changed. The factor is given so that the volume fraction of the shrunken spheres is equal to 

vf . 

Step 3 Adjusting the radius of particles (Fig. 4.17(b)) 

The radius of the shrunken spheres in Step 2 is scaled to rp, and the coordinates of the centers 

are scaled as well. 

Step 4 Cutting the cylindrical volume (Fig. 4.17(c)) 

A cylindrical simulation volume is placed at the center of the outer sphere, and spherical 

particles situated inside the cylinder are selected. 

In this work, ’Step 1’ is achieved by successive trials of placing spheres of radius r in the sphere of radius R. A trial is accepted only if the new sphere does not intersect any sphere already in the volume. Although this method does not generate a close-packed arrangement of particles, the resulting particle arrangement is purely random. Bi-modal size distributions (see Sec. 4.3.6) are constructed using the same procedure, except that spheres of two different radii are placed in ’Step 1’ as shown in Fig. 4.17(a).

Figure 4.17: Construction of a randomly distributed configuration of particles in the cylindrical simulation volume (example of the bimodal size distribution case): (a) randomly placed spheres, (b) adjusting the radius of particles, (c) selecting spheres inside the cylindrical volume. Labels: volume of interest (ri), particle (rp), simulation volume.
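Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis code: the function names, the rejection sampling of trial centers in a cube, and the cylinder centered at the origin with its axis along z are all hypothetical choices.

```python
import math
import random


def place_spheres(R, r, n_trials, rng):
    """Step 1: random sequential placement of non-overlapping spheres of
    radius r inside a larger sphere of radius R (centered at the origin)."""
    centers = []
    for _ in range(n_trials):
        c = tuple(rng.uniform(-R, R) for _ in range(3))
        # Reject trials whose sphere would stick out of the outer sphere.
        if math.sqrt(sum(x * x for x in c)) > R - r:
            continue
        # Accept only if the new sphere intersects no existing sphere.
        if all(math.dist(c, other) >= 2.0 * r for other in centers):
            centers.append(c)
    return centers


def build_particles(R, r, r_p, v_f, n_trials, cyl_radius, cyl_height, seed=0):
    """Return the particle centers inside the cylinder and the scaled
    outer-sphere radius."""
    rng = random.Random(seed)
    centers = place_spheres(R, r, n_trials, rng)
    n = len(centers)
    if n == 0:
        raise ValueError("no sphere could be placed")
    # Step 2: common reduced radius such that n * r_s^3 / R^3 = v_f.
    r_shrunk = R * (v_f / n) ** (1.0 / 3.0)
    if r_shrunk > r:
        raise ValueError("packing too sparse to reach the requested v_f")
    # Step 3: scale radii and center coordinates so the radius becomes r_p.
    scale = r_p / r_shrunk
    scaled = [tuple(scale * x for x in c) for c in centers]
    # Step 4: keep the particles whose centers lie inside the cylinder.
    inside = [c for c in scaled
              if math.hypot(c[0], c[1]) <= cyl_radius
              and abs(c[2]) <= cyl_height / 2.0]
    return inside, scale * R
```

Because the center-to-center distances are at least 2r before shrinking and the reduced radius never exceeds r, the scaled particles of radius rp cannot overlap.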

Radius and volume fraction of particles 

Two cases of particle radius, rp = 160 nm and 400 nm, are considered. The volume fraction vf is 

fixed to 14% for all the cases. The number of particles generated in the cylindrical volume by the procedure above is 2510 for rp = 160 nm and 161 for rp = 400 nm.
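As a quick consistency check on these two counts (a back-of-the-envelope estimate, assuming only that both cases share the same simulation volume and volume fraction), the number of particles should scale as the inverse cube of the radius:

```latex
\frac{N_{160}}{N_{400}} \approx \left(\frac{400\ \mathrm{nm}}{160\ \mathrm{nm}}\right)^{3} = 2.5^{3} \approx 15.6,
\qquad
\frac{2510}{161} \approx 15.6 .
```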

Each individual particle is constructed using two pyramids joined at their bases (see footnote 3), in an effort to reduce the number of nodes and facets constituting the particles and thus the computational load. The cylindrical simulation volumes are shown in Fig. 4.18, which contain (a) 161 particles of rp = 400 nm and (b) 2510 particles of rp = 160 nm respectively.
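A faceted particle of this kind can be sketched as a triangular bipyramid, which is the shape that gives five nodes and six facets. The function name, the choice of placing all nodes at distance r_p from the center, and the orientation of the axis along z are assumptions made for illustration only.

```python
import math


def bipyramid(center, r_p):
    """Return (nodes, facets) of a triangular bipyramid around `center`.

    Nodes 0 and 1 are the two apexes, nodes 2 to 4 form the shared
    triangular base. Facets are triples of node indices.
    """
    cx, cy, cz = center
    nodes = [(cx, cy, cz + r_p), (cx, cy, cz - r_p)]
    for k in range(3):
        angle = 2.0 * math.pi * k / 3.0
        nodes.append((cx + r_p * math.cos(angle),
                      cy + r_p * math.sin(angle),
                      cz))
    facets = []
    for k in range(3):
        a, b = 2 + k, 2 + (k + 1) % 3
        facets.append((0, a, b))  # facet of the upper pyramid
        facets.append((1, b, a))  # facet of the lower pyramid
    return nodes, facets
```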

The strength of a particle 

Particles are assumed to act as geometrical barriers with a pre-defined strength to the dislocation 

motion. The image forces due to the elastic modulus difference are not computed and no stress 

fields around particles are considered for the simplification of computing. 

The strength of a particle decreases as the particle is sheared by dislocations. This is due to both the decrease of the effective particle size on the glide plane and the loss of coherency for ordered precipitates ([Stoltz & Pineau 78]). The evolution of the particle’s strength is illustrated in Fig. 4.19(a) for the case where a particle is sheared by successive passages of dislocations in the same glide plane. Fig. 4.19(b) illustrates that a particle may lose its strength completely before being totally sheared off, due to the surface energy increase and the loss of coherency induced by the random chopping-up by dislocations.

3 One particle thus involves six facets and five nodes.

Figure 4.18: Cylindrical simulation volume containing randomly distributed particles of (a) rp = 400 nm and vf = 14% (b) rp = 160 nm and vf = 14%

Figure 4.19: Evolution of particle’s strength due to shear-off by dislocation passages: (a) geometrical effect, (b) chemical effect (loss of strength). Axes: strength of particle versus number of dislocation passages.

In this work, the particle’s strength (or facet’s strength) is decreased linearly with each passage of a dislocation through a given particle’s facets. As a first-order approximation, the strength of the facet τfacet (see Sec. 2.4.2) is decreased from the initial strength by τfacet/(2rp/b) whenever a dislocation penetrates the facet, as shown in Fig. 4.20 (see footnote 4). The facet strength is set to zero after a certain number of dislocation passages to represent the chemical effect shown in Fig. 4.19(b).

4 The magnitude of the strength drop follows from the assumption that a particle loses its strength after 2rp/b dislocation passages, which corresponds to complete shear-off of the particle as sketched in Fig. 4.19(a).



Figure 4.20: Evolution of facet’s strength with dislocation passages. Axes: facet strength τfacet, from the initial strength down to τ*facet, versus number of dislocation passages; slope label: τfacet/(2ri/b).

Particles of radius 160 nm are assumed to be easily shearable: the initial strength of the facets is set to 292 MPa and the final strength τ*facet to 162 MPa. Particles of radius 400 nm are assumed to be non-shearable, or at least difficult to shear, by setting their initial strength to 7310 MPa.
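The linear softening rule can be illustrated with a minimal Python sketch. The function names are hypothetical, and treating the final strength τ*facet as a lower bound is an assumption: the text also mentions setting the strength to zero after a certain number of passages, and how the two combine is not spelled out here.

```python
def strength_drop_per_passage(tau_init, r_p, b):
    """Strength lost per dislocation passage: tau_init / (2 r_p / b)."""
    return tau_init / (2.0 * r_p / b)


def facet_strength(n_passages, tau_init, tau_final, r_p, b):
    """Facet strength after n_passages, decreasing linearly from tau_init.

    Assumption of this sketch: the strength is not allowed to fall
    below tau_final.
    """
    tau = tau_init - n_passages * strength_drop_per_passage(tau_init, r_p, b)
    return max(tau, tau_final)
```

With the values quoted for the shearable case (rp = 160 nm, b = 2.5 Å, initial strength 292 MPa), 2rp/b = 1280 passages would be needed for complete shear-off, so each passage removes about 0.23 MPa.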

The initial configuration 

The initial dislocation microstructure of all the simulations is composed of four Frank-Read sources, in the form of pinned dislocation segments. All the Frank-Read sources are of edge type, with the Burgers vector (a/2)[¯1¯10] on the slip plane (¯11¯1) (system 7 in Table 2.1). It should be noted that no dislocations are nucleated around the particles and that no dislocations are punched in from the free surface.

The loading conditions 

Fatigue simulations are performed under plastic strain control with fully symmetrical push-pull loading ($\varepsilon_p^{max}/\varepsilon_p^{min} = -1$) and an applied plastic strain amplitude $\Delta\varepsilon_p = 1 \times 10^{-3}$. In DDD simulations, only stresses can be applied. Imposed plastic strain conditions are therefore achieved by monitoring the total slip accumulated in all the active slip systems. The applied stresses are then increased or decreased by comparing the resulting plastic strain to the pre-selected strain level.

In the fatigue simulations, the plastic strain rate is monitored at each time step. The applied stress is increased stepwise by 1 MPa if the plastic strain rate is lower than the pre-selected minimum plastic strain rate, 10^-7 ((1) in Fig. 4.21). The load is kept constant when the plastic strain rate lies between the minimum and the maximum strain rate ((2) in Fig. 4.21), until the dislocation microstructure is in equilibrium with the external loading. This condition is achieved by keeping the load constant while performing discrete time steps until the resulting plastic strain rate becomes lower than the pre-selected minimum strain rate ((3) in Fig. 4.21). The applied stress is decreased by 1 MPa if the plastic strain rate is higher than the pre-selected maximum plastic strain rate, 10^-4 ((4) in Fig. 4.21).

Figure 4.21: Quasi-static loading condition: stepwise increment and decrement of the applied stresses
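The stepwise load control described above can be sketched as a single update rule. This is a simplified illustration: the function name and the `direction` argument (+1 or -1, introduced here to handle the two loading directions of a push-pull cycle) are assumptions, and the thresholds are simply the values quoted in the text.

```python
def update_applied_stress(stress, strain_rate, direction=1,
                          rate_min=1e-7, rate_max=1e-4, step=1.0):
    """Return the applied stress (MPa) for the next time step.

    The magnitude of the plastic strain rate is compared with the two
    pre-selected thresholds; `direction` gives the current loading sense.
    """
    rate = abs(strain_rate)
    if rate < rate_min:
        # Microstructure at rest: raise the load by one step.
        return stress + direction * step
    if rate > rate_max:
        # Deformation too fast: lower the load by one step.
        return stress - direction * step
    # In between: hold the load until the structure relaxes.
    return stress
```

Called once per time step, this rule reproduces the four situations labeled (1) to (4) in Fig. 4.21.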

4.3.3 Evolution of the dislocation microstructure during the fatigue tests 


General features of the formation of the dislocation microstructure 

The initial Frank-Read sources begin to expand and generate dislocation loops as the applied stresses are increased during the first quarter cycle. Parts of the loops leave the simulation volume through the free surface, which prints steps on the free surface as will be detailed in Sec. 4.3.5. The other parts pile up along the strong grain boundaries. The screw parts of the dislocation lines tend to cross slip owing to the back stresses from the stored dislocations. The cross slip mechanism spreads slip lines over the whole simulation volume. Particles affect both the dislocation mobility and the cross slip probabilities, which results in microstructures quite different from those of the single-phase material case.


Figure 4.22: Evolution of the dislocation microstructure by the cyclic loading (Case of rp = 400 nm).

As the sign of the applied stresses is reversed, the motion of the dislocations is reversed likewise. The initial dislocation microstructure, however, cannot be completely restored due to the irreversible character of slip. Slip irreversibility is caused by cross slip, line reconnection (collinear junction) and the elimination of dislocation lines by the free surface. It increases with the number of fatigue cycles.

These general features are illustrated in Fig. 4.22 using the figures taken from the rp = 400 nm case. The evolution of the dislocation structures is shown with the stress-strain curve of the first fatigue cycle (see footnote 5). The initial expansion of the Frank-Read sources (1) is followed by cross slip, which spreads slip through the entire simulation volume (2). After the sign of the applied plastic strain is reversed, a specific dislocation microstructure forms (3)-(4), and even after one cycle the microstructure (5) is quite different from the initial one (1).

5 Compressive plastic strain is applied first.


Dislocation density evolution 

The evolution of the total dislocation density ρtot during the cyclic deformation is monitored for the volumes containing particles with rp = 160 nm, particles with rp = 400 nm and no particles. ρtot is plotted as a function of the accumulated cyclic Von Mises strain in Fig. 4.23. In all the cases, the dislocation densities increase quickly and fluctuate with the cyclic deformation, owing to the periodically vanishing cyclic load. Saturation of the dislocation densities is not observed here because only a relatively small number of fatigue cycles has been performed (close to 3 cycles for rp = 160 nm, see footnote 6, and 5 cycles for rp = 400 nm). It is expected, however, that the dislocation densities would gradually saturate as the fatigue cycles proceed, as observed by [Déprés et al. 04].

The simulation results show that

1. $\rho_{tot}^{r_p=160\,\mathrm{nm}} > \rho_{tot}^{r_p=400\,\mathrm{nm}} > \rho_{tot}^{\mathrm{No\ particle}}$

2. The rates of the dislocation accumulation after each fatigue cycle follow the same order as the total densities.

After three fatigue cycles (around $\varepsilon_{VM} \approx 0.006$), $\rho_{tot}^{r_p=160\,\mathrm{nm}}$ is three times larger than $\rho_{tot}^{r_p=400\,\mathrm{nm}}$ and six times larger than $\rho_{tot}^{\mathrm{No\ particle}}$.

tot 

The simulation volume containing particles with rp = 160 nm offers a high resistance to slip because the glide area per slip plane is limited by the large number of particles, which are assumed to be shearable. To accommodate the applied plastic strain with limited dislocation glide, a higher density of dislocations is necessary. After the stresses are reversed, the dislocations still have difficulty finding an easy glide path and annihilating each other, so most of them remain inside the volume. For these reasons, it can be deduced that shearable particles give rise to a high slip irreversibility.

In the case of particles with rp = 400 nm, dislocations have a higher chance of finding an easy glide path because there are fewer particles in the volume. Moreover, the particles are more effective in spreading dislocations through the simulation volume by cross slip, because they are not easily shearable and are surrounded by Orowan loops. The applied plastic strain can thus be accommodated with a lower dislocation density, and the rate of dislocation accumulation is lower than in the shearable particle case, since dislocations can move backwards more easily by annihilating the Orowan loops left around the particles during forward glide. Thus the irreversibility of slip is significantly reduced in the case of non-shearable particles.

6 The simulation had to be stopped just before εp reached zero near 3 fatigue cycles because of the large number of segments involved.

Figure 4.23: Evolution of the total dislocation density of the volume containing rp = 160 nm, rp = 400 nm and no particles. Axes: total dislocation density ρ [m^-2] (0 to 1.8e+13) versus accumulated Von Mises strain εVM* (0 to 1.2e-2).

Strain localization kinematics 

In the preceding section, it was shown that the shearable particles favor a high ρtot. The next question to address is whether the simulation can reproduce the localization of the plastic strain. Fig. 4.24 shows the dislocation microstructure formed after 3 cycles in the rp = 160 nm case and after 5 cycles in the rp = 400 nm case, viewed along the [110] direction. The figures clearly illustrate that the dislocation structures are highly heterogeneous and that intense slip bands form on the primary slip plane under cyclic loading. This result is consistent with experimental observations ([Calabrese & Laird 74]). Plastic strain localization is believed to cause fatigue damage, since the local plastic strain has to be high enough to accommodate all the applied plastic strain. This process can eventually lead to fatigue crack nucleation.

To quantify the formation of PSBs, the spatial distribution of the dislocation densities is computed as follows at each time step k. The cylindrical simulation volume is sliced into layers of finite thickness along the slip plane normal [¯11¯1]. Dislocation densities are then computed in each layer. The heterogeneity of the dislocation density can be shown by plotting the calculated dislocation densities of each layer along the reference axis [¯11¯1].

Figure 4.24: Localization of slip by forming intense slip bands: (a) particles with rp = 160 nm, (b) particles with rp = 400 nm

Fig. 4.25 shows the evolution of such spatial dislocation density distributions. The three axes of the coordinate system correspond to the dislocation density, the position of each layer and the cycle number.
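The layer-by-layer density computation can be sketched as follows. This is an illustrative assumption about the bookkeeping, not the thesis implementation: segments are given as pairs of end points, each segment is assigned to a layer by the position of its midpoint along the slip plane normal, and the layer volumes (which vary for a cylinder cut obliquely) are supplied by the caller.

```python
import math


def layer_densities(segments, normal, x_min, x_max, n_layers, layer_volumes):
    """Dislocation density (line length per unit volume) in each layer.

    segments      : list of (p, q) end points, each a 3-tuple
    normal        : direction along which the volume is sliced
    x_min, x_max  : extent of the volume along that direction
    layer_volumes : volume of each of the n_layers slices
    """
    norm = math.sqrt(sum(c * c for c in normal))
    n_hat = [c / norm for c in normal]
    thickness = (x_max - x_min) / n_layers
    lengths = [0.0] * n_layers
    for p, q in segments:
        mid = [(a + b) / 2.0 for a, b in zip(p, q)]
        x = sum(m * n for m, n in zip(mid, n_hat))
        if x < x_min or x > x_max:
            continue  # segment lies outside the sliced volume
        k = min(int((x - x_min) / thickness), n_layers - 1)
        lengths[k] += math.dist(p, q)
    return [length / vol for length, vol in zip(lengths, layer_volumes)]
```

Plotting the returned list against the layer position at successive time steps gives a surface of the kind shown in Fig. 4.25.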

From the figure, the general features of the formation of dislocation microstructure can be confirmed, 

i.e. the first increase of the applied plastic strain spreads dislocations over the simulation volume 

in all the cases. In the next cycles, the heterogeneous dislocation structure forms and certain zones 

accumulate a high dislocation density. 

The detailed observation of Fig. 4.25 reveals that the particles affect the slip localization in several ways:

1. The width $w_d$ of the dislocation distributions over the simulation volume is the largest in the case rp = 400 nm (non-shearable particles) and the smallest in the case rp = 160 nm (shearable particles), i.e. $w_d^{r_p=400\,\mathrm{nm}} > w_d^{\mathrm{No\ particle}} > w_d^{r_p=160\,\mathrm{nm}}$.

2. The maximum local dislocation densities ($\rho_{max}$) are in the following order: $\rho_{max}^{r_p=160\,\mathrm{nm}} > \rho_{max}^{r_p=400\,\mathrm{nm}} > \rho_{max}^{\mathrm{No\ particle}}$.

3. The intense slip band width is smaller in the case of shearable particles, i.e. $d_b^{r_p=400\,\mathrm{nm}} > d_b^{r_p=160\,\mathrm{nm}}$.

4. There is no clear dislocation localization up to five cycles in the case containing no particles. In the other cases, dislocation localization has occurred, and at least one high dislocation density peak is present throughout the simulated fatigue cycles.

Figure 4.25: Evolution of slip localization: (a) particles with rp = 160 nm, (b) particles with rp = 400 nm, (c) no particle

Item 1 demonstrates that non-shearable particles promote cross slip through the back stresses of the Orowan loops around the particles, so that the dislocations easily sweep a large area of the simulation volume. Items 2 and 3 are consistent with the experimental observations, according to which persistent slip bands (PSBs) are much thinner if particles are shearable, and the local plastic strain becomes higher as the PSBs get narrower ([Lee & Laird 83]). Fig. 4.26 shows some of the experimental data on the PSB thicknesses and the related local plastic shear strain ([Mughrabi 83]). Item 4 can be related to the early initiation of fatigue cracks in the case of shearable particles (see Fig. 4.16(a)). It is also interesting to note that only one intense slip band is present in the case of shearable particles, whereas a second intense slip band begins to form in the case of non-shearable particles. This is related to the experimental observation that the number of cycles to crack initiation is inversely related to the average slip band distance ([Graf & Hornbogen 78]), although a higher number of fatigue cycles is necessary to confirm it.

Figure 4.26: Relation between the local plastic shear strain amplitude and the thickness of PSBs

The speed of the slip localization can be quantified by the standard deviation of the spatial dislocation distribution curves at each time step, because the standard deviation becomes larger as the dislocation structure gets more heterogeneous. The standard deviation at time t is computed as follows.

$$\sigma_\rho(t) = \sqrt{\frac{1}{D_g}\int_0^{D_g}\left[\rho(t, x^{(s)}) - \bar{\rho}(t)\right]^2 dx^{(s)}} \qquad (4.7)$$

$\bar{\rho}(t)$ is the average dislocation density, and $D_g$ is the size of the simulation volume.

Fig. 4.27 shows the evolution of σρ(t) for each case: $\sigma_\rho(t)^{r_p=160\,\mathrm{nm}} > \sigma_\rho(t)^{r_p=400\,\mathrm{nm}} > \sigma_\rho(t)^{\mathrm{No\ particle}}$. It can be seen that both the intensity and the speed of the slip localization are the highest in the case with shearable particles.
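For layers of equal thickness, the integral in Eq. 4.7 reduces to a population standard deviation over the layer densities. The short sketch below shows that discrete counterpart; the function name and the equal-thickness assumption are illustrative and not taken from the thesis code.

```python
import math


def density_std(layer_rho):
    """Discrete form of Eq. 4.7 for equally thick layers.

    layer_rho : dislocation density of each layer at a given time
    """
    n = len(layer_rho)
    mean = sum(layer_rho) / n
    return math.sqrt(sum((rho - mean) ** 2 for rho in layer_rho) / n)
```

A perfectly homogeneous distribution gives zero, and the value grows as the density concentrates in a few layers.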

Details of the intense slip band 

The intense slip bands of the rp = 160 nm case (shearable particles) are shown in Fig. 4.28. The dislocation microstructure is taken at cycle number 3 with εp ≈ 0. Fig. 4.28(a) shows a 3D image of the dislocation structure viewed in the planes normal to [1¯11] (slip plane normal), [110] (Burgers vector) and [¯112].


Figure 4.27: Evolution of the standard deviation σρ(t) (Eq. 4.7) in different simulations. Axes: standard deviation (0 to 3e+13) versus εVM* (0 to 1.2e-2); curves: rp = 160 nm, rp = 400 nm, no particle.

In the plane perpendicular to the primary slip plane (normal to [110]), intense slip bands can be seen clearly in the form of thin and compact dislocation walls. In the plane parallel to the primary slip plane (normal to [1¯11]), the dislocation structure is quite heterogeneous and shows ladder-like structures along the Burgers vector direction. Dislocation debris and small loops are visible between the particles. Fig. 4.28(b) shows the isolated intense slip band, in which a considerable amount of residual dislocations is clearly visible. It should be noted that the residual dislocation tangles differ in size from those in the single-phase material case, in which large tangles are observed together with ladder-like slip bands ([Obrtlik et al. 94], [Déprés et al. 04]).

Small tangles between particles in the case of shearable particles are also observed experimentally. Fig. 4.29 shows a dislocation microstructure of fatigue tested Inconel 718 observed by TEM. Although the material characteristics are quite different (see footnote 7), this micrograph clearly shows small high-density tangles of primary residual dislocations.

Fig. 4.30 shows the intense slip bands of the non-shearable particle case (rp = 400 nm) at cycle number 5 with εp = 0. The wall thickness of the PSB is much larger than in the previous case, and the dislocation density varies within the band when seen along the [110] direction. In the plane normal to [1¯11], several Orowan loops and dense dislocation tangles are formed around the particles. In the space between the particles, however, long dislocation lines are clearly visible and the dislocation distribution is rather homogeneous, as illustrated in Fig. 4.30(b). It is also observed that some of the complex dislocation structures form with a spacing equal to the distance between two neighboring particles.

7 Particle radius = 20−40 nm, grain size = 20−40 µm, volume fraction > 15%

Figure 4.28: Details of the intense slip band in the shearable particle case (rp = 160 nm): (a) 3D image of intense slip bands, (b) isolation of an intense slip band. Three layers of thickness 300 nm are assembled for the 3D image in (a)

Figure 4.29: TEM micrograph of fatigue tested Inconel 718 up to 10,000 cycles

Figure 4.30: Details of the intense slip band in the non-shearable particle case (rp = 400 nm). Three layers of thickness 300 nm are assembled for the 3D image in (a)

Intense slip band formation mechanism 

In the case of shearable particles (rp = 160 nm), the dislocation density increases rapidly since dislocation motion has a high degree of irreversibility (see Fig. 4.23). The characteristic of this configuration is that there is a limited number of easy glide paths for dislocations. Thus, slip bands form along one of the easy glide paths, whose thickness is usually limited (of the order of rp). Upon load reversal, double cross-slipped dislocations can glide in the opposite direction along a path close to the initial glide path because (i) cross-slipped dislocations also have a limited glide distance and (ii) the particles in the initial path have lost part of their initial strength. This forms closely spaced edge dipoles, the so-called vein structures. As the cycling proceeds, the screw dislocations that subsequently cross slip under the cyclic loading react with the edge dipoles and produce prismatic loops aligned in the Burgers vector direction, or helicoidal structures, as observed in single-phase materials ([Li & Laird 94], [Déprés et al. 04]) but with a much smaller size. The prismatic loops can move along their glide cylinder and form ladder-like structures. The repeated motion of interfacial dislocations with the cycles eventually makes particles at the PSB edges lose their strength, and persistent bands form at this place. This process is observed numerically in the simulations.

Figure 4.31: Distribution of the particle strength. Axes: log(frequency) (10^-3 to 10^0) versus facet strength τfacet [MPa] (160 to 300).

Fig. 4.31 shows the statistical distribution of the residual facet strength after 3 cycles. The facet strengths are distributed as follows: most of the facets are not sheared and keep their initial strength (right peak), and a small portion of the facets are completely sheared (left peak). The spatial distribution of the facet strength is shown in Fig. 4.32(a) by superimposing colors corresponding to the magnitude of the strength of each facet. A clear channel of sheared particles is visible, and its position corresponds exactly to that of the intense slip band. The dislocation structure is superimposed in Fig. 4.32(b). It should be noted that the particles near the intense slip band also lose strength, which possibly indicates that interfacial dislocations exist at the periphery of the slip band and move rather freely as the cyclic load changes. A clear channel, in which no dislocations and no particles are visible, is also observed experimentally, as shown in Fig. 4.33.

In the case of non-shearable particles, the accommodation of the applied plastic strain is much easier because dislocations can move over a relatively long distance on the glide plane. During the first few cycles, the particles are bypassed by the Orowan mechanism and glissile loops accumulate around the particles. When the critical stress is reached, the screw portions of the loops change their glide plane by cross slip, which contributes both to the propagation of slip through the simulation volume, by generating dislocations in the secondary plane, and to the formation of 3D loops around the particles. Interactions between these loops and the dislocations in the secondary plane eventually form dense tangles around the particles. As the cyclic deformation proceeds, the tangles around the particles act as pinning points for dislocations moving between the particles. This favors the formation of dislocation dipoles linking two neighboring particles. The cutting of these dipoles by freely gliding dislocations then generates stable dislocation structures. The subsequent formation of dipoles between the particles and the dislocation interactions produce dense dislocation structures between the particles. This mechanism explains the dense dislocation tangles observed around the particles and the complex dislocation structures between pairs of closely spaced particles.

Figure 4.32: Spatial distribution of particle strength: (a) facet strength, (b) with the dislocation structure superimposed. View label: (110); color scale: τfacet [MPa] from 160 to 300.

Figure 4.33: Clear channel containing no particles and no dislocations (Inconel 718)

Figure 4.34: Typical stress-strain curve coming from the simulations (No particles case). Axes: σVM [MPa] (−300 to 300) versus εVM (−6.0e-4 to 6.0e-4).

4.3.4 Mechanical behavior 

Cyclic stress-strain relation 

A typical cyclic stress-strain curve is shown in Fig. 4.34 for the ’No particles’ case. The first quarter-cycle corresponds to the activation of the initial Frank-Read sources, which is the hardest part of the cycles.

The cyclic response curves are shown in Fig. 4.35 for the case of rp = 160 nm (shearable), rp = 400 nm (non-shearable) and the single-phase material case. The curves obtained with two different volume fractions (vf = 8% and vf = 14%) are plotted for non-shearable particles.

Figure 4.35: Cyclic response for rp = 160 nm (vf = 14%) and rp = 400 nm (vf = 8% and 14%) compared with the single-phase material case. Axes: stress [MPa] (200 to 500) versus cumulative plastic strain (0 to 1.4e-2). Legend entries as printed: rp = 160 nm, vf = 14%; rp = 480 nm, vf = 14%; rp = 480 nm, vf = 8%; No particles.

The initial shear stress amplitude is, as expected, the highest in the case of shearable particles and the lowest in the single-phase material case. The simulation volumes containing the non-shearable particles show intermediate initial stress values, and the stress amplitude increases with the particle volume fraction.

The short hardening stage is followed by a cyclic softening response in all the cases. The degree of softening is maximum for the shearable particle case, and increases with the volume fraction in the non-shearable particle case.

4.3.5 Surface slip markings 

Surface displacement computation method 

Dislocations that leave the simulation volume print steps on the free surface. The computation of 

the surface steps can give valuable information concerning the fatigue life because it is believed that 

the fatigue cracks are initiated from these surface steps. In this section, the method to compute 

surface steps is presented.


Figure 4.36: Computation method of displacements associated with the general case of non-planar dislocation loops: (a) associated problem, (b) systematic method. Labels: primary plane (common point Op), cross-slip plane (common point Od), segment end points D1 and D2, Burgers vector b, projection point ∆.

Displacements of closed loops can be computed by decomposing the loop into as many triangular dislocation loops as needed and applying the equations in Sec. 2.2.2. When part of a dislocation loop has changed its slip plane by cross slip, an additional operation is necessary [Déprés et al. 03]. Fig. 4.36(a) shows a non-planar dislocation loop. Points Op and Od are the common points used to construct the triangular dislocations in the primary and the cross-slip plane, respectively. If triangular loops are constructed for each dislocation segment in both planes, the non-planar dislocation loop misses two triangular loops (OpD2D1 and OdD1D2), as shown in Fig. 4.36(a). The displacement solution would then be wrong, because the contributions of the arbitrary segments (e.g. OpD1) are never canceled. To remove this artifact, once a segment is found to have a neighbor in a different plane, a supplementary triangular loop is constructed from three points: a common point (e.g. Op), the end point of the segment (e.g. D1) and the projection of the common point onto the glide axis (∆). Fig. 4.36(b) shows the procedure, which cancels the effect of the arbitrary segments generated in constructing the triangular dislocation loops. 
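The cancellation argument can be checked with a purely topological bookkeeping sketch. The Python snippet below is an illustration under stated assumptions, not the procedure of [Déprés et al. 03]: the node names P1, P2 and C1 are hypothetical, only the directed edges of each triangle are tracked (no displacement field is evaluated), and the two missing triangles OpD2D1 and OdD1D2 are added directly rather than the supplementary triangles built with the projection point on ∆.

```python
from collections import Counter

def net_directed_edges(triangles):
    """Sum the directed edges of all triangles, cancelling opposite pairs."""
    net = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            if net[(v, u)] > 0:
                net[(v, u)] -= 1      # u->v cancels an existing v->u
            else:
                net[(u, v)] += 1
    return {edge for edge, count in net.items() if count > 0}

# Hypothetical non-planar loop: D1 -> P1 -> P2 -> D2 lies in the primary
# plane, D2 -> C1 -> D1 lies in the cross-slip plane.
primary = [("D1", "P1"), ("P1", "P2"), ("P2", "D2")]
cross_slip = [("D2", "C1"), ("C1", "D1")]
loop_edges = set(primary + cross_slip)

# One triangle per segment, fanned from the common point of its own plane.
fan = [("Op", a, b) for a, b in primary] + [("Od", a, b) for a, b in cross_slip]

# Arbitrary segments that survive because the fans use two different apexes.
leftover = net_directed_edges(fan) - loop_edges

# Adding the two missing triangles removes every arbitrary segment.
closure = [("Op", "D2", "D1"), ("Od", "D1", "D2")]
closed = net_directed_edges(fan + closure)
```

In this sketch `leftover` contains exactly the four arbitrary segments attached to D1 and D2, while `closed` reduces to the edges of the original loop, which is the property the supplementary triangles are meant to restore.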

In fatigue simulations, dislocations can leave the simulation volume. Some of the dislocation loops are thus cut and left open by the free surface, and Barnett’s equations (Sec. 2.2.2) can no longer be used without a special treatment. The displacements of open loops can be computed by adding virtual dislocations outside the simulation volume ([Weygand et al. 02]). The simulation volume is made of two distinct parts: the crystal and a virtual medium containing the virtual segments, as briefly introduced in Sec. 4.3.2. Dislocations are allowed to leave the crystal volume and enter the virtual medium, which keeps the dislocation loops closed. The dislocations, however, are not allowed to return into the crystal volume from the 



Figure 4.37: Examples of surface steps generated by the activation of a single Frank-Read source 

in the simulation volume 

virtual medium so that the motion of the dislocations in the crystal is not arbitrarily modified by the virtual dislocations. Fig. 4.37 shows the activation of a Frank-Read source and the subsequent deformation of the free surface, computed with the virtual medium method. 

Surface steps and associated dislocation structures 

Surface steps are computed and shown in Fig. 4.38(a) for the case of shearable particles after 3 fatigue cycles and in Fig. 4.38(b) for the case of non-shearable particles after 5 fatigue cycles. The surface steps reflect the characteristics of the associated dislocation structure. In the case of shearable particles, the surface markings are confined to a narrow region because the slip bands involved are narrow and contain a high density of dislocations (see Fig. 4.24(a)). In the case of non-shearable particles, the surface markings are dispersed over the free surface because the slip bands are relatively wider and more than one band forms inside the crystal, as shown in Fig. 4.24(b). 

The differences in the surface markings between the two cases can be seen clearly from one-dimensional profiles along a probing line. This way of displaying the results is similar to that used for experimental results obtained by atomic force microscopy (AFM). Fig. 4.39 shows such surface profiles along the direction normal to the primary plane, i.e. [1¯11]. As indicated in Sec. 4.3.3, the simulation of the shearable particle case finished just before ɛp reached zero near 3 fatigue cycles, so its surface profile is associated with a small plastic strain. The surface marking, however, is significantly wider in the case of non-shearable particles than in the case of shearable particles, as indicated in Fig. 4.39. 
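As a rough illustration of how such a one-dimensional profile can be built, the sketch below treats the surface as a staircase: every slip-plane trace crossed by the probing line adds a step proportional to the signed number of dislocations that left the crystal through it. This is a purely kinematic idealization and an assumption of this example; the profiles of Fig. 4.39 are obtained from the displacement fields of the dislocation loops, and the function name, trace positions and counts used here are hypothetical.

```python
def step_profile(probe_positions, exits, b_normal=1.0):
    """Staircase surface height along a probing line.

    exits    : list of (trace_position, signed_count) pairs, where
               signed_count is the net number of dislocations that
               left the crystal along that slip-plane trace
    b_normal : step height per dislocation, in units of b (assumed)
    """
    heights = []
    for x in probe_positions:
        # Every trace already passed by the probe contributes its step.
        net = sum(count for trace, count in exits if trace <= x)
        heights.append(b_normal * net)
    return heights

# Hypothetical data: three dislocations exit at x = 1.0,
# one dislocation of opposite sign exits at x = 2.0.
exits = [(1.0, 3), (2.0, -1)]
profile = step_profile([0.5, 1.5, 2.5], exits)
```

With these made-up inputs the height is zero before the first trace, three step units between the traces and two step units beyond the second one.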

Detailed surface morphologies are computed on the surface at the exact location of the intense slip


(a) Case of shearable particles (b) Case of non-shearable particles 

Figure 4.38: Surface steps of (a) the simulation volume containing shearable particles after 3 cycles 

and (b) the simulation volume containing non-shearable particles after 5 cycles 

Figure 4.39: One-dimensional profiles of the surface steps (in units of b) versus probe distance (µm) along the [1¯11] direction for the case of shearable particles (dashed curve) and non-shearable particles (solid curve)


Figure 4.40: Evolution of detailed surface morphologies computed on the surface of 500 nm width of (a) the shearable particle case after 1/2 and 2 1/2 cycles (tongue-like surface slip markings) and (b) the non-shearable particle case after 1/2 and 4 1/2 cycles (ribbon-like surface slip markings) 

bands formed in the volume at ɛp = 0. Fig. 4.40 shows the evolution of the detailed surface morphologies (a) from 1/2 cycle to 2 1/2 cycles in the case of shearable particles and (b) from 1/2 cycle to 4 1/2 cycles in the case of non-shearable particles. A close examination of these images shows that the tongue-like slip markings are associated with the intense slip bands in the simulation volumes containing shearable particles (see Fig. 4.28(b)) and the ribbon-like slip markings are associated with the intense slip bands of the non-shearable particle case (see Fig. 4.30(b)). In the shearable particle case, prismatic loops aligned in the Burgers direction are responsible for the tongue-like slip markings. The ribbon-like slip markings are related to dislocation structures gliding between the particles, and their length is closely related to the inter-particle distance. 

4.3.6 Fatigue properties of materials containing particles with a bimodal size distribution 

Alloys which contain a bimodal particle size distribution are particularly interesting because they offer an optimized combination of fatigue properties, i.e. good strength (the merit of underaged alloys, as shown in Fig. 4.16(a)) and cyclic stability (the merit of overaged alloys, as shown in Fig. 4.16(a)). Waspaloy ([Clavel & Pineau 82]) is one example of an alloy whose particles have a bimodal size distribution. 

Three bimodal cases are considered with the same particle volume fraction vf = 14% but with different ratios between the numbers of large (rp = 400 nm) and small (rp = 160 nm) particles. The number of particles of each size is listed below for the three cases considered, and the simulation volume is shown in Fig. 4.41 for the ’Bimodal2’ case.


Figure 4.41: The simulation volume which contains both rp = 160 nm and rp = 400 nm particles (legend: non-shearable particles, r = 0.4 µm; shearable particles, r = 0.16 µm; orientation marker (110)) 

• Bimodal1 : rp = 160 nm, 2080 particles + rp = 400 nm, 31 particles 



The same volume geometries and material properties are adopted for the three arrangements. The simulation box is taken as a cylindrical volume, and the particles of each size have the same initial and final strengths as before (see Sec. 4.3.2). The same loading condition as in the mono-modal cases is applied, i.e. ∆ɛp = 1 × 10⁻³ and R = −1. 
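The particle counts quoted for the ’Bimodal1’ case can be cross-checked against the stated volume fraction with a short calculation. The sketch below assumes that the particles are full spheres of radius rp lying entirely inside the simulation volume and that vf is the total particle volume divided by the volume of the simulation box; both are assumptions of this example, and the variable names are illustrative.

```python
import math

def sphere_volume(radius_um):
    """Volume of a sphere in cubic micrometres."""
    return 4.0 / 3.0 * math.pi * radius_um ** 3

# 'Bimodal1' population as listed: (radius in micrometres, number of particles)
bimodal1 = [(0.160, 2080), (0.400, 31)]

# Total particle volume under the full-sphere assumption
particle_volume = sum(n * sphere_volume(r) for r, n in bimodal1)

# Simulation volume implied by vf = 14 %
implied_sim_volume = particle_volume / 0.14

# Share of the particle volume carried by the large particles
large_share = 31 * sphere_volume(0.400) / particle_volume
```

Under these assumptions the particles occupy about 44 µm³, which corresponds to a simulation volume of roughly 314 µm³, and the 31 large particles account for about 19% of the total particle volume.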

The evolution of the total dislocation density is compared with the case of the mono-modal size particles (rp = 160 nm) in Fig. 4.42(a). The total densities and the rates of dislocation accumulation decrease as the percentage of large particles is increased. Fig. 4.42(b) shows that slip localization is retarded as the percentage of large particles increases. It should be noted that the total dislocation densities and the slip localization kinetics of all the bimodal cases considered here lie between those of the two previously investigated mono-modal cases (see Fig. 4.23 and Fig. 4.27). 
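The role of a standard deviation as a localization indicator can be illustrated with a small sketch. Eq. 4.7 is not reproduced in this section, so the snippet assumes that σρ is the population standard deviation of the local dislocation densities measured in equal sub-volumes; the function name and the density values are hypothetical.

```python
from statistics import mean, pstdev

def density_std(local_densities):
    """Population standard deviation of local dislocation densities [m^-2]."""
    return pstdev(local_densities)

# Two made-up microstructures with the same mean density of 1e13 m^-2,
# sampled in eight equal sub-volumes.
homogeneous = [1.0e13] * 8            # density evenly spread
localized = [0.0] * 7 + [8.0e13]      # everything in one sub-volume

same_mean = mean(homogeneous) == mean(localized)
sigma_homogeneous = density_std(homogeneous)
sigma_localized = density_std(localized)
```

Both distributions have the same total density, but the standard deviation vanishes for the homogeneous one and is large for the localized one, which is why such a measure can track slip localization independently of the total density.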

A 3D representation of the dislocation structure is shown in Fig. 4.43(a) for the ’Bimodal2’ case after four fatigue cycles, and details of the associated intense slip bands are shown in Fig. 4.43(b). Compared with the mono-modal distribution case (rp = 160 nm, see Fig. 4.28(a)), the slip bands are more diffuse (the band thickness is larger and the local dislocation density is accordingly lower). The dislocation structure in the (1¯11) plane shown in Fig. 4.43(a) and the intense slip bands plotted


Figure 4.42: Effects of the percentage of large particles on the statistics of fatigue tests: (a) evolution of the total dislocation density ρ [m⁻²] and (b) evolution of the standard deviation σρ(t) (Eq. 4.7), both plotted against εVM* for the rp = 160 nm, Bimodal 1, Bimodal 2 and Bimodal 3 cases 

in Fig. 4.43(b) demonstrate that the structural characteristics of the two previous mono-modal cases coexist in the bimodal case: dense dislocation tangles as well as Orowan loops are observed around the large particles, and ladder-like dislocation structures are formed along the Burgers vector direction. Both tangles of residual dislocations and long dislocation lines with a relatively high mobility are visible. 

These results are consistent with the experimental observations, which show a much more homogeneous and stable slip mode than in the shearable particle cases ([Martin 80], [Edwards & Martin 82]). The effective dispersal of slip by the non-shearable particles can explain the formation of a more homogeneous slip mode. 

TEM micrographs of intense slip bands formed in fatigue-tested Waspaloy are shown in Fig. 4.44. The positions of a few of the large particles are indicated in Fig. 4.44(b) to facilitate the visualization. Although there is a large discrepancy between the simulations and the experiments concerning the particle sizes and the magnitude of the applied plastic strain⁸, the micrographs show the same features as the simulated dislocation microstructure, i.e. dislocation tangles are formed around the large particles, residual dislocations are present between the particles (Fig. 4.44(b)) and slip bands are more diffuse compared to the shearable particle case (Fig. 4.44(a)). 

Clear channels of totally sheared particles are no longer formed, owing to the effective dispersal of dislocations by the large particles. Fig. 4.45(a) shows the distribution of the residual strength of the small particles after seven fatigue cycles (’Bimodal3’ case). The final particle strength distribution 

8 Particle radius = 15 nm and 80 nm, grain size ∼ 50 µm, volume fraction > 40%, ∆ɛp = 10⁻²



Figure 4.43: Details of intense slip band of the ’Bimodal2’ case after four fatigue cycles 

(a) Slice normal to the primary plane (b) Slice parallel to the primary plane 

Figure 4.44: TEM micrographs of intense slip bands of fatigue tested Waspaloy


Figure 4.45: Statistical and spatial distribution of the strength of small particles: (a) repartition by the particle strength (log(frequency) versus τfacet [MPa]) and (b) spatial distribution of sheared particles (colour scale: τfacet [MPa]) 

is significantly broader between the initial and the final strength than in the mono-modal case (see Fig. 4.31). The spatial distribution of the sheared particles in Fig. 4.45(b) also shows that the shearing-off of the small particles does not occur in a confined channel but over a more distributed area than in Fig. 4.32(a). 

The addition of large particles decreases the degree of cyclic softening seen in the mono-modal case (rp = 160 nm), and the stress amplitudes of the bimodal cases lie between those of the two mono-modal cases presented before (rp = 160 nm and 400 nm), although the stress differences between the different bimodal cases are rather small, as shown in Fig. 4.46. 

The deformed surface corresponding to the ’Bimodal2’ case is shown in Fig. 4.47(a) after four fatigue cycles. As the related slip bands are dispersed in the volume, the steps spread over the free surface to a larger extent than those observed in the shearable particle case (Fig. 4.38(a)). In addition, the surface step morphologies in Fig. 4.47(b) shift from the tongue-like to the ribbon-like type, although to a lesser extent than in the large particle mono-modal case (Fig. 4.40). 

4.3.7 Summary 

The simulated fatigue properties of materials hardened by shearable and non-shearable particles are in good qualitative agreement with experimental observations. Simple geometries are used for the simulation volume and the particles, and the evolution of the particle strength by shearing-off is also treated in a simplified manner. The differences in the microstructural and mechanical fatigue features can be summarized as follows for the shearable and non-shearable particle cases. 

Figure 4.46: Cyclic response curves (stress [MPa] versus cumulative plastic strain) of the bimodal cases (Bimodal 1, Bimodal 2, Bimodal 3) compared with the mono-modal cases (shearable, non-shearable) 

Figure 4.47: Surface morphology of the ’Bimodal2’ case: (a) deformed surface after four fatigue cycles, (b) evolution of surface steps above intense slip bands


1. Material hardened by shearable particles 

• High magnitude and accumulation rate of ρtot 

• Formation of thin slip bands with high local dislocation density 

• Ladder-like structures along the primary Burgers vector direction, and small-sized tangles 

between the particles 

• Clear channels of particles of low residual strength 

• Tongue-like surface markings 

• High initial stress amplitude followed by severe softening 

2. Material hardened by non-shearable particles 

• Low accumulation rate of ρtot 

• Larger slip band thickness and reduced inter-band spacing (more than one slip band 

formed in the volume) 

• Dense tangles around particles and complex dislocation structure between pairs of closely-spaced particles 

• Ribbon-like surface markings 

• Intermediate initial stress and rather stable cyclic response 

In the shearable particle case, a detailed investigation of the slip band formation shows that the bands are made of closely spaced edge dipolar loops. This is due to the limited glide distance of the double cross-slipped screws, which have a limited easy-glide path. In the non-shearable particle case, the dense dislocation tangles around the particles are attributed to the formation of Orowan loops and their subsequent interactions with the gliding dislocations, which act as pinning points for relatively mobile dislocations and contribute to the formation of complex dislocation structures in the vicinity of the large particles. 

The addition of non-shearable particles (the bimodal cases) promotes dispersion of the slip bands, which results in retarded slip localization and in more diffuse slip bands. The stress amplitude and the characteristics of the slip markings lie between those of the two mono-modal cases. 

It is observed that the large particles (rp = 400 nm) in the bimodal cases are also sheared off significantly. In Fig. 4.48, the residual facet strengths are shown after 7 fatigue cycles for the bimodal case (Fig. 4.48(a)) and after 6 cycles for the mono-modal case (Fig. 4.48(b)). It is


Figure 4.48: Comparison of the strength of particles with the radius rp = 400 nm in the (a) bimodal case after 7 fatigue cycles and (b) mono-modal case after 6 fatigue cycles (colour scale: τfacet from 3000 to 7400 MPa) 

apparent that the large particles are sheared off more in the bimodal case, e.g. the minimum value of the strength is 5128 MPa in Fig. 4.48(a) and 7227 MPa in Fig. 4.48(b). This implies that significant softening and damaging effects could eventually take place in the large particles present in a bimodal-sized particle distribution. The small difference in the number of cycles does not seem to influence this observation much. 

Because of the relatively small number of simulated fatigue cycles, quantitative analyses of the slip irreversibility are not presented in this work. The computational limitations of the simulations lie in the maximum possible number of segments (related to memory capacity) and the poor load balance (see Fig. 3.21(b)). As already pointed out in Sec. 3.3.6 and 3.4.4, the computational performance can be further increased by 

1. Decomposing the data space 

Each processor uses only the data necessary and sufficient for its computation, which makes more memory available for the simulations. 

2. Revising the load balance scheme 

Fatigue simulations are poorly load-balanced because of the highly heterogeneous dislocation microstructure involved. Moreover, the geometry of the simulation volume (a cylinder), as shown in Fig. 3.21(b), is not easy to decompose into the set of cubic boxes needed by the parallelization scheme. A more efficient load balancing scheme is thus highly desirable. 
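The effect of a heterogeneous microstructure on load balance can be quantified with a common textbook measure, the ratio of the mean to the maximum work per processor. The sketch below is only an illustration of that measure: it assumes that the work of a processor is proportional to the number of segments it holds, and the segment counts are made up rather than taken from the simulations.

```python
def load_balance_efficiency(work_per_processor):
    """Mean work divided by maximum work (1.0 means perfect balance)."""
    mean_work = sum(work_per_processor) / len(work_per_processor)
    return mean_work / max(work_per_processor)

# Hypothetical segment counts on four processors, 400 segments in total.
balanced = [100, 100, 100, 100]       # homogeneous microstructure
localized = [10, 10, 10, 370]         # slip band concentrated on one processor

eff_balanced = load_balance_efficiency(balanced)
eff_localized = load_balance_efficiency(localized)
```

With the same total number of segments, the localized distribution gives an efficiency of about 0.27, meaning that three of the four processors are idle most of the time while the fourth one finishes its share.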



The good qualitative agreement between the simulations and the experiments for the microstructural and mechanical features of fatigue is, however, very promising for the development of fatigue-life models. It is generally agreed that the irreversible fraction of the cumulative cyclic strain describes well the mechanisms controlling fatigue life. As in experimental efforts to describe the state of damage ([Coupeau & Grilhe 99], [Cretegny & Saxena 01]), various parameters are measured during the simulations. Parameters such as the surface topology, the slip band width and the separation distance are direct outputs of the simulations. The evolution of the elastic energy inside the slip bands is also accessible by post-processing the simulated dislocation structures. Although each simulation needs considerable computing time⁹, the flexibility of the simulations makes possible an extensive study of the effects of various parameters such as geometry (grain size etc.), particle characteristics (particle size, volume fraction, strength etc.) and boundary conditions (the applied plastic strain etc.). The compilation of this information will serve to build fatigue crack nucleation criteria based on the intrinsic microstructural features involved. 

Key points 

• Image stresses due to a 3D particle are computed using the FEM/DDD coupling code explained in Sec. 2.4.2. Cylindrical, spherical and cubical particles are considered, and interaction forces along both the glide and climb directions are shown. 

• The effect of the elastic modulus difference is investigated focusing on the flow stress 

and the subsequent hardening behavior. A simple configuration involving two particles 

is used for the computation. 

• Fatigue simulations are performed using the new parallel DDD code. The effects 

of particles (shearable or non-shearable) on the fatigue properties, like the intense 

slip band microstructure, the cyclic mechanical response, and the surface markings 

are investigated. The simulated results are compared to the available experimental 

observations in a qualitative way. Bimodal particle distributions are also simulated. 

The simulations can be used effectively to build fatigue-life models based on the 

intrinsic microstructural features involved. 

9 The fatigue simulations presented in this work take 4–7 days using 9 processors of the IBM P690 architecture supported by KISTI (Korea Institute of Science and Technology Information)

Chapter 5 

Conclusions and perspectives 

At the beginning of this thesis, we presented the details of the 3D discrete dislocation dynamics method. Effort was made to elucidate the theoretical background and the assumptions underlying the method, in order to improve and extend its applicability and to provide good guidance to newcomers in this field, especially those who are not Francophone, since this is the first thesis from the French group written in English. The method used to discretize the simulation space and the dislocation lines can readily be applied to other crystal structures, and anisotropic stress fields and various forms of dislocation mobility can be adopted according to the needs of the research subject. 

Besides the compilation of the existing components, important new elements have been added to the 3D DDD method, i.e. the computation of the displacement fields of dislocations, the implementation of internal interfaces and the periodic boundary conditions. These new features open a wide range of research areas in which the DDD code can be used. 

The computation of the displacement fields has been applied successfully both to the study of the surface markings during cyclic loading and to the enforcement of displacement boundary conditions in the code coupled with CAST3M, although the latter is not presented in this work. 

The internal interfaces represented by facets are effectively used to model the particles in precipitation-hardened metals. This method can initiate a number of studies involving internal interfaces, such as the plasticity of polycrystals and of multilayer films ([Verdier 04]), which contain grain boundaries and interfaces between films, respectively. 

The periodic boundary conditions are applied to the simulation of the Stage I-II transition. They are now being applied to study the effect of the polarization of forest dislocations on the critical stress of a gliding dislocation line, in which periodicity is imposed along the line and the glide direction of the moving dislocation. 

Although it is not extensively studied in this work, junction formation and its representation in dislocation dynamics methods are currently widely investigated. Among the many important issues, the colinear junctions ([Madec et al. 03]) and the glissile junctions¹ are of particular interest. 

The use of linked lists of segments and the decomposition of the orthorhombic simulation volume into homothetic boxes significantly increase the computational efficiency and greatly reduce the computing time, at the cost of minor errors in the stress computation. This allows massive simulations of bulk materials under homogeneous loading conditions. 

Although the computational efficiency of the 3D DDD method was improved significantly by applying the box method, it was still not feasible to incorporate many particles in the simulation volume. A parallel version of the method has thus been developed. 

The distributed memory system and the standard MPI have been chosen to develop the parallel DDD code, since distributed memory architectures are the mainstream of parallel computers and are expected to remain so for the time being. 

The scheme of the new parallel code is based on the box method: the set of boxes dividing the simulation volume is decomposed into parallelepiped subsystems. The parallel scheme developed in this work has several advantages: most of the serial code can be used without any modification, and a relatively short development time was needed (less than 4 months). The speedup obtained is quite satisfactory. In particular, the efficiency of the internal stress computation is 100%, so that anisotropic stress solutions can be incorporated at the same computing cost as the isotropic solutions by using several processors. The requirement that at least three boxes exist along each axis of an individual subsystem, however, limits the number of processors that can be used simultaneously. Better strategies for the load balancing and the decomposition of the data space would be highly desirable to improve the efficiency and the applicability of the new parallel code. 
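The limit set by the three-box requirement can be estimated with a short calculation. The sketch below assumes a regular Cartesian split of the box grid with one subsystem per processor, which may differ from the decomposition actually implemented; the function name and the grid sizes are illustrative.

```python
def max_processors(nx, ny, nz, min_boxes_per_axis=3):
    """Upper bound on the number of subsystems for an nx x ny x nz box grid,
    given that each subsystem needs at least min_boxes_per_axis boxes
    along every axis (regular Cartesian split assumed)."""
    return ((nx // min_boxes_per_axis)
            * (ny // min_boxes_per_axis)
            * (nz // min_boxes_per_axis))

# Illustrative box grids
limit_12 = max_processors(12, 12, 12)   # 4 subsystems per axis
limit_9 = max_processors(9, 9, 9)       # 3 subsystems per axis
limit_8 = max_processors(8, 8, 8)       # only 2 subsystems per axis
```

Under this assumption a 12 × 12 × 12 grid allows at most 64 processors, a 9 × 9 × 9 grid allows 27, and an 8 × 8 × 8 grid only 8, which shows how quickly the constraint bites when the number of boxes per axis is small.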

Parallel to our efforts, several groups have converted their own dislocation dynamics codes into parallel versions, notably at Lawrence Livermore National Laboratory (LLNL) and the University of California, Los Angeles (UCLA). Both use the nodal model. 

1 D. Weygand at the conference ’Dislocations 2004’ held at La Colle-sur-Loup, France, September 13-17, 2004

The image stresses due to a 3D particle were computed using the FEM/DDD coupling code. The interaction of a dislocation line with circular cylindrical, spherical and cubical particles of differing elastic modulus was investigated. The computation method was validated by comparison with the corresponding analytical solutions. It was shown that the image stresses need to be taken into account, especially in the study of local events around the particles, e.g. the computation of the energy state around a particle and the calculation of the creep threshold stresses at high temperatures. 

In this modeling, it is necessary to mesh the whole simulation volume because the geometrical symmetries are broken by the heterogeneous force boundary conditions due to a dislocation. Consequently, the cost of the FEM computation is relatively high in terms of both CPU time and required memory. The force profiles fitted to the computed data can be used as approximate solutions for the interactions due to the elastic modulus mismatch. For the dynamics, however, the use of a parallel finite element method would be beneficial and would serve as a good tool for studying the plasticity of multilayer films, for example. 

The effect of the elastic modulus mismatch is investigated, focusing on the flow stress and the subsequent hardening behavior, using a simple geometry involving two particles. Because the image stresses are short-ranged and paraelastic, they have minor effects on the yield stress but significant effects on the work hardening rate. The image stresses are also found to significantly affect local events such as cross slip and climb. 

The fatigue simulations are performed using the internal interfaces represented by facets and the new parallel DDD program. The characters of shearable and non-shearable particles and the evolution of the particle strength by shearing-off were represented in a simplified manner by adjusting the strength of the facets. 

Major features of the fatigue properties of materials hardened by shearable and non-shearable particles are well reproduced by the simulations, e.g. the microstructure of the intense slip bands, the cyclic mechanical response and the surface markings. The simulated results were compared with the available experimental observations and showed good qualitative agreement. A mechanism for the formation of the intense slip bands is proposed from the observation of the simulated dislocation microstructure. 

The flexibility of the simulations permits an extensive study of the effects of various parameters such as the geometry (grain size etc.), the characteristics of the particles (particle size, volume fraction, strength etc.) and the applied plastic strain. The compilation of this information will serve to build fatigue crack nucleation criteria based on the intrinsic microstructural features involved. 

To build a reliable fatigue life model, it is however imperative to increase the number of fatigue 

cycles of the simulations. For this purpose, the efficiency and performance of the new parallel code 

need to be improved by adopting better strategies for the load balancing and decomposing the data 

space so as to increase the maximum number of dislocation segments and particles. 

In conclusion, the methods that we have developed and verified have been applied to dislocation-precipitate interactions and open many paths to new and interesting research areas.

Bibliography 

[Abraham 97] Abraham F. F., Portrait of a crack: Rapid fracture mechanics using parallel molecular dynamics, IEEE Computational Science & Engineering, Vol. 4, No. 2, 1997, pp. 66–77. 

[Aoyama & Nakano 99] Aoyama Y. & Nakano J., Practical MPI Programming (RS/6000 SP), IBM Redbooks, Vervante, 1999. 

[Bacon et al. 73] Bacon D. J., Kocks U. F. & Scattergood R. O., The effect of dislocation self-interaction on the Orowan stress, Phil. Mag., Vol. 28, 1973, p. 1241. 

[Barnett 85] Barnett D. M., The displacement field of a triangular dislocation loop, Phil. Mag. A, Vol. 51, No. 3, 1985, pp. 383–387. 

[Bathe 96] Bathe K. J., Finite Element Procedures, Prentice-Hall International, INC., 1996. 

[Brown 64] Brown L. M., The self-stress of dislocations and the shape of extended nodes, Phil. 

Mag., 1964, pp. 441–466. 

[Bulatov et al. 01] Bulatov V. V., Rhee M. & Cai W., Periodic boundary conditions for dislocation dynamics simulations in three dimensions, Mat. Res. Soc. Symp. Proc., ed. by Kubin L. P., Selinger R. L., Bassani J. L. & Cho K., 2001. 

[Calabrese & Laird 74] Calabrese C. & Laird C., Cyclic stress-strain response of two-phase 

alloys, parts i and ii, Materials Science and Engineering, Vol. 13, 1974, pp. 141–174. 

[Canova & Kubin 91] Canova G. R. & Kubin L. P., Dislocation microstructures and plastic flow: a three dimensional simulation, Continuum models and discrete systems, ed. by Maugin G. A., 1991. 

[Chen et al. 99] Chen B. T., Zhang T. Y. & Lee J. K., Interaction of an edge dislocation with an 

elliptical hole in a rectilinearly anisotropic body, Mech. of Mat., Vol. 31, 1999, p. 71.


[Clavel & Pineau 82] Clavel M. & Pineau A., Fatigue behaviour of two nickel-base alloys I: Experimental results on low cycle fatigue, fatigue crack propagation and substructures, Materials Science and Engineering, Vol. 55, 1982, pp. 157–171. 

[Cleveringa et al. 97] Cleveringa H. H. M., Van der Giessen E. & Needleman A., Comparison of discrete dislocation and continuum plasticity predictions for a composite material, Acta Materialia, Vol. 45, No. 8, 1997, pp. 3163–3179. 

[Comninou & Dundurs 72] Comninou M. & Dundurs J., Long-range interaction between a screw 

dislocation and a spherical inclusion, J. Appl. Phys., Vol. 43, 1972, p. 2461. 

[Coupeau & Grilhe 99] Coupeau C. & Grilhe J., Quantitative analysis of surface effects of plastic 

deformation, Materials Science and Engineering A, Vol. 271, 1999, pp. 242–250. 

[Cretegny & Saxena 01] Cretegny L. & Saxena A., AFM characterization of the evolution of surface deformation during fatigue in polycrystalline copper, Acta Materialia, Vol. 49, No. 18, 2001, pp. 3647–3887. 

[Demmel et al. 93] Demmel J., Heath M. & van der Vorst H., Parallel numerical linear algebra, 

Acta Numerica 1993, 1993. 

[Déprés 04] Déprés C., Modèlisation physique des stades précurseurs de l’endommagement en 

fatigue, Thèse de PhD, Institut National Polytechnique De Grenoble, 2004. 

[Déprés et al. 03] Déprés C., Fivel M., Robertson C. F., Fissolo A. & Verdier M., Etude des 

stades précurseurs de l’endommagement en fatigue: expériences et simulations à l’échelle des 

dislocations, Journal de Physique IV, Vol. 106, 2003, pp. 81–90. 

[Déprés et al. 04] Déprés C., Robertson C. F. & Fivel M., Low-strain fatigue in 316L steel surface grains: a three dimensional discrete dislocation dynamics modelling of the early cycles. Part 1: Dislocation microstructures and mechanical behaviour, Phil. Mag., Vol. 84, No. 22, 2004, pp. 2257–2275. 

[Devincre 95] Devincre B., Three dimensional stress field expressions for straight dislocation segments, Solid State Communications, Vol. 93, No. 11, 1995, pp. 875–878. 

[Devincre et al. 01] Devincre B., Kubin L. P., Lemarchand C. & Madec R., Mesoscopic simulations of plastic deformation, Materials Science and Engineering, Vol. A309-310, 2001, pp. 211–219.


[Devincre & Roberts 96] Devincre B. & Roberts S., Three-dimensional simulation of 

dislocation-crack interactions in b.c.c. metals at the mesoscopic scale, Acta Materialia, Vol. 44, 

No. 7, 1996, pp. 2891–2900.

[dewit 67] deWit R., Some relations for straight dislocations, Phys. Stat. Sol., Vol. 20, 1967, pp. 

567–573. 

[Diehl 56] Diehl J., Z. Metallk., Vol. 47, 1956, p. 331. 

[Dongarra et al. 98] Dongarra J. J., Duff I. S., Sorensen D. C. & van der Vorst H. A., Numerical Linear

Algebra for High Performance Computers, SIAM, Philadelphia, 1998. 

[Ebeling & Ashby 66] Ebeling R. & Ashby M. F., Dispersion hardening of copper single crystals, 

Phil. Mag., Vol. 13, 1966, p. 805. 

[Edwards & Martin 82] Edwards L. & Martin J. W., Proc. of 6th Int. Conf. on the strength of 

metals and alloys, ed. by Gifkins R. C., 1982. 

[Essmann & Mughrabi 79] Essmann U. & Mughrabi H., Annihilation of dislocations during 

tensile and cyclic deformation and limits of dislocation densities, Phil. Mag. A, Vol. 40, No. 6,

1979, pp. 731–756. 

[Fahrat & Roux 94] Farhat C. & Roux F.-X., Implicit parallel processing in structural mechanics,

Computational Mechanics Advances, Vol. 2, No. 1, 1994.

[Fisher et al. 53] Fisher J. C., Hart E. W. & Pry R. H., The hardening of metal crystals by

precipitate particles, Acta Metallurgica, Vol. 1, 1953, p. 336.

[Fivel 97] Fivel M., Études numériques à différentes échelles de la déformation plastique des 

monocristaux de structure CFC, Thèse de PhD, Institut National Polytechnique De Grenoble, 

1997. 

[Fivel & Canova 99] Fivel M. & Canova G. R., Developing rigorous boundary conditions to 

simulations of discrete dislocation dynamics, Modelling Simul. Mater. Sci. Eng., Vol. 7, 1999, pp. 

753–768. 

[Fivel et al. 96] Fivel M., Gosling T. J. & Canova G. R., Implementing image stresses in a 3D

dislocation simulation, Modelling Simul. Mater. Sci. Eng., Vol. 4, No. 6, 1996, pp. 581–596.


[Fivel et al. 98] Fivel M., Robertson C. F., Canova G. R. & Boulanger L., 3D modeling of indent-induced

plastic zone at a mesoscale, Acta Materialia, Vol. 46, 1998, pp. 6183–6194.

[Foreman 67] Foreman A. J. E., The bowing of a dislocation segment, Phil. Mag., Vol. 15, 1967, 

pp. 1011–1021. 

[Foreman & Makin 66] Foreman A. J. E. & Makin M. J., Dislocation movement through random 

arrays of obstacles, Phil. Mag., Vol. 14, 1966, p. 911. 

[Fusenig & Nembach 75] Fusenig K. D. & Nembach E., Dynamic dislocation effects in precipitation

hardened materials, Acta metall. mater., Vol. 41, 1975, pp. 3181–3189.

[Gerold & Steiner 82] Gerold V. & Steiner D., Fatigue softening in precipitation-hardened 

copper-cobalt, Scripta Metallurgica, Vol. 16, 1982, pp. 405–408. 

[GG et al. 00] Gómez-García D., Devincre B. & Kubin L. P., Forest hardening and boundary conditions

in 2D simulations of dislocations dynamics, Mat. Res. Soc. Symp. Proc., ed. by Robertson

I. M., Lassila D. H., Devincre B. & Phillips R., 2000. 

[Ghoniem et al. 00] Ghoniem N. M., Singh B. N., Sun L. Z. & de la Rubia T. D., Interaction 

and accumulation of glissile defect clusters near dislocations, J. Nucl. Mater., Vol. 276, 2000, pp. 

166–177. 

[Giessen & Needleman 95] Van der Giessen E. & Needleman A., Discrete dislocation plasticity:

a simple planar model, Modelling Simul. Mater. Sci. Eng., Vol. 3, 1995, pp. 689–735. 

[Graf & Hornbogen 78] Graf M. & Hornbogen E., The effect of inhomogeneity of cyclic strain 

on initiation of cracks, Scripta Metallurgica, Vol. 12, 1978, pp. 147–150. 

[Gullouglu et al. 89] Gullouglu A. N., Srolovitz D. J., Lesar R. & Lomdahl P. S., Dislocation 

distributions in two dimensions, Scripta Metallurgica, Vol. 23, 1989, p. 1347. 

[Hirth & Lothe 92] Hirth J. P. & Lothe J., Theory of Dislocations, Krieger Publishing Company, 

Malabar, Florida, 1992. 

[Hull & Bacon 83] Hull D. & Bacon D. J., Introduction to Dislocations, Pergamon Press, p. 96,

1983. 

[Humphreys & Martin 67] Humphreys F. J. & Martin J. W., Phil. Mag., Vol. 16, 1967, p. 927.


[Khraishi et al. 00a] Khraishi T. A., Zbib H. M., Hirth J. P. & de la Rubia T. D., The stress field

of a general circular Volterra dislocation loop: Analytical and numerical approaches, Phil. Mag.,

Vol. 80, 2000, pp. 95–105. 

[Khraishi et al. 00b] Khraishi T. A., Zbib H. M., Hirth J. P. & Khaleel M., The displacement,

and strain-stress fields of a general circular Volterra dislocation loop, Int. J. Eng. Sci., Vol. 38,

2000, pp. 251–266. 

[Kobashi & Ohr 80] Kobashi S. & Ohr S. M., Phil. Mag. A, Vol. 42, 1980, p. 763. 

[Kocks et al. 75] Kocks U. F., Argon A. S. & Ashby M. F., Thermodynamics and kinetics of slip, 

Progress in Materials Science, ed. by Kubin L. P., Selinger R. L., Bassani J. L. & Cho K., 1975. 

[Lee & Laird 83] Lee J. K. & Laird C., Strain localization during fatigue of precipitation-hardened 

aluminium alloys, Phil. Mag., Vol. 47A, 1983, pp. 579–597. 

[Lépinoux & Kubin 87] Lépinoux J. & Kubin L. P., The dynamic organization of dislocation 

structures: a simulation, Scripta Metallurgica, Vol. 21, 1987, pp. 833–837. 

[Li 64] Li J. C. M., Stress field of a dislocation segment, Phil. Mag., Vol. 10, 1964, pp. 1097–1098. 

[Li & Laird 94] Li Y. & Laird C., Cyclic response and dislocation structures of AISI 316L stainless

steel. Part 1: Single crystals fatigued at intermediate strain amplitude, Materials Science and

Engineering A, Vol. 186, No. 1–2, 1994, pp. 65–86.

[Madec 01] Madec R., Des intersections entre dislocations à la plasticité du monocristal CFC;

Étude par dynamique des dislocations, Thèse de PhD, Université Paris XI Orsay, 2001.

[Madec et al. 03] Madec R., Devincre B., Kubin L. P., Hoc T. & Rodney D., The role of collinear 

interaction in dislocation-induced hardening, Science, Vol. 301, No. 26, 2003, pp. 1879–1882.

[Madec et al. 04] Madec R., Devincre B. & Kubin L. P., On the use of periodic boundary conditions

in dislocation dynamics simulation, Mesoscopic Dynamics in Fracture Process and Strength

of Materials, ed. by Shibutani Y. & Kitagawa H., 2004. 

[Man et al. 02] Man J., Obrtlik K., Blochwitz C. & Polák J., Atomic force microscopy of surface

relief in individual grains of fatigued 316L austenitic stainless steel, Acta Materialia, Vol. 50, 2002,

pp. 3767–3780.


[Marquis & Dunand 02] Marquis E. A. & Dunand D. C., Model for creep threshold stress in 

precipitation-strengthened alloys with coherent particles, Scripta Materialia, Vol. 47, 2002, p. 

503. 

[Martin 80] Martin J. W., Micromechanisms in particle-hardened alloys, Cambridge Solid State

Science Series, ed. by Cahn R. W., Thompson M. W. & Ward I. M., 1980. 

[Mason 68] Mason W. P., Dislocation dynamics, McGraw-Hill, 1968.

[Melander & Persson 78] Melander A. & Persson P. A., The strength of a precipitation hardened

AlZnMg alloy, Acta Metallurgica, Vol. 26, 1978, p. 267.

[Mohles & Nembach 01] Mohles V. & Nembach E., The peak- and overaged states of particle 

strengthened materials: computer simulations, Acta Materialia, Vol. 49, 2001, p. 2405. 

[Moore 65] Moore G. E., Cramming more components onto integrated circuits, Electronics, 

Vol. 38, No. 8, 1965.

[Mughrabi 83] Mughrabi H., Deformation of multi-phase and particle containing materials, Proceedings

of the 4th Risø International Symposium on Metallurgy and Materials Science, ed. by

Bilde-Sørensen J. B., Hansen N., Horsewell A., Leffers T. & Lilholt H., 1983. 

[Mughrabi 85] Mughrabi H., Dislocation Properties in Real Materials, Book No. 323, The Institute 

of Metals, London, 1985. 

[Nembach 83] Nembach E., Phys. Stat. Sol., Vol. 78, 1983, p. 571. 

[Nembach 97] Nembach E., Particle strengthening of metals and alloys, John Wiley and Sons, 

1997. 

[Obrtlik et al. 94] Obrtlik K., Kruml T. & Polák J., Dislocation structures in 316L stainless steel

cycled with plastic strain amplitudes over a wide interval, Materials Science and Engineering A,

Vol. 187, No. 1, 1994, pp. 1–10.

[Reppich 93] Reppich B., Particle strengthening, Mater. Sci. Technol., Vol. 6, 1993, pp. 311–357.

[Rhee et al. 01] Rhee M., Stolken J. S., Bulatov V. V., de la Rubia T. D., Zbib H. M. & Hirth 

J. P., Dislocation stress fields for dynamic codes using anisotropic elasticity: methodology and 

analysis, Materials Science and Engineering, Vol. A309-310, 2001, pp. 288–293.


[Risbet et al. 03] Risbet M., Feaugas X., Guillemer-Neel C. & Clavel M., Use of atomic force 

microscopy to quantify slip irreversibility in a nickel-base superalloy, Scripta Materialia, Vol. 49, 

2003, pp. 533–538. 

[Rodney & Phillips 99] Rodney D. & Phillips R., Structure and strength of dislocation junctions: 

an atomic-level analysis, Phys. Rev. Lett., Vol. 82, 1999, pp. 1704–1707.

[Santare & Keer 86] Santare M. H. & Keer L. M., Interaction between an edge dislocation and 

a rigid elliptical inclusion, J. Appl. Mech., Vol. 53, 1986, p. 382. 

[Schmid & Boas 35] Schmid E. & Boas W., Kristallplastizität, Springer Verlag (Berlin), 1935.

[Schwarz 99] Schwarz K. W., Simulation of dislocations on the mesoscopic scale. I. Methods and

examples, J. Appl. Phys., Vol. 85, No. 1, 1999, pp. 108–119.

[Shenoy et al. 00] Shenoy V. B., Kukta R. V. & Phillips R., Mesoscopic analysis of structure

and strength of dislocation junctions in fcc metals, Phys. Rev. Lett., Vol. 84, No. 7, 2000, pp.

1491–1494. 

[Shin et al. 01] Shin C. S., Fivel M., Rodney D., Phillips R., Shenoy V. B. & Dupuy L., Formation

and strength of junctions in fcc metals: a study by dislocation simulation and atomistic

simulations, Journal de Physique IV, Vol. 11, No. Pr5, 2001, pp. 19–26.

[Stoltz & Pineau 78] Stoltz R. E. & Pineau A., Dislocation-precipitate interaction and cyclic 

stress-strain behavior of a γ’-strengthened superalloy, Materials Science and Engineering, Vol. 34, 

1978, pp. 275–284. 

[Suresh 98] Suresh S., Fatigue of Materials, 2nd ed., Cambridge University Press, 1998.

[Tang et al. 98] Tang M., Kubin L. P. & Canova G. R., Dislocation mobility and the mechanical

response of bcc single crystals: a mesoscopic approach, Acta Materialia, Vol. 46, 1998, p. 9. 

[Urabe & Weertman 75] Urabe N. & Weertman J., Dislocation mobility in potassium and iron 

single crystals, Materials Science and Engineering, Vol. 18, 1975, p. 41. 

[Verdier 04] Verdier M., Plasticity in fine scale semi-coherent metallic films and multilayers, 

Scripta Materialia, Vol. 50, No. 6, 2004, pp. 769–773.


[Verdier et al. 98] Verdier M., Fivel M. & Groma I., Mesoscopic scale simulation of dislocation

dynamics in fcc metals: Principles and applications, Modelling Simul. Mater. Sci. Eng., Vol. 6,

No. 6, 1998, pp. 755–770.

[Vitek 75] Vitek V., Yielding from a crack with finite root-radius loaded in uniform tension, J. 

Mech. Phys. Solids, Vol. 24, 1975, p. 67. 

[Weeks et al. 69] Weeks R. W., Pati S. R., Ashby M. F. & Barrand P., The elastic interaction 

between a straight dislocation and a bubble or a particle, Acta Metallurgica, Vol. 17, 1969, p. 

1403. 

[Weygand et al. 01] Weygand D., Friedman L. H., Van der Giessen E. & Needleman A., Discrete

dislocation modeling in three-dimensional confined volumes, Materials Science and Engineering,

Vol. A309-310, 2001, p. 420. 

[Weygand et al. 02] Weygand D., Friedman L. H. & Van der Giessen E., Aspects of boundary-value

problem solutions with three-dimensional dislocation dynamics, Modelling Simul. Mater.

Sci. Eng., Vol. 10, 2002, pp. 437–468. 

[Zbib et al. 98] Zbib H. M., Rhee M. & Hirth J. P., On plastic deformation and the dynamics of

3D dislocations, Int. J. Mech. Sci., Vol. 40, Nos. 2–3, 1998, pp. 113–127.

[Zhou & Lung 88] Zhou S. J. & Lung C. W., An image force expression for the dislocation near 

a crack, J. Phys. F: Met. Phys., Vol. 18, 1988, p. 851. 

[Zhu & Starke 99] Zhu A. W. & Starke E. A., Computer experiment on superposition of strengthening

effects of different particles, Acta Materialia, Vol. 47, 1999, p. 3263.
