3D DISCRETE DISLOCATION DYNAMICS APPLIED TO ... - NUMODIS
INSTITUT NATIONAL POLYTECHNIQUE DE GRENOBLE
THESIS
submitted for the degree of
DOCTOR OF THE INPG
No. assigned by the library:
Specialty: "MATERIALS SCIENCE AND ENGINEERING"
prepared at the laboratory Génie Physique et Mécanique des Matériaux (GPM2)
within the Doctoral School "Matériaux et Génie des Procédés" (Materials and Process Engineering)
presented and publicly defended
by
Chansun SHIN
on 25 November 2004
Title:
3D DISCRETE DISLOCATION DYNAMICS APPLIED TO
DISLOCATION-PRECIPITATE INTERACTIONS
Thesis advisor:
Marc FIVEL
JURY
Mr. A. PINEAU, Chair, Reviewer
Mr. F. LOUCHET, Examiner
Mr. M. FIVEL, Thesis advisor
Mr. K. H. OH, Co-advisor
Mr. H. N. HAN, Reviewer
Mr. C. ROBERTSON, Invited member
Mr. M. VERDIER, Invited member
3D DISCRETE DISLOCATION DYNAMICS APPLIED TO
DISLOCATION-PRECIPITATE INTERACTIONS
The 3D Discrete Dislocation Dynamics (DDD) method has been applied to investigate the effects of
precipitates on the plasticity of FCC single crystals.
A method to represent internal interfaces by a series of facets with a predefined strength has been
proposed. To fully account for the mutual elastic interactions between dislocations and second-phase
particles, the method coupling DDD with a finite element method has been extended. To reduce the
computing time, the serial 3D DDD algorithm has been improved by revisiting the 'box method', and a new
parallel code has been developed using the Message Passing Interface (MPI) standard.
The image stresses due to a three-dimensional particle were computed using the FEM/DDD coupling
code, and the numerical results were compared with the corresponding analytical solutions. The effect
of the elastic modulus mismatch on the flow stress and on the subsequent hardening behavior was
then analyzed. The image stresses were found to significantly affect work hardening and
local events such as cross-slip and climb. Finally, the fatigue of precipitate-hardened materials
was simulated using the new parallel DDD code. The effects of shearable and non-shearable particles
on the fatigue properties were well reproduced by the simulations, and the numerical results
agree qualitatively with the available experimental observations. A mechanism for
intense slip band formation is proposed from the observation of the simulated dislocation microstructure.
KEY WORDS: DISLOCATION, PRECIPITATE, PLASTICITY, FATIGUE, IMAGE FORCES,
DAMAGE, DYNAMICS, PARALLELIZATION
DISCRETE DISLOCATION DYNAMICS APPLIED TO
INTERACTIONS BETWEEN DISLOCATIONS AND PRECIPITATES
Discrete dislocation dynamics (DDD) has been applied to examine the effects of precipitates on
the plasticity of FCC single crystals.
Precipitates are modeled as an assembly of facets that can be crossed at a given stress.
To account for the elastic interactions between dislocations and particles, a coupling with the
finite element method (FEM) was used. To speed up the computations, the 'box method'
was revisited and a parallel version of the code was developed using the
Message Passing Interface (MPI) programming standard.
First, the image stresses created by a 3D particle were computed through a coupling
between the FEM and the DDD code. The numerical results were compared with the corresponding analytical
solutions. The effect of the difference in Young's moduli on the yield stress and on the resulting
hardening behavior was then studied numerically. We showed that image stresses
have a significant effect on hardening and on local events such as cross-slip and
climb. Finally, the fatigue of materials hardened by shearable and non-shearable precipitates was
simulated with the new parallel DDD code. The results of our simulations agree
with our experimental observations and with data from the literature. A mechanism for the formation of
intense slip bands was proposed from the observation of the simulated microstructures.
KEYWORDS: DISLOCATION, PRECIPITATE, PLASTICITY, FATIGUE, IMAGE FORCES,
DAMAGE, DYNAMICS, PARALLELIZATION
Laboratoire Génie Physique et Mécanique des Matériaux (GPM2), ESA5010,
ENSPG, 101 Rue de la Physique, BP46, 38402 Saint Martin d’Hères Cedex
Acknowledgements
First of all, I express my deep gratitude to my advisor Marc Fivel. Five years ago, he kindly replied
to my audacious e-mail, which could easily have been ignored given its content, and gave me an
opportunity to visit him. This short visit led to the three-year Ph.D. program between INP Grenoble
and Seoul National University, and from the moment we first shook hands to the
moment we shook hands after the thesis defence, he has been my mentor in both work and life.
I am also grateful to Professor Kyu Hwan Oh, with whom I have been working since I began my
Master's study eight years ago. He gave me many opportunities to gain research experience and kept
giving me much good advice.
I owe special thanks to Marc Verdier (LTPCM) and Christian Robertson (CEA Saclay).
They guided and advised me as unofficial co-advisors, from the start of the thesis work to
the rehearsal of the thesis presentation, with great patience and encouragement. I must also
credit part of my work to the fantastic tools of Christophe Déprés, who started and is finishing
his Ph.D. study with me.
I want to thank Professor André Pineau (ENS Mines Paris) for serving as both 'Président' (Chair) and
'Rapporteur' (Reviewer) for my thesis defence. From the moment I first met him at a meeting
of the 'FAMICRO' project¹, which supported my work on fatigue simulations, I was fascinated by
his enthusiasm for research and by his boundless memory: he is a walking library! I also thank
Professors François Louchet (LGGE) and Heung Nam Han (SNU) for serving on my thesis
committee and for their useful suggestions and critical assessment of my work.
My work has been supported by EGIDE², and I want to thank the CNOUS in Grenoble and the
French Embassy in Korea for their efficient professional services.
I am very grateful to all the members of the GPM2 Laboratory for a pleasant daily life: in the blue room,
Julien Chaussidon and Thomas Nogaret; in the computing room, David Rodney and Valérie Quatela;
and on the playground with a soccer ball, Didier Bouvard, Rémy Dendievel, Luc Salvo,
Charles Josserond, Franck Pelloux and Shigesato Genechi.
Finally, I want to express my thanks and love to my wife Suejung, who is both my great
supporter and best friend, for her love and devotion to our family, and to my little daughter Yvine,
who likes to play with my laptop, for the laughter and happiness we all share in our growing family.
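¹ Modélisation de la durée de vie en Fatigue de matériaux métalliques structuraux, à partir de mécanismes physiques
microscopiques (Modeling the fatigue life of structural metallic materials from microscopic physical mechanisms)
² Bourse Pasteur du ministère des affaires étrangères (Pasteur scholarship of the French Ministry of Foreign Affairs)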
Contents
Acknowledgements
Abstract
1 Introduction
1.1 Computational methods in plasticity
1.2 Dislocation dynamics
1.2.1 2D simulations of dislocation dynamics
1.2.2 3D simulations of dislocation dynamics
1.3 Scope of Thesis
2 Description of the simulation method
2.1 Representation of the dislocation lines in FCC metals
2.1.1 Preparation of the simulation space
2.1.2 Discretization of the dislocation lines
2.1.3 Existence of a subnetwork
2.1.4 Comments on other crystal structures and dislocation dynamics models
2.2 Computation of stresses and displacements of dislocations
2.2.1 Evaluation of the driving force
2.2.2 Computation of displacements
2.3 Motion of dislocations
2.3.1 Preliminaries
2.3.2 Dislocation mobility
2.3.3 Dislocation-dislocation interactions
2.3.4 Cross-slip of screw dislocation segments
2.3.5 Plastic strain due to dislocation movement
2.4 Boundary conditions
2.4.1 Periodic Boundary Conditions
2.4.2 Internal interfaces
2.5 Acceleration of the DDD code
2.5.1 Problem description and review of the literature
2.5.2 The Box method
2.5.3 Speedup and Error
2.5.4 Boxes and Periodic boundary conditions
2.6 Computation procedure of the DDD program
3 Parallelization of the Discrete Dislocation Dynamics method
3.1 An introduction to Supercomputing
3.1.1 Overview
3.1.2 Classification of hardware
3.1.3 Parallel programming models
3.1.4 Classification of parallel languages
3.1.5 Supercomputers in France and Korea
3.2 Towards a parallel DDD code
3.2.1 Basic Steps of Parallelization
3.2.2 Writing a parallel program
3.3 Parallelization of the serial DDD program
3.3.1 Initialization of parallel environments
3.3.2 Long-distance stress computations
3.3.3 Short-distance stress computation
3.3.4 Data structures for distributing and gathering segments
3.3.5 Motion of segments
3.3.6 Summary and comments
3.4 Performance improvement
3.4.1 Measure of performance
3.4.2 Conditions for good performance
3.4.3 Performance tests
3.4.4 Load balancing
3.4.5 Comparison of simulation results between the serial and parallel DDD code
3.5 Application to Stage I-II transition simulation
3.5.1 Stress-strain curves of FCC single crystals
3.5.2 Simulation conditions
3.5.3 Simulation results
4 Dislocation-precipitate interactions
4.1 Image stresses due to a 3D particle
4.1.1 Motivations and review of the literature
4.1.2 Interaction of an edge dislocation with a circular cylindrical particle
4.1.3 Interaction of an edge dislocation with a spherical particle
4.1.4 Interaction of an edge and a screw dislocation with a cubical particle
4.1.5 Discussion
4.2 A simple case of dislocation-particle interaction
4.2.1 Motivation and review of the literature
4.2.2 Calculation procedures
4.2.3 Flow stress of impenetrable particles with a different shear modulus
4.2.4 Increment in hardening stress
4.2.5 Discussion
4.3 Fatigue simulations of materials hardened by particles
4.3.1 Motivation and review of the literature
4.3.2 Description of the simulation method
4.3.3 Evolution of the dislocation microstructure during the fatigue tests
4.3.4 Mechanical behavior
4.3.5 Surface slip markings
4.3.6 Fatigue properties of materials containing particles with a bimodal size distribution
4.3.7 Summary
5 Conclusions and perspectives
Chapter 1
Introduction
1.1 Computational methods in plasticity
A dislocation is a line defect in a crystal, along which the atoms deviate permanently from the
crystallographic periodicity of the lattice. Dislocation glide gives rise to the macroscopic
deformation of metals, so dislocations are the microscopic carriers of metallic plasticity.
Modeling the plasticity of metals therefore involves both understanding the nature of dislocations,
which is defined at the atomistic scale, and evaluating the deformation behavior at the macroscopic
scale. Many models have been developed to understand the plasticity of metals. Since the features of
plasticity span a wide range of sizes and times, these models also differ widely in their length and
time scales. Among them, this section focuses on Molecular Dynamics (MD), Dislocation Dynamics
(DD) and continuum mechanics.
Atoms are the basic constituents of MD simulations, and they interact with each other through
an interatomic potential. The trajectories of an ensemble of atoms under an external loading
are obtained by integrating Newton's equations of motion, with the forces derived from this potential.
The deviations of the atomic positions from the lattice sites implicitly represent the dislocations,
so the atomistic-scale topology of a dislocation line can be investigated by MD. Because of the
limited simulation size (< (200 nm)³), MD simulations are mostly employed to study the physical
properties of a single or a few dislocation lines.
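MD codes advance atomic trajectories by integrating Newton's equations of motion with forces taken from the interatomic potential. As a minimal illustration, the sketch below moves two atoms interacting through a Lennard-Jones pair potential with the velocity-Verlet scheme, a common MD integrator. The potential, the reduced units (ε = σ = m = 1), the function names and the parameter values are illustrative assumptions, not taken from this thesis.

```python
def lj_force(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair force -dU/dr (positive means repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy U(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def simulate_dimer(r0=1.2, dt=0.001, steps=5000, mass=1.0):
    """Velocity-Verlet trajectory of two atoms on a line, released at rest.

    Returns the total (kinetic + potential) energy after each step.
    """
    x1, x2 = 0.0, r0
    v1, v2 = 0.0, 0.0
    f = lj_force(x2 - x1)            # force on atom 2 along +x; atom 1 feels -f
    a1, a2 = -f / mass, f / mass
    energies = []
    for _ in range(steps):
        # 1) move atoms with current velocities and accelerations
        x1 += v1 * dt + 0.5 * a1 * dt * dt
        x2 += v2 * dt + 0.5 * a2 * dt * dt
        # 2) recompute the pair force at the new positions
        f = lj_force(x2 - x1)
        new_a1, new_a2 = -f / mass, f / mass
        # 3) update velocities with the average of old and new accelerations
        v1 += 0.5 * (a1 + new_a1) * dt
        v2 += 0.5 * (a2 + new_a2) * dt
        a1, a2 = new_a1, new_a2
        kinetic = 0.5 * mass * (v1 * v1 + v2 * v2)
        energies.append(kinetic + lj_energy(x2 - x1))
    return energies


energies = simulate_dimer()
print(max(energies) - min(energies))  # small spread: total energy is conserved
```

A nearly constant total energy is the usual sanity check that the time step is small enough; production MD codes apply the same update to millions of atoms in 3D.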
In DD methods, dislocation lines are represented explicitly, and the collective evolution of a large
number of interacting dislocations is simulated under an external loading. Dislocation properties
such as mobility and junction strength are input parameters of DD simulations, and dislocation
glide produces plastic strain in the simulation volume. The stress-strain behavior is thus an output
of the DD simulations.
Figure 1.1: Length and time scales of each model (axes: Space (m), from 10⁻¹¹ to 1; Time (s), from
10⁻¹² to 10³; domains: Molecular Dynamics, Dislocation Dynamics, Single crystal models, Homogenization
technique, Polycrystal models, Continuum mechanics). Solid lines represent the limits imposed by the
intrinsic physics of the model; dashed lines represent the limits imposed by the available computing power.
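The link between dislocation glide and plastic strain is commonly written through the swept area (an Orowan-type relation): a segment with Burgers vector b gliding on a plane of unit normal n and sweeping an area ΔA inside a volume V contributes Δγ = |b| ΔA / V to the shear strain, or Δε_p = (ΔA / 2V)(b ⊗ n + n ⊗ b) to the plastic strain tensor. The sketch below evaluates both. It illustrates the standard relation only; the function names and the numbers (a copper-like Burgers vector of 0.256 nm, ΔA = 10⁻¹³ m², V = 10⁻¹⁵ m³) are assumptions, and the thesis's own treatment is given in Sec. 2.3.5.

```python
def shear_strain_increment(b_mag, swept_area, volume):
    """Plastic shear strain on one slip system: |b| * dA / V."""
    return b_mag * swept_area / volume


def plastic_strain_increment(burgers, normal, swept_area, volume):
    """Symmetric plastic strain increment (3x3 nested lists).

    burgers: Burgers vector components (m); normal: unit slip-plane normal;
    swept_area: area swept in the slip plane (m^2); volume: simulation volume (m^3).
    """
    factor = swept_area / (2.0 * volume)
    return [[factor * (burgers[i] * normal[j] + normal[i] * burgers[j])
             for j in range(3)] for i in range(3)]


gamma = shear_strain_increment(2.56e-10, 1e-13, 1e-15)
eps = plastic_strain_increment([2.56e-10, 0.0, 0.0], [0.0, 1.0, 0.0], 1e-13, 1e-15)
print(gamma, eps[0][1])  # the shear component eps_xy equals gamma / 2
```

Because b lies in the slip plane, the strain increment has zero trace, which expresses the volume-conserving nature of glide.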
Continuum mechanics treats the behavior of a continuous medium through a set of equations and boundary
conditions, which can be solved by a wide range of numerical techniques. Finite difference and finite
element methods are two broad families of such techniques. In these methods, the continuum domain of
interest is subdivided into discrete cells or elements, in which the values of certain physical
quantities are determined by solving a system of equations. In a typical application to metal
plasticity, a governing constitutive equation is assumed, and the output is the deformation behavior
of the simulated volume.
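To make the element subdivision concrete, here is a minimal finite element sketch for the simplest case: a 1D linear-elastic bar fixed at one end and loaded axially at the other, meshed with two-node linear elements. It is a generic textbook example, not the FEM used in the FEM/DDD coupling of this thesis, and the function name and inputs are hypothetical. The tridiagonal stiffness system is solved with the Thomas algorithm so that no external library is needed.

```python
def solve_bar(length, area, youngs, n_elems, tip_load):
    """Nodal displacements of a bar fixed at x = 0 and pulled at x = length.

    Uses n_elems linear elements of equal size; returns n_elems + 1 values.
    """
    k = youngs * area / (length / n_elems)   # element stiffness EA / Le
    n = n_elems                               # unknowns: nodes 1..n (node 0 is fixed)
    # Assembled stiffness matrix: tridiagonal, symmetric
    diag = [2.0 * k] * n
    diag[-1] = k                              # last node belongs to a single element
    off = [-k] * (n - 1)                      # sub- and super-diagonal entries
    rhs = [0.0] * n
    rhs[-1] = tip_load                        # axial load at the free end
    # Thomas algorithm: forward elimination
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = off[0] / diag[0] if n > 1 else 0.0
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - off[i - 1] * cp[i - 1]
        cp[i] = off[i] / m if i < n - 1 else 0.0
        dp[i] = (rhs[i] - off[i - 1] * dp[i - 1]) / m
    # Back substitution
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return [0.0] + u                          # prepend the fixed node


u = solve_bar(length=1.0, area=1e-4, youngs=200e9, n_elems=10, tip_load=1000.0)
print(u[-1])  # tip displacement, to compare with P L / (E A)
```

For this load case, linear elements reproduce the exact nodal displacements u(x) = P x / (EA), which provides a simple check of the assembly and solver.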
As introduced briefly above, MD, DD and continuum methods each have their own characteristic length
and time scales, as shown in Fig. 1.1. As the performance of each numerical method improves, the
volume and the physical time that can be simulated increase (top and right limits of each domain in
Fig. 1.1). Recently, the length and time scales of the various methods have begun to overlap. This
strongly encourages the exchange of information between them in order to build a unified model of
metal plasticity, which would be able to predict the behavior of a material from its fundamental properties.
Figure 1.2: 2D simulations of dislocations moving through a random array of point obstacles: effect
of obstacle strength, (a) weak obstacles and (b) hard obstacles ([Foreman & Makin 66]).
1.2 Dislocation dynamics<br />
1.2.1 2D simulations of dislocation dynamics<br />
Based on the well-understood elementary properties of a single dislocation, numerical DD methods
were first developed in 2D.
Dislocation dynamics in 2D can be further divided according to the orientation of the simulation
plane: (i) parallel or (ii) perpendicular to the dislocation lines. In case (i), the simulation plane
is the glide plane of the dislocation lines, so neither cross-slip nor climb is allowed. This
configuration was initially applied to study the line tension and the shape of a dislocation under
stress ([Brown 64]). The dynamic motion of dislocations has also been simulated in a glide plane
containing a random distribution of point obstacles ([Foreman & Makin 66]). The effects of obstacle
strength on the initial flow stress have been studied, and some of the simulation results are shown
in Fig. 1.2. This type of 2D simulation is still used to study the effect of particle parameters on
the flow stress, see for example [Mohles & Nembach 01].
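The role of obstacle strength in such simulations is often summarized with two classical line-tension relations, given here as a hedged sketch rather than as results of the cited works. An obstacle that releases the dislocation when its two arms reach the critical angle φ_c sustains a maximum force F_c = 2T cos(φ_c/2), where T is the line tension; Friedel's estimate of the flow stress for obstacles with mean spacing L is τ_c = (2T/(bL)) cos^(3/2)(φ_c/2). The function names are illustrative, and the Friedel expression is an approximation meant for weak obstacles (φ_c close to π), whereas simulations like those of Fig. 1.2 resolve the bowing of the line explicitly.

```python
import math


def obstacle_strength(line_tension, breaking_angle):
    """Maximum force of a point obstacle, F_c = 2 T cos(phi_c / 2).

    phi_c is the angle between the two dislocation arms at break-away:
    pi means no resistance, 0 means the arms are parallel (impenetrable limit).
    """
    return 2.0 * line_tension * math.cos(breaking_angle / 2.0)


def friedel_flow_stress(line_tension, burgers, spacing, breaking_angle):
    """Friedel estimate tau_c = (2 T / (b L)) * cos(phi_c / 2) ** 1.5.

    Intended for weak obstacles; at phi_c = 0 it reaches 2T/(bL), the
    Orowan-type stress scale for bypassing impenetrable obstacles.
    """
    return (2.0 * line_tension / (burgers * spacing)
            * math.cos(breaking_angle / 2.0) ** 1.5)


for phi in (0.0, math.pi / 2, 0.9 * math.pi):
    # Reduced units: T = b = L = 1, so the stress is in units of T / (b L)
    print(round(phi, 3), friedel_flow_stress(1.0, 1.0, 1.0, phi))
```

The flow stress decreases monotonically as the critical angle opens up, which is the trend these 2D simulations quantify.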
In case (ii), the dislocations are perpendicular to the simulation plane, that is, they are
infinite, parallel to each other and of the same character. This configuration can simulate the
multiplication, annihilation, cross-slip and climb of dislocations, but it is difficult to include
the line tension effect explicitly. It has been used to simulate the spontaneous formation of
dislocation microstructures ([Lépinoux & Kubin 87]). Because of its simplicity, this 2D method
can follow dislocation motion up to relatively large strains. It is still under active development
and has been applied in several studies, see for example [Cleveringa et al. 97].
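In configuration (ii), every dislocation is an infinite straight line, so the driving force has a closed-form 2D expression. The sketch below uses the isotropic-elasticity shear stress of an edge dislocation lying along z with Burgers vector along x, σ_xy = G b x (x² - y²) / (2π(1 - ν)(x² + y²)²), and the Peach-Koehler glide force per unit length, F_x = s b σ_xy, on a parallel edge dislocation with sign s = ±1. The function names, the restriction to edge dislocations sharing one glide direction, and the sign convention are illustrative assumptions.

```python
import math


def edge_shear_stress(x, y, shear_modulus, burgers, poisson):
    """sigma_xy at (x, y) from a positive edge dislocation at the origin
    (line along z, Burgers vector along +x, glide plane y = 0)."""
    r2 = x * x + y * y
    prefactor = shear_modulus * burgers / (2.0 * math.pi * (1.0 - poisson))
    return prefactor * x * (x * x - y * y) / (r2 * r2)


def glide_force(x, y, shear_modulus, burgers, poisson):
    """Glide force per unit length on a same-sign edge dislocation at (x, y)."""
    return burgers * edge_shear_stress(x, y, shear_modulus, burgers, poisson)


def glide_forces(positions, signs, applied_stress, shear_modulus, burgers, poisson):
    """Glide force on each dislocation from the applied stress and all others."""
    forces = []
    for i, (xi, yi) in enumerate(positions):
        tau = applied_stress
        for j, (xj, yj) in enumerate(positions):
            if i != j:
                # stress of dislocation j scales with its sign
                tau += signs[j] * edge_shear_stress(xi - xj, yi - yj,
                                                    shear_modulus, burgers, poisson)
        forces.append(signs[i] * burgers * tau)
    return forces


# Two same-sign dislocations on one glide plane repel each other
print(glide_forces([(0.0, 0.0), (1.0, 0.0)], [1, 1], 0.0, 1.0, 1.0, 0.3))
```

A 2D DD code of type (ii) moves each dislocation along x according to such forces through a mobility law and adds rules for multiplication, annihilation, cross-slip and climb.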
1.2.2 3D simulations of dislocation dynamics
The motivation for 3D DD can be summarized as the need
• to include the 3D nature of dislocation behavior: cross-slip, junction formation, ...
• to explain the formation of dislocation structures during plastic deformation
The first 3D simulation was proposed in [Canova & Kubin 91]. Since then, the method has been
developed and applied to investigate the collective motion of dislocations under various
conditions by two leading groups¹ in France. It is based on the representation of dislocation
lines by segments in an integer space. Other versions of 3D DD have emerged since the
end of the 1990s, as detailed in Sec. 2.1.4. Thanks to the development of simulation methods and
increased computing power, these methods have strengthened their position in the
field of crystal plasticity. The 3D discrete dislocation dynamics (DDD) method has proven to be a
powerful tool for investigating the plasticity of metals and is expected to serve as a link between
atomistic and continuum scale simulations (see Fig. 1.1).
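Part of the 3D behavior listed above comes directly from FCC crystallography: slip takes place on the 12 {111}<110> systems, and a screw segment whose Burgers vector lies along a <110> direction can cross-slip onto the second {111} plane that contains that direction. The sketch below enumerates the systems and finds the two planes sharing a given direction. It only illustrates the crystallography; it is not the lattice discretization of the DDD code (Sec. 2.1), and the function names are hypothetical.

```python
def fcc_slip_systems():
    """Enumerate the 12 {111}<110> slip systems of an FCC crystal.

    Each system is a (plane normal, slip direction) pair of integer Miller
    indices, listed once (normals and directions are taken up to sign).
    """
    normals = [(1, 1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
    directions = [(1, 1, 0), (1, -1, 0), (1, 0, 1),
                  (1, 0, -1), (0, 1, 1), (0, 1, -1)]
    systems = []
    for n in normals:
        for d in directions:
            if sum(a * b for a, b in zip(n, d)) == 0:  # direction lies in the plane
                systems.append((n, d))
    return systems


def cross_slip_planes(direction):
    """The two {111} planes containing a <110> direction: for a screw
    segment with this Burgers vector, its glide plane and its cross-slip plane."""
    return [n for n, d in fcc_slip_systems() if d == direction]


print(len(fcc_slip_systems()))         # number of slip systems
print(cross_slip_planes((1, -1, 0)))   # planes available to a [1-10] screw
```

A screw segment with b along [1-10] gliding in (111) can thus only cross-slip onto (11-1), which is why cross-slip must be handled explicitly in a 3D model.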
1.3 Scope of Thesis<br />
This thesis aims at applying the 3D DDD method both to rigorous computations of dislocation-precipitate
interactions and to the study of the effects of precipitates on the fatigue properties of metals.
For the rigorous computations, we extended the code coupled with a finite element method ([Fivel 97])
in order to incorporate 3D precipitates with a different elastic modulus. The interaction forces due
to a second-phase particle are computed, and the effects of these forces on the flow stress and the
subsequent hardening are investigated.
¹ Génie Physique et Mécanique des Matériaux (GPM2) and Laboratoire d'Etude des Microstructures (LEM)
Recently, the 3D DDD method has been applied successfully to the study of early fatigue crack initiation
in 316L stainless steels ([Déprés 04]). That study pointed out the critical role of cross-slip, which
demonstrates the advantages of 3D DD simulations over 2D simulations. Inspired by this study, we
applied the 3D DDD method to simulate the fatigue behavior of materials hardened by precipitates.
We found, however, that the feasible volume fraction of precipitates is quite small, given the
performance of currently available single-processor machines and the computing efficiency of the
serial 3D DDD code. Introducing many precipitates adds computational load to 3D DDD simulations,
which are already computationally demanding because of the long-range stress field of each
dislocation segment and the need to handle dislocation interactions during segment motion.
Because of this inherent computational load, the maximum strain that can be simulated in a
reasonable time is still on the order of 10⁻³ under multislip conditions.
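The cost problem can be made concrete. With direct summation, every segment feels the stress of every other segment, so one time step needs N(N - 1)/2 pair evaluations. Binning segments into boxes separates near pairs (same or adjacent box), which still require an exact pair calculation, from distant pairs, which a box scheme can treat more cheaply. The sketch below only counts near pairs on a uniform cubic grid, with each segment reduced to a point; the grid, function names and inputs are illustrative assumptions, and the treatment of distant boxes in the box method of this thesis is described in Chapter 2, not reproduced here.

```python
from collections import defaultdict
from itertools import product


def direct_pair_count(n):
    """Pair stress evaluations per step with direct summation."""
    return n * (n - 1) // 2


def near_pairs_by_boxes(points, box_size):
    """Count point pairs lying in the same or in adjacent cubic boxes."""
    boxes = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (int(x // box_size), int(y // box_size), int(z // box_size))
        boxes[key].append(idx)
    count = 0
    for key, members in boxes.items():
        # visit the box itself and its 26 neighbours
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            other = (key[0] + dx, key[1] + dy, key[2] + dz)
            if other not in boxes:
                continue
            for i in members:
                for j in boxes[other]:
                    if i < j:            # count each unordered pair once
                        count += 1
    return count


points = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (5.5, 0.5, 0.5)]
print(direct_pair_count(len(points)), near_pairs_by_boxes(points, 1.0))
```

The direct count grows quadratically (about 5 × 10⁵ pairs for 1000 segments), while the near-pair count grows roughly linearly for a fixed density, which is what makes box-based schemes attractive.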
The easiest way to meet the computational demands of fatigue simulations of precipitation-hardened
materials would be to wait until a faster single processor becomes available. Given the relatively
short duration of a doctorate, however, this is not a viable option, even though the speed of
single processors has improved tremendously².
The other way is to increase the computational capacity by combining many processors and making
them work together, that is, parallel computing. A parallel computer simply comprises a number of
processors that solve a problem together to reduce the elapsed computation time. In fact, parallel
computing has been widely adopted in many research fields to cope with growing computational
demands, which arise for many reasons, e.g. sophisticated boundary conditions, nonlinear material
behavior and large numbers of unknowns. Clear successes of parallel computing in computational
plasticity can be found both in MD ([Abraham 97]) and in continuum mechanics ([Demmel et al. 93],
[Fahrat & Roux 94]). Parallel codes have enabled each type of model to perform large-scale
simulations in reasonable time. In MD simulations, for example, a volume of 0.01 µm³ can be treated
over a period of time on the order of 10⁻¹² s using massively parallel machines ([Abraham 97]).
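A standard way to estimate what parallel computing can deliver is Amdahl's law: if a fraction f of the serial run time can be spread perfectly over p processors while the rest stays serial, the speedup is S(p) = 1 / ((1 - f) + f / p), which can never exceed 1 / (1 - f). The sketch below evaluates it together with the parallel efficiency S / p. It is an idealized model that ignores communication costs and load imbalance, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup S(p) = 1 / ((1 - f) + f / p)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def efficiency(parallel_fraction, processors):
    """Parallel efficiency S(p) / p."""
    return amdahl_speedup(parallel_fraction, processors) / processors


for p in (1, 4, 16, 64):
    print(p, round(amdahl_speedup(0.95, p), 2), round(efficiency(0.95, p), 2))
```

With f = 0.95, 16 processors give a speedup of about 9.1 and no processor count can exceed 20, so the parts of a code left serial matter as much as the parallelized ones.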
As can be seen in many references, including the few examples cited above, parallel computing has
been investigated extensively and is now a well-established field. From the successes of atomistic
and continuum parallel simulations, we concluded that parallel computation is the best, and indeed
the only, choice for including a relatively high volume fraction of precipitates in the simulation
volume, because only parallel computing can deliver the required dramatic increase in computational power.
² Semiconductor technology has so far doubled the number of transistors on a chip, and with it
processor performance, roughly every 18 months to two years. This trend is known as Moore's law,
first published in 1965 ([Moore 65]), and it still held at the time of writing; Intel expected it to
continue at least through the end of this decade. Eventually, however, the performance of a single
processor will reach an upper limit set by the physical limits of semiconductor technology.
A parallel DDD code has therefore been developed and applied to fatigue simulations containing a
large number of particles. The effects of particles on the fatigue properties are studied, focusing
on the irreversibility of slip and the formation of intense slip bands during cyclic deformation.
The parallel DDD code should benefit not only small-scale simulations that involve a large number
of internal defects but also large-scale simulations that would allow comparison with macroscopic
simulations. It should hence reinforce the role of the 3D DDD method among plasticity simulation methods.
This thesis is organized as follows.<br />
Chapter 1 Introduction (this chapter)<br />
Chapter 2 summarizes the theoretical background and methodology of the 3D discrete dislocation dynamics method. The computation of the displacement field of a dislocation loop is introduced. Several boundary conditions are explained, including the implementation of internal interfaces and of periodic boundary conditions. The numerical efficiency of the serial DDD algorithm is increased by revisiting the so-called box method ([Verdier et al. 98]).
Chapter 3 presents the algorithm developed to parallelize the 3D DDD program used in this work. The parallel version of the program aims at simulating, in reasonable time on parallel computers, fatigue tests of materials containing a large number of particles. The new parallel DDD program is tested, its performance is improved by dynamic load balancing, and it is then applied to stage I-II simulations. This chapter also contains a general introduction to parallel computing.
Chapter 4 contains three applications of the methods developed and detailed in the preceding chapters. Image stresses induced by cylindrical, spherical and cubical particles are computed, and their effects on flow stress and hardening are investigated, using the FEM/DDD coupling method presented in Sec. 2.4.2. Finally, the new parallel program is used for fatigue simulations of precipitate-hardened metals.
Chapter 5 gives concluding remarks and perspectives.
Chapter 2<br />
Description of the simulation method<br />
The discrete dislocation dynamics (DDD) method, initially proposed in [Canova & Kubin 91], has been much improved over the past 15 years, both in numerical precision and in its applicability to problems involving complex boundary conditions. The purpose of this chapter is to review the theoretical background and methodology of the DDD method and to describe the author's contributions: the computation of displacement fields, the implementation of internal interfaces and periodic boundary conditions, and the acceleration of the code using the revised box method.
The DDD method used in this thesis deals only with perfect dislocations in face-centered cubic (FCC) metals. Sec. 2.1 introduces the simulation lattice and the discretization of a dislocation line in the DDD model, and compares the model with other dislocation dynamics models. Although the focus is on the FCC lattice, the methodology is quite general; its extension to other cubic crystals is also discussed briefly.
The computation of the effective stress on each dislocation segment is presented in Sec. 2.2, together with the method used to compute the displacement field of a dislocation loop; the extension of this method to more general dislocation structures can be found in Sec. 4.3.5. All stress and displacement solutions are based on isotropic linear elasticity.
Sec. 2.3 introduces the motion of dislocation segments, including the local rules needed to handle interactions between dislocations.
New boundary conditions are explained in Sec. 2.4. Internal interfaces are represented both by a simple method using facets and by a more rigorous method that accounts for full elastic interactions. The implementation of periodic boundary conditions is also detailed.
The performance of the DDD code is improved by revising the box method first described in [Verdier et al. 98]; using a linked list of segments significantly increases the computational efficiency of the method. The methodology and the performance of the box method are described in Sec. 2.5. The overall flowchart of the code is presented at the end of this chapter.
2.1 Representation of the dislocation lines in FCC metals<br />
2.1.1 Preparation of the simulation space<br />
The lattice of the simulation volume is homothetic to that of FCC metals. The spacing of the simulation lattice is derived from experimental measurements of the athermal critical annihilation distance between edge dislocations¹. The experiments of Essmann and Mughrabi ([Essmann & Mughrabi 79]), for example, show that in their copper specimens at room temperature edge dislocations do not coexist at separations below about 1.5 nm. The shortest distance between two edge dislocations in the simulation is therefore set to this critical distance.

The interplanar distance between two adjacent {111} planes equals $a/\sqrt{3}$, where $a$ is the lattice spacing (see Fig. 2.2(a)). If $y_e$ denotes the critical annihilation distance, $a$ is obtained by equating $a/\sqrt{3}$ to $2y_e$:

$$a = 2\sqrt{3}\,y_e \tag{2.1}$$

With $y_e$ = 1.5 nm, a typical value of the simulation lattice spacing $x_l$ (= $a/2$) is about 2.598 nm.
Note that $x_l$ is of the order of 10b, where b is the magnitude of the Burgers vector, which is well above the dislocation core radius (~2b). Using a lattice spacing larger than the core radius has two consequences for the simulation method:
1. Linear elastic solutions for the stress and displacement of a dislocation are valid everywhere on the simulation network (Sec. 2.2).
2. The core properties of a dislocation must be expressed in a phenomenological manner (Sec. 2.3.3 and 2.3.4).
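The arithmetic behind Eq. 2.1 and the "order of 10b" estimate can be checked in a few lines. This is a minimal sketch; the copper Burgers vector magnitude b = 0.2556 nm is our assumption and is not a value quoted in this section.

```python
import math

def simulation_lattice(y_e_nm, b_nm):
    """Return (a, x_l, x_l / b) from Eq. 2.1: a / sqrt(3) = 2 y_e and x_l = a / 2."""
    a = 2.0 * math.sqrt(3.0) * y_e_nm   # lattice spacing of the simulation lattice (nm)
    x_l = a / 2.0                       # unit used for segment coordinates (nm)
    return a, x_l, x_l / b_nm

# y_e = 1.5 nm (Essmann & Mughrabi); b = 0.2556 nm is an assumed value for copper
a, x_l, ratio = simulation_lattice(1.5, 0.2556)
print(f"a = {a:.3f} nm, x_l = {x_l:.3f} nm, x_l/b = {ratio:.1f}")
# a = 5.196 nm, x_l = 2.598 nm, x_l/b = 10.2
```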
2.1.2 Discretization of the dislocation lines<br />
Only perfect dislocations in FCC metals are considered in this work, and dissociation into partials is not allowed. This is justified because the width of the stacking-fault ribbon between two partial dislocations is probably smaller than the simulation lattice spacing used (~10b). The stacking fault energy γ is about 140 mJ m⁻² for aluminium, 40 mJ m⁻² for copper and 20 mJ m⁻² for silver, which gives stacking-fault ribbon widths of √2b, 5√2b and 7√2b for aluminium, copper and silver respectively, assuming a Poisson ratio of zero ([Hull & Bacon 83]).

Figure 2.1: Representation of a curved dislocation line by a chain of pure edge and screw segments. The dots represent lattice points on the (111) slip plane (directions [111], [1̄21̄] and [1̄01] indicated). Unit lengths of the edge ($\sqrt{6}\,x_l$) and screw ($\sqrt{2}\,x_l$) segments are shown.

¹ Screw dislocations annihilate more easily than edge ones by cross-slip, so the critical distance of edge dislocations defines the spacing of the simulation lattice.
A curved dislocation line is represented as a connected set of discrete segments of pure edge and pure screw character, which is why the method is called the edge-screw model. Fig. 2.1 schematically shows the discretization of a dislocation line into a succession of orthogonal edge and screw segments with the same Burgers vector on the same slip plane².

The maximum length of a segment is set to the discretization length $l_d$: any segment whose length $l_{seg}$ exceeds $l_d$ is further subdivided into $l_{seg}/l_d$ segments.
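The subdivision rule can be sketched as follows. This assumes integer segment lengths, since Sec. 2.1.2 stores lengths as integers. The text does not say how a non-integer ratio $l_{seg}/l_d$ is rounded, so rounding up and spreading the remainder is our own choice. The function name `subdivide` is hypothetical.

```python
import math

def subdivide(l_seg, l_d):
    """Split an integer segment length l_seg into pieces no longer than l_d.

    Assumption: the number of pieces is ceil(l_seg / l_d) and the lengths differ
    by at most one unit, so that they still sum to l_seg.
    """
    if l_seg <= l_d:
        return [l_seg]
    n = math.ceil(l_seg / l_d)        # number of sub-segments
    base, extra = divmod(l_seg, n)    # near-equal integer lengths
    return [base + 1] * extra + [base] * (n - extra)

print(subdivide(25, 10))  # [9, 8, 8]
```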
The edge (⟨112⟩ type) and screw (⟨110⟩ type) vectors for each of the 12 slip systems used in the DDD simulations are listed in Table 2.1³. Each screw direction is associated with two edge directions, Edge1 and Edge2, defining the two glide systems (Screw, Edge1) and (Screw, Edge2), which share the same Burgers vector. The line directions of the 6 screw vectors (i.e. the Burgers vectors) are adopted from the Thompson tetrahedron given in [Hirth & Lothe 92], p. 319. The signs of the vectors are defined by the following two rules:
1. Edge × Screw = n, where n is the outward normal of the Thompson tetrahedron.
2. Edge1 × Edge2 = b, so that any prismatic loop is unambiguously defined⁴.

² Edge segments move along the screw vector direction and vice versa. For example, edge segments with line vector [1̄21̄] in Fig. 2.1 move along ±[1̄01], and screw segments with line vector [1̄01] move along either ±[1̄21̄] or ±[1̄2̄1̄], the latter corresponding to cross-slip (see Sec. 2.3.4).
³ The notation of Schmid and Boas [Schmid & Boas 35] is given alongside the system number.
| System | Schmid-Boas | Screw (b) | Edge | Plane normal |
|---|---|---|---|---|
| 1 | B4 | [1̄01] | [1̄21̄] | (111) |
| 2 | D4 | [1̄01] | [1̄2̄1̄] | (1̄11̄) |
| 3 | D1 | [011] | [2̄1̄1] | (1̄11̄) |
| 4 | C1 | [011] | [21̄1] | (1̄1̄1) |
| 5 | B5 | [11̄0] | [1̄1̄2] | (111) |
| 6 | C5 | [11̄0] | [1̄1̄2̄] | (1̄1̄1) |
| 7 | D6 | [1̄1̄0] | [11̄2̄] | (1̄11̄) |
| 8 | A6 | [1̄1̄0] | [11̄2] | (11̄1̄) |
| 9 | A2 | [01̄1] | [211] | (11̄1̄) |
| 10 | B2 | [01̄1] | [2̄11] | (111) |
| 11 | C3 | [101] | [12̄1̄] | (1̄1̄1) |
| 12 | A3 | [101] | [121̄] | (11̄1̄) |

Table 2.1: Line and glide direction vectors of the dislocation segments used in the DDD code.
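The two sign rules can be verified mechanically on Table 2.1. The sketch below is our own check rather than part of the DDD code. It transcribes the table and assumes that Edge1 and Edge2 are the edge vectors of the odd- and even-numbered system of each pair sharing a screw vector.

```python
# Table 2.1: (system, Schmid-Boas label, screw = b, edge, plane normal)
TABLE = [
    (1, "B4", (-1, 0, 1), (-1, 2, -1), (1, 1, 1)),
    (2, "D4", (-1, 0, 1), (-1, -2, -1), (-1, 1, -1)),
    (3, "D1", (0, 1, 1), (-2, -1, 1), (-1, 1, -1)),
    (4, "C1", (0, 1, 1), (2, -1, 1), (-1, -1, 1)),
    (5, "B5", (1, -1, 0), (-1, -1, 2), (1, 1, 1)),
    (6, "C5", (1, -1, 0), (-1, -1, -2), (-1, -1, 1)),
    (7, "D6", (-1, -1, 0), (1, -1, -2), (-1, 1, -1)),
    (8, "A6", (-1, -1, 0), (1, -1, 2), (1, -1, -1)),
    (9, "A2", (0, -1, 1), (2, 1, 1), (1, -1, -1)),
    (10, "B2", (0, -1, 1), (-2, 1, 1), (1, 1, 1)),
    (11, "C3", (1, 0, 1), (1, -2, -1), (-1, -1, 1)),
    (12, "A3", (1, 0, 1), (1, 2, -1), (1, -1, -1)),
]

def cross(a, b):
    """Cross product of two integer 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def positive_multiple(v, w):
    """True if v = k * w for some k > 0."""
    return cross(v, w) == (0, 0, 0) and sum(p * q for p, q in zip(v, w)) > 0

def check_table(table):
    """Return the list of violations of the two sign rules (empty if none)."""
    errors = []
    for num, label, screw, edge, normal in table:
        # Rule 1: Edge x Screw points along the outgoing normal n
        if not positive_multiple(cross(edge, screw), normal):
            errors.append(f"rule 1 fails for system {num} ({label})")
    # Rule 2: Edge1 x Edge2 = b for the two systems sharing a screw vector
    for (n1, _, s1, e1, _), (n2, _, s2, e2, _) in zip(table[0::2], table[1::2]):
        if s1 != s2 or not positive_multiple(cross(e1, e2), s1):
            errors.append(f"rule 2 fails for pair {n1}/{n2}")
    return errors

print(check_table(TABLE))  # []
```

Both rules hold for every entry. The cross products come out as 2n for rule 1 and 4b for rule 2, consistent with edge and screw vectors of norms √6 and √2 (in units of $x_l$).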
Each segment is represented numerically by a set of integers: the three coordinates of its starting point, its length, and the indices of its line vector and of its moving (glide) vector. The coordinates are expressed in units of the simulation lattice parameter $x_l$, and the length in units of the norm of the line vector. The connectivity of a line is built through pointers to segment indices.
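As an illustration of this integer representation, a hypothetical Python record could look as follows. The field names, the `VECTORS` index table and the use of neighbor indices as "pointers" are assumptions made for the example. They are not the actual data layout of the code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vector table (units of x_l): 0 = screw b, 1 = Edge1, 2 = Edge2 of B4/D4
VECTORS = [(-1, 0, 1), (-1, 2, -1), (-1, -2, -1)]

@dataclass
class Segment:
    x: int                          # starting point, in units of x_l
    y: int
    z: int
    length: int                     # in units of the norm of the line vector
    line: int                       # index of the line vector in VECTORS
    move: int                       # index of the moving (glide) vector in VECTORS
    prev_seg: Optional[int] = None  # index of the previous segment on the line
    next_seg: Optional[int] = None  # index of the next segment on the line

    def end(self):
        """End point = start + length * line vector, still integer coordinates."""
        vx, vy, vz = VECTORS[self.line]
        return (self.x + self.length * vx,
                self.y + self.length * vy,
                self.z + self.length * vz)

# An edge segment (line Edge1, moving along b) followed by a screw segment
segs = [Segment(0, 0, 0, 3, line=1, move=0, next_seg=1)]
segs.append(Segment(*segs[0].end(), 2, line=0, move=1, prev_seg=0))
print(segs[0].end(), segs[1].end())  # (-3, 6, -3) (-5, 6, -1)
```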
2.1.3 Existence of a subnetwork<br />
There exist certain sets of slip planes in which mutual dislocation interactions cannot be treated properly; we call each such set a subnetwork. Subnetworks arise because, in the edge-screw model, the unit line vector of an edge dislocation is of ⟨112⟩ type, with a length of $\sqrt{6}\,x_l$ (Table 2.1).
For example, Fig. 2.2(b) shows an edge dislocation [112̄] on a (111) plane. Within a unit cell of the simulation volume, two (1̄1̄1) slip planes intersect this [112̄] edge dislocation, as illustrated in Fig. 2.2(a). In Fig. 2.2(b), the lattice points along the two intersection lines are shown as filled and hollow points respectively. One of the planes cuts the unit edge segment in the middle, which is not permitted⁵.
⁴ The right-hand, finish-to-start (RHFS) convention is adopted, as can be seen in Fig. 2.10.
⁵ This improper intersection occurs between two planes sharing the same Burgers vector.

Figure 2.2: The unit cell of the simulation space and the existence of subnetworks: (a) the unit cell, with lattice parameter a and spacing $x_l$ along the [100], [010] and [001] axes; (b) a subnetwork, showing the unit edge segment $x_l[11\bar{2}]$ and the vector $x_l[\bar{1}10]$. The lattice is homothetic to the FCC lattice, with $x_l$ usually taken as ~10b. Subnetworks exist because of the definition of the dislocation line vectors in Table 2.1.
Consequently, there exist subnetworks that cannot be used simultaneously. The initial dislocation configurations must therefore be chosen so that segments in two slip planes with the same Burgers vector share a common point on one of the planes. In practice, the starting point of each dislocation segment is described in the elementary basis (Screw, Edge1, Edge2), so that the origin (0,0,0) is the same for the two slip systems involved. Subnetworks also impose certain restrictions when periodic boundary conditions are applied (see Sec. 2.4.1).
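One way to see why the choice of basis and origin matters is to express lattice points in the basis (Screw, Edge1, Edge2). The sketch below is our own illustration using the B4/D4 vectors of Table 2.1, and the helper names are hypothetical. It shows that integer combinations of these three vectors reach only a sublattice of the simulation lattice. The basis matrix has determinant 8 (in units of $x_l^3$), so a point such as (2, 0, 0) = a[100], with a = 2$x_l$, has non-integer coordinates in this basis.

```python
from fractions import Fraction

# Elementary basis of the B4/D4 pair (Table 2.1), in units of x_l
SCREW, EDGE1, EDGE2 = (-1, 0, 1), (-1, 2, -1), (-1, -2, -1)
BASIS = (SCREW, EDGE1, EDGE2)

def det3(m):
    """Determinant of a 3x3 matrix given as a tuple of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def to_cartesian(coeffs):
    """Point = c_s * Screw + c_1 * Edge1 + c_2 * Edge2."""
    return tuple(sum(c * v[k] for c, v in zip(coeffs, BASIS)) for k in range(3))

def to_basis(point):
    """Basis coordinates by Cramer's rule; None if they are not all integers."""
    m = tuple(tuple(BASIS[j][i] for j in range(3)) for i in range(3))  # columns = basis
    d = det3(m)
    coeffs = []
    for j in range(3):
        # replace column j by the point and take the ratio of determinants
        mj = tuple(tuple(point[i] if jj == j else m[i][jj] for jj in range(3))
                   for i in range(3))
        coeffs.append(Fraction(det3(mj), d))
    if all(c.denominator == 1 for c in coeffs):
        return tuple(int(c) for c in coeffs)
    return None

print(to_cartesian((1, 1, 0)), to_basis((-2, 2, 0)), to_basis((2, 0, 0)))
# (-2, 2, 0) (1, 1, 0) None
```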
2.1.4 Comments on other crystal structures and dislocation dynamics models<br />
Although not treated in this thesis, dislocations in other cubic crystal structures can be represented in a similar manner. In the body-centered cubic (BCC) crystal structure, for example, slip occurs along the close-packed ⟨111⟩ directions, on {110}, {112} and {123} planes. By analogy with the construction of Table 2.1, the {110} and the {112} slip systems can each be defined by 4 screw and 12 edge line vectors, while the {123}⁶ slip systems involve 4 screw and 24 edge line vectors. Dislocation dynamics models for the BCC crystal structure can be found in [Devincre & Roberts 96] and [Tang et al. 98].
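The counts quoted for BCC crystals follow from enumerating ⟨111⟩ directions and {hkl} planes with n · b = 0. Each (b, n) pair defines one edge line vector n × b. Here is a small sketch of that enumeration (our own, not part of the DDD code):

```python
from itertools import permutations, product

def family(hkl):
    """All vectors of a crystallographic family, one per +/- pair."""
    out = set()
    for perm in set(permutations(hkl)):
        for signs in product((1, -1), repeat=3):
            v = tuple(s * c for s, c in zip(signs, perm))
            first = next(c for c in v if c != 0)
            out.add(v if first > 0 else tuple(-c for c in v))  # canonical sign
    return sorted(out)

def slip_systems(directions, planes):
    """(b, n) pairs with the slip direction lying in the plane, i.e. n . b = 0."""
    return [(b, n) for b in directions for n in planes
            if sum(p * q for p, q in zip(b, n)) == 0]

b111 = family((1, 1, 1))
for hkl in ((1, 1, 0), (1, 1, 2), (1, 2, 3)):
    systems = slip_systems(b111, family(hkl))
    print(hkl, len(b111), "screw directions,", len(systems), "edge line vectors")
# 4 screw directions in each case; 12, 12 and 24 edge line vectors
```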
Several dislocation dynamics models exist; they differ mainly in how dislocation lines are discretized. Zbib et al. ([Zbib et al. 98]), for example, approximated dislocation curves by a series of mixed straight segments of arbitrary length and orientation. This scheme, which parameterizes a dislocation line by a set of nodes, is often called "nodal dislocation dynamics". Some nodal models can even treat the splitting of dislocations into partials ([Shenoy et al. 00], [Weygand et al. 01]). The nodal model offers better numerical precision. Handling the topology of segments is, however, much more complex, because the nodal model involves more degrees of freedom in segment type than the edge-screw model. The nodal model is therefore preferably used to investigate phenomena that involve a small number of dislocations and require high precision in the dislocation topology ([Schwarz 99], [Ghoniem et al. 00]).

Figure 2.3: Discretization of a curved dislocation line in (a) the edge-screw model, (b) the pure-mixed model and (c) the nodal model.

⁶ The {123} slip system is less close-packed; at low temperatures it would therefore be sufficient to take only the {110} and {112} slip systems into account.
Recently, an attempt has been made to increase numerical accuracy by introducing an additional segment type into the edge-screw model, the so-called "pure-mixed" model. This model incorporates additional line directions, i.e. segments of ±60° character, and aims at an accurate description of a curved dislocation line with a minimum number of segments ([Devincre et al. 01], [Madec 01]). Fig. 2.3 compares the discretizations of a curved dislocation line used in the edge-screw, pure-mixed and nodal models.
2.2 Computation of stresses and displacements of dislocations<br />
2.2.1 Evaluation of the driving force<br />
The velocity of each segment is governed by the effective stress $\tau_e$ acting on it. The effective stress is given by $\tau_e = f_g/b$, where b is the magnitude of the Burgers vector and $f_g$ is the magnitude of the glide force per unit length. $f_g$ is computed at the center of each segment and includes four contributions:
(i) the force due to the internal stress field produced by all other dislocation segments in the simulation volume, excluding the segment itself and its two neighboring segments;
(ii) the force due to the applied stress field;
(iii) the force due to the line tension;
(iv) the force due to the Peierls stress.
Forces due to atomistic-level interactions, such as drag forces from solute atoms or jogs, are not treated explicitly. They can, however, be included implicitly by modifying the motion rule that relates the glide velocity of a segment to its effective shear stress (see Sec. 2.3).
Internal stresses<br />
To compute the internal stresses at the center of a segment, the stress field of a single finite straight segment is required. This problem was addressed by Li ([Li 64]), who found an interesting property of the stress solution of an angular dislocation made of two semi-infinite dislocations joined at one point. According to Li, the stress field of an angular dislocation is the sum of the stress fields of its two arms, each being a semi-infinite dislocation. Although the stress field of a single semi-infinite dislocation does not satisfy the equations of equilibrium, the sum of the stress fields of the two semi-infinite dislocations does.
If a semi-infinite dislocation lies on the positive z axis and runs into the origin O, the stress field produced at a point r(x, y, z) has the following components ([Li 64]):

$$
\begin{aligned}
\sigma_{xx} &= -\frac{b_x y + b_y x}{r(r-z)} - \frac{x^2 (b_x y - b_y x)(2r-z)}{r^3 (r-z)^2} \\
\sigma_{yy} &= \frac{b_x y + b_y x}{r(r-z)} - \frac{y^2 (b_x y - b_y x)(2r-z)}{r^3 (r-z)^2} \\
\sigma_{zz} &= \frac{z (b_x y - b_y x)}{r^3} - \frac{2\nu (b_x y - b_y x)}{r(r-z)} \\
\sigma_{yz} &= \frac{y (b_x y - b_y x)}{r^3} - \frac{\nu b_x}{r} + \frac{(1-\nu) b_z x}{r(r-z)} \\
\sigma_{zx} &= \frac{x (b_x y - b_y x)}{r^3} + \frac{\nu b_y}{r} - \frac{(1-\nu) b_z y}{r(r-z)} \\
\sigma_{xy} &= \frac{b_x x - b_y y}{r(r-z)} - \frac{x y (b_x y - b_y x)(2r-z)}{r^3 (r-z)^2}
\end{aligned}
\tag{2.2}
$$
In Eq. 2.2, the stresses are given in units of µ/4π(1 − ν), where µ and ν are the shear modulus and the Poisson ratio respectively, and r is the distance from the origin to the point r(x, y, z), as shown in Fig. 2.4. The stress field of a dislocation segment lying on the z axis and running from z2 into z1 is obtained from those of two semi-infinite dislocations, as shown in Fig. 2.4. Eq. 2.2 is applied twice, substituting z − z1 and z − z2 for z respectively:

$$\sigma_{ij}(\mathbf{r}) = \sigma_{ij}(\mathbf{r})\big|_{z - z_1} - \sigma_{ij}(\mathbf{r})\big|_{z - z_2} \tag{2.3}$$

Figure 2.4: A semi-infinite dislocation and the computation of the stress field of a dislocation segment (z2 − z1) as the difference between two semi-infinite dislocations ending at z1 and z2.
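Eqs. 2.2 and 2.3 translate directly into code. The sketch below is ours and uses hypothetical function names. It returns stresses in units of µ/4π(1 − ν). For z > 0 it uses the identity r − z = (x² + y²)/(r + z), which avoids cancellation far along the line. As a check, the middle of a long edge segment should reproduce the infinite edge dislocation value σxx = −2by(3x² + y²)/(x² + y²)² in the same units.

```python
import math

def li_semi_infinite(x, y, z, b, nu):
    """Eq. 2.2: semi-infinite dislocation on the positive z axis running into the
    origin. Stresses in units of mu / (4 pi (1 - nu)); singular on the line itself."""
    bx, by, bz = b
    rho2 = x * x + y * y
    r = math.sqrt(rho2 + z * z)
    # r - z loses precision for z >> rho; use the equivalent rho^2 / (r + z) there
    rmz = rho2 / (r + z) if z > 0 else r - z
    A = bx * y - by * x
    g = r * rmz                  # r (r - z)
    h = r ** 3 * rmz ** 2        # r^3 (r - z)^2
    return {
        "xx": -(bx * y + by * x) / g - x * x * A * (2 * r - z) / h,
        "yy": (bx * y + by * x) / g - y * y * A * (2 * r - z) / h,
        "zz": z * A / r ** 3 - 2 * nu * A / g,
        "yz": y * A / r ** 3 - nu * bx / r + (1 - nu) * bz * x / g,
        "zx": x * A / r ** 3 + nu * by / r - (1 - nu) * bz * y / g,
        "xy": (bx * x - by * y) / g - x * y * A * (2 * r - z) / h,
    }

def segment_stress_li(x, y, z, z1, z2, b, nu):
    """Eq. 2.3: segment on the z axis between z1 and z2 from two semi-infinite ones."""
    s1 = li_semi_infinite(x, y, z - z1, b, nu)
    s2 = li_semi_infinite(x, y, z - z2, b, nu)
    return {k: s1[k] - s2[k] for k in s1}

# Field point near the middle of a long edge segment (b along x)
s = segment_stress_li(1.0, 2.0, 0.0, -1e4, 1e4, (1.0, 0.0, 0.0), 0.3)
print(s["xx"], -2 * 2.0 * (3 * 1.0 + 4.0) / 25.0)  # both close to -1.12
```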
Li's expressions (Eq. 2.2) are derived for a semi-infinite dislocation lying on the z axis. For an arbitrarily oriented segment, the coordinate system must therefore be rotated to bring the segment into this reference frame, and the resulting stress tensor rotated back.
The compact formulae of de Wit [dewit 67], on the other hand, are expressed in an arbitrary Cartesian coordinate system, so they can be used without any rotation of the coordinate system. Their final form, derived by Devincre in [Devincre 95], is given in Eq. 2.4.

$$
\sigma_{ij}(\mathbf{r}) = \frac{\mu}{\pi Y^2}\left\{[\mathbf{b}\mathbf{Y}\mathbf{t}]^s_{ij} - \frac{1}{1-\nu}[\mathbf{b}\mathbf{t}\mathbf{Y}]^s_{ij} - \frac{(\mathbf{b},\mathbf{Y},\mathbf{t})}{2(1-\nu)}\left[\delta_{ij} + t_i t_j + \frac{2}{Y^2}\left(\rho_i Y_j + \rho_j Y_i + \frac{L}{R}\,Y_i Y_j\right)\right]\right\}
\tag{2.4}
$$
The vectors in Eq. 2.4 are shown in Fig. 2.5 for a dislocation line with line vector t and Burgers vector b. The vectors and scalars are defined as R = r − r′ (with R = |R|), L = R · t, ρ = R − Lt and Y = R + Rt. $\delta_{ij}$ is the Kronecker delta, (b, Y, t) is the mixed product, and $[\mathbf{a}\mathbf{b}\mathbf{c}]^s_{ij} = \frac{1}{2}\left((\mathbf{a}\times\mathbf{b})_i c_j + (\mathbf{a}\times\mathbf{b})_j c_i\right)$. The stress field of a dislocation segment between two points A and B is obtained by inserting Eq. 2.4 into Eq. 2.3. In other words, Eq. 2.4 is evaluated with r′ = r′_A and with r′ = r′_B, and the two values are subtracted.

Figure 2.5: Definitions of the geometry of Eq. 2.4: field point r, point r′ on an infinite dislocation of line direction t, and the vectors R, Y and ρ and the length L.
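Eq. 2.4 can be sketched in the same way. The function names are hypothetical. The sign convention is our assumption: segment stress = Eq. 2.4 at r′_B minus Eq. 2.4 at r′_A, with t pointing from A to B. With this convention, a long edge segment along z with b along x reproduces the classical infinite edge field σxx = −µby(3x² + y²)/[2π(1 − ν)(x² + y²)²].

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sym(a, b, c, i, j):
    """[abc]^s_ij = 1/2 ((a x b)_i c_j + (a x b)_j c_i)."""
    axb = cross(a, b)
    return 0.5 * (axb[i] * c[j] + axb[j] * c[i])

def dewit_term(r, rp, b, t, mu, nu):
    """Eq. 2.4 for the line point rp; singular where Y = 0 (r on the line behind rp)."""
    R = tuple(p - q for p, q in zip(r, rp))           # R = r - r'
    Rn = math.sqrt(dot(R, R))                         # R = |R|
    L = dot(R, t)                                     # L = R . t
    rho = tuple(Ri - L * ti for Ri, ti in zip(R, t))  # rho = R - L t
    Y = tuple(Ri + Rn * ti for Ri, ti in zip(R, t))   # Y = R + R t
    Y2 = dot(Y, Y)
    mixed = dot(b, cross(Y, t))                       # mixed product (b, Y, t)
    sig = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            bracket = ((1.0 if i == j else 0.0) + t[i] * t[j]
                       + 2.0 / Y2 * (rho[i] * Y[j] + rho[j] * Y[i] + L / Rn * Y[i] * Y[j]))
            sig[i][j] = mu / (math.pi * Y2) * (
                sym(b, Y, t, i, j)
                - sym(b, t, Y, i, j) / (1.0 - nu)
                - mixed / (2.0 * (1.0 - nu)) * bracket)
    return sig

def segment_stress_dewit(r, rA, rB, b, mu, nu):
    """Straight segment A -> B: Eq. 2.4 at r' = r_B minus Eq. 2.4 at r' = r_A."""
    d = tuple(q - p for p, q in zip(rA, rB))
    length = math.sqrt(dot(d, d))
    t = tuple(c / length for c in d)
    sB = dewit_term(r, rB, b, t, mu, nu)
    sA = dewit_term(r, rA, b, t, mu, nu)
    return [[sB[i][j] - sA[i][j] for j in range(3)] for i in range(3)]

# Long edge segment along z (b along x): compare with the infinite edge dislocation
mu, nu, x, y = 1.0, 0.3, 1.0, 2.0
s = segment_stress_dewit((x, y, 0.0), (0.0, 0.0, -1e4), (0.0, 0.0, 1e4), (1.0, 0.0, 0.0), mu, nu)
infinite_xx = -mu * y * (3 * x**2 + y**2) / (2 * math.pi * (1 - nu) * (x**2 + y**2) ** 2)
print(s[0][0], infinite_xx)  # nearly equal
```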
The formulae of both Li and de Wit are derived within isotropic elasticity theory. A numerical method for stress fields in anisotropic elasticity has recently been developed by Rhee et al. ([Rhee et al. 01]). They found that the difference between the isotropic and anisotropic solutions is significant only within about 15b of the distorted hexagonal dislocation loop used in their calculations, and that it decreases with increasing distance from the loop. The isotropic solution is therefore sufficient for long-range interactions.
The stress field of a prismatic loop represented by successive straight segments has been shown to agree satisfactorily with the corresponding exact analytical solution ([Khraishi et al. 00a]). The computed segment stress fields show no anomaly even near the junction of two orthogonal segments. Fig. 2.6(b) shows the contour of the resolved shear stress on the (1̄11̄) plane for the dislocation segments of Fig. 2.6(a).

The computation of internal stresses is the most computationally demanding step of the DDD algorithm; a method to increase its efficiency is discussed in Sec. 2.5.
Applied stresses<br />
External stresses are applied in two ways, depending on the boundary conditions involved.
In the first case, the simulation volume represents a small element of a single crystal or of a grain in a polycrystal. The external stress field is then assumed to be homogeneous throughout the simulation volume, and the same stress tensor is applied to every segment. The magnitude of this tensor is updated according to the loading rule, e.g. constant stress or constant strain rate ([Fivel 97]).
In the second case, the simulation volume represents a finite volume with free surfaces, so that external stresses produce inhomogeneous stress fields in the volume. This inhomogeneity of the applied stresses can be incorporated using a code coupled with a finite element method ([Fivel et al. 98]). More general cases involving internal interfaces, e.g. second-phase particles or multilayer films, are treated in Sec. 2.4.2.

Figure 2.6: A planar set of dislocation segments and the contour of the resolved shear stress on the glide plane: (a) dislocation segment configuration; (b) contour of the resolved shear stress. The stress is computed around the corner where two orthogonal segments meet (shaded area of 1 µm × 1 µm). The resolved shear stress shows no anomaly.
Line tension<br />
The mutual interaction between two adjacent segments, which is excluded from the internal stress computation, is accounted for by a local line tension computation. For a dislocation arc with radius of curvature R, the line tension T(θ) produces a stress $\tau_{lt} = T(\theta)/(bR)$ directed towards the center of curvature. T(θ) is obtained from the energy per unit length E(θ) of the dislocation line, θ being the angle between the Burgers vector and the line direction:

$$T(\theta) = E(\theta) + \frac{d^2 E(\theta)}{d\theta^2} \tag{2.5}$$

The simplest form of the line tension is obtained by assuming that edge, screw and mixed segments have the same energy per unit length, i.e. $E = \alpha\mu b^2$. Eq. 2.5 then gives $T = \alpha\mu b^2$, so that the line tension of a dislocation arc becomes $\tau_{lt} = \alpha\mu b/R$.

The energy of a dislocation does, however, depend on its character: a screw dislocation has a lower energy than an edge one. The resulting orientation dependence of the line tension explains why a dislocation loop is approximately elliptical, with its major axis parallel to the Burgers vector. To include the variation of energy with segment character, the analytical line tension expression proposed by Foreman [Foreman 67] (Eq. 2.6) is used.
$$\tau_{lt} = \frac{\mu b}{4\pi(1-\nu)R}\left[\left(1 - 2\nu + 3\nu\cos^2\theta\right)\ln\frac{L}{2b} - \nu\cos 2\theta\right] \tag{2.6}$$

Figure 2.7: Definition of the geometry of the line tension calculation (Burgers vector b, angle θ, segment length L, line tension stresses τlt and τ′lt, and the dislocation line vector).
Here µ and ν are the shear modulus and the Poisson ratio. R is the radius of the circle passing through the center points of the segment and of its two neighbors, L is the length of a segment, and θ is the angle between the Burgers vector b and the dislocation line vector. The line vector is taken parallel to the vector joining the center points of the two neighboring segments, as illustrated in Fig. 2.7. $\tau_{lt}$ is the magnitude of the line tension directed towards the center of the circle. Its projection onto the glide direction of the segment is taken as the line tension acting on that segment.
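The geometric steps of the line tension computation can be sketched as follows. The circle through the three center points gives R, the line direction joining the neighbors' centers gives θ, and Eq. 2.6 gives τlt. This is an illustrative sketch with hypothetical helper names, and the final projection onto the glide direction is left out.

```python
import math

def circumradius(p1, p2, p3):
    """Radius of the circle through three points; inf if they are collinear."""
    u = [q - p for p, q in zip(p1, p2)]
    v = [q - p for p, q in zip(p1, p3)]
    cx = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    twice_area = math.sqrt(sum(c * c for c in cx))
    if twice_area == 0.0:
        return math.inf  # straight line: no curvature
    return math.dist(p1, p2) * math.dist(p2, p3) * math.dist(p1, p3) / (2.0 * twice_area)

def character_angle(b_vec, p_prev, p_next):
    """Angle between b and the line direction, taken along p_prev -> p_next."""
    line = [q - p for p, q in zip(p_prev, p_next)]
    cos_t = sum(p * q for p, q in zip(b_vec, line)) / (math.hypot(*b_vec) * math.hypot(*line))
    return math.acos(max(-1.0, min(1.0, cos_t)))

def foreman_tension(R, L, theta, mu, nu, b):
    """Eq. 2.6: line tension stress directed towards the center of curvature."""
    if math.isinf(R):
        return 0.0
    return mu * b / (4.0 * math.pi * (1.0 - nu) * R) * (
        (1.0 - 2.0 * nu + 3.0 * nu * math.cos(theta) ** 2) * math.log(L / (2.0 * b))
        - nu * math.cos(2.0 * theta))

R = circumradius((5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (-5.0, 0.0, 0.0))
screw = foreman_tension(R, L=100.0, theta=0.0, mu=1.0, nu=0.3, b=1.0)
edge = foreman_tension(R, L=100.0, theta=math.pi / 2, mu=1.0, nu=0.3, b=1.0)
print(round(R, 6), screw > edge)  # 5.0 True
```

Screw portions come out stiffer than edge portions. This is the orientation dependence behind the elliptical loop shapes mentioned above.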
The Peierls force<br />
The Peierls stress is the applied resolved shear stress required to make a dislocation glide in an otherwise perfect crystal. It arises directly from the periodic structure of the crystal lattice and acts as a friction on dislocation motion. The DDD cannot treat atomistic effects explicitly, because the lattice parameter $x_l$ is of the order of 10b (Sec. 2.1.1). The Peierls stress is therefore simply implemented as a frictional stress $\tau_p$, which contributes to the effective stress as a back stress opposing the motion of a segment. In practice, $\tau_p$ also lumps together chemical effects, impurities, solutes, etc., as identified in experiments ([Déprés et al. 04]). In FCC metals, $\tau_p$ is of the order of $10^{-5}\mu$ and is therefore expected to have only a minute effect on the simulation results.
Effective stresses<br />
After the internal ($\sigma_{int}$) and applied ($\sigma_{app}$) stresses are computed, the force on a slip system is defined by the Peach-Koehler equation and projected along the glide direction g, as shown in Eq. 2.7, where l is the unit vector tangent to the dislocation line:

$$\tau_g\, b = \left\{\left[(\sigma_{int} + \sigma_{app})\cdot\mathbf{b}\right]\times\mathbf{l}\right\}\cdot\mathbf{g} \tag{2.7}$$

Note that $\sigma_{int}$ and $\sigma_{app}$ are computed at the center of a given segment, on the assumption that the stress field varies little over the segment length. The effective stress is then obtained by summing all contributions, $\tau_e = \tau_g + \tau_{lt} - \tau_p$, and the velocity of the dislocation segment is given by Eq. 2.13.
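Eq. 2.7 and the summation of the contributions can be sketched as follows. The treatment of the friction stress is our assumption. The text writes τe = τg + τlt − τp, and the sketch generalizes this so that τp always opposes the driving stress and blocks motion when the driving stress is smaller than τp.

```python
import math

def glide_stress(sigma, b_vec, line, glide):
    """Eq. 2.7: tau_g |b| = {[sigma . b] x l} . g, with l and g unit vectors."""
    sb = [sum(sigma[i][j] * b_vec[j] for j in range(3)) for i in range(3)]  # sigma . b
    f = (sb[1] * line[2] - sb[2] * line[1],   # Peach-Koehler force (sigma . b) x l
         sb[2] * line[0] - sb[0] * line[2],
         sb[0] * line[1] - sb[1] * line[0])
    b = math.sqrt(sum(c * c for c in b_vec))
    return sum(fi * gi for fi, gi in zip(f, glide)) / b

def effective_stress(tau_g, tau_lt, tau_p):
    """tau_e = tau_g + tau_lt - tau_p, with tau_p opposing the driving stress
    (assumption: no motion while |tau_g + tau_lt| <= tau_p)."""
    drive = tau_g + tau_lt
    if abs(drive) <= tau_p:
        return 0.0
    return drive - math.copysign(tau_p, drive)

# Edge segment along z, b along x on the (010) plane, shear stress sigma_xy = 10 MPa
sigma = [[0.0, 10.0, 0.0], [10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
tau_g = glide_stress(sigma, (0.2556, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
print(tau_g, effective_stress(tau_g, 2.0, 1.0))  # approximately 10.0 and 11.0
```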
2.2.2 Computation of displacements<br />
Computing the displacement field of dislocations is useful not only for analyzing the surface deformation induced by dislocations, but also for imposing displacement boundary conditions when the DDD code is coupled with a finite element method (Sec. 2.4.2).

Within isotropic elasticity, the displacement field of any closed curved dislocation can be obtained from the Burgers formula. In vector form, it is expressed in terms of line and area integrals:

$$\mathbf{u}(\mathbf{r}) = -\frac{\mathbf{b}\,\Omega}{4\pi} - \frac{1}{4\pi}\oint_C \frac{\mathbf{b}\times d\mathbf{l}'}{R} + \frac{1}{8\pi(1-\nu)}\,\nabla\oint_C \frac{(\mathbf{b}\times\mathbf{R})\cdot d\mathbf{l}'}{R} \tag{2.8}$$

Here b is the Burgers vector and ν the Poisson ratio. Ω is the solid angle under which the positive side of the loop is seen from the field point, defined as

$$\Omega = -\int_A \frac{\mathbf{R}\cdot d\mathbf{A}}{R^3} \tag{2.9}$$

The parameters used in Eq. 2.8 and Eq. 2.9 are shown for a closed loop in Fig. 2.8.
An analytical solution for the displacement field can be obtained from Eq. 2.8 for simple dislocation loops⁷, but solutions for complex loops are generally difficult to derive analytically. The general way of computing the displacement field of an arbitrary dislocation loop is to decompose the loop into triangular loops, as illustrated in Fig. 2.8. The methodology for constructing a displacement field from triangular loops was first presented by Hirth and Lothe (see [Hirth & Lothe 92]). As the author experienced, however, special care must be taken when evaluating the inverse trigonometric functions. Barnett ([Barnett 85]) has developed a formulation better suited to numerical computation, which is detailed below.

Figure 2.8: The parameters in the Burgers equation (Eq. 2.8): dislocation loop with contour C and area A, field point, solid angle Ω, R, dl′ and slip plane normal n; and the decomposition of a dislocation loop made of dislocation segments into triangular dislocation loops.

⁷ Khraishi et al. ([Khraishi et al. 00b]) have found a closed-form analytical solution for a circular dislocation loop.
The displacement at a field point P(r) generated by a triangular dislocation loop with vertices A(r_A), B(r_B) and C(r_C) is given by Eq. 2.10. The triangular dislocation loop ABC and the field point are shown in Fig. 2.9.

$$\mathbf{u}(\mathbf{r}) = -\frac{\mathbf{b}}{4\pi}\,\Omega + \mathbf{F}_{AB} + \mathbf{F}_{BC} + \mathbf{F}_{CA} \tag{2.10}$$

Ω is the solid angle associated with the triangle ABC, which produces a displacement discontinuity Δu = b across the cut surface ABC. $\mathbf{F}_{ij}$ (i, j = A, B or C) is a displacement term that is continuous everywhere except on the dislocation line. The solid angle Ω and the continuous terms $\mathbf{F}_{ij}$ are given by ([Barnett 85]):

$$\Omega = -\operatorname{sign}(\mathbf{R}_i\cdot\mathbf{n})\;4\arctan\sqrt{\tan\frac{s}{2}\,\tan\frac{s-a}{2}\,\tan\frac{s-b}{2}\,\tan\frac{s-c}{2}} \tag{2.11}$$

$$\mathbf{F}_{ij} = -\frac{1-2\nu}{8\pi(1-\nu)}\,(\mathbf{b}\times\mathbf{t}_{ij})\,\ln\frac{R_j + \mathbf{R}_j\cdot\mathbf{t}_{ij}}{R_i + \mathbf{R}_i\cdot\mathbf{t}_{ij}} + \frac{\mathbf{b}\cdot\mathbf{n}_{ij}}{8\pi(1-\nu)}\left(\frac{\mathbf{R}_j}{R_j} - \frac{\mathbf{R}_i}{R_i}\right)\times\mathbf{n}_{ij} \tag{2.12}$$

where n is the normal of the triangular loop (Fig. 2.9) and i is any vertex. The vectors and constants in Eq. 2.11 and Eq. 2.12 are defined as follows:

$$
\begin{aligned}
s &= \frac{a+b+c}{2}, \\
a &= \arccos\frac{(\mathbf{r}_B-\mathbf{r})\cdot(\mathbf{r}_C-\mathbf{r})}{|\mathbf{r}_B-\mathbf{r}|\,|\mathbf{r}_C-\mathbf{r}|}, \quad
b = \arccos\frac{(\mathbf{r}_A-\mathbf{r})\cdot(\mathbf{r}_C-\mathbf{r})}{|\mathbf{r}_A-\mathbf{r}|\,|\mathbf{r}_C-\mathbf{r}|}, \quad
c = \arccos\frac{(\mathbf{r}_A-\mathbf{r})\cdot(\mathbf{r}_B-\mathbf{r})}{|\mathbf{r}_A-\mathbf{r}|\,|\mathbf{r}_B-\mathbf{r}|}, \\
\mathbf{R}_i &= \mathbf{r}_i - \mathbf{r}, \quad
\mathbf{t}_{ij} = \frac{\mathbf{r}_j - \mathbf{r}_i}{|\mathbf{r}_j - \mathbf{r}_i|}, \quad
\mathbf{n}_{ij} = \frac{\mathbf{R}_i\times\mathbf{R}_j}{|\mathbf{R}_i\times\mathbf{R}_j|}
\end{aligned}
$$

Figure 2.9: Geometric configuration of a triangular loop with vertices A(r_A), B(r_B), C(r_C), normal n, field point P(r) and vectors R_A, R_B, R_C, and the parameters for the computation of displacements using Eq. 2.10.
The displacement at any field point due to a dislocation loop is obtained by summing the displacements of the triangular loops that make up the loop. As an example, Fig. 2.10(b) shows the displacement field of an interstitial prismatic loop (Fig. 2.10(a)) computed with Eq. 2.10 to Eq. 2.12. The interstitial prismatic loop induces a maximum displacement of 0.5b on the plane just above the loop.
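Barnett's triangle formulation (Eqs. 2.10 to 2.12) and the summation over triangles can be sketched in plain Python. This is our own illustrative implementation. The triangle normal is taken from the vertex order A → B → C, and a planar polygonal loop is split into a fan of triangles from its first vertex; the internal edges then cancel pairwise. For a square prismatic loop, points just above and below the loop show a displacement of about 0.5b on either side and a jump of b across the cut surface.

```python
import math

def sub(a, b): return tuple(p - q for p, q in zip(a, b))
def add(a, b): return tuple(p + q for p, q in zip(a, b))
def scale(a, k): return tuple(k * p for p in a)
def dot(a, b): return sum(p * q for p, q in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def solid_angle(r, rA, rB, rC, n):
    """Eq. 2.11 (L'Huilier's theorem) with the sign rule -sign(R_i . n)."""
    RA, RB, RC = sub(rA, r), sub(rB, r), sub(rC, r)
    def angle(u, v):
        return math.acos(max(-1.0, min(1.0, dot(u, v) / (norm(u) * norm(v)))))
    sa, sb, sc = angle(RB, RC), angle(RA, RC), angle(RA, RB)
    s = 0.5 * (sa + sb + sc)
    prod = (math.tan(s / 2) * math.tan((s - sa) / 2)
            * math.tan((s - sb) / 2) * math.tan((s - sc) / 2))
    sign = 1.0 if dot(RA, n) > 0 else -1.0
    return -sign * 4.0 * math.atan(math.sqrt(max(prod, 0.0)))  # guard rounding

def edge_term(r, ri, rj, b, nu):
    """Continuous term F_ij of Eq. 2.12 for the edge i -> j."""
    Ri, Rj = sub(ri, r), sub(rj, r)
    t = scale(sub(rj, ri), 1.0 / norm(sub(rj, ri)))
    ri_len, rj_len = norm(Ri), norm(Rj)
    k = 1.0 / (8.0 * math.pi * (1.0 - nu))
    log_term = math.log((rj_len + dot(Rj, t)) / (ri_len + dot(Ri, t)))
    f1 = scale(cross(b, t), -(1.0 - 2.0 * nu) * k * log_term)
    m = cross(Ri, Rj)
    n_ij = scale(m, 1.0 / norm(m))
    diff = sub(scale(Rj, 1.0 / rj_len), scale(Ri, 1.0 / ri_len))
    f2 = scale(cross(diff, n_ij), k * dot(b, n_ij))
    return add(f1, f2)

def triangle_displacement(r, rA, rB, rC, b, nu):
    """Eq. 2.10 for the triangular loop A -> B -> C (normal from the vertex order)."""
    n = cross(sub(rB, rA), sub(rC, rA))
    n = scale(n, 1.0 / norm(n))
    u = scale(b, -solid_angle(r, rA, rB, rC, n) / (4.0 * math.pi))
    for ri, rj in ((rA, rB), (rB, rC), (rC, rA)):
        u = add(u, edge_term(r, ri, rj, b, nu))
    return u

def loop_displacement(r, vertices, b, nu):
    """Planar polygonal loop split into a fan of triangles from vertex 0."""
    u = (0.0, 0.0, 0.0)
    for k in range(1, len(vertices) - 1):
        u = add(u, triangle_displacement(r, vertices[0], vertices[k], vertices[k + 1], b, nu))
    return u

square = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0), (-1.0, 1.0, 0.0)]
b = (0.0, 0.0, 1.0)  # prismatic loop: b parallel to the loop normal
above = loop_displacement((0.1, 0.2, 1e-4), square, b, 0.3)
below = loop_displacement((0.1, 0.2, -1e-4), square, b, 0.3)
print(above[2], below[2])  # magnitudes close to 0.5, opposite signs
```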
Displacement computations for more general dislocation loops are presented in Sec. 4.3.5, where the method is applied to the analysis of surface deformation during fatigue tests.
Figure 2.10: Computations of displacements induced by an interstitial prismatic loop using Eq. 2.10: (a) schematic of the deformation around an interstitial prismatic loop, showing the computation plane, the Burgers vector b, the axes e1 and e2 and the probing line; (b) computed displacement field around the loop (surface displacement, in units of b, versus position along the probing line, in µm).
2.3 Motion of dislocations<br />
2.3.1 Preliminaries<br />
The stress field of a moving dislocation is, in fact, not equivalent to that of a static dislocation.<br />
Under most dynamic conditions of practical interest, however, dislocations move in such a way<br />
that the dynamic stresses and displacements can be approximated quite accurately by the static<br />
solutions, e.g., the stress equations presented in Sec. 2.2.<br />
Only dislocation glide on a slip plane is considered in the current DDD code; no climb mechanisms⁸ are implemented. In principle, diffusion theories could be incorporated into the DDD code to treat climb events properly, because climb involves interactions between dislocations and point defects (vacancies or interstitial atoms). Numerically, it would be necessary to add a new line vector and a glide direction to Tab. 2.1, because climb involves the nucleation and motion of jogs.
Dislocation mobility depends on the applied shear stress and on temperature. It also varies with crystal purity and dislocation type⁹. A number of relations between glide velocity and effective shear stress exist, including power-law forms and expressions with an activation term in an exponential function to represent the temperature dependence ([Hirth & Lothe 92], [Kocks et al. 75]). A simple power-law form is adopted in this work for convenience, but any other form could readily be used.
2.3.2 Dislocation mobility<br />
The simple power-law relation $v \propto \tau^m$ is used to compute the dislocation velocity. The linear form of this relation, m = 1, is known to describe well glide over the Peierls barrier in FCC metals.
The velocity of a dislocation segment is given by

$$v_i = \frac{\tau_e\,|\mathbf{b}|}{B} \qquad (2.13)$$

with the effective stress on the segment ($\tau_e$), the Burgers vector ($\mathbf{b}$) and the phonon drag coefficient ($B$)¹⁰. At room temperature, the coefficient B is found to be of the order of 10⁻⁴ Pa·s for aluminium ([Mason 68]) and 1.5 · 10⁻⁴ Pa·s for copper ([Fusenig & Nembach 75]). In fact, B changes with the velocity of a dislocation as $B = B_0/(1 - v^2/c^2)$, where c is the shear-wave velocity. For simplicity, a constant value of B is used, and the dislocation velocity is limited to $v_{\max}$ so that $v^2/c^2$ remains small.

⁸ Climb is a process by which an edge dislocation can move out of its slip plane by diffusion.
⁹ In BCC single crystals, for example, a pure screw dislocation is more difficult to move than a mixed one at low temperature, since a screw dislocation has a complex core structure ([Urabe & Weertman 75]).
¹⁰ Damping forces, which oppose dislocation motion, arise from the scattering of lattice vibrations (phonons) or electrons.
Using the velocity of a segment given by Eq. 2.13, the next position of the segment is obtained by explicit integration, $x_i^{t+\Delta t} = x_i^{t} + v_i\,\Delta t$, where $x_i^t$ is the position of the segment at time t and Δt is the time step. As is typical of forward explicit algorithms, too large a value of Δt causes numerical instability. In the DDD method, a dislocation segment may then oscillate: a large time increment lets the segment move over too large a distance, which changes the local curvature significantly and in turn increases the back stress (the line tension), so that the segment oscillates. A constant value of Δt in the range 0.5 × 10⁻⁹ to 1 × 10⁻⁹ s has proven successful in practice, but Δt has to be adapted for each simulation. The maximum velocity $v_{\max}$ is imposed to prevent segments from gliding over too large a distance.
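The mobility law of Eq. 2.13 and the forward explicit update can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation of this work: the cap $v_{\max}$ is applied to the magnitude of the velocity, SI units are used, and the function names are hypothetical. The example uses the copper drag coefficient quoted above (1.5 · 10⁻⁴ Pa·s), |b| ≈ 0.256 nm for copper and an arbitrary $v_{\max}$ = 100 m/s.

```python
import math

def glide_velocity(tau_eff: float, b: float, B: float, v_max: float) -> float:
    """Eq. 2.13, v = tau_e * |b| / B, with |v| capped at v_max.

    SI units are assumed (Pa, m, Pa.s, m/s); the sign of tau_eff sets
    the glide direction.
    """
    v = tau_eff * b / B
    return math.copysign(min(abs(v), v_max), v)

def explicit_step(x: float, v: float, dt: float) -> float:
    """Forward (explicit) Euler update: x(t + dt) = x(t) + v * dt."""
    return x + v * dt
```

With τe = 10 MPa, this gives v ≈ 17 m/s, i.e., a glide of about 17 nm over a time step of 10⁻⁹ s, well below the cap.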
2.3.3 Dislocation-dislocation interactions<br />
A segment can interact with other segments during glide. The task is then to search for any possible intersection with segments within the virtual glide area of the gliding segment, which is defined by the length of the segment, $L_i$, and the free-flight distance, $v_i\Delta t$. The nearest of the possible intersection points is found from the simple geometry of two finite lines (segments). The type of interaction is then determined by the relation between the Burgers vectors and the slip systems of the two intersecting segments.
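The search described above can be sketched for the non-coplanar case. Here the virtual glide area of segment i is taken as the parallelogram spanned by the segment vector and its free-flight displacement $v_i\Delta t$; another segment is intersected if it crosses the plane of this parallelogram at a point inside it. The Python sketch below (hypothetical names) returns the fraction t of the free flight at which the crossing occurs, so the nearest intersection is the candidate with the smallest t; the coplanar case is deliberately left out.

```python
Vec = tuple[float, float, float]

def sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def glide_area_hit(p0: Vec, p1: Vec, d: Vec, q0: Vec, q1: Vec):
    """Fraction t in [0, 1] of the flight d at which segment p0-p1, sweeping
    the parallelogram p0 + s*(p1 - p0) + t*d, meets segment q0-q1; else None."""
    e = sub(p1, p0)
    n = cross(e, d)                      # normal of the virtual glide area
    h0, h1 = dot(n, sub(q0, p0)), dot(n, sub(q1, p0))
    if h0 == h1 or h0 * h1 > 0:
        return None                      # coplanar/parallel, or no crossing
    lam = h0 / (h0 - h1)                 # crossing point along q0-q1
    x = tuple(q0[k] + lam * (q1[k] - q0[k]) for k in range(3))
    w = sub(x, p0)
    # Solve w = s*e + t*d with the 2x2 normal equations (Cramer's rule).
    ee, ed, dd = dot(e, e), dot(e, d), dot(d, d)
    det = ee * dd - ed * ed
    if det == 0.0:
        return None                      # degenerate: d parallel to the segment
    s = (dd * dot(e, w) - ed * dot(d, w)) / det
    t = (ee * dot(d, w) - ed * dot(e, w)) / det
    return t if 0.0 <= s <= 1.0 and 0.0 <= t <= 1.0 else None
```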
The types of possible dislocation-dislocation interactions considered in the DDD model are catego-<br />
rized as follows:<br />
a. coplanar cases in which two dislocation segments glide on the same plane<br />
b. non-coplanar cases in which two dislocation segments glide on different planes<br />
(a) Coplanar cases<br />
When two segments with the same Burgers vector but of opposite direction (opposite sign) intersect, the overlapping portion is deleted and the links of the remaining segments are rebuilt, as shown in Fig. 2.11(a). For segments of the same sign, no interaction is realized, since the interaction is elastically repulsive; a segment is only discretized for the next step, as illustrated in Fig. 2.11(b).
Figure 2.11: Interaction between two segments in the same glide plane: (a) annihilation of segments of opposite sign; (b) repulsion of segments of the same sign, followed by discretization. Segments are annihilated if their signs are opposite, and discretized if their signs are the same.
No explicit handling is done in the case of two different Burgers vectors in the same plane, which corresponds to the $a_1^{\mathrm{copla}}$ case explained below.
(b) Non-coplanar cases
Before introducing the interaction-handling schemes for non-coplanar cases, dislocation junctions are presented, because such interactions result in the formation of junctions. In the framework of hardening theory, five different forms of dislocation junctions are usually considered:
(i) $a_1^{\mathrm{coli}}$, for which b1 = b2 on different slip planes
(ii) $a_1^{\mathrm{ortho}}$, for which b1 ⊥ b2 on different slip planes
(iii) $a_1^{\mathrm{copla}}$, for which b1 ≠ b2 on the same slip plane
(iv) $a_2$, for which b1 + b2 is glissile on one of the two planes
(v) $a_3$, for which b1 + b2 is sessile on both planes
The junctions formed between slip systems are tabulated in Table 2.2 for the 12 slip systems defined<br />
in Table 2.1.<br />
(i) $a_1^{\mathrm{coli}}$ is represented in the DDD by exchanging the neighboring arms of the two interacting segments¹¹. Fig. 2.12 shows the intersection of two dislocation segments that glide on a slip plane and on its deviate plane, respectively. Upon intersection, the segments exchange their neighbors and form an angular dislocation with θ = 70.53°.

¹¹ Its role in dislocation hardening can be found in [Madec et al. 03].

| | A2 | A3 | A6 | B2 | B4 | B5 | C1 | C3 | C5 | D1 | D4 | D6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A2 | $a_0$ | $a_1^{\mathrm{copla}}$ | $a_1^{\mathrm{copla}}$ | $a_1^{\mathrm{coli}}$ | $a_2$ | $a_2$ | $a_1^{\mathrm{ortho}}$ | $a_2$ | $a_3$ | $a_1^{\mathrm{ortho}}$ | $a_3$ | $a_2$ |
| A3 | | $a_0$ | $a_1^{\mathrm{copla}}$ | $a_2$ | $a_1^{\mathrm{ortho}}$ | $a_3$ | $a_2$ | $a_1^{\mathrm{coli}}$ | $a_2$ | $a_3$ | $a_1^{\mathrm{ortho}}$ | $a_2$ |
| A6 | | | $a_0$ | $a_2$ | $a_3$ | $a_1^{\mathrm{ortho}}$ | $a_3$ | $a_2$ | $a_1^{\mathrm{ortho}}$ | $a_2$ | $a_2$ | $a_1^{\mathrm{coli}}$ |
| B2 | | | | $a_0$ | $a_1^{\mathrm{copla}}$ | $a_1^{\mathrm{copla}}$ | $a_1^{\mathrm{ortho}}$ | $a_3$ | $a_2$ | $a_1^{\mathrm{ortho}}$ | $a_2$ | $a_3$ |
| B4 | | | | | $a_0$ | $a_1^{\mathrm{copla}}$ | $a_3$ | $a_1^{\mathrm{ortho}}$ | $a_2$ | $a_2$ | $a_1^{\mathrm{coli}}$ | $a_2$ |
| B5 | | | | | | $a_0$ | $a_2$ | $a_2$ | $a_1^{\mathrm{coli}}$ | $a_3$ | $a_2$ | $a_1^{\mathrm{ortho}}$ |
| C1 | | | | | | | $a_0$ | $a_1^{\mathrm{copla}}$ | $a_1^{\mathrm{copla}}$ | $a_1^{\mathrm{coli}}$ | $a_2$ | $a_2$ |
| C3 | | | | | | | | $a_0$ | $a_1^{\mathrm{copla}}$ | $a_2$ | $a_1^{\mathrm{ortho}}$ | $a_3$ |
| C5 | | | | | | | | | $a_0$ | $a_2$ | $a_3$ | $a_1^{\mathrm{ortho}}$ |
| D1 | | | | | | | | | | $a_0$ | $a_1^{\mathrm{copla}}$ | $a_1^{\mathrm{copla}}$ |
| D4 | | | | | | | | | | | $a_0$ | $a_1^{\mathrm{copla}}$ |
| D6 | | | | | | | | | | | | $a_0$ |

Table 2.2: Hardening coefficients (the matrix is symmetric; only the upper triangle is shown)

(ii) No explicit treatment is done on $a_1^{\mathrm{ortho}}$ (known as Hirth lock).
(iii) No explicit treatment is done on $a_1^{\mathrm{copla}}$, as explained in the coplanar cases above.
Figure 2.12: Changing of neighbor arms between two segments in primary and deviate planes (plane normals $\mathbf{n}_{\mathrm{prim}}$ and $\mathbf{n}_{\mathrm{devi}}$, common Burgers vector $\mathbf{b}$).

Figure 2.13: Cross-slip of a screw segment, panels (a) to (c).

(iv) & (v) $a_2$ (known as glissile junction) and $a_3$ (known as Lomer-Cottrell lock) are handled implicitly through a simple energy analogy: two segments with Burgers vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ are considered to form a junction if the energy criterion

$$|\mathbf{b}_1|^2 + |\mathbf{b}_2|^2 > |\mathbf{b}_1 + \mathbf{b}_2|^2 \qquad (2.14)$$

is satisfied. In this criterion, the energy of a dislocation is assumed to be proportional to $|\mathbf{b}|^2$, with no dependence on the line character. Once a junction is formed, it is given a certain breaking strength $\tau_{\mathrm{junc}}$. The junction can be broken afterwards only if the effective stress on a component segment of the junction is larger than $\tau_{\mathrm{junc}}$; the junction thus acts as a pinning point for dislocation motion.
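Since $|\mathbf{b}_1+\mathbf{b}_2|^2 = |\mathbf{b}_1|^2 + |\mathbf{b}_2|^2 + 2\,\mathbf{b}_1\cdot\mathbf{b}_2$, Eq. 2.14 reduces to a sign test on $\mathbf{b}_1\cdot\mathbf{b}_2$. A minimal Python sketch, with Burgers vectors written in units of a/2 (an assumption made for readability; the function names are hypothetical):

```python
def squared_norm(v):
    return sum(c * c for c in v)

def forms_junction(b1, b2):
    """Energy criterion of Eq. 2.14: |b1|^2 + |b2|^2 > |b1 + b2|^2.

    Algebraically this is b1 . b2 < 0: the two Burgers vectors, as given,
    make an obtuse angle.
    """
    b_sum = [x + y for x, y in zip(b1, b2)]
    return squared_norm(b1) + squared_norm(b2) > squared_norm(b_sum)
```

For a/2[110] and a/2[−101], the sum a/2[011] is again a lattice Burgers vector of the same type, and the criterion is met; orthogonal Burgers vectors (the Hirth configuration) sit exactly at the limit and are rejected by the strict inequality.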
Detailed studies of dislocation junctions ([Shin et al. 01], [Rodney & Phillips 99]) show that junctions form because of the elastic fields of the two dislocation lines, and that the junction strength is governed by an "unzipping" mechanism. The properties of junctions can therefore be treated through the elastic stress fields of the dislocations involved. In so-called mass simulations, junctions with a local breaking strength serve as pinning points.
2.3.4 Cross-slip of screw dislocation segments<br />
The cross-slip of a screw segment (Fig. 2.13) is implemented in a stochastic manner that accounts for its thermally activated character. A cross-slip probability P over each time step is first computed using the equation

$$P = \beta\,\frac{l}{L_0}\,\frac{\delta t}{t_0}\,\exp\!\left(\frac{(\tau_d - \tau_{III})\,V}{kT}\right) \qquad (2.15)$$

where β is a normalization coefficient, l is the length of the screw segment considered, L0 is 1 µm, δt is the time step, t0 is 1 s, V is the activation volume, τd is the resolved shear stress in the cross-slip system, τIII is a threshold stress, k is the Boltzmann constant and T is the temperature. A random number r in [0, 1] is generated, and cross-slip occurs only if r is lower than P. As an example, the values used for copper are V = 350 eV and τIII = 32 MPa.
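A sketch of the stochastic cross-slip test follows, assuming SI units and a default prefactor β = 1 (the value of β is not given here). The probability is evaluated in logarithmic form to avoid overflow of the exponential and is capped at 1; both choices, and the function names, are illustrative assumptions.

```python
import math
import random

def cross_slip_probability(l, dt, tau_d, tau_III, V, kT,
                           beta=1.0, L0=1e-6, t0=1.0):
    """Eq. 2.15: P = beta (l/L0) (dt/t0) exp((tau_d - tau_III) V / kT).

    l, L0 in m; dt, t0 in s; (tau_d - tau_III) * V must have the same
    energy unit as kT. The result is capped at 1.
    """
    log_p = math.log(beta * (l / L0) * (dt / t0)) + (tau_d - tau_III) * V / kT
    return 1.0 if log_p >= 0.0 else math.exp(log_p)

def attempt_cross_slip(p, rng=random):
    """Cross-slip occurs only if a uniform random number r in [0, 1) is below p."""
    return rng.random() < p
```

Passing an explicit `random.Random(seed)` as `rng` makes a simulation run reproducible.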
2.3.5 Plastic strain due to dislocation movement<br />
Dislocation motion produces plastic strain. The plastic strain of a simulation volume is determined by summing the areas swept on each slip system. The slip $\gamma^{(s)}$ of a slip system s is computed as

$$\gamma^{(s)} = \frac{|\mathbf{b}|\,A^{(s)}}{V} \qquad (2.16)$$

with $\mathbf{b}$ the Burgers vector, V the volume of the simulation box and $A^{(s)}$ the area swept by all the mobile dislocations of slip system s over a time step. $A^{(s)}$ is defined as

$$A^{(s)} = \sum_i L_i\,v_i\,\Delta t \qquad (2.17)$$

where the summation is done over all the segments of system s, and $L_i v_i \Delta t$ is the area swept by segment i of length $L_i$. The components of the plastic strain tensor are given by

$$\varepsilon_{ij} = \sum_{s=1}^{12} \frac{1}{2}\left(n_i^{(s)} b_j^{(s)} + n_j^{(s)} b_i^{(s)}\right)\gamma^{(s)} \qquad (2.18)$$

with $n_i^{(s)}$ and $b_i^{(s)}$ the components of the unit slip-plane normal and of the unit Burgers vector (slip direction) of slip system s, respectively.
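Eqs. 2.16 to 2.18 translate directly into code. The Python sketch below assumes SI units and signed velocities $v_i$ (so that backward glide reduces the swept area), and takes the unit slip-plane normal and unit slip direction of each system as input; the function names are hypothetical.

```python
def slip_increment(segments, b, volume):
    """Eqs. 2.16-2.17: gamma = |b| * sum_i(L_i * v_i * dt) / V for one system.

    `segments` is an iterable of (L_i, v_i, dt) tuples with signed v_i.
    """
    area = sum(L * v * dt for L, v, dt in segments)
    return b * area / volume

def plastic_strain(systems):
    """Eq. 2.18: eps_ij = sum_s 0.5 * (n_i s_j + n_j s_i) * gamma_s.

    `systems` is an iterable of (n, s_dir, gamma): unit slip-plane normal,
    unit slip direction and slip increment of each slip system.
    """
    eps = [[0.0] * 3 for _ in range(3)]
    for n, s_dir, gamma in systems:
        for i in range(3):
            for j in range(3):
                eps[i][j] += 0.5 * (n[i] * s_dir[j] + n[j] * s_dir[i]) * gamma
    return eps
```

Because the slip direction lies in the slip plane, each system contributes a traceless, symmetric tensor, so plastic glide conserves volume.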
2.4 Boundary conditions<br />
2.4.1 Periodic Boundary Conditions<br />
Motivation and review of the literature
Simulation volumes for so-called mass simulations addressing work hardening or dislocation cell formation are typically 10³–15³ µm³ large. To compare the simulations with experiments, it is desirable to build a simulation volume representative of a small element taken out of a single crystal or out of a grain of a polycrystal.
Periodic boundary conditions (PBC), which force a segment crossing a boundary between two cells to re-emerge at the equivalent position on the opposite boundary, are extensively used to avoid undesirable size effects due to the finite dimensions of a simulation volume. PBC can be easily applied in 2D, for example by subtracting the simulation volume size (Lx, Ly) from the coordinates of dislocation segments leaving the initial volume. The simulations of Gómez-García et al. [GG et al. 00] have shown many advantages of PBC compared with free boundary surfaces, which inevitably cause artificial dislocation losses and undesirable size effects.
Figure 2.14: Property of p.b.c.: the positions 0, 1, ..., L−2, L−1 along a line joining two opposite boundaries form a closed ring (a bracelet), on which L−1 is followed by 0.
PBC in 3D was considered difficult because of the complexities related to the connectivity of dislocation line segments after they exit through a boundary. Since [Bulatov et al. 01] demonstrated that PBC can be applied to dislocation dynamics, attention has been given to the stress calculation, the initial configuration of dislocations and the balance between incoming and outgoing dislocation fluxes. Madec et al. ([Madec et al. 04]) have reported that portions of dislocation loops may self-annihilate with their own replicas after a certain number of boundary crossings. This spurious self-interaction artificially shortens the mean free path of dislocations; because the effective mean free path controls the density of mobile dislocations and their storage rate, it affects both the microstructure and the strain-hardening properties. The self-annihilation artifact can be avoided by using an orthorhombic simulation volume.
The numerical method used by Madec et al. ([Madec et al. 04]) to apply PBC in 3D consists in translating all the segments so that a selected segment lies at the center of the volume, and then applying "MODULO" operations, so that segment coordinates x larger than the simulation size Lx are replaced by the remainder of x/Lx.
Numerical implementation<br />
A line perpendicular to two opposite boundaries and joining them is equivalent to a line on a bracelet (Fig. 2.14). Under PBC, a quantity ic(i) with an integer argument i in the range [0 : L − 1] has the property given by Eq. (2.19), which is merely Fig. 2.14 expressed in mathematical form:

$$ic(L+i) = ic(i), \qquad ic(-i) = ic(L-i) \qquad \text{for } i = 0, \ldots \qquad (2.19)$$
The array ipc can be used to redirect the coordinates of a segment that has left the initial volume lattice to the equivalent position in the initial lattice. PBC can then be applied with a simple array reference. An orthorhombic simulation volume is readily obtained by changing the range of periodicity according to the maximum length of the simulation volume along each axis. Because of the subnetwork in the simulation volume (see Sec. 2.1.3), the periodicity should be a multiple of 4xl along each axis.
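In a language with a floored modulo, Eq. 2.19 and the "MODULO" wrapping of coordinates need no explicit lookup table. The Python sketch below is one possible alternative to the array ic (or ipc) described above, not the code of this work; the names are hypothetical.

```python
def ic(i: int, L: int) -> int:
    """Periodic index with ic(L + i) = ic(i) and ic(-i) = ic(L - i) (Eq. 2.19).

    For L > 0, Python's % returns a value in [0, L) even for negative i,
    like Fortran's MODULO (unlike Fortran's MOD).
    """
    return i % L

def wrap_point(p, periods):
    """Map a point that left the volume back to its periodic image.

    `periods` holds one period per axis, so an orthorhombic volume
    (Lx, Ly, Lz) is handled exactly like a cubic one.
    """
    return tuple(x % L for x, L in zip(p, periods))
```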
2.4.2 Internal interfaces<br />
Motivations and review of the literature<br />
The collective behavior of dislocations in a single crystal can be simulated with the stress computation and motion treatments explained in the previous sections. More rigorous boundary conditions must be implemented in the method to treat more general cases, such as a crystal with free surfaces, a crystal containing second-phase particles or a polycrystal with grain boundaries.
A dislocation experiences forces near an interface because the dislocation energy differs in the two media involved. A dislocation is, for example, attracted towards a free surface and repelled by a rigid surface layer. These image stresses can be treated using the superposition principle.
The effects of free surfaces were treated by Fivel and Canova ([Fivel & Canova 99]). The forces exerted on a free surface by dislocations are computed assuming that the dislocations are embedded in an infinite medium. These forces are then reversed and converted into the appropriate point forces to enforce the traction-free surface condition. Applications of this method can be found in [Fivel et al. 98].
The image stress due to a free surface is, in fact, a special case of the more general situation in which an interface separates two materials with different elastic constants, e.g., oxide layers and particles. The image stresses on a dislocation in the presence of a second-phase particle can also be computed with the superposition principle. The formulation follows that of Van der Giessen and Needleman ([Giessen & Needleman 95]). Previous applications of this method to 2D cases can be found in Cleveringa et al. ([Cleveringa et al. 97]).
Figure 2.15: Geometries of a facet (strength $\tau_{\mathrm{facet}}$) intersected by the glide plane of a segment (effective stress $\tau_e$), and construction of a sphere by square facets.
This section describes a complete method to treat internal interfaces. First, an interface is represented by a set of facets, each with a given strength, so that the facets can act as barriers to dislocation motion; the application of this method can be found in Sec. 4.3. Second, a full account of the elastic interactions with dislocations (image stresses) is obtained by coupling with a finite element method; this method is applied to compute image stresses in Sec. 4.1.
Internal interfaces represented by facets<br />
A curved 3D boundary is approximated by a series of facets, in the same way as a surface is represented by a finite element mesh. Each facet is defined by the indexes of its nodes (vertices), whose coordinates are stored separately.
Intersection events between a segment and a facet are detected by determining the nearest intersection point between a facet and the virtual glide plane of a segment (Fig. 2.15).
Each facet is given a strength $\tau_{\mathrm{facet}}$. Only segments whose effective stress is greater than $\tau_{\mathrm{facet}}$ are allowed to cross the facet, i.e., a facet acts as a barrier to dislocation motion. The strength is further specified as $\tau^{+}_{\mathrm{facet}}$ or $\tau^{-}_{\mathrm{facet}}$ depending on the relation between the moving direction of a segment ($\mathbf{g}$) and the normal direction of the facet ($\mathbf{n}$), which makes the facets more flexible to use. For example, a facet with $\tau^{+}_{\mathrm{facet}}$ = 100 MPa and $\tau^{-}_{\mathrm{facet}}$ = 0 blocks all segments with $\mathbf{g}\cdot\mathbf{n} > 0$ whose effective stress is lower than 100 MPa, but is transparent to segments moving along the −n direction, that is, with $\mathbf{g}\cdot\mathbf{n} < 0$.
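A minimal sketch of the directional facet test, assuming that the relevant threshold is $\tau^{+}_{\mathrm{facet}}$ when g · n > 0 and $\tau^{-}_{\mathrm{facet}}$ when g · n < 0, and that a segment gliding parallel to the facet (g · n = 0) cannot cross it. Stresses are in MPa and the names are hypothetical.

```python
def facet_blocks(tau_eff, g, n, tau_plus, tau_minus):
    """Return True if the facet stops a segment gliding along g.

    The threshold is tau_plus for g . n > 0 and tau_minus for g . n < 0;
    the segment crosses only if its effective stress exceeds the threshold.
    """
    gn = sum(gi * ni for gi, ni in zip(g, n))
    if gn == 0:
        return False  # gliding parallel to the facet: no crossing to block
    threshold = tau_plus if gn > 0 else tau_minus
    return abs(tau_eff) <= threshold
```

With the values of the example above (100 MPa and 0), a segment at 50 MPa moving along +n is stopped, while the same segment moving along −n passes.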
The facet method introduces simple geometrical barriers to dislocation motion in dislocation dynamics. When applied to dislocation-precipitate interactions, this method can be considered a first-order approximation, in the sense that elastic interactions between dislocations and interfaces are not considered. The application of the facet model to a particle-strengthened crystal is presented in Sec. 4.3, where the hypotheses used to derive the strength $\tau_{\mathrm{facet}}$ are explained in detail for the unshearable and shearable particle cases.
Full account of elastic interactions<br />
The problem of a finite volume containing dislocations is decomposed into two problems:<br />
1. the problem of dislocations in an infinite elastic isotropic medium<br />
2. the complementary problem of a finite dislocation-free volume, which compensates for the<br />
proper boundary conditions.<br />
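The problem decomposition is shown schematically in Fig. 2.16. The stress and strain fields of the current state of the body, σ and ε, are determined by the governing equations

$$\nabla\cdot\boldsymbol{\sigma} = \mathbf{0},\qquad \boldsymbol{\varepsilon} = \tfrac{1}{2}\left(\nabla\mathbf{u} + (\nabla\mathbf{u})^{T}\right)$$

$$\boldsymbol{\sigma} = \mathbf{L}_M : \boldsymbol{\varepsilon}\ \text{in } V_M,\qquad \boldsymbol{\sigma} = \mathbf{L}_P : \boldsymbol{\varepsilon}\ \text{in } V_P$$

with $\mathbf{L}_M$ and $\mathbf{L}_P$ the moduli of the matrix and of the particle, respectively. The boundary conditions are

$$\mathbf{u} = \mathbf{U}_{ap}\ \text{on } S_u,\qquad \mathbf{n}\cdot\boldsymbol{\sigma} = \mathbf{F}_{ap}\ \text{on } S_t.$$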
After the decomposition, the stress and strain fields are written as the superposition of two fields:

$$\boldsymbol{\varepsilon} = \boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_2,\qquad \boldsymbol{\sigma} = \boldsymbol{\sigma}_1 + \boldsymbol{\sigma}_2 \qquad (2.20)$$

In the first problem (denoted by 1), the dislocations are assumed to be in an infinite, elastic, isotropic medium. The stress-strain relationship is $\boldsymbol{\sigma}_1 = \mathbf{L}_M : \boldsymbol{\varepsilon}_1$ in the whole volume. The forces $\mathbf{F}_D$ and displacements $\mathbf{U}_D$ on the virtual boundaries can be computed with the expressions presented in Sec. 2.2.
In the second problem, the simulation volume contains no dislocations. The fields of this second problem (denoted by 2) are those needed to correct for the actual boundary conditions and for the presence of the inclusions. The governing equations for the complementary problem become:

$$\boldsymbol{\sigma}_2 = \mathbf{L}_M : \boldsymbol{\varepsilon}_2 \ \text{in } V_M,\qquad \boldsymbol{\sigma}_2 = \mathbf{L}_P : \boldsymbol{\varepsilon}_2 + \boldsymbol{\sigma}_{\mathrm{correc}} \ \text{in } V_P \qquad (2.21)$$
The complementary problem contains a correction term $\boldsymbol{\sigma}_{\mathrm{correc}} = (\mathbf{L}_P - \mathbf{L}_M) : \boldsymbol{\varepsilon}_1$ in $V_P$. With this correction term, the current stress field in the particle volume is recovered by the superposition of the two fields ($\boldsymbol{\sigma} = \boldsymbol{\sigma}_1 + \boldsymbol{\sigma}_2$ in $V_P$). Replacing $\boldsymbol{\varepsilon}_1$ with $\mathbf{L}_M^{-1} : \boldsymbol{\sigma}_1$, the correction term becomes $(\mathbf{L}_P : \mathbf{L}_M^{-1} - \mathbf{I}) : \boldsymbol{\sigma}_1$, where $\mathbf{I}$ is the fourth-order unit tensor. In Voigt notation, $(\mathbf{L}_P : \mathbf{L}_M^{-1} - \mathbf{I})$ is expressed in terms of the Young's moduli and Poisson's ratios of the matrix (E, ν) and of the particle (E*, ν*) as follows:

$$(\mathbf{L}_P : \mathbf{L}_M^{-1} - \mathbf{I}) = \begin{pmatrix} a_1-1 & a_2 & a_2 & 0 & 0 & 0\\ a_2 & a_1-1 & a_2 & 0 & 0 & 0\\ a_2 & a_2 & a_1-1 & 0 & 0 & 0\\ 0 & 0 & 0 & a_3-1 & 0 & 0\\ 0 & 0 & 0 & 0 & a_3-1 & 0\\ 0 & 0 & 0 & 0 & 0 & a_3-1 \end{pmatrix}$$

with

$$a_1 = \frac{E^*(1-\nu^*-2\nu\nu^*)}{E(1+\nu^*)(1-2\nu^*)},\qquad a_2 = \frac{E^*(\nu^*-\nu)}{E(1+\nu^*)(1-2\nu^*)},\qquad a_3 = \frac{E^*(1+\nu)}{E(1+\nu^*)}.$$
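The closed-form coefficients can be checked numerically by assembling the isotropic particle stiffness $\mathbf{L}_P$ and the matrix compliance $\mathbf{L}_M^{-1}$ in Voigt notation and subtracting the identity. The Python sketch below assumes engineering shear strains in the Voigt vectors; the constants used in the check (E = 70 GPa, ν = 0.33 for the matrix, E* = 400 GPa, ν* = 0.25 for the particle) are illustrative values only, and the names are hypothetical.

```python
def voigt_stiffness(E, nu):
    """Isotropic stiffness in Voigt form (engineering shear strains)."""
    lam = E * nu / ((1 + nu) * (1 - 2 * nu))
    mu = E / (2 * (1 + nu))
    C = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            C[i][j] = lam + (2 * mu if i == j else 0.0)
        C[i + 3][i + 3] = mu
    return C

def voigt_compliance(E, nu):
    """Inverse of voigt_stiffness, written in closed form."""
    S = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            S[i][j] = 1 / E if i == j else -nu / E
        S[i + 3][i + 3] = 2 * (1 + nu) / E
    return S

def correction_operator(E, nu, E_p, nu_p):
    """(L_P : L_M^-1 - I) in Voigt form: maps sigma_1 to sigma_correc."""
    C, S = voigt_stiffness(E_p, nu_p), voigt_compliance(E, nu)
    M = [[sum(C[i][k] * S[k][j] for k in range(6)) for j in range(6)]
         for i in range(6)]
    for i in range(6):
        M[i][i] -= 1.0
    return M

def closed_form(E, nu, E_p, nu_p):
    """Coefficients a1, a2, a3 quoted in the text (E_p, nu_p stand for E*, nu*)."""
    a1 = E_p * (1 - nu_p - 2 * nu * nu_p) / (E * (1 + nu_p) * (1 - 2 * nu_p))
    a2 = E_p * (nu_p - nu) / (E * (1 + nu_p) * (1 - 2 * nu_p))
    a3 = E_p * (1 + nu) / (E * (1 + nu_p))
    return a1, a2, a3
```

When the particle and the matrix have the same constants, the operator vanishes, as expected: σcorrec is then zero and the complementary problem reduces to the homogeneous case.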
The complementary problem is solved using CAST3M¹², a finite element code developed in France by the Commissariat à l'Energie Atomique. With a strain-displacement matrix $\mathbf{B}$, the FEM formulation of the second problem in the particle volume can be written as

$$\int_{V_P} \mathbf{B}^{T}\cdot\mathbf{L}\cdot\mathbf{B}\,dV\cdot\mathbf{u} + \int_{V_P} \mathbf{B}^{T}\cdot\left(\mathbf{L}_P : \mathbf{L}_M^{-1} - \mathbf{I}\right) : \boldsymbol{\sigma}_1\,dV = 0 \qquad (2.22)$$

To compute the body-force-like term (the second integral in Eq. 2.22) within the precipitate volume, the stresses σ1 due to the dislocations are first computed at points in $V_P$, e.g., at the Gauss integration points. The correction term is then computed within each finite element inside the particle, and the resulting stresses $\boldsymbol{\sigma}_{\mathrm{correc}}$ are converted into a nodal body-force field $\mathbf{f}^b$. This is easily done with the 'BSIG' operator in CAST3M. These forces are applied to $V_P$, and the FEM gives the solution of a two-phase boundary-value problem in which forces $\mathbf{F}_{ap} - \mathbf{F}_D$ and displacements $\mathbf{U}_{ap} - \mathbf{U}_D$ are imposed on the boundary and body forces $\mathbf{f}^b$ are applied inside the particle. The continuity of displacements and normal stresses at the particle/matrix interface is enforced by the FEM.
12 Finite element code developed by Commissariat à l’Energie Atomique, CEA-DRN/DMT/SEMT
Figure 2.16: Decomposition of the problem into the problem of dislocations in an infinite medium and the complementary problem of an inhomogeneous finite volume without dislocations. Forces ($\mathbf{F}_{ap}$) and displacements ($\mathbf{U}_{ap}$) are applied on the boundary. In the complementary problem, the boundary conditions are modified with the forces ($\mathbf{F}_D$), displacements ($\mathbf{U}_D$) and a nodal body-force field ($\mathbf{f}^b$) generated by the dislocations.
2.5 Acceleration of the DDD code<br />
2.5.1 Problem description and review of the literature
Internal stress computation is the most computationally intensive part of the DDD method. This is because the stress field of a dislocation line decays only as 1/r with the distance r from the line: the field is long-ranged, so every segment contributes to the stress on every other segment. Another time-consuming part of the DDD method is the handling of segment interactions: moving a segment requires examining possible interactions between dislocations, or between dislocations and internal interfaces.
From a programming perspective, the two parts can be represented in pseudo-code as follows (Nsegm: number of segments; Nfacets: number of facets).

Internal stress computation
DO I=1,Nsegm
   DO J=1,Nsegm
      IF (J ≠ I and J is not a neighbor of I) compute σint(I←J), the stress on segment I due to segment J
   ENDDO
ENDDO

Segment motion
DO I=1,Nsegm
   DO J=1,Nsegm
      Examine interaction with segment J
   ENDDO
   DO K=1,Nfacets
      Examine interaction with facet K
   ENDDO
   Move segment I
ENDDO
Both parts require on the order of $N_{segm}^2$ computations, $N_{segm}$ being the number of dislocation segments. For the segment motion, $N_{segm} \times N_{facets}$ additional computations are required. Note that in 'Segment motion' each segment is treated and moved sequentially, and each individual segment displacement generates a new dislocation configuration. In complex situations, changing the order in which segments are processed may therefore slightly change the resulting dislocation structure.
In Molecular Dynamics simulations, the stress computation commonly uses a cut-off distance beyond which the stress is called a long-distance stress and neglected, because interatomic stress fields are short-ranged. This cut-off scheme reduces the cost of the stress computation at the price of a minor error. In Dislocation Dynamics simulations, however, a cut-off distance may cause spurious cell formation ([Gullouglu et al. 89]). The study of Devincre et al. ([Devincre et al. 01]) has nevertheless shown that neglecting the long-distance stresses does not much affect the yield stress and hardening properties of FCC single crystals. That study, however, dealt with dislocation patterning in multislip conditions, where cross-slip, which is governed by short-distance stresses, is expected to play an important role, so its conclusion would be difficult to generalize to other situations. In general, all the dislocations in the simulation volume must therefore be taken into account to compute the internal stresses.
Reasonable approximations in the computation of the internal stresses have been proposed to overcome this severe computational limitation. In [Verdier et al. 98], the simulation volume is decomposed into boxes, and short- and long-distance stresses are distinguished according to the topology of the boxes; the computational cost is reduced by updating the long-distance stresses less frequently. The concept of superdislocation has been adopted in the stress computation by [Zbib et al. 98]. The idea is to replace a large number of dislocation segments beyond a certain distance by a limited number of superdislocations, which have a modified Burgers vector magnitude. This method is based on the multipolar expansion of the elastic field of a 2D dislocation array and is extended to 3D by a simple 'projection-extension' method.
2.5.2 The Box method<br />
The box method proposed by [Verdier et al. 98] is based on the fact that a dislocation microstructure changes slowly compared with the short time step used in the simulation (O(10⁻⁹ s)). The stress fields of long-distance segments can therefore be updated only periodically, and the previous values can be used between updates with an acceptable error.
Figure 2.17: Decomposition of a simulation volume into boxes: (a) a typical dislocation structure in a cubical simulation volume; (b) the simulation volume divided into 10 × 10 × 10 boxes.
The simulation volume is first decomposed into boxes. For simplicity of the computation scheme, each side of the simulation volume is divided into M boxes, so that the simulation volume comprises M³ homologous boxes. Fig. 2.17(a) shows a typical simulation volume with dislocation segments, and Fig. 2.17(b) shows the same simulation volume decomposed into 10³ boxes.
To facilitate the identification of the segments in box ib, linked lists of segments are constructed. As shown in Fig. 2.18, the mid-point of a segment (imid(i)) is used to determine the index ib of the box to which it belongs:
$$ib = 1 + \left\lfloor\frac{imid(1)}{N_1/M}\right\rfloor + \left\lfloor\frac{imid(2)}{N_2/M}\right\rfloor M + \left\lfloor\frac{imid(3)}{N_3/M}\right\rfloor M^2 \qquad (2.23)$$
N1, N2 and N3 are the sizes of the orthorhombic simulation volume along the x, y and z axes. The array indexb(ib) stores the number of segments belonging to box ib, and the array isbox(ib, 2) stores the index of the first segment in box ib. The linked list of segments is implemented with the array isbox(is, 1 : 2): isbox(is, 1) holds the index of the segment preceding is, and isbox(is, 2) the index of the segment following is. The identification of the segments in box ib is shown schematically in Fig. 2.18. A segment can easily be added or removed by updating the entries of isbox.
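The box index of Eq. 2.23 and the doubly linked lists can be sketched as follows. In this Python sketch (hypothetical names), dictionaries play the role of the arrays indexb and isbox, coordinates are assumed to be integers in [0, N) along each axis, and the expression (x·M) // N is used, which coincides with x // (N/M) when N is a multiple of M.

```python
def box_index(imid, sizes, M):
    """Eq. 2.23, 1-based: ib = 1 + kx + ky*M + kz*M**2 for mid-point imid."""
    kx, ky, kz = ((c * M) // n for c, n in zip(imid, sizes))
    return 1 + kx + ky * M + kz * M * M

class BoxLists:
    """Per-box doubly linked lists of segments with O(1) add and remove."""

    def __init__(self, n_boxes):
        self.head = {ib: None for ib in range(1, n_boxes + 1)}  # first segment
        self.count = {ib: 0 for ib in range(1, n_boxes + 1)}    # like indexb
        self.prev, self.next, self.box_of = {}, {}, {}           # like isbox

    def add(self, seg, ib):
        """Insert segment seg at the head of the list of box ib."""
        first = self.head[ib]
        self.prev[seg], self.next[seg] = None, first
        if first is not None:
            self.prev[first] = seg
        self.head[ib] = seg
        self.count[ib] += 1
        self.box_of[seg] = ib

    def remove(self, seg):
        """Unlink segment seg from the list of its box."""
        ib = self.box_of.pop(seg)
        p, n = self.prev.pop(seg), self.next.pop(seg)
        if p is None:
            self.head[ib] = n
        else:
            self.next[p] = n
        if n is not None:
            self.prev[n] = p
        self.count[ib] -= 1

    def segments(self, ib):
        """Walk the list of box ib from its first segment."""
        seg, out = self.head[ib], []
        while seg is not None:
            out.append(seg)
            seg = self.next[seg]
        return out
```

When a segment moves to another box, `remove` followed by `add` with the new index updates both lists without scanning them.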
The number of boxes is limited, since each box must be large enough: the minimum box size is chosen so that the first neighboring boxes include the maximum free-flight distance of a segment. The criterion defining the minimum edge length a of a box is expressed by Eq. (2.24).
$$\min a \quad\text{subject to}\quad a \ge \frac{1}{\sqrt{2}}\,l_d, \qquad a \ge \frac{2}{\sqrt{6}}\,v_{\max}\,\Delta t \qquad (2.24)$$

Figure 2.18: Linked list of segments (example with four boxes ib1 to ib4 containing segments 1 to 16, with the corresponding entries of indexb).
The first condition in Eq. (2.24) states that the neighbors of a segment are located inside the first neighboring boxes; it is therefore given by the discretization length $l_d$. The second condition states that a segment is not allowed to move beyond the first neighboring boxes in one step; it is therefore a function of the maximum velocity $v_{\max}$ (see Sec. 2.3.2) and of the time step Δt. Fig. 2.19 illustrates the criterion on the minimum edge length of a box.
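Assuming that Eq. 2.24 reads $a \ge l_d/\sqrt{2}$ and $a \ge 2\,v_{\max}\Delta t/\sqrt{6}$, the minimum box edge and the resulting maximum number of boxes per side follow directly. A Python sketch with hypothetical names:

```python
import math

def min_box_edge(l_d: float, v_max: float, dt: float) -> float:
    """Smallest edge a meeting both conditions of Eq. 2.24 (as assumed here):
    a >= l_d / sqrt(2) and a >= 2 * v_max * dt / sqrt(6)."""
    return max(l_d / math.sqrt(2.0), 2.0 * v_max * dt / math.sqrt(6.0))

def max_boxes_per_side(N: float, a_min: float) -> int:
    """Largest M (at least 1) such that the box edge N / M is not below a_min."""
    return max(1, math.floor(N / a_min))
```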
Using boxes no smaller than this minimum size reduces the computing cost of the 'Segment motion' part, because only interactions within the maximum free-flight distance of a segment, i.e., within the first neighboring boxes, need to be examined instead of all the segments and facets. This reduces the number of segments and facets to be inspected without any approximation.
The internal stress acting on a segment is split into a long-distance stress ($\sigma_{LR}$), which varies rather slowly from one time step to the next, and a short-distance stress ($\sigma_{SR}$), which can fluctuate strongly within a single time step. At every simulation step, $\sigma_{SR}$ of a segment in box ib is computed from all the segments located within the L-th neighboring boxes of ib. $\sigma_{LR}$ is approximated by the stress produced at the center point of ib by all the segments outside the L-th neighboring boxes; all segments in the same box therefore share the same $\sigma_{LR}$. This approximation is valid if $\sigma_{LR}$ varies over a wavelength larger than the box size. $\sigma_{LR}$ is recomputed for all the boxes every f steps.
The parameters involved in the box method are listed in Table 2.3. The maximum number of boxes M is set by the minimum box size (Eq. (2.24)) and the size of the simulation volume. The other parameters should be chosen to balance numerical accuracy against speedup; this is the subject of the following section.
[Figure 2.19: Minimum box size (cube axes [100], [010], [001]; a screw segment on the (11-1) slip plane with slip direction [112], shown at positions (1) and (2))]
Parameter   Description
M           number of boxes along each side of the simulation volume
f           update frequency of $\sigma_{LR}$ (in time steps)
L           number of neighboring box layers used for $\sigma_{SR}$
Table 2.3: Parameters of the box method
The pseudo-code for the internal stress computation in Sec. 2.5.1 is then replaced by the following:

Internal stress computation
  DO I=1,Nsegm
    Identify the box 'ib' of the segment 'I'
    Compute the stresses σ_SR due to the segments within the short-distance boxes
    Add σ_LR(ib)
  ENDDO

Long-distance stress computation (every f steps)
  DO iz=1,M
    DO iy=1,M
      DO ix=1,M
        Compute the box index 'ib'
        Compute the long-distance stresses σ_LR(ib) at the center of 'ib'
      ENDDO
    ENDDO
  ENDDO
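The same decomposition can be checked on a toy problem. The sketch below makes several simplifying assumptions. Point sources replace segments, and a scalar 1/r kernel stands in for the segment stress tensor. It ignores periodicity, and "within the L-th neighboring boxes" is taken to mean a Chebyshev distance of at most L in box indices. All names are hypothetical. When L ≥ M − 1 every box is short-distance, so the result must equal the direct sum.

```python
import itertools
import math
from collections import defaultdict

def kernel(p, q):
    """Hypothetical scalar stand-in for the stress produced at p by a source at q."""
    r = math.dist(p, q)
    return 0.0 if r == 0.0 else 1.0 / r

def box_index(p, M, edge):
    """(ix, iy, iz) of the box holding p in a cube of side `edge` cut into M^3 boxes."""
    a = edge / M
    return tuple(min(int(c // a), M - 1) for c in p)

def is_short_range(ib, jb, L):
    """True if box jb lies within the L-th neighbouring boxes of ib."""
    return max(abs(i - j) for i, j in zip(ib, jb)) <= L

def long_range_table(sources, M, edge, L):
    """sigma_LR(ib) evaluated once at every box centre (refreshed every f steps)."""
    a = edge / M
    src_boxes = [box_index(q, M, edge) for q in sources]
    table = {}
    for ib in itertools.product(range(M), repeat=3):
        centre = tuple((i + 0.5) * a for i in ib)
        table[ib] = sum(kernel(centre, q) for q, jb in zip(sources, src_boxes)
                        if not is_short_range(ib, jb, L))
    return table

def internal_stress(points, sources, M, edge, L, lr_table):
    """sigma_SR from the (2L+1)^3 surrounding boxes plus the cached sigma_LR(ib)."""
    by_box = defaultdict(list)
    for q in sources:
        by_box[box_index(q, M, edge)].append(q)
    result = []
    for p in points:
        ib = box_index(p, M, edge)
        s = lr_table[ib]                         # same sigma_LR for the whole box
        for off in itertools.product(range(-L, L + 1), repeat=3):
            jb = tuple(i + d for i, d in zip(ib, off))
            s += sum(kernel(p, q) for q in by_box.get(jb, ()))  # absent boxes add 0
        result.append(s)
    return result
```

In a time loop, `lr_table` would be recomputed only when `step % f == 0` and reused in between. This reuse is the source of the temporal error discussed in Sec. 2.5.3, while evaluating it at the box centre is the source of the spatial error.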
Similarly, the pseudo-code for the segment motion in Sec. 2.5.1 is replaced by:
Segment motion
  DO I=1,Nsegm
    Identify the box 'ib' of the segment 'I'
    Examine the interactions with the segments and facets within the short-distance boxes
    Move the segment 'I'
  ENDDO
2.5.3 Speedup and Error
The values of M, L and f in Table 2.3 should be chosen so as to minimize errors and maximize speedup. There are two sources of error in the internal stress computation: a spatial error and a temporal error. The spatial error arises because $\sigma_{LR}$ is computed at the center point of a box and assigned to all the segments in that box. The temporal error arises because $\sigma_{LR}$ is updated only every f steps, so that the previously computed $\sigma_{LR}$ is reused for f steps.
Speedup
The speedup is defined as the ratio of the execution time of the original method to that of the box method; it measures the relative performance of the two algorithms. To obtain an analytical relation, the execution time is assumed to be proportional to the number of computations, and the $N_{segm}$ segments are assumed to be homogeneously distributed over the simulation volume, which is decomposed into $M^3$ boxes. Without the box method, the number of internal stress computations, $n_s^{orig}$, is $N_{segm}^2$¹³. With the box method, each segment interacts directly with the $(2L+1)^3 N_{segm}/M^3$ segments of its short-distance boxes, giving $(2L+1)^3 N_{segm}^2/M^3$ computations for $\sigma_{SR}$. For $\sigma_{LR}$, each of the $M^3$ box centers receives the contributions of the $\left(M^3-(2L+1)^3\right)N_{segm}/M^3$ remaining segments once every f steps, i.e. $\left(M^3-(2L+1)^3\right)N_{segm}/f$ computations per step. The speedup of the box method is then given by Eq. (2.25)¹⁴:

$$ \mathrm{Speedup} = \frac{n_s^{orig}}{n_s^{box}} = \frac{N_{segm}^2}{\dfrac{(2L+1)^3 N_{segm}^2}{M^3} + \dfrac{\left(M^3-(2L+1)^3\right) N_{segm}}{f}} \tag{2.25} $$
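Eq. (2.25) is easy to evaluate numerically, which is how curves such as the solid lines of Fig. 2.20 can be generated. The function name and the guard M ≥ 2L + 1 (needed for the long-distance term to be non-negative) are choices made for this sketch.

```python
def stress_speedup(n_segm, M, L, f):
    """Speedup of the internal stress computation, Eq. (2.25)."""
    if M < 2 * L + 1:
        raise ValueError("M must be at least 2L + 1")
    near = (2 * L + 1) ** 3                 # short-distance boxes around each box
    n_orig = n_segm ** 2                    # all pairs, N(N-3) approximated by N^2
    n_sr = near * n_segm ** 2 / M ** 3      # sigma_SR: each segment vs. its near boxes
    n_lr = (M ** 3 - near) * n_segm / f     # sigma_LR: box centres vs. far segments, every f steps
    return n_orig / (n_sr + n_lr)
```

For example, `max(range(3, 26), key=lambda M: stress_speedup(20000, M, 1, 20))` selects M = 15, for which Eq. (2.25) gives a speedup of about 61.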
Solid lines in Fig. 2.20 show Eq. (2.25) as a function of M for $N_{segm}$ = 10,000, 20,000 and 90,000, with the number of layers L set to 1 and the $\sigma_{LR}$ update frequency f set to 20. The speedup exhibits a maximum whose position depends on the number of segments. Solid dots represent data measured on a 3.0 GHz Intel Pentium 4 processor with 1 GB of memory; only the elapsed time for computing the internal stress is measured. The measured data follow the trend of Eq. (2.25) well, even though the segments are not perfectly homogeneously distributed.

¹³ Precisely, the number of computations is $N_{segm}(N_{segm}-3)$, because the segment itself and its two neighboring segments are excluded from the internal stress computation. For simplicity, it is approximated by $N_{segm}^2$.
¹⁴ Note that the equation is derived assuming that periodic boundary conditions are applied, as detailed in Sec. 2.5.4.

[Figure 2.20: Evolution of the speedup of the domain decomposition method as a function of the number of boxes (M) and the number of segments (N), for L = 1 and f = 20. Speedup (0 to 140) versus M (5 to 25): Eq. (2.25) for $N_{segm}$ = 10k, 20k and 90k, and measured values for $N_{segm}$ = 10k and 20k.]
The effect of f on the speedup is shown in Fig. 2.21(a), and that of L in Fig. 2.21(b). The optimum value of M depends on both f and L.
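This dependence can be made explicit with a short derivation. It relies on the same homogeneity assumption as Eq. (2.25) and treats M as a continuous variable. Minimizing the denominator of Eq. (2.25) with respect to M gives:

```latex
\begin{aligned}
n_s^{box}(M) &= \frac{(2L+1)^3 N_{segm}^2}{M^3} + \frac{\left(M^3-(2L+1)^3\right) N_{segm}}{f},\\
\frac{d\,n_s^{box}}{dM} &= -\frac{3(2L+1)^3 N_{segm}^2}{M^4} + \frac{3M^2 N_{segm}}{f} = 0
\quad\Longrightarrow\quad
M_{opt} = \left[(2L+1)^3\, N_{segm}\, f\right]^{1/6}.
\end{aligned}
```

For L = 1, f = 20 and $N_{segm}$ = 20,000, this gives $M_{opt} = (1.08 \times 10^7)^{1/6} \approx 14.9$, i.e. M = 15 for an integer number of boxes. Larger f or L shifts the optimum towards larger M, but only slowly, through a sixth root.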
For the segment motion, on the other hand, increasing the number of boxes always increases the speedup. Assume $N_{segm}$ segments and $N_{facets}$ facets, homogeneously distributed. Each segment only examines the segments and facets of its 27 first-neighboring boxes. The speedup in examining the interactions is then given by Eq. (2.26):

$$ \mathrm{Speedup} = \frac{n_o^{orig}}{n_o^{box}} = \frac{N_{segm}^2 + N_{segm}N_{facets}}{27\left(\dfrac{N_{segm}^2}{M^3} + \dfrac{N_{segm}N_{facets}}{M^3}\right)} = \frac{M^3}{27} \tag{2.26} $$
[Figure 2.21: Effect of the number of layers (L) and frequency (f) on the speedup of stress computations. (a) Effect of f (L = 1, $N_{segm}$ = 20,000): speedup versus M for f = 10, 20 and 30. (b) Effect of L (f = 20, $N_{segm}$ = 20,000): speedup versus M for L = 1, 2 and 3.]
Spatial error
A large box size (small M) inevitably produces a large spatial error near the ends of the box diagonal, because these points lie far from the box center, where $\sigma_{LR}$ is computed (see Fig. 2.22(a)). A small box size (large M) also produces a large spatial error (Fig. 2.22(b)), but for a different reason: the nearest boxes contributing to $\sigma_{LR}$ are too close, and the segments they contain generate highly inhomogeneous stress fields.
Fig. 2.23 shows the relative spatial error along a diagonal of the central box for several values of M. The simulation volume is a cube of edge length 16.4 µm containing 22,210 segments ($\rho \approx 2.5 \times 10^{12}$ m$^{-2}$), taken from a tensile simulation along [001]. The relative error $\varepsilon_r$ is defined by Eq. (2.27). In this equation, $\sigma^{exact}$ is computed at each point on the diagonal and $\sigma^{approx}$ at the center point of the box:
$$ \varepsilon_r = \frac{\sum_{i=1}^{6} \left|\sigma_i^{exact} - \sigma_i^{approx}\right|}{\sum_{i=1}^{6} \sigma_i^{exact}} \tag{2.27} $$
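As a small sketch of Eq. (2.27), i runs over the six independent stress components, whose ordering does not matter here. The denominator follows the equation as reconstructed above, without absolute values. The mean error of Fig. 2.24 is taken here as a plain average over sample points. The function names are hypothetical.

```python
def relative_error(sigma_exact, sigma_approx):
    """Eq. (2.27): summed absolute deviation of the six stress components,
    normalised by the sum of the exact components."""
    if len(sigma_exact) != 6 or len(sigma_approx) != 6:
        raise ValueError("expected the six independent stress components")
    num = sum(abs(e - a) for e, a in zip(sigma_exact, sigma_approx))
    return num / sum(sigma_exact)

def mean_relative_error(samples):
    """Average of Eq. (2.27) over (exact, approx) pairs, e.g. points on a diagonal."""
    return sum(relative_error(e, a) for e, a in samples) / len(samples)
```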
The figure shows that both small and large values of M increase the relative spatial error. To compare the curves, $\varepsilon_r$ is averaged and plotted in Fig. 2.24 as a function of M; the spatial error is minimal for a particular value of M.
The most effective way to minimize the spatial error is to combine the smallest box size with several layers L of short-distance boxes. This keeps the boxes contributing to $\sigma_{LR}$ at a distance, as shown in Fig. 2.22(c) for L = 3. Indeed, the mean spatial error can be reduced to 2% with L = 3, as shown in Fig. 2.24.

[Figure 2.22: Effect of the number of boxes (M) and layers (L) on the accuracy of stress computations. Panels: (a) M = 5, L = 1; (b) M = 15, L = 1; (c) M = 15, L = 3. Each panel shows the box of segment 'is' with its short-distance and long-distance stress regions.]

[Figure 2.23: Relative spatial error along a diagonal of the central box (M = number of boxes along each axis, L = 1). Relative spatial error (0 to 0.25) versus position on the diagonal (−2.5 to 2.5 µm), for M = 7, 15 and 21.]

[Figure 2.24: Mean relative spatial error of stress computation as a function of the number of boxes (M) and the number of layers (L). Mean relative spatial error (0 to 0.1) versus M (6 to 22); curves labelled L = 2 and L = 3.]
Temporal error
The temporal error is difficult to evaluate because it depends strongly on how fast the dislocation structure changes. That rate is governed both by the type of mechanical test simulated and by the time step, so f cannot easily be set a priori. To evaluate the effect of f, a simple tensile test was simulated at a constant strain rate ($\dot{\varepsilon} = 10^3$ s$^{-1}$) with 22,210 initial segments, a time step of $2 \times 10^{-10}$ s, M = 21 and L = 3. During the test, the internal stress was recorded at the center point of the simulation volume. A relative temporal error is defined as in Eq. (2.27). Here $\sigma^{exact}$ denotes the internal stress at the central point computed at every time step, and $\sigma^{approx}$ the stress obtained with $\sigma_{LR}$ updated every f steps. Fig. 2.25 shows the relative temporal error for f = 20, 40 and 60. For f = 60, the maximum observed error reaches 5%. Nevertheless, an update frequency of 60 has a negligible effect on the overall stress-strain curve, as shown in Fig. 2.26.
In conclusion, using the maximum number of boxes is favorable, even though the speedup analysis of the stress computation indicates that an optimum number of boxes exists. One reason is that a large M always reduces the cost of the segment motion, by a factor $27/M^3$. Another reason is related to the parallelization scheme, which will be detailed in Sec. 3.3.

[Figure 2.25: Effect of the frequency of long-distance stress computation on the relative temporal error (f = $\sigma_{LR}$ update frequency). Temporal error (0 to 0.06) versus step number (1200 to 1400), for f = 20, 40 and 60.]

[Figure 2.26: Stress-strain curves of simulations with the $\sigma_{LR}$ update frequency f. Stress (100 to 180 MPa) versus strain (0 to 2.0 × 10⁻⁴); curves: reference, f = 20, 40 and 60.]

[Figure 2.27: Computation of stresses under periodic boundary conditions and the box method (original and shifted boundaries)]
2.5.4 Boxes and Periodic boundary conditions
An efficient method to apply periodic boundary conditions (PBC) is presented in Sec. 2.4.1. When the simulation volume is divided into boxes and the internal stress is decomposed into long- and short-distance stresses, particular attention must be paid to the segments in the boundary boxes. As shown in Fig. 2.27, some of the boxes (especially those along the boundaries) need to account for segments inside so-called image boxes. This is needed both for the internal stress computation and for the segment motion. The coordinates of the segments in an image box are obtained by translating the coordinates of the segments in the corresponding real box. This operation requires only a simple array reference and an addition or subtraction.
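A minimal sketch of this translation follows. It assumes the same number M of boxes along each axis (Table 2.3) and an orthorhombic volume with edges (Lx, Ly, Lz). For brevity, segments are reduced to single points stored in a dict keyed by box index, which stands in for the linked lists. The function names are hypothetical.

```python
def image_box(jb, M, edges):
    """Map a neighbour box index jb, possibly outside 0..M-1, to (real_box, shift)."""
    real, shift = [], []
    for j, edge in zip(jb, edges):
        wrap = j // M                 # -1 below the volume, 0 inside, +1 above
        real.append(j - wrap * M)     # index of the real box that is imaged
        shift.append(wrap * edge)     # translation to apply to its coordinates
    return tuple(real), tuple(shift)

def image_points(jb, M, edges, points_by_box):
    """Coordinates of the points seen in image box jb: real coordinates + shift."""
    real, shift = image_box(jb, M, edges)
    return [tuple(c + s for c, s in zip(p, shift)) for p in points_by_box.get(real, [])]
```

For a box on the lower x face, for instance, the neighbour index −1 maps to box M − 1 shifted by −Lx. No copies of the segments need to be stored.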
Fig. 2.28 shows an example of the activation of a Frank-Read source in a cubic and in an orthorhombic simulation volume, and Fig. 2.29 records the corresponding number of segments. In the cubic simulation volume, self-annihilation of segments occurs and the number of segments oscillates (Fig. 2.29), whereas in the orthorhombic simulation volume the dislocation density keeps increasing. An orthorhombic simulation volume is therefore preferable, as it removes the artificial self-annihilation of dislocations caused by the periodic boundary conditions.
[Figure 2.28: Activation of a Frank-Read source in the cubic and the orthorhombic simulation volume under periodic boundary conditions. (a) The cubic and the orthorhombic simulation volumes (axes [100], [010], [001]); (b) dislocation structure seen along the (110) direction in the cubic simulation volume; (c) dislocation structure seen along the (110) direction in the orthorhombic simulation volume.]

[Figure 2.29: Change of the number of segments with respect to the simulation steps. In the case of the cubic simulation volume, self annihilation of dislocations occurs. Number of segments (0 to 1800) versus step number (2500 to 4500), for the orthorhombic and the cubic volumes.]
2.6 Computation procedure of the DDD program
The serial DDD program using the box method can be subdivided into the following tasks:
a. Initialization
b. Discretization of the segments
c. Construction of the linked lists
d. Update of the long-distance stresses every f steps
e. Computation of the short-distance stresses
f. Motion of the segments
g. Update of the external stresses
h. Saving of outputs
The initialization (a) sets parameters such as the number of time steps, the number of boxes, the material constants and the loading conditions. It also reads from external files the initial segment configuration and the geometries of the simulation box and of the internal interfaces.
Tasks 'b' to 'h' are executed in sequence at every time step. In task 'b', segments longer than a maximum length (set explicitly at initialization) are further discretized. The linked lists of segments in each box are then constructed ('c'). Long-distance stresses are computed every f steps at the center of each box ('d'), as described in Sec. 2.5.2, and short-distance stresses are computed using the linked lists of segments ('e'). Once the stresses on each segment are known, the effective stresses are computed using Eq. 2.7, and all segments are moved to their next positions after all possible interactions have been examined ('f'). The external stresses are updated according to the loading conditions ('g'). Output data such as the current stresses, strains and dislocation configurations are saved in external files ('h'). This completes one time step, and the same procedure is repeated at the next time step.
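The control flow of one run can be summarized by a small driver loop. In this sketch the task names and the dictionary of callables are hypothetical placeholders for the real routines, and initialization (task 'a') is assumed to have been done beforehand. Refreshing $\sigma_{LR}$ at step 0 is also an assumption, so that a value exists before the first reuse.

```python
def run(n_steps, f, tasks):
    """Serial time-step loop of Sec. 2.6, tasks b to h (illustrative sketch)."""
    for step in range(n_steps):
        tasks["discretize"](step)              # b: split over-long segments
        tasks["build_linked_lists"](step)      # c: rebuild the per-box lists
        if step % f == 0:
            tasks["update_long_range"](step)   # d: sigma_LR at box centres, every f steps
        tasks["short_range_stress"](step)      # e: sigma_SR from the neighbouring boxes
        tasks["move_segments"](step)           # f: effective stress, interactions, motion
        tasks["update_external_stress"](step)  # g: loading conditions
        tasks["save_outputs"](step)            # h: stresses, strains, configurations
```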
Key points
• The DDD method used in this work discretizes perfect dislocations into segments of pure edge or pure screw character, on a lattice homothetic to the FCC structure with a lattice spacing of about 10b.
• The effective stress acting on a segment is computed from the internal and applied stresses, the line tension and the Peierls stress, within linear isotropic elasticity. The displacements of dislocation loops are computed using Barnett's expressions.
• A linear relation between the effective stress and the segment velocity is used. Dislocation interactions are handled by local rules, and the cross-slip of screw segments is implemented stochastically.
• Periodic boundary conditions are applied in an orthorhombic simulation volume.
• Internal interfaces are represented either by simple facets with a given strength or through coupling with a finite element method.
• The box method is revisited to increase the computational efficiency of the DDD code. A speedup of 50 with errors below 3% is obtained in the typical case of 20,000 segments under tensile loading (L = 1, f = 20, M = 15).
Chapter 3
Parallelization of the Discrete Dislocation Dynamics method
Although the numerical efficiency of the serial DDD method has been improved by the box method (see Sec. 2.5), the code is still not efficient enough to handle a high dislocation density or dislocations interacting with thousands of precipitates. There is often a gap between the desired problem size and the available computing power, since computational demands usually exceed the performance of currently available hardware.
A parallel version of the DDD program has been developed in order to simulate the interactions between dislocations and a large number of precipitates within a reasonable time on parallel computers¹. The aim of this chapter is to present the development of the new parallel DDD program and its performance.
In Sec. 3.1, parallel computing hardware is surveyed, and the programming models and languages suited to each type of hardware are reviewed. This section explains the choice of the parallel model made in this work.
In Sec. 3.2, the hot spots of the serial DDD program are analyzed, with a focus on their data flow dependencies. Based on these dependencies, existing parallel algorithms are reviewed to help establish a parallelization strategy.
Sec. 3.3 describes the parallelization of the serial DDD code from a programming perspective. The parallelization of each hot spot of the serial DDD code is presented using pseudo-code.
An attempt to increase the performance of the new parallel DDD program is presented in Sec. 3.4, where the performance of the program is quantified and issues such as load balance are investigated.
¹ A parallel computer refers to several computers interconnected to increase computing power.
Although the DDD code used here is the edge-screw model presented in Chapter 2, the parallelization scheme is quite general and may be applied to any DDD code, or to finite difference methods with similar data dependencies.
3.1 An introduction to Supercomputing
3.1.1 Overview
The different types of parallel computer need to be reviewed before writing a parallel program, since the programming model and the programming language must be chosen according to the selected architecture.
Parallel computers are in fact a subset of supercomputers. A supercomputer is defined as a computer that performs at or near the highest operational rate currently achieved by computers, and computation on a supercomputer is often called supercomputing or high performance computing.
In the following sections, technological trends in supercomputers are reviewed using data from the top 500 list². The top 500 list compiles information on the 500 fastest supercomputers in the world³.
² See www.top500.org.
³ The list is published twice a year, in June at the International Supercomputer Conference and in November at the ACM/IEEE Supercomputing Conference. It has been compiled since 1993, when the first top 500 list was published at the International Supercomputer Conference in Mannheim.
3.1.2 Classification of hardware
All current supercomputers use multiple processors and memories. There are many ways of classifying them, according to how the processors and memories are used and how they interact. Supercomputers are usually classified:
(1) by processor type: scalar and vector processors;
(2) by memory type: shared and distributed memory.
(1) By processor type
Processor architectures can be divided into two principal types: scalar and vector processors. The main difference between them is the number of operations performed by a single instruction.
Scalar processors perform a single operation per instruction: an addition instruction (a + b), for example, adds two numbers. This type corresponds to general-purpose processors and is widely used⁴.
In vector processors, a single instruction causes the same operation to be performed on different data, so that the element-wise addition of two arrays (A(i) + B(i)) can be performed with a single instruction. Vector processors are designed for high-performance numerical computation on vectors or arrays, and are relatively expensive compared with scalar processors⁵.
Examples of hardware for each processor type are listed in Table 3.1.

Processor classification   Example
Scalar processor           Intel x86, DEC Alpha, PowerPC, IBM Power
Vector processor           Cray vector, NEC vector, Fujitsu VPP
Table 3.1: Processor classification

Vector processors are known to have excellent effective performance and to facilitate the development of parallelization algorithms. However, they are less frequently used because of their high cost and limited scalability. Fig. 3.1, which plots the share of each processor type over the past ten years, shows this trend clearly. In June 1993, vector processor architectures accounted for 66.8% of the top 500. By June 2004 that proportion had fallen to 5%, while the share of scalar processors had risen to 95%. The advantages of scalar processors are their relatively low price and excellent scalability, although their effective performance is poor. Because most supercomputers use multiple scalar processors, the terms parallel computing (parallel computer) are often used instead of supercomputing (supercomputer).
⁴ Scalar processors are further divided into two groups: CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer). The CISC group comprises the Motorola 680x0 and Intel x86 processors, whereas the DEC Alpha, PowerPC and IBM POWER processors belong to the RISC group.
⁵ Vector processors do perform parallel operations, in a way sometimes described as 'data parallel', but they are not parallel computers in the sense of many machines working together.
[Figure 3.1: A change of processor types used in major supercomputers. Share (%) of scalar and vector processors in the top 500 list, June 1993 to June 2004.]
(2) By memory type
Memory architectures can be classified into two principal types: shared and distributed memory.
In shared memory systems, memories and processors are typically all interconnected by a common bus or switching network. Each processor can access all the memories of the system and can directly load or store any shared address; in other words, data movements are transparent to the user. This provides an easy and powerful model for creating and managing a parallel program. Shared memory systems can be further grouped into UMA (Uniform Memory Access)⁶ and NUMA (Non Uniform Memory Access)⁷ systems, depending on whether the main memory is a single physical memory or only a single logical one. Schematic diagrams of the processors and memory are shown in Fig. 3.2(a) for a UMA system and in Fig. 3.2(b) for a NUMA system. The NUMA model was developed to overcome a technical limitation of UMA systems, which restricts the possible number of processors. Because a NUMA system uses physically distributed memories in several systems as a single logically shared memory, the access time to a given memory can differ depending on whether that memory is local or remote to a specific processor.
In distributed memory systems, or MPP (Massively Parallel Processor) systems, several computers, each with its own processor and memory, are interconnected by a bus or network, and processors access the distributed memories through this network. Fig. 3.3(a) shows the configuration of processors and memories in such an MPP system. In this model, parallel processing relies on explicit message passing, since the memory of each processor cannot be accessed directly by the other processors of the MPP machine. The individual processors may all be of the same type, as in a network or cluster of workstations or PCs, which can work independently or in unison. A heterogeneous network of various platforms (vector processors, parallel supercomputers, etc.) could also be assembled in principle. The IBM P690 architecture, which is used for some of the results presented in this work, consists of several UMA machines, as shown in Fig. 3.3(b)⁸.
⁶ Intel dual-CPU systems, the Compaq ES40, the Sun E10000 and the HP N-class belong to the UMA category.
⁷ Machines such as the Compaq GS320, the HP Superdome and the SGI Origin 3000 belong to the NUMA category.

[Figure 3.2: (a) Schematics of UMA systems: processors P sharing a single memory. (b) Schematics of NUMA systems: groups of processors with their own memories, joined by a logical memory interconnect.]

[Figure 3.3: (a) Schematics of MPP systems: processor-memory pairs (P, M) connected by a communication network. (b) Schematics of SMP systems: UMA nodes connected by a communication network (an SMP cluster system).]
Table 3.2 summarizes the architectures belonging to each memory class. In the early 1990s, most of the top 500 systems had a shared memory architecture. Since the late 1990s, however, the mainstream has shifted to distributed memory systems (see Fig. 3.4).

Memory classification   Type
Shared memory           UMA, NUMA
Distributed memory      MPP, clusters of PCs, clusters of UMAs
Table 3.2: Memory classification

⁸ A UMA cluster system looks similar to a NUMA system, but since memory is not shared between nodes, the user must explicitly manage data movement, as on a distributed memory system.

[Figure 3.4: The transition of the supercomputer structures for the past ten years (June 1993 to June 2004, share in %). 'Cluster' and 'MPP' belong to the distributed memory systems, and 'SMP', 'single processor', 'SIMD' and 'Constellation' belong to the shared memory systems.]
Merits and demerits of each supercomputer type
Supercomputers are thus classified by processor type and by memory type. Each type has merits and demerits arising from its architecture; they are summarized in Table 3.3.

Vector processor, shared memory: merits: ease of use, good effective performance; demerits: high cost, limited scalability
Scalar processor, shared memory: merits: ease of use; demerits: limited scalability
Scalar processor, distributed memory: merits: excellent cost/peak performance; demerits: poor effective performance
Table 3.3: Merits and demerits of each processor and memory type

A shared memory system using vector processors (top left of Table 3.3) offers excellent effective performance, and a code is relatively easy to vectorize with a compiler. On the other hand, such a system is relatively expensive and its scalability is limited.
A distributed memory system using scalar processors is relatively cheap and scales well. However, parallelizing a code for it is often said to require considerable skill, and such systems generally show poor effective performance.
3.1.3 Parallel programming models
The main goal of parallel programming is to minimize the elapsed time of a program by using several processors. Since no single programming model suits every architecture, different programming models must be adopted for the different architectures summarized in Sec. 3.1.2. This section presents the programming models used on shared and distributed memory systems.
Shared memory based
A program is made of threads, each of which consists of work (computation) and data in memory (the object of the work). A single-threaded program processes its data sequentially. The main idea of shared memory models is to create multiple threads and let each thread compute a portion of the data simultaneously. All the threads share the same address space, so data updated by other threads is easy to reference. Multi-threaded programs are therefore best suited to shared memory architectures, in which the whole memory space is shared.
This model is often called the 'fork-join' model, as shown in Fig. 3.5(a). The single-threaded program processes S1, P1, P2, P3 and S2 in sequence, where S1 and S2 are inherently sequential parts. In the multi-threaded program, the first thread forks two more threads, and the three threads process P1 to P3 in parallel. Once they have finished their work, they are joined back into the first thread. The compiler automatically parallelizes certain types of 'DO' loops; otherwise, directives can be added to tell the compiler how to divide the work. OpenMP is one such set of compiler directives and will be briefly reviewed in Sec. 3.1.4.
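The fork-join structure of Fig. 3.5(a) can be mimicked with Python's standard `concurrent.futures` module. This only illustrates the control flow, not OpenMP: CPython's global interpreter lock prevents CPU-bound threads from running truly in parallel. The function name, the chunking rule and the final summation are choices made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join(data, work, n_threads=3):
    """Fork-join sketch: S1 (split), P1..Pn in parallel threads, join, S2 (combine)."""
    # S1: sequential part executed by the master thread: split the shared data
    chunk = max(1, -(-len(data) // n_threads))          # ceiling division
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # Fork: each worker thread processes one portion; all share one address space
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partial = list(pool.map(work, parts))
    # Join: leaving the `with` block waits for every thread to finish
    # S2: sequential part executed by the master thread
    return sum(partial)
```

For instance, `fork_join(list(range(100)), lambda xs: sum(x * x for x in xs))` splits the work into three chunks and returns the same value as the serial sum of squares.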
Distributed memory based
If the address space is not shared among the nodes, processors must transmit data over an interconnecting network to access data that other processors have updated. Fig. 3.5(b) illustrates how a message-passing program runs. Each processor computes its own part and the processors communicate with each other while executing the parallelizable parts P1-P3 (S1 and S2 are inherently sequential parts). The figure shows data passing only between two adjacent processors, but in general each processor may communicate with all the other processors. Because of the communication overhead⁹, the time spent processing each of P1-P3 is generally longer in the message-passing program than in the serial program. In practice, therefore, only a modest fraction of the combined capacity of the interconnected processors is achieved¹⁰.

[Figure 3.5: Parallel programming models for the shared and distributed memory architectures (time t increasing downwards). (a) Single-thread process (S1, P1, P2, P3, S2 in sequence) and multi-thread process (S1, fork, P1, P2 and P3 in parallel, join, S2). (b) Message passing between processors: the serial program runs S1, P1, P2, P3, S2; in the parallel program, each processor runs S1, one of P1-P3 and S2, with communications between processors.]
3.1.4 Classification of parallel languages<br />
Different types of parallel computing hardware and the corresponding parallel models have been reviewed in the preceding sections: shared memory with the fork/join model, and distributed memory with the message-passing model. The choice of a parallel language is largely determined by the hardware type and the parallel model to be used. Table 3.4 lists possible programming languages for each hardware-model pair. In this section, the two main parallel programming languages, OpenMP (Open Multi-Processing) and MPI (Message Passing Interface), are outlined.
Hardware-model pair | Parallel programming languages
Shared memory, fork/join model | OpenMP, Pthreads
Distributed memory, message-passing model | MPI, PVM
Table 3.4: Parallel hardware-model pairs and corresponding languages
9 Together with work-load imbalance and synchronization, as shown in Fig. 3.16.
10 Theoretically, the computational power should increase linearly with the number of interconnected processors.
OpenMP<br />
OpenMP is a set of compiler directives and callable runtime library routines that extend the Fortran and C/C++ languages to allow the development of scalable parallel programs on shared memory machines. OpenMP gives access to the strengths of shared memory parallel computation without excessive programming effort. For example, a single loop can be parallelized simply by enclosing it in the standard directives ’!$OMP PARALLEL DO’ and ’!$OMP END PARALLEL DO’, as follows.
Serial loop:
DO I=1, 100
   C(I)=A(I)+B(I)
ENDDO
⇒ Parallel loop:
!$OMP PARALLEL DO
DO I=1, 100
   C(I)=A(I)+B(I)
ENDDO
!$OMP END PARALLEL DO
The directive ’!$OMP PARALLEL DO’ creates multiple threads (Fork), as schematically shown in Fig. 3.5(a). If four threads are created and the iterations are divided into equal contiguous blocks (static scheduling), for example, the second thread performs the additions from I = 26 to 50 in the above code. ’!$OMP END PARALLEL DO’ ends the parallel region: the threads synchronize there and execution continues on the master thread (Join), the results being already stored in the shared array C. Programming with OpenMP is relatively simple, and it shows good efficiency if most of the program execution time is spent in a single, simple ’DO’ loop. The efficiency of this type of parallelization becomes poor, however, as the data dependencies inside the loops become complex. OpenMP is also bound by the limits of the shared memory architecture, such as the number of processors and the size of memory, and it is not portable to distributed memory platforms.
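The iteration split described above can be reproduced with a small Python helper. It is a sketch assuming a static schedule without a chunk size, for which the OpenMP specification only requires chunks of approximately equal size with at most one chunk per thread; which schedule applies by default is implementation-defined, and the way the remainder iterations are distributed here is an assumption. The function name static_chunks is hypothetical.

```python
def static_chunks(first, last, nthreads):
    """Split iterations first..last (inclusive) into one contiguous block per thread.

    The first (n mod nthreads) threads get one extra iteration, mimicking a
    static schedule without chunk size (the remainder policy is an assumption).
    """
    n = last - first + 1
    q, r = divmod(n, nthreads)
    chunks, start = [], first
    for t in range(nthreads):
        size = q + (1 if t < r else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


# DO I=1, 100 shared by four threads: the second thread gets I = 26..50.
print(static_chunks(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```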
MPI<br />
MPI enables the message-passing programming model on distributed memory architectures. As described in the previous sections, each processor of a distributed memory machine holds its variables in its own local memory space. Work shared across different processors therefore requires communication, and message passing is the context in which this communication takes place. MPI is a parallel language (in practice, a standardized library) that facilitates message passing between separate processors. Some implementations of MPI are listed in Table 3.5. MPI is reviewed in more detail in Sec. 3.2.1.
Acronym | Developers
MPI/Pro | MPI Software Technology
IBM MPI | IBM product implementation for the SP and RS/6000 workstation clusters
MPICH | Argonne National Lab and Mississippi State University
UNIFY | Mississippi State University
CHIMP | Edinburgh Parallel Computing Centre
LAM | Ohio Supercomputer Center
Table 3.5: Various implementations of MPI
3.1.5 Supercomputers in France and Korea<br />
To conclude Sec. 3.1, the state of supercomputing in Korea and France as of June 2004 is summarized. Tables 3.6 and 3.7 list the TOP500 rank, machine specifications, Rmax¹¹ and Rpeak¹² (in GFlop/s) of the top five supercomputers in Korea and France, respectively. At the time of this thesis (summer 2004), Korea had 9 supercomputers in the TOP500 list and France had 16.
Rank | Site/Year | Computer (manufacturer)/processors | Rmax/Rpeak
48 | KIST/2003 | xSeries Cluster Xeon (IBM)/1024 | 3067/4915.2
113 | KISTI/2004 | xSeries Cluster Xeon (IBM)/512 | 1762/2867
115 | KISTI/2003 | pSeries 690 (IBM)/544 | 1760/3699.2
233 | SNU/2002 | Pegasus P4 Xeon cluster (self-made)/400 | 1011/1843
310 | KT/2004 | Integrity Superdome, HPlex (HP)/176 | 844/1056
Table 3.6: Top 5 supercomputers in June 2004, Korea
3.2 Towards a parallel DDD code<br />
3.2.1 Basic Steps of Parallelization<br />
When an existing serial program is parallelized, the basic steps can be summarized as follows [Aoyama & Nakano 99].
11 Maximal LINPACK performance achieved<br />
12 Theoretical peak performance
Rank | Site/Year | Computer (manufacturer)/processors | Rmax/Rpeak
28 | CEA/2001 | AlphaServer SC45 (HP)/2560 | 3980/5120
120 | TotalFinaElf/2003 | xSeries Cluster Xeon (IBM)/1024 | 1755/4915.2
124 | SG SGBI/2003 | xSeries Cluster Xeon (IBM)/968 | 1685.49/4646.4
132 | CNRS/IDRIS/2004 | eServer pSeries 690 (IBM)/384 | 1630/2611.2
149 | CNRS/IDRIS/2004 | eServer pSeries 655 (IBM)/384 | 1477/2611.2
Table 3.7: Top 5 supercomputers in June 2004, France

1. Tune the serial program
The performance of a parallel program is bounded by that of the serial program from which it is written. The first step is thus to tune the hot spots of the serial program and make it as efficient as possible.
2. Consider the outline of the parallelization<br />
The tuned serial program should be profiled to find out which part or parts consume most of the CPU time; it may be sufficient to parallelize only the most time-consuming parts. At the same time, the hardware on which the program will be parallelized must be selected.
3. Determine the strategy for the parallelization<br />
A parallel algorithm must be designed according to the chosen hardware and the data dependencies of the program. Existing strategies can be adopted if the pattern of parallelization is similar; otherwise, a new algorithm must be created. Then it must be decided which scalar variables and arrays have to be transmitted.
4. Parallelize the program<br />
The strategy chosen is then realized using an appropriate parallel language.<br />
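Why it can be sufficient to parallelize only the most time-consuming parts, and why the gain nevertheless saturates (cf. footnote 10), can be quantified with Amdahl's law, a standard result that is not derived in this thesis: if a fraction f of the serial run time is parallelized perfectly over P processors, the speedup is S = 1/((1 − f) + f/P). The Python sketch below simply evaluates this formula; it ignores the communication overhead discussed in Sec. 3.1.3, so real speedups are lower.

```python
def amdahl_speedup(f, nprocs):
    """Amdahl's law: ideal speedup when a fraction f of the run time runs on nprocs processors."""
    if not 0.0 <= f <= 1.0 or nprocs < 1:
        raise ValueError("need 0 <= f <= 1 and nprocs >= 1")
    return 1.0 / ((1.0 - f) + f / nprocs)


for p in (1, 4, 16, 64):
    print(p, round(amdahl_speedup(0.9, p), 2))
# With f = 0.9 the speedup can never exceed 1 / (1 - f) = 10, however large P is.
```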
The parallelization procedure adopted in this work is summarized below, following the basic steps listed above.
Step 1: Tune the serial program<br />
The numerical efficiency of the serial DDD code has been increased using the box method and the linked lists of segments (see Sec. 2.5). The internal stresses are approximated by the long- and short-distance stresses, whereas no approximation is made in handling the dislocation interactions. The speedup depends on the chosen parameters (Table 2.3); a speedup of 50 is attained for L = 1, M = 15 and f = 20 with Nsegm = 20,000 (Fig. 2.20).
Step 2: Consider the outline of the parallelization<br />
In this work, a parallel DDD code has been written for distributed memory machines using MPI. Although distributed memory systems with MPI are neither the most efficient nor the easiest combination, this choice has several advantages, such as popularity, portability and extensibility. As already shown in Fig. 3.4, most of the TOP500 machines are nowadays distributed memory systems, and such systems are becoming ever more popular as individual laboratories purchase parallel computers made of several PCs and workstations connected through a network.
The computation of the internal stresses and the handling of the dislocation interactions remain the most time-consuming parts.
Step 3: Determine the strategy for the parallelization<br />
Before developing a parallel algorithm suitable for the DDD method, a few characteristics of the method are summarized. First, the number of dislocation segments is not constant: segments can be created or annihilated with time. Second, the DDD method has a highly complex flow dependence, in that the movement of a segment modifies not only its own position and connections but also the surrounding dislocation configuration. This is because dislocation lines are represented as connected sets of segments, and the connections between segments often change when dislocation lines are cut.
The stress computation, in contrast, has no flow dependence. If the computation load is distributed over P processors, the elapsed time for the stress calculation is ideally reduced by a factor of P. To make full use of the box method described in the previous chapter, it is natural to distribute the stress computation over several processors box by box.
Updating the segment positions, on the other hand, has a highly complex flow dependency, as mentioned before. This dependency can be illustrated as follows. Let a(i, j, k) represent a quantity associated with the segments (e.g. the number of segments) in the box (i, j, k), indexed along the x, y and z directions. To update a(i, j, k), all the information from the first-neighbor boxes is needed, because the segment interactions must take into account all the segments in the first-neighbor boxes. In addition, any quantity inside the first-neighbor boxes may be modified by the motion of segments in the (i, j, k) box. This dependence is represented in Fig. 3.6 for a simple 2D configuration.

Figure 3.6: Dependence on neighbors: the center element a(i,j) is being computed. All of the surrounding elements, a(i−1…i+1, j−1…j+1), are used in the computation and are also modified after the center element is computed.

Special attention must therefore be paid to the handling of segment interactions, so that no boxes overlap between processors when the segment positions are updated. A specific sequence is required to avoid updating adjacent boxes on two different processors concurrently.
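The constraint can be made concrete: two boxes can be updated at the same time only if their first-neighbor blocks (3 × 3 × 3 boxes each) share no box, i.e. if their indices differ by at least three along some axis. The Python sketch below checks this condition and shows one standard way of generating such independent sets, a 27-color scheme based on the box indices modulo 3, in the spirit of the multi-color schemes used for finite difference methods. It is an illustration under these assumptions only (non-periodic boxes, hypothetical function names); the thesis itself uses the inner/boundary/corner box strategy of Sec. 3.3.5, not this coloring.

```python
from itertools import product


def neighborhoods_overlap(a, b):
    """True if the first-neighbor blocks around boxes a and b (3D index tuples) share a box."""
    return all(abs(ai - bi) <= 2 for ai, bi in zip(a, b))


def color(box):
    """27-color label: distinct boxes with the same label are >= 3 apart along some axis."""
    return tuple(i % 3 for i in box)


# Check on a non-periodic 6x6x6 array of boxes: boxes of one color never interfere.
boxes = list(product(range(6), repeat=3))
same_color_conflicts = [
    (a, b) for a in boxes for b in boxes
    if a < b and color(a) == color(b) and neighborhoods_overlap(a, b)
]
print(len(same_color_conflicts))  # 0
```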
Among the existing parallel strategies, those of molecular dynamics and of the finite difference method are of particular interest: the inter-dislocation stress computation is similar to the inter-atomic force computation, and the box method divides the simulation volume into a 3D array of boxes, which is similar to a matrix in a finite difference method.
In molecular dynamics programs, the computation of forces on atoms usually accounts for most of the CPU time. For each atom i, the total force exerted by the other atoms is computed using a double-nested loop in which both loop indices run from 1 to Natom, Natom being the total number of atoms. These loops are often parallelized, for example, by distributing the atoms among the different processors; each processor then computes the forces on its resident atoms only. This method is referred to as data decomposition.
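The data decomposition just described can be illustrated with a toy Python model. The one-dimensional pair force and the round-robin assignment of atoms to processors are assumptions made only for this sketch; the point is that each simulated processor runs the outer loop over its resident atoms only, while the inner loop still visits all Natom atoms, so that the union of the partial results reproduces the serial forces.

```python
def pair_force(xi, xj):
    """Toy 1D repulsive pair force of magnitude 1/r**2 (hypothetical)."""
    r = xi - xj
    return r / abs(r) ** 3


def forces_serial(x):
    """Serial double loop: force on every atom i from all other atoms j."""
    n = len(x)
    return [sum(pair_force(x[i], x[j]) for j in range(n) if j != i) for i in range(n)]


def forces_on_rank(x, rank, nprocs):
    """Data decomposition: outer loop restricted to the atoms resident on `rank`."""
    n = len(x)
    resident = range(rank, n, nprocs)  # round-robin ownership (assumption)
    return {i: sum(pair_force(x[i], x[j]) for j in range(n) if j != i) for i in resident}


x = [0.0, 1.0, 2.5, 4.0, 7.0]
merged = {}
for rank in range(3):  # three simulated processors
    merged.update(forces_on_rank(x, rank, 3))
print([merged[i] for i in range(len(x))] == forces_serial(x))  # True
```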
In a finite difference method, matrices often represent physical data at grid points. A parallel program breaks up these matrices and distributes the parts across the processors. This method is called domain decomposition, which simply refers to the subdivision or partitioning of a problem over a number of processors in a parallel program. Various methods, such as red-black ordering and multi-color schemes, have been proposed to deal with the data dependency between adjacent grid points; further information can be found in [Dongarra et al. 98]. The main point in domain decomposition is how to specify the order of communication among processors so that each processor receives the data it needs. The number and order of inter-processor communications are determined by the data dependencies of the problem.
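Red-black ordering can be illustrated on the classical 5-point finite difference stencil. The Python sketch below, which is not part of the DDD code, performs one red-black Gauss-Seidel sweep for the 2D Poisson equation with fixed boundary values: points with i + j even ("red") depend only on "black" neighbors and vice versa, so all points of one color can be updated in any order, or concurrently on different processors, with identical results. The grid size, source term and boundary values are arbitrary test choices. Note that the DDD box dependence of Fig. 3.6 also involves diagonal neighbors, so two colors are not enough there.

```python
import random


def red_black_sweep(u, f, h, shuffle_seed=None):
    """One red-black Gauss-Seidel sweep for -Laplace(u) = f on an n x n grid.

    u and f are lists of lists; the boundary values of u are kept fixed. If
    shuffle_seed is given, the points of each color are visited in random order.
    """
    n = len(u)
    for parity in (0, 1):  # 0: red points (i + j even), 1: black points
        points = [(i, j) for i in range(1, n - 1) for j in range(1, n - 1)
                  if (i + j) % 2 == parity]
        if shuffle_seed is not None:
            random.Random(shuffle_seed).shuffle(points)
        for i, j in points:
            # 5-point stencil: only points of the other color are read here.
            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                              + u[i][j + 1] + h * h * f[i][j])
    return u


def initial_grid(n):
    """Zero interior, top row held at 1.0 (an arbitrary test boundary condition)."""
    return [[1.0 if i == 0 else 0.0 for _ in range(n)] for i in range(n)]


n, h = 8, 1.0 / 7
f = [[1.0] * n for _ in range(n)]
a = red_black_sweep(initial_grid(n), f, h)
b = red_black_sweep(initial_grid(n), f, h, shuffle_seed=42)
print(a == b)  # True: the visiting order within a color does not matter
```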
Step 4: Parallelize the program<br />
MPI is reviewed here in more detail because it is the parallel library used in this work and because it is needed to explain the newly developed parallel code presented in Sec. 3.3.
The Message Passing Interface Forum (MPIF) was organized in 1992 to develop a standard library for writing message-passing programs. The MPIF comprised more than 40 organizations and endeavored to make the standard practical, efficient and flexible. Practically, this meant that the standard should allow convenient C and Fortran bindings and define an interface not too different from the practice of the time, e.g. the Parallel Virtual Machine (PVM). For efficiency, the standard assumes a reliable communication interface, so that users need not deal with communication failures. Flexibility is guaranteed by defining an interface that can be implemented on many vendors’ platforms without significant changes and that can be used in heterogeneous environments. The first version of the standard was published in 1994, and it was extended in 1997 (MPI-2).
The MPI standard describes the parallel operations as subroutines in Fortran and as functions in C. Only the Fortran version of MPI is presented here.
MPI contains around 192 subroutines. All of them support the parallel tasks of an MPI program, which can be summarized as i) specifying a group of processors, ii) obtaining the rank (processor ID) and iii) defining message passing between or among processors. It is not necessary to know all of these subroutines, since only about a dozen of them are frequently used to parallelize a program.
The MPI subroutines can be categorized into three main groups:
• Environment Management Subroutines<br />
This group controls the overall environment of an MPI program. It includes the initialization and finalization of the parallel environment, as well as the creation of communicators and processor groups.
A general MPI program looks as follows.
PROGRAM parallel
INCLUDE 'mpif.h'
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
! computations here ...
CALL MPI_FINALIZE(ierr)
END
Line 2 includes ’mpif.h’, which defines MPI-related parameters such as MPI_INTEGER and MPI_COMM_WORLD; every Fortran procedure that calls MPI subroutines has to include this file. Line 3 calls ’MPI_INIT’ to initialize the MPI environment. ’MPI_INIT’ must be called exactly once, before any other MPI subroutine. In Fortran, ’ierr’ is the return code of every MPI subroutine: it equals MPI_SUCCESS (0) if the call succeeded and is non-zero otherwise. The subroutine ’MPI_COMM_SIZE’ in line 4 returns the number of processors (nprocs) belonging to the communicator MPI_COMM_WORLD; nprocs is fixed by the number of processes requested when the parallel job is launched. ’MPI_COMM_WORLD’ is an identifier associated with a group of processors; it represents the group of all the processors participating in the parallel job. Each processor in a communicator has a unique rank in the range [0, nprocs-1]. The subroutine ’MPI_COMM_RANK’ in line 5 returns the rank of the calling process within the communicator. In line 6, each processor does some work on its own data, and line 7 calls ’MPI_FINALIZE’, which terminates MPI processing; no other MPI call can be made afterwards.
• Point-to-point Communication Subroutines<br />
This group specifies data exchange between two processors in a communicator. There are blocking and non-blocking communication subroutines; details are not discussed here, and the interested reader is referred to [Aoyama & Nakano 99].
As an example of the non-blocking send/receive subroutines used by the author in the parallelization of the DDD code, consider two processors that need to exchange data with each other.
IF (myrank==0) THEN
  CALL MPI_ISEND(a, 1, MPI_REAL8, 1, itag, MPI_COMM_WORLD, ireq1, ierr)
  CALL MPI_IRECV(b, 1, MPI_REAL8, 1, itag, MPI_COMM_WORLD, ireq2, ierr)
ELSEIF (myrank==1) THEN
  CALL MPI_ISEND(a, 1, MPI_REAL8, 0, itag, MPI_COMM_WORLD, ireq1, ierr)
  CALL MPI_IRECV(b, 1, MPI_REAL8, 0, itag, MPI_COMM_WORLD, ireq2, ierr)
ENDIF
CALL MPI_WAIT(ireq1, istatus, ierr)
CALL MPI_WAIT(ireq2, istatus, ierr)
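In this example, the processor of rank 0 sends the variable ’a’, a single double-precision real number (MPI_REAL8), to the processor of rank 1 in the communicator ’MPI_COMM_WORLD’; ’itag’ is the message tag, and ’ireq1’ and ’ierr’ are values returned by the subroutine (a request handle and the return code). The syntax of ’MPI_IRECV’ can be understood in a similar way: the processor of rank 1 stores the data received from rank 0 in the variable ’b’, and symmetrically rank 1 sends its own ’a’ to rank 0. Because the calls are non-blocking, ’MPI_WAIT’ must be called on both requests; after it returns, ’a’ may safely be reused and ’b’ contains the received value.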
• Collective Communication Subroutines<br />
This group allows the user to exchange data among a group of processors. It frequently happens that data held by one processor must be shared with all the processors in the communicator or, conversely, that data held by each processor must be collected on one processor. Using point-to-point communication in this case would be inefficient, given the communication latency of the network. The arguments of the subroutines in this group comprise a send and/or receive data array, its size and type, and the rank of the processor that sends or receives the data within the communicator. The subroutine MPI_BCAST, for example, has the following syntax:
CALL MPI_BCAST(buffer, count, datatype, root, MPI_COMM_WORLD, ierr)
where ’buffer’, containing ’count’ elements of type ’datatype’, is broadcast from the processor of rank ’root’ to all processors in the communicator MPI_COMM_WORLD.
In addition to the groups described above, there are subroutines for managing processor groups, defining data types and controlling input and output files. The MPI standard also provides a profiling interface and support for file management.
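Why a collective operation beats repeated point-to-point messages can be seen from the number of communication rounds. The Python sketch below computes the schedule of a binomial-tree broadcast, one common strategy for implementing a broadcast such as MPI_BCAST (the algorithm actually used is implementation-specific). In each round, every processor that already holds the data forwards it to one that does not, so all nprocs processors are reached in ceil(log2 nprocs) rounds, instead of the nprocs − 1 successive sends needed if the root sent one point-to-point message to each processor in turn. The function name is hypothetical.

```python
def binomial_bcast_schedule(nprocs, root=0):
    """List of rounds; each round is a list of (sender, receiver) rank pairs."""
    have = {0}  # ranks relative to root that already hold the data
    rounds, step = [], 1
    while step < nprocs:
        sends = [(src, src + step) for src in sorted(have) if src + step < nprocs]
        have.update(dst for _, dst in sends)
        # Convert relative ranks back to actual ranks in the communicator.
        rounds.append([((s + root) % nprocs, (d + root) % nprocs) for s, d in sends])
        step *= 2
    return rounds


for r, sends in enumerate(binomial_bcast_schedule(8), start=1):
    print("round", r, sends)
```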
3.2.2 Writing a parallel program<br />
To minimize the effort needed to parallelize the serial DDD program, the serial subroutines are kept intact or only slightly modified wherever possible. The general computation procedure of the serial code (Sec. 2.6) is recalled below, and the steps that need to be modified for parallelization are identified afterwards.
a. Initialization of parallel environments<br />
b. Discretization of the segments<br />
c. Construction of the linked-lists<br />
d. Updating of the long-distance stresses every f steps<br />
e. Computation of the short-distance stresses<br />
f. Motion of segments<br />
g. Updating the external stresses<br />
h. Saving of outputs
Step ’a’ is needed to build the parallel environment, which partitions the computation among a selected number of concurrent processors. The internal stress computation steps (’d’-’e’) need only minor modifications. Step ’f’ requires complex interactions between processors because of the flow dependencies. In the following section, the parallelization scheme is detailed from a programming perspective.
3.3 Parallelization of the serial DDD program<br />
3.3.1 Initialization of parallel environments<br />
The boxes that decompose the simulation volume (Fig. 3.7(a)) are partitioned into parallelepiped subsystems (Fig. 3.7(b)). The processors of the parallel computer are then logically arranged according to the topology of the physical subsystems, and each processor is assigned to one subsystem.

Figure 3.7: Domain decomposition of the simulation volume (a), a cubic simulation volume using the box method, into parallelepiped subsystems (b), labeled Proc 0 to Proc 3. Four processors are assumed, and each parallelepiped is allocated to one processor.

Assume that the subsystems are arranged in a 3D array of dimensions P1 × P2 × P3, so that the total number of processors required is P = P1 × P2 × P3. The vector ID of each subsystem in this Cartesian arrangement is stored in the array nid(:); nid(1), for example, lies in the range [0 : P1 − 1]. Recall that each processor is given a unique identification number (rank) p in the range [0 : P − 1]. The processor p assigned to a subsystem is given by

$$ p = \mathrm{nid}(1) + \mathrm{nid}(2)\,P_1 + \mathrm{nid}(3)\,P_1 P_2 \qquad (3.1) $$

For each processor, the six face-sharing neighbor processors are identified by the array nni(:), which is computed automatically by

$$ \mathrm{nni}(k) = \mathrm{ic}_k(1) + \mathrm{ic}_k(2)\,P_1 + \mathrm{ic}_k(3)\,P_1 P_2, \qquad k = 1, \dots, 6 \qquad (3.2) $$

nni(:) is used to identify neighbor processors for message passing. In Eq. 3.2, ic_k(i) is the vector ID of neighbor processor k along axis i. It is given by Eq. 3.3, using the array iv(:, :) defined in Table 3.8 and the MOD (modulo) operation; the torus connection between processors enforces periodic boundary conditions.

$$ \mathrm{ic}_k(i) = \left(\mathrm{nid}(i) + \mathrm{iv}(i,k) + P_i\right) \bmod P_i, \qquad i = 1, 2, 3, \quad k = 1, \dots, 6 \qquad (3.3) $$
Neighbor ID, k | 1 | 2 | 3 | 4 | 5 | 6
iv(1,k) | -1 | 1 | 0 | 0 | 0 | 0
iv(2,k) | 0 | 0 | -1 | 1 | 0 | 0
iv(3,k) | 0 | 0 | 0 | 0 | -1 | 1
Table 3.8: The relative location of each neighbor processor

If there are M boxes along each axis, the boxes are distributed as follows. Let q be the quotient and r the remainder when M is divided by the number of processors Pi along axis i, that is, M = qPi + r. Processors whose vector ID nid(i) is smaller than r are assigned q + 1 boxes, and the other processors are assigned q boxes. The total number of boxes along axis i is then (q + 1)r + q(Pi − r) = qPi + r = M, as required. This distribution method is useful when the number of boxes M is not divisible by the number of processors Pi.

Figure 3.8: Top view of 20 × 20 × 20 boxes assigned to (a) 4 × 4 × 1 and (b) 3 × 3 × 1 processors. Numbers represent processor identification.
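The box distribution rule can be written as a short Python function that returns the first and last box (1-based, as in the Fortran loops) of the processor with vector ID n along one axis. Giving the (q + 1)-box blocks to the lowest vector IDs, in ascending order, is an assumption consistent with the rule above; the function names are hypothetical.

```python
def box_range(M, P, n):
    """First and last box (1-based) along one axis for the processor with vector ID n.

    M boxes are shared by P processors: M = q*P + r, and processors with n < r
    receive q + 1 boxes while the others receive q boxes.
    """
    q, r = divmod(M, P)
    if n < r:
        first = n * (q + 1) + 1
        count = q + 1
    else:
        first = r * (q + 1) + (n - r) * q + 1
        count = q
    return first, first + count - 1


def ibs_of(M, P, nid):
    """ibs(1:6) = (first x, last x, first y, last y, first z, last z)."""
    bounds = []
    for axis in range(3):
        bounds.extend(box_range(M, P[axis], nid[axis]))
    return tuple(bounds)


print([box_range(20, 3, n) for n in range(3)])  # [(1, 7), (8, 14), (15, 20)]
print(ibs_of(20, (4, 4, 1), (2, 1, 0)))         # (11, 15, 6, 10, 1, 20)
```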
Fig. 3.8(a) shows the decomposition of 20 × 20 boxes into 4 × 4 subsystems. For simplicity, the configuration is considered in 2D, which is equivalent to 3D with P3 = 1. All the subsystems in Fig. 3.8(a) have the same number of boxes (5 × 5). In Fig. 3.8(b), on the other hand, the number of boxes is not identical in each subsystem because M = 20 is not divisible by P1 = P2 = 3 (7, 7 and 6 boxes along each axis).
The IDs of the boxes that bound the subsystem of processor p are stored in the array ibs(:): ibs(1), ibs(3) and ibs(5) store the first box number along the x, y and z directions respectively, and ibs(2), ibs(4) and ibs(6) the last box number along each direction. Processor 6 in Fig. 3.8(a), for example, is bounded by ibs(1) = 11, ibs(2) = 15, ibs(3) = 6 and ibs(4) = 10, and its neighbor processors are nni(1) = 5, nni(2) = 7, nni(3) = 2 and nni(4) = 10.
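Equations 3.1-3.3 and Table 3.8 translate almost line by line into code. The Python sketch below (function names hypothetical, vector IDs and ranks 0-based as in the text) computes the rank of a subsystem, its inverse, and the six torus neighbors nni(1:6); it reproduces the neighbors of processor 6 in Fig. 3.8(a). In Fortran, MOD of a negative argument is negative, which is why Eq. 3.3 adds Pi before taking the modulus; Python's % already returns a non-negative result here, but the term is kept to mirror the equation. With P3 = 1, the two z-neighbors are the processor itself.

```python
IV = [  # iv(i, k) of Table 3.8: rows i = x, y, z; columns k = 1..6
    [-1, 1, 0, 0, 0, 0],
    [0, 0, -1, 1, 0, 0],
    [0, 0, 0, 0, -1, 1],
]


def rank_of(nid, P):
    """Eq. 3.1: p = nid(1) + nid(2)*P1 + nid(3)*P1*P2."""
    return nid[0] + nid[1] * P[0] + nid[2] * P[0] * P[1]


def nid_of(p, P):
    """Inverse of Eq. 3.1: vector ID of the subsystem handled by rank p."""
    return (p % P[0], (p // P[0]) % P[1], p // (P[0] * P[1]))


def neighbors(nid, P):
    """Eqs. 3.2-3.3: ranks nni(1..6) of the six face-sharing neighbors (periodic)."""
    nni = []
    for k in range(6):
        ick = [(nid[i] + IV[i][k] + P[i]) % P[i] for i in range(3)]
        nni.append(rank_of(ick, P))
    return nni


P = (4, 4, 1)
print(neighbors(nid_of(6, P), P))  # [5, 7, 2, 10, 6, 6]
```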
3.3.2 Long-distance stresses computations<br />
The serial version computes the long-distance stresses as follows. The boxes situated at long distance from a given box are identified by a topological relation, and the stresses due to the segments in these long-distance boxes are computed at the center point of the given box. In the serial program, one processor thus scans all the boxes and computes their long-distance stresses.
In the parallel program, the work is divided among several processors, each processor being responsible for a fraction of the boxes. The boxes of each processor are delimited by the array ibs(1:6), and each processor computes the long-distance stresses only for the boxes of its own subsystem.
The serial and parallel versions are compared below.
Serial version
DO iz=1, M
  DO iy=1, M
    DO ix=1, M
      compute the box index ’ib’
      compute the long-distance stresses of the box ’ib’
    ENDDO
  ENDDO
ENDDO
⇒
Parallel version
DO iz=ibs(5), ibs(6)
  DO iy=ibs(3), ibs(4)
    DO ix=ibs(1), ibs(2)
      compute the box index ’ib’
      compute the long-distance stresses of the box ’ib’
    ENDDO
  ENDDO
ENDDO
The parallel version reuses most of the serial code; only the loop ranges are modified. Note that every processor holds the information on all the segments when the long-distance stresses are computed.
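That the parallel loops cover every box exactly once can be checked with a Python sketch of the loop nest above. The 1-based box index formula ib = ix + (iy − 1)M + (iz − 1)M² is an assumption made for illustration (the box numbering of the thesis is defined in Chapter 2), as are the ibs bounds of the 2 × 2 × 1 decomposition used in the example.

```python
def box_index(ix, iy, iz, M):
    """Hypothetical 1-based box number for an M x M x M array of boxes."""
    return ix + (iy - 1) * M + (iz - 1) * M * M


def boxes_of(ibs, M):
    """Boxes visited by the parallel loop nest bounded by ibs(1:6) (inclusive)."""
    return [box_index(ix, iy, iz, M)
            for iz in range(ibs[4], ibs[5] + 1)
            for iy in range(ibs[2], ibs[3] + 1)
            for ix in range(ibs[0], ibs[1] + 1)]


M = 4
ibs_by_rank = {  # 2 x 2 x 1 processors, M = 4 boxes per axis
    0: (1, 2, 1, 2, 1, 4), 1: (3, 4, 1, 2, 1, 4),
    2: (1, 2, 3, 4, 1, 4), 3: (3, 4, 3, 4, 1, 4),
}
visited = sorted(b for ibs in ibs_by_rank.values() for b in boxes_of(ibs, M))
print(visited == list(range(1, M ** 3 + 1)))  # True: each box handled by one processor
```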
3.3.3 Short-distance stresses computation<br />
The following pseudo-code explains how the short-distance stresses are computed both in the serial<br />
and in the parallel DDD code.
Serial version
DO is=1, Nsegm
  identify the box ’ib’ containing the segment ’is’
  compute the short-distance stresses due to the segments within the short-distance boxes
ENDDO
⇒
Parallel version
DO is=1, iscnt(p)
  identify the box ’ib’ containing the segment iswork(is,p)
  compute the short-distance stresses due to the segments within the short-distance boxes
ENDDO
As for the long-distance stresses, only small modifications are made to the serial code. In the serial program, all the segments (Nsegm) are processed by a single processor. In the parallel program, on the contrary, the segments are distributed among several processors, and processor p computes the stresses on its iscnt(p) segments only, whose identification numbers are listed in iswork(:,p). The construction of iscnt and iswork is discussed in Sec. 3.3.4.
Since the stress on a segment can be computed without regard to the stress on the other segments, all processors can work independently. If every processor holds the same number of segments, the elapsed time for the stress computation is reduced by a factor of P (the number of processors). Otherwise, the overall elapsed time for the stress computation is determined by the busiest processor, because the other processors have to wait until the slowest one finishes before the segments can be moved. For higher efficiency, the segments must therefore be distributed uniformly over the processors. This can be achieved by shifting the subsystem boundaries, which changes the ibs array and consequently iscnt. This load-balancing issue is addressed in Sec. 3.4.4.
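The effect of an uneven distribution can be quantified with a one-line model. Assuming that the cost of the stress computation on a processor is proportional to its number of segments iscnt(p) (an assumption of this Python sketch), the parallel efficiency of the stress phase is the balanced load, sum(iscnt)/P, divided by the load of the busiest processor. The counts of Fig. 3.9 are used below purely as numbers; the function name is hypothetical.

```python
def stress_phase_efficiency(iscnt):
    """Ideal (perfectly balanced) time divided by actual time, max over processors.

    Assumes the stress cost of processor p is proportional to iscnt[p] segments.
    """
    if not iscnt or max(iscnt) == 0:
        raise ValueError("need at least one segment")
    ideal = sum(iscnt) / len(iscnt)  # every processor would hold the average load
    actual = max(iscnt)              # all processors wait for the busiest one
    return ideal / actual


print(stress_phase_efficiency([2, 2, 5, 5]))  # 0.7
print(stress_phase_efficiency([4, 4, 3, 3]))  # rebalanced: 0.875
```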
3.3.4 Data structures for distributing and gathering segments
The processors do not work entirely independently in a parallel program. At some points, all the information must be collected on one processor, or data must be distributed to all the processors. An obvious example of gathering is writing data to external files: one processor normally takes charge of writing the files, and the data to be written are sent to that processor by the other processors.
In a parallel DDD program, segment information, including coordinates, neighbors, linked lists and the effective stress, needs to be communicated. Segments are identified by integer identification numbers. To send segment data to another processor, the list of segments to be sent must be known to both the sender and the receiver. The arrays iswork(:,:) and iscnt(:) are constructed to facilitate this process; they contain, respectively, the list of segment identification numbers and the number of segments in each processor. In Fig. 3.9, for example, four processors treat fourteen segments in a 2D configuration; the corresponding values of iswork and iscnt are:

p | 0 | 1 | 2 | 3
iscnt(p) | 2 | 2 | 5 | 5
iswork(:,p) | 1, 7 | 9, 2 | 8, 14, 3, 11, 6 | 10, 4, 13, 12, 5

Figure 3.9: List of segments (segments 1-14 distributed over processors 0-3).
The segments of processor p can be identified by scanning the boxes of its subsystem and using the linked lists indexb(ib) and isbox(ib, 2), as described in Sec. 2.5.2. The arrays iswork and iscnt can then be constructed as follows.
iscnt(p) = 0
DO iz = ibs(5), ibs(6)
  DO iy = ibs(3), ibs(4)
    DO ix = ibs(1), ibs(2)
      compute the box number ’ib’ from ix, iy and iz
      CALL Bliste(ib, isliste)
      DO is = 1, indexb(ib)
        iscnt(p) = iscnt(p) + 1
        iswork(iscnt(p), p) = isliste(is)
      ENDDO
    ENDDO
  ENDDO
ENDDO
The subroutine Bliste generates the list isliste of the segments in box ib, and indexb(ib) contains the number of segments in this box. For a given processor p, the number of segments and their list are saved in iscnt(p) and iswork(:,p), respectively. The arrays iscnt and iswork can then be shared among all the processors using the MPI_BCAST subroutine, each processor in turn acting as the root for its own entries:
DO irank=0, nprocs-1
  CALL MPI_BCAST(iscnt(irank), 1, MPI_INTEGER, irank, MPI_COMM_WORLD, ierr)
ENDDO
DO irank=0, nprocs-1
  CALL MPI_BCAST(iswork(1,irank), iscnt(irank), MPI_INTEGER, irank, MPI_COMM_WORLD, ierr)
ENDDO
with nprocs being the total number of processors.
Once all the processors in the MPI_COMM_WORLD communicator share the segment list of every processor, segment information can be gathered or distributed using these lists.
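The construction of iscnt and iswork, and their use once they are shared, can be sketched in Python. Here a dictionary box_segments mapping a box (ix, iy, iz) to its segment IDs stands in for the linked lists and the subroutine Bliste, and all ranks are processed in one loop, which mimics the state reached after the MPI_BCAST loops; these data structures and names are assumptions of the sketch. The owner map derived at the end is one possible way of using the shared lists to decide where each segment's data must be sent.

```python
def build_lists(ibs_by_rank, box_segments):
    """Return iscnt[p] and iswork[p] for every rank p.

    ibs_by_rank maps rank -> ibs(1:6) bounds; box_segments maps (ix, iy, iz)
    to the list of segment IDs in that box (stand-in for Bliste).
    """
    iscnt, iswork = {}, {}
    for p, ibs in ibs_by_rank.items():
        iswork[p] = []
        for iz in range(ibs[4], ibs[5] + 1):
            for iy in range(ibs[2], ibs[3] + 1):
                for ix in range(ibs[0], ibs[1] + 1):
                    iswork[p].extend(box_segments.get((ix, iy, iz), []))
        iscnt[p] = len(iswork[p])
    return iscnt, iswork


# Hypothetical example: two boxes along x, one processor per box.
ibs_by_rank = {0: (1, 1, 1, 1, 1, 1), 1: (2, 2, 1, 1, 1, 1)}
box_segments = {(1, 1, 1): [1, 7], (2, 1, 1): [9, 2]}
iscnt, iswork = build_lists(ibs_by_rank, box_segments)
owner = {s: p for p, segs in iswork.items() for s in segs}  # segment -> rank
print(iscnt, iswork, owner)
```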
3.3.5 Motion of segments<br />
The motion of segments induces interactions with other dislocation segments, and these interactions involve the complex dependencies shown in Fig. 3.6. The key idea in handling dislocation interactions is to avoid any overlap between the neighbor boxes of concurrently updated boxes. To this end, the boxes inside a processor p are first divided into three groups according to the topology of their neighboring boxes: inner boxes (IB), boundary boxes (BB) and corner boxes (CB) (Fig. 3.10). Note that at least three boxes are required along each axis in each subsystem to categorize the boxes into these three groups.
The inner boxes have all their neighboring boxes in the same processor. The motion of the segments in the inner boxes therefore modifies only segments located in the same processor. All the information needed to handle the dislocation interactions is stored in local memory, and the neighboring boxes of the inner boxes of adjacent processors do not overlap. As a result, the positions of the segments in the inner boxes of the different processors can be updated simultaneously and independently, without any message passing.
The boundary boxes and the corner boxes, on the other hand, lack some of their neighboring boxes in the same processor. They therefore require message passing between processors, first to obtain the segment information from their neighboring boxes and then to send back the information modified by the dislocation interactions.
Figure 3.10: Three categories of boxes in a processor p: (a) parallelepiped subsystems (numbered 0 to 15 in the x-y plane); (b) categorization of boxes. Inner boxes (IB) have all their neighbor boxes in the same processor and thus need no communication. Boundary boxes (BB) lack some neighbor boxes and need communication with one neighboring processor. Corner boxes (CB) lack neighbor boxes located in three different processors.
For the boundary boxes, all the missing neighboring boxes belong to a single adjacent processor, so message passing with that processor alone is sufficient to provide the missing information. The neighboring boxes of a corner box, however, are scattered over four different processors including its own (in a 2D configuration), which necessarily involves complex message passing.
Updating the positions of the segments is performed in the following three steps.
• In the first step, all the segments in the inner boxes of each processor are updated independently and simultaneously.
• In the second step, the segments in the boundary boxes are updated, which involves message passing with the respective neighboring processors. The boundary boxes are treated direction by direction in the order x, y, z, first on the right (+) side and then on the left (−) side of each direction (see Fig. 3.11).
• In the final step, the information of all the segments is collected into one processor (the Master processor), and the positions of the segments in the corner boxes are updated in that processor only. This procedure at least avoids complex message passing between the different processors.
Figure 3.11: Overall procedure of motion of segments: (a) inner boxes; (b) boundary boxes x+; (c) boundary boxes x−; (d) boundary boxes y+; (e) boundary boxes y−; (f) corner boxes.
The overall procedure is illustrated in Fig. 3.11, where the simulation volume is subdivided among nine processors. In the first step, the inner boxes of all the processors are updated; updated boxes are shown shaded. The boundary boxes are then updated direction by direction (x, y, then z), first on the right (plus) side and then on the left (minus) side of each direction. After all the boundary boxes have been updated, the corner boxes are treated exclusively by one processor. The details of each step and the corresponding message passing are discussed below.
Inner boxes<br />
The segments in the inner boxes can easily be identified by restricting the loops to the ranges [ibs(1) + 1, ibs(2) − 1], [ibs(3) + 1, ibs(4) − 1] and [ibs(5) + 1, ibs(6) − 1]. The same segment motion algorithm as in the serial program can then be used to update the segments in the inner boxes.
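The three categories can be checked with a short sketch. The following Python function is an illustration, not the DDD code. For a box inside the subsystem bounds ibs (0-based and inclusive here), it counts on how many axes the box lies on a subsystem face. Zero faces means an inner box, one face a boundary box, and two or more faces a corner box. The flag split_axes is a hypothetical parameter. It marks an axis that is not divided among processors, such as the z axis of a P^{1/2} × P^{1/2} × 1 array as assumed in Eq. (3.7); a face on such an axis implies no message passing.

```python
from collections import Counter
from itertools import product
from typing import Sequence, Tuple

def classify_box(idx: Tuple[int, int, int], ibs: Sequence[int],
                 split_axes: Tuple[bool, bool, bool]) -> str:
    """Return 'IB', 'BB' or 'CB' for box idx of the subsystem bounded by ibs."""
    faces = 0
    for axis in range(3):
        lo, hi = ibs[2 * axis], ibs[2 * axis + 1]
        if split_axes[axis] and idx[axis] in (lo, hi):
            faces += 1          # the box touches the subsystem face on this axis
    if faces == 0:
        return "IB"             # all neighbours in the same processor
    if faces == 1:
        return "BB"             # missing neighbours in one adjacent processor
    return "CB"                 # missing neighbours in several processors

def count_categories(ibs: Sequence[int], split_axes: Tuple[bool, bool, bool]) -> Counter:
    """Count IB/BB/CB boxes over all boxes of the subsystem."""
    ranges = [range(ibs[2 * a], ibs[2 * a + 1] + 1) for a in range(3)]
    return Counter(classify_box(idx, ibs, split_axes) for idx in product(*ranges))

counts_3d = count_categories([0, 4, 0, 4, 0, 4], (True, True, True))    # 5 x 5 x 5 boxes
counts_2d = count_categories([0, 3, 0, 3, 0, 5], (True, True, False))   # 4 x 4 x 6 boxes
```

For a 5 × 5 × 5 subsystem in a 3D array of processors, this gives 27 inner, 54 boundary and 44 corner boxes. These values equal (m − 2)³, 6(m − 2)² and 12m − 16 with m = 5.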
Special care must be taken with the labels assigned to the segments, so that newly created segments in different processors do not receive duplicate labels. Duplicate labels would cause confusion when all the segment information is gathered into one processor. A consistent new label list can be generated by sending each processor the number of new segments created in all the other processors. The renumbering procedure is shown in Fig. 3.12 for four processors. The key point is that the labels of the newly created segments are updated within the same time step in which the segments are created, and that the new segments are renumbered in ascending order of processor rank.
For label renumbering, the counter isnewcnt(p) is incremented by one whenever a new segment is created in processor p. After all the processors have finished treating the segment motion and the related interactions, this array is synchronized and used to renumber the newly created segments as follows.
DO irank = 0, nprocs-1
   call MPI_BCAST(isnewcnt(irank), 1, MPI_INTEGER, &
                  irank, MPI_COMM_WORLD, ierr)
ENDDO
! offset of processor p: number of new segments created by lower ranks
iadd = 0
DO irank = 0, p-1
   iadd = iadd + isnewcnt(irank)
ENDDO
! loop downwards so that no entry is overwritten before it has been moved
DO is = nsegm, nsol+1, -1
   isnew = is + iadd
   ! shift the segment information from 'is' to 'isnew'
ENDDO
Here nsegm is the local number of segments of each processor, and nsol is the global number of segments before the segment motion is treated. After renumbering, the new global number of segments is nsol plus the sum of the elements of isnewcnt.
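The renumbering amounts to an exclusive prefix sum of isnewcnt over the processor ranks. The following Python sketch uses no MPI, and the function name renumber is hypothetical. It assumes that fourteen segments exist before the motion (nsol = 14) and uses the counts isnewcnt = (3, 3, 2, 3) of Fig. 3.12. With these inputs, the new segments receive the global labels 15 to 25.

```python
from itertools import accumulate
from typing import Dict, List

def renumber(nsol: int, isnewcnt: List[int]) -> Dict[int, List[int]]:
    """Return, for each rank, the global labels of its newly created segments.

    Each rank first labels its own new segments nsol+1 ... nsol+isnewcnt[rank];
    after the broadcast of isnewcnt, rank p shifts them by iadd, the number of
    new segments created on ranks 0 ... p-1.
    """
    offsets = [0] + list(accumulate(isnewcnt))[:-1]   # iadd of every rank
    labels: Dict[int, List[int]] = {}
    for rank, (count, iadd) in enumerate(zip(isnewcnt, offsets)):
        local_labels = range(nsol + 1, nsol + count + 1)
        labels[rank] = [label + iadd for label in local_labels]
    return labels

isnewcnt = [3, 3, 2, 3]
new_labels = renumber(14, isnewcnt)
nsol_next = 14 + sum(isnewcnt)   # global number of segments after renumbering
```

In the Fortran loop, the shift runs downward from nsegm to nsol+1 because the segment data are moved in place to higher indices. An upward loop could overwrite entries that have not been moved yet.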
Boundary boxes<br />
Fig. 3.13 shows the sequence of message passing used to update the segment positions in the boundary boxes on the +x side. Before the information is sent, the sending and the receiving processor synchronize two arrays: ibcnt, the number of segments to be sent and received, and ibwork, the list of those segments. The arrays ibcnt and ibwork are constructed and synchronized in the same way as the arrays iscnt and iswork in Sec. 3.3.4.
Figure 3.12: Label assignment to the newly created segments, shown for four processors (Proc 0 to Proc 3) before and after synchronization, with isnewcnt(0) = 3, isnewcnt(1) = 3, isnewcnt(2) = 2 and isnewcnt(3) = 3.
The segment information (e.g. coordinates, neighbor segments and effective stresses) is packed into one-dimensional buffer arrays of integer, real and logical types.
The buffer arrays are then sent to the next processor inext and received from the previous processor iprev using the non-blocking subroutines MPI_ISEND and MPI_IRECV. Example code for the integer buffer is shown below.
! 11 integer entries per segment; istatus(MPI_STATUS_SIZE) is an integer array
call MPI_ISEND(bufsi(1), ibcnt(p)*11, MPI_INTEGER, &
               inext, itag, MPI_COMM_WORLD, ireqs, ierr)
call MPI_IRECV(bufri(1), ibcnt(iprev)*11, MPI_INTEGER, &
               iprev, itag, MPI_COMM_WORLD, ireqr, ierr)
call MPI_WAIT(ireqs, istatus, ierr)
call MPI_WAIT(ireqr, istatus, ierr)
The received buffer arrays (e.g. bufri in the above example) are then unpacked in the reverse order of the packing, using the synchronized arrays ibcnt and ibwork.
Figure 3.13: A sequence of message passing steps to update the positions in the boundary boxes located on the +x side (dark grey). Two message passing steps are involved. (a) Before updating the boundary boxes, the segment information of the leftmost column is sent to the neighboring processor in the −x direction (nni(1)), and the information of the corresponding column is received from the neighboring processor in the +x direction (nni(2)). (b) After updating the boundary boxes, the modified segment information is sent back to the processor in the +x direction (nni(2)), and the modified column is received from the processor in the −x direction (nni(1)).
Once all the necessary information from the neighboring boxes has been collected, the segment positions in the boundary boxes are updated. The segment motion in the boundary boxes also modifies the segment configuration in the received boxes. To synchronize this modification properly, the information of the received boxes is repacked and sent back to the original processor. This completes the update of the boundary boxes in the +x direction; the updates in the −x, ±y and ±z directions are carried out in the same way.
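Packing and unpacking only need to agree on the order given by the synchronized list ibwork. The Python sketch below illustrates this for the integer buffer. Plain lists stand in for bufsi and bufri, and no actual MPI call is made. The count of 11 integer entries per segment comes from ibcnt(p)*11 in the example above. The field contents and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

NINT = 11  # integer entries per segment (cf. ibcnt(p)*11 in MPI_ISEND above)

def pack_ints(ibwork: Sequence[int], seg_ints: Dict[int, Tuple[int, ...]]) -> List[int]:
    """Flatten the integer data of the segments listed in ibwork into one buffer."""
    buf: List[int] = []
    for label in ibwork:
        fields = seg_ints[label]
        if len(fields) != NINT:
            raise ValueError(f"segment {label} has {len(fields)} integer fields")
        buf.extend(fields)
    return buf

def unpack_ints(buf: Sequence[int], ibwork: Sequence[int]) -> Dict[int, Tuple[int, ...]]:
    """Inverse of pack_ints, driven by the same (synchronized) ibwork list."""
    if len(buf) != NINT * len(ibwork):          # ibcnt * 11 entries expected
        raise ValueError("buffer length does not match ibcnt")
    return {label: tuple(buf[k * NINT:(k + 1) * NINT]) for k, label in enumerate(ibwork)}

# Sender side: segments 4 and 9 lie in the column requested by the neighbour.
ibwork = [4, 9]
sender = {4: tuple(range(11)), 9: tuple(range(100, 111))}
received = unpack_ints(pack_ints(ibwork, sender), ibwork)

# The receiver modifies segment 9 while updating its boundary boxes, then sends it back.
received[9] = (999,) + received[9][1:]
sender.update(unpack_ints(pack_ints(ibwork, received), ibwork))
```

The real and logical buffers would be handled in the same way, each with its own number of entries per segment.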
Corner boxes<br />
Once all the boundary boxes are updated, the segment information of each processor is sent to one processor (the Master), and the segments in the corner boxes are treated by the Master processor only. After the motion of the segments in the corner boxes is finished, only the Master contains the final segment configuration of the current time step. Before the next time step starts, the Master therefore sends the dislocation segment information to all the processors.
Figure 3.14: The overall flow chart of the new parallel DDD code. Steps: initialization of parallel environments; discretization of the segments; linked-lists of the segments; computation of the long-distance stresses; computation of the short-distance stresses; motion of the segments (inner boxes, boundary boxes, corner boxes); update external stresses and save outputs. Communication labels: Send/Receive, Gather, Broadcast; markers (1), (2) and (3).
3.3.6 Summary and comments<br />
The overall flow chart of the new parallel DDD code is shown in Fig. 3.14. The 'Motion of segments' step is composed of three parts, corresponding to the inner, boundary and corner boxes. The message-passing operations are also indicated.
It should be noted that all the processors begin each time step with the same segment information (marked '(1)' in Fig. 3.14), although this was not stated explicitly in the previous sections. After the segment discretization, each processor computes, and thus modifies, its local segment data independently up to the 'Inner boxes' step ('(2)'). While the boundary boxes are updated, adjacent processors exchange data pairwise. Each processor then sends its local data to one processor ('Gather' in Fig. 3.14). The Master processor then updates all the information in the corner boxes and broadcasts the data to the other processors ('Broadcast' in Fig. 3.14). Hence all the processors again share the same segment information.
Consequently, the present parallel version brings no memory gain from using several processors, since every processor stores the full segment data. The parallel code could be further improved by decomposing the data space, i.e. by having each processor store only the data it actually needs. This would save memory, decrease the communication overhead and eventually increase the performance of the code.
3.4 Performance improvement
3.4.1 Measure of performance<br />
The performance of the new parallel program must be measured in terms of the gain in elapsed time. The following measure is often used:

$$\mathrm{Speedup}(P) = \frac{t_0}{t_P} \tag{3.4}$$

where $t_0$ is the elapsed time of the serial program and $t_P$ that of the parallel program using $P$ processors. The speedup indicates the practical advantage of using the parallel program instead of the serial one. $t_0$ can also be replaced by the elapsed time of the parallel program run on one processor. The speedup then measures the advantage of using several processors, because both the numerator and the denominator contain the overhead of initializing the parallel environment. This is often called the algorithmic speedup ratio. The speedup results presented in the following sections are algorithmic speedup ratios. A direct comparison between the serial and the parallel DDD programs is difficult, because they are compiled with different compilers and run on different platforms.
The efficiency of a parallel program measures how effectively the hardware is used. It is defined as the ratio of the speedup on $P$ processors to $P$, that is, $\mathrm{Speedup}(P)/P$. An efficiency close to 1 indicates excellent scalability.
3.4.2 Conditions for good performance
In the ideal case, the speedup is a linear function of the number of processors $P$, i.e. $\mathrm{Speedup}(P) = P$. This is hardly achievable in practice for three reasons: only part of an algorithm can be parallelized, communication adds overhead, and the load is unbalanced.
Suppose that a fraction $f_p$ of a serial program can be parallelized and that the remaining fraction $1 - f_p$ cannot. Assuming perfect load balancing and no communication overhead, $\mathrm{Speedup}(P)$ can be written as

$$\mathrm{Speedup}(P) = \frac{1}{(1 - f_p) + f_p/P} \tag{3.5}$$

Eq. (3.5) is plotted in Fig. 3.15 for $f_p$ = 1, 0.99, 0.9 and 0.5. The ideal case ($\mathrm{Speedup}(P) = P$) is possible only if $f_p$ equals 1. Since the speedup can never exceed $1/(1 - f_p)$, a maximum speedup of only 10 is expected for $f_p = 0.9$.
Figure 3.15: Ideal speedup of a program as a function of the number of processors (up to 50 CPUs) when only a fraction $f_p$ of the program is parallelized ($f_p$ = 1.00, 0.99, 0.90 and 0.50).
A parallel program also involves a communication overhead to send and receive data, which does not exist in a serial program. Moreover, in general situations the performance of a parallel program is worse because the load is not perfectly balanced among the different processors. The performance of general parallel programs is illustrated in Fig. 3.16, assuming that only 80% of a serial program is parallelized; the effects of the communication overhead and of load imbalance are shown as well.
The figure shows that good performance requires three conditions: good load balancing among the processors, a small communication overhead, and a large parallelizable fraction $f_p$ of the serial program. Note that the communication overhead can be decreased both by minimizing the amount of communication (a good algorithm) and by using a fast network (good hardware).
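Eqs. (3.4) and (3.5) and the definition of efficiency are easy to evaluate numerically. The Python sketch below uses illustrative function names. It reproduces the trends of Fig. 3.15, including the limit $1/(1 - f_p) = 10$ for $f_p = 0.9$.

```python
def amdahl_speedup(fp: float, p: int) -> float:
    """Eq. (3.5): speedup on p processors when a fraction fp is parallelizable,
    assuming perfect load balance and no communication overhead."""
    return 1.0 / ((1.0 - fp) + fp / p)

def efficiency(speedup: float, p: int) -> float:
    """Speedup(P) / P."""
    return speedup / p

for fp in (1.0, 0.99, 0.9, 0.5):
    s = amdahl_speedup(fp, 50)
    print(f"fp = {fp:.2f}: Speedup(50) = {s:6.2f}, efficiency = {efficiency(s, 50):.2f}")
```

With $f_p = 0.99$, for example, 50 processors give a speedup of about 33.6, which corresponds to an efficiency of about 0.67.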
3.4.3 Performance tests<br />
A simple speedup model of our algorithm is built and compared with the actual timing results. The simulation volume is assumed to be decomposed into $M \times M \times M$ boxes and shared among $P$ processors. The boxes are divided either into a 2D array of $P^{1/2} \times P^{1/2} \times 1$ subsystems or into a 3D array of $P^{1/3} \times P^{1/3} \times P^{1/3}$ subsystems. The elapsed time on a single processor is approximately the sum of the time needed for the stress computation ($t^{s}_{\mathrm{stress}}$) and the time used to update the positions ($t^{s}_{\mathrm{update}}$). Assuming that $t^{s}_{\mathrm{update}}$ is a fraction $\alpha$ of $t^{s}_{\mathrm{stress}}$ ($t^{s}_{\mathrm{update}} = \alpha\, t^{s}_{\mathrm{stress}}$), the total elapsed time $t^{s}$ is written as Eq. (3.6).

$$t^{s} = t^{s}_{\mathrm{stress}} + t^{s}_{\mathrm{update}} = (1 + \alpha)\, t^{s}_{\mathrm{stress}} \tag{3.6}$$

Figure 3.16: Load unbalance and communication overhead of general parallel programs (serial run: unparallelizable part 20, parallelizable part 80; parallel run on cpu 1 to cpu 4: parts of 20 and 20, plus load unbalance and communications).
The number of inner boxes (BI) and boundary boxes (BB) of each processor and the total number of corner boxes (BC) can be expressed in terms of $M$ and $P$. They are given by Eq. (3.7) for a 2D array and by Eq. (3.8) for a 3D array of processors. Every processor is assumed to have the same number of boxes in its subsystem.

$$\mathrm{BI} = M\left(\frac{M}{P^{1/2}} - 2\right)^{2}; \quad \mathrm{BB} = 4M\left(\frac{M}{P^{1/2}} - 2\right); \quad \mathrm{BC} = 4MP \tag{3.7}$$

$$\mathrm{BI} = \left(\frac{M}{P^{1/3}} - 2\right)^{3}; \quad \mathrm{BB} = 6\left(\frac{M}{P^{1/3}} - 2\right)^{2}; \quad \mathrm{BC} = P\left(12\,\frac{M}{P^{1/3}} - 16\right) \tag{3.8}$$

In both cases $\mathrm{BI} + \mathrm{BB} + \mathrm{BC}/P = M^{3}/P$, the number of boxes of one subsystem.
If the dislocation segments are homogeneously distributed over all the processors, the elapsed time of the stress computation on each processor ($t^{p}_{\mathrm{stress}}$) is simply $t^{s}_{\mathrm{stress}}$ divided by $P$. The elapsed time for updating the segment positions of one box is $t^{s}_{\mathrm{update}}/M^{3}$. The elapsed time of a processor ($t^{p}$) for both the stress computation and the segment motion can therefore be expressed as Eq. (3.9).

$$t^{p} = t^{p}_{\mathrm{stress}} + t^{p}_{\mathrm{update}} = \frac{t^{s}_{\mathrm{stress}}}{P} + \frac{t^{s}_{\mathrm{update}}}{M^{3}}\left(\mathrm{BI} + \mathrm{BB} + \mathrm{BC}\right) + t_c \tag{3.9}$$
The elapsed time for updating BC is included on every processor, because all the processors wait until the Master processor has finished updating the corner boxes. $t_c$ represents the time needed for message passing.
Figure 3.17: Speedup model of the algorithm (Eq. (3.9)) with $M = 21$ and $t^{s}_{\mathrm{update}} = 0.02\,t^{s}_{\mathrm{stress}}$ for a 2D array of processors (2D) and a 3D array of processors (3D). Curves: ideal case; 3D and 2D with $t_c = 0$; 3D and 2D with $t_c = 0.02\,t^{s}_{\mathrm{stress}}$ (speedup versus number of CPUs).
Fig. 3.17 plots the speedup $t^{s}/t^{p}$ obtained from Eq. (3.9) with $M = 21$, $\alpha = 0.02$ and $t_c/t^{s}_{\mathrm{stress}}$ = 0 and 0.02. For the 2D array of processors, the curve is drawn up to $P = 49$. With $M = 21$, at most 49 processors can be used in a 2D array, since each subsystem must contain at least three boxes along every coordinate axis ($21/3 = 7$ subsystems per axis, i.e. $7 \times 7 = 49$ processors).
The speedup of the algorithm depends strongly on the network speed. If the network is fast enough ($t_c \approx 0$), the speedup can be as high as 23 with 25 processors in the 2D array.
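The speedup model can be evaluated directly. The Python sketch below is an illustration, not the code used for Fig. 3.17. It implements Eqs. (3.6) to (3.9) with all times expressed in units of $t^{s}_{\mathrm{stress}}$, so alpha stands for $\alpha$ and tc_ratio for $t_c/t^{s}_{\mathrm{stress}}$. The box counts are evaluated as written, even when $M/P^{1/2}$ or $M/P^{1/3}$ is not an integer.

```python
from typing import Tuple

def box_counts(M: int, P: int, layout: str) -> Tuple[float, float, float]:
    """(BI, BB) per processor and total BC: Eq. (3.7) for '2D', Eq. (3.8) for '3D'."""
    if layout == "2D":                      # P^(1/2) x P^(1/2) x 1 processors
        m = M / P ** 0.5
        return M * (m - 2) ** 2, 4 * M * (m - 2), 4 * M * P
    if layout == "3D":                      # P^(1/3) x P^(1/3) x P^(1/3) processors
        m = M / P ** (1.0 / 3.0)
        return (m - 2) ** 3, 6 * (m - 2) ** 2, P * (12 * m - 16)
    raise ValueError(f"unknown layout {layout!r}")

def model_speedup(M: int, P: int, layout: str, alpha: float, tc_ratio: float = 0.0) -> float:
    """t^s / t^p with t^s = 1 + alpha and t^p from Eq. (3.9), in units of t^s_stress."""
    bi, bb, bc = box_counts(M, P, layout)
    t_serial = 1.0 + alpha
    t_parallel = 1.0 / P + alpha * (bi + bb + bc) / M ** 3 + tc_ratio
    return t_serial / t_parallel

for P in (4, 9, 25, 49):
    print(P, round(model_speedup(21, P, "2D", 0.02), 1),
          round(model_speedup(21, P, "2D", 0.02, tc_ratio=0.02), 1))
```

With M = 21, alpha = 0.02 and tc_ratio = 0, the 2D case gives a speedup of about 22.6 at P = 25, consistent with the value of about 23 quoted above. For P = 1, the box counts add up to $M^3$ and the speedup is exactly 1.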
The 3D array of processors seems to have an advantage over the 2D array for the same number of processors. In practice this is debatable, because a 3D array involves more message passing than a 2D array: it needs message passing along all three coordinate axes, whereas a 2D array needs it along only two. The size of each message, however, is smaller with a 3D array of processors.
Dislocation structures with 13185, 37182, 57605 and 77198 segments are extracted from a simple tensile test of a single crystal with M = 20. The execution time of 100 steps with zero applied stress is then measured, and the average elapsed time per step is obtained by dividing the execution time by 100. Fig. 3.18 shows the average elapsed time required to complete one time step using up to 36 processors in a 2D array, on an IBM p690 system with 1.7 GHz POWER4 processors¹³. Fig. 3.19 shows the speedup for each number of processors and compares the measured data with the speedup model. The measured data agree well with the model, except in the 13185-segment case. In that case, the speedup decreases for large values of P because the ratio of computation time to communication time decreases as the number of processors increases.
Figure 3.18: Elapsed time per step, in seconds, as a function of the number of processors (up to 40 CPUs) for 13185, 37182, 57605 and 77198 segments.
3.4.4 Load balancing<br />
As pointed out in Sec. 3.4.2, good load balancing is crucial to achieve high performance in a parallel computation. In many cases, DDD simulations involve highly heterogeneous dislocation structures. An example is the formation of intense slip bands in fatigue simulations, shown in Fig. 3.21 (see Sec. 4.3). Fig. 3.21 represents the worst case for load balancing, owing to the inherently heterogeneous dislocation microstructure in fatigue and to the geometry of the simulation volume.
¹³ The author would like to acknowledge the support from KISTI (Korea Institute of Science and Technology Information) under "the 5th Strategic Supercomputing Applications Support Program", with Dr. Sangmin LEE as the technical supporter. The use of the computing system of the Supercomputing Center is also greatly appreciated.
Figure 3.19: Speedup using P processors in a 2D array for 13185, 37182, 57605 and 77198 segments (on IBM p690 architecture), compared with the ideal case and with the model for $t_c = 0.015\,t^{s}_{\mathrm{stress}}$.
Figure 3.20: Efficiency using P processors in a 2D array for 13185, 37182, 57605 and 77198 segments (on IBM p690 architecture), compared with the ideal case.
Figure 3.21: An example taken from fatigue tests of a cylindrical simulation volume containing particles with a bimodal size distribution (see Sec. 4.3): (a) intense slip bands of the fatigue-tested volume containing bimodal-sized particles; (b) decomposition of the simulation volume among 3 × 3 × 1 processors (numbered 0 to 8 in the [100]-[010] plane). The load is highly unbalanced among the processors because of the highly heterogeneous dislocation microstructure.
If the simulation volume is decomposed into equal-sized subsystems as shown in Fig. 3.21(b), the number of segments differs greatly between processors, and so does the computation time. A load-balancing method is thus highly desirable to equilibrate the processor loads.
One obvious way to balance the loads better is to shift the boundaries of each subsystem (i.e. the array ibs) so that each processor has approximately the same number of segments, since the computation time is usually proportional to the number of segments. In surface grain simulations, however, the number of segments may not be a good yardstick. Some segments there are treated as virtual ones and need no internal stress computation, as detailed in Sec. 4.3. Hence the actual elapsed time of the stress computation is used as the load indicator.
The elapsed time is measured using, for example, the MPI timer function MPI_WTIME(). Load balancing is performed every fbalan steps. To minimize its overhead, the processor with the minimum elapsed time is identified one step before the load balancing, and this processor takes charge of shifting the boundaries. In this way, the overhead of load balancing is hidden within the overall stress computation.
Figure 3.22: Shifting of subsystem boundaries to balance the load among processors (initial and current boundaries; Y and Z axes shown).
The elapsed times of the stress computation of all the processors are gathered on the processor in charge. This processor sums the elapsed times of the processors lying in the same column of the processor array along the x, y and z axes, and computes the average elapsed time of the columns. The elapsed time of each column is then compared with this average, and the boundary is shifted by one box so that the column grows or shrinks. A boundary can move only until the subsystem reaches the minimum number of boxes along any axis (3 boxes).
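The text does not give the exact shifting rule. The Python sketch below is a hypothetical illustration only, treating one axis of the processor array. At each internal boundary, it compares the accumulated time of the columns on the left with their share of the total. It then moves the boundary by at most one box and never lets a column shrink below three boxes. The function and argument names are assumptions.

```python
from typing import List

MIN_BOXES = 3  # minimum number of boxes of a subsystem along any axis

def shift_boundaries(widths: List[int], col_times: List[float]) -> List[int]:
    """One hypothetical load-balancing pass along one axis.

    widths[i] is the number of boxes of column i along this axis and
    col_times[i] the summed stress-computation time of its processors.
    """
    new = list(widths)
    avg = sum(col_times) / len(col_times)
    cum = 0.0
    for i in range(len(widths) - 1):          # boundary between column i and i+1
        cum += col_times[i]
        target = (i + 1) * avg                # fair share of the columns 0..i
        if cum > target and new[i] > MIN_BOXES:
            new[i] -= 1                       # left part overloaded: shrink it
            new[i + 1] += 1
        elif cum < target and new[i + 1] > MIN_BOXES:
            new[i] += 1                       # left part underloaded: grow it
            new[i + 1] -= 1
    return new

print(shift_boundaries([7, 7, 7], [10.0, 4.0, 4.0]))
```

In this example, the first column takes 10 time units against an average of 6. It loses one box, and the boundary between the second and third columns also moves left, giving widths 6, 7 and 8. Repeating the pass every fbalan steps lets the boundaries drift progressively, in the spirit of Fig. 3.22.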
The boundary adjustment during a fatigue test is shown in Fig. 3.22. Dislocations in the top-right part of the cubic simulation volume are 'virtual', so less computation time is needed to treat them. The subsystem boundaries therefore move toward the bottom-left of the simulation volume until the subsystems contain comparable numbers of real dislocation segments.
Load balancing with parallelepiped subsystems has the following limitations. (i) All the subsystems in the same column must have the same width, so the computing load is balanced among columns, not among individual processors. (ii) Each subsystem must contain at least three boxes along each axis, so a load concentration smaller than three boxes cannot be balanced further.
During simulations, the number of segments can change dramatically; it is not unusual for an initial Frank-Read source to produce millions of segments. When the number of segments is small, the efficiency of the new parallel DDD program decreases quickly with the number of processors (Fig. 3.20). The speedup can even be reversed by using more processors (Fig. 3.19). One way to keep the efficiency high and to prevent such an inverse speedup would be to change the number of processors dynamically according to the current number of segments. This could be done, for example, by creating a new communicator of n ∈ [1, N] processors within the initial communicator of N processors.
3.4.5 Comparison of simulation results between the serial and parallel DDD code
The simulation results of a parallel program should not differ significantly from those of a serial program addressing the same problem. Slight differences may nevertheless arise, because parallelization can change the order of the computations.
In DDD simulations, the segments are moved sequentially, and two different segment orderings can result in different dislocation configurations even under the same applied stresses. Nevertheless, the overall stress-strain response and the dislocation density should be consistent between the parallel and the serial DDD programs.
Fig. 3.23(a) shows the stress-strain curves of a simple tensile test along the [001] direction of a single crystal. The curve obtained with the parallel program is consistent with that obtained with the serial program. The numbers of dislocation segments obtained with the two programs differ slightly, but the difference is negligible compared with the overall evolution of the dislocations (Fig. 3.23(b)).
3.5 Application to Stage I-II transition simulation<br />
In this section, taking advantage of the performance of the new parallel DDD code, we attempt to simulate the transition from Stage I to Stage II in the stress-strain curves of FCC single crystals subjected to uniaxial tension.
3.5.1 Stress-strain curves of FCC single crystals<br />
In general, the stress-strain curves of FCC single crystals subjected to tensile tests exhibit three distinct stages, I, II and III. Fig. 3.24 shows experimental stress-strain curves of copper crystals covering a wide range of orientations. Stage I, or 'easy glide', is a region of low linear hardening ($\theta = \partial\tau/\partial\gamma \sim G/300$) observed at the beginning of deformation. Stage II, or 'linear hardening', is a second linear region with a much greater work-hardening rate ($\theta \sim G/30$). It is followed by Stage III, or 'parabolic hardening', in which the hardening rate decreases. Fig. 3.24 also shows that the shape of the curves depends strongly on the crystal orientation. Orientations close to the $[001]$-$[\bar{1}11]$ side of the standard triangle show a short Stage I or none at all. Orientations far from the boundaries of the standard triangle show a long Stage I.
Figure 3.23: Comparison of (a) stress-strain curves (stress in MPa versus strain, up to 4 × 10⁻⁴) and (b) number of segments (versus step number, up to 3000) of the serial and the parallel DDD program in a tensile test simulation.
The slip system with the highest resolved shear stress is called the primary system. Deformation begins on this system with a low work-hardening rate (Stage I). The accumulated slip rotates the crystal orientation, and the slip direction rotates towards the tensile axis. As a result, the resolved shear stresses on the different slip systems change. When the tensile axis reaches the $[001]$-$[\bar{1}11]$ side, another slip system (the conjugate system), initially inactive, becomes activated. The interaction between the two slip systems initiates Stage II.
3.5.2 Simulation conditions<br />
The initial dislocation configuration consists of 5.65 µm-long Frank-Read sources homogeneously spread over the 12 slip systems (see Fig. 3.25(a)¹⁴). An orthorhombic simulation box is used, with axis lengths in a ratio close to 40 : 30 : 31, and periodic boundary conditions are applied along all the axes. The simulation volume is about 577 µm³, and the initial dislocation density is 8.82 × 10¹¹ m⁻² (90 sources).
¹⁴ Different colors represent dislocations on different slip systems.
Figure 3.24: Resolved shear stress/shear strain curves of copper crystals as a function of orientation ([Diehl 56]).
The material parameters of copper are used: G = 42000 MPa (shear modulus), ν = 0.324 (Poisson's ratio), b = 2.56 Å (Burgers vector magnitude), B = 10⁻⁵ Pa s (viscous drag coefficient), V/b³ = 350 (activation volume) and τIII = 32 MPa (threshold stress).
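As a quick consistency check of the set-up, the initial density follows from ρ = N·l/V. Here N = 90 sources of length l = 5.65 µm sit in V ≈ 577 µm³, and the box edges follow from the 40 : 30 : 31 ratio. This is plain arithmetic on the quoted values, not code from the DDD program.

```python
n_sources = 90
source_length_um = 5.65          # length of each Frank-Read source [micrometres]
volume_um3 = 577.0               # simulation volume [micrometres^3], "around 577"

total_length_um = n_sources * source_length_um          # 508.5 micrometres of line
rho_m2 = total_length_um / volume_um3 * 1e12            # 1 um^-2 = 1e12 m^-2
print(f"initial density ~ {rho_m2:.3e} m^-2")

# Box edges from the 40 : 30 : 31 aspect ratio and the volume.
ratio = (40, 30, 31)
scale = (volume_um3 / (ratio[0] * ratio[1] * ratio[2])) ** (1.0 / 3.0)
edges_um = [r * scale for r in ratio]                   # roughly 10 x 7.5 x 7.7 um
print("box edges [um]:", [round(e, 2) for e in edges_um])
```

This gives about 8.81 × 10¹¹ m⁻², which matches the quoted 8.82 × 10¹¹ m⁻² within the rounding of the volume.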
The initial tensile axis T is chosen as [¯14 15 25], so that the initial configuration is in single glide, close to a double-glide orientation. The resulting Schmid factors are shown in Fig. 3.25(b). The primary system is B4: (111)[¯101] and the conjugate system is C1: (¯1¯11)[011] (the notation of Schmid and Boas is recalled in Table 2.1). The simulation runs at a constant strain rate ε̇ = 100 s⁻¹. The rotation of the crystal is computed at every time step using Eq. 3.10, and the new tensile axis T′ is obtained from Eq. 3.11:
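$$d\mathbf{W} = \sum_{s=1}^{12} \frac{1}{2}\, d\gamma^{s} \left( \mathbf{m}^{s} \otimes \mathbf{n}^{s} - \mathbf{n}^{s} \otimes \mathbf{m}^{s} \right) \qquad (3.10)$$
$$\mathbf{T}' = (\mathbf{I} + d\mathbf{W})\,\mathbf{T} \qquad (3.11)$$
where dγ^s is the shear strain increment on slip system s during the time step, and m^s and n^s are the unit slip direction and the unit slip-plane normal of that system.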
This way of updating the orientation of the tensile axis reproduces experimental tensile tests in which the crossheads of the tensile machine are unconstrained, i.e. rotation of the crossheads is allowed.
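Eqs. 3.10 and 3.11 can be sketched as follows. The sketch assumes single slip on B4 with a hypothetical constant increment dγ = 10⁻³ per step. It renormalizes T after each update, because I + dW is a rotation only to first order. The tracked quantity is the ratio of the C1 and B4 Schmid factors, which is what Fig. 3.28(b) plots as τ(C1)/τ(B4). The step size and the restriction to one active system are illustration choices, not the simulation settings.

```python
import numpy as np


def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def spin_increment(systems, dgammas):
    """Eq. 3.10: dW = sum_s dgamma_s * (m_s (x) n_s - n_s (x) m_s) / 2."""
    dW = np.zeros((3, 3))
    for (n, m), dg in zip(systems, dgammas):
        n, m = unit(n), unit(m)
        dW += 0.5 * dg * (np.outer(m, n) - np.outer(n, m))
    return dW


def schmid(T, n, m):
    """Schmid factor |cos(phi) cos(lambda)| for axis T, plane normal n, direction m."""
    T, n, m = unit(T), unit(n), unit(m)
    return abs(T @ n) * abs(T @ m)


B4 = ((1, 1, 1), (-1, 0, 1))       # primary system: (plane normal, slip direction)
C1 = ((-1, -1, 1), (0, 1, 1))      # conjugate system

T = unit([-14, 15, 25])
ratio_start = schmid(T, *C1) / schmid(T, *B4)
for _ in range(30):                # 30 x 1e-3 = 3% shear, on B4 only (assumption)
    dW = spin_increment([B4], [1e-3])
    T = unit((np.eye(3) + dW) @ T)  # Eq. 3.11, then renormalize
ratio_end = schmid(T, *C1) / schmid(T, *B4)
print(f"tau(C1)/tau(B4): {ratio_start:.3f} -> {ratio_end:.3f}")
print("new axis direction:", np.round(T, 4))
```

Under these assumptions, the ratio rises from about 0.947 toward 1, which is the trend reported in Fig. 3.28(b). The exact values depend on the assumed slip history.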
Figure 3.25: Initial conditions of the simulation: (a) initial dislocation configuration (axes X = (100), Y = (010), Z = (001)); (b) initial distribution of the Schmid factors over the slip systems B4, D4, D1, C1, B5, C5, D6, A6, A2, B2, C3 and A3
3.5.3 Simulation results
The tensile stress-strain curve is plotted in Fig. 3.26(a). It shows no significant hardening up to a strain of 1.3%. The dislocation configuration at a cumulated tensile strain of about 1% is shown in Fig. 3.27. At this stage, mainly the primary system is active, so the simulation is still in Stage I.
The stress-strain curve shows insignificant or even negative hardening beyond about 0.5% cumulated strain. This appears to be related to enhanced cross slip of the primary dislocations, caused by spurious dipoles generated by the periodic boundary conditions. Indeed, the evolution of the dislocation densities plotted in Fig. 3.26(b) shows a significant density on the cross-slip system of the primary dislocations (D4). This is the case even though the Schmid factor and the observed shear strain on this cross-slip (deviate) system are negligible.
However, the rotation of the tensile axis is well reproduced in the simulation. Fig. 3.28(a) shows the rotation of the stress axis within the standard stereographic triangle: the axis rotates toward the [001] − [¯111] boundary. Consequently, the Schmid factors change, and the ratio of the conjugate to the primary resolved shear stress, τ(C1)/τ(B4), increases toward 1, as plotted in Fig. 3.28(b). Despite the spurious softening, the shear stress-strain curves of the primary and conjugate systems (see Fig. 3.29) show that the rotation of the axis decreases the hardening on the primary system, whereas the hardening is more pronounced on the conjugate system.
This simulation of a bulk copper crystal is a first attempt at a massive simulation with the new parallel DDD code. The cumulated shear strain on the primary system reaches 3%, and the number of segments at the end of the simulation is close to 43,000. More investigation is needed to understand the origin of the softening observed during Stage I. The transition to Stage II should nevertheless appear within a few more steps. This will happen once the Schmid factor on the conjugate system (C1) is high enough to multiply its dislocation density and hinder dislocation motion on the primary system (B4).
Figure 3.26: The tensile stress-strain curve and the evolution of the dislocation densities: (a) tensile stress σ11 vs. tensile strain ε11; (b) dislocation density ρ [10¹¹ m⁻²] on each of the 12 slip systems vs. ε11
Figure 3.27: The dislocation configuration at a cumulated tensile strain of about 1% (axes X = (100), Y = (010), Z = (001))
Figure 3.28: The rotation of the tensile axis and the modified Schmid factors: (a) rotation of the tensile axis from [¯14 15 25] within the standard triangle [001], [011], [¯111]; (b) ratio τ(C1)/τ(B4) vs. number of steps
Figure 3.29: The shear stress-strain curves of the primary and the conjugate system: (a) primary system, τ(B4) [MPa] vs. γ(B4); (b) conjugate system (axes labelled τ(D4) [MPa] vs. γ(D4))
Key points<br />
• Parallel models and languages depend strongly on the type of parallel computer. For shared memory machines, the fork-join model is usually applied, using OpenMP or Pthreads. The message passing model, programmed with MPI, is suited to distributed memory architectures.
• A distributed memory system and MPI have been chosen here to develop a parallel DDD code. Handling the dislocation interactions during segment motion is the most complex part of the parallelization. The approach exploits the fact that the 3D array of boxes physically dividing the cubic simulation volume is similar to a matrix in computer memory.
• The cubic simulation volume is decomposed into parallelepiped subsystems, which are mapped to processors. The internal stress computation requires only small modifications of the serial DDD code. The segment motion algorithms have been developed for the different box types, categorized into inner, boundary and corner boxes. Care is taken to avoid any overlap of neighboring boxes between processors.
• The performance of the new parallel DDD program is measured and compared to a simple speedup model. The boundaries of each parallelepiped subsystem are moved dynamically to balance the computing load among the processors. Simulation results of the parallel DDD version agree well with those of the serial version. A speedup of around 17 is obtained with 25 CPUs handling more than 30,000 segments.
• The new parallel code has been applied to simulate the Stage I-II transition of FCC single crystals under uniaxial tension. So far, the simulation has reached 3.2% cumulated shear strain and is still in Stage I. However, the rotation of the axis is well reproduced, as shown by the evolution of the Schmid factors. The softening observed in the mechanical response is attributed to an avalanche of cross-slip events, which may be induced by the periodic boundary conditions.
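The measured speedup of about 17 on 25 CPUs corresponds to a parallel efficiency of 17/25 = 0.68. The "simple speedup model" used for comparison is not specified in this summary. The sketch below therefore substitutes Amdahl's law, S = 1/((1 − p) + p/N), purely as an illustration. It inverts the law to find the parallel fraction p that would explain the measured speedup. Amdahl's law ignores communication and load imbalance, so p is only an effective value.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n processors if a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup, n):
    """Invert Amdahl's law: 1/S = 1 - p (1 - 1/n)  ->  p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


S, N = 17.0, 25
p = parallel_fraction(S, N)
print(f"efficiency = {S / N:.2f}, implied parallel fraction p = {p:.3f}")
for n in (4, 9, 16, 25, 36):
    print(n, round(amdahl_speedup(p, n), 1))
```

The measured point implies p ≈ 0.98. Under this model, adding CPUs beyond 25 would yield diminishing returns.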
Chapter 4<br />
Dislocation-precipitate interactions<br />
4.1 Image stresses due to a <strong>3D</strong> particle<br />
4.1.1 Motivations and review of the literature<br />
Image forces must be considered when studying the behavior of dislocations near a free surface. Another level of complexity arises when metals contain internal interfaces, such as voids, second-phase particles and microcracks. At a minimum, the magnitude of the interaction forces is required to understand dislocation behavior around internal interfaces. Many studies have obtained this interaction, both analytically and numerically.
Free surfaces<br />
The effect of a free surface can easily be treated in 2D by introducing mirror images so that the traction on the free surface vanishes. The problem is more complex in 3D, because it is almost impossible to find analytically the image of a finite dislocation segment that is not parallel to the free surface. This problem can be solved using either the solution of the Boussinesq problem ([Fivel et al. 96]) or the superposition principle with FEM ([Fivel et al. 98]). The main idea is to apply point forces on the free surface that cancel the tractions produced there by the stress field of the dislocation in an infinite medium. With this method, the dislocation depletion near a free surface can be computed. The dislocation structures obtained by dislocation dynamics simulation can then be compared directly with experimental observations, e.g. TEM (Transmission Electron Microscopy).
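In equation form, the superposition principle described above can be written as follows. This is the standard statement in generic notation, not a transcription of Sec. 2.4.2: σ^∞ is the infinite-medium field of the dislocations, σ^* the image (complementary) field, and S the free surface with outward normal n.
$$\boldsymbol{\sigma} = \boldsymbol{\sigma}^{\infty} + \boldsymbol{\sigma}^{*}, \qquad \boldsymbol{\sigma}^{*}\cdot\mathbf{n} = -\,\boldsymbol{\sigma}^{\infty}\cdot\mathbf{n} \ \text{ on } S \quad\Longrightarrow\quad \boldsymbol{\sigma}\cdot\mathbf{n} = \mathbf{0} \ \text{ on } S$$
The image field σ^* is a regular elastic boundary-value problem with prescribed surface tractions. This is why it can be obtained either from the Boussinesq point-force solution or by FEM. The Peach-Koehler force computed from the total field σ then includes the image force.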
The dislocation-free zone near a crack tip ([Kobashi & Ohr 80]) and the plastic zone spreading from a crack ([Vitek 75]) are other problems in which the image force acting on a dislocation is an important factor. Several analytical solutions exist for the interaction of a dislocation line with a hole or a rigid inclusion. To reduce the problem to a simple 2D case, it is generally assumed that an infinitely long dislocation line interacts with an infinitely long cylindrical inclusion, either a hole or a rigid one ([Santare & Keer 86], [Zhou & Lung 88], [Chen et al. 99]). These authors used a complex potential approach under plane strain for an isotropic medium. They solved for the elastic field satisfying the continuity of stresses and displacements at the interface. The application of these 2D solutions is rather limited to fiber-strengthened composites or microcracks with a large aspect ratio.
Particles<br />
Even for a simple shape such as a sphere, calculating the interaction force between a dislocation line and a particle is not an easy task when stress and displacement continuity must hold across the interface. The elastic problem with these rigorous boundary conditions is too complex to solve analytically. Instead, approximate solutions have been obtained for 3D particle shapes. One method uses the interaction energy between a dislocation line and a particle. This energy is assumed to equal the change in the energy density of the dislocation line caused by the particle volume. The interaction force, or image force, is then obtained by differentiating the interaction energy. This approach implies that the ratio of the image forces acting on an edge and a screw dislocation equals the ratio of the energy densities of the respective dislocation types. Using this method, an analytical equation has been obtained for the force on a screw dislocation near a cubical particle ([Melander & Persson 78]). The forces on an edge and a screw dislocation near a spherical particle have also been calculated numerically ([Nembach 83]). The long-range interaction between a screw dislocation and a spherical inclusion has been treated by assuming that a straight dislocation line lies far from the particle, so that the particle only disturbs a uniform stress field ([Weeks et al. 69], [Comninou & Dundurs 72]).
For a penetrable particle, the interaction force due to the elastic modulus mismatch of a second-phase particle is considered negligible compared with the lattice mismatch and stacking fault energy mismatch effects ([Nembach 97]). The interaction forces due to an elastic modulus mismatch, however, increase with the number of dislocations around the particles¹. The image stresses due to particles could thus be an appreciable factor in computing the energy of dislocation structures around a particle. They could also matter in phenomena involving several dislocations, such as the work hardening rate. Moreover, in dispersion-strengthened alloys at high temperature, the interaction force on a single dislocation line along the climb direction is essential for investigating high-temperature properties, for example creep threshold stresses ([Marquis & Dunand 02]).
1 This type of interaction is referred to as the paraelastic interaction ([Nembach 97]).
Figure 4.1: Computation geometries of (a) a cylindrical particle and (b) a spherical particle (axes X = (1-10), Y = (111), Z = (-1-12); in (b), glide planes at y = 0, y = 0.5Rp and y = 0.86Rp)
Scope of this section<br />
Image forces on a long, straight dislocation line near a particle are computed using the decomposition method detailed in Sec. 2.4.2. Three cases are considered: a cylindrical particle (Fig. 4.1(a)), a spherical particle (Fig. 4.1(b)) and a cubical particle. The cylindrical case can be compared with analytical solutions for 2D circular particles. Image forces along both the glide and the climb directions are considered.
4.1.2 Interaction of an edge dislocation with a circular cylindrical particle<br />
Image forces on an edge dislocation have been obtained analytically in 2D using complex potentials. The rigid particle case was solved by Santare and Keer ([Santare & Keer 86]), and the void case by Vitek ([Vitek 75]) and Chen et al. ([Chen et al. 99]).
For a rigid particle, the image force on an edge dislocation, projected along the glide direction, can be simplified as follows ([Santare & Keer 86]):
$$\frac{F}{\mu_m b^2/\left(4\pi(1-\nu)R_p\right)} = \frac{x\left(4x^4 + k^2x^2 + 2k^2x^2y^2 - 3x^2 - 2kx^2y^2\right)}{k\,(x^2+y^2)^3\,(x^2+y^2-1)} + \frac{x\left(2k^2y^4 + 2ky^2 - 4y^4 - k^2y^2 + 5y^2 - 2ky^4\right)}{k\,(x^2+y^2)^3\,(x^2+y^2-1)} \qquad (4.1)$$
where µm and ν are the shear modulus and Poisson's ratio of the matrix, and Rp is the radius of the particle. The coordinates x and y of the dislocation are normalized by Rp, and k = 3 − 4ν for plane strain.
Figure 4.2: Case of a rigid particle (the image force is normalized by µm b²/(4π(1 − ν)Rp)): (a) analytical solution [Santare & Keer 86]; (b) numerical solution (FEM/DDD). Both panels show contours of the normalized force in the (x/Rp, y/Rp) plane.
Two solutions exist for the image force around a circular void, Eq. 4.2 ([Chen et al. 99]) and Eq. 4.3 ([Vitek 75]):
$$\frac{F}{\mu_m b^2/\left(4\pi(1-\nu)R_p\right)} = \frac{-2x\left(x^6 + x^4y^2 + 4x^2y^2 - x^2y^4 + 4y^4 - 2y^2 - y^6\right)}{(x^2+y^2)^3\,(x^2+y^2-1)} \qquad (4.2)$$
$$\frac{F}{\mu_m b^2/\left(4\pi(1-\nu)R_p\right)} = \frac{-2x\left(2x^4 - x^2 + 2x^2y^2 + y^2\right)}{(x^2+y^2)^3\,(x^2+y^2-1)} \qquad (4.3)$$
Image forces are computed numerically around a cylindrical particle whose axis is [¯1¯12], parallel to the edge dislocation. The shear modulus of the cylinder is set to 10³µm for the rigid case and 10⁻³µm for the void, µm being the matrix shear modulus. The contours of the image forces acting on the edge dislocation, projected along the glide direction, are shown in Fig. 4.2 for a rigid inclusion. The forces are normalized by µm b²/(4π(1 − ν)Rp). The image force profiles are computed on three lines along [1¯10], with stand-off distances of 0, 0.5Rp and 0.87Rp from the plane passing through the center of the cylinder. They are plotted in Fig. 4.3 and compared with the analytical solutions. For the rigid particle, the numerical solutions agree well with the analytical solution (Eq. 4.1). For the void, the computed image forces fall between the two analytical solutions (Eq. 4.2 and Eq. 4.3).
Figure 4.3: Normalized image forces on an edge dislocation situated at x/Rp from the center of a circular hole or a rigid cylinder, along three lines with different stand-off distances (y = 0, 0.5Rp and 0.86Rp; Rp: radius of the cylindrical inclusion). Solid lines: analytical solutions (rigid case: [Santare & Keer 86]; hole case: [Vitek 75], [Chen et al. 99]). Points: calculated by FEM/DDD.
The 2D circular cylinder case thus confirms that image forces can be computed correctly with the coupled FEM-DDD method.
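To make the reconstructed Eqs. 4.1 and 4.3 easy to check, the sketch below evaluates them on the three lines used above. It assumes the reading of Eq. 4.1 given here, with common denominator k(x² + y²)³(x² + y² − 1), and a hypothetical Poisson's ratio ν = 0.3. Eq. 4.2 is omitted. At y = 0, Eq. 4.1 reduces to (4x² + k² − 3)/(k x³(x² − 1)) and Eq. 4.3 reduces to −2(2x² − 1)/(x³(x² − 1)), and the functions should reproduce both limits.

```python
def rigid_force(x, y, nu):
    """Normalized glide force near a rigid cylinder, Eq. 4.1 (Santare & Keer).

    x, y are in units of Rp; the result is in units of mu_m b^2 / (4 pi (1 - nu) Rp).
    """
    k = 3.0 - 4.0 * nu                      # plane-strain constant used in Eq. 4.1
    r2 = x * x + y * y
    den = k * r2**3 * (r2 - 1.0)
    num1 = 4 * x**4 + k * k * x * x + 2 * k * k * x * x * y * y - 3 * x * x - 2 * k * x * x * y * y
    num2 = 2 * k * k * y**4 + 2 * k * y * y - 4 * y**4 - k * k * y * y + 5 * y * y - 2 * k * y**4
    return x * (num1 + num2) / den


def void_force_vitek(x, y):
    """Normalized glide force near a circular hole, Eq. 4.3 (Vitek)."""
    r2 = x * x + y * y
    return -2.0 * x * (2 * x**4 - x * x + 2 * x * x * y * y + y * y) / (r2**3 * (r2 - 1.0))


if __name__ == "__main__":
    nu = 0.3                                # hypothetical value for illustration
    for y in (0.0, 0.5, 0.87):
        for x in (1.1, 1.3, 1.5, 1.7, 1.9):
            print(f"y={y:4.2f} x={x:3.1f} rigid={rigid_force(x, y, nu):8.3f} "
                  f"void={void_force_vitek(x, y):8.3f}")
```

The signs come out as expected: the rigid cylinder repels the dislocation (F > 0) and the hole attracts it (F < 0). Both forces decay as 1/x³ far from the particle.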
4.1.3 Interaction of an edge dislocation with a spherical particle<br />
No analytical solution exists for the image forces of a spherical particle. However, spherical precipitates and inclusions are common, so numerical solutions for the spherical case are useful in practice.
Before computing image forces, both the convergence and the accuracy of the numerical solutions have been addressed. Using 20-node 3D elements, it was verified that the numerical solutions converge as the number of elements increases. The accuracy of the solutions was checked using the isobands method ([Bathe 96]). A mesh of 6656 20-node elements is found to represent the high stress gradients correctly, although the 3D spherical particle problem involves rough, discontinuous point-load distributions on the elements of the particle volume.
Figure 4.4: Normalized image forces on an edge dislocation situated at x/Rp from the center of a spherical rigid particle, for y = 0, 0.5Rp and 0.86Rp (Rp: radius of the spherical particle). Solid lines: analytical solution for a cylindrical rigid inclusion [Santare & Keer 86]. Points: calculated by FEM/DDD.
Image force profiles are obtained on three lines along [1¯10], with stand-off distances of 0, 0.5Rp and 0.87Rp from the plane passing through the center of the sphere. Each profile is compared with the corresponding 2D analytical solution: Eq. 4.1 for the rigid particle (Fig. 4.4) and Eq. 4.2 for the void (Fig. 4.5).
The image force for a spherical particle is lower than for the corresponding cylindrical particle, and the difference grows with the stand-off distance of the dislocation glide plane. The difference is much more pronounced for a spherical void. It can be deduced that the interaction volume is smaller for a spherical particle than for a cylindrical one.
Figure 4.5: Normalized image forces on an edge dislocation situated at x/Rp from the center of a spherical void, for y = 0, 0.5Rp and 0.86Rp (Rp: radius of the spherical void). Solid lines: analytical solution for a cylindrical hole [Chen et al. 99]. Points: calculated by FEM/DDD.
The computed image force profiles are fitted in the form α/(x/Rp)^β. The profiles are divided into two regions: a high-gradient region (up to x = 1.4Rp) and a moderate-gradient region. The exponent β ranges from 6.42 to 8.89 in the high-gradient region and from 4.34 to 4.84 in the moderate-gradient region. This result is consistent with the argument of Comninou and Dundurs ([Comninou & Dundurs 72]). They showed that the interaction force is proportional to (x/Rp)^−4 when a straight dislocation line lies far from a spherical particle. Although they derived this result for a screw dislocation, the same argument applies to an edge dislocation.
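The two-region fit can be sketched as an ordinary least-squares line in log-log coordinates, log|F| = log α − β log(x/Rp), applied separately below and above x = 1.4Rp. The data below are synthetic, generated from assumed α and β values chosen inside the quoted ranges, and are not the FEM/DDD profiles. The function names are hypothetical.

```python
import math


def fit_power_law(xs, fs):
    """Least-squares fit of |F| = alpha / x**beta in log-log coordinates."""
    lx = [math.log(x) for x in xs]
    lf = [math.log(abs(f)) for f in fs]          # abs() so void (negative) profiles work
    n = len(lx)
    mx, mf = sum(lx) / n, sum(lf) / n
    slope = (sum((a - mx) * (b - mf) for a, b in zip(lx, lf))
             / sum((a - mx) ** 2 for a in lx))
    return math.exp(mf - slope * mx), -slope      # (alpha, beta)


def fit_two_regions(xs, fs, x_split=1.4):
    """Fit the high-gradient (x <= x_split) and moderate-gradient (x >= x_split) parts."""
    near = [(x, f) for x, f in zip(xs, fs) if x <= x_split]
    far = [(x, f) for x, f in zip(xs, fs) if x >= x_split]
    return fit_power_law(*zip(*near)), fit_power_law(*zip(*far))


# Synthetic profile (NOT thesis data): beta = 7.5 near the particle and 4.5 further
# out, joined continuously at x/Rp = 1.4.
xs = [round(1.1 + 0.05 * i, 10) for i in range(15)]   # x/Rp from 1.1 to 1.8
fs = [0.9 / x**7.5 if x <= 1.4 else 0.9 * 1.4**-3.0 / x**4.5 for x in xs]

near_fit, far_fit = fit_two_regions(xs, fs)
print("high-gradient region:     alpha=%.3f beta=%.2f" % near_fit)
print("moderate-gradient region: alpha=%.3f beta=%.2f" % far_fit)
```

Fitting in log space weights every point equally in relative terms. This suits profiles that span more than an order of magnitude between x = 1.1Rp and 1.8Rp.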
4.1.4 Interaction of an edge and a screw dislocation with a cubical particle<br />
A cubical particle in an FCC matrix with a {111} habit plane is now considered. This choice simplifies the problem, because the dislocation line lies parallel to an edge and a face of the cube. The side length 2a of the cube is set to 1.612Rp, so that the volume of the cube equals that of the spherical particle considered previously. Image forces are computed on three glide planes with stand-off distances y = 0, y = 0.5a and y = 0.87a. The shear modulus of the particle (µp) is set to twice that of the matrix, and the Poisson's ratio ν is 0.312 for both the particle and the matrix.
This configuration was also treated by Melander and Persson ([Melander & Persson 78]) using the energy density of a screw dislocation line, the image forces being obtained by differentiating the interaction energy. The image force on a screw dislocation is given by
$$F = \frac{(\mu_p - \mu_m)\,b^2}{8\pi^2 a}\left[\frac{\tan^{-1}\dfrac{Y-1}{|X-1|} - \tan^{-1}\dfrac{Y+1}{|X-1|}}{|X-1|} - \frac{\tan^{-1}\dfrac{Y-1}{|X+1|} - \tan^{-1}\dfrac{Y+1}{|X+1|}}{|X+1|}\right] \qquad (4.4)$$
where X and Y are coordinates normalized by the half side length a.
The computed image force profiles on an edge dislocation line are shown in Fig. 4.6. The image force decreases with the stand-off distance more slowly than for the spherical particle. This is because the dislocation line is parallel to one face of the cubical particle and the glide plane is normal to that face. At zero stand-off distance, the image force is 20% higher for the cubical particle than for the spherical particle. The image forces on a screw dislocation give a ratio of about 0.68 between the screw and the edge forces, which is close to 1 − ν = 0.688. However, Eq. 4.4 fits the edge dislocation case well, even though it was derived for a screw dislocation. It is not clear whether this discrepancy is due to the mesh size or to the approximations used to derive Eq. 4.4.
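Two quick checks on the cubical particle set-up can be sketched. First, the equal-volume condition (2a)³ = (4/3)πRp³ gives 2a ≈ 1.612Rp. Second, the bracket of Eq. 4.4 as reconstructed here should equal ∂I/∂X, where I(X, Y) = ∬ dx dy/r² is taken over the square cross-section |x|, |y| ≤ 1. This integral is the 1/r² screw energy density integrated over the particle section, consistent with the energy argument above. The quadrature helper, the function names and the test point (X, Y) = (1.5, 0.3) are illustration choices. Because the bracket equals +∂I/∂X, the sign of F in Eq. 4.4 should be read together with the sign convention of Fig. 4.6, which is not checked here.

```python
import math


def melander_bracket(X, Y):
    """Bracketed term of Eq. 4.4 (X, Y in units of the half side a, |X| > 1)."""
    def term(c):
        return (math.atan((Y - 1.0) / c) - math.atan((Y + 1.0) / c)) / c
    return term(abs(X - 1.0)) - term(abs(X + 1.0))


def melander_force(X, Y, mu_p, mu_m, b, a):
    """Eq. 4.4 as reconstructed: (mu_p - mu_m) b^2 / (8 pi^2 a) times the bracket."""
    return (mu_p - mu_m) * b**2 / (8.0 * math.pi**2 * a) * melander_bracket(X, Y)


def square_integral(X0, Y0, n=4000):
    """I(X0, Y0) = integral over |x|, |y| <= 1 of 1/r^2, r = distance to (X0, Y0).

    The x-integral is done exactly (arctan); the y-integral uses the midpoint rule.
    Assumes |X0| > 1 and that Y0 does not fall exactly on a midpoint node.
    """
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        c = abs(-1.0 + (i + 0.5) * h - Y0)
        total += (math.atan((1.0 - X0) / c) - math.atan((-1.0 - X0) / c)) / c
    return total * h


# Equal-volume cube: (2a)^3 = (4/3) pi Rp^3  ->  2a / Rp
side_over_rp = (4.0 * math.pi / 3.0) ** (1.0 / 3.0)

# Central finite difference of I versus the closed-form bracket at a test point.
X0, Y0, dX = 1.5, 0.3, 1e-4
dI_dX = (square_integral(X0 + dX, Y0) - square_integral(X0 - dX, Y0)) / (2.0 * dX)
print(f"2a/Rp = {side_over_rp:.4f}")
print(f"finite difference dI/dX = {dI_dX:.6f}, Eq. 4.4 bracket = {melander_bracket(X0, Y0):.6f}")
```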
Figure 4.6: Image forces on a dislocation interacting with a cubical particle, normalized by (µp − µm)b²/(4π(1 − ν)a) and plotted against x/a, for an edge dislocation at y = 0, 0.5Rp and 0.87Rp and a screw dislocation at y = 0. Solid line: analytical solution for a cubical particle [Melander & Persson 78].
The magnitude of the image force along the climb direction of an edge dislocation is given by
$$\tau_{cl} = \frac{(\mathbf{b}\cdot\boldsymbol{\sigma})\times\mathbf{t}}{|\mathbf{b}|}\cdot\frac{\mathbf{b}\times\mathbf{t}}{|\mathbf{b}\times\mathbf{t}|} \qquad (4.5)$$
where b is the Burgers vector and t is the unit dislocation line vector. Climb forces deserve attention because they affect the local climb of a dislocation around a particle. Climb forces on an edge dislocation at x = 1.1a are plotted in Fig. 4.7 as a function of the stand-off distance. They are negligible up to y = 0.5a. Even at y = 0.877a, the climb force is only about half the image force along the glide direction. The situation is quite different for the spherical particle: at y = 0.866Rp, the climb force is 2 to 3 times higher than the force along the glide direction. The configuration chosen for the cubical particle is therefore more resistant to dislocation climb.
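Eq. 4.5 is the Peach-Koehler force (σ·b) × t projected on the climb direction b × t and divided by |b|. The sketch below evaluates it together with the glide component along t × n, where n = (b × t)/|b × t| is the glide-plane normal. The test case is an edge dislocation with b along x and t along z. The stress values are arbitrary illustration numbers, and the function names are hypothetical.

```python
import numpy as np


def peach_koehler(sigma, b, t):
    """Peach-Koehler force per unit length, F = (sigma . b) x t (sigma symmetric)."""
    return np.cross(sigma @ b, t)


def glide_and_climb(sigma, b, t):
    """Return (glide, climb) stresses for a dislocation with Burgers vector b and line t.

    climb follows Eq. 4.5: F . (b x t)/|b x t| / |b|; glide uses the in-plane
    direction t x n, where n = (b x t)/|b x t| is the glide-plane normal.
    Only meaningful for non-screw segments (b x t != 0).
    """
    b = np.asarray(b, dtype=float)
    t = np.asarray(t, dtype=float)
    t = t / np.linalg.norm(t)
    n = np.cross(b, t)
    n = n / np.linalg.norm(n)
    F = peach_koehler(np.asarray(sigma, dtype=float), b, t)
    bnorm = np.linalg.norm(b)
    glide = F @ np.cross(t, n) / bnorm
    climb = F @ n / bnorm
    return glide, climb


sigma = np.array([[50.0, 20.0, 0.0],
                  [20.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])      # MPa, arbitrary illustration values
print(glide_and_climb(sigma, b=[2.56e-10, 0.0, 0.0], t=[0.0, 0.0, 1.0]))
```

For this edge geometry, the shear component σxy drives glide and the normal component σxx drives climb. This is why the climb profiles of Fig. 4.7 depend on stress components that the glide profiles do not see.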
4.1.5 Discussion<br />
The interaction force between a dislocation line and circular cylindrical, spherical and cubical particles with differing elastic moduli was computed using the superposition principle. The complementary problem was solved with the FEM-DDD coupling code. The case of a long cylindrical inhomogeneity has attracted significant research interest. For a long edge dislocation line close to a long cylindrical particle, the numerically computed image force was compared with analytical solutions. This comparison showed that the superposition method gives good accuracy.
Figure 4.7: Climb forces on an edge dislocation at x = 1.1Rp (x = 1.1a for the cube) as a function of the stand-off distance y/Rp, normalized by µm b²/(4π(1 − ν)Rp), for a rigid sphere, a spherical void and a cube
The same scheme was applied to a spherical particle interacting with a dislocation line, a shape more commonly observed for inhomogeneities in metals. For a dislocation at x = 1.1Rp, the image force along the glide direction is 0.89 times the cylindrical value for the rigid particle and 0.58 times for the hole. The interaction volume involved in the spherical case is thus much smaller than in the cylindrical case. Considerable forces are also found along the climb direction. Around a rigid particle, the climb force on an edge dislocation acts so as to reduce the extra half-plane, and around a void it acts in the opposite direction. In both cases, it increases with the stand-off distance of the glide plane. For a dislocation line parallel to one face and one edge of a cubical particle, the image force is 20% higher than for the spherical particle and the climb force is negligible. A cubical particle in this configuration is therefore more resistant to dislocation climb.
4.2 A simple case of dislocation-particle interaction<br />
4.2.1 Motivation and review of the literature
The hardening of materials by dispersing small particles of another phase is a well-known phenomenon and has been used to develop high-strength structural materials. The impediment of dislocation glide by second-phase particles is the basic mechanism of the increase in flow stress. For impenetrable particles, the back stress exerted by trapped closed loops causes the subsequent increase in stress, i.e. work hardening. Besides the glissile closed loops left around the particles, prismatic loops have also been observed experimentally to form near a particle by cross slip of dislocations ([Humphreys & Martin 67]). The particles in two-phase materials can be classified by their size, shape, volume fraction, spatial distribution (regular or random) and the character of the particle/matrix interface (coherent or incoherent). In addition to these morphological parameters, the stress fields around the particles are important factors in describing the mechanical properties of two-phase alloys. These arise from a difference in lattice constant, from the stress incompatibility due to a difference in shear modulus, and from the image stress caused by the change in strain energy of a dislocation near a second-phase particle.
A number of experimental and analytical studies have focused on identifying the relevant parameters and their effects on the flow stress and work hardening of these alloys. For example, well-designed experiments have been carried out on single crystals containing hard particles, in which the volume fraction, size and spacing of the particles could be varied to measure their effect on mechanical behavior ([Ebeling & Ashby 66], [Humphreys & Martin 67]).
Theoretical approaches to these parameters range from the Orowan mechanism to methods dealing with material anisotropy and complicated statistics, such as a random distribution of particles. The Orowan mechanism considers the elementary dislocation-particle interaction and relates the flow stress to the obstacle spacing. An overview of the analytical approaches can be found in the review article by Reppich ([Reppich 93]). Computer simulations have also been developed to account fully for realistic interactions between dislocations and many particles. Foreman and Makin ([Foreman & Makin 66]) investigated the effect of a random arrangement of strong and weak point obstacles on the flow stress. More recent simulations treat more complex and accurate situations. Examples include distributions of finite-size particles with shape effects ([Zhu & Starke 99]) and particles with a misfit stress field due to a difference in lattice constant ([Mohles & Nembach 01]). These simulations give further insight into the effect of each parameter on the flow stress. Note that the simulations mentioned above generally assumed that the dislocation moves on a single glide plane, so that no 3D events such as cross slip were allowed.
Scope of this section
In this work, the effect of the image stress on the flow stress and work hardening has been studied using 3D dislocation dynamics simulations coupled to a finite element code. The simulation method is detailed in section 4.2.2. The glide of a dislocation line through a channel between two incoherent, impenetrable spherical particles is considered, and the dynamic interaction of the dislocation with the particles, together with the resulting image stresses, is solved in 3D. Several dislocations are forced, one after another, to glide between the two particles, and the resolved shear stress needed to bypass particles surrounded by trapped loops is monitored, with cross slip accounted for. By varying the shear modulus of the particles while keeping the other parameters, such as the particle radius and the inter-particle spacing, fixed, the effect of the image stress induced by second-phase particles is evaluated and the relation between the flow stress and the difference in shear modulus is established.
4.2.2 Calculation procedures
The situation considered here consists of two impenetrable rigid spherical particles of radius Rp and shear modulus Gp, separated by an inter-particle distance L. A typical 3D simulation box is shown in figure 4.8(a). In the dislocation code, the spherical particles are modelled as polyhedra made of planar facets. Each facet has a prescribed strength and acts as an obstacle to dislocation motion, i.e. a dislocation is allowed to cross a facet only if the local effective resolved shear stress exceeds the particle strength. In this study, the particle strength has been chosen high enough for the particles to behave as impenetrable hard obstacles. In this case, a simplified version of the code in which no image stresses are computed can be used, and the interaction between dislocations and particles reduces to a pure obstacle effect. This situation is referred to below as the ∆G = 0 case. To include the effect of the image forces, the complementary problem described in figure 2.16 must be solved. To do so, a 3D box containing two spherical particles has been meshed with the code CAST3M (2). To capture the steep stress gradients near the particles, the mesh is refined around their periphery. A (111) section of the 3D mesh through the particle centres is shown in figure 4.8(b). The displacement of the bottom surface in the direction normal to that surface is set to zero, and two nodes on this surface are fixed in appropriate directions to remove the rigid-body modes.
2 Finite element code developed by the Commissariat à l’Energie Atomique, CEA-DRN/DMT/SEMT
Figure 4.8: (a) Simulation box used in the 3D discrete dislocation simulation. Two particles of radius Rp separated by an inter-particle distance L are shown with a dislocation line on the (111)[1-10] slip system. (b) Typical mesh of the simulation box, built from 4672 20-node 3D elements in the finite element code CAST3M. The mesh is sectioned on the (111) plane containing the particle centres.
A dislocation line, initially a pure screw segment, is pinned at its two end points. The pinning points are placed at a distance of 0.9 × L/2 from the particle surface, so that the portion of the line lying between the particles bypasses them by bowing out. Section 4.2.3 shows that this choice of pinning points gives reliable flow stresses.
Single slip loading on the (111)[1-10] system is assumed. The resolved shear stress τ on the slip plane is increased step by step. After each load increment, the new positions of all segments are computed as a function of time until the shear strain increment ∆γ produced by the dislocation motion falls below a preselected value. When the dislocation line has reached an equilibrium position (∆γ ≈ 0), τ is increased again and ∆γ is monitored anew. After the dislocation line has completely bypassed the particles, leaving two Orowan loops, a new dislocation line is introduced and the subsequent increment of τ, i.e. the work hardening, is computed.
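The load-stepping procedure can be illustrated with a deliberately simplified sketch. It is not the code used in this work: the dislocation line is reduced to a single hypothetical coordinate y (the bow-out of the line), which relaxes by overdamped motion against a linear restoring force k·y, and the particles are taken to be bypassed once y exceeds a hypothetical critical value y_c. The per-step change of y stands in for the strain increment ∆γ; all names and parameter values are illustrative.

```python
def relax(tau, y, k, drag, dt, tol, y_c, max_steps=1_000_000):
    """Relax the toy line coordinate y at a fixed applied stress tau.

    Returns (y, bypassed). The change of y per time step plays the role of the
    strain increment delta-gamma: relaxation stops once it falls below tol.
    Explicit Euler, stable as long as k * dt / drag < 2.
    """
    for _ in range(max_steps):
        dy = (tau - k * y) / drag * dt   # overdamped velocity times time step
        y += dy
        if y > y_c:                      # critical configuration passed: bypass
            return y, True
        if abs(dy) < tol:                # equilibrium reached, delta-gamma ~ 0
            return y, False
    raise RuntimeError("relaxation did not converge")


def flow_stress(dtau=0.01, k=1.0, drag=1.0, dt=0.1, tol=1e-9, y_c=1.0):
    """Raise tau by dtau after each converged relaxation until bypass occurs."""
    y, n = 0.0, 0
    while True:
        n += 1
        tau = n * dtau                   # n-th load level (no accumulated rounding)
        y, bypassed = relax(tau, y, k, drag, dt, tol, y_c)
        if bypassed:
            return tau


print(flow_stress())
```

With the default parameters, `flow_stress()` returns 1.01, the first load level above the toy threshold k·y_c = 1 on a 0.01 grid. As in the procedure described above, the resolution of the computed flow stress is set by the size of the load increment.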
4.2.3 Flow stress of impenetrable particles with a different shear modulus
To validate the computation method used in this work, the flow stresses induced by impenetrable particles without image stresses have been calculated and compared with the results of Bacon et al. ([Bacon et al. 73]). As explained in section 4.2.2, the finite element procedure is not used in this case: the particles are modelled by facet obstacles only. The particle radii are 0.131 (2^9 b), 0.262 (2^10 b) and 0.524 (2^11 b) µm, with a fixed inter-particle distance L = 2.59 µm. The configuration is similar to the model used by Bacon et al. ([Bacon et al. 73]), except for the periodic boundary condition. Figure 4.9 shows the dislocation configurations at the flow stress for the three particle radii. Near each particle, the branches of the dislocation line on opposite sides of the particle are parallel, because the self-interaction pulls them toward each other. The line between the particles is nearly symmetric. The flow stresses obtained have been analysed with the 'effective line tension' argument proposed by Bacon et al., who argued that the effective line tension, properly accounting for these interactions, can be taken as A[ln((1/(2Rp) + 1/L)^−1) + B], where A is 1/(2π) for edge and 1/(2π(1 − ν)) for screw dislocations. The flow stress normalized by Gm b/L should therefore vary linearly with ln((1/(2Rp) + 1/L)^−1), with slope A. Figure 4.10 shows our flow stresses normalized by Gm b/L plotted against ln((1/(2Rp) + 1/L)^−1). The linear relation is well reproduced, and the slope of the fitted line, about 0.254, is close to the value 1/(2π(1 − ν)) expected for the initially screw dislocation used here. These observations indicate that pinning the dislocation at 0.9 × L/2 from the periphery of the spherical particle correctly reproduces the periodic boundary condition used by Bacon et al. ([Bacon et al. 73]).
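To make the comparison of figure 4.10 explicit, the sketch below computes the abscissa ln((1/(2Rp) + 1/L)^−1) for the three particle radii and provides a least-squares slope routine for the normalized flow stresses. It assumes that lengths are expressed in units of b, with b taken from Rp = 0.131 µm = 2^9 b (about 0.256 nm, close to the Burgers vector of copper); this assumption reproduces the 6.8 to 8 range of the figure's horizontal axis. The simulated flow stresses themselves are not reproduced here, so `fit_slope` would have to be fed with the data of figure 4.10. Function names are ours.

```python
import math

B_UM = 0.131 / 2**9   # Burgers vector in µm, inferred from Rp = 0.131 µm = 2^9 b
L_UM = 2.59           # inter-particle distance in µm


def harmonic_abscissa(rp_um, l_um=L_UM, b_um=B_UM):
    """Return ln[(1/(2 Rp) + 1/L)^-1] with lengths measured in units of b."""
    rp, l = rp_um / b_um, l_um / b_um
    return math.log(1.0 / (1.0 / (2.0 * rp) + 1.0 / l))


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


radii = [0.131, 0.262, 0.524]  # µm, i.e. 2^9 b, 2^10 b, 2^11 b
for rp in radii:
    print(f"Rp = {rp} um -> abscissa = {harmonic_abscissa(rp):.3f}")
```

The three abscissae come out at roughly 6.84, 7.44 and 7.98, inside the axis range of figure 4.10; `fit_slope` applied to the normalized flow stresses would give the slope to be compared with A.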
To investigate the effect of a difference in shear modulus on the flow stress, we simulated a copper matrix containing two spherical particles of radius Rp = 0.262 µm, with shear modulus ratios ∆G/Gm of 1, 3 and 5, where ∆G = Gp − Gm. The inter-particle distance is fixed at L = 2.59 µm. Figure 4.11 shows the increase in flow stress as a function of ∆G/Gm. The flow stress increases with the shear modulus of the particles, because the repulsive image stress acting on the dislocation line must be overcome by a higher resolved shear stress for the line to bypass the particles. The fitted curve shows that the increase in flow stress scales as (∆G/Gm)^0.6. This increase remains small even for particles with a shear modulus of 6Gm, for which the flow stress rises by only about 6 percent compared with the case without image stress. Such a small effect of ∆G is expected from the short range of the image stresses. Indeed, calculations show that the image stress exerted on a dislocation line decreases as |x − x0|^−α, where x and x0 are the positions of the dislocation line and of the particle centre, respectively, and α is about 6 to 7. Hence, even for ∆G/Gm = 5, the image stress falls below the flow stress of the hard obstacles (∆G = 0) beyond a distance of 1.4 Rp from the particle centre. The repulsive interaction between a dislocation and a particle therefore reduces the effective inter-particle spacing by only about 8 percent, and this small reduction results in the small increase in flow stress. The effect of Rp on the flow stress has been calculated for a constant ∆G/Gm of 3. The results, shown in figure 4.12, indicate that the flow stress depends almost linearly on Rp.
Figure 4.9: Dislocation configuration at the flow stress. The particle radii are 0.131, 0.262 and 0.524 µm from top to bottom.
Figure 4.10: τys/(Gm b/L) vs. ln((1/(2Rp) + 1/L)^−1). The line is the linear fit; its slope is 0.254.
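The 8 percent figure can be checked with simple arithmetic. The sketch assumes (our reading of the argument, not stated explicitly in the text) that L is measured between particle surfaces and that the dislocation is kept out of a shell extending from Rp to 1.4 Rp around each of the two particles, so that the free channel narrows by 2 × 0.4 Rp. The function name is hypothetical.

```python
def spacing_reduction(r_p, l, cutoff_ratio):
    """Fractional narrowing of the free channel of width l between two particles.

    Assumes l is measured between particle surfaces and that the dislocation is
    effectively repelled up to cutoff_ratio * r_p from each particle centre.
    """
    shell = (cutoff_ratio - 1.0) * r_p   # repelled width beyond each particle surface
    return 2.0 * shell / l               # two particles border the channel


reduction = spacing_reduction(r_p=0.262, l=2.59, cutoff_ratio=1.4)  # lengths in µm
print(f"relative reduction of the channel width: {reduction:.3f}")
```

It prints a relative reduction of 0.081, consistent with the reduction of about 8 percent quoted above for ∆G/Gm = 5.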
The effect of a difference in shear modulus on the flow stress can be summarized as τys ∝ Rp × (∆G/Gm)^α, with α lower than 1. This result differs from the case of shearable particles. The effect of the shear modulus of coherent, penetrable particles was studied by Nembach ([Nembach 83]), who calculated the image force exerted by a spherical particle of modulus Gp on a straight, infinite dislocation line from the change in the strain energy density of the dislocation. The resulting shear stress is proportional to ∆G^1.5 and Rp^0.22. Thus, ∆G has a weaker effect for hard particles than for shearable particles.
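Exponents such as the 0.6 of figure 4.11, or Nembach's 1.5, are the kind of quantity obtained from a straight-line fit in log-log coordinates. The sketch below assumes a pure power law ∆τ = C(∆G/Gm)^α and strictly positive data (the ∆G = 0 reference point has to be left out, since its logarithm is undefined). The demonstration uses synthetic values, not the simulation results, and the function name is ours.

```python
import math


def fit_power_law(xs, ys):
    """Fit ys = C * xs**alpha by least squares on (ln x, ln y); return (C, alpha)."""
    if any(v <= 0 for v in list(xs) + list(ys)):
        raise ValueError("a power-law fit requires strictly positive data")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    alpha = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    c = math.exp(my - alpha * mx)   # intercept of the log-log line
    return c, alpha


ratios = [1.0, 3.0, 5.0]                      # dG/Gm values used in this section
synthetic = [0.02 * x**0.6 for x in ratios]   # synthetic data, NOT simulation output
print(fit_power_law(ratios, synthetic))
```

On the synthetic data it recovers C ≈ 0.02 and α ≈ 0.6. Fitting in log space weights relative rather than absolute deviations, which is generally the natural choice for estimating an exponent.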
Figure 4.11: Increase in flow stress due to a difference in shear modulus. ∆τ/τ0 is plotted against ∆G/Gm = (Gp − Gm)/Gm, where τ0 is the flow stress of the impenetrable obstacles without image stress. Rp = 0.262 µm. The fitted curve gives ∆τ ∝ (∆G/Gm)^0.6.
Figure 4.12: Normalized flow stress (τys/(Gm b/L)) vs. normalized particle radius (Rp/b), for ∆G/Gm = 3.
4.2.4 Increment in hardening stress
Although a difference in shear modulus has little effect on the flow stress, its effect on the hardening stress is much stronger. In this section, the effects of ∆G on the hardening stress are discussed. The stress required to force a dislocation to glide between particles surrounded by remaining Orowan loops is plotted in figure 4.13, both for Gp = 4Gm (filled symbols) and without image stress (∆G = 0, open symbols). The change in the shear modulus of the particles increases the work hardening rate, and the effect of ∆G increases with the particle radius. The image stress field is the sum of the interactions between the particle and each dislocation present around it. The gliding dislocation lines therefore have to overcome an additional image stress field arising from the interaction between the residual loops and the particles. This additional stress is directly related to the number of Orowan loops stored around the particles. As a result, compared with the case without image stress, a higher shear stress is required to bypass the particles, and this effect becomes more pronounced as more dislocation lines pass, leading to stronger hardening. Since the range and magnitude of the image stress increase with Rp, the hardening rate also increases with Rp.
Fisher et al. ([Fisher et al. 53]) investigated the hardening of metal crystals induced by precipitate particles. They computed the back stress resulting from Orowan loops and calculated the effective critical stresses of the Frank-Read sources. Their argument is that the hardening stress τh is related to the number of loops N by τh = γN, where γ is a function of the particle radius Rp and the inter-particle distance L. They obtained

$$\gamma = 0.65\, c\, b\, G_m \left(1 - \frac{\nu}{2(1-\nu)}\right) \frac{R_p^2}{(L+R_p)^3} \tag{4.6}$$

where c is a parameter describing the closest distance between a source and a particle. We obtained the slope of each curve in figure 4.13 by linear fitting and plotted these slopes against b Rp^2/(L + Rp)^3 in figure 4.14. The back-stress argument of Fisher et al. ([Fisher et al. 53]) is found to hold also for a dislocation line moving between two particles. The parameter c is about 3.45 without image stress and 4.43 for Gp = 4Gm, which means, according to their argument, that the effective distance of a stress source is shorter, or the back stress higher, when the image stress is included.
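Equation (4.6) can be transcribed directly, and because γ is linear in c, the parameter c follows from a fitted slope by a single division. A minimal sketch; the function names are ours, and stresses (γ, Gm) and lengths (b, Rp, L) must each be expressed in one consistent unit.

```python
def fisher_gamma(c, b, g_m, nu, r_p, l):
    """Hardening per stored Orowan loop, gamma = tau_h / N, as in Eq. (4.6)."""
    elastic = 1.0 - nu / (2.0 * (1.0 - nu))   # Poisson-ratio factor of Eq. (4.6)
    return 0.65 * c * b * g_m * elastic * r_p**2 / (l + r_p) ** 3


def fisher_c(slope, b, g_m, nu, r_p, l):
    """Invert Eq. (4.6): the c that reproduces a fitted slope gamma (tau_h vs N)."""
    return slope / fisher_gamma(1.0, b, g_m, nu, r_p, l)
```

Applied to the slope of a single curve of figure 4.13, `fisher_c` gives the corresponding c; with several radii, c would rather come from the slope of γ against b Rp^2/(L + Rp)^3, as in figure 4.14. The values 3.45 and 4.43 quoted above are results of this work, not something the sketch reproduces.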
Experimental observations show that the dislocations left around the particles by gliding dislocations are not all strictly confined to a single glide plane but are rather of prismatic form ([Humphreys & Martin 67]). If cross slip is easy, a dislocation may overcome an obstacle in its glide plane by slipping on another slip plane, forming long jogs. The simulations of the hardening stress presented above were performed with cross slip artificially prohibited by changing the cross-slip parameters. When the normal cross-slip conditions are used, cross-slip events are observed. For example, for particles of radius 0.262 µm, cross slip occurs once the number of Orowan loops reaches four without image stress, and two for Gp = 4Gm. Because the back stress on the primary slip plane grows as Orowan loops accumulate, it becomes easier for the dislocation to cross slip onto the secondary slip plane and then cross slip again (double cross slip) onto the primary plane to bypass the particle. Figure 4.15 shows a dislocation line bypassing the particles by double cross slip. If the shear modulus of the particle is higher than that of the matrix, the image force generates a high local stress near the particle, which makes local cross-slip events more probable. This demonstrates the importance of including the image stress when investigating local events such as cross slip.
Figure 4.13: Work hardening of an alloy containing two particles of radius 0.131 µm (1), 0.262 µm (2) and 0.524 µm (3). The shear stress increment τh − τys (MPa) is plotted against the number of Orowan loops around each particle, where τh and τys denote the hardening stress and the flow stress, respectively.
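The double cross slip of figure 4.15 relies on the Burgers vector [1-10] lying in both the primary plane (111) and the cross-slip plane (11-1), so that a screw segment, which lies along the planes' intersection, can glide on either. The following sketch checks this with integer Miller indices; the helper names are ours.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def can_cross_slip(burgers, plane_a, plane_b):
    """True if b lies in two distinct slip planes, so that a screw segment
    along their intersection line can glide on either of them."""
    distinct = cross(plane_a, plane_b) != (0, 0, 0)
    return distinct and dot(burgers, plane_a) == 0 and dot(burgers, plane_b) == 0


b = (1, -1, 0)            # [1-10]
primary = (1, 1, 1)       # (111)
cross_slip = (1, 1, -1)   # (11-1)
print(can_cross_slip(b, primary, cross_slip))   # True
print(cross(primary, cross_slip))               # (-2, 2, 0)
```

The intersection direction (-2, 2, 0) is parallel to [1-10], confirming that the screw segment lies along the common line of the two planes.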
Figure 4.14: Slope of the fitted lines in figure 4.13 vs. b Rp^2/(L + Rp)^3. ∗: ∆G = 0; ×: ∆G = 3Gm.
Figure 4.15: Bypassing of particles by double cross slip of a dislocation line. The dislocation initially glides on the (111)[1-10] slip system, changes to the (11-1)[1-10] system and then returns to the initial slip system.
4.2.5 Discussion
In this work, we studied the effect of a difference in shear modulus on the flow stress and on the subsequent hardening stress using the 3D discrete dislocation dynamics code. The effect of ∆G on the flow stress can be summarized by τys ∝ Rp(∆G/Gm)^α, where α is lower than 1 and Rp is the particle radius. Because the image stress is short-ranged, the increase in flow stress is only about 6 percent even for ∆G/Gm = 5, compared with the case without image stress. Nevertheless, the image stress increases as Orowan loops accumulate, which changes the work hardening rate. This is because the image stress field is the sum of the interactions between a particle and each dislocation present around it: as slip accumulates, the dislocation line feels an additional image stress field arising from the interaction between the residual loops and the particles. The first-order approximation of work hardening by Fisher et al. ([Fisher et al. 53]) is found to be valid even in the simple configuration of two particles and one dislocation line. The image stress makes the effective distance of a stress source shorter, or the back stress larger. When cross slip is allowed in the code, a dislocation is observed to avoid an obstacle in its glide plane by cross slipping onto another slip plane. The back stresses on the glide plane due to the Orowan loops trigger the cross-slip event, and including the image stress increases the cross-slip probability: the image stress around the particle is large enough to affect local events such as cross slip.
However, the computation time and effort required are too large to include the image stress in simulations of alloys containing a large number of particles: even the simple two-particle configuration studied here already requires about 5000 20-node elements. An approximate way to include the effect of the image stress is to introduce an effective particle radius that represents the difference in shear modulus. The facet obstacles alone could then reproduce precipitate hardening, and the finite element coupling would no longer be needed. For example, for ∆G = 3Gm the average radius of the first trapped loop is 0.272 µm, compared with 0.264 µm for ∆G = 0, so the difference in shear modulus changes the effective particle radius by only about 3 percent (0.272/0.264 ≈ 1.03). Nevertheless, the stress field generated by a second-phase particle increases as slip accumulates, and the image stress field turns out to be crucial for predicting the magnitude of work hardening. This problem could then be treated using an empirical expression for the image stress generated by the interaction between several dislocations and a particle.
4.3 Fatigue simulations of materials hardened by particles
4.3.1 Motivation and literature review
Fatigue in single-phase metals
In single-phase metals subjected to cyclic deformation, strain is usually localized. The imposed plastic strain amplitude is accommodated by high local strains in localization zones called persistent slip bands (PSBs).
The strain localization produces persistent slip markings (PSMs) at the specimen surface ([Man et al. 02]). The irreversibility of slip inside PSBs is known to generate permanent surface steps. Numerous experimental observations have led to general agreement that fatigue cracks initiate at the PSMs ([Mughrabi 85], [Suresh 98]), along individual PSBs. Fatigue life is thus largely governed by the irreversibility of slip in the PSBs and the associated surface steps.
Existing crack initiation models fall into two categories: (i) crack initiation due to a surface step larger than a critical size, and (ii) crack initiation due to local decohesion of crystal planes. Understanding the intrinsic PSB microstructure is therefore crucial for establishing such models.
Transmission electron microscopy (TEM) is used to examine the dislocation microstructure in PSBs and to understand the specific role of dislocations in cyclic deformation. Surface step displacements can be measured by atomic force microscopy (see for example [Risbet et al. 03]).
Fatigue in precipitation-hardened materials
Multi-phase materials containing precipitates often show higher static strength than single-phase materials. Under cyclic loading, however, precipitation-hardened materials do not always have better fatigue properties.
Typical cyclic properties of materials containing shearable and non-shearable particles are shown in Fig. 4.16 ([Gerold & Steiner 82]), where the cyclic hardening behavior is plotted as a function of the cumulative plastic shear strain for various particle sizes. Fig. 4.16(a) shows that specimens containing shearable particles often suffer severe softening and even early fatigue failure. Large cyclic softening follows an initial strong hardening up to a maximum shear stress: the softening rate increases with particle size, and peak-aged alloys (74 Å) show the largest cyclic softening. For non-shearable particles (Fig. 4.16(b)), both the hardening rate and the stress drop decrease as the particle radius increases, and the shear stress saturates.
Figure 4.16: Cyclic hardening and softening of aged Cu-2at%Co single crystals: (a) underaged specimens (shearable particles); (b) overaged specimens (non-shearable particles).
A few characteristic fatigue properties observed in experiments are outlined below for each case ([Mughrabi 83]).
• Shearable particles
After an initial hardening stage, the cyclic strain becomes localized in persistent slip bands. Drastic cyclic softening, related to the destruction of the precipitation hardening in the PSBs, leads to early initiation of shear-type fatigue cracks where the PSBs intersect the surface.
• Non-shearable particles
Cyclic softening is strongly reduced and the cyclic deformation behavior is much more stable. Non-shearable particles produce more homogeneous straining.
Numerical simulations of fatigue tests
The dynamic behavior of dislocations inside PSBs during cyclic deformation is not easily accessible to experimental observation, e.g. by TEM. This is because the stresses are removed before TEM observation, so that the observed dislocation microstructure differs from the microstructure under stress. In addition, free-surface effects are not negligible. Another difficulty arises when relating the details of surface step formation directly to the dislocation microstructure inside the grains, since the two kinds of experiments are often performed independently and sample preparation methods are usually destructive. Thus, a complete and comprehensive scheme for fatigue crack initiation is still missing.
The development of the DDD method and the increase in computer capabilities enable simulations to provide crucial information on the formation of slip bands. Numerical simulations now make it possible to observe the details of PSB formation and to investigate the relation between surface steps and the corresponding dislocation microstructure. This knowledge would help to better understand crack initiation mechanisms and to build a more elaborate fatigue life model.
Scope of this section
The performance of the new parallel DDD program (Chapter 3) makes it feasible to simulate dislocations interacting with thousands of precipitates. Fatigue tests of a precipitation-hardened material are simulated in 3D. The fatigue simulations are similar to the work of [Déprés 04] on 316L stainless steel. The effects of shearable and non-shearable particles on the formation of PSBs are studied. The dislocation mechanisms of PSB formation are detailed, and some of the numerical results are compared with experimental observations.
4.3.2 Description of the simulation method<br />
Simulation volume geometry
The simulated volume is a cylinder representing a surface grain of a fatigue-tested specimen, i.e. it is bounded by one free surface and by grain boundaries. The cylindrical volume is represented by 20 facets, as shown in Fig. 4.18(a). The free surface is represented by assigning zero strength to the top facets (see Sec. 2.4.2), so that dislocations can escape through that surface. All other facets act as strong obstacles to dislocation motion, as highly disordered grain boundaries would.
A virtual volume is placed on top of the free surface to keep track of the dislocations exiting the simulated crystal volume (see Fig. 3.21(b)). This virtual volume and the virtual dislocations are introduced to compute the deformation of the free surface, as detailed in Sec. 4.3.5. The virtual dislocations have no effect on the dislocations inside the simulation volume and are not allowed to re-enter the crystal.
The normal vector of the top facet is [110]. The diameter of the cylinder is 10 µm and its height is 5 µm. The image forces due to the free surface are not taken into account in this work.
Material parameters
The material parameters used in these simulations are those of nickel, listed in Table 4.1.
Table 4.1: Mechanical and microscopic parameters of nickel
Poisson's ratio ν | Shear modulus G (GPa) | Burgers vector magnitude b (Å) | Activation volume V/b^3 | Viscous drag coefficient B (10^−5 Pa·s) | Threshold stress τIII (MPa)
0.276 | 94.7 | 2.5 | 2117 | 1.06 | 51.2
3D particle arrangement
A cylindrical volume containing randomly distributed particles is constructed as follows. For simplicity, all particles are assumed to have the same radius rp, and their volume fraction is vf.
Step 1 Preparing close-packed spheres (Fig. 4.17(a))
Close-packed spheres of an arbitrary radius r are constructed inside a larger sphere of radius R. The centre of each sphere of radius r is assumed to be the nucleation site of a particle, and each particle draws its material from the volume of its sphere of radius r during Ostwald ripening.
Step 2 Adjusting the volume fraction
The radii of all spheres are reduced by a common factor while their positions remain unchanged. The factor is chosen so that the volume fraction of the shrunken spheres equals vf.
Step 3 Adjusting the radius of particles (Fig. 4.17(b))
The radius of the shrunken spheres from Step 2 is scaled to rp, and the coordinates of the centres are scaled by the same factor.
Step 4 Cutting the cylindrical volume (Fig. 4.17(c))
A cylindrical simulation volume is placed at the centre of the outer sphere, and the spherical particles situated inside the cylinder are selected.
In this work, Step 1 is carried out by successive trials in which spheres of radius r are placed in the sphere of radius R. A trial is accepted only if the new sphere does not intersect any sphere already in the volume. Although this method does not generate a close-packed arrangement, the resulting particle arrangement is random. Bimodal size distributions (see Sec. 4.3.6) are constructed with the same procedure, except that spheres of two different radii are placed in Step 1, as shown in Fig. 4.17(a).
Figure 4.17 panels: (a) randomly placed spheres (labels: volume of interest, ri; particle, rp; simulation volume); (b) adjusting the radius of particles; (c) selecting spheres inside the cylindrical volume.
Figure 4.17: Construction of a randomly distributed configuration of particles in the cylindrical simulation volume (example of a bimodal size distribution).
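Steps 1 to 4, with random sequential trials in Step 1, can be sketched in pure Python. Assumptions beyond the text: centres are drawn uniformly so that each sphere lies entirely inside the outer sphere, the cylinder axis is along z, and a particle is kept in Step 4 when its centre lies inside the cylinder. All function and parameter names are hypothetical, and the demonstration uses small arbitrary values rather than the parameters of this work. The overlap test is O(n^2); for thousands of particles a cell list would be the usual speed-up.

```python
import math
import random


def place_spheres(big_r, r, n_trials, rng):
    """Step 1 by random sequential addition: try n_trials random positions for
    spheres of radius r inside a sphere of radius big_r, keeping only those that
    do not overlap spheres already placed."""
    reach = big_r - r  # a centre within this radius keeps the whole sphere inside
    centres = []
    for _ in range(n_trials):
        while True:  # uniform point in a ball, by rejection from the bounding cube
            p = tuple(rng.uniform(-reach, reach) for _ in range(3))
            if math.hypot(*p) <= reach:
                break
        if all(math.dist(p, q) >= 2.0 * r for q in centres):
            centres.append(p)
    return centres


def build_particles(big_r, r, n_trials, vf, rp, cyl_radius, cyl_height, seed=0):
    """Steps 1-4; returns (centres inside the cylinder, N placed, scaled outer radius)."""
    rng = random.Random(seed)
    centres = place_spheres(big_r, r, n_trials, rng)
    n = len(centres)
    if n == 0:
        raise ValueError("no sphere could be placed")
    # Step 2: common shrink factor s such that n * (s*r)**3 / big_r**3 == vf
    s = (vf * big_r**3 / (n * r**3)) ** (1.0 / 3.0)
    if s > 1.0:
        raise ValueError("too few spheres placed to reach the requested vf")
    # Step 3: scale radii and centres by k so that the shrunken radius s*r becomes rp
    k = rp / (s * r)
    centres = [tuple(k * c for c in p) for p in centres]
    outer = k * big_r
    if math.hypot(cyl_radius, cyl_height / 2.0) > outer:
        raise ValueError("cylinder does not fit inside the scaled outer sphere")
    # Step 4: cylinder centred on the outer sphere, axis along z (assumption);
    # keep particles whose centre lies inside it
    inside = [p for p in centres
              if math.hypot(p[0], p[1]) <= cyl_radius and abs(p[2]) <= cyl_height / 2.0]
    return inside, n, outer


# Small illustrative run (arbitrary units, not the parameters of this work)
inside, n_total, outer = build_particles(big_r=10.0, r=1.0, n_trials=500, vf=0.05,
                                         rp=0.5, cyl_radius=3.0, cyl_height=4.0, seed=1)
print(len(inside), "of", n_total, "particles lie in the cylinder")
```

Because all radii and centres are scaled together, the non-overlap obtained in Step 1 is preserved, and the shrink factor makes N rp^3/(kR)^3 equal to vf within the scaled outer sphere.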
Radius and volume fraction of particles
Two particle radii, rp = 160 nm and 400 nm, are considered. The volume fraction vf is fixed at 14% in all cases. The procedure above generates 2510 particles in the cylindrical volume for rp = 160 nm and 161 particles for rp = 400 nm.
Each particle is built from two pyramids joined at their bases (3), to reduce the number of nodes and facets constituting the particles and hence the computational load. The cylindrical simulation volumes are shown in Fig. 4.18 and contain (a) 161 particles of rp = 400 nm and (b) 2510 particles of rp = 160 nm.
The strength of a particle
Particles are assumed to act as geometrical barriers to dislocation motion, with a predefined strength. For computational simplicity, the image forces due to the elastic modulus difference are not computed, and no stress fields around the particles are considered.
The strength of a particle decreases as the particle is sheared by dislocations. This is due both to the decrease in the effective particle size on the glide plane and to the loss of coherency of ordered precipitates ([Stoltz & Pineau 78]). The evolution of the particle strength is illustrated in Fig. 4.19(a) for a particle sheared by successive dislocation passages on the same glide plane. Fig. 4.19(b) illustrates that a particle may lose its strength completely before being totally sheared off, owing to the increase in surface energy and the loss of coherency caused by random chopping of the particle by dislocations.
3 Each particle thus consists of six facets and five nodes.
Figure 4.18: Cylindrical simulation volume containing randomly distributed particles with (a) rp = 400 nm and vf = 14%, (b) rp = 160 nm and vf = 14%.
[Figure 4.19 plots: strength of particle vs. number of dislocation passages; (a) geometrical effect; (b) chemical effect, with a loss of strength]
Figure 4.19: Evolution of the particle strength due to shear-off by dislocation passages.
In this work, the particle (facet) strength is decreased linearly with each dislocation passage through the particle's facets. As a first-order approximation, the facet strength τfacet (see Sec. 2.4.2) is decreased from its initial value by τfacet/(2rp/b) each time a dislocation penetrates the facet, as shown in Fig. 4.20 (4). The facet strength is set to zero after a certain number of dislocation passages, to represent the chemical effect shown in Fig. 4.19(b).
4 The magnitude of the strength drop follows from the assumption that a particle loses its strength after 2rp/b dislocation passages, which corresponds to complete shear-off of the particle as sketched in Fig. 4.19(a).
122 Dislocation-precipitate interactions<br />
Figure 4.20: Evolution of facet's strength with dislocation passages (strength of particle versus number of dislocation passages; initial strength τfacet, final strength τ*facet, slope = τfacet/(2rp/b))
Particles of radius 160 nm are assumed to be easily shearable: the initial facet strength is set to 292 MPa and the final strength τ*facet to 162 MPa. Particles of radius 400 nm are assumed to be non-shearable, or difficult to shear, by setting their initial strength to 7310 MPa.
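The linear softening rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the function name `facet_strength` is hypothetical, the Burgers vector magnitude b = 0.25 nm is an assumed value (b is not given in this section), and switching the strength to zero once the linear decay reaches τ*facet is one possible reading of Fig. 4.20.

```python
def facet_strength(n_passages, tau_init, tau_final, r_p, b):
    """Strength (MPa) of a facet after n_passages dislocation passages.

    The strength drops by tau_init / (2 r_p / b) at every passage, so the
    linear decay alone would reach zero after 2 r_p / b passages (complete
    geometric shear-off). Assumed reading of Fig. 4.20: once the decayed
    strength reaches tau_final, it is set to zero (chemical effect).
    """
    drop = tau_init / (2.0 * r_p / b)
    tau = tau_init - n_passages * drop
    return 0.0 if tau <= tau_final else tau


# Shearable particles of this section: r_p = 160 nm, 292 MPa -> 162 MPa.
# B = 0.25 nm is an assumed Burgers vector magnitude (hypothetical value).
R_P, B = 160.0, 0.25  # both in nm
for n in (0, 100, 569, 570):
    print(n, facet_strength(n, 292.0, 162.0, R_P, B))
```

With these assumed values, 2rp/b = 1280 passages would be needed for complete geometric shear-off, but the facet is zeroed after about 570 passages, which is the "loss of strength before total shear-off" behaviour of Fig. 4.19(b).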
The initial configuration<br />
The initial dislocation microstructure of all the simulations is composed of four Frank-Read sources in the form of pinned dislocation segments. All the Frank-Read sources are of edge type, with Burgers vector (a/2)[¯1¯10] on the slip plane (¯11¯1) (system 7 in Table 2.1). It should be noted that no dislocations are nucleated around the particles and that no dislocations are emitted from the free surface.
The loading conditions<br />
Fatigue simulations are performed under plastic strain control with a fully symmetrical push-pull loading ratio (R = ɛp,max/ɛp,min = −1) and an applied plastic strain amplitude Δɛp = 1 × 10⁻³. In DDD simulations, only stresses can be applied directly. Imposed plastic strain conditions are therefore achieved by monitoring the total slip accumulated in all the active slip systems: the applied stress is increased or decreased by comparing the resulting plastic strain with the pre-selected strain level.
Figure 4.21: Quasi-static loading condition: stepwise increment and decrement of the applied stresses (plastic strain cycled between ɛp,min and ɛp,max; stages 1 to 4)
In the fatigue simulations, the plastic strain rate is monitored at each time step. The applied stress is increased stepwise by 1 MPa if the plastic strain rate is lower than the pre-selected minimum plastic strain rate, 10⁻⁷ (stage (1) in Fig. 4.21). The load is kept constant while the plastic strain rate lies between the minimum and the maximum strain rate (stage (2) in Fig. 4.21), until the dislocation microstructure is in equilibrium with the external loading. This condition is reached by keeping the load constant and performing discrete time steps until the resulting plastic strain rate falls below the pre-selected minimum strain rate (stage (3) in Fig. 4.21). The applied stress is decreased by 1 MPa if the plastic strain rate is higher than the pre-selected maximum plastic strain rate, 10⁻⁴ (stage (4) in Fig. 4.21).
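The stepping rule above can be sketched as a small controller. This is a minimal sketch, not the thesis code: the rate thresholds and the 1 MPa step come from the text, while the function names, the `direction` sign convention and the reversal criterion in `update_direction` (with a hypothetical target `eps_target`) are assumptions. In particular, this section does not state whether the reversal level equals Δɛp or Δɛp/2.

```python
MIN_RATE = 1e-7  # pre-selected minimum plastic strain rate (from the text)
MAX_RATE = 1e-4  # pre-selected maximum plastic strain rate (from the text)
STEP = 1.0       # stress increment / decrement in MPa (from the text)


def update_stress(stress, strain_rate, direction):
    """Applied stress (MPa) for the next time step.

    direction = +1 while loading towards positive plastic strain and -1
    towards negative plastic strain (assumed convention); rates are
    compared in magnitude.
    """
    rate = abs(strain_rate)
    if rate < MIN_RATE:  # stage (1): microstructure at rest, raise the load
        return stress + direction * STEP
    if rate > MAX_RATE:  # stage (4): strain rate too high, back off
        return stress - direction * STEP
    return stress        # stages (2)-(3): hold the load while it relaxes


def update_direction(plastic_strain, direction, eps_target):
    """Reverse loading once plastic strain reaches +/- eps_target (assumed rule)."""
    if direction > 0 and plastic_strain >= eps_target:
        return -1
    if direction < 0 and plastic_strain <= -eps_target:
        return 1
    return direction
```

In a time-stepping loop, `update_stress` would be called after each DDD step with the measured plastic strain rate, and `update_direction` would flip the loading sense at the strain limits, giving the push-pull cycle with R = −1.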
4.3.3 Evolution of the dislocation microstructure during the fatigue tests<br />
General features of the formation of the dislocation microstructure<br />
The initial Frank-Read sources begin to expand and generate dislocation loops as the applied stress is increased during the first quarter-cycle. Parts of the loops leave the simulation volume through the free surface, printing steps on it, as detailed in Sec. 4.3.5. The other parts pile up against the strong grain boundaries. The screw parts of the dislocation lines tend to cross slip owing to the back stresses from the stored dislocations, and this cross-slip mechanism spreads slip lines over the whole simulation volume. Particles affect both the dislocation mobility and the cross-slip probability, which results in microstructures quite different from those of the single-phase material.
Figure 4.22: Evolution of the dislocation microstructure by the cyclic loading (Case of rp = 400 nm).<br />
As the sign of the applied stress is reversed, the motion of the dislocations is reversed likewise. The initial dislocation microstructure, however, cannot be completely restored because of the irreversible character of slip. Slip irreversibility is caused by cross slip, line reconnection (collinear junctions) and the elimination of dislocation lines at the free surface; it increases with the number of fatigue cycles.
These general features are illustrated in Fig. 4.22 using snapshots from the rp = 400 nm case. The evolution of the dislocation structures is shown together with the stress-strain curve of the first fatigue cycle⁵. The initial expansion of the Frank-Read sources (1) is followed by cross slip, which spreads slip through the entire simulation volume (2). After the subsequent change of sign of the applied plastic strain, a specific dislocation microstructure forms (3)-(4), and even after one cycle the microstructure (5) is quite different from the initial one (1).
5 Compressive plastic strain is applied first.
Dislocation density evolution<br />
The evolution of the total dislocation density ρtot is monitored during cyclic deformation for the volume containing particles with rp = 160 nm, the volume containing particles with rp = 400 nm and the volume without particles. ρtot is plotted as a function of the accumulated cyclic Von Mises strain in Fig. 4.23. In all cases, the dislocation densities quickly increase and fluctuate with the cyclic deformation, owing to the periodically vanishing cyclic load. Saturation of the dislocation densities is not observed here because only a relatively small number of fatigue cycles has been performed (close to 3 cycles⁶ for rp = 160 nm and 5 cycles for rp = 400 nm). It is expected, however, that the dislocation densities would gradually saturate as cycling proceeds, as observed by [Déprés et al. 04].
The simulation results show that
1. ρtot(rp = 160 nm) > ρtot(rp = 400 nm) > ρtot(No particle);
2. the rates of dislocation accumulation after each fatigue cycle follow the same order as the total densities.
After three fatigue cycles (around ɛ*VM ≈ 0.006), ρtot(rp = 160 nm) is three times larger than ρtot(rp = 400 nm) and six times larger than ρtot(No particle).
The simulation volume containing particles with rp = 160 nm has a high resistance related to the limited glide area per slip plane. This effect is due to the large number of particles, which are assumed to be shearable. To accommodate the applied plastic strain with limited dislocation glide, a higher dislocation density is necessary. After the stress is reversed, the dislocations still have difficulty finding an easy glide path and annihilating each other, so most of them are left inside the volume. It can therefore be deduced that shearable particles give rise to a high slip irreversibility.
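The statement that a shorter glide distance requires a higher dislocation density can be made quantitative with the classical Orowan relation γp = ρ b x̄, where x̄ is the mean glide distance. This relation is a standard result of dislocation theory and is not used explicitly in this section; the sketch below only gives an order of magnitude, and the values of b and x̄ are illustrative assumptions, not simulation parameters.

```python
def orowan_density(gamma_p, b, mean_glide):
    """Dislocation density (m^-2) needed to carry the plastic shear gamma_p.

    Classical Orowan relation gamma_p = rho * b * x_bar, solved for rho.
    b and mean_glide (x_bar) are given in metres.
    """
    return gamma_p / (b * mean_glide)


# Illustrative values only (assumed, not taken from the simulations):
b = 0.25e-9                                   # Burgers vector magnitude, m
rho_long = orowan_density(1e-3, b, 1e-6)      # 1 um mean glide distance
rho_short = orowan_density(1e-3, b, 0.5e-6)   # glide halved by dense particles
print(f"{rho_long:.1e} m^-2 -> {rho_short:.1e} m^-2")
```

Halving the mean glide distance doubles the density required for the same plastic strain, which is the trend invoked above for the densely populated rp = 160 nm volume.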
In the case of particles with rp = 400 nm, dislocations have a higher chance of finding an easy glide path because there are fewer particles in the volume. Moreover, these particles are more effective in spreading dislocations through the simulation volume by cross slip, because they are not easily shearable and are surrounded by Orowan loops. The applied plastic strain can thus be accommodated with a lower dislocation density, and the rate of dislocation accumulation is reduced compared with the shearable particle case, since the dislocations can more easily move back by annihilating the Orowan loops left around the particles during forward glide. The irreversibility of slip is thus significantly reduced in the case of non-shearable particles.
6 The simulation had to be stopped just before ɛp reached zero near 3 fatigue cycles because of the large number of segments involved.
Figure 4.23: Evolution of the total dislocation density ρ [m⁻²] versus accumulated Von Mises strain ε*VM for the volume containing rp = 160 nm particles, rp = 400 nm particles and no particles
Strain localization kinematics<br />
The preceding section showed that shearable particles favor a high ρtot. The next question to address is whether the simulation can reproduce the localization of plastic deformation. Fig. 4.24 shows the dislocation microstructure formed after 3 cycles in the rp = 160 nm case and after 5 cycles in the rp = 400 nm case, viewed along the [110] direction. The figures clearly illustrate that the dislocation structures are highly heterogeneous and that intense slip bands form on the primary slip plane under cyclic loading. This result is consistent with experimental observations ([Calabrese & Laird 74]). Plastic strain localization is believed to cause fatigue damage, since the local plastic strain in the bands has to be high enough to accommodate all the applied plastic strain. This process can eventually lead to fatigue crack nucleation.
To quantify the formation of persistent slip bands (PSBs), the spatial distribution of the dislocation density is computed at each time step k as follows. The cylindrical simulation volume is sliced into finite layers along the slip plane normal [¯11¯1], and the dislocation density is computed in each layer. The heterogeneity of the dislocation density can then be shown by plotting the density of each layer along the reference axis [¯11¯1].
(a) Particles with rp = 160 nm (b) Particles with rp = 400 nm
Figure 4.24: Localization of slip by forming intense slip bands
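The layer-slicing procedure can be sketched as below. This is a simplified illustration, not the thesis implementation: the function name `layer_densities` and its arguments are hypothetical, each segment is assigned whole to the layer containing its midpoint (segments are not clipped at layer faces), and every layer is given the same cross-section, whereas slicing a cylinder obliquely gives layers of varying area.

```python
import math


def layer_densities(segments, normal, s_min, thickness, n_layers, area):
    """Dislocation density (line length per volume) in slabs along `normal`.

    segments : list of ((x1, y1, z1), (x2, y2, z2)) end points.
    s_min    : position along the unit normal where slab 0 starts.
    area     : cross-section of each slab, assumed constant here.
    Each segment is assigned whole to the slab containing its midpoint,
    a simplification: segments are not clipped at slab faces.
    """
    norm = math.sqrt(sum(c * c for c in normal))
    n_hat = [c / norm for c in normal]
    length = [0.0] * n_layers
    for p, q in segments:
        mid = [(a + b) / 2.0 for a, b in zip(p, q)]
        s = sum(m * n for m, n in zip(mid, n_hat))  # position along the normal
        k = int((s - s_min) // thickness)
        if 0 <= k < n_layers:
            length[k] += math.dist(p, q)
    volume = area * thickness
    return [ell / volume for ell in length]


# Toy check with the slicing normal along z (arbitrary units):
segs = [((0, 0, 0.5), (1, 0, 0.5)), ((0, 0, 1.5), (0, 2, 1.5))]
rho = layer_densities(segs, (0, 0, 1), 0.0, 1.0, 3, 10.0)
```

For the simulations of this section the normal would be (−1, 1, −1), i.e. [¯11¯1]; repeating the call at each time step k gives the density-position-cycle surfaces of Fig. 4.25.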
Fig. 4.25 shows the evolution of these spatial dislocation density distributions. The three axes of the plot correspond to the dislocation density, the position of each layer and the cycle number.
The figure confirms the general features of the formation of the dislocation microstructure: in all cases, the first increase of the applied plastic strain spreads dislocations over the simulation volume. In the following cycles, a heterogeneous dislocation structure forms and certain zones accumulate a high dislocation density.
A detailed observation of Fig. 4.25 reveals that the particles affect slip localization in several ways:
1. The width wd of the dislocation distribution over the simulation volume is the largest in the case rp = 400 nm (non-shearable particles) and the smallest in the case rp = 160 nm (shearable particles), i.e. wd(rp = 400 nm) > wd(No particle) > wd(rp = 160 nm).
2. The maximum local dislocation densities (ρmax) are in the following order: ρmax(rp = 160 nm) > ρmax(rp = 400 nm) > ρmax(No particle).
3. The intense slip band width is smaller in the case of shearable particles, i.e. db(rp = 400 nm) > db(rp = 160 nm).
4. There is no clear dislocation localization up to five cycles in the case without particles. In the other cases, dislocation localization has occurred, and at least one high dislocation density peak is present throughout the simulated fatigue cycles.
(a) Particles with rp = 160 nm (b) Particles with rp = 400 nm (c) No particle
Figure 4.25: Evolution of slip localization
Item 1 demonstrates that non-shearable particles promote cross slip owing to the back stresses of the Orowan loops around the particles; as a result, the dislocations easily sweep a large area of the simulation volume. Items 2 and 3 are consistent with experimental observations, according to which persistent slip bands (PSBs) are much thinner when the particles are shearable, and the local plastic strain becomes higher as the PSBs get narrower ([Lee & Laird 83]). Fig. 4.26 shows some experimental data on PSB thickness and the related local plastic shear strain ([Mughrabi 83]). Item 4 can be related to the early initiation of fatigue cracks in the case of shearable particles (see Fig. 4.16(a)). It is also interesting to note that only one intense slip band forms in the case of shearable particles, whereas a second intense slip band begins to form in the case of non-shearable particles. This is related to the experimental observation that the number of cycles to crack initiation is inversely related to the average slip band spacing ([Graf & Hornbogen 78]), although a larger number of fatigue cycles would be necessary to confirm it.
Figure 4.26: Relation between the local plastic shear strain amplitude and the thickness of PSBs
The speed of slip localization can be quantified by the standard deviation of the spatial dislocation distribution curves at each time step, because the standard deviation becomes larger as the dislocation structure becomes more heterogeneous. The standard deviation at time t is computed as follows:

$$\sigma_\rho(t) = \sqrt{\frac{1}{D_g}\int_0^{D_g}\left[\rho\left(t, x^{(s)}\right) - \bar{\rho}(t)\right]^2 \, dx^{(s)}} \qquad (4.7)$$

where ρ̄(t) is the average dislocation density and Dg is the size of the simulation volume along the reference axis.

Fig. 4.27 shows the evolution of σρ(t) for each case: σρ(t)(rp = 160 nm) > σρ(t)(rp = 400 nm) > σρ(t)(No particle). Both the intensity and the speed of slip localization are thus the highest in the case with shearable particles.
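For layer densities sampled on equal-thickness layers, Eq. 4.7 reduces to the root-mean-square deviation of the layer values. The sketch below shows this discrete form; the midpoint-rule discretization, the use of the arithmetic mean of the layers as ρ̄(t), and the function name `density_std` are assumptions made for illustration.

```python
import math


def density_std(layer_rho):
    """Discrete form of Eq. 4.7 for n layers of equal thickness Dg / n.

    The integral (1/Dg) * int_0^Dg (rho - rho_bar)^2 dx is approximated by
    the mean squared deviation over the layers (one sample per layer);
    rho_bar is taken as the arithmetic mean of the layer densities.
    """
    n = len(layer_rho)
    rho_bar = sum(layer_rho) / n
    return math.sqrt(sum((r - rho_bar) ** 2 for r in layer_rho) / n)


uniform = density_std([5e12, 5e12, 5e12, 5e12])  # no localization
banded = density_std([0.0, 0.0, 2e13, 0.0])      # one dense layer
```

A uniform distribution gives σρ = 0, while concentrating the same total density in a single layer gives a large σρ, which is why a rising σρ(t) in Fig. 4.27 signals localization.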
Details of the intense slip band<br />
The intense slip bands of the rp = 160 nm case (shearable particles) are shown in Fig. 4.28. The dislocation microstructure is taken at cycle number 3 with ɛp ∼ 0. Fig. 4.28(a) shows a 3D image of the dislocation structure viewed in the planes normal to [1¯11] (slip plane normal), [110] (Burgers vector) and [¯112].
Figure 4.27: Evolution of the standard deviation σρ(t) (Eq. 4.7) versus accumulated Von Mises strain ε*VM in the different simulations (rp = 160 nm, rp = 400 nm, no particle)
In the plane perpendicular to the primary slip plane (normal to [110]), intense slip bands can clearly be seen in the form of thin, compact dislocation walls. In the plane parallel to the primary slip plane (normal to [1¯11]), the dislocation structure is quite heterogeneous and shows ladder-like structures along the Burgers vector direction. Dislocation debris and small loops are visible between the particles. Fig. 4.28(b) shows the isolated intense slip band, in which considerable amounts of residual dislocations are clearly visible. It should be noted that the residual dislocation tangles are smaller than in the single-phase material case, in which large tangles are observed together with ladder-like slip bands ([Obrtlik et al. 94], [Déprés et al. 04]).
Small tangles between particles are also observed experimentally in the case of shearable particles. Fig. 4.29 shows the dislocation microstructure of fatigue-tested Inconel 718 observed by TEM. Although the material characteristics are quite different⁷, this micrograph clearly shows small, high-density tangles of primary residual dislocations.
Fig. 4.30 shows the intense slip bands of the non-shearable particle case (rp = 400 nm) at cycle number 5 with ɛp = 0. The wall thickness of the PSB is much larger than in the previous case, and the dislocation density varies within the band seen along the [110] direction. In the plane normal to [1¯11], several Orowan loops and dense dislocation tangles form around the particles. In the space between the particles, however, long dislocation lines are clearly visible and the dislocation distribution is rather homogeneous, as illustrated in Fig. 4.30(b). It is also observed that some of the complex dislocation structures are separated by the same distance as that between two closely spaced particles.
7 Particle radius = 20-40 nm, grain size = 20-40 µm, volume fraction > 15%
(a) 3D image of intense slip bands (b) Isolation of intense slip band
Figure 4.28: Details of the intense slip band in the shearable particle case (rp = 160 nm). Three layers of thickness 300 nm are assembled for the 3D image in (a)
Figure 4.29: TEM micrograph of Inconel 718 fatigue tested up to 10,000 cycles
(a) 3D image of intense slip bands (b) Isolation of intense slip band
Figure 4.30: Details of the intense slip band in the non-shearable particle case (rp = 400 nm). Three layers of thickness 300 nm are assembled for the 3D image in (a)
Intense slip band formation mechanism<br />
In the case of shearable particles (rp = 160 nm), the dislocation density increases rapidly since dislocation glide has a high degree of irreversibility (see Fig. 4.23). The characteristic of this configuration is that there is a limited number of easy glide paths for dislocations. Slip bands therefore form along one of the easy glide paths, whose thickness is usually limited (of the order of rp). Upon load reversal, double cross-slipped dislocations can glide in the opposite direction along a path close to the initial glide path because (i) cross-slipped dislocations also have a limited glide distance and (ii) the particles in the initial path have lost part of their initial strength. This forms closely spaced edge dipoles, the so-called vein structures. As cycling proceeds, the subsequent cross-slipped screw dislocations react with the edge dipoles and produce prismatic loops aligned in the Burgers vector direction, or helicoidal structures, as observed in single-phase materials ([Li & Laird 94], [Déprés et al. 04]) but with a much smaller size. The prismatic loops can move along their glide cylinder and form ladder-like structures. Over the cycles, the repeated motion of interfacial dislocations eventually makes the particles at the PSB edges lose their strength, and persistent bands form at this location. This process is observed numerically in the simulations.
Figure 4.31: Distribution of particle (facet) strength: log(frequency) versus τfacet [MPa], from 160 to 300 MPa
Fig. 4.31 shows the statistical distribution of the facets' residual strength after 3 cycles. Most of the facets are not sheared and keep their initial strength (right peak), while a small portion of the facets are completely sheared (left peak). The spatial distribution of the facet strength is shown in Fig. 4.32(a) by superimposing colors corresponding to the strength of each facet. A clear channel of sheared particles is visible, and its position corresponds exactly to that of the intense slip band. The dislocation structure is overlaid in Fig. 4.32(b). It should be noted that the particles near the intense slip band also lose strength, which possibly demonstrates that interfacial dislocations exist at the periphery of the slip band and move rather freely with the cyclic load changes. A clear channel containing no dislocations and no particles is also observed experimentally, as shown in Fig. 4.33.
In the case of non-shearable particles, the accommodation of the applied plastic strain is much easier because dislocations can move over a relatively long distance on the glide plane. During the first few cycles, the particles are bypassed by the Orowan mechanism, and glissile loops accumulate around the particles. When the critical stress is reached, the screw portions of the loops change their glide plane by cross slip, which contributes both to propagating slip through the simulation volume, by generating dislocations on the secondary plane, and to the formation of 3D loops around the particles. Interactions between these loops and the dislocations on the secondary plane eventually form dense tangles around the particles. As the cyclic deformation proceeds, the tangles around the particles act as pinning points for dislocations moving between the particles. This favors the formation of dislocation dipoles pinned at two neighboring particles. The cutting of these dipoles by freely gliding dislocations then generates stable dislocation structures. The subsequent formation of dipoles between the particles, together with the resulting dislocation interactions, builds dense dislocation structures between the particles. This mechanism explains the dense dislocation tangles observed around the particles and the complex dislocation structures between pairs of closely spaced particles.
Figure 4.32: Spatial distribution of particle strength, viewed normal to (110): (a) facet strength τfacet [MPa] color scale from 160 to 300, (b) the same with the dislocation structure overlaid
Figure 4.33: Clear channel containing no particles and no dislocations (Inconel 718)
Figure 4.34: Typical stress-strain curve coming from the simulations (No particles case): σVM [MPa] versus εVM
4.3.4 Mechanical behavior<br />
Cyclic stress-strain relation<br />
A typical cyclic stress-strain curve is shown in Fig. 4.34 for the 'No particles' case. The first quarter-cycle corresponds to the activation of the initial Frank-Read sources, which is the hardest part of the cycles.
The cyclic response curves are shown in Fig. 4.35 for the cases rp = 160 nm (shearable), rp = 400 nm (non-shearable) and the single-phase material. For the non-shearable particles, curves obtained with two different volume fractions (vf = 8% and vf = 14%) are plotted.
Figure 4.35: Cyclic response (stress [MPa] versus cumulative plastic strain) for rp = 160 nm (vf = 14%) and rp = 400 nm (vf = 8% and 14%), compared with the single-phase material case
As expected, the initial shear stress amplitude is the highest in the case of shearable particles and the lowest in the single-phase material case. The simulation volumes containing non-shearable particles show intermediate initial stress values, and the stress amplitude increases with the particle volume fraction.
The short hardening stage is followed by cyclic softening in all cases. The degree of softening is maximal for the shearable particle case and, for non-shearable particles, increases with the volume fraction.
4.3.5 Surface slip markings<br />
Surface displacement computation method<br />
Dislocations that leave the simulation volume print steps on the free surface. Computing these surface steps can give valuable information on fatigue life, because fatigue cracks are believed to initiate at such surface steps. This section presents the method used to compute them.
Figure 4.36: Computation method of displacements associated with the general case of non-planar dislocation loops: (a) associated problem, (b) systematic method (primary plane with common point Op, cross-slip (deviated) plane with common point Od, points D1 and D2, glide axis ∆)
Displacements of closed loops can be computed by decomposing each loop into as many triangular dislocation loops as needed, using the equations in Sec. 2.2.2. When part of a dislocation loop has changed its slip plane by cross slip, an additional operation is necessary ([Déprés et al. 03]). Fig. 4.36(a) shows a non-planar dislocation loop. Points Op and Od are the common points used to construct the triangular dislocations in the primary and the cross-slip plane, respectively. If triangular loops are constructed for each dislocation segment in both planes, the non-planar dislocation loop misses two triangular loops (OpD2D1 and OdD1D2), as shown in Fig. 4.36(a). The displacement solution would then be wrong, because the arbitrary dislocation segments introduced by the construction (e.g. OpD1) are never canceled. To remove this artifact, once a segment is found to have a neighbor in a different plane, a supplementary triangular loop is constructed from three points: a common point (e.g. Op), the end point of the segment (e.g. D1) and the projection of the common point along the glide axis (∆). Fig. 4.36(b) shows this procedure, which cancels the effect of the arbitrary segments generated when constructing the triangular dislocation loops.
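The cancellation argument can be checked with pure edge bookkeeping, without computing any displacement. The sketch below is an illustration only: node labels such as P1, P2 and C1 are hypothetical loop nodes, and closing the loop with the two missing triangles OpD2D1 and OdD1D2 is a simplification of the ∆-based construction of Fig. 4.36(b), which it does not reproduce. Each triangle is an oriented closed loop; summing triangles cancels every edge traversed once in each direction, so only the edges left over carry a net Burgers vector.

```python
from collections import Counter


def fan(chain, origin):
    """Triangles (origin, p_i, p_i+1) for each segment of an oriented node chain."""
    return [(origin, p, q) for p, q in zip(chain, chain[1:])]


def net_edges(triangles):
    """Directed edges left after summing all triangle loops (a->b cancels b->a)."""
    count = Counter()
    for tri in triangles:
        for a, b in zip(tri, tri[1:] + tri[:1]):
            if count[(b, a)] > 0:
                count[(b, a)] -= 1
            else:
                count[(a, b)] += 1
    return {edge for edge, c in count.items() if c > 0}


# Non-planar loop: D1 -> P1 -> P2 -> D2 in the primary plane (common point Op),
# then D2 -> C1 -> D1 in the cross-slip plane (common point Od).
primary = fan(["D1", "P1", "P2", "D2"], "Op")
deviated = fan(["D2", "C1", "D1"], "Od")
loop_edges = {("D1", "P1"), ("P1", "P2"), ("P2", "D2"), ("D2", "C1"), ("C1", "D1")}

leftover = net_edges(primary + deviated) - loop_edges  # uncanceled arbitrary segments
closed = net_edges(primary + deviated + [("Op", "D2", "D1"), ("Od", "D1", "D2")])
```

Here `leftover` contains the arbitrary segments OpD1, D2Op, OdD2 and D1Od, and adding the two missing triangles leaves exactly the physical loop edges in `closed`.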
In fatigue simulations, dislocations can leave the simulation volume. Some dislocation loops are therefore cut and left open by the free surface, and Barnett's equations (Sec. 2.2.2) can no longer be used without special treatment. The displacements of open loops can be computed by adding virtual dislocations outside the simulation volume ([Weygand et al. 02]). The simulation volume consists of two distinct parts: the crystal and a virtual medium containing the virtual segments, as briefly introduced in Sec. 4.3.2. Dislocations are allowed to leave the crystal volume for the virtual medium in order to keep the dislocation loops closed. They are, however, not allowed to return from the virtual medium into the crystal volume, so that the motion of the dislocations in the crystal is not artificially modified by the virtual dislocations. The activation of a Frank-Read source and the subsequent deformation of the free surface, computed with the virtual medium method, are shown in Fig. 4.37.
Figure 4.37: Examples of surface steps generated by the activation of a single Frank-Read source in the simulation volume
Surface steps and associated dislocation structures<br />
Surface steps are computed and shown in Fig. 4.38(a) for the case of shearable particles after 3 fatigue cycles and in Fig. 4.38(b) for the case of non-shearable particles after 5 fatigue cycles. The surface steps reproduce exactly the characteristics of the associated dislocation structure. In the case of shearable particles, the surface markings are confined to a narrow region, as the slip bands involved are narrow and contain a high density of dislocations (see Fig. 4.24(a)). In the case of non-shearable particles, the surface markings are dispersed over the free surface, as the slip bands are relatively wider and there is more than one band inside the crystal, as shown in Fig. 4.24(b).
The differences in the surface markings between the cases can be seen clearly in one-dimensional profiles along a probing line. This display method is similar to that used for experimental results obtained by atomic force microscopy (AFM). Fig. 4.39 shows such surface profiles along the direction normal to the primary plane, i.e. [1¯11]. As indicated in Sec. 4.3.3, the simulation of the shearable particle case was stopped just before ɛp reached zero near 3 fatigue cycles, so its surface profile is associated with a small plastic strain. Nevertheless, the surface marking is significantly wider in the case of non-shearable particles than in the case of shearable particles, as indicated in Fig. 4.39. Detailed surface morphologies are computed on the surface at the exact location of the intense slip bands formed in the volume at ɛp = 0. Fig. 4.40 shows the evolution of these detailed surface morphologies (a) from 1/2 cycle to 2 1/2 cycles in the case of shearable particles and (b) from 1/2 cycle to 4 1/2 cycles in the case of non-shearable particles. A close examination of these images shows that the tongue-like slip markings are associated with the intense slip bands in the simulation volumes containing shearable particles (see Fig. 4.28(b)), and the ribbon-like slip markings with the intense slip bands of the non-shearable particle case (see Fig. 4.30(b)). In the shearable particle case, prismatic loops aligned in the Burgers vector direction are responsible for the tongue-like slip markings. The ribbon-like slip markings are related to dislocation structures gliding between the particles, and the length of the ribbon-like markings is closely related to the inter-particle distance.
(a) Case of shearable particles (b) Case of non-shearable particles
Figure 4.38: Surface steps of (a) the simulation volume containing shearable particles after 3 cycles and (b) the simulation volume containing non-shearable particles after 5 cycles
Figure 4.39: One-dimensional profiles of the surface steps along the [1¯11] direction for the case of shearable particles (dashed curve) and non-shearable particles (solid curve); surface step (b) versus probe distance (µm)
(a) Tongue-like surface slip markings in the case of shearable particles (b) Ribbon-like surface slip markings in the case of non-shearable particles
Figure 4.40: Evolution of detailed surface morphologies computed on a 500 nm wide surface region for (a) the shearable particle case after 1/2 and 2 1/2 cycles and (b) the non-shearable particle case after 1/2 and 4 1/2 cycles
4.3.6 Fatigue properties of materials containing particles with a bimodal size distribution
Alloys containing a bimodal particle size distribution are particularly interesting because they offer an optimized combination of fatigue properties, i.e. good strength (the merit of underaged alloys, as shown in Fig. 4.16(a)) and cyclic stability (the merit of overaged alloys, as shown in Fig. 4.16(a)). Waspaloy ([Clavel & Pineau 82]) is one example of an alloy with a bimodal particle size distribution.
Three bimodal cases are considered, with the same particle volume fraction $v_f$ = 14% but different ratios between the numbers of large ($r_p$ = 400 nm) and small ($r_p$ = 160 nm) particles. The number of particles of each size is listed below for the three cases, and the simulation volume is shown in Fig. 4.41 for the 'Bimodal2' case.
Figure 4.41: The simulation volume which contains both $r_p$ = 160 nm and $r_p$ = 400 nm particles (labels: unshearable particles, r = 0.4 µm; shearable particles, r = 0.16 µm; (110))
• Bimodal1: $r_p$ = 160 nm, 2080 particles + $r_p$ = 400 nm, 31 particles
• Bimodal2: $r_p$ = 160 nm, 1456 particles + $r_p$ = 400 nm, 57 particles
• Bimodal3: $r_p$ = 160 nm, 772 particles + $r_p$ = 400 nm, 103 particles
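One way to quantify the "percentage of large particles" that distinguishes the three cases is the share of the total particle volume carried by the large particles. The sketch below computes it from the counts listed above, assuming particles whose volume scales as $r^3$ (spheres are used for concreteness; the exact facet geometry is not restated here, so the shares are indicative only). The constant and function names are illustrative.

```python
from math import pi
from typing import Dict, Tuple

R_SMALL_NM = 160.0  # radius of the small (shearable) particles
R_LARGE_NM = 400.0  # radius of the large (non-shearable) particles

# (number of small particles, number of large particles), from the list above
CASES: Dict[str, Tuple[int, int]] = {
    "Bimodal1": (2080, 31),
    "Bimodal2": (1456, 57),
    "Bimodal3": (772, 103),
}


def particle_volume_nm3(r_nm: float) -> float:
    # Assumption: spherical particles. Any shape whose volume scales as r**3
    # gives the same volume shares below.
    return 4.0 / 3.0 * pi * r_nm ** 3


def large_volume_share(n_small: int, n_large: int) -> float:
    """Fraction of the total particle volume carried by the large particles."""
    v_small = n_small * particle_volume_nm3(R_SMALL_NM)
    v_large = n_large * particle_volume_nm3(R_LARGE_NM)
    return v_large / (v_small + v_large)


if __name__ == "__main__":
    for name, (n_small, n_large) in CASES.items():
        share = large_volume_share(n_small, n_large)
        print(f"{name}: {100 * share:.1f}% of particle volume in large particles")
```

Under this assumption the large particles carry roughly 19%, 38% and 68% of the particle volume in Bimodal1, Bimodal2 and Bimodal3, respectively, which is the ordering used when the results below refer to an increasing percentage of large particles.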
The same volume geometry and material properties are adopted for the three arrangements. The simulation box is a cylindrical volume, and the particles of each size have the same initial and final strengths as before (see Sec. 4.3.2). The same loading conditions as in the mono-modal cases are applied, i.e. $\Delta\varepsilon_p = 1 \times 10^{-3}$ and $R = -1$.
The evolution of the total dislocation density is compared with that of the mono-modal case ($r_p$ = 160 nm) in Fig. 4.42(a). Both the total density and its accumulation rate decrease as the percentage of large particles increases. Fig. 4.42(b) shows that slip localization is retarded as the percentage of large particles increases. Note that the total dislocation densities and the slip localization kinetics of all the bimodal cases considered here lie between those of the two previously investigated mono-modal cases (see Fig. 4.23 and Fig. 4.27).
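The two statistics compared in Fig. 4.42 can be illustrated with a short sketch. The total density is the summed line length divided by the volume; for the localization measure, the sketch assumes (hypothetically) that $\sigma_\rho(t)$ is the standard deviation of local densities computed over equal-volume slices of the simulation box. Eq. 4.7 gives the actual definition used in this work, and the function names and numbers below are illustrative only.

```python
import math
from typing import Sequence


def total_density(lengths_m: Sequence[float], volume_m3: float) -> float:
    """Total dislocation density in m^-2: summed line length / volume."""
    return sum(lengths_m) / volume_m3


def local_density_std(slice_lengths_m: Sequence[float], slice_volume_m3: float) -> float:
    """Population standard deviation of local densities in equal-volume slices.

    Hypothetical stand-in for sigma_rho(t); the actual definition is Eq. 4.7.
    """
    rhos = [length / slice_volume_m3 for length in slice_lengths_m]
    mean = sum(rhos) / len(rhos)
    return math.sqrt(sum((r - mean) ** 2 for r in rhos) / len(rhos))


if __name__ == "__main__":
    slice_volume = 1e-18  # 1 µm^3 per slice (hypothetical)
    uniform = [5e-6] * 4               # same line length in every slice
    localized = [0.0, 0.0, 0.0, 2e-5]  # same total length, concentrated in one band
    for name, lengths in (("uniform", uniform), ("localized", localized)):
        rho = total_density(lengths, 4 * slice_volume)
        sigma = local_density_std(lengths, slice_volume)
        print(f"{name}: rho = {rho:.2e} m^-2, sigma = {sigma:.2e} m^-2")
```

Both configurations have the same total density, but the localized one has a large standard deviation, which is why a spread-type measure rather than the total density is used to track slip localization.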
A 3D representation of the dislocation structure is shown in Fig. 4.43(a) for the 'Bimodal2' case after four fatigue cycles, and details of the associated intense slip bands are shown in Fig. 4.43(b). Compared with the mono-modal case ($r_p$ = 160 nm, see Fig. 4.28(a)), the slip bands are more diffuse (the band thickness is larger and the local dislocation density is accordingly lower). The dislocation structure in the $(\bar{1}11)$ plane shown in Fig. 4.43(a) and the intense slip bands plotted in Fig. 4.43(b) demonstrate that the structural characteristics of the two previous mono-modal cases coexist in the bimodal case: dense dislocation tangles as well as Orowan loops are observed around the large particles, and ladder-like dislocation structures form along the Burgers vector direction. Both tangles of residual dislocations and long dislocation lines with relatively high mobility are visible.
Figure 4.42: Effects of the percentage of large particles on the statistics of fatigue tests ($r_p$ = 160 nm, Bimodal 1, Bimodal 2 and Bimodal 3): (a) evolution of the total dislocation density ρ [m⁻²]; (b) evolution of the standard deviation $\sigma_\rho(t)$ (Eq. 4.7); both plotted against $\varepsilon^{*}_{VM}$
These results are consistent with experimental observations, which show a much more homogeneous and stable slip mode than in the shearable particle cases ([Martin 80], [Edwards & Martin 82]). The effective dispersal of slip by the non-shearable particles can explain the formation of a more homogeneous slip mode.
TEM micrographs of intense slip bands formed in fatigue-tested Waspaloy are shown in Fig. 4.44. The positions of a few of the large particles are indicated in Fig. 4.44(b) to aid visualization. Although there are large discrepancies between the simulations and the experiments concerning the particle sizes and the magnitude of the applied plastic strain⁸, the micrographs show the same features as the simulated dislocation microstructure: dislocation tangles form around the large particles, residual dislocations are present between the particles (Fig. 4.44(b)), and the slip bands are more diffuse than in the shearable particle case (Fig. 4.44(a)).
⁸ Particle radius = 15 nm and 80 nm, grain size ∼50 µm, volume fraction > 40%, $\Delta\varepsilon_p = 10^{-2}$
Clear channels of totally sheared particles no longer form, owing to the effective dispersal of dislocations by the large particles. Fig. 4.45(a) shows the distribution of the residual strength of the small particles after seven fatigue cycles ('Bimodal3' case). Compared with the mono-modal case (see Fig. 4.31), the final particle strength distribution between the initial and the final strength is significantly broader. The spatial distribution of the sheared particles in Fig. 4.45(b) also shows that the shearing-off of the small particles does not occur in a confined channel but rather over a more distributed area than in Fig. 4.32(a).
Figure 4.43: Details of intense slip band of the 'Bimodal2' case after four fatigue cycles: (a) 3D image of intense slip bands; (b) isolation of intense slip band
Figure 4.44: TEM micrographs of intense slip bands of fatigue-tested Waspaloy: (a) slice normal to the primary plane; (b) slice parallel to the primary plane
Figure 4.45: Statistical and spatial distribution of the strength of small particles: (a) distribution by particle strength (log frequency from 10⁻⁴ to 1 versus $\tau_{facet}$ [MPa], 160–300); (b) spatial distribution of sheared particles ((110) view, colour scale $\tau_{facet}$ [MPa], 160–300)
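A distribution like Fig. 4.45(a) is a histogram of facet strengths plotted on a logarithmic frequency axis. The sketch below shows one way to build it from a list of residual facet strengths; the 160–300 MPa range matches the axis of Fig. 4.45(a), while the facet values and the function name are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def log_frequency_histogram(strengths_mpa: Sequence[float], lo: float, hi: float,
                            n_bins: int) -> Dict[float, float]:
    """Return {bin lower edge (MPa): log10(relative frequency)} for non-empty bins."""
    width = (hi - lo) / n_bins
    counts: Counter = Counter()
    for s in strengths_mpa:
        if not lo <= s <= hi:
            raise ValueError(f"strength {s} MPa outside [{lo}, {hi}]")
        # Clamp so that s == hi falls into the last bin instead of overflowing.
        k = min(int((s - lo) / width), n_bins - 1)
        counts[k] += 1
    total = len(strengths_mpa)
    return {lo + k * width: math.log10(c / total) for k, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical facet strengths: most facets intact, a few partly sheared.
    facets = [300.0] * 7 + [230.0] * 2 + [160.0]
    for edge, logf in log_frequency_histogram(facets, 160.0, 300.0, 7).items():
        print(f"{edge:6.1f} MPa: log10(freq) = {logf:.3f}")
```

A broad distribution, as in the bimodal case, shows up as many populated bins between the initial and final strength, whereas channel-like shearing concentrates the sheared facets in a few bins near the final strength.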
The addition of large particles reduces the cyclic softening seen in the mono-modal case ($r_p$ = 160 nm), and the stress amplitudes of the bimodal cases lie between those of the two mono-modal cases presented before ($r_p$ = 160 nm and 400 nm), although the stress differences between the bimodal cases are rather small, as shown in Fig. 4.46.
The deformed surface of the 'Bimodal2' case after four fatigue cycles is shown in Fig. 4.47(a). Because the associated slip bands are dispersed in the volume, the steps spread over a larger part of the free surface than in the shearable particle case (Fig. 4.38(a)). In addition, the surface step morphologies in Fig. 4.47(b) shift from the tongue-like to the ribbon-like type, although to a lesser extent than in the large-particle mono-modal case (Fig. 4.40).
Figure 4.46: Cyclic response curves of the bimodal cases compared with the mono-modal cases (stress [MPa], 200–500, versus cumulative plastic strain, 0 to 1.6 × 10⁻²; curves: Shearable, Bimodal 1, Bimodal 2, Bimodal 3, Non-shearable)
Figure 4.47: Surface morphology of the 'Bimodal2' case: (a) deformed surface after four fatigue cycles; (b) evolution of surface steps above intense slip bands
4.3.7 Summary
The simulated fatigue properties of materials hardened by shearable and non-shearable particles are in good qualitative agreement with experimental observations. Simple geometries are used for the simulation volume and the particles, and the evolution of the particle strength by shearing-off is also modelled in a simplified manner. The differences in the microstructural and mechanical fatigue features between the shearable and non-shearable particle cases can be summarized as follows.
1. Material hardened by shearable particles
• High magnitude and accumulation rate of $\rho_{tot}$
• Formation of thin slip bands with high local dislocation density
• Ladder-like structures along the primary Burgers vector direction, and small tangles between the particles
• Clear channels of particles with low residual strength
• Tongue-like surface markings
• High initial stress amplitude followed by severe softening
2. Material hardened by non-shearable particles
• Low accumulation rate of $\rho_{tot}$
• Larger slip band thickness and reduced inter-band spacing (more than one slip band formed in the volume)
• Dense tangles around particles and complex dislocation structures between pairs of closely spaced particles
• Ribbon-like surface markings
• Intermediate initial stress and rather stable cyclic response
In the shearable particle case, a detailed investigation of slip band formation shows that the bands are made of closely spaced edge dipolar loops. This is due to the limited glide distance of the double cross-slipped screw segments, which have only a short easy glide path. In the non-shearable particle case, the dense dislocation tangles around the particles are attributed to the formation of Orowan loops and their subsequent interactions with gliding dislocations; these tangles act as pinning points for relatively mobile dislocations and contribute to the formation of complex dislocation structures in the vicinity of the large particles.
The addition of non-shearable particles (the bimodal cases) promotes dispersion of the slip bands, which results in retarded slip localization and more diffuse slip bands. The stress amplitude and the characteristics of the slip markings lie between those of the two mono-modal cases.
It is observed that the large particles ($r_p$ = 400 nm) in the bimodal cases are also significantly sheared off. Fig. 4.48 shows the residual facet strength after 7 fatigue cycles for the bimodal case (Fig. 4.48(a)) and after 6 cycles for the mono-modal case (Fig. 4.48(b)). The large particles are clearly more sheared off in the bimodal case: the minimum strength is 5128 MPa in Fig. 4.48(a) versus 7227 MPa in Fig. 4.48(b). This implies that significant softening and damage could eventually take place in the large particles of a bimodal particle size distribution. The small difference in the number of cycles does not seem to have much influence on this observation.
Figure 4.48: Comparison of the strength of particles with radius $r_p$ = 400 nm in (a) the bimodal case after 7 fatigue cycles and (b) the mono-modal case after 6 fatigue cycles ((110) view, colour scale $\tau_{facet}$ [MPa], 3000–7400)
Because of the relatively small number of simulated fatigue cycles, quantitative analyses of slip irreversibility are not presented in this work. The computational limitations of the simulations lie in the maximum possible number of segments (related to memory capacity) and the poor load balance (see Fig. 3.21(b)). As already pointed out in Sec. 3.3.6 and 3.4.4, the computational performance can be further increased by
1. Decomposing the data space
Each processor would hold only the data necessary and sufficient for its own computation, which would allow larger simulations within the available memory.
2. Revising the load balancing scheme
Fatigue simulations have poor load balance because of the highly heterogeneous dislocation microstructure involved; moreover, the geometry of the simulation volume (a cylinder), shown in Fig. 3.21(b), is not easy to decompose into the set of cubic boxes required by the parallelization scheme. A more efficient load balancing scheme is therefore highly desirable.
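To make the load balancing issue concrete, the sketch below measures imbalance as the ratio of the maximum to the mean per-processor work, and compares a uniform split of the simulation volume with a work-weighted split. It uses a one-dimensional decomposition into slabs along the cylinder axis with hypothetical per-slab segment counts; this is one possible alternative scheme for illustration, not the box-based decomposition used in the parallel code, and all names are illustrative.

```python
from typing import List, Sequence


def load_imbalance(loads: Sequence[float]) -> float:
    """max/mean of per-processor work; 1.0 means perfect balance."""
    return max(loads) / (sum(loads) / len(loads))


def partition_slabs(work: Sequence[float], n_proc: int) -> List[int]:
    """Start index of each processor's contiguous run of slabs.

    A new processor's range starts at a slab whose midpoint, in cumulative
    work, reaches the current processor's ideal share total / n_proc.
    May return fewer than n_proc starts if a few slabs dominate the work.
    """
    total = sum(work)
    starts = [0]
    acc = 0.0
    for i, w in enumerate(work):
        boundary = total * len(starts) / n_proc
        if len(starts) < n_proc and i > starts[-1] and acc + w / 2 >= boundary:
            starts.append(i)
        acc += w
    return starts


def loads_from_starts(work: Sequence[float], starts: Sequence[int]) -> List[float]:
    """Total work of each processor given the start index of its slab range."""
    bounds = list(starts) + [len(work)]
    return [sum(work[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical segment counts per slab, with an intense slip band
    # concentrating the work in slabs 4 and 5.
    work = [1, 1, 1, 1, 10, 10, 1, 1]
    for label, starts in (("uniform", [0, 3, 6]), ("work-weighted", partition_slabs(work, 3))):
        loads = loads_from_starts(work, starts)
        print(label, starts, loads, round(load_imbalance(loads), 2))
```

With the hypothetical data, the uniform split gives loads of 3, 21 and 2 (imbalance about 2.4), while the work-weighted split gives 4, 10 and 12 (about 1.4). Since the microstructure evolves during cycling, such a repartition would have to be recomputed periodically.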
The good qualitative agreement of the microstructural and mechanical features of fatigue between the simulations and the experiments is, however, very promising for the development of fatigue-life models. It is generally agreed that the irreversible fraction of the cumulative cyclic strain describes the fatigue life-controlling mechanisms well. As in experimental efforts to describe the state of damage ([Coupeau & Grilhe 99], [Cretegny & Saxena 01]), various parameters are measured during the simulations. Parameters such as the surface topology, slip band width and band spacing are direct outputs of the simulations. The evolution of the elastic energy inside the slip bands is also accessible by post-processing the simulated dislocation structures. Although each simulation requires considerable computing time⁹, the flexibility of the simulations makes possible an extensive study of the effects of various parameters such as geometry (grain size etc.), particle characteristics (particle size, volume fraction, strength etc.) and boundary conditions (applied plastic strain etc.). Compiling this information will help build fatigue crack nucleation criteria based on the intrinsic microstructural features involved.
Key points
• Image stresses due to a 3D particle are computed using the FEM/DDD coupling code explained in Sec. 2.4.2. Cylindrical, spherical and cubical particles are considered, and interaction forces along both the glide and climb directions are shown.
• The effect of the elastic modulus difference is investigated, focusing on the flow stress and the subsequent hardening behavior. A simple configuration involving two particles is used for the computation.
• Fatigue simulations are performed using the new parallel DDD code. The effects of particles (shearable or non-shearable) on fatigue properties such as the intense slip band microstructure, the cyclic mechanical response and the surface markings are investigated. The simulated results are compared qualitatively with the available experimental observations. Bimodal particle distributions are also simulated. The simulations can be used effectively to build fatigue-life models based on the intrinsic microstructural features involved.
⁹ Each fatigue simulation presented in this work took 4–7 days on 9 processors of an IBM P690 system provided by KISTI (Korea Institute of Science and Technology Information).
Chapter 5
Conclusions and perspectives
At the beginning of this thesis, we presented the details of the 3D discrete dislocation dynamics method. Efforts were made to elucidate the theoretical background and the assumptions underlying the method, in order to improve and extend its applicability, and also to provide good guidance for newcomers to this field, especially those who are not French speakers, since this is the first thesis from the French group written in English. The method used to discretize the simulation space and the dislocation lines can readily be applied to other crystal structures, and anisotropic stress fields and various forms of dislocation mobility can be adopted according to the needs of the research.
Besides the compilation of the existing components, important new elements have been added to the 3D DDD method: the computation of the displacement fields of dislocations, the implementation of internal interfaces, and periodic boundary conditions. These new features open a wide range of research areas in which the DDD code can be used.
The computation of the displacement fields has been applied successfully both to the study of surface markings during cyclic loading and to the enforcement of displacement boundary conditions in the code coupled with CAST3M, although the latter is not presented in this work.
The internal interfaces represented by facets have been used effectively for the particles in precipitation-hardened metals. This method can open up a number of studies involving internal interfaces, such as plasticity in polycrystals and in multilayer films ([Verdier 04]), which involve grain boundaries and interfaces between films, respectively.
The periodic boundary conditions were applied to the simulation of the Stage I–II transition. They are now being applied to study the effect of the polarization of forest dislocations on the critical stress of a gliding dislocation line, in which periodicity is enforced along the line and along the glide direction of the moving dislocation.
Although it is not studied extensively in this work, junction formation and its representation in dislocation dynamics methods are being widely investigated. Among the many important issues, collinear junctions ([Madec et al. 03]) and glissile junctions¹ are of particular interest.
The use of linked lists of segments and the decomposition of the orthorhombic simulation volume into homothetic boxes significantly increase computational efficiency and greatly reduce computing time, with only minor errors in the stress computation. This allows massive simulations of bulk materials under homogeneous loading conditions.
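The core of the box method can be sketched as binning segments into boxes and restricting direct segment-segment interactions to a box and its neighbours. The sketch below, under stated assumptions, groups segment ids by the box containing their midpoint (Python lists stand in for the linked lists) and collects the near-field segments for a given box in a non-periodic volume; the far-field treatment of distant boxes is not shown, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
BoxId = Tuple[int, int, int]


def bin_segments(midpoints: List[Vec], origin: Vec, box_size: Vec,
                 n_boxes: BoxId) -> Dict[BoxId, List[int]]:
    """Group segment ids by the box containing their midpoint.

    The per-box lists play the role of the linked lists of segments.
    Midpoints are assumed to lie inside the simulation volume.
    """
    boxes: Dict[BoxId, List[int]] = defaultdict(list)
    for seg_id, p in enumerate(midpoints):
        idx = tuple(min(int((p[k] - origin[k]) / box_size[k]), n_boxes[k] - 1)
                    for k in range(3))
        boxes[idx].append(seg_id)
    return dict(boxes)


def near_field_segments(box: BoxId, boxes: Dict[BoxId, List[int]],
                        n_boxes: BoxId) -> List[int]:
    """Segments in `box` and its up to 26 neighbours (non-periodic volume).

    These would be treated by direct segment-segment stress computation;
    more distant boxes would use a cheaper box-level approximation (not shown).
    """
    found: List[int] = []
    for d in product((-1, 0, 1), repeat=3):
        nb = tuple(box[k] + d[k] for k in range(3))
        if all(0 <= nb[k] < n_boxes[k] for k in range(3)):
            found.extend(boxes.get(nb, []))
    return sorted(found)


if __name__ == "__main__":
    mids = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (2.5, 2.5, 2.5)]
    boxes = bin_segments(mids, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (3, 3, 3))
    print(near_field_segments((0, 0, 0), boxes, (3, 3, 3)))
    print(near_field_segments((2, 2, 2), boxes, (3, 3, 3)))
```

The cost of the near-field search then grows with the number of segments per neighbourhood rather than with the total number of segments, which is the source of the speedup; the price is the approximation made for distant boxes.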
Although the box method significantly improved the computational efficiency of the 3D DDD method, it was still infeasible to include many particles in the simulation volume. A parallel version of the method has therefore been developed.
A distributed memory system and the MPI standard were chosen to develop the parallel DDD code, since distributed memory architectures are the mainstream of parallel computers and are likely to remain so for the time being.
The new parallel code is designed around the box method: the boxes dividing the simulation volume are grouped into parallelepiped subsystems. The parallel scheme developed in this work has several advantages: most of the serial code can be reused without modification, and only a relatively short development time was needed (less than 4 months). The speedup obtained is nevertheless quite satisfactory. In particular, the efficiency of the internal stress computation is 100%, so anisotropic stress solutions can be incorporated at the same computing cost as isotropic solutions by using several processors. The requirement that at least three boxes exist along each axis of an individual subsystem, however, limits the number of processors that can be used simultaneously. Better strategies for load balancing and data-space decomposition would be highly desirable to improve the efficiency and applicability of the new parallel code.
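The processor limit imposed by the three-boxes-per-axis requirement, and the efficiency figure quoted above, can be made concrete with a small calculation. The sketch assumes that each axis is split independently into an integer number of subsystems, so that at most ⌊n/3⌋ subsystems fit along an axis with n boxes; it also uses the usual definition of parallel efficiency as speedup divided by the number of processors. The function names and the box counts are illustrative.

```python
from typing import Sequence


def max_processors(n_boxes: Sequence[int], min_boxes_per_axis: int = 3) -> int:
    """Largest processor count if every subsystem keeps at least
    min_boxes_per_axis boxes along each axis (axes split independently)."""
    p = 1
    for n in n_boxes:
        p *= n // min_boxes_per_axis
    return p


def parallel_efficiency(t_serial: float, t_parallel: float, n_proc: int) -> float:
    """Efficiency E = speedup / n_proc = t_serial / (n_proc * t_parallel)."""
    return t_serial / (n_proc * t_parallel)


if __name__ == "__main__":
    print(max_processors((10, 10, 10)))          # -> 27 (3 * 3 * 3)
    print(max_processors((12, 12, 24)))          # -> 128 (4 * 4 * 8)
    print(parallel_efficiency(900.0, 100.0, 9))  # -> 1.0, i.e. 100% efficiency
```

For a given box size, the processor count can therefore only grow by refining the box grid, which trades against the accuracy and cost of the box-level stress approximation.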
In parallel with our efforts, several groups have converted their own dislocation dynamics codes into parallel versions, notably at Lawrence Livermore National Laboratory (LLNL) and the University of California, Los Angeles (UCLA). Both use the nodal model.
¹ D. Weygand, conference 'Dislocations 2004', La Colle-sur-Loup, France, September 13–17, 2004
The image stresses due to a 3D particle were computed using the FEM/DDD coupling code. The interaction of a dislocation line with circular cylindrical, spherical and cubical particles of differing elastic modulus was investigated. The computation method was validated by comparison with the corresponding analytical solutions. It was shown that image stresses need to be taken into account, especially in the study of local events around the particles, e.g. the computation of the energy state around a particle and the calculation of creep threshold stresses at high temperatures.
In these models, the whole simulation volume must be meshed because the geometrical symmetries are broken by the heterogeneous force boundary conditions due to a dislocation. Consequently, the FEM computation is relatively costly, both in CPU time and in required memory. Force profiles fitted to the computed data can be used as approximate solutions for the interactions due to the elastic modulus mismatch. For the dynamics, however, a parallel finite element method would be beneficial and would serve as a good tool for studying the plasticity of multilayer films, for example.
The effect of the elastic modulus mismatch was investigated, focusing on the flow stress and the subsequent hardening behavior, using a simple geometry involving two particles. Because the image stresses are short-ranged and paraelastic, they have only minor effects on the yield stress but significant effects on the work hardening rate. The image stresses were also found to affect local events such as cross slip and climb significantly.
The fatigue simulations were performed using the internal interfaces represented by facets and the new parallel DDD program. The character of shearable and non-shearable particles and the evolution of particle strength by shearing-off were represented in a simplified manner by adjusting the strength of the facets.
Major features of the fatigue properties of materials hardened by shearable and non-shearable particles are well reproduced by the simulations, e.g. the microstructure of the intense slip bands, the cyclic mechanical response and the surface markings. The simulated results were compared with the available experimental observations and showed good qualitative agreement. A mechanism of intense slip band formation is proposed on the basis of the simulated dislocation microstructure.
The flexibility of the simulations allows an extensive study of the effects of various parameters such as geometry (grain size etc.), particle characteristics (particle size, volume fraction, strength etc.) and the applied plastic strain. Compiling this information will help build fatigue crack nucleation criteria based on the intrinsic microstructural features involved.
To build a reliable fatigue life model, however, it is imperative to increase the number of simulated fatigue cycles. For this purpose, the efficiency and performance of the new parallel code need to be improved by adopting better strategies for load balancing and data-space decomposition, so as to increase the maximum number of dislocation segments and particles.
In conclusion, the methods that we have developed and verified have been applied to dislocation-precipitate interactions and open many paths to new and interesting research areas.
Bibliography
[Abraham 97] Abraham F. F., Portrait of a crack: Rapid fracture mechanics using parallel molecular dynamics, IEEE Computational Science & Engineering, Vol. 4, No. 2, 1997, pp. 66–77.
[Aoyama & Nakano 99] Aoyama Y. & Nakano J., Practical MPI Programming (RS/6000 SP), IBM Redbooks, Vervante, 1999.
[Bacon et al. 73] Bacon D. J., Kocks U. F. & Scattergood R. O., The effect of dislocation self-interaction on the Orowan stress, Phil. Mag., Vol. 28, 1973, p. 1241.
[Barnett 85] Barnett D. M., The displacement field of a triangular dislocation loop, Phil. Mag. A, Vol. 51, No. 3, 1985, pp. 383–387.
[Bathe 96] Bathe K. J., Finite Element Procedures, Prentice-Hall International, Inc., 1996.
[Brown 64] Brown L. M., The self-stress of dislocations and the shape of extended nodes, Phil. Mag., 1964, pp. 441–466.
[Bulatov et al. 01] Bulatov V. V., Rhee M. & Cai W., Periodic boundary conditions for dislocation dynamics simulations in three dimensions, Mat. Res. Soc. Symp. Proc., ed. by Kubin L. P., Selinger R. L., Bassani J. L. & Cho K., 2001.
[Calabrese & Laird 74] Calabrese C. & Laird C., Cyclic stress-strain response of two-phase alloys, Parts I and II, Materials Science and Engineering, Vol. 13, 1974, pp. 141–174.
[Canova & Kubin 91] Canova G. R. & Kubin L. P., Dislocation microstructures and plastic flow: a three dimensional simulation, Continuum Models and Discrete Systems, ed. by Maugin G. A., 1991.
[Chen et al. 99] Chen B. T., Zhang T. Y. & Lee J. K., Interaction of an edge dislocation with an elliptical hole in a rectilinearly anisotropic body, Mech. of Mat., Vol. 31, 1999, p. 71.
[Clavel & Pineau 82] Clavel M. & Pineau A., Fatigue behaviour of two nickel-base alloys I: Experimental results on low cycle fatigue, fatigue crack propagation and substructures, Materials Science and Engineering, Vol. 55, 1982, pp. 157–171.
[Cleveringa et al. 97] Cleveringa H. H. M., Van der Giessen E. & Needleman A., Comparison of discrete dislocation and continuum plasticity predictions for a composite material, Acta Materialia, Vol. 45, No. 8, 1997, pp. 3163–3179.
[Comninou & Dundurs 72] Comninou M. & Dundurs J., Long-range interaction between a screw dislocation and a spherical inclusion, J. Appl. Phys., Vol. 43, 1972, p. 2461.
[Coupeau & Grilhe 99] Coupeau C. & Grilhe J., Quantitative analysis of surface effects of plastic deformation, Materials Science and Engineering A, Vol. 271, 1999, pp. 242–250.
[Cretegny & Saxena 01] Cretegny L. & Saxena A., AFM characterization of the evolution of surface deformation during fatigue in polycrystalline copper, Acta Materialia, Vol. 49, No. 18, 2001, pp. 3647–3887.
[Demmel et al. 93] Demmel J., Heath M. & van der Vorst H., Parallel numerical linear algebra, Acta Numerica 1993, 1993.
[Déprés 04] Déprés C., Modélisation physique des stades précurseurs de l'endommagement en fatigue, PhD thesis, Institut National Polytechnique de Grenoble, 2004.
[Déprés et al. 03] Déprés C., Fivel M., Robertson C. F., Fissolo A. & Verdier M., Etude des stades précurseurs de l'endommagement en fatigue: expériences et simulations à l'échelle des dislocations, Journal de Physique IV, Vol. 106, 2003, pp. 81–90.
[Déprés et al. 04] Déprés C., Robertson C. F. & Fivel M., Low-strain fatigue in 316L steel surface grains: a three dimensional discrete dislocation dynamics modelling of the early cycles. Part 1: Dislocation microstructures and mechanical behaviour, Phil. Mag., Vol. 84, No. 22, 2004, pp. 2257–2275.
[Devincre 95] Devincre B., Three dimensional stress field expressions for straight dislocation segments, Solid State Communications, Vol. 93, No. 11, 1995, pp. 875–878.
[Devincre et al. 01] Devincre B., Kubin L. P., Lemarchand C. & Madec R., Mesoscopic simulations of plastic deformation, Materials Science and Engineering, Vol. A309–310, 2001, pp. 211–219.
[Devincre & Roberts 96] Devincre B. & Roberts S., Three-dimensional simulation of dislocation-crack interactions in b.c.c. metals at the mesoscopic scale, Acta Materialia, Vol. 44, No. 7, 1996, pp. 2891–2900.
[dewit 67] deWit R., Some relations for straight dislocations, Phys. Stat. Sol., Vol. 20, 1967, pp. 567–573.
[Diehl 56] Diehl J., Z. Metallk., Vol. 47, 1956, p. 331.
[Dongarra et al. 98] Dongarra J. J., Duff I. S., Sorensen D. C. & van der Vorst H. A., Numerical Linear Algebra for High-Performance Computers, SIAM, Philadelphia, 1998.
[Ebeling & Ashby 66] Ebeling R. & Ashby M. F., Dispersion hardening of copper single crystals, Phil. Mag., Vol. 13, 1966, p. 805.
[Edwards & Martin 82] Edwards L. & Martin J. W., Proc. of 6th Int. Conf. on the Strength of Metals and Alloys, ed. by Gifkins R. C., 1982.
[Essmann & Mughrabi 79] Essmann U. & Mughrabi H., Annihilation of dislocations during tensile and cyclic deformation and limits of dislocation densities, Phil. Mag. A, Vol. 40, No. 6, 1979, pp. 731–756.
[Fahrat & Roux 94] Farhat C. & Roux F. X., Implicit parallel processing in structural mechanics, Computational Mechanics Advances, Vol. 2, No. 1, 1994.
[Fisher et al. 53] Fisher J. C., Hart E. W. & Pry R. H., The hardening of metal crystals by precipitate particles, Acta Metallurgica, Vol. 1, 1953, p. 336.
[Fivel 97] Fivel M., Études numériques à différentes échelles de la déformation plastique des monocristaux de structure CFC, PhD thesis, Institut National Polytechnique de Grenoble, 1997.
[Fivel & Canova 99] Fivel M. & Canova G. R., Developing rigorous boundary conditions to simulations of discrete dislocation dynamics, Modelling Simul. Mater. Sci. Eng., Vol. 7, 1999, pp. 753–768.
[Fivel et al. 96] Fivel M., Gosling T. J. & Canova G. R., Implementing image stresses in a 3D dislocation simulation, Modelling Simul. Mater. Sci. Eng., Vol. 4, No. 6, 1996, pp. 581–596.
156 BIBLIOGRAPHY<br />
[Fivel et al. 98] Fivel M., Robertson C. F., Canova G. R. & Boulanger L., 3d modeling of indent-<br />
induced plastic zone at a mesoscale, Acta Materialia, Vol. 7, 1998, pp. 6183–6194.<br />
[Foreman 67] Foreman A. J. E., The bowing of a dislocation segment, Phil. Mag., Vol. 15, 1967,<br />
pp. 1011–1021.<br />
[Foreman & Makin 66] Foreman A. J. E. & Makin M. J., Dislocation movement through random<br />
arrays of obstacles, Phil. Mag., Vol. 14, 1966, p. 911.<br />
[Fusenig & Nembach 75] Fusenig K. D. & Nembach E., Dynamic dislocation effects in precipi-<br />
tation hardened materials, Acta metall. mater., Vol. 41, 1975, pp. 3181–3189.<br />
[Gerold & Steiner 82] Gerold V. & Steiner D., Fatigue softening in precipitation-hardened<br />
copper-cobalt, Scripta Metallurgica, Vol. 16, 1982, pp. 405–408.<br />
[GG et al. 00] GómezGarcía D., Devincre B. & Kubin L. P., Forest hardening and boundary con-<br />
ditions in 2d simulations of dislocations dynamics, Mat. Res. Soc. Symp. Proc., ed. by Robertson<br />
I. M., Lassila D. H., Devincre B. & Phillips R., 2000.<br />
[Ghoniem et al. 00] Ghoniem N. M., Singh B. N., Sun L. Z. & de la Rubia T. D., Interaction<br />
and accumulation of glissile defect clusters near dislocations, J. Nucl. Mater., Vol. 276, 2000, pp.<br />
166–177.<br />
[Giessen & Needleman 95] Van der Giessen E. & Needleman A., Discrete dislocation plasticity:<br />
a simple planar model, Modelling Simul. Mater. Sci. Eng., Vol. 3, 1995, pp. 689–735.<br />
[Graf & Hornbogen 78] Graf M. & Hornbogen E., The effect of inhomogeneity of cyclic strain<br />
on initiation of cracks, Scripta Metallurgica, Vol. 12, 1978, pp. 147–150.<br />
[Gullouglu et al. 89] Gulluoglu A. N., Srolovitz D. J., LeSar R. & Lomdahl P. S., Dislocation<br />
distributions in two dimensions, Scripta Metallurgica, Vol. 23, 1989, p. 1347.<br />
[Hirth & Lothe 92] Hirth J. P. & Lothe J., Theory of Dislocations, Krieger Publishing Company,<br />
Malabar, Florida, 1992.<br />
[Hull & Bacon 83] Hull D. & Bacon D. J., Introduction to Dislocations, Pergamon Press,<br />
1983, p. 96.<br />
[Humphreys & Martin 67] Humphreys F. J. & Martin J. W., Phil. Mag., Vol. 16, 1967, p. 927.
[Khraishi et al. 00a] Khraishi T. A., Zbib H. M., Hirth J. P. & de la Rubia T. D., The stress field<br />
of a general circular Volterra dislocation loop: Analytical and numerical approaches, Phil. Mag.,<br />
Vol. 80, 2000, pp. 95–105.<br />
[Khraishi et al. 00b] Khraishi T. A., Zbib H. M., Hirth J. P. & Khaleel M., The displacement,<br />
and strain-stress fields of a general circular Volterra dislocation loop, Int. J. Eng. Sci., Vol. 38,<br />
2000, pp. 251–266.<br />
[Kobashi & Ohr 80] Kobashi S. & Ohr S. M., Phil. Mag. A, Vol. 42, 1980, p. 763.<br />
[Kocks et al. 75] Kocks U. F., Argon A. S. & Ashby M. F., Thermodynamics and kinetics of slip,<br />
Progress in Materials Science, Vol. 19, ed. by Chalmers B., Christian J. W. & Massalski T. B., Pergamon Press, 1975.<br />
[Lee & Laird 83] Lee J. K. & Laird C., Strain localization during fatigue of precipitation-hardened<br />
aluminium alloys, Phil. Mag., Vol. 47A, 1983, pp. 579–597.<br />
[Lépinoux & Kubin 87] Lépinoux J. & Kubin L. P., The dynamic organization of dislocation<br />
structures: a simulation, Scripta Metallurgica, Vol. 21, 1987, pp. 833–837.<br />
[Li 64] Li J. C. M., Stress field of a dislocation segment, Phil. Mag., Vol. 10, 1964, pp. 1097–1098.<br />
[Li & Laird 94] Li Y. & Laird C., Cyclic response and dislocation structures of AISI 316L stainless<br />
steel. Part 1: Single crystals fatigued at intermediate strain amplitude, Materials Science and<br />
Engineering A, Vol. 186, No. 1–2, 1994, pp. 65–86.<br />
[Madec 01] Madec R., Des intersections entre dislocations à la plasticité du monocristal CFC;<br />
Étude par dynamique des dislocations, PhD thesis, Université Paris XI Orsay, 2001.<br />
[Madec et al. 03] Madec R., Devincre B., Kubin L. P., Hoc T. & Rodney D., The role of collinear<br />
interaction in dislocation-induced hardening, Science, Vol. 301, No. 5641, 2003, pp. 1879–1882.<br />
[Madec et al. 04] Madec R., Devincre B. & Kubin L. P., On the use of periodic boundary<br />
conditions in dislocation dynamics simulation, Mesoscopic Dynamics in Fracture Process and Strength<br />
of Materials, ed. by Shibutani Y. & Kitagawa H., 2004.<br />
[Man et al. 02] Man J., Obrtlik K., Blochwitz C. & Polák J., Atomic force microscopy of surface<br />
relief in individual grains of fatigued 316L austenitic stainless steel, Acta Materialia, Vol. 50, 2002,<br />
pp. 3767–3780.
[Marquis & Dunand 02] Marquis E. A. & Dunand D. C., Model for creep threshold stress in<br />
precipitation-strengthened alloys with coherent particles, Scripta Materialia, Vol. 47, 2002, p.<br />
503.<br />
[Martin 80] Martin J. W., Micromechanisms in particle-hardened alloys, Cambridge Solid State<br />
Science Series, ed. by Cahn R. W., Thompson M. W. & Ward I. M., 1980.<br />
[Mason 68] Mason W. P., Dislocation dynamics, McGraw-Hill, 1968.<br />
[Melander & Persson 78] Melander A. & Persson P. A., The strength of a precipitation hardened<br />
AlZnMg alloy, Acta Metallurgica, Vol. 26, 1978, p. 267.<br />
[Mohles & Nembach 01] Mohles V. & Nembach E., The peak- and overaged states of particle<br />
strengthened materials: computer simulations, Acta Materialia, Vol. 49, 2001, p. 2405.<br />
[Moore 65] Moore G. E., Cramming more components onto integrated circuits, Electronics,<br />
Vol. 38, No. 8, 1965.<br />
[Mughrabi 83] Mughrabi H., Deformation of multi-phase and particle containing materials,<br />
Proceedings of the 4th Risø International Symposium on Metallurgy and Materials Science, ed. by<br />
Bilde-Sørensen J. B., Hansen N., Horsewell A., Leffers T. & Lilholt H., 1983.<br />
[Mughrabi 85] Mughrabi H., Dislocation Properties in Real Materials, Book No. 323, The Institute<br />
of Metals, London, 1985.<br />
[Nembach 83] Nembach E., Phys. Stat. Sol., Vol. 78, 1983, p. 571.<br />
[Nembach 97] Nembach E., Particle strengthening of metals and alloys, John Wiley and Sons,<br />
1997.<br />
[Obrtlik et al. 94] Obrtlik K., Kruml T. & Polák J., Dislocation structures in 316L stainless steel<br />
cycled with plastic strain amplitudes over a wide interval, Materials Science and Engineering A,<br />
Vol. 187, No. 1, 1994, pp. 1–10.<br />
[Reppich 93] Reppich B., Particle strengthening, Mater. Sci. Technol., Vol. 6, 1993, pp. 311–357.<br />
[Rhee et al. 01] Rhee M., Stolken J. S., Bulatov V. V., de la Rubia T. D., Zbib H. M. & Hirth<br />
J. P., Dislocation stress fields for dynamic codes using anisotropic elasticity: methodology and<br />
analysis, Materials Science and Engineering, Vol. A309-310, 2001, pp. 288–293.
[Risbet et al. 03] Risbet M., Feaugas X., Guillemer-Neel C. & Clavel M., Use of atomic force<br />
microscopy to quantify slip irreversibility in a nickel-base superalloy, Scripta Materialia, Vol. 49,<br />
2003, pp. 533–538.<br />
[Rodney & Phillips 99] Rodney D. & Phillips R., Structure and strength of dislocation junctions:<br />
an atomic-level analysis, Phys. Rev. Lett., Vol. 82, 1999, pp. 1704–1707.<br />
[Santare & Keer 86] Santare M. H. & Keer L. M., Interaction between an edge dislocation and<br />
a rigid elliptical inclusion, J. Appl. Mech., Vol. 53, 1986, p. 382.<br />
[Schmid & Boas 35] Schmid E. & Boas W., Kristallplastizität, Springer-Verlag, Berlin, 1935.<br />
[Schwarz 99] Schwarz K. W., Simulation of dislocations on the mesoscopic scale. I. Methods and<br />
examples, J. Appl. Phys., Vol. 85, No. 1, 1999, pp. 108–119.<br />
[Shenoy et al. 00] Shenoy V. B., Kukta R. V. & Phillips R., Mesoscopic analysis of structure<br />
and strength of dislocation junctions in fcc metals, Phys. Rev. Lett., Vol. 84, No. 7, 2000, pp.<br />
1491–1494.<br />
[Shin et al. 01] Shin C. S., Fivel M., Rodney D., Phillips R., Shenoy V. B. & Dupuy L., Formation<br />
and strength of junctions in fcc metals: a study by dislocation simulation and atomistic<br />
simulations, Journal de Physique IV, Vol. 11, No. Pr5, 2001, pp. 19–26.<br />
[Stoltz & Pineau 78] Stoltz R. E. & Pineau A., Dislocation-precipitate interaction and cyclic<br />
stress-strain behavior of a γ’-strengthened superalloy, Materials Science and Engineering, Vol. 34,<br />
1978, pp. 275–284.<br />
[Suresh 98] Suresh S., Fatigue of Materials, 2nd ed., Cambridge University Press, 1998.<br />
[Tang et al. 98] Tang M., Kubin L. P. & Canova G. R., Dislocation mobility and the mechanical<br />
response of bcc single crystals: a mesoscopic approach, Acta Materialia, Vol. 46, No. 9, 1998, pp. 3221–3235.<br />
[Urabe & Weertman 75] Urabe N. & Weertman J., Dislocation mobility in potassium and iron<br />
single crystals, Materials Science and Engineering, Vol. 18, 1975, p. 41.<br />
[Verdier 04] Verdier M., Plasticity in fine scale semi-coherent metallic films and multilayers,<br />
Scripta Materialia, Vol. 50, No. 6, 2004, pp. 769–773.
[Verdier et al. 98] Verdier M., Fivel M. & Groma I., Mesoscopic scale simulation of dislocation<br />
dynamics in fcc metals: Principles and applications, Modelling Simul. Mater. Sci. Eng., Vol. 6, No.<br />
6, 1998, pp. 755–770.<br />
[Vitek 75] Vitek V., Yielding from a crack with finite root-radius loaded in uniform tension, J.<br />
Mech. Phys. Solids, Vol. 24, 1975, p. 67.<br />
[Weeks et al. 69] Weeks R. W., Pati S. R., Ashby M. F. & Barrand P., The elastic interaction<br />
between a straight dislocation and a bubble or a particle, Acta Metallurgica, Vol. 17, 1969, p.<br />
1403.<br />
[Weygand et al. 01] Weygand D., Friedman L. H., Van der Giessen E. & Needleman A., Discrete<br />
dislocation modeling in three-dimensional confined volumes, Materials Science and Engineering,<br />
Vol. A309-310, 2001, p. 420.<br />
[Weygand et al. 02] Weygand D., Friedman L. H. & Van der Giessen E., Aspects of boundary-value<br />
problem solutions with three-dimensional dislocation dynamics, Modelling Simul. Mater.<br />
Sci. Eng., Vol. 10, 2002, pp. 437–468.<br />
[Zbib et al. 98] Zbib H. M., Rhee M. & Hirth J. P., On plastic deformation and the dynamics of<br />
3D dislocations, Int. J. Mech. Sci., Vol. 40, Nos. 2–3, 1998, pp. 113–127.<br />
[Zhou & Lung 88] Zhou S. J. & Lung C. W., An image force expression for the dislocation near<br />
a crack, J. Phys. F: Met. Phys., Vol. 18, 1988, p. 851.<br />
[Zhu & Starke 99] Zhu A. W. & Starke E. A., Computer experiment on superposition of strengthening<br />
effects of different particles, Acta Materialia, Vol. 47, 1999, p. 3263.<br />