Methodology

Evaluating European performance on the world stage for one particular year seems a reasonably straightforward exercise. The question, after all, is relatively simple: “Did Europeans do well or badly in this calendar year?” However, devising a methodology that makes rigorous and consistent judgments across issues and over time is a tricky enterprise, fraught with unsatisfying trade-offs and inevitable simplifications. Before explaining the methodology used in the Scorecard, we discuss some of the difficulties and dilemmas we faced while devising and revising it. This discussion is meant to offer some perspective on the choices we made (and on the revisions we introduced in 2015 after a five-year review of the project) and to ensure full transparency about the results.

Evaluating European foreign policy performance

Among the many difficulties involved in evaluating Europe’s performance in its external relations, two stand out: the problematic definition of success in foreign policy, and the rigidity of the time frame used.

What is a good European foreign policy?

The nature of international politics is such that “success” and “failure” are not as easily defined as they are in other areas of public policy. In particular, no quantitative measure can capture foreign-policy performance in the way that indicators such as the unemployment rate, the crime rate or pollution levels capture performance in economic or social policy. Diplomacy is more often about managing problems than fixing them: biding time, choosing the lesser of two evils, finding an exit strategy, saving face, and so on. States often pursue multiple objectives, and their order of priority is often unclear or disputed. This is even truer in the case of Europe: even when two member states agree on common objectives, they may disagree on which mix of those objectives, if met during the year, would constitute success in a given policy area.

This difficulty is compounded by the heterogeneous nature of foreign policy. Europeans expect their authorities to solve the Israeli-Palestinian conflict, to prevent the proliferation of nuclear weapons, to turn Bosnia and Herzegovina into a functioning state, to protect ships from pirates in the Gulf of Aden, to stabilise the eastern neighbourhood, to defend European values at the UN and speak up for human rights, to convince other countries to fight climate change, to open foreign markets for exporters, to impose European norms and standards on importers, and so on. “Success” is defined very differently in each case: it can be a matter of convincing other actors in a negotiation, building diplomatic coalitions, delivering humanitarian aid on the ground, imposing peace on a region torn by civil unrest, building a state, spreading global norms, etc. Moreover, Europe has very different abilities in each area, much as a student has different abilities in different subjects (e.g. mathematics, languages, physical education). This makes a unified grading system problematic, because it creates a dilemma between respecting the specificity of each “subject” on the one hand and ensuring that evaluations are comparable across the Scorecard on the other.

Grading the rate of success of Europeans (the “outcome” score) relies on a comparison between European objectives and the outcome in each year. But here the problem mentioned above resurfaces: who speaks for Europe? There is rarely a single entity to define what the European interest is, or what priorities and trade-offs are desirable when objectives conflict. Even where there is broad agreement on a policy, official texts rarely present the real extent of European objectives, or express them only in vague, consensual terms. Simply comparing stated objectives with results would therefore lead to an incomplete assessment of performance. It is generally necessary for us to go further and spell out explicitly what the European objectives were in a particular domain in order to compare them with results – a difficult and eminently political exercise.

What’s more, the causal link between a specific set of European policies on the one hand and results on the other is problematic. European objectives can sometimes be met regardless of the policy Europe puts in place to achieve them. For example, independent factors might change the context in which actors operate (e.g. forest fires in Russia, rather than EU influence, changed Moscow’s attitude towards climate change), or other states might help to attain the objectives sought by Europeans (e.g. the United States in getting China to support sanctions against Iran). But the opposite can also be true: failure can happen even with the optimal policies in place (e.g. the US Congress’s decision to abandon cap-and-trade legislation in spite of Europeans’ best efforts to persuade it otherwise).

This problem of causal disjuncture between policy and result led us to make two choices for the Scorecard. First, we do not try to sort out the reasons for European “success”, let alone offer a coefficient of European agency or credit. While we always specify other factors that contributed to a positive outcome, we deem Europeans to be successful if their objectives were met. In other words, they are not penalised for having been helped by others. Second, we clearly separate policy from results. The grade for each component gives equal weight to input, that is “unity” and “resources” (together graded out of 10), and outcome, that is “strategy” and “impact” (together graded out of 10), so that the reader can better appreciate the problematic correlation between the two. Very good policies and best efforts can meet outright failure (e.g. the failure to get the US Congress to move on climate change). However, the opposite situation rarely occurs: luck, it turns out, is not so prevalent in international affairs.

Beyond the question of merits and results lies the question of expectations. If the Scorecard has to spell out what European objectives were, it also has to define the yardstick for success, since there are no obvious or absolute reference points for assessing the underlying level of difficulty – and hence the level of success – in each area. We relied on judgment, based in each case on an implicit alternative universe representing the optimal input and outcome, against which actual European performance was measured. Although this approach drew on extensive expertise, it was necessarily subjective, not least because the benchmark had to be realistic while avoiding both excessively lowered ambitions and demands for impossible results. This is where the Scorecard is at its most political, and sometimes its most subjective.

It should also be noted that the relative nature of our judgment and the question of expectations contain an even more political question, that of European leverage – and this time the difficulty concerns both the policy score (i.e. “unity” and “resources”) and the results score (i.e. “strategy” and “impact”). We evaluate performance in the context of the calendar year, and try to be politically realistic about European possibilities and about what resources could be mobilised in support of a particular policy. But some observers might object that, with some extra will or leadership from the main actors, additional resources could have been mustered to increase European leverage, to the point of completely reconfiguring the political context of a particular issue. On the Israeli-Palestinian conflict, for example, some argue that Europe should take much more drastic and aggressive measures to reach its objectives: it could unilaterally recognise a Palestinian state, both at the United Nations and bilaterally, or suspend its Association Agreement with Israel and impose other trade sanctions. Accepting such proposals as realistic would change the score for “resources” and might also change the “outcome” grade. Here again, we have to make judgment calls about the adequacy of resources in the current European foreign-policy debate as we see it. It remains, however, a political judgment.

When does the clock stop?

A second set of problems has to do with the time frame of the Scorecard. Evaluating foreign-policy performance is difficult enough, but it becomes even more difficult when only events that took place during one calendar year are considered. It is well known that some policies that yielded remarkable results in the short term proved less effective, and sometimes even disastrous, in the long term – for example, western support for the mujahedeen in Afghanistan in the 1980s. The cost of other policy decisions has increased gradually over time – for example, the admission of Cyprus as an EU member state before the Northern Cyprus problem had been resolved. Since the Scorecard is an annual exercise, this has inevitably become an issue, especially when policies and actions that we rated highly in one year prove less compelling later on, and vice versa. An example is the enlargement process in the Western Balkans, which was among the highest-scoring components in Scorecard 2014 but lost momentum rapidly in the following year.

This dilemma is especially important when it comes to common foreign and security policy, since many aspects of the EU’s foreign relations take the form of long-term aid, development and rule-of-law programmes rather than short-term political initiatives. The Scorecard tries to strike a balance between, on the one hand, recognising the specificity, assets and successes of Europe as a different, new type of international power and, on the other, considering Europe as a traditional great power in the league of the US, China or Russia – a role it cannot escape in today’s world.

This is why, following a five-year review of the Scorecard as a project, we have decided from now on to split the ‘outcome’ score explicitly into two parts: ‘strategy’ and ‘impact’. The sum of these two scores (out of 10) will remain comparable to the ‘outcome’ scores in the first five editions of the Scorecard. But making explicit which part of those ten points reflects impact in a particular year (5 points) and which part reflects whether the policy is well designed (the other 5 points) allows us to value policies that may have a positive effect over the long term even if they have no chance of doing so in the current year. The Iran talks in the years before the interim deal of 2013 are a case in point.
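The arithmetic of a component grade, as described in the preceding paragraphs, can be illustrated with the minimal Python sketch below. It is not an ECFR tool: the class and field names are hypothetical, and it assumes that “unity” and “resources” are each marked out of 5, since the text only states that together they are graded out of 10. “Strategy” and “impact” are each out of 5, as stated above, and input and outcome carry equal weight in the component total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentScore:
    """Hypothetical record of the marks for one Scorecard component.

    Assumption: unity and resources are each marked out of 5 (the source
    only says that together they are out of 10). Strategy and impact are
    each out of 5, as the methodology describes.
    """

    unity: float
    resources: float
    strategy: float
    impact: float

    def __post_init__(self) -> None:
        # Reject any mark outside the assumed 0-5 range.
        for name in ("unity", "resources", "strategy", "impact"):
            value = getattr(self, name)
            if not 0 <= value <= 5:
                raise ValueError(f"{name} must be between 0 and 5, got {value}")

    @property
    def input_score(self) -> float:
        # Policy side: unity + resources, out of 10.
        return self.unity + self.resources

    @property
    def outcome_score(self) -> float:
        # Results side: strategy + impact, out of 10; comparable to the
        # single outcome mark used in the first five editions.
        return self.strategy + self.impact

    @property
    def total(self) -> float:
        # Input and outcome weigh equally in the component total (out of 20).
        return self.input_score + self.outcome_score


# Invented marks for a well-designed policy with little impact this year.
iran_like = ComponentScore(unity=4, resources=4, strategy=4, impact=1)
print(iran_like.input_score, iran_like.outcome_score, iran_like.total)  # 8 5 13
```

The marks in the example are invented for illustration only. They show how the split rewards design separately from impact: a policy like the pre-2013 Iran talks can earn most of the strategy points while scoring low on impact in a given year.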

This problem of normative judgment leads to a more general question: how far should we take into account what Europe is not doing? For example, should Europe get a bad grade because it has been largely absent for years from maritime disputes in Asia, where the future of world peace might be at stake? As discussed earlier, we have tried to strike a balance in the Scorecard. On the one hand, we have graded existing policies and taken into account the specificity of EU foreign policy and what Europe actually is (i.e. long-term programmes and a certain vision of what the international system should be). On the other hand, we have graded according to “great power” norms, emphasising what Europe ultimately should be (e.g. an assertive power playing the multipolar game).

The points above illustrate the difficulties and dilemmas involved in devising a methodology that can withstand criticism. This is why we call this project a scorecard rather than an index. Indices use hard quantitative data (e.g. UNDP’s Human Development Index; Brookings’ Iraq Index), scores given by observers to qualitative data (e.g. Freedom House’s Freedom in the World and Freedom of the Press indices; Transparency International’s Corruption Perceptions Index), or a mixture of both (e.g. the Institute for Economics and Peace’s Global Peace Index; the Legatum Institute’s Prosperity Index). A scorecard, by contrast, is transparent about the subjective nature of its judgments and the heterogeneity of the material it grades, and is therefore a better tool for appraising foreign-policy performance. After all, the grades one gets in school depend on the particular teacher doing the grading and are based on different criteria for each subject. However, this neither prevents the Scorecard from being meaningful nor means that its grades are arbitrary, especially when overall results are based on an average of a large number of exercises and on a scale kept as consistent as feasible across the board and over time.

Nevertheless, quantitative data on the balance of power in the EU’s relationships with other regions are instructive. For the first time in this edition of the Scorecard, we have included in the chapter introductions some of the key data that shape Europe’s relationship with the country or region in question, in order to give broader context to the individual policies examined in the components and some indication of the level of ambition involved in the policy decisions made.

Categorisation of member states

In the 2012 edition of the Scorecard, we began to explore the role played by individual member states in European foreign policy, in addition to evaluating European performance as a whole. However, we chose to add this second dimension of assessment on only a small number of issues because in many cases – particularly those where member states have empowered the EU institutions to negotiate or otherwise act on their behalf – it would make little sense to compare and contrast the roles they played. Focusing on a limited number of countries also means that we can devote some space to explaining the reasoning behind the categorisation in the chapter introductions.

In each of these components we categorised some member states as “leaders” and others as “slackers”. Other member states were simply “supporters” of common and constructive policies that were, in our view, in the European interest – a kind of default category that can encompass many different attitudes, from active support to passive acquiescence. Clearly, categorising member states in this way is not an exact science. Like the grading of European performance as a whole, each categorisation of a member state involves a political judgement and should therefore not be considered definitive. In particular, it assumes a normative judgement about what constitutes a policy in the European interest. In addition, given the diverse nature of the components of European foreign policy in the Scorecard, what it means to be a “leader” or “slacker” varies from case to case.

We identified member states as “leaders” when they either took the initiative in a constructive way or acted in an exemplary way (for example by devoting disproportionate resources). In other words, member states can “lead” either directly (by forcing or persuading other member states to take action) or indirectly (“leading by example”). Thus, in Scorecard 2014, we identified France as a “leader” on responding to the crises in Mali, the Central African Republic and South Sudan because it took the initiative in pushing for military intervention and led by example in offering troop contributions. On development aid, by contrast, we identify countries as leaders when they reach the agreed amount of aid or increase their funding.

Conversely, we identified member states as “slackers” when they either impeded or blocked the development of policies that serve the European interest in order to pursue their own narrowly defined or short-term national interests, or did not pull their weight (for example by failing to devote proportionate resources). In other words, member states can also “slack” either directly (by preventing other member states from taking action) or indirectly (by setting a bad example). Thus we identified the United Kingdom as a “slacker” on supporting a common policy on investment and market access in China in Scorecard 2015, because it consistently prioritised bilateral trade relations with China. On development aid, we identify member states as “slackers” when they either fail to increase low levels of aid or cut their aid budgets.

Year by year, the European External Action Service (EEAS) – a new institution when the Scorecard began, but now a fully established network – plays an increasingly significant role in many of the components that the Scorecard looks at, and indeed on many of the policy issues covered by the member-state categorisations. However, it does not seem helpful to categorise the EEAS in the same way as a member state, since its role is to co-ordinate and amplify the sum of the member states’ positions, and it has no domestic drivers of its own. Nevertheless, in many areas its growing role deserves attention (and conversely, where it is not playing as significant a role as it might, this may be worthy of comment). In Scorecard 2016 we have therefore introduced a section in each chapter introduction that looks at the role the EEAS and other relevant EU institutions have played in Europe’s relationship with the region or country in question. At this stage, the assessment is qualitative: scores are not attributed for the EU institutions’ performance, and we note the contribution of whichever institutions are relevant in each context, including the Commission, the Parliament and financial bodies where appropriate, but this part of

Explanation of research process

The Scorecard functions in four phases throughout the year:
In the first phase, experts for each of the six chapters (Russia; MENA; Wider Europe; United States; Asia and China; Multilateral issues and Crisis Management) draw up the list of “sub-issues” and “components” – the discrete elements that the Scorecard actually evaluates each year. This choice is obviously fundamental, as it determines what we assess within each of the six “issues”, and it is therefore the subject of intense discussion within the ECFR team and with the steering group.
In the second phase, the experts provide preliminary assessments of European performance (for the period from January to September) in each “component”, based on their own knowledge and a range of interviews with officials and specialists. In particular, they identify European objectives – a key precondition for evaluating performance.
The Scorecard team and the experts then devise questions for member states in order to better understand the dynamics of each component. In the third phase (October to November), questionnaires are sent to researchers in each of the 28 member states, who collect information from officials and analysts in their country on key decision points in the year. This provides a much more granular picture of European external relations on critical issues and forms the basis of the leader/slacker categorisations for member states. This national research process will also contribute quantitative data to each chapter introduction on how member states are allocating their resources in each region.
In the same phase, a separate researcher builds up a picture of where EU institutions have been active in each of the regions that the six chapters look at, so that the full extent of ‘European’ effort in the year can be reflected in the introduction to each chapter.
In the final phase (December to January), experts write the final assessments and the introductions for each issue, and the whole Scorecard is moderated within the ECFR team and by the steering group. At this point, final scores are agreed, categorisations of member states are tested, and a chapter grade (a letter from A to D) is awarded. Going forward, the chapter grade will be delinked from the numerical scores for the components so that the weighting within a chapter can be adjusted. This allows us to give greater prominence to critical initiatives such as the EU sanctions on Russia in 2014, which would otherwise have the same weighting as the other components within the chapter (including relations on the Arctic or on press freedom, which were much less critical in 2014).
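The methodology does not say how the weighting within a chapter is set once the chapter grade is delinked from the component scores. Purely as an illustration, and assuming the weights are editorial judgments made during moderation, the Python sketch below shows how a weighted mean lets a critical component count for more than routine ones. The function name, component names, weights and scores are all hypothetical, and the conversion to a letter grade is deliberately left out because the source does not specify it.

```python
def weighted_chapter_score(
    totals: dict[str, float], weights: dict[str, float]
) -> float:
    """Return the weighted mean of component totals (each out of 20).

    Hypothetical helper: `weights` holds an editorial weight per component;
    any component missing from `weights` gets the default weight of 1.0.
    """
    if not totals:
        raise ValueError("a chapter needs at least one component")
    weighted_sum = 0.0
    weight_sum = 0.0
    for component, total in totals.items():
        weight = weights.get(component, 1.0)
        if weight < 0:
            raise ValueError(f"weight for {component!r} must be non-negative")
        weighted_sum += weight * total
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("at least one component needs a positive weight")
    return weighted_sum / weight_sum


# Hypothetical component totals (out of 20) for one chapter.
totals = {"russia_sanctions": 16, "arctic": 10, "press_freedom": 10}
print(weighted_chapter_score(totals, {}))                       # 12.0
print(weighted_chapter_score(totals, {"russia_sanctions": 2}))  # 13.0
```

With equal weights, the invented sanctions score is diluted by the two less critical components; doubling its weight moves the chapter figure towards it. This is one possible way to express the intent described above, not the Scorecard's actual procedure.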
