Psychometric validation of the Female Sexual Distress Scale-Desire/Arousal/Orgasm

For the treatment of female sexual dysfunction, the most relevant outcome measures are patient-reported treatment effects and changes in symptoms, underscoring the need for reliable, validated patient-reported outcome (PRO) instruments. The aim of this study was to evaluate the psychometric characteristics (validity and reliability) of the Female Sexual Distress Scale-Desire/Arousal/Orgasm (FSDS-DAO), a PRO measure adapted from the validated FSDS-Revised (FSDS-R) questionnaire by adding 2 items addressing arousal and orgasm.

Methods

Psychometric analyses were based on data from a multicenter phase 2b dose-finding study that compared the safety and efficacy of bremelanotide versus placebo, and were conducted in the evaluable modified intent-to-treat population (N = 325) from that study. Psychometric evaluation of the new items in the FSDS-DAO included confirmatory factor analyses, tests of internal consistency and test–retest reliability, examinations of convergent and discriminant validity, and determination of responsiveness. The validity of the FSDS-DAO was evaluated against previously developed instruments, including the Female Sexual Function Index (FSFI), General Assessment Questionnaire (GAQ), Women’s Inventory of Treatment Satisfaction (WITS-9), and Female Sexual Encounter Profile-Revised (FSEP-R).

Results

Confirmatory factor analyses demonstrated acceptable fit of the FSDS-DAO items (Bentler’s comparative fit index of 0.929). Cronbach’s α for the FSDS-DAO total score was ≥ 0.91 at Visits 1, 2, 5, and 12, demonstrating adequate internal consistency reliability. Test–retest reliability was acceptable, with an intraclass correlation coefficient (ICC) of 0.61 and a Spearman’s correlation coefficient of 0.62 between Visits 1 and 2 (4 weeks apart). Acceptable construct validity was demonstrated by significant correlations with related PRO scales in the expected directions and magnitude. For example, participants reporting the worst levels of sexual function on the FSFI also had the worst FSDS-DAO scores at Visits 5 and 12. The FSDS-DAO total score was responsive to change.

Conclusions

Evidence supports the validity and reliability of the FSDS-DAO for assessing sexually related distress in women with female sexual arousal disorder and/or hypoactive sexual desire disorder; the addition of the arousal and orgasm items did not impact the validity and reliability of the measure.

Clinical Trial Registration ClinicalTrials.gov NCT01382719.

Background

Female sexual dysfunction (FSD) comprises a group of common conditions with physiological, psychological, and social components [1]. The most prevalent sexual dysfunction among women is hypoactive sexual desire disorder (HSDD), defined as a persistent or recurrent deficiency or absence of desire for sexual activity that is accompanied by personal distress and is not due to medications or to existing medical or relationship issues [2, 3]. The presence of distress in women with HSDD has important implications for diagnosis and treatment [4, 5].

The Female Sexual Distress Scale (FSDS) [6], a 12-item patient-reported outcome (PRO) instrument, was developed to measure sexually related personal distress in women. Both the original 12-item version and the 13-item FSDS-Revised (FSDS-R) [7] have shown a high degree of internal consistency, test–retest reliability, and discriminative validity in distinguishing women with and without sexual dysfunction. The original FSDS was psychometrically evaluated in 2002 [6]. In 2008, the FSDS-R was created by adding Question 13 (bothered by low sexual desire) and was psychometrically evaluated primarily in postmenopausal women [7]. Moderate positive correlations with conceptually related nonsexual measures of distress have also been noted. In response to recommendations from key FSD opinion leaders and the US Food and Drug Administration (FDA), the FSDS-R was recently modified by the addition of 2 items. This newest version, the Female Sexual Distress Scale-Desire/Arousal/Orgasm (FSDS-DAO), was also adapted for electronic completion on a handheld device.

The objective of the current analysis was to evaluate the psychometric characteristics (reliability and validity) of the FSDS-DAO using data from a large (N = 327), multicenter, placebo-controlled, phase 2b, dose-finding study of bremelanotide (PT-141), an investigational, novel cyclic 7-amino acid melanocortin receptor agonist with a high affinity for the type-4 receptor [8] that is currently being evaluated for the treatment of HSDD (with or without decreased arousal) in premenopausal women (ClinicalTrials.gov Identifier: NCT01382719) [9].

Methods

Study participants

Study design

This multicenter, randomized, placebo-controlled, dose-finding study was conducted at 68 sites in the United States and Canada. All participants underwent a 4-week, no-treatment screening/qualification period, followed by a 4-week, single-blind self-dosing (placebo-only) period to establish baseline and were then randomized to self-administer placebo or 3 different doses of bremelanotide (0.75, 1.25, or 1.75 mg) as desired over 12 weeks [9]. In the phase 2b study, the primary efficacy endpoint was the change from baseline to the end of the study in the number of satisfying sexual encounters (SSEs) as assessed by the Female Sexual Encounter Profile-Revised questionnaire (FSEP-R Q10) [9]. Other PRO measures were the Female Sexual Distress Scale-Desire/Arousal/Orgasm (FSDS-DAO), Female Sexual Function Index (FSFI), Sexual Interest and Desire Inventory (SIDI-F), General Assessment Questionnaire (GAQ), and Women’s Inventory of Treatment Satisfaction (WITS-9).

PROs were assessed at multiple time points throughout the trial to capture changes over time, including baseline, early in the trial, and the trial endpoint. Not all PRO instruments were administered at every time point: the FSEP-R was completed after each sexual encounter, whereas the other PRO instruments were administered at Weeks 0 and 4 (Visits 1 and 2; except the GAQ and WITS-9) and at Weeks 10, 16, 20, and 23 (Visits 5, 10, 11, and 12, respectively). All PRO instruments were completed by participants on an electronic handheld device (eDiary); the SIDI-F was also administered via interview by clinical research staff. In this analysis, data from Weeks 0, 4, 10, and 23 were used for psychometric evaluation.

PRO measures

Female Sexual Distress Scale-Desire/Arousal/Orgasm (FSDS-DAO)

The 15-item FSDS-DAO retains the 13 items of the Likert-type FSDS-R scale, which has evidence supporting its reliability and validity [6, 7], and adds 2 new items that ask women to rate their distress related to arousal and orgasm. As with previous versions of the FSDS, responses to “How often did you feel concerned with difficulties with sexual arousal?” and “How often did you feel frustrated by problems with orgasm?” are given on a polytomous response scale ranging from 0 (never) to 4 (always). Subjects who met eligibility criteria completed the FSDS-DAO with a 30-day recall period at baseline and at Visits 2, 5, 10, 11, and 12. The total score is the sum of the item responses and ranges from 0 to 60, with higher scores indicating greater distress; the FSDS-R total score ranges from 0 to 52 [11]. For the purposes of this analysis, we present FSDS-DAO data for Visit 2 (baseline), Visit 5, and Visit 12 because of the 30-day recall period: Visits 10 and 11 and Visits 11 and 12 are only 28 and 21 days apart, respectively, so their recall windows would overlap. Visits 10 and 11 data were therefore excluded to reduce overlap between assessments.
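As an illustration of the scoring rule described above, the following Python sketch computes the FSDS-DAO total (0–60) and the FSDS-R total (0–52) from a single completed form. It assumes, hypothetically, that the 13 FSDS-R items are stored first and the 2 new arousal and orgasm items last; the function names are illustrative. Returning no score when an item needed for the total is missing is likewise an assumption, chosen to be consistent with the statement that missing data were not imputed.

```python
from typing import Optional, Sequence

# Hypothetical item order: the 13 FSDS-R items first, then the 2 new items
# (arousal, orgasm). The actual storage order is not specified.
N_FSDS_R_ITEMS = 13
N_FSDS_DAO_ITEMS = 15


def _check_length(responses: Sequence[Optional[int]]) -> None:
    if len(responses) != N_FSDS_DAO_ITEMS:
        raise ValueError(f"expected {N_FSDS_DAO_ITEMS} items, got {len(responses)}")


def _sum_items(items: Sequence[Optional[int]]) -> Optional[int]:
    """Sum 0-4 responses; None if any item is missing (no imputation)."""
    if any(r is None for r in items):
        return None
    for r in items:
        if not 0 <= r <= 4:  # 0 = never ... 4 = always
            raise ValueError(f"response {r} is outside 0-4")
    return sum(items)


def fsds_dao_total(responses: Sequence[Optional[int]]) -> Optional[int]:
    """FSDS-DAO total: sum of all 15 items (0-60; higher = more distress)."""
    _check_length(responses)
    return _sum_items(responses)


def fsds_r_total(responses: Sequence[Optional[int]]) -> Optional[int]:
    """FSDS-R total from the same form: sum of the first 13 items (0-52)."""
    _check_length(responses)
    return _sum_items(responses[:N_FSDS_R_ITEMS])
```

Deriving both totals from one form mirrors the comparison analysis in which the FSDS-R is scored without the arousal and orgasm items.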

Psychometric evaluation of the FSDS-DAO was undertaken against the PRO instruments described below. In addition, to provide a comparison with the FSDS-DAO, the analysis was repeated using the FSDS-R, which does not include the arousal and orgasm items.

Female Sexual Function Index (FSFI)

The FSFI is a 19-item measure of female sexual function consisting of 6 domains: desire, arousal, lubrication, orgasm, satisfaction, and pain [10, 12]. Scores for the arousal, lubrication, orgasm, and pain domains range from 0 to 6 using Likert-type scales. Scores for desire range from 1.2 to 6.0, and those for satisfaction range from 0.8 to 6.0. The total score is the sum of the domain scores and ranges from 2 to 36, and the recall period is the past 4 weeks. Higher scores indicate a better level of sexual function.
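The FSFI domain and total scores described above can be sketched as follows. The item-to-domain mapping and the domain multiplying factors (0.6 for desire, 0.3 for arousal and lubrication, 0.4 for orgasm, satisfaction, and pain) are taken from the commonly published FSFI scoring scheme rather than from this study and should be checked against the official scoring manual; with these factors, the domain and total ranges match those stated above.

```python
from typing import Dict, Tuple

# Assumed item-to-domain mapping and domain factors, following the commonly
# published FSFI scoring scheme; verify against the official FSFI manual.
FSFI_DOMAINS: Dict[str, Tuple[range, float]] = {
    "desire": (range(1, 3), 0.6),          # items 1-2, each scored 1-5
    "arousal": (range(3, 7), 0.3),         # items 3-6
    "lubrication": (range(7, 11), 0.3),    # items 7-10
    "orgasm": (range(11, 14), 0.4),        # items 11-13
    "satisfaction": (range(14, 17), 0.4),  # items 14-16
    "pain": (range(17, 20), 0.4),          # items 17-19
}


def fsfi_domain_scores(items: Dict[int, int]) -> Dict[str, float]:
    """Each domain score = (sum of its item responses) x domain factor."""
    return {
        domain: sum(items[i] for i in item_ids) * factor
        for domain, (item_ids, factor) in FSFI_DOMAINS.items()
    }


def fsfi_total(items: Dict[int, int]) -> float:
    """FSFI total = sum of the 6 domain scores (2.0-36.0; higher = better function)."""
    return sum(fsfi_domain_scores(items).values())
```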

Female Sexual Encounter Profile-Revised (FSEP-R)

The FSEP-R is a 10-item instrument designed to assess sexual encounters, including initiation, level of desire, satisfaction with arousal, lubrication, arousal, ability to achieve orgasm, and satisfaction with the sexual encounter [13]. Participants completed the FSEP-R within 24 h of a sexual encounter. A “sexual encounter” is defined as any act involving sexual contact with genitalia and/or oral mucosa, including intercourse, oral sex, and masturbation by self or a partner. Q10 asks “Did you consider this sexual encounter satisfactory for you?” and is answered yes or no.
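Because the primary endpoint was the change in the number of SSEs, a minimal sketch of deriving SSE counts from FSEP-R Q10 responses may be useful. The record structure (`FsepREntry`), the function names, and the use of fixed date windows for the baseline and end-of-study periods are all hypothetical; the exact windows and any normalization applied in the trial are not described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Tuple

Window = Tuple[date, date]  # half-open interval [start, end)


@dataclass(frozen=True)
class FsepREntry:
    """One FSEP-R eDiary record (hypothetical structure)."""
    encounter_date: date
    q10_satisfactory: bool  # True if Q10 was answered "yes"


def count_sses(entries: Iterable[FsepREntry], window: Window) -> int:
    """Count satisfying sexual encounters (Q10 = yes) with start <= date < end."""
    start, end = window
    return sum(
        1 for e in entries
        if start <= e.encounter_date < end and e.q10_satisfactory
    )


def sse_change(entries: Iterable[FsepREntry], baseline: Window, endpoint: Window) -> int:
    """Change in SSE count from a baseline window to an end-of-study window.

    Assumes both windows have the same length so the counts are comparable.
    """
    entries = list(entries)  # materialize so the records can be scanned twice
    return count_sses(entries, endpoint) - count_sses(entries, baseline)
```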

General Assessment Questionnaire (GAQ)

The GAQ consists of 4 items: satisfaction with arousal, desire, degree of benefit while on study drug, and impact of taking study drug on relationship with partner. Responses are selected on a 7-point numeric rating scale from 1 (very much worse) to 7 (very much better). A score ≥ 5 indicates benefit.
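The GAQ scoring rule above, together with the distributional cut-point groups (1–3, 4, 5, and 6–7) used for the GAQ known-groups comparisons in the Results, can be expressed as two small helper functions. The function names are illustrative only.

```python
def _validate_gaq(score: int) -> None:
    if not 1 <= score <= 7:
        raise ValueError(f"GAQ score {score} is outside 1-7")


def gaq_indicates_benefit(score: int) -> bool:
    """A GAQ item score >= 5 indicates benefit."""
    _validate_gaq(score)
    return score >= 5


def gaq_cutpoint_group(score: int) -> str:
    """Distributional cut-point group used for the GAQ known-groups comparisons."""
    _validate_gaq(score)
    if score <= 3:
        return "1-3"
    if score <= 5:
        return str(score)  # "4" (no change) or "5"
    return "6-7"
```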

Women’s Inventory of Treatment Satisfaction (WITS-9)

The validated WITS-9 questionnaire assesses satisfaction with treatment and sexual relations over the past 4 weeks [14]. Participants answer the 9 items on a 7-point numeric rating scale from − 3 (very unsatisfied or very likely not to continue) to 3 (very satisfied or very likely to continue). The total score is calculated as the average of the scores from the 9 questions and ranges from − 3.0 to 3.0. A higher score on the WITS-9 indicates a higher level of satisfaction with treatment.
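A minimal sketch of the WITS-9 total score as defined above (the mean of the 9 item scores). Treating a form with any missing item as having no total score is an assumption consistent with the no-imputation rule, not a documented WITS-9 scoring convention; the function name is illustrative.

```python
from statistics import fmean
from typing import Optional, Sequence


def wits9_total(responses: Sequence[Optional[int]]) -> Optional[float]:
    """WITS-9 total: mean of the 9 item scores (-3.0 to 3.0; higher = more satisfied)."""
    if len(responses) != 9:
        raise ValueError(f"expected 9 items, got {len(responses)}")
    if any(r is None for r in responses):
        return None  # assumption: incomplete forms are not scored
    if any(not -3 <= r <= 3 for r in responses):
        raise ValueError("WITS-9 responses must lie in -3..3")
    return fmean(responses)
```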

Statistical analysis

Specific statistical tests are described below for each analysis. All analyses were performed using SAS version 9.2 or later. All statistical tests used conservative decision-making criteria established a priori according to published guidance [15]. Missing data were treated as missing, and no imputation was performed. All statistical tests were 2-tailed, with the type I error probability fixed at 0.05. Continuous variables are summarized as mean and standard deviation, and categorical variables as percent distribution by category.

FSDS-DAO psychometric evaluation

Instrument descriptive characteristics

Individual item performance and frequency of responses on the FSDS-DAO and FSDS-R items and total scores, including rates of missing data, were examined at Visit 1 (Week 0), Visit 2 (Week 4), Visit 5 (Week 10), Visit 11 (Week 19), and Visit 12 (Week 23). Individual item performance and frequency of responses on the FSEP-R item scores, including rates of missing data, were examined at Visit 2 (Week 4), Visit 5 (Week 10), Visit 11 (Week 19), and Visit 12 (Week 23). Distributional characteristics of the FSFI were examined at Visit 1 (Week 0), Visit 2 (Week 4), Visit 5 (Week 10), Visit 11 (Week 19), and Visit 12 (Week 23). The GAQ and WITS-9 were examined at Visit 5 (Week 10), Visit 11 (Week 19), and Visit 12 (Week 23).

Confirmatory factor analysis (CFA)

A CFA was performed using EQS version 6.1 to determine whether, with the addition of the new items, a single total score was justified or whether multiple subscales were more appropriate. CFAs were performed with data from Visit 1 (Week 0) and Visit 12 (Week 23). Model fit was assessed using Bentler’s comparative fit index (CFI), with CFI ≥ 0.90 indicating acceptable fit. Additional model fit indices evaluated were the root mean square error of approximation (RMSEA) and the weighted root mean square residual (WRMR).
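For readers less familiar with these indices, the sketch below shows the standard formulas for the CFI and RMSEA, computed from the chi-square statistics of the fitted model and of the independence (null) model. For a single-factor model of the 15 FSDS-DAO items with the factor variance fixed at 1, there are 120 unique variances and covariances and 30 free parameters (15 loadings and 15 residual variances), giving df = 90; the independence model has df = 105. The chi-square values in the example are hypothetical, not study results, and the RMSEA uses the N − 1 convention (some programs use N). The WRMR, which depends on the estimator's weight matrix, is not shown.

```python
import math


def cfi(chi2_model: float, df_model: int, chi2_null: float, df_null: int) -> float:
    """Bentler's comparative fit index relative to the independence (null) model."""
    misfit_model = max(chi2_model - df_model, 0.0)
    # The null-model misfit is floored at the model misfit so CFI stays in [0, 1].
    misfit_null = max(chi2_null - df_null, misfit_model)
    return 1.0 if misfit_null == 0 else 1.0 - misfit_model / misfit_null


def rmsea(chi2_model: float, df_model: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))
```

For example, `cfi(100.0, 90, 1000.0, 105)` gives approximately 0.989, above the 0.90 threshold.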

Reliability

Internal consistency reliability

Internal consistency reliability (Cronbach’s α) addresses the extent to which the individual items within an instrument are related to one another [16]. Cronbach’s α was calculated for the FSDS-DAO and FSDS-R at Visits 1, 2, 5, and 12. No tests of statistical significance were performed for these estimates; α values > 0.70 are generally considered acceptable for group-level data [17].
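Cronbach’s α can be computed directly from item-level responses, as in the sketch below, which implements the usual formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores) for k items. It assumes complete cases only; the function name is illustrative.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows[i][j] is respondent i's response to item j.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals).
    Pass complete cases only (no imputation).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least 2 items")
    # Population variances are used for both terms; the n-divisor cancels.
    item_vars = sum(pvariance(col) for col in zip(*rows))
    total_var = pvariance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_vars / total_var)
```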

Test–retest reliability

Test–retest reliability was examined using intraclass correlation coefficients (ICCs), Spearman’s correlations, and paired t-tests of FSDS-DAO and FSDS-R scores from Visit 1 to Visit 2. ICC values > 0.70 are generally considered acceptable for establishing test–retest reliability [18].
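The ICC form used is not specified above. The sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1) in Shrout and Fleiss’s notation, computed from the usual two-way ANOVA mean squares, and adds the paired t statistic for the mean Visit 1 to Visit 2 difference. Both are illustrations under that assumption, not a reproduction of the SAS analysis; function names are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev
from typing import Sequence


def icc_2_1(visit1: Sequence[float], visit2: Sequence[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    n, k = len(visit1), 2
    data = list(zip(visit1, visit2))
    grand = fmean([x for row in data for x in row])
    row_means = [fmean(row) for row in data]          # per-subject means
    col_means = [fmean(visit1), fmean(visit2)]        # per-visit means
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def paired_t(visit1: Sequence[float], visit2: Sequence[float]) -> float:
    """Paired t statistic for the mean Visit 2 - Visit 1 difference (df = n - 1)."""
    diffs = [b - a for a, b in zip(visit1, visit2)]
    return fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

Under absolute agreement, a uniform 1-point shift between visits lowers the ICC even though rank order is unchanged: for scores 1, 2, 3, 4 at Visit 1 and 2, 3, 4, 5 at Visit 2, `icc_2_1` returns 10/13 (about 0.77), whereas Spearman’s correlation would be 1.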

Validity

Convergent validity

To examine convergent validity, the pattern and magnitude of the relationships of the FSDS-DAO and FSDS-R total scores with the FSEP-R, FSFI subscales and total score, GAQ item scores, WITS-9 total score, and number of satisfying sexual encounters (SSEs) were examined at Visits 5 and 12 using Spearman’s rank correlation coefficients. Convergent validity was supported by correlations with an absolute value > 0.40 with questionnaires measuring similar concepts. These measures were expected to be moderately correlated, indicating that they measured related constructs, but not to be correlated above 0.80, which would indicate that they measured the same construct. Measures more directly related to sexual arousal and level of desire were expected to show higher correlations, whereas scales related to pain were expected to show lower correlations with the FSDS-DAO and FSDS-R scores and potentially to demonstrate divergent validity.
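The convergent-validity criteria above can be sketched as follows: Spearman’s ρ is computed as the Pearson correlation of average ranks (tied values receive the mean of their ranks), and the absolute value of ρ is compared with the 0.40 and 0.80 thresholds. Using the absolute value reflects the assumption that correlations with measures scored in the opposite direction (for example, the FSFI, where higher scores indicate better function) are expected to be negative. Function names and labels are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def convergent_evidence(rho: float) -> str:
    """Apply the a priori thresholds to |rho|."""
    r = abs(rho)
    if r > 0.80:
        return "very high: may measure the same construct"
    if r > 0.40:
        return "supports convergent validity"
    return "weak: does not support convergent validity"
```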

Known-groups validity

The ability of the FSDS-DAO and FSDS-R to differentiate among groups of participants defined by known indicators, such as treatment group or disease severity/clinical status at baseline (FSFI total score; FSFI arousal, desire, and satisfaction subscale scores; number of SSEs; and GAQ Items 1 and 2), was assessed using paired t-tests and general linear models (PROC GLM) with Scheffé’s post hoc comparisons to evaluate mean differences among participant subgroups at Visit 12.
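The Scheffé post hoc comparison used in the known-groups analysis can be illustrated for a single pair of subgroups. The sketch pools the within-group mean square from a one-way layout and compares the Scheffé statistic with a critical F value supplied by the caller, because the Python standard library provides no F quantile function. The group labels and scores in the example are hypothetical, and the sketch mirrors only the arithmetic, not SAS PROC GLM itself.

```python
from statistics import fmean
from typing import Dict, Sequence, Tuple


def within_group_mse(groups: Dict[str, Sequence[float]]) -> Tuple[float, int]:
    """Pooled within-group mean square (one-way ANOVA error term) and its df."""
    ss_within = 0.0
    n_total = 0
    for values in groups.values():
        m = fmean(values)
        ss_within += sum((v - m) ** 2 for v in values)
        n_total += len(values)
    df_within = n_total - len(groups)
    return ss_within / df_within, df_within


def scheffe_pair(groups: Dict[str, Sequence[float]], a: str, b: str,
                 f_crit: float) -> Tuple[float, bool]:
    """Scheffe test of mean(a) - mean(b) among k groups.

    F_S = (mean_a - mean_b)^2 / ((k - 1) * MSE * (1/n_a + 1/n_b)).
    The pair differs if F_S > f_crit, the upper-alpha critical value of
    F(k - 1, N - k), which the caller supplies (e.g., from an F table).
    """
    mse, _ = within_group_mse(groups)
    k = len(groups)
    diff = fmean(groups[a]) - fmean(groups[b])
    f_s = diff ** 2 / ((k - 1) * mse * (1 / len(groups[a]) + 1 / len(groups[b])))
    return f_s, f_s > f_crit
```

For example, with three hypothetical groups of 3 scores each (N = 9, k = 3), the α = 0.05 critical value of F(2, 6) is approximately 5.14.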

Responsiveness

Several analytic approaches were used to evaluate the responsiveness of the FSDS-DAO and FSDS-R. Changes in the FSDS-DAO total score were calculated from baseline (Visit 1; Week 0) to Visit 12 (Week 23) for the overall sample. The effect size [19] and a responsiveness statistic were also calculated. Effect sizes were interpreted as small (0.20), moderate (0.50), or large (0.80) using Cohen’s convention [20]. The responsiveness statistic was computed by subtracting the placebo change score from the treatment change score and dividing by the standard deviation (SD) of the placebo change score ([treatment change score − placebo change score]/SD of placebo change score). It thus expresses the difference in change between treatment groups in units of placebo-group variability.
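The responsiveness statistic defined above, and one common effect size definition, can be sketched as follows. The effect size formula is not given above beyond the citation [19]; the sketch assumes mean change divided by the SD of baseline scores. Using group means of the change scores in the responsiveness statistic is also an assumption. Because lower FSDS-DAO scores indicate less distress, improvement appears as negative values, so Cohen’s thresholds are applied to the magnitude. Function names are illustrative.

```python
from statistics import fmean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Assumed definition: mean change (follow-up - baseline) / SD of baseline."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return fmean(changes) / stdev(baseline)


def cohen_label(es: float) -> str:
    """Cohen's convention on |es|: 0.20 small, 0.50 moderate, 0.80 large."""
    m = abs(es)
    if m >= 0.80:
        return "large"
    if m >= 0.50:
        return "moderate"
    if m >= 0.20:
        return "small"
    return "below small"


def responsiveness_statistic(treatment_changes: Sequence[float],
                             placebo_changes: Sequence[float]) -> float:
    """(mean treatment change - mean placebo change) / SD of placebo change."""
    return (fmean(treatment_changes) - fmean(placebo_changes)) / stdev(placebo_changes)
```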

Ethical conduct

The study was conducted in accordance with Good Clinical Practice requirements, as described in the guidelines of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), and with the Declaration of Helsinki. The study was reviewed at each site by a central or local institutional review board (IRB) or ethics committee. The IRB approval numbers were Compass, 00519; WIRB, 20111036. Written informed consent was obtained from each subject before any study procedures were initiated.

Results

The sample used in these analyses consisted of all premenopausal women in the evaluable modified intent-to-treat (mITT) population of the phase 2b study who had FSDS-DAO scores at baseline and at ≥ 1 postrandomization follow-up visit. The mITT population was defined as all randomized subjects in the phase 2b study who took at least 1 outpatient dose of double-blind treatment (ie, 1 outpatient dose after the 2 in-clinic doses of double-blind treatment) and who had at least 1 follow-up visit. The number of study participants over time is shown in Table 1. The baseline characteristics for the safety population of the bremelanotide study are summarized in Table 2 and were similar across dose groups.

Figure 1

Similar analyses were conducted using distributional cut-points for GAQ Items 1 (satisfaction with arousal) and 2 (satisfaction with desire). At Visits 5 and 12, the FSDS-DAO discriminated on both GAQ Items 1 and 2 between women who scored 1–3 and those who scored 5, between women who scored 1–3 and those who scored 6–7, between women who scored 4 and those who scored 5, and between women who scored 4 and those who scored 6–7 (all P < 0.05 except Item 2 at Visit 12 for women who scored 1–3 vs those who scored 5; Fig. 2). The FSDS-DAO did not discriminate on GAQ Item 1 or 2 at either visit between women who scored 1–3 and those who scored 4. In addition, the FSDS-DAO discriminated between women who scored 5 and those who scored 6–7 on GAQ Item 2 at Visit 5 and on both items at Visit 12.

Figure 2

Responsiveness

Discussion

In these analyses, both the FSDS-DAO and FSDS-R demonstrated acceptable internal consistency reliability, test–retest reliability, construct validity, known-groups validity, and responsiveness, reliably assessing sexually related distress in women with FSAD and/or HSDD. The findings of internal consistency, test–retest reliability, and construct and known-groups validity for the FSDS-DAO show that the addition of the arousal and orgasm items did not impact the validity and reliability of the PRO measure. Acceptable construct validity, both convergent and divergent, was demonstrated by significant correlations with related PRO scales in the expected directions and magnitude. Test–retest reliability was acceptable between Visits 1 and 2 (4 weeks apart during the no-drug qualification period), with an ICC of 0.61 and a Spearman’s correlation coefficient of 0.62. The FSDS-DAO demonstrated adequate internal consistency reliability, with Cronbach’s α ≥ 0.91 over the course of the clinical study. The CFAs for the FSDS-DAO and FSDS-R provided some evidence supporting a single factor; however, the model fit indices (CFI = 0.93 and 0.94, respectively, together with the other indices) did not meet all recommended fit criteria, although factor loadings ranged from 0.47 to 0.81 for the FSDS-DAO and from 0.61 to 0.81 for the FSDS-R.

Previous psychometric evaluation of the FSDS-R in a sample of women with HSDD demonstrated better test–retest reliability between Day 0 and Day 28 (ICC = 0.749) than that observed for the FSDS-DAO in the current study (ICC = 0.61) [7]. This difference may be attributable to the study samples: the current study included patients with HSDD and/or FSAD, whereas the FSDS-R was evaluated in patients with HSDD only. Moreover, the Spearman’s correlation of 0.62 between the 2 scores was statistically significant (P < 0.001), indicating acceptable test–retest reliability. When the analyses were restricted to participants whose change in FSFI total score from Visit 1 to Visit 2 was within 2 points, the Spearman’s correlation was 0.73 (P < 0.0001) and the ICC was 0.73 for the FSDS-DAO. When we repeated the analysis using the FSDS-R, which lacks the arousal and orgasm items, the results were generally consistent with those for the FSDS-DAO. Thus, a total score is appropriate with or without the arousal and orgasm items.

During the study, SSE counts showed a decreasing correlation with FSDS-DAO total score, in contrast to the observation that subjects with the worst levels of sexual function as measured by the FSFI also had the worst FSDS-DAO scores. As an FSD measure, however, SSE counts have not been extensively validated. Indeed, the definition of HSDD, the most common FSD diagnosis [2], includes no criteria or constraints regarding the patient’s amount of sexual activity. Women with HSDD may frequently engage in sex without having an interest in it. They may do so out of a sense of obligation to their partner, to feel “normal,” or for a multitude of other reasons. Since the number of sexual events in which they participate may be determined by factors that have little to do with a patient’s own sexual interest, the association between event counts and HSDD measures on validated instruments may be, at best, modest.

The present analyses have several limitations that should be considered when interpreting the findings. First, the study used the FSDS-DAO to assess FSD-related distress solely in premenopausal women with a clinical diagnosis of HSDD and/or FSAD who, by definition and eligibility criteria, did not have female orgasmic disorder; future research could include such individuals. Second, the FSD analyses were based entirely on PROs. For FSD, however, patient-rated treatment effects and changes in symptoms are clearly the most relevant outcome measures, all the more so given the paucity of clinical or biological FSD markers. Finally, use of the FSDS-DAO total score as an inclusion criterion reduced the variability of the scale and limited correlations at baseline, although measurement properties were also evaluated at later visits. Despite these limitations, the analyses provide strong evidence supporting the validity, reliability, and responsiveness of the recently developed FSDS-DAO. For clinical trials and other research, the FSDS-DAO is “fit for purpose” in offering a comprehensive assessment of the distress associated with FSD and may be used with the FSFI-desire domain score to cover the major components of an HSDD diagnosis: low desire and associated distress.

Conclusions

The extensively evaluated FSDS-R is a well-characterized and reliable measure for assessing sexually related personal distress in women. Questions related to arousal and orgasm were added to the FSDS-R to broaden the instrument’s coverage. These psychometric analyses show evidence of validity, reliability, acceptability, and responsiveness for the FSDS-DAO as a measure of sexually related personal distress in the HSDD/FSAD population. For clinical trials and other research, the FSDS-DAO is “fit for purpose” in offering a comprehensive assessment of the distress associated with FSD and may be paired with the FSFI-desire domain score to cover the key components of an HSDD diagnosis: low desire and associated distress.

Availability of data and materials

The dataset analyzed during the current study is available from the corresponding author on reasonable request.