References
AAMC. (n.d.). AAMC PREview Professional Readiness Exam. https://students-residents.aamc.org/aamc-preview/aamc-preview-professional-readiness-exam
Abrams, Z. (2024). Addressing equity and ethics in artificial intelligence. Monitor on Psychology, 55(3), 24–29. https://www.apa.org/monitor/2024/04/addressing-equity-ethics-artificial-intelligence
Abyaa, A., Khalidi Idrissi, M., & Bennani, S. (2019). Learner modelling: Systematic review of the literature from the last 5 years. Educational Technology Research and Development, 67, 1105–1143.
Acar, S. (2023). Creativity assessment, research, and practice in the age of artificial intelligence. Creativity Research Journal, 1–7. Advance online publication. https://doi.org/10.1080/10400419.2023.2271749
Acuity Insights. (n.d.). What is Casper? https://acuityinsights.app/casper/
Acuity Insights. (2023). Casper technical manual. https://acuityinsights.com/casper-technical-manual/
Adesope, O. O., Trevisan, D. A., & Sundararajan, N. (2017). Rethinking the use of tests: A meta-analysis of practice testing. Review of Educational Research, 87(3), 659–701. https://doi.org/10.3102/0034654316689306
Agrawal, A., Gans, J., & Goldfarb, A. (2022). Power and prediction: The disruptive economics of artificial intelligence. Harvard Business Review Press.
Aguilar, S. J., Karabenick, S. A., Teasley, S. D., & Baek, C. (2021). Associations between learning analytics dashboard exposure and motivation and self-regulated learning. Computers & Education, 162, Article 104085. https://doi.org/10.1016/j.compedu.2020.104085
Ahn, T., Arcidiacono, P., Hopson, A., & Thomas, J. R. (2019). Equilibrium grade inflation with implications for female interest in STEM majors (Working Paper No. 26556). National Bureau of Economic Research. https://doi.org/10.3386/w26556
Alan, S., Boneva, T., & Ertac, S. (2019). Ever failed, try again, succeed better: Results from a randomized educational intervention on grit. The Quarterly Journal of Economics, 134(3), 1121–1162. https://doi.org/10.1093/qje/qjz006
Ali, U. S., & van Rijn, P. W. (2016). An evaluation of different statistical targets for assembling parallel forms in item response theory. Applied Psychological Measurement, 40(3), 163–179. https://doi.org/10.1177/0146621615613308
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. American Educational Research Association.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Psychological Association.
American Psychological Association. (2018). Top 20 principles from psychology for preK-12 teaching and learning: Coalition for psychology in schools and education. https://www.apa.org/ed/schools/teaching-learning/top-twenty-principles.pdf
Association of Test Publishers. (2022). Guidelines for technology-based assessment. https://www.testpublishers.org/assets/TBA%20Guidelines%203-14-2022%20draft%20numbered.pdf
Attali, Y., & van der Kleij, F. (2017). Effects of feedback elaboration and feedback timing during computer-based practice in mathematics problem solving. Computers & Education, 110, 154–169. https://doi.org/10.1016/j.compedu.2017.03.012
Attali, Y., Runge, A., LaFlair, G. T., Yancey, K., Goodwin, S., Park, Y., & von Davier, A. A. (2022). The interactive reading task: Transformer-based automatic item generation. Frontiers in Artificial Intelligence, 5, Article 903077. https://doi.org/10.3389/frai.2022.903077
Autor, D. H., Levy, F., & Murnane, R. J. (2003). The skill content of recent technological change: An empirical exploration. The Quarterly Journal of Economics, 118(4), 1279–1333. https://doi.org/10.1162/003355303322552801
Autor, D., Chin, C., Salomons, A., & Seegmiller, B. (2024). New frontiers: The origins and content of new work, 1940–2018. The Quarterly Journal of Economics. Advance online publication. https://doi.org/10.1093/qje/qjae008
Azevedo, R., & Bernard, R. M. (1995). The effects of computer-presented feedback on learning from computer-based instruction: A meta-analysis. Journal of Educational Computing Research, 13(2), 111–127. https://doi.org/10.2190/9LMD-3U28-3A0G-FTQT
Baker, R. S. J. d., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17. https://doi.org/10.5281/zenodo.3554657
Bailey, T., Jeong, D. W., & Cho, S. W. (2010). Referral, enrollment, and completion in developmental education sequences in community colleges. Economics of Education Review, 29(2), 255–270. https://doi.org/10.1016/j.econedurev.2009.09.002
Bangert-Drowns, R. L., Kulik, C. L. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61(2), 213–238. https://doi.org/10.3102/00346543061002213
Bauer, M. S., Damschroder, L., Hagedorn, H., Smith, J., & Kilbourne, A. M. (2015). An introduction to implementation science for the non-specialist. BMC Psychology, 3(32), 1–12. https://doi.org/10.1186/s40359-015-0089-9
Bejar, I. I., Lawless, R. R., Morley, M. E., Wagner, M. E., Bennett, R. E., & Revuelta, J. (2002). A feasibility study of on-the-fly item generation in adaptive testing (Research Report No. RR-02-03). ETS. https://doi.org/10.1002/j.2333-8504.2002.tb01890.x
Bennett, R. E. (1993). On the meanings of constructed response. In R. E. Bennett & W. C. Ward (Eds.), Construction versus choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment (pp. 1–27). Lawrence Erlbaum Associates.
Bennett, R. E. (1998). Reinventing assessment: Speculations on the future of large-scale educational testing (Policy Information Perspective). ETS. http://www.ets.org/Media/Research/pdf/PICREINVENT.pdf
Bennett, R. E. (2011). Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25. https://doi.org/10.1080/0969594X.2010.513678
Bennett, R. E. (2023). Toward a theory of socioculturally responsive assessment. Educational Assessment, 28(2), 83–104. https://doi.org/10.1080/10627197.2023.2202312
Berman, A. I., Feuer, M. J., & Pellegrino, J. W. (2019). What use is educational assessment? The Annals of the American Academy of Political and Social Science, 683(1), 8–20. https://doi.org/10.1177/0002716219843871
Bernacki, M. L. (2018). Examining the cyclical, loosely sequenced, and contingent features of self-regulated learning: Trace data and their analysis. In D. H. Schunk & J. A. Greene (Eds.), Handbook of self-regulation of learning and performance (2nd ed., pp. 370–387). Routledge. https://doi.org/10.4324/9781315697048-24
Biddle, D. A., & Nooren, P. M. (2006). Validity generalization vs. Title VII: Can employers successfully defend tests without conducting local validation studies? Labor Law Journal, 57, 216–237. https://testgenius.com/articles/validity-generalization.pdf
Bicknell, K., Brust, C., & Settles, B. (2023, February 5). How Duolingo’s AI learns what you need to learn. IEEE Spectrum. https://spectrum.ieee.org/duolingo
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). Worth Publishers.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74. https://doi.org/10.1080/0969595980050102
Blackman, R., & Ammanath, B. (2022, March 21). Ethics and AI: 3 conversations companies need to have. Harvard Business Review. https://hbr.org/2022/03/ethics-and-ai-3-conversations-companies-need-to-be-having
Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13(6), 4–16. https://doi.org/10.3102/0013189X013006004
Bolsinova, M., Deonovic, B., Arieli-Attali, M., Burr, S., Hagiwara, M., & Maris, G. (2022). Measurement of ability in adaptive learning and assessment systems when learners use on-demand hints. Applied Psychological Measurement, 46(3), 219–235. https://doi.org/10.1177/01466216221084208
Bratsberg, B., & Rogeberg, O. (2018). Flynn effect and its reversal are both environmentally caused. Proceedings of the National Academy of Sciences, 115(26), 6674–6678. https://doi.org/10.1073/pnas.1718793115
Bresnahan, T. (2010). General purpose technologies. In B. H. Hall & N. Rosenberg (Eds.), Handbook of the economics of innovation (Vol. 2, pp. 761–791). https://doi.org/10.1016/S0169-7218(10)02002-2
Brookhart, S., Stiggins, R., McTighe, J., & Wiliam, D. (2020). The future of assessment practices: Comprehensive and balanced assessment systems. Learning Sciences International. https://testing123.education.mn.gov/cs/groups/communications/documents/document/mdaw/mdaw/~edisp/000231.pdf
Bradley, M. (1975). Scientific education versus military training: The influence of Napoleon Bonaparte on the Ecole Polytechnique. Annals of Science, 32(5), 415–449. https://doi.org/10.1080/00033797500200381
Buckley, J., Colosimo, L., Kantar, R., McCall, M., & Snow, E. (2021). Game-based assessment for education. In OECD digital education outlook 2021: Pushing the frontiers with artificial intelligence, blockchain and robots (pp. 195–208). OECD. https://read.oecd-ilibrary.org/education/oecd-digital-education-outlook-2021_9289cbfd-en#page1
Bull, S., & Kay, J. (2016). SMILI: A framework for interfaces to learning data in open learner models, learning analytics and related fields. International Journal of Artificial Intelligence in Education, 26, 293–331. https://doi.org/10.1007/s40593-015-0090-8
Burning Glass Technologies. (2019). Mapping the genome of jobs: The Burning Glass skills taxonomy [White paper]. https://www.voced.edu.au/content/ngv%3A84406
Burrus, J., Rikoon, S. H., & Brenneman, M. W. (Eds.). (2022). Assessing competencies for social and emotional learning: Conceptualization, development, and applications. Routledge. https://doi.org/10.4324/9781003102243
BusinessWire. (2024). Carnegie Learning wins 2024 EdTech award for MATHstream [Press release]. https://www.businesswire.com/news/home/20240327088407/en/Carnegie-Learning-Wins-2024-EdTech-Award-for-MATHstream
Buyse, T., & Lievens, F. (2011). Situational judgment tests as a new tool for dental student selection. Journal of Dental Education, 75(6), 743–749. https://doi.org/10.1002/j.0022-0337.2011.75.6.tb05101.x
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. https://doi.org/10.1037/h0046016
cApStAn & Halleux, B. (2019). PISA 2021 translation and adaptation guidelines. OECD. https://www.oecd.org/pisa/pisaproducts/PISA-2022-Translation-and-Adaptation-Guidelines.pdf
Cao, M., Drasgow, F., & Cho, S. (2015). Developing ideal intermediate personality items for the ideal point model. Organizational Research Methods, 18(2), 252–275. https://doi.org/10.1177/1094428114555993
Casner-Lotto, J., & Barrington, L. (2006). Are they really ready to work? Employers’ perspectives on the basic knowledge and applied skills of new entrants to the 21st century US workforce. Partnership for 21st Century Skills.
Cattell, R. B. (1965). A biometrics invited paper. Factor analysis: An introduction to essentials I. The purpose and underlying models. Biometrics, 21(1), 190–215. https://doi.org/10.2307/2528364
Cattell, R. B., & Warburton, F. W. (1967). Objective personality and motivation tests: A theoretical introduction and practical compendium. University of Illinois Press.
Chakraborty, M., Tonmoy, T. I., Zaman, M., Gautam, S., Kumar, T., Sharma, K., Barman, N., Gupta, C., Jain, V., Chadha, A., Sheth, A., & Das, A. (2023). Counter Turing test (CT2): AI-generated text detection is not as easy as you may think—Introducing AI detectability index (ADI). In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 2206–2239). ACL. https://aclanthology.org/2023.emnlp-main.136/
Cengage. (2019, January 16). New survey: Demand for “uniquely human skills” increases even as technology and automation replace some jobs [Press release]. https://www.cengagegroup.com/news/press-releases/2019/new-survey-demand-for-uniquely-human-skills-increases-even-as-technology-and-automation-replace-some-jobs/
Chamorro-Premuzic, T. (2021, May 26). The problem with job interviews. Forbes. https://www.forbes.com/sites/tomaspremuzic/2021/05/26/the-problem-with-job-interviews/?sh=4292b1224dee
Chan, S., Somasundaran, S., Ghosh, D., & Zhao, M. (2022). AGReE: A system for generating automated grammar reading exercises. In W. Che & E. Shutova (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 169–177). ACL. https://aclanthology.org/2022.emnlp-demos.17/
Charness, G., Gneezy, U., & Henderson, A. (2018). Experimental methods: Measuring effort in economics experiments. Journal of Economic Behavior & Organization, 149, 74–87. https://doi.org/10.1016/j.jebo.2018.02.024
Chen, L., Feng, G., Joe, J., Leong, C. W., Kitchen, C., & Lee, C. M. (2014). Towards automated assessment of public speaking skills using multimodal cues. In ICMI ’14: Proceedings of the 16th International Conference on Multimodal Interaction (pp. 200–203). ACM. https://doi.org/10.1145/2663204.2663265
Chen, Y., Lee, Y.-H., & Li, X. (2022). Item pool quality control in educational testing: Change point model, compound risk, and sequential detection. Journal of Educational and Behavioral Statistics, 47(3), 322–352. https://doi.org/10.3102/10769986211059085
Cheng, K. H. C., Hui, C. H., & Cascio, W. F. (2017). Leniency bias in performance ratings: The Big-Five correlates. Frontiers in Psychology, 8, Article 521. https://doi.org/10.3389/fpsyg.2017.00521
Chernyshenko, O. S., Kankaraš, M., & Drasgow, F. (2018). Social and emotional skills for student success and well-being: Conceptual framework for the OECD study on social and emotional skills (OECD Education Working Paper No. 173). OECD. https://one.oecd.org/document/EDU/WKP(2018)9/En/pdf
Chetty, R., Deming, D. J., & Friedman, J. N. (2023). Diversifying society’s leaders? The determinants of causal effects of admission to highly selective private colleges (Working Paper No. 31492). National Bureau of Economic Research. https://doi.org/10.3386/w31492
Chi, M. T. H., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243. https://doi.org/10.1080/00461520.2014.965823
Choi, I., Hao, J., Deane, P., & Zhang, M. (2021). Benchmark keystroke biometrics accuracy from high-stakes writing tasks (Research Report No. RR-21-15). ETS. https://doi.org/10.1002/ets2.12326
Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational judgment tests: Constructs assessed and a meta-analysis of their criterion-related validities. Personnel Psychology, 63(1), 83–117. https://doi.org/10.1111/j.1744-6570.2009.01163.x
Chopade, P., Edwards, D., Khan, S. M., Andrade, A., & Pu, S. (2019, November). CPSX: Using AI-machine learning for mapping human-human interaction and measurement of CPS teamwork skills. In 2019 IEEE International Symposium on Technologies for Homeland Security (HST) (pp. 1–6). IEEE.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220. https://doi.org/10.1037/h0026256
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155
College Board. (2023, September 27). SAT suite: Everything you need to know about the Digital SAT. College Board Blog. https://blog.collegeboard.org/everything-you-need-know-about-digital-sat
Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122. https://doi.org/10.1037/a0021212
Cooper, W. H. (1981). Ubiquitous halo. Psychological Bulletin, 90(2), 218–244. https://doi.org/10.1037/0033-2909.90.2.218
Corbett, A. T., & Anderson, J. R. (1994). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4, 253–278. https://doi.org/10.1007/BF01099821
Cotra, A. (2023, August 29). Language models surprised us. Planned Obsolescence. https://www.planned-obsolescence.org/language-models-surprised-us/
Cox, C. B., Barron, L. G., Davis, W., & de la Garza, B. (2017). Using situational judgment tests (SJTs) in training: Development and evaluation of a structured, low-fidelity scenario-based training method. Personnel Review, 46(1), 36–45. https://doi.org/10.1108/PR-05-2015-0137
Cronbach, L. J. (1975). Five decades of public controversy over mental testing. American Psychologist, 30(1), 1–14. https://doi.org/10.1037/0003-066X.30.1.1
Darling-Hammond, L. (2001). Inequality in teaching and schooling: How opportunity is rationed to students of color in America. In B. D. Smedley, A. Y. Stith, L. Colburn, & C. H. Evans (Eds.), The right thing to do, the smart thing to do: Enhancing diversity in health professions—Summary of the Symposium on Diversity in Health Professions in Honor of Herbert W. Nickens, M. D. (pp. 208–233). National Academies Press. http://www.nap.edu/catalog/10186.html
Davey, T. (2023). Automated test assembly. In R. J. Tierney, F. Rizvi, & K. Ercikan (Eds.), International encyclopedia of education: Vol. 14. Quantitative research and educational measurement (pp. 201–208). Elsevier. https://doi.org/10.1016/B978-0-12-818630-5.10027-2
Davoli, M., & Entorf, H. (2018). The PISA shock, socioeconomic inequality, and school reforms in Germany (IZA Policy Paper No. 140). IZA – Institute of Labor Economics. https://docs.iza.org/pp140.pdf
De Boeck, P. (2023, July 25–28). Pervasive DIF and DIF detection bias [Paper presentation]. International Meeting of the Psychometric Society (IMPS 2023), University of Maryland, College Park, MD, United States.
De Boeck, P., & Cho, S.-J. (2021). Not all DIF is shaped similarly. Psychometrika, 86(3), 712–716. https://doi.org/10.1007/s11336-021-09772-3
Dell. (2018, January 30). 3,800 business leaders declare: It’s a tale of two futures. https://www.dell.com/en-us/perspectives/3800-business-leaders-declare-its-a-tale-of-two-futures/
Deming, D. J. (2017). The growing importance of social skills in the labor market. The Quarterly Journal of Economics, 132(4), 1593–1640. https://doi.org/10.1093/qje/qjx022
Deming, D. (2024, March 7). The worst way to do college admissions: Making standardized test scores optional has harmed the disadvantaged applicants it was intended to help. The Atlantic. https://theatlantic.com/ideas/archive/2024/03/standardized-testing-requirements-act-sat/677667/
Deming, D., & Kahn, L. B. (2018). Skill requirements across firms and labor markets: Evidence from job postings for professionals. Journal of Labor Economics, 36(S1), S337–S369. https://doi.org/10.1086/694106
Deonovic, B., Yudelson, M., Bolsinova, M., Attali, M., & Maris, G. (2018). Learning meets assessment. Behaviormetrika, 45(2), 457–474. https://doi.org/10.1007/s41237-018-0070-z
Diao, Q., & van der Linden, W. J. (2013). Integrating test-form formatting into automated test assembly. Applied Psychological Measurement, 37(5), 361–374. https://doi.org/10.1177/0146621613476157
Di Battista, A., Grayling, S., Hasselaar, E., Leopold, T., Li, R., Rayner, M., & Zahidi, S. (2023, May). Future of jobs report 2023. World Economic Forum. https://www.weforum.org/reports/the-future-of-jobs-report-2023
DiCerbo, K. (2024, March 7). How we built AI tutoring tools. Khan Academy Blog. https://blog.khanacademy.org/how-we-built-ai-tutoring-tools/
Dietrichson, J., Bøg, M., Filges, T., & Klint Jørgensen, A.-M. (2017). Academic interventions for elementary and middle school students with low socioeconomic status: A systematic review and meta-analysis. Review of Educational Research, 87(2), 243–282. https://doi.org/10.3102/0034654316687036
Dobrescu, L., Holden, R., Motta, A., Piccoli, A., Roberts, P., & Walker, S. (2021). Cultural context in standardized tests (Working Paper 2021-08). University of New South Wales Business School. https://doi.org/10.2139/ssrn.3983663
Duolingo Team. (2023, March 14). Introducing Duolingo Max, a learning experience powered by GPT-4. Duolingo Blog. https://blog.duolingo.com/duolingo-max/
Eberly Center. (n.d.). Learning principles: Theory and research-based principles of learning. Carnegie Mellon University. https://www.cmu.edu/teaching/principles/learning.html
Elliott, S. W. (2017). Computers and the future of skill demand. OECD. https://doi.org/10.1787/9789264284395-en
Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2023). GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv. https://arxiv.org/abs/2303.10130v4
Embretson, S. (1994). Applications of cognitive design systems to test development. In C. R. Reynolds (Ed.), Cognitive assessment: A multidisciplinary perspective (pp. 107–135). Springer.
Emerson, A., Houghton, P., Chen, K., Basheerabad, V., Ubale, R., & Leong, C. W. (2022). Predicting user confidence in video recordings with spatio-temporal multimodal analytics. In ICMI ’22 companion: Companion publication of the 2022 International Conference on Multimodal Interaction (pp. 98–104). ACM. https://doi.org/10.1145/3536220.3558007
Erwin, T. D., & Sebrell, K. W. (2003). Assessment of critical thinking: ETS’s tasks in critical thinking. Journal of General Education, 52(1), 50–70. https://doi.org/10.1353/jge.2003.0019
ETS. (n.d.). Demonstrate program effectiveness with the ETS® Major Field Tests. https://www.ets.org/mft.html
ETS. (2014). ETS standards for quality and fairness. https://ets.org/pdfs/about/standards-quality-fairness.pdf
ETS. (2022). ETS guidelines for developing fair tests and communications. https://www.ets.org/pdfs/about/fair-tests-and-communications.pdf
ETS. (2023a). ETS human progress study [Unpublished data set].
ETS. (2023b). Your at home testing. https://www.ets.org/gre/test-takers/general-test/register/at-home-testing.html
Falk, A., Becker, A., Dohmen, T., Enke, B., Huffman, D., & Sunde, U. (2018). Global evidence on economic preferences. The Quarterly Journal of Economics, 133(4), 1645–1692. https://doi.org/10.1093/qje/qjy013
Feuer, M. J. (2012). No country left behind: Rhetoric and reality of international large-scale assessment. ETS. http://www.ets.org/Media/Research/pdf/PICANG13.pdf
Feuer, M., Holland, P. W., Green, B. F., Bertenthal, M. W., & Hemphill, F. C. (Eds.). (1999). Uncommon measures: Equivalence and linkage among educational tests. National Academies Press. https://doi.org/10.17226/6332
Flanagan, C. (2021, July 22). The University of California is lying to us. The Atlantic. https://www.theatlantic.com/ideas/archive/2021/07/why-university-california-dropping-sat/619522/
Flynn, M. (2023, May 30). The soft skills “debate” is over. Forbes. https://www.forbes.com/sites/mariaflynn/2023/05/30/the-soft-skills-debate-is-over/?sh=5baa274b7308
Foster, N., & Piacentini, M. (Eds.). (2023). Innovating assessments to measure and support complex skills. OECD Publishing. https://doi.org/10.1787/e5f3e341-en
Frensch, P. A., & Funke, J. (1995). Complex problem solving: The European perspective. Routledge.
Frey, C. B., & Osborne, M. A. (2017). The future of employment: How susceptible are jobs to computerization? Technological Forecasting and Social Change, 114, 254–280. https://doi.org/10.1016/j.techfore.2016.08.019
Friedland, N. S., Allen, P. G., Matthews, G., Witbrock, M., Baxter, D., Curtis, J., Shepard, B., Miraglia, P., Angele, J., Staab, S., Moench, E., Oppermann, H., Wenke, D., Israel, D., Chaudhri, V., Porter, B., Barker, K., Fan, J., Chaw, S., … Clark, P. (2004). Project Halo: Towards a digital Aristotle. AI Magazine, 25(4), 29–47. https://doi.org/10.1609/aimag.v25i4.1783
Fu, J., Kyllonen, P. C., & Tan, X. (2024). From Likert to forced choice: Statement parameter invariance and context effects in personality assessment. Measurement: Interdisciplinary Research and Perspectives. Advance online publication. https://doi.org/10.1080/15366367.2023.2258482
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199–208. https://doi.org/10.1177/001440298605300301
Fyfe, E. R., Borriello, G. A., & Merrick, M. (2023). A developmental perspective on feedback: How corrective feedback influences children’s literacy, mathematics, and problem solving. Educational Psychologist, 58(3), 130–145. https://doi.org/10.1080/00461520.2022.2108426
Fyfe, E. R., De Leeuw, J. R., Carvalho, P. F., Goldstone, R. L., Sherman, J., Admiraal, D., Alford, L. K., Bonner, A., Brassil, C. E., Brooks, C. A., Carbonetto, T., Chang, S. H., Cruz, L., Czymoniewicz-Klippel, Daniel, F., Driessen, M., Habashy, N., Hanson-Bradley, C. L., Hirt, E. R., … Motz, B. A. (2021). Many Classes 1: Assessing the generalizable effect of immediate feedback versus delayed feedback across many college classes. Advances in Methods and Practices in Psychological Science, 4(3), Article 25152459211027575. https://doi.org/10.1177/25152459211027575
Gao, L., Ghosh, D., & Gimpel, K. (2022). What makes a question inquisitive? A study on type-controlled inquisitive question generation. In V. Nastase, E. Pavlick, M. T. Pilehvar, J. Camacho-Collados, & A. Raganato (Eds.), Proceedings of the 11th Joint Conference on Lexical and Computational Semantics (pp. 240–257). ACL. https://doi.org/10.18653/v1/2022.starsem-1.22
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2414–2423). IEEE. https://doi.org/10.1109/CVPR.2016.265
Geerlings, H., Glas, C. A., & van der Linden, W. J. (2011). Modeling rule-based item generation. Psychometrika, 76, 337–359. https://doi.org/10.1007/s11336-011-9204-x
Geiger, M., Bärwaldt, R., & Wilhelm, O. (2021). The good, the bad, and the clever: Faking ability as a socio-emotional ability? Journal of Intelligence, 9(1), 1–22. https://doi.org/10.3390/jintelligence9010013
Geisinger, K. F. (2011). The future of high-stakes testing in education. In J. A. Bovaird, K. F. Geisinger, & C. W. Buckendahl (Eds.), High-stakes testing in education: Science and practice in K–12 settings (pp. 231–248). American Psychological Association. https://doi.org/10.1037/12330-014
Gierl, M. J., & Haladyna, T. M. (Eds.). (2013). Automatic item generation: Theory and practice. Routledge.
Gil, Y., & Selman, B. (2019). A 20-year community roadmap for artificial intelligence research in the US. arXiv. https://doi.org/10.48550/arXiv.1908.02624
Glas, C. A. W., & van der Linden, W. J. (2001, June 2–4). Modeling variability in item parameters in CAT [Paper presentation]. North American Psychometric Society Meeting, King of Prussia, PA, United States.
Godwin, K. E., Almeda, M. V., Seltman, H., Kai, S., Skerbetz, M. D., Baker, R. S., & Fisher, A. V. (2016). Off-task behavior in elementary school children. Learning and Instruction, 44, 128–143. https://doi.org/10.1016/j.learninstruc.2016.04.003
Goldberg, B., & Sinatra, A. M. (2023). Generalized intelligent framework for tutoring (GIFT) SWOT analysis. In A. M. Sinatra, A. C. Graesser, X. Hu, G. Goodwin, & V. Rus (Eds.), Design recommendations for intelligent tutoring systems: Vol. 10. Strengths, weaknesses, opportunities and threats (SWOT) analysis of intelligent tutoring systems (pp. 9–26). U.S. Army Combat Capabilities Development Command—Soldier Center. https://gifttutoring.org/documents/163
Goodhart, C. A. E. (1984). Monetary theory and practice: The U.K. experience. Springer. https://doi.org/10.1007/978-1-349-17295-5
The Gordon Commission on the Future of Assessment in Education. (2013). To assess, to teach, to learn: A vision for the future of assessment. ETS. https://www.ets.org/Media/Research/pdf/gordon_commission_technical_report.pdf
Gosling, S. D., Augustine, A. A., Vazire, S., Holtzman, N., & Gaddis, S. (2011). Manifestations of personality in online social networks: Self-reported Facebook-related behaviors and observable profile information. Cyberpsychology, Behavior, and Social Networking, 14(9), 483–488. https://doi.org/10.1089/cyber.2010.0087
Gosling, S. D., Ko, S. J., Mannarelli, T., & Morris, M. E. (2002). A room with a cue: Personality judgments based on offices and bedrooms. Journal of Personality and Social Psychology, 82(3), 379–398. https://doi.org/10.1037//0022-3514.82.3.379
Graf, E. A., & Fife, J. H. (2012). Difficulty modeling and automatic generation of quantitative items: Recent advances and possible next steps. In M. J. Gierl & T. M. Haladyna (Eds.), Automatic item generation (pp. 157–178). Routledge.
Greiff, S., Gašević, D., & von Davier, A. (2017). Using process data for assessment in intelligent tutoring systems: A cognitive psychologist, psychometrician, and computer scientist perspective. In R. Sottilare, A. Graesser, X. Hu, & G. Goodwin (Eds.), Design recommendations for intelligent tutoring systems: Vol. 5. Assessment methods (pp. 171–179). U.S. Army Research Laboratory. https://gifttutoring.org/attachments/download/2410/Design%20Recommendations%20for%20ITS_Volume%205%20-%20Assessment_final_errata%20corrected.pdf
Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124(1), 75–111. https://doi.org/10.1037/0033-2909.124.1.75
Grose, J. (2024, January 17). Don’t ditch standardized tests: Fix them. The New York Times. https://www.nytimes.com/2024/01/17/opinion/standardized-tests.html
Grossmann, I., Rotella, A., Sharpinskyi, K., Browne, D. T., & Fong, G. T. (2023). Insights into the accuracy of social scientists’ forecasts of societal change. Nature Human Behaviour, 7, 484–501. https://doi.org/10.1038/s41562-022-01517-1
Haberman, S. J., & Lee, Y.-H. (2017). A statistical procedure for testing unusually frequent exactly matching responses and nearly matching responses (Research Report No. RR-17-23). ETS. https://doi.org/10.1002/ets2.12150
Haberman, S. J., Lee, Y.-H., Papierman, P., Zhou, Y., & Subhedar, R. (2022). Systems and methods for detecting unusually frequent exactly matching and nearly matching test responses (U.S. Patent No. 11,398,161). U.S. Patent and Trademark Office. https://ppubs.uspto.gov/pubwebapp/external.html?q=(11398161).pn.&db=USPAT&type=ids
Hambleton, R. K. (2002). Adapting achievement tests into multiple languages for international assessments. In National Research Council (Ed.), Methodological advances in cross-national surveys of educational achievement (pp. 58–79). National Academies Press. https://nap.nationalacademies.org/read/10322/chapter/4
Hao, J., Liu, L., Kyllonen, P. C., Flor, M., & von Davier, A. A. (2019). Psychometric considerations and a general scoring strategy for assessments of collaborative problem solving (Research Report No. RR-19-41). ETS. https://doi.org/10.1002/ets2.12276
Hao, J., Liu, L., von Davier, A. A., Lederer, N., Zapata-Rivera, D., Jakl, P., & Bakkenson, M. (2017). EPCAL: ETS platform for collaborative assessment and learning (Research Report No. RR-17-49). ETS. https://doi.org/10.1002/ets2.12181
Hao, J., von Davier, A. A., Yaneva, V., Lottridge, S., von Davier, M., & Harris, D. J. (2024). Transforming assessment: The impacts and implications of large language models and generative AI. Educational Measurement: Issues and Practice. Advance online publication.
Hattie, J. A. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.
Hattie, J., & Gan, M. (2011). Instruction based on feedback. In R. E. Mayer & P. A. Alexander (Eds.), Handbook of research on learning and instruction (pp. 249–271). Routledge.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112. https://doi.org/10.3102/003465430298487
He, J., Bartram, D., Inceoglu, I., & van de Vijver, F. J. R. (2014). Response styles and personality traits: A multilevel analysis. Journal of Cross-Cultural Psychology, 45(7), 1028–1045. https://doi.org/10.1177/0022022114534773
He, Q., Borgonovi, F., & Paccagnella, M. (2019). Using process data to understand adults’ problem-solving behaviour in the programme for the international assessment of adult competencies (PIAAC): Identifying generalised patterns across multiple tasks with sequence mining (OECD Education Working Paper No. 205). OECD. https://one.oecd.org/document/EDU/WKP(2019)13/en/pdf
Heckman, J., & Zhou, J. (2021). Interactions as investments: The microdynamics and measurement of early childhood learning [Manuscript submitted for publication].
Hedlund, J., Wilt, J. M., Nebel, K. L., Ashford, S. J., & Sternberg, R. J. (2006). Assessing practical intelligence in business school admissions: A supplement to the graduate management admissions test. Learning and Individual Differences, 16(2), 101–127. https://doi.org/10.1016/j.lindif.2005.07.005
Herman, J. L., Martínez, J. F., & Bailey, A. L. (2023). Fairness in educational assessment and the next edition of the standards: Concluding commentary. Educational Assessment, 28(2), 128–136. https://doi.org/10.1080/10627197.2023.2215980
Hilton, M., & Herman, J. (Eds.). (2017). Supporting students’ college success: The role of assessment of intrapersonal and interpersonal competencies. National Academies Press.
Himelfarb, I. (2019). A primer on standardized testing: History, measurement, classical test theory, item response theory, and equating. Journal of Chiropractic Education, 33(2), 151–163. https://doi.org/10.7899/JCE-18-22
Hinnant-Crawford, B. N. (2020). Improvement science in education: A primer. Myers Education Press.
Hitt, C., Trivitt, J., & Cheng, A. (2016). When you say nothing at all: The predictive power of student effort on surveys. Economics of Education Review, 52, 105–119. https://doi.org/10.1016/j.econedurev.2016.02.001
Holland, P. W. (1996). Assessing unusual agreement between the incorrect answers of two examinees using the K-index: Statistical theory and empirical support (Research Report No. RR-96-07). ETS. https://doi.org/10.1002/j.2333-8504.1996.tb01685.x
Hood, S. (1998). Culturally responsive performance-based assessment: Conceptual and psychometric considerations. Journal of Negro Education, 67(3), 187–196. https://doi.org/10.2307/2668188
Hoyt, W. T., & Kerns, M.-D. (1999). Magnitude and moderators of bias in observer ratings: A meta-analysis. Psychological Methods, 4(4), 403–424. https://doi.org/10.1037/1082-989X.4.4.403
Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., & Xing, E. P. (2017). Toward controlled generation of text. In D. Precup & Y. W. Teh (Eds.), Proceedings of machine learning research: Vol. 70. Proceedings of the 34th International Conference on Machine Learning (pp. 1587–1598). https://proceedings.mlr.press/v70/hu17e.html
IMS Global. (2022). Question & test interoperability (QTI) 3.0: Best practices and implementation guide. https://www.imsglobal.org/spec/qti/v3p0/impl/
Institute of Medicine. (2015). Psychological testing in the service of disability determination. The National Academies Press. https://doi.org/10.17226/21704
International Test Commission. (2001). International guidelines for test use. International Journal of Testing, 1(2), 93–114. https://doi.org/10.1207/S15327574IJT0102_1
International Test Commission. (2013). ITC guidelines for test use. Final version. https://www.intestcom.org/files/guideline_test_use.pdf
International Test Commission. (2017). The ITC guidelines for translating and adapting tests (2nd ed.). https://www.intestcom.org/files/guideline_test_adaptation_2ed.pdf
International Test Commission & Association of Test Publishers. (2022). Guidelines for technology-based assessment. https://www.intestcom.org/upload/media-library/guidelines-for-technology-based-assessment-v20221108-16684036687NAG8.pdf
Irvine, S. H., & Kyllonen, P. C. (Eds.). (2013). Item generation for test development. Routledge.
Jackson, C. K. (2018). What do test scores miss? The importance of teacher effects on non-test score outcomes. Journal of Political Economy, 126(5), 2072–2107. https://doi.org/10.1086/699018
Jiang, Y., Martin-Raugh, M., Yang, Z., Hao, J., Liu, L., & Kyllonen, P. C. (2023). Do you know your partner’s personality through virtual collaboration or negotiation? Investigating perceptions of personality and their impacts on performance. Computers in Human Behavior, 141, Article 107608. https://doi.org/10.1016/j.chb.2022.107608
John, O. P., & Srivastava, S. (1999). The Big-Five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin & O. P. John (Eds.), Handbook of personality: Theory and research (Vol. 2, pp. 102–138). Guilford Press.
Johnson, M. S. (2024). How do we demonstrate AI responsibility: The devil is in the details [Manuscript in preparation].
Johnson, M. S., Liu, X., & McCaffrey, D. F. (2022). Psychometric methods to evaluate measurement and algorithmic bias in automated scoring. Journal of Educational Measurement, 59(3), 338–361. https://doi.org/10.1111/jedm.12335
Johnson, M. S., & McCaffrey, D. F. (2023). Evaluating fairness of automated scoring in educational measurement. In S. Lane (Ed.), Advancing natural language processing in educational assessment (pp. 143–164). Routledge. https://doi.org/10.4324/9781003278658-12
Johnson, M. S., & Sinharay, S. (2005). Calibration of polytomous item families using Bayesian hierarchical modeling. Applied Psychological Measurement, 29(5), 369–400. https://doi.org/10.1177/0146621605276675
Jung, J. Y., Tyack, L., & von Davier, M. (2022). Automated scoring of constructed-response items using artificial neural networks in international large-scale assessment. Psychological Test and Assessment Modeling, 64(4), 471–494.
Karay, Y., Reiss, B., & Schauber, S. K. (2020). Progress testing anytime and anywhere: Does a mobile-learning approach enhance the utility of a large-scale formative assessment tool? Medical Teacher, 42(10), 1154–1162. https://doi.org/10.1080/0142159X.2020.1798910
Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331(6018), 772–775. https://doi.org/10.1126/science.1199327
Kautz, T., & Zanoni, W. (2014). Measuring and fostering non-cognitive skills in adolescence: Evidence from Chicago public schools and the OneGoal program. University of Chicago.
Kell, H. J., Martin-Raugh, M. P., Carney, L. M., Inglese, P. A., Chen, L., & Feng, G. (2017). Exploring methods for developing behaviorally anchored rating scales for evaluating structured interview performance (Research Report No. RR-17-28). ETS. https://doi.org/10.1002/ets2.12152
Kessler, J. B., Low, C., & Sullivan, C. D. (2019). Incentivized resume rating: Eliciting employer preferences without deception. American Economic Review, 109(11), 3713–3744. https://doi.org/10.1257/aer.20181714
King, G., & Wand, J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis, 15(1), 46–66. https://doi.org/10.1093/pan/mpl011
Kingston, N., & Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28–37. https://doi.org/10.1111/j.1745-3992.2011.00220.x
Klieger, D. M., Kell, H. J., Rikoon, S., Burkander, K. N., Bochenek, J. L., & Shore, J. R. (2018). Development of the behaviorally anchored rating scales for the skills demonstration and progression guide (Research Report No. RR-18-24). ETS. https://doi.org/10.1002/ets2.12210
Klinger, D. A., McDivitt, P. R., Howard, B. B., Munoz, M. A., Rogers, W. T., & Wylie, E. C. (2015). The classroom assessment standards for preK-12 teachers. Kindle Direct Press.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284. https://doi.org/10.1037/0033-2909.119.2.254
Klute, M., Apthorp, H., Harlacher, J., & Reale, M. (2017). Formative assessment and elementary school student academic achievement: A review of the evidence (Report No. REL 2017-259). Regional Educational Laboratory Central.
Koedinger, K. R., Carvalho, P. F., Liu, R., & McLaughlin, E. A. (2023). An astonishing regularity in student learning rate. Proceedings of the National Academy of Sciences, 120(13), Article e2221311120. https://doi.org/10.1073/pnas.2221311120
Kosinski, M., Bachrach, Y., Kohli, P., Stillwell, D., & Graepel, T. (2014). Manifestations of user personality in website choice and behaviour on online social networks. Machine Learning, 95, 357–380. https://doi.org/10.1007/s10994-013-5415-y
Krachman, S. B., Arnold, R., & LaRocca, R. (2016). Expanding the definition of student success: A case study of the CORE districts. Transforming Education. https://transformingeducation.org/wp-content/uploads/2017/04/TransformingEducationCaseStudyFINAL1.pdf
Kukea Shultz, P., & Englert, K. (2021). Cultural validity as foundational to assessment development: An indigenous example. Frontiers in Education, 6, Article 701973. https://doi.org/10.3389/feduc.2021.701973
Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: A meta-analytic review. Review of Educational Research, 86(1), 42–78. https://doi.org/10.3102/0034654315581420
Kumar, V., & Boulanger, D. (2020, October). Explainable automated essay scoring: Deep learning really has pedagogical value. Frontiers in Education, 5, Article 572367. https://doi.org/10.3389/feduc.2020.572367
Kuncel, N. R., Kochevar, R. J., & Ones, D. S. (2014). A meta-analysis of letters of recommendation in college and graduate admissions: Reasons for hope. International Journal of Selection and Assessment, 22(1), 101–107. https://doi.org/10.1111/ijsa.12060
Kyllonen, P. C. (2016). Socio-emotional and self-management variables in learning and assessment. In A. A. Rupp & J. P. Leighton (Eds.), The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 174–197). John Wiley & Sons. https://doi.org/10.1002/9781118956588.ch8
Kyllonen, P. (2021). Taxonomy of cognitive abilities and measures for assessing artificial intelligence and robotics capabilities. In AI and the future of skills: Volume 1. Capabilities and assessments (pp. 50–76). OECD Publishing. https://doi.org/10.1787/feecd512-en
Kyllonen, P. C., & Bertling, J. P. (2013). Innovative questionnaire assessment methods to increase cross-country comparability. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis (pp. 277–285). CRC Press.
Kyllonen, P., Hao, J., Weeks, J., Fauss, M., & Kerzabi, E. (2023). Collaborative problem solving (CPS) skill: Estimating an individual’s contribution to small group performance [Unpublished manuscript]. ETS.
Kyllonen, P., Hartman, R., Sprenger, A., Weeks, J., Bertling, M., McGrew, K., Kriz, S., Bertling, J., Fife, J., & Stankov, L. (2019). General fluid/inductive reasoning battery for a high-ability population. Behavior Research Methods, 51(2), 507–522. https://doi.org/10.3758/s13428-018-1098-4
Kyllonen, P. C., & Kell, H. (2018). Ability tests measure personality, personality tests measure ability: Disentangling construct and method in evaluating the relationship between personality and ability. Journal of Intelligence, 6(3), Article 32. https://doi.org/10.3390/jintelligence6030032
Kyriazos, T. A. (2018). Applied psychometrics: The application of CFA to multitrait-multimethod matrices (CFA-MTMM). Psychology, 9(12), 2625–2648. https://doi.org/10.4236/psych.2018.912150
Landers, R. N., Armstrong, M. B., Collmus, A. B., Mujcic, S., & Blaik, J. (2022). Theory-driven game-based assessment of general cognitive ability: Design theory, measurement, prediction of performance, and test fairness. Journal of Applied Psychology, 107(10), 1655–1677. https://doi.org/10.1037/apl0000954
Landers, R. N., & Sanchez, D. R. (2022). Game-based, gamified, and gamefully designed assessments for employee selection: Definitions, distinctions, design, and validation. International Journal of Selection and Assessment, 30(1), 1–13. https://doi.org/10.1111/ijsa.12376
Lane, S., Raymond, M. R., & Haladyna, T. M. (Eds.). (2016). Handbook of test development (Vol. 2, pp. 3–18). Routledge.
Lang, J. W. B., & Tay, L. (2021). The science and practice of item response theory in organizations. Annual Review of Organizational Psychology and Organizational Behavior, 8, 311–338. https://doi.org/10.1146/annurev-orgpsych-012420-061705
Langer, C., & Wiederhold, S. (2023). The value of early-career skills (CESifo Working Paper No. 10288). CESifo Network. https://doi.org/10.2139/ssrn.4369987
Lassébie, J., & Quintini, G. (2022). What skills and abilities can automation technologies replicate and what does it mean for workers? New evidence (OECD Social, Employment and Migration Working Papers, No. 282). OECD Publishing. https://doi.org/10.1787/646aad77-en
Law, K. S., Mobley, W. H., & Wong, C.-S. (2002). Impression management and faking in biodata scores among Chinese job-seekers. Asia Pacific Journal of Management, 19, 541–556. https://doi.org/10.1023/A:1020521726390
Lederman, O., Calacci, D., MacMullen, A., Fehder, D. C., Murray, F. E., & Pentland, A. S. (2016). Open badges: A low-cost toolkit for measuring team communication and dynamics. In The online proceedings of the 2016 International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BriMS 2016). http://sbp-brims.org/2016/proceedings/IN_105.pdf
Lee, G. H., Lee, K. J., Jeong, B., & Kim, T. (2024). Developing personalized marketing service using generative AI. IEEE Access, 12, 22394–22402. https://doi.org/10.1109/ACCESS.2024.3361946
Lee, H. A. (2023, January 23). This is why Microsoft Kinect was a complete failure. SVG. https://www.svg.com/301470/this-is-why-microsoft-kinect-was-a-complete-failure/
Lee, Y.-H., & Haberman, S. J. (2013). Harmonic regression and scale stability. Psychometrika, 78(4), 815–829. https://doi.org/10.1007/s11336-013-9337-1
Lee, Y.-H., & Haberman, S. J. (2021). Studying score stability with a harmonic regression family: A comparison of three approaches to adjustment of examinee-specific demographic data. Journal of Educational Measurement, 58(1), 54–82. https://doi.org/10.1111/jedm.12266
Lee, Y.-H., & Lewis, C. (2021). Monitoring item performance with CUSUM statistics in continuous testing. Journal of Educational and Behavioral Statistics, 46(5), 611–648. https://doi.org/10.3102/1076998621994563
Lee, Y.-H., Lewis, C., & von Davier, A. A. (2014). Monitoring the quality and security of multistage tests. In D. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 285–300). CRC Press.
Lee, Y.-H., & von Davier, A. A. (2013). Monitoring scale scores over time via quality control charts, model-based approaches, and time series techniques. Psychometrika, 78(3), 557–575. https://doi.org/10.1007/s11336-013-9317-5
Leenknecht, M., Hompus, P., & van der Schaaf, M. (2019). Feedback seeking behaviour in higher education: The association with students’ goal orientation and deep learning approach. Assessment & Evaluation in Higher Education, 44(7), 1069–1078. https://doi.org/10.1080/02602938.2019.1571161
Lehman, B., Sparks, J. R., & Zapata-Rivera, D. (2018). When should an adaptive assessment care? In N. Guin & A. Kumar (Eds.), Proceedings of ITS 2018: Intelligent Tutoring Systems 14th International Conference, Workshop on Exploring Opportunities for Caring Assessments (pp. 87–94). ITS. https://ceur-ws.org/Vol-2354/w3paper1.pdf
Leonhardt, D. (2024, January 7). The misguided war on the SAT. The New York Times. https://www.nytimes.com/2024/01/07/briefing/the-misguided-war-on-the-sat.html
Lewin, T. (2002, December 4). Henry Chauncey dies at 97; Shaped admission testing for the nation’s colleges. The New York Times. https://www.nytimes.com/2002/12/04/nyregion/henry-chauncey-dies-at-97-shaped-admission-testing-for-the-nation-s-colleges.html
Lewis, C. (2001). Expected response functions. In A. Boomsma, M. A. J. van Duijn, & T. A. B. Snijders (Eds.), Essays on item response theory (pp. 163–171). Springer. https://doi.org/10.1007/978-1-4613-0169-1_9
Lewis, C., & Thayer, D. T. (1998). The power of the K-index (or PMIR) to detect copying (Research Report No. RR-98-49). ETS. https://doi.org/10.1002/j.2333-8504.1998.tb01798.x
LinkedIn Talent Solutions. (2019). Global talent trends: The 3 trends transforming your workplace. https://business.linkedin.com/content/dam/me/business/en-us/talent-solutions/resources/pdfs/global_talent_trends_2019_emea.pdf
Linzarini, A., & Catarino da Silva, D. (2024). Innovative assessments for social emotional skills [Webinar slides]. SlideShare. https://www.slideshare.net/slideshow/webinar-innovative-assessments-for-social-emotional-skills/270083576
Lira, B., O’Brien, J. M., Peña, P. A., Galla, B. M., D’Mello, S., Yeager, D. S., Defnet, A., Kautz, T., Munkacsy, K., & Duckworth, A. L. (2022). Large studies reveal how reference bias limits policy applications of self-report measures. Scientific Reports, 12, Article 19189. https://doi.org/10.1038/s41598-022-23373-9
Lissitz, R. W. (2009). Introduction. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 1–15). IAP Information Age Publishing.
Liu, O. L., Bridgeman, B., & Adler, R. M. (2012). Measuring learning outcomes in higher education: Motivation matters. Educational Researcher, 41(9), 352–362. https://doi.org/10.3102/0013189X12459679
Liu, O. L., Kell, H. J., Liu, L., Ling, G., Wang, Y., Wylie, C., Sevak, A., Sherer, D., LeMahieu, P., & Knowles, T. (2023). A new vision for skills-based assessment. ETS. https://ets.org/pdfs/rd/new-vision-skills-based-assessment.pdf
Liu, O. L., Mao, L., Frankel, L., & Xu, J. (2016). Assessing critical thinking in higher education: The HEIghten approach and preliminary validity evidence. Assessment & Evaluation in Higher Education, 41(5), 677–694. https://doi.org/10.1080/02602938.2016.1168358
Liu, X., Zhang, Z., Wang, Y., Pu, H., Lan, Y., & Shen, C. (2023). COCO: Coherence-enhanced machine-generated text detection under low resource with contrastive learning. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 16167–16188). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.1005
Loewus, L. (2016). What is digital literacy? Education Week. https://www.edweek.org/teaching-learning/what-is-digital-literacy/2016/11
Loukina, A., Yoon, S.-Y., Sakano, J., Wei, Y., & Sheehan, K. (2016). Textual complexity as a predictor of difficulty of listening items in language proficiency tests. In Y. Matsumoto & R. Prasad (Eds.), Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical papers (pp. 3245–3253). https://aclanthology.org/C16-1306
Ludlow, L. H., O’Keefe, T., Braun, H., Anghel, E., Szendey, O., Matz, C., & Howell, B. (2022). An enhancement to the theory and measurement of purpose. Practical Assessment, Research, and Evaluation, 27(1), Article 4. https://doi.org/10.7275/c5jb-rr95
Ma, W., Adesope, O. O., Nesbit, J. C., & Liu, Q. (2014). Intelligent tutoring systems and learning outcomes: A meta-analysis. Journal of Educational Psychology, 106(4), 901–918. https://doi.org/10.1037/a0037123
MacCann, C., & Roberts, R. D. (2008). New paradigms for assessing emotional intelligence: Theory and data. Emotion, 8(4), 540–551. https://doi.org/10.1037/a0012746
Madnani, N., & Cahill, A. (2018). Automated scoring: Beyond natural language processing. In E. M. Bender, L. Derczynski, & P. Isabelle (Eds.), Proceedings of the 27th International Conference on Computational Linguistics (pp. 1099–1109). ACL. https://aclanthology.org/C18-1094
Mammadov, S. (2022). Big Five personality traits and academic performance: A meta-analysis. Journal of Personality, 90(2), 222–255. https://doi.org/10.1111/jopy.12663
Mankki, V. (2023). Research using teacher or teacher educator job advertisements: A scoping review. Cogent Education, 10(1), Article 2223814. https://doi.org/10.1080/2331186X.2023.2223814
Martin-Raugh, M. P., Kyllonen, P. C., Hao, J., Bacall, A., Becker, D., Kurzum, C., Yang, Z., Yan, F., & Barnwell, P. (2020). Negotiation as an interpersonal skill: Generalizability of negotiation outcomes and tactics across contexts at the individual and collective levels. Computers in Human Behavior, 104, Article 105966. https://doi.org/10.1016/j.chb.2019.03.030
Martin-Raugh, M., Roohr, K. C., Leong, C. W., Molloy, H., McCulla, L., Ramanarayan, V., & Mladineo, Z. (2023). Better understanding oral communication skills: The impact of perceived personality traits. American Journal of Distance Education. Advance online publication. https://doi.org/10.1080/08923647.2023.2235950
Mattingly, S. M., Gregg, J. M., Audia, P., Bayraktaroglu, A. E., Campbell, A. T., Chawla, N. V., Das Swain, V., De Choudhury, M., D’Mello, S. K., Dey, A. K., Gao, G., Jagannath, K., Jiang, K., Lin, S., Liu, Q., Mark, G., Martinez, G. J., Masaba, K., Mirjafari, S., … Striegel, A. (2019, May). The tesserae project: Large-scale, longitudinal, in situ, multimodal sensing of information workers. In Extended abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–8). ACM. https://doi.org/10.1145/3290607.3299041
McLaughlin, K., Ainslie, M., Coderre, S., Wright, B., & Violato, C. (2009). The effect of differential rater function over time (DRIFT) on objective structured clinical examination ratings. Medical Education, 43(10), 989–992. https://doi.org/10.1111/j.1365-2923.2009.03438.x
McWhorter, J. (2024, March 14). No, the SAT isn’t racist. The New York Times. https://www.nytimes.com/2024/03/14/opinion/sat-college-admissions-antiracism.html
Mervosh, S. (2022, September 1). The pandemic erased two decades of progress in math and reading: The results of a national test showed just how devastating the last two years have been for 9-year-old schoolchildren, especially the most vulnerable. The New York Times. https://www.nytimes.com/2022/09/01/us/national-test-scores-math-reading-pandemic.html
Meyer, R. H., Wang, C., & Rice, A. B. (2018). Measuring students’ social-emotional learning among California’s CORE districts: An IRT modeling approach [Working paper]. Policy Analysis for California Education. https://edpolicyinca.org/sites/default/files/Measuring_SEL_May-2018.pdf
Mignogna, G., Carey, C. E., Wedow, R., Baya, N., Cordioli, M., Pirastu, N., Bellocco, R., Malerbi, K. F., Nivard, M. G., Neale, B. M., Walters, R. K., & Ganna, A. (2023). Patterns of item nonresponse behaviour to survey questionnaires are systematic and associated with genetic loci. Nature Human Behaviour, 7, 1371–1387. https://doi.org/10.1038/s41562-023-01632-7
Millsap, R. (2011). Statistical approaches to measurement invariance. Routledge.
Mirjafari, S., Masaba, K., Grover, T., Wang, W., Audia, P., Campbell, A. T., Chawla, N. V., Das Swain, V., De Choudhury, M., Dey, A. K., D’Mello, S. K., Gao, G., Gregg, J. M., Jagannath, K., Jiang, K., Lin, S., Qiang, L., Mark, G., Martinez, G. J., Martinez, S. M., … Striegel, A. (2019). Differentiating higher and lower job performers in the workplace using mobile sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(2), 1–24. https://doi.org/10.1145/3328908
Mislevy, R. (2018). Sociocognitive foundations of educational measurement. Routledge.
Mislevy, R. J., Oranje, A., Bauer, M. I., von Davier, A., Hao, J., Corrigan, S., Hoffman, E., DiCerbo, K., & Michael, J. (2014). Psychometric considerations in game-based assessment. GlassLab Research, Institute of Play. https://web.archive.org/web/20160320151604/http://www.instituteofplay.org/wp-content/uploads/2014/02/GlassLab_GBA1_WhitePaperFull.pdf
Mislevy, R. J., Sheehan, K. M., & Wingersky, M. (1993). How to equate tests with little or no data. Journal of Educational Measurement, 30(1), 55–78. https://doi.org/10.1111/j.1745-3984.1993.tb00422.x
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1(1), 3–62. https://doi.org/10.1207/S15366359MEA0101_02
Molenaar, I., de Mooij, S., Azevedo, R., Bannert, M., Järvelä, S., & Gašević, D. (2023). Measuring self-regulated learning and the role of AI: Five years of research using multimodal multichannel data. Computers in Human Behavior, 139, Article 107540. https://doi.org/10.1016/j.chb.2022.107540
Morell, Z. (2017). Introduction to the New York State next generation early learning standards. https://www.nysed.gov/sites/default/files/introduction-to-the-nys-early-learning-standards.pdf
Moreno, R. (2004). Decreasing cognitive load for novice students: Effects of explanatory versus corrective feedback in discovery-based multimedia. Instructional Science, 32(1–2), 99–113. https://doi.org/10.1023/B:TRUC.0000021811.66966.1d
Moro, E., Frank, M. R., Pentland, A., Rutherford, A., Cebrian, M., & Rahwan, I. (2021). Universal resilience patterns in labor markets. Nature Communications, 12, Article 1972. https://doi.org/10.1038/s41467-021-22086-3
Mumford, M. D., & Owens, W. A. (1987). Methodology review: Principles, procedures, and findings in the application of background data measures. Applied Psychological Measurement, 11(1), 1–31. https://doi.org/10.1177/014662168701100101
Murphy, S. C., Klieger, D. M., Borneman, M. J., & Kuncel, N. R. (2009). The predictive power of personal statements in admissions: A meta-analysis and cautionary tale. College and University, 84(4), 83–86.
Narciss, S. (2004). The impact of informative tutoring feedback and self-efficacy on motivation and achievement in concept learning. Experimental Psychology, 51(3), 214–228. https://doi.org/10.1027/1618-3169.51.3.214
Narciss, S., Sosnovsky, S., Schnaubert, L., Andrès, E., Eichelmann, A., Goguadze, G., & Melis, E. (2014). Exploring feedback and student characteristics relevant for personalizing feedback strategies. Computers & Education, 71, 56–76. https://doi.org/10.1016/j.compedu.2013.09.011
National Academies of Sciences, Engineering, and Medicine. (2018). How people learn II: Learners, contexts, and cultures. The National Academies Press. https://doi.org/10.17226/24783
National Academies of Sciences, Engineering, and Medicine. (2019). Monitoring educational equity. The National Academies Press. https://doi.org/10.17226/25389
National Association of Colleges and Employers. (2022). NACE job outlook 2022. https://www.naceweb.org/uploadedFiles/files/2022/resources/nace-job-outlook-2022.pdf
National Research Council. (1999a). High stakes: Testing for tracking, promotion, and graduation. The National Academies Press. https://doi.org/10.17226/6336
National Research Council. (1999b). Myths and tradeoffs: The role of tests in undergraduate admissions. The National Academies Press. https://doi.org/10.17226/9632
National Research Council. (2000). How people learn: Brain, mind, experience, and school (expanded ed.). The National Academies Press. https://doi.org/10.17226/9853
National Research Council. (2001). Knowing what students know: The science and design of educational assessment. The National Academies Press. https://doi.org/10.17226/10019
National Research Council. (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century. The National Academies Press. https://doi.org/10.17226/13398
Nesbit, J. C., Adesope, O. O., Liu, Q., & Ma, W. (2014, July). How effective are intelligent tutoring systems in computer science education? In 2014 IEEE 14th International Conference on Advanced Learning Technologies (pp. 99–103). IEEE. https://doi.org/10.1109/ICALT.2014.38
Nguyen, T. H., Han, H.-R., Kim, M. T., & Chan, K. S. (2014). An introduction to item response theory for patient-reported outcome measurement. The Patient, 7(1), 23–35. https://doi.org/10.1007/s40271-013-0041-0
Nickow, A., Oreopoulos, P., & Quan, V. (2020). The impressive effects of tutoring on PreK-12 learning: A systematic review and meta-analysis of the experimental evidence (NBER Working Paper No. 27476). National Bureau of Economic Research. https://doi.org/10.3386/w27476
Niessen, A. S. M., Meijer, R. R., & Tendeiro, J. N. (2017). Measuring non-cognitive predictors in high-stakes contexts: The effect of self-presentation on self-report instruments used in admission to higher education. Personality and Individual Differences, 106, 183–189. https://doi.org/10.1016/j.paid.2016.11.014
Noor, N., Beram, S., Yuet, F. K. C., Gengatharan, K., Syafiq, M., & Rasidi, M. S. M. (2023). Bias, halo effect and horn effect: A systematic literature review. International Journal of Academic Research in Business & Social Sciences, 13(3), 1116–1140. https://doi.org/10.6007/IJARBSS/v13-i3/16733
Norville, V. (2022). States sketch ‘portraits of a graduate.’ State Innovations, 27(1), 1–4.
Novarese, M., & Di Giovinazzo, V. (2013). Promptness and academic performance (MPRA Paper No. 49746). Munich Personal RePEc Archive. https://mpra.ub.uni-muenchen.de/49746/
Ober, T. M., Lehman, B. A., Gooch, R., Oluwalana, O., Solyst, J., Phelps, G., & Hamilton, L. S. (2023). Culturally responsive learning: Recommendations for a working definition and framework (Research Report No. RR-23-09). Educational Testing Service. https://doi.org/10.1002/ets2.12372
O’Dwyer, E., Sparks, J. R., & Nabors Oláh, L. (2023). Enacting a process for developing culturally relevant classroom assessments. Applied Measurement in Education, 36(3), 286–303. https://doi.org/10.1080/08957347.2023.2214652
OECD. (n.d.). Education & Skills Online Assessment. https://www.oecd.org/skills/ESonline-assessment/abouteducationskillsonline/
OECD. (2015). Skills for social progress: The power of social and emotional skills. OECD Publishing. https://doi.org/10.1787/9789264226159-en
OECD. (2019). An OECD learning framework 2030. In G. Bast, E. G. Carayannis, & D. F. J. Campbell (Eds.), The future of education and labor. Arts, research, innovation and society (pp. 23–35). Springer. https://doi.org/10.1007/978-3-030-26068-2_3
OECD. (2021). AI and the future of skills: Volume 1. Capabilities and assessments. OECD Publishing. https://doi.org/10.1787/5ee71f34-en
OECD. (2022a). Building the future of education. OECD Publishing. https://web-archive.oecd.org/2022-11-30/618066-future-of-education-brochure.pdf
OECD. (2022b). PISA 2022 results. https://www.oecd.org/publication/pisa-2022-results#pisa2022results
OECD. (2023). OECD skills outlook 2023: Skills for a resilient green and digital transition. OECD Publishing. https://doi.org/10.1787/27452f29-en
Oh, I.-S., Wang, G., & Mount, M. K. (2011). Validity of observer ratings of the five-factor model of personality traits: A meta-analysis. Journal of Applied Psychology, 96(4), 762–773. https://doi.org/10.1037/a0021832
O’Neil, H., Baker, E. L., Wainess, R., Chen, C., Mislevy, R., & Kyllonen, P. (2004). Final report on plan for the assessment and evaluation of individual and team proficiencies developed by the DARWARS Environments. Office of Naval Research; Defense Advanced Research Projects Agency. https://apps.dtic.mil/sti/tr/pdf/ADA432802.pdf
OPM. (n.d.). Other assessment methods. OPM U.S. Office of Personnel Management. https://www.opm.gov/policy-data-oversight/assessment-and-selection/other-assessment-methods/
Ormerod, C. M., Malhotra, A., & Jafari, A. (2021). Automated essay scoring using efficient transformer-based language models. arXiv. https://arxiv.org/pdf/2102.13136.pdf
Ortner, T. M., & Proyer, R. T. (2015). Objective personality tests. In T. M. Ortner & F. J. R. van de Vijver (Eds.), Behavior-based assessment in psychology: Going beyond self-report in the personality, affective, motivation, and social domains (pp. 133–149). Hogrefe.
Ortner, T. M., Proyer, R. T., & Kubinger, K. D. (2006). Theorie und Praxis objektiver Persönlichkeitstests [Theory and practice of objective personality tests]. Verlag Hans Huber.
Ostini, R., & Nering, M. L. (2006). Polytomous item response theory models (No. 144). Sage.
Oswald, F. L., Schmitt, N., Kim, B. H., Ramsay, L. J., & Gillespie, M. A. (2004). Developing a biodata measure and situational judgment inventory as predictors of college student performance. Journal of Applied Psychology, 89(2), 187–207. https://doi.org/10.1037/0021-9010.89.2.187
Panadero, E. (2023). Toward a paradigm shift in feedback research: Five further steps influenced by self-regulated learning theory. Educational Psychologist, 58(3), 193–204. https://doi.org/10.1080/00461520.2023.2223642
Panadero, E., & Lipnevich, A. A. (2022). A review of feedback models and typologies: Towards an integrative model of feedback elements. Educational Research Review, 35, Article 100416. https://doi.org/10.1016/j.edurev.2021.100416
Panthier, C., & Gatinel, D. (2023). Success of ChatGPT, an AI language model, in taking the French language version of the European Board of Ophthalmology examination: A novel approach to medical knowledge assessment. Journal Français d’Ophtalmologie, 46(7), 706–711. https://doi.org/10.1016/j.jfo.2023.05.006
Patrick, S. (2021). Transforming learning through competency-based education. State Education Standard, 21(2), 23–29.
Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 49–69). Erlbaum.
Phelps, R. P. (2019). Test frequency, stakes, and feedback in student achievement: A meta-analysis. Evaluation Review, 43(3–4), 111–151. https://doi.org/10.1177/0193841X19865628
Poropat, A. E. (2014). A meta-analysis of adult-rated child personality and academic performance in primary education. British Journal of Educational Psychology, 84(2), 239–252. https://doi.org/10.1111/bjep.12019
Posso, A. (2016). Internet usage and educational outcomes among 15-year-old Australian students. International Journal of Communication, 10, 3851–3876. https://ijoc.org/index.php/ijoc/article/view/5586/1742
Powers, D. E., & Fowles, M. E. (1997). The personal statement as an indicator of writing skill: A cautionary note. Educational Assessment, 4(1), 75–87. https://doi.org/10.1207/s15326977ea0401_3
Prabhumoye, S., Tsvetkov, Y., Salakhutdinov, R., & Black, A. W. (2018). Style transfer through back-translation. In I. Gurevych & Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Volume 1. Long Papers (pp. 866–876). Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-1080
Qian, Y., Tao, J., Suendermann-Oeft, D., Evanini, K., Ivanov, A. V., & Ramanarayanan, V. (2018a). Computer-implemented systems and methods for speaker recognition using a neural network (U.S. Patent No. 10,008,209). U.S. Patent and Trademark Office. https://ppubs.uspto.gov/pubwebapp/external.html?q=(10008209).pn.&db=USPAT&type=ids
Qian, Y., Tao, J., Suendermann-Oeft, D., Evanini, K., Ivanov, A. V., & Ramanarayanan, V. (2018b). Noise and metadata sensitive bottleneck features for improving speaker recognition with non-native speech input. In Proceedings of INTERSPEECH 2016: 17th Annual Conference of the International Speech Communication Association (pp. 3648–3652). https://doi.org/10.21437/Interspeech.2016-548
RAND. (2020). RAND education assessment finder. https://www.rand.org/education-and-labor/projects/assessments/tool.html
Randall, J. (2023). It ain’t near ’bout fair: Re-envisioning the bias and sensitivity review process from a justice-oriented antiracist perspective. Educational Assessment, 28(2), 68–82. https://doi.org/10.1080/10627197.2023.2223924
Rees, A. (2021, December 27). The history of predicting the future. Wired. https://www.wired.com/story/history-predicting-future/
Rios, J. A., Ling, G., Pugh, R., Becker, D., & Bacall, A. (2020). Identifying critical 21st-century skills for workplace success: A content analysis of job advertisements. Educational Researcher, 49(2), 80–89. https://doi.org/10.3102/0013189X19890600
Roediger III, H. L., Agarwal, P. K., McDaniel, M. A., & McDermott, K. B. (2011). Test-enhanced learning in the classroom: Long-term improvements from quizzing. Journal of Experimental Psychology: Applied, 17(4), 382–395. https://doi.org/10.1037/a0026252
Roll, I., & Barhak-Rabinowitz, M. (2023). Measuring self-regulated learning using feedback and resources. In N. Foster & M. Piacentini (Eds.), Innovating assessments to measure and support complex skills. OECD Publishing. https://doi.org/10.1787/c93ac64e-en
Rowland, C. A. (2014). The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin, 140(6), 1432–1463. https://doi.org/10.1037/a0037559
Rupp, A. A. (2018). Designing, evaluating, and deploying automated scoring systems with validity in mind: Methodological design decisions. Applied Measurement in Education, 31(3), 191–214. https://doi.org/10.1080/08957347.2018.1464448
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Salgado, J. F., & Moscoso, S. (2019). Meta-analysis of interrater reliability of supervisory performance ratings: Effects of appraisal purpose, scale type, and range restriction. Frontiers in Psychology, 10, Article 2281. https://doi.org/10.3389/fpsyg.2019.02281
Salgado, J. F., & Tauriz, G. (2014). The Five-Factor model, forced-choice personality inventories and performance: A comprehensive meta-analysis of academic and occupational validity studies. European Journal of Work and Organizational Psychology, 23(1), 3–30. https://doi.org/10.1080/1359432X.2012.716198
Scalise, K., & Gifford, B. (2006). Computer-based assessment in e-learning: A framework for constructing “intermediate constraint” questions and tasks for technology platforms. The Journal of Technology, Learning and Assessment, 4(6). https://ejournals.bc.edu/index.php/jtla/article/view/1653
Scalise, K., Malcom, C., & Kaylor, E. (2023). A tale of two worlds: Machine learning approaches at the intersection with educational measurement. In N. Foster & M. Piacentini (Eds.), Innovating assessments to measure and support complex skills (pp. 229–237). OECD Publishing. https://doi.org/10.1787/d01eb8a4-en
Schmeiser, C. B., & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational measurement (4th ed.; pp. 307–353). American Council on Education; Praeger.
Schmill, S. (2022, March 28). We are reinstating our SAT/ACT requirement for future admissions cycles in order to help us continue to build a diverse and talented MIT. MIT Admissions. https://mitadmissions.org/blogs/entry/we-are-reinstating-our-sat-act-requirement-for-future-admissions-cycles/#annotation-10
Schmitt, N., Keeney, J., Oswald, F. L., Pleskac, T. J., Billington, A. Q., Sinha, R., & Zorzie, M. (2009). Prediction of 4-year college student performance using cognitive and noncognitive predictors and the impact on demographic status of admitted students. Journal of Applied Psychology, 94(6), 1479–1497. https://doi.org/10.1037/a0016810
Schrum, L., & Levin, B. B. (2013). Leadership for twenty-first-century schools and student achievement: Lessons learned from three exemplary cases. International Journal of Leadership in Education, 16(4), 379–398. https://doi.org/10.1080/13603124.2013.767380
Schwartz, D. L., Tsang, J. M., & Blair, K. P. (2016). The ABCs of how we learn: 26 scientifically proven approaches, how they work, and when to use them. W. W. Norton & Company.
Segal, C. (2012). Working when no one is watching: Motivation, test scores, and economic success. Management Science, 58(8), 1438–1457. https://doi.org/10.1287/mnsc.1110.1509
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61(2), 331–354. https://doi.org/10.1007/BF02294343
Shafer, G. W., Viskupic, K., & Egger, A. E. (2023). Critical workforce skills for bachelor-level geoscientists: An analysis of geoscience job advertisements. Geosphere, 19(2), 628–644. https://doi.org/10.1130/GES02581.1
Shen, T., Lei, T., Barzilay, R., & Jaakkola, T. (2017). Style transfer from non-parallel text by cross-alignment. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems 30 (NIPS 2017) (pp. 1–12). Curran Associates. https://papers.nips.cc/paper_files/paper/2017/file/2d2c8394e31101a261abf1784302bf75-Paper.pdf
Shepard, L. A. (2017). Formative assessment: Caveat emptor. In C. A. Dwyer (Ed.), The future of assessment (pp. 279–303). Routledge. https://doi.org/10.4324/9781315086545-12
Shermis, M. D., & Burstein, J. (Eds.). (2013). Handbook of automated essay evaluation: Current applications and new directions. Routledge. https://doi.org/10.4324/9780203122761
Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153–189. https://doi.org/10.3102/0034654307313795
Shute, V. J., & Zapata-Rivera, D. (2012). Adaptive educational systems. Adaptive technologies for Training and Education, 7(27), 1–35. https://doi.org/10.1017/CBO9781139049580.004
Sinatra, A. M., Robinson, R. L., Goldberg, B., & Goodwin, G. (2023). Impact of engaging with intelligent tutoring system lessons prior to class start. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 67(1), 2262–2266. https://doi.org/10.1177/21695067231192709
Sinharay, S. (2023). Statistical methods for detection of test fraud on educational assessments. In R. J. Tierney, F. Rizvi, & K. Ercikan (Eds.), International encyclopedia of education (4th ed., pp. 298–307). Elsevier. https://doi.org/10.1016/B978-0-12-818630-5.10030-2
Sinharay, S., & Johnson, M. S. (2013). Statistical modeling of automatically generated items. In M. J. Gierl & T. M. Haladyna (Eds.), Automatic item generation (pp. 183–195). Routledge.
Sinharay, S., & Johnson, M. S. (2023). Computation and accuracy evaluation of comparable scores on culturally responsive assessments. Journal of Educational Measurement, 61(1), 5–46. https://doi.org/10.1111/jedm.12381
Sireci, S. G. (2020). Standardization and UNDERSTANDardization in educational assessment. Educational Measurement: Issues and Practice, 39(3), 100–105. https://doi.org/10.1111/emip.12377
Slavich, G. (2019). Stressnology: The primitive (and problematic) study of life stress exposure and pressing need for better measurement. Brain Behavior and Immunity, 75, 3–5. https://doi.org/10.1016/j.bbi.2018.08.011
Society for Industrial and Organizational Psychology. (2018). Principles for the validation and use of personnel selection procedures (5th ed.). https://www.apa.org/ed/accreditation/personnel-selection-procedures.pdf
Soland, J., & Kuhfeld, M. (2021). Do response styles affect estimates of growth on social-emotional constructs? Evidence from four years of longitudinal survey scores. Multivariate Behavioral Research, 56(6), 853–873. https://doi.org/10.1080/00273171.2020.1778440
Solano-Flores, G. (2019). Examining cultural responsiveness in large-scale assessment: The matrix of evidence for validity argumentation. Frontiers in Education, 4, Article 2019.00043. https://doi.org/10.3389/feduc.2019.00043
Solano-Flores, G. (2023). How serious are we about fairness in testing and how far are we willing to go? A response to Randall and Bennett with reflections about the Standards for Educational and Psychological Testing. Educational Assessment, 28(2), 105–117. https://doi.org/10.1080/10627197.2023.2226388
Soto, C. J., Napolitano, C. M., Sewell, M. N., Yoon, H. J., & Roberts, B. W. (2022). An integrative framework for conceptualizing and assessing social, emotional, and behavioral skills: The BESSI. Journal of Personality and Social Psychology, 123(1), 192–222. https://doi.org/10.1037/pspp0000401
Sottilare, R. A., Baker, R. S., Graesser, A. C., & Lester, J. (2018). Special issue on the generalized intelligent framework for tutoring (GIFT): Creating a stable and flexible platform for innovations in AIED research. International Journal of Artificial Intelligence in Education, 28(1), 139–151. https://doi.org/10.1007/s40593-017-0149-9
Sparks, J. R., Lehman, B., & Zapata-Rivera, D. (2024). Caring assessments: Challenges and opportunities. Frontiers in Education, 9, Article 1216481. https://doi.org/10.3389/feduc.2024.1216481
Stankov, L., Kleitman, S., & Jackson, S. A. (2015). Measures of the trait of confidence. In G. J. Boyle, D. H. Saklofske, & G. Matthews (Eds.), Measures of personality and social psychological constructs (pp. 158–189). Elsevier Academic Press. https://doi.org/10.1016/B978-0-12-386915-9.00007-3
Steenbergen-Hu, S., & Cooper, H. (2013). A meta-analysis of the effectiveness of intelligent tutoring systems on K–12 students’ mathematical learning. Journal of Educational Psychology, 105(4), 970–987. https://doi.org/10.1037/a0032447
Sternberg, R. J., Forsythe, G. B., Hedlund, J., Horvath, J. A., Wagner, R. K., Williams, W. M., Snook, S. A., & Grigorenko, E. L. (2000). Practical intelligence in everyday life. Cambridge University Press.
Stecher, B. M., & Hamilton, L. S. (2014). Measuring hard-to-measure student competencies: A research and development plan (Research Report No. RR-863-WFHF). RAND Corporation. https://doi.org/10.7249/RR863
Stocking, M. L., & Swanson, L. (1993). A method for severely constrained item selection in adaptive testing. Applied Psychological Measurement, 17(3), 277–292. https://doi.org/10.1177/014662169301700308
Stowe, K., Ghosh, D., & Zhao, M. (2022). Controlled language generation for language learning items. arXiv. https://doi.org/10.48550/arXiv.2211.15731
Straub, L. M., Lin, E., Tremonte-Freydefont, L., & Schmid, P. C. (2023). Individuals’ power determines how they respond to positive versus negative performance feedback. European Journal of Social Psychology, 53(7), 1402–1420. https://doi.org/10.1002/ejsp.2985
Su, R., Tay, L., Liao, H.-Y., Zhang, Q., & Rounds, J. (2019). Toward a dimensional model of vocational interests. Journal of Applied Psychology, 104(5), 690–714. https://doi.org/10.1037/apl0000373
Tang, R., Chuang, Y.-N., & Hu, X. (2023). The science of detecting LLM-generated texts. arXiv. https://doi.org/10.48550/arXiv.2303.07205
Tang, Z., & Kirman, B. (2023). Exploring curiosity in games: A framework and questionnaire study of player perspectives. International Journal of Human-Computer Interaction. Advance online publication. https://doi.org/10.1080/10447318.2024.2325171
Tannenbaum, R. J., & Kane, M. T. (2019). Stakes in testing: Not a simple dichotomy but a profile of consequences that guides needed evidence of measurement quality (Research Report No. RR-19-19). ETS. https://doi.org/10.1002/ets2.12255
Tenison, C., & Sparks, J. R. (2023). Combining cognitive theory and data driven approaches to examine students’ search strategies in simulated digital environments. Large-Scale Assessments in Education, 11, Article 28. https://doi.org/10.1186/s40536-023-00164-w
Turchin, D. (Host). (2023, March 6). Andi Mann, Sageable CEO and AIOps pioneer, discusses enterprise AI wins and the impact of automation on jobs [Audio podcast episode]. In AI and the Future of Work. Apple Podcasts. https://podcasts.apple.com/us/podcast/andi-mann-sageable-ceo-and-aiops-pioneer-discusses/id1476885647?i=1000602978601
Trull, T. J., & Ebner-Priemer, U. (2013). Ambulatory assessment. Annual Review of Clinical Psychology, 9, 151–176. https://doi.org/10.1146/annurev-clinpsy-050212-185510
U.S. Congress, Office of Technology Assessment. (1992). Testing in American schools: Asking the right questions (Report No. OTA-SET-519). U.S. Government Printing Office.
U.S. Office of Personnel Management. (n.d.). Situational judgment tests. https://www.opm.gov/policy-data-oversight/assessment-and-selection/other-assessment-methods/situational-judgment-tests/
van der Linden, W. J. (2005). Linear models for optimal test design. Springer. https://doi.org/10.1007/0-387-29054-0
van der Linden, W. J., & Glas, C. A. W. (Eds.). (2010). Elements of adaptive testing. Springer. https://doi.org/10.1007/978-0-387-85461-8
van de Vijver, F. J. R., & He, J. (2016). Bias assessment and prevention in noncognitive outcome measures in context assessments. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: International perspectives (pp. 229–253). Springer. https://doi.org/10.1007/978-3-319-45357-6_9
van de Vijver, F., & Poortinga, Y. H. (2005). Conceptual and methodological issues in adapting tests. In R. H. Hambleton, P. F. Merenda, & C. D. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment. Taylor & Francis. https://doi.org/10.4324/9781410611758
VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist, 46(4), 197–221. https://doi.org/10.1080/00461520.2011.611369
VanLehn, K., Graesser, A. C., Jackson, G. T., Jordan, P., Olney, A., & Rosé, C. P. (2007). When are tutorial dialogues more effective than reading? Cognitive Science, 31(1), 3–62. https://doi.org/10.1080/03640210709336984
von Davier, M. (2010). Hierarchical mixtures of diagnostic models. Psychological Test and Assessment Modeling, 52(1), 8–28.
von Davier, M., Tyack, L., & Khorramdel, L. (2023). Scoring graphical responses in TIMSS 2019 using artificial neural networks. Educational and Psychological Measurement, 83(3), 556–585. https://doi.org/10.1177/00131644221098021
Waheed, H., Hassan, S.-U., Aljohani, N. R., Hardman, J., Alelyani, S., & Nawaz, R. (2020). Predicting academic performance of students from VLE big data using deep learning models. Computers in Human Behavior, 104, Article 106189. https://doi.org/10.1016/j.chb.2019.106189
Wainer, H. (1987). The first four millennia of mental testing: From ancient China to the computer age (Research Report No. RR-87-34). ETS. https://doi.org/10.1002/j.2330-8516.1987.tb00238.x
Wainer, H., & Thissen, D. (2001). True score theory: The traditional method. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 35–84). Routledge. https://doi.org/10.4324/9781410604729
Walker, M. E., Olivera-Aguilar, M., Lehman, B., Laitusis, C., Guzman-Orth, D., & Gholson, M. (2023). Culturally responsive assessment: Provisional principles (Research Report No. RR-23-11). ETS. https://doi.org/10.1002/ets2.12374
Walkington, C., & Bernacki, M. L. (2020). Appraising research on personalized learning: Definitions, theoretical alignment, advancements, and future directions. Journal of Research on Technology in Education, 52(3), 235–252. https://doi.org/10.1080/15391523.2020.1747757
Wang, F., Liu, Q., Chen, E., Huang, Z., Chen, Y., Yin, Y., Huang, Z., & Wang, S. (2020). Neural cognitive diagnosis for intelligent education systems. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04), 6153–6161. https://doi.org/10.1609/aaai.v34i04.6080
Wang, J., Jou, M., Lv, Y., & Huang, C. C. (2018). An investigation on teaching performances of model-based flipping classroom for physics supported by modern teaching technologies. Computers in Human Behavior, 84, 36–48. https://doi.org/10.1016/j.chb.2018.02.018
Weinberger, C. J. (2014). The increasing complementarity between cognitive and social skills. Review of Economics and Statistics, 96(5), 849–861. https://doi.org/10.1162/REST_a_00449
Weirich, S., Hecht, M., Penk, C., Roppelt, A., & Böhme, K. (2017). Item position effects are moderated by changes in test-taking effort. Applied Psychological Measurement, 41(2), 115–129. https://doi.org/10.1177/0146621616676791
Weiss, S., Wilhelm, O., & Kyllonen, P. (2021). An improved taxonomy of creativity measures based on salient task attributes. Psychology of Aesthetics, Creativity, and the Arts. Advance online publication. https://doi.org/10.1037/aca0000434
West, M., Pier, L., Fricke, H., Hough, H. J., Loeb, S., Meyer, R. H., & Rice, A. B. (2018). Trends in student social-emotional learning: Evidence from the CORE districts (Working paper). Policy Analysis for California Education. https://edpolicyinca.org/publications/trends-student-social-emotional-learning
Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2–13. https://doi.org/10.1111/j.1745-3992.2011.00223.x
Wilkie, D. (2023, December 21). Employers say students aren’t learning soft skills in college. https://www.shrm.org/topics-tools/news/employee-relations/employers-say-students-arent-learning-soft-skills-college
Wise, S. L. (2017). Rapid-guessing behavior: Its identification, interpretation, and implications. Educational Measurement: Issues and Practice, 36(4), 52–61. https://doi.org/10.1111/emip.12165
Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1–17. https://doi.org/10.1207/s15326977ea1001_1
Wisniewski, B., Zierer, K., & Hattie, J. (2020). The power of feedback revisited: A meta-analysis of educational feedback research. Frontiers in Psychology, 10, Article 3087. https://doi.org/10.3389/fpsyg.2019.03087
Wolcott, M. D., Lobczowski, N. G., Zeeman, J. M., & McLaughlin, J. E. (2020). Situational judgment test validity: An exploratory model of the participant response process using cognitive and think-aloud interviews. BMC Medical Education, 20, Article 506. https://doi.org/10.1186/s12909-020-02410-z
World Economic Forum. (2021). Building a common language for skills at work: A global taxonomy. https://www3.weforum.org/docs/WEF_Skills_Taxonomy_2021.pdf
World Economic Forum. (2022). Catalysing Education 4.0: Investing in the future of learning for a human-centric recovery. https://www3.weforum.org/docs/WEF_Catalysing_Education_4.0_2022.pdf
World Economic Forum. (2023). Defining Education 4.0: A taxonomy for the future of learning. https://www3.weforum.org/docs/WEF_Defining_Education_4.0_2023.pdf
Xuan, Q., Cheung, A., & Sun, D. (2022). The effectiveness of formative assessment for enhancing reading achievement in K-12 classrooms: A meta-analysis. Frontiers in Psychology, 13, Article 990196. https://doi.org/10.3389/fpsyg.2022.990196
Yang, Z., Hu, Z., Dyer, C., Xing, E. P., & Berg-Kirkpatrick, T. (2018). Unsupervised text style transfer using language models as discriminators. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in neural information processing systems 31 (NeurIPS 2018) (pp. 7287–7289). Curran Associates. https://papers.neurips.cc/paper_files/paper/2018/hash/398475c83b47075e8897a083e97eb9f0-Abstract.html
Yeung, C. (2019). Deep-IRT: Make deep learning based knowledge tracing explainable using item response theory. arXiv. https://doi.org/10.48550/arXiv.1904.11738
Yeung, C.-K., & Yeung, D.-Y. (2019). Incorporating features learned by an enhanced deep knowledge tracing model for STEM/non-STEM job prediction. International Journal of Artificial Intelligence in Education, 29, 317–341. https://doi.org/10.1007/s40593-019-00175-1
Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4), 1036–1040. https://doi.org/10.1073/pnas.1418680112
Zapata-Rivera, D., & Forsyth, C. M. (2022). Learner modeling in conversation-based assessment. In R. A. Sottilare & J. Schwarz (Eds.), Adaptive instructional systems: International Conference on Human-Computer Interaction. HCII 2022 (pp. 73–83). Springer. https://doi.org/10.1007/978-3-031-05887-5_6
Zapata-Rivera, D., Graesser, A. C., Kay, J., Hu, X., & Ososky, S. J. (2020). Visualization implications for the validity of intelligent tutoring systems. In A. M. Sinatra, A. C. Graesser, X. Hu, B. Goldberg, & A. J. Hampton (Eds.), Design recommendations for intelligent tutoring systems: Volume 8. Data visualization (pp. 61–68). US Army Combat Capabilities Development Command – Soldier Center.
Zapata-Rivera, D., & Hu, X. (2022). Assessment in intelligent tutoring systems SWOT analysis. In A. M. Sinatra, A. C. Graesser, X. Hu, G. Goodwin, & V. Rus (Eds.), Design recommendations for intelligent tutoring systems: Vol. 10. Strengths, weaknesses, opportunities and threats (SWOT) analysis of intelligent tutoring systems (pp. 83–90). US Army Combat Capabilities Development Command – Soldier Center. https://gifttutoring.org/attachments/download/4751/Vol%2010_DesignRecommendationsforITSs_SWOTAnalysisofITSs.pdf#page=87
Zapata-Rivera, D., Lehman, B., & Sparks, J. R. (2020). Learner modeling in the context of caring assessments. In R. A. Sottilare & J. Schwarz (Eds.), Adaptive instructional systems: Second International Conference (AIS) 2020 (pp. 422–431). Springer.
Zhan, J., Her, Y. W., Hu, T., & Du, C. (2018). Integrating data analytics into the undergraduate accounting curriculum. Business Education Innovation Journal, 10(2), 169–178. http://www.beijournal.com/images/V10N2_draft_5.pdf
Zhang, Z. (2012). Microsoft Kinect sensor and its effect. IEEE MultiMedia, 19(2), 4–10. https://doi.org/10.1109/MMUL.2012.24
Zu, J., & Choi, I. (2023a, April 12–15). Utilizing deep language models to predict item difficulty of language proficiency tests [Paper presentation]. The annual meeting of National Council on Measurement in Education, Chicago, IL, United States.
Zu, J., & Choi, I. (2023b, July 25–28). Predicting the psychometric properties of automatically generated items [Paper presentation]. International Meeting of the Psychometric Society, College Park, MD, United States.
Zu, J., Choi, I., & Hao, J. (2023). Automated distractor generation for fill-in-the-blank items using a prompt-based learning approach. Psychological Test and Assessment Modeling, 65(2), 55–75.
Zu, J., & Kyllonen, P. C. (2020). Nominal response model is useful for scoring multiple-choice situational judgment tests. Organizational Research Methods, 23(2), 342–366. https://doi.org/10.1177/1094428118812669