Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. The growth of online learning since the 1990s, particularly in higher education, has contributed to the advancement of Learning Analytics as student data can be captured and made available for analysis. When learners use an LMS, social media, or similar online tools, their clicks, navigation patterns, time on task, social networks, information flow, and concept development through discussions can be tracked. The rapid development of massive open online courses (MOOCs) offers additional data for researchers to evaluate teaching and learning in online environments.
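As a rough illustration of how one of these tracked measures might be derived, the sketch below estimates time on task from timestamped clicks. The log format, the learner names, and the 30-minute idle cutoff are all assumptions made for this example; real systems record and define activity in their own ways.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical click log: (learner id, ISO timestamp, page visited).
EVENTS = [
    ("ana", "2024-03-01T10:00:00", "week1/video"),
    ("ana", "2024-03-01T10:04:00", "week1/quiz"),
    ("ana", "2024-03-01T10:10:00", "forum"),
    ("ana", "2024-03-01T14:00:00", "week2/video"),
    ("ben", "2024-03-01T09:00:00", "week1/video"),
    ("ben", "2024-03-01T09:02:00", "forum"),
]

# Assumption: a gap longer than this means the learner stopped working.
IDLE_CUTOFF = timedelta(minutes=30)


def time_on_task(events, idle_cutoff=IDLE_CUTOFF):
    """Estimate seconds of activity per learner from timestamped clicks."""
    by_learner = defaultdict(list)
    for learner, stamp, _page in events:
        by_learner[learner].append(datetime.fromisoformat(stamp))

    totals = {}
    for learner, stamps in by_learner.items():
        stamps.sort()
        active = timedelta()
        # Add up gaps between consecutive clicks, skipping idle periods.
        for earlier, later in zip(stamps, stamps[1:]):
            gap = later - earlier
            if gap <= idle_cutoff:
                active += gap
        totals[learner] = active.total_seconds()
    return totals
```

Calling `time_on_task(EVENTS)` returns a dictionary mapping each learner to an estimated number of active seconds. The choice of cutoff strongly affects the result, which is one reason such measures need careful interpretation.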
Although a majority of Learning Analytics literature has started to adopt the aforementioned definition, the definition and aims of Learning Analytics are still contested.
One earlier definition discussed by the community suggested that Learning Analytics is the use of intelligent data, learner-produced data, and analysis models to discover information and social connections for predicting and advising people's learning. But this definition has been criticised by George Siemens and Mike Sharkey.
Dr. Wolfgang Greller and Dr. Hendrik Drachsler defined learning analytics holistically as a framework. They proposed that it is a generic design framework that can act as a useful guide for setting up analytics services in support of educational practice and learner guidance, in quality assurance, curriculum development, and in improving teacher effectiveness and efficiency. It uses a general morphological analysis (GMA) to divide the domain into six "critical dimensions".
The broader term "Analytics" has been defined as the science of examining data to draw conclusions and, when used in decision-making, to present paths or courses of action. From this perspective, Learning Analytics has been defined as a particular case of Analytics, in which decision-making aims to improve learning and education. During the 2010s, this definition of analytics went further, incorporating elements of operations research such as decision trees and strategy maps to establish predictive models and to determine probabilities for certain courses of action.
Another approach for defining Learning Analytics is based on the concept of Analytics interpreted as the process of developing actionable insights through problem definition and the application of statistical models and analysis against existing and/or simulated future data. From this point of view, Learning Analytics emerges as a type of Analytics (as a process), in which the data, the problem definition and the insights are learning-related.
In 2016, research jointly conducted by the New Media Consortium (NMC) and the EDUCAUSE Learning Initiative (ELI), an EDUCAUSE program, described six areas of emerging technology expected to have a significant impact on higher education and creative expression by the end of 2020. As a result of this research, learning analytics was defined as an educational application of web analytics aimed at learner profiling, a process of gathering and analyzing details of individual student interactions in online learning activities.
In 2017, Gašević, Kovanović, and Joksimović proposed a consolidated model of learning analytics. The model posits that learning analytics is defined at the intersection of three disciplines: data science, theory, and design. Data science offers computational methods and techniques for data collection, pre-processing, analysis, and presentation. Theory is typically drawn from the literature in the learning sciences, education, psychology, sociology, and philosophy. The design dimension of the model includes learning design, interaction design, and study design. In 2015, Gašević, Dawson, and Siemens argued that computational aspects of learning analytics need to be linked with the existing educational research in order for Learning Analytics to deliver its promise to understand and optimize learning.
Differentiating the fields of educational data mining (EDM) and learning analytics (LA) has been a concern of several researchers. George Siemens takes the position that educational data mining encompasses both learning analytics and academic analytics, the latter of which is aimed at governments, funding agencies, and administrators instead of learners and faculty. Baepler and Murdoch define academic analytics as an area that "...combines select institutional data, statistical analysis, and predictive modeling to create intelligence upon which learners, instructors, or administrators can change academic behavior". They go on to attempt to disambiguate educational data mining from academic analytics based on whether the process is hypothesis driven or not, though Brooks questions whether this distinction exists in the literature. Brooks instead proposes that a better distinction between the EDM and LA communities lies in where each community originated, with authorship in the EDM community being dominated by researchers coming from intelligent tutoring paradigms, and learning analytics researchers being more focused on enterprise learning systems (e.g. learning content management systems).
Regardless of the differences between the LA and EDM communities, the two areas have significant overlap both in the objectives of investigators as well as in the methods and techniques that are used in the investigation. In the MS program offering in learning analytics at Teachers College, Columbia University, students are taught both EDM and LA methods.
Learning Analytics, as a field, has multiple disciplinary roots. While the fields of artificial intelligence (AI), statistical analysis, machine learning, and business intelligence offer an additional narrative, the main historical roots of analytics are the ones directly related to human interaction and the education system. More specifically, the history of Learning Analytics is tightly linked to the development of four social science fields that have converged over time. These fields pursued, and still pursue, four goals:
A diversity of disciplines and research activities have influenced these four aspects over the last decades, contributing to the gradual development of learning analytics. Some of the most influential disciplines are Social Network Analysis, User Modelling, Cognitive Modelling, Data Mining and E-Learning. The history of Learning Analytics can be understood through the rise and development of these fields.
Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them. Social network analysis is prominent in sociology, and its development has played a key role in the emergence of Learning Analytics. One of the first attempts to provide a deeper understanding of interactions was made by the Austrian-American sociologist Paul Lazarsfeld. In 1944, Lazarsfeld formulated the question of "who talks to whom about what and to what effect". That question still defines the area of interest of social network analysis, which tries to understand how people are connected and what insights can be derived from their interactions, a core idea of Learning Analytics.
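The node-and-tie representation can be made concrete with a small sketch. The learners and ties below are invented for illustration, and degree centrality and density are only two of many possible network measures; the code assumes a network of at least two nodes.

```python
from collections import defaultdict

# Hypothetical ties: each pair means two learners interacted at least once.
TIES = [
    ("ana", "ben"),
    ("ana", "cho"),
    ("ana", "dev"),
    ("ben", "cho"),
    ("dev", "eli"),
]


def build_network(ties):
    """Return an undirected adjacency map: node -> set of neighbours."""
    neighbours = defaultdict(set)
    for a, b in ties:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return dict(neighbours)


def degree_centrality(neighbours):
    """Share of the other nodes that each node is directly tied to."""
    others = len(neighbours) - 1
    return {node: len(adj) / others for node, adj in neighbours.items()}


def density(neighbours):
    """Observed ties divided by the number of possible ties."""
    n = len(neighbours)
    observed = sum(len(adj) for adj in neighbours.values()) / 2
    return observed / (n * (n - 1) / 2)
```

Here `degree_centrality(build_network(TIES))` gives each learner a score between 0 and 1, and `density` summarizes how connected the group is as a whole.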
Citation analysis
American linguist Eugene Garfield was an early pioneer of analytics in science. In 1955, Garfield led the first attempt to analyse the structure of science by examining how developments in science can be better understood by tracking the associations (citations) between articles (how they reference one another, the importance of the resources that they include, citation frequency, etc.). Through tracking citations, scientists can observe how research is disseminated and validated. This was the basic idea behind what eventually became "page rank", which in the early days of Google (beginning of the 21st century) was one of the key ways of understanding the structure of a field by looking at page connections and the importance of those connections. PageRank, the first search algorithm used by Google, was based on this principle. American computer scientist Larry Page, Google's co-founder, defined PageRank as "an approximation of the importance" of a particular resource. Educationally, citation or link analysis is important for mapping knowledge domains.
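The principle of ranking a resource by the links or citations it receives can be sketched with a simplified power-iteration version of PageRank. This is an illustrative toy, not Google's production algorithm: the four-paper citation graph is invented, the damping value of 0.85 is a commonly used default, and the code assumes every cited paper also appears as a key in the graph.

```python
# Hypothetical citation graph: paper -> papers it cites.
CITES = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by repeated redistribution of rank."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is shared by all nodes.
        dangling = sum(rank[node] for node in nodes if not links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each node passes its rank on in equal parts to what it cites.
        for node in nodes:
            targets = links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank
```

In this toy graph the paper cited by all the others ends up with the highest score, which matches the intuition that importance flows along citations.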
The essential idea behind these attempts is the realization that, as data increases, individuals, researchers or business analysts need to understand how to track the underlying patterns behind the data and how to gain insight from them. This is also a core idea in Learning Analytics.
Digitalization of Social network analysis
During the early 1970s, pushed by the rapid evolution in technology, Social network analysis transitioned into analysis of networks in digital settings.
During the first decade of the 21st century, Professor Caroline Haythornthwaite explored the impact of media type on the development of social ties, observing that human interactions can be analyzed to gain novel insight not from strong ties (i.e. people who are strongly related to the subject) but, rather, from weak ties. This provides Learning Analytics with a central idea: apparently unrelated data may hide crucial information. As an example of this phenomenon, an individual looking for a job will have a better chance of finding new information through weak connections than through strong ones. (Siemens, George (2013-03-17). Intro to Learning Analytics. LAK13 open online course for University of Texas at Austin & Edx. 11 minutes in. Retrieved 2018-11-01.)
Her research also focused on the way that different types of media can affect the formation of networks. Her work contributed substantially to the development of social network analysis as a field. Learning Analytics inherited important ideas from it, such as the idea that a range of metrics and approaches can define the importance of a particular node, the value of information exchange, the way that clusters are connected to one another, and the structural gaps that might exist within those networks.
The application of social network analysis in digital learning settings has been pioneered by Professor Shane P. Dawson. He has developed a number of software tools, such as Social Networks Adapting Pedagogical Practice (SNAPP), for evaluating the networks that form in learning management systems when students engage in forum discussions.
The main goal of user modelling is the customization and adaptation of systems to the user's specific needs, especially in their interaction with computing systems. The importance of computers being able to respond individually to people began to be understood in the 1970s. In 1979, Dr Elaine Rich predicted that "computers are going to treat their users as individuals with distinct personalities, goals, and so forth". This is a central idea not only in education but also in general web use, in which personalization is an important goal.
User modelling has become important in research in human-computer interactions as it helps researchers to design better systems by understanding how users interact with software. Recognizing unique traits, goals, and motivations of individuals remains an important activity in learning analytics.
Personalization and adaptation of learning content is an important present and future direction of the learning sciences, and its history within education has contributed to the development of learning analytics.
Hypermedia is a nonlinear medium of information that includes graphics, audio, video, plain text and hyperlinks. The term was first used in a 1965 article written by American sociologist Ted Nelson. Adaptive hypermedia builds on user modelling by increasing personalization of content and interaction. In particular, adaptive hypermedia systems build a model of the goals, preferences and knowledge of each user in order to adapt to the needs of that user. From the end of the 20th century onwards, the field grew rapidly, mainly because the internet boosted research into adaptivity and, secondly, because of the accumulation and consolidation of research experience in the field. In turn, Learning Analytics has been influenced by this strong development.
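A minimal sketch of how a user model might drive adaptation is shown below. It models only the user's knowledge (not goals or preferences), and the page names, prerequisite map, and three labels are hypothetical choices made for this example rather than the design of any particular adaptive hypermedia system.

```python
from dataclasses import dataclass, field

# Hypothetical course pages: page -> concepts the reader should already know.
PREREQUISITES = {
    "variables": [],
    "loops": ["variables"],
    "functions": ["variables"],
    "recursion": ["variables", "functions"],
}


@dataclass
class UserModel:
    """What the system currently believes one learner knows."""
    known: set = field(default_factory=set)

    def record_mastery(self, concept):
        self.known.add(concept)


def annotate_links(user, prerequisites):
    """Label each page for this user: 'known', 'ready', or 'not ready'."""
    labels = {}
    for page, needed in prerequisites.items():
        if page in user.known:
            labels[page] = "known"
        elif all(concept in user.known for concept in needed):
            labels[page] = "ready"
        else:
            labels[page] = "not ready"
    return labels
```

As the model is updated with `record_mastery`, the same set of pages is presented differently to different users, which is the basic sense in which the hypermedia adapts.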
Education/cognitive modelling has been applied to tracing how learners develop knowledge. Computers have been used in education as learning tools since the late 1980s and early 1990s. In 1989, Hugh Burns argued for the adoption and development of intelligent tutor systems that ultimately would pass three levels of "intelligence": domain knowledge, learner knowledge evaluation, and pedagogical intervention. During the 21st century, these three levels have remained relevant for researchers and educators.
In the 1990s, academic activity around cognitive models focused on developing systems that possess a computational model capable of solving the problems given to students in the ways students are expected to solve them. Cognitive modelling has contributed to the rise in popularity of intelligent or cognitive tutors. Once cognitive processes can be modelled, software (tutors) can be developed to support learners in the learning process. The research base in this field eventually became highly relevant for learning analytics during the 21st century.
While big data analytics has been more and more widely applied in education, Wise and Shaffer addressed the importance of a theory-based approach to the analysis. Epistemic Frame Theory conceptualizes the "ways of thinking, acting, and being in the world" in a collaborative learning environment. Specifically, the framework is based on the context of a Community of Practice (CoP), which is a group of learners with common goals, standards, and prior knowledge and skills who work to solve a complex problem. Due to the nature of a CoP, it is important to study the connections between elements (learners, knowledge, concepts, skills and so on). To identify the connections, the co-occurrences of elements in learners' data are identified and analyzed.
Shaffer and Ruis pointed out the concept of closing the interpretive loop, emphasizing the transparency and validation of the model, the interpretation, and the original data. The loop can be closed by a theoretically sound analytics approach such as Epistemic Network Analysis.
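The step of identifying co-occurrences of elements in learners' data can be illustrated with a short sketch. The coded utterances and code names below are invented, and the sketch covers only the counting of co-occurrences within a single utterance; it is not an implementation of Epistemic Network Analysis itself.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded discourse: each utterance is tagged with the
# concepts (codes) that a human or automated coder found in it.
UTTERANCES = [
    {"learner": "ana", "codes": {"data", "design"}},
    {"learner": "ana", "codes": {"data", "design", "client"}},
    {"learner": "ben", "codes": {"client"}},
    {"learner": "ben", "codes": {"data", "client"}},
]


def cooccurrences(utterances):
    """Count, per learner, how often each pair of codes appears together."""
    counts = {}
    for utterance in utterances:
        # Sorting makes each pair appear in one canonical order.
        pairs = combinations(sorted(utterance["codes"]), 2)
        counts.setdefault(utterance["learner"], Counter()).update(pairs)
    return counts
```

The resulting counts can be read as weighted connections between concepts for each learner, which is the kind of structure that a network-based analysis would then examine and compare.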
In a discussion of the history of analytics, Adam Cooper highlights a number of communities from which learning analytics has drawn techniques, mainly during the first decades of the 21st century, including:
The first graduate program focused specifically on learning analytics was created by Ryan S. Baker and launched in the Fall 2015 semester at Teachers College, Columbia University. The program description states that
"(...)data about learning and learners are being generated today on an unprecedented scale. The fields of learning analytics (LA) and educational data mining (EDM) have emerged with the aim of transforming this data into new insights that can benefit students, teachers, and administrators. As one of world's leading teaching and research institutions in education, psychology, and health, we are proud to offer an innovative graduate curriculum dedicated to improving education through technology and data analysis."
Master's programs are now offered at several other universities as well, including the University of Texas at Arlington, the University of Wisconsin, and the University of Pennsylvania.
Methods for learning analytics include:
Learning analytics can be, and has been, applied in a considerable number of contexts.
Analytics have been used for:
There is a broad awareness of analytics across educational institutions for various stakeholders, but the way learning analytics is defined and implemented may vary, including:
Some motivations and implementations of analytics may come into conflict with others, for example between analytics for individual learners and analytics for organisational stakeholders.
Much of the software that is currently used for learning analytics duplicates functionality of web analytics software, but applies it to learner interactions with content. Social network analysis tools are commonly used to map social connections and discussions. Some examples of learning analytics software tools include:
The ethics of data collection, analytics, reporting and accountability have been raised as a potential concern for learning analytics, in particular regarding:
As Kay, Korn and Oppenheim point out, the range of data is wide, potentially derived from:
Thus the legal and ethical situation is challenging and different from country to country, raising implications for:
In some prominent cases, such as the inBloom disaster, even fully functional systems have been shut down due to a lack of trust in the data collection on the part of governments, stakeholders and civil rights groups. Since then, the learning analytics community has extensively studied the legal conditions in a series of expert workshops on "Ethics & Privacy 4 Learning Analytics", which address what constitutes the use of trusted learning analytics. Drachsler & Greller released an 8-point checklist named DELICATE, based on the intensive studies in this area, to demystify the ethics and privacy discussions around learning analytics.
It shows ways to design and provide privacy-compliant learning analytics that can benefit all stakeholders. The full DELICATE checklist is publicly available.
Privacy management practices of students have shown discrepancies between one's privacy beliefs and one's privacy related actions. Learning analytic systems can have default settings that allow data collection of students if they do not choose to opt-out. Some online education systems such as edX or Coursera do not offer a choice to opt-out of data collection. In order for certain learning analytics to function properly, these systems utilize cookies to collect data.
In 2012, a systematic overview on learning analytics and its key concepts was provided by Professor Mohamed Chatti and colleagues through a reference model based on four dimensions, namely:
Chatti, Muslim and Schroeder note that the aim of open learning analytics (OLA) is to improve learning effectiveness in lifelong learning environments. The authors refer to OLA as an ongoing analytics process that encompasses diversity at all four dimensions of the learning analytics reference model.
For general audience introductions, see: