Category Archives: Methods

Catching up with the Internet Era: Online data collection for researchers

We humans spend a great deal of time connected to the internet. This is especially true for younger people, who are growing up surrounded by this technology. You can see this huge change over time in the graph from Our World in Data below!

[Figure: graph from Our World in Data]

Increasingly, researchers and companies are leveraging this remote access to behaviour to answer questions about how humans behave. Companies have been collecting ‘user data’ from online platforms for years, and using this inferred information about people to improve user experience and, in some cases, to sell more products to the right people. The amount of data we are able to collect on behaviour is growing rapidly, and so are its quality and variety, as people connect different devices (like activity monitors, clocks and fridges). Wearable sensors in particular are becoming more common, and their data is often stored using internet-based services.


Infographic: The Predicted Wearables Boom Is All About The Wrist | Statista
Taken from Statista

Psychology and cognitive science are starting to catch up with this trend, as it offers the ability to carry out controlled experiments on a much larger scale. This offers the opportunity to characterise subtle differences that would be lost in the noise of small samples tested in a lab environment.
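To put rough numbers on the point about small samples, here is a minimal sketch of how the standard error of a group difference shrinks as the sample grows. The reaction-time spread (100 ms) and the true difference (10 ms) are made-up values for illustration only, and `standard_error_of_difference` is just a name chosen for this example.

```python
import math


def standard_error_of_difference(sd, n_per_group):
    """Standard error of the difference between two independent group means,
    assuming both groups share the same standard deviation and size."""
    return sd * math.sqrt(2.0 / n_per_group)


sd_ms = 100.0     # assumed spread of reaction times (illustrative)
effect_ms = 10.0  # assumed true difference between groups (illustrative)

for n in (20, 2000):
    se = standard_error_of_difference(sd_ms, n)
    print("n = %4d per group: SE = %5.2f ms, effect / SE = %4.2f"
          % (n, se, effect_ms / se))
```

With these assumed values, the difference is well below one standard error in a lab-sized sample of 20 per group, but around three standard errors with 2000 per group.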

However, for many the task of running an online experiment is daunting; there are so many choices, and dealing with building, hosting and data processing can be tricky!

Web Browsers

A good starting point, and often the most straightforward, is building experiments to work in a web browser. The primary advantage of this is that you can run experiments on the vast majority of computers, and even mobile devices, with no installation overhead. There are some limitations though:


Internet Explorer - sigh

With multiple different web browsers, operating systems, and devices, the possible combinations number in the thousands. This can lead to unexpected bugs and errors in your experiment. A workaround is to restrict access to a few devices (see below for tips on how to do this in JavaScript), but this comes at the cost of how many participants you can reach.


Web browsers were not designed for running reaction time experiments, or for presenting stimuli with millisecond precision. Despite this, some research has shown equivalent precision for both reaction time measurement and stimulus presentation.

If you are still concerned, you can use WebGL, a web graphics engine, which allows you to achieve presentation times comparable to native programs, and even to use a computer’s graphics card. This will still be limited by the operating system and hardware of the user, though!

There are a number of tools that can help you with browser experiments. These range from fee-paying services like Gorilla, which deals with task building, hosting and data management for you, to fully open source projects like jsPsych and PsychoPy’s PsychoJS, which deal with building experiments and data but not hosting (although there are plans to develop a hosting and data storage solution). All of these offer a graphical user interface, which allows experiments to be built without any prior knowledge of programming!



One intermediate tool, which we are currently using, is a cross-platform development environment called Unity. Whilst originally intended for creating video games, Unity can be repurposed for creating experimental apps. The big advantage is that you can build for a vast variety of operating systems and platforms with minimal effort: a Unity project can be built for a web browser, iOS app, Android app, Windows, OSX, Linux, and so on. You can also gain access to sensor information on devices (heart rate monitors, step counting, microphone, camera), to start to access the richness of information contained in these devices.

The utility of this tool for experimental research is huge, yet it appears to be under-utilised. It has an easy-to-learn interface and requires minimal programming knowledge.


Whilst this post is largely non-instructional, hopefully it has shed some light on the potential tools you can use to start running research online (without employing an expensive web or app developer), or hopefully just piqued your interest a tiny bit.

If you would like to dive into the murky (but exciting) world of web development, you can also check out a few tips for improving the quality of your online data here.



This exciting post was written by Alexander Irvine, one of the newest members of our lab. Alex previously worked on developing web-based studies at Oxford before joining the lab and is experienced in an array of programming languages and tools. Check out his personal website if you want to read more in-depth about online data collection.


The weather and the brain – using new methods to understand developmental disorders


The latest article was written by our brilliant lab member Danyal Akarca. It describes some of his MPhil research which aims to explore transient brain networks in individuals with a particular type of genetic mutation. Dan finished his degree in Pre-Clinical Medicine before joining our lab and has since been fascinated by the intersection of genetic disorders and the dynamics of brain networks.

The brain is a complex dynamic system. It can be very difficult to understand how specific differences within that system can be associated with the cognitive and behavioural difficulties that some children experience. This is because even if we group children together on the basis that they all have a particular developmental disorder, that group of children will likely have a heterogeneous aetiology. That is, even though they all fall into the same category, there may be a wide variety of different underlying brain causes. This makes these disorders notoriously difficult to study.

Developmental disorders that have a known genetic cause can be very useful for understanding these brain-cognition relationships, because by definition they all have the same causal mechanism (i.e. the same gene is responsible for the difficulties that each child experiences). We have been studying a language disorder caused by a mutation to a gene called ZDHHC9. These children have broader cognitive difficulties, and more specific difficulties with speech production, alongside a form of childhood epilepsy called rolandic epilepsy.

In our lab, we have explored how brain structure is organised differently in individuals with this mutation, relative to typically developing controls. Since then our attention has turned to applying new analysis methods to explore differences in dynamic brain function. We have done this by directly recording magnetic fields generated by the activity of neurons, through a device known as a magnetoencephalography (MEG) scanner. The scanner uses magnetic fields generated by the brain to infer electrical activity.

The typical way that MEG data is interpreted is by comparing how electrical activity within the brain changes in response to a stimulus. These changes can take many forms, including how well synchronised different brain areas are, or how the size of the magnetic response differs across individuals. However, in our current work, we are trying to explore how the brain configures itself within different networks, in a dynamic fashion. This is especially interesting to us, because we think that the ZDHHC9 gene has an impact on the excitability of neurons in particular parts of the brain, specifically in those areas that are associated with language. These changes in network dynamics might be linked to the kinds of cognitive difficulties that these individuals have.

We used an analysis method called “Group Level Exploratory Analysis of Networks” (GLEAN for short), which was recently developed at the Oxford Centre for Human Brain Activity. The concept behind GLEAN is that the brain changes between different patterns of activation in a fashion that is probabilistic. This is much like the concept of the weather: just as the weather can change from day to day in some probabilistic way, so too may the brain change in its activation.


This analysis method not only allows us to observe which regions of the brain are active when the participants are in the MEG scanner, it also allows us to see the probabilistic way in which these activation patterns change from one to another. For example, just as it is more likely to transition from rain one day to cloudiness the next day, relative to say rain to blistering sun, we find that brain activation patterns can be described in a very similar way over sub-second timescales. We can characterise those dynamic transitions in lots of different ways, such as how long you stay in a specific brain state or how long it takes to return to a state once you’ve transitioned away. (A more theoretical account of this can be found in another recent blog post in our Methods section: “The resting brain… that never rests”.) We have found that a number of networks differ between individuals with the mutation and our control subjects.
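As a rough illustration of these summary measures, the sketch below computes transition probabilities, dwell times and return intervals from a sequence of state labels. This is not GLEAN itself (which infers the states from MEG data); it only summarises a state sequence that has already been inferred. The state labels "A", "B", "C" and all function names are hypothetical.

```python
from collections import defaultdict


def transition_probabilities(states):
    """Estimate P(next state | current state) by counting consecutive pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: c / total for nxt, c in row.items()}
    return probs


def visits(states):
    """Collapse the sequence into (state, start_index, length) runs."""
    runs = []
    start = 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[start]:
            runs.append((states[start], start, i - start))
            start = i
    return runs


def mean_dwell_time(states, state):
    """Mean number of consecutive samples spent in `state` per visit."""
    lengths = [length for s, _, length in visits(states) if s == state]
    return sum(lengths) / len(lengths) if lengths else float("nan")


def mean_return_interval(states, state):
    """Mean number of samples between leaving `state` and re-entering it."""
    runs = [(start, length) for s, start, length in visits(states) if s == state]
    gaps = [next_start - (start + length)
            for (start, length), (next_start, _) in zip(runs, runs[1:])]
    return sum(gaps) / len(gaps) if gaps else float("nan")


# A made-up state sequence, one label per time sample.
sequence = list("AAABBAACCCAAB")
probs = transition_probabilities(sequence)
print(probs["A"])
print(mean_dwell_time(sequence, "A"), mean_return_interval(sequence, "A"))
```

To express dwell times in milliseconds rather than samples, the lengths would be multiplied by the sampling interval of the recording.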


(These are two brain networks that show the most differences in activation – largely in the parietal and frontotemporal regions of the brain.)

Interestingly, these networks strongly overlap with areas of the brain that are known to express the gene (we found this out by using data from the Allen Atlas). This is the first time that we know of that researchers have been able to link a particular gene, to differences in dynamic electrical brain networks, to a particular pattern of cognitive difficulties. And we are really excited!


The resting brain… that never rests

“Spend 5-10 minutes lying down, make yourself comfortable, and keep your eyes open. Be still. Don’t think of anything specific”

These are the typical instructions given to participants in a ‘resting state’ study. This is the study of brain activity with neuroimaging while the subject is literally told to do nothing. This approach is very popular in our field… but why is it worth putting such effort into understanding a brain that isn’t doing anything? In reality, the brain is never doing nothing, and studying the ongoing spontaneous activity that it produces can provide key insights into how the brain is organised.

Traditionally, a technique called functional Magnetic Resonance Imaging (fMRI) is used to study the resting brain. This uses changes in metabolism to chart brain activity. It turns out that the patterns of activity across the brain are not random, but are highly consistent across many studies. Some brain areas – in some cases anatomically distant from one another – have very similar patterns of activity to each other. These are referred to as resting state networks (RSNs).

A problem with this imaging method is that it is slow. It measures changes in metabolism in the order of seconds, even though electrical brain activity really occurs on a millisecond scale.  A landmark paper by Baker et al. (2014) instead used an electrophysiological technique, MEG, which can capture this incredibly rapid brain activity. They combined this technique with a statistical model, called a Hidden Markov Model (HMM).

They showed that, contrary to previous thinking, these networks are not stable and consistent over time. Even when the brain is at rest they change in a rapid and dynamic way: the resting brain is never actually resting.

For more details of how they did it, read on:


What is a Hidden Markov Model (HMM)?

A model, such as an HMM, is a representation of reality, built around several predictions based on elementary observations and a set of rules which aim to find the best solution to the problem. Let’s think about Santa Claus: he has to carry a present to all the nice kids. The problem is that he doesn’t know how to fill the sack, which has limited capacity, with the toys. The inputs in this case are the toys, each of which has a certain weight and volume. Santa could try lots of solutions until he finds the optimal configuration of toys. In essence, he is using a ‘stochastic model’ that tries multiple solutions. Santa knows the inputs, and can see how varying them results in a more optimal solution.

An HMM is also a stochastic model. But this time the input is hidden, that is, we cannot observe it. In our case, the hidden inputs are the brain states that produce the network patterns. The output (the brain recordings), on the other hand, is visible. To better understand how the model works, imagine a prisoner locked in a windowless cell for a long time. They want to know about the weather outside. The only source of information is whether the guard in front of their cell is carrying an umbrella (🌂) or not (x🌂x). In this case the states are the weather: sunny (☀), cloudy (☁), or rainy (☔), and they are hidden. The observation is the presence or absence of an umbrella. Imagine now that after a few days the prisoner has recorded a sequence of observations, so they can turn to science and use HMMs to predict what the weather is like. The prisoner needs to know just 3 things to set up the model on the basis of their observations:

  • Prior probabilities: the initial probabilities of any particular type of weather e.g. if the prisoner lives in a country where it is just as likely to be sunny, cloudy or rainy, then the prior probabilities are equiprobable.
  • Emission probabilities: the probability of the guard bringing an umbrella or not given a weather condition.
  • Transition probabilities: the probabilities of moving from one state to the next, i.e. how the current state depends on the previous state, e.g. if it’s raining today, what is the probability of it being sunny, cloudy or rainy tomorrow.

What is the probability that the next day will be ☔ given that the guard is carrying an umbrella 🌂? After many days of observations, let’s say 50, what is the probability that day 51 will be ☀? In this case the calculation is really hard. The prisoner needs to integrate the various sources of information in order to establish the most likely weather conditions. There is actually an algorithm for finding the most likely sequence of hidden states given the observations: it is called the ‘Viterbi algorithm’.
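The prisoner’s problem can be sketched in a few lines of Python. All the probabilities below are invented for illustration, and the function names are hypothetical. `predict_next_day` uses the forward algorithm to answer the “what will tomorrow be like?” question, while `viterbi` recovers the most likely sequence of hidden weather states behind the umbrella observations.

```python
STATES = ("sunny", "cloudy", "rainy")

# All numbers below are made-up assumptions for illustration.
PRIOR = {"sunny": 1 / 3, "cloudy": 1 / 3, "rainy": 1 / 3}
EMIT_UMBRELLA = {"sunny": 0.1, "cloudy": 0.3, "rainy": 0.8}  # P(umbrella | weather)
TRANSITION = {  # TRANSITION[today][tomorrow]
    "sunny": {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy": {"sunny": 0.1, "cloudy": 0.4, "rainy": 0.5},
}


def emission(state, umbrella):
    """P(observation | weather), where the observation is umbrella yes/no."""
    p = EMIT_UMBRELLA[state]
    return p if umbrella else 1.0 - p


def step(belief):
    """Push a belief over today's weather one day forward in time."""
    return {s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES}


def predict_next_day(observations):
    """Forward algorithm: P(weather tomorrow | all umbrella observations so far)."""
    belief = dict(PRIOR)
    for day, umbrella in enumerate(observations):
        if day > 0:
            belief = step(belief)
        belief = {s: belief[s] * emission(s, umbrella) for s in STATES}
        total = sum(belief.values())
        belief = {s: b / total for s, b in belief.items()}
    return step(belief)


def viterbi(observations):
    """Most likely sequence of hidden weather states given the observations."""
    best = {s: PRIOR[s] * emission(s, observations[0]) for s in STATES}
    backpointers = []
    for umbrella in observations[1:]:
        pointers, new_best = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] * TRANSITION[p][s])
            pointers[s] = prev
            new_best[s] = best[prev] * TRANSITION[prev][s] * emission(s, umbrella)
        backpointers.append(pointers)
        best = new_best
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


umbrella_diary = [True, True, False, False, True]
print(predict_next_day(umbrella_diary))
print(viterbi(umbrella_diary))
```

For much longer observation sequences, the products in `viterbi` become very small, so a practical implementation would work with log-probabilities instead.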

How HMMs are used in the resting state paradigm

Using HMMs, Baker et al. (2014) identified 8 different brain states that match the networks typically found using fMRI. More importantly, they revealed that the transitions between the different RSNs are much faster than previously suggested. Because they used MEG and not fMRI, it was possible to calculate when a state is active or not, that is, the temporal characteristics of the states.

The authors additionally mapped where each state was active. They used the temporal information of the states to identify only the neural activity that is unique to each state. They then combined this information with the localisation of the neuronal activity to build the network maps. This procedure identifies the brain areas associated with each state.

This study provides evidence that within-network functional connectivity is sustained by temporal dynamics that fluctuate from 200-400 ms. These dynamics are generated by brain states that match the classic RSNs, but which are constituted in a much more dynamic way than previously thought. The fact that each state remains active for only 100-200 ms suggests that these brain states are underpinned by fast and transient mechanisms. This is important, because it has previously been unclear how these so-called ‘resting’ networks are related to rapid psychological processes. This new approach provides an important step in bridging this gap. At last we have a method capable of exploring these networks on a time-scale that allows us to explore how they meaningfully support cognition.



Baker, A. P., Brookes, M. J., Rezek, I. A., Smith, S. M., Behrens, T., Probert Smith, P. J., & Woolrich, M. (2014). Fast transient networks in spontaneous human brain activity. eLife, 3, e01867.

Getting figures publication ready

Part I: R with ggplot2
The ggplot2 package for R has some fantastic features for very powerful, flexible, and aesthetic data visualisation (if you are not familiar with the package, you can have a look at some of its capabilities here: []). It is also relatively easy to export figures in a way that matches any journal’s figure specifications.

I will use the sample dataset ‘iris’ that is included with R for the following demonstration. This dataset contains data on petal length of three iris species among other measures.

# Loading the data
library(ggplot2)
data(iris)

# Producing a basic plot
ggplot(data=iris,aes(x=factor(Species), y=Petal.Length)) +
geom_boxplot() +
xlab('Species') +
ylab('petal length [cm]')


For publication, we would probably like to make the main features of the plot a bit bolder, control the font size of the axis labels, and use a white background:

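figure <- ggplot(data=iris,aes(x=factor(Species), y=Petal.Length)) +
geom_boxplot(width=0.5,lwd=1,fill='grey') +
xlab('Species') +
ylab('petal length [cm]') +
theme_bw() +
# axis font sizes: example values, adjust them to the journal's requirements
theme(axis.title=element_text(size=12), axis.text=element_text(size=10))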


We might also want to include annotations that indicate results of statistical analyses. Here is a one-way ANOVA to compare petal length between species followed by post-hoc t-tests to determine differences between species pairs:

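# one-way ANOVA:
summary(aov(Petal.Length ~ Species, data=iris))

# t-test single contrasts (pairwise.t.test applies a Holm correction by default):
pairwise.t.test(iris$Petal.Length, iris$Species)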
This analysis indicates that there are significant differences between all species in petal length. Next, we will add information about the group differences to the boxplot for the convenience of the reader:

figure +
 geom_segment(aes(x=1, y=7, xend=2, yend=7), size=0.1) +
 geom_segment(aes(x=2, y=7.2, xend=3, yend=7.2), size=0.1) +
 geom_segment(aes(x=1, y=7.4, xend=3, yend=7.4), size=0.1) +
 annotate("text",x=1.5, y=7,label="*",size=8) +
 annotate("text",x=2.5, y=7.2,label="*",size=8) +
 annotate("text",x=2, y=7.5, label="*",size=8)


The final step is to export the figure with properties that match the specifications of the publisher. As an example, I will export a figure with 4cm height and 3cm width at 300 dpi resolution in PNG format:

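# units='cm' is needed because ggsave interprets width and height as inches by default
ggsave(filename='/Users/joebathelt/Example.png', plot=figure, width=3, height=4, units='cm', dpi=300, limitsize=TRUE)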


Coder’s Little Time Saver

We all know the problem; it’s getting late in the office, all your colleagues left hours ago, and your eyes are watering from staring at the analysis output of a script that should have finished running ages ago. Yet for some inexplicable reason, it’s still not done. Wouldn’t it be great if you could just nip out to get some fresh air and be informed when the script is finally done? Well actually, you can! Here’s a handy little tip explaining how to embed email alerts in MATLAB and Python scripts:

MATLAB comes with a handy function that supports sending emails within scripts. But before we can actually get to the email sending, we need to configure some server information. Here is an example for a gmail account:

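mail = ''; %Your GMail email address
password = 'secret'; %Your GMail password
setpref('Internet','SMTP_Server','smtp.gmail.com'); %GMail's SMTP server
setpref('Internet','E_mail',mail);
setpref('Internet','SMTP_Username',mail);
setpref('Internet','SMTP_Password',password);
props = java.lang.System.getProperties;
props.setProperty('mail.smtp.auth','true');
props.setProperty('mail.smtp.socketFactory.class', 'javax.net.ssl.SSLSocketFactory');
props.setProperty('mail.smtp.socketFactory.port','465');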

Now, we are ready to send an email:

sendmail('','Hello there!');

The second argument in the sendmail function corresponds to the subject line of the email. If you are keen to let MATLAB send a more elaborate email, you can also include a text body:

sendmail('','Hello there!','Have you seen that great post on the Forging Connections Blog?');

It is even possible to send attachments:

sendmail('','Hello there!','Have you seen that great post on the Forging Connections Blog?',{'/Users/Fred/image.jpeg'});

By including these few lines of code in your time-consuming MATLAB script, you can now get notified when it is time to go back to the office for the results.

Python offers a simple solution to send emails from within scripts via the smtplib module. Here is a function that provides the configuration for gmail:
def send_email():
    import smtplib

    gmail_user = ""  # your Gmail address
    gmail_pwd = "Password"
    FROM = ''
    TO = ['']  # must be a list of recipient addresses
    SUBJECT = "Meeting at 3pm"
    TEXT = "Are you coming to the meeting?"

    # Prepare actual message
    message = """From: %s\nTo: %s\nSubject: %s\n\n%s
""" % (FROM, ", ".join(TO), SUBJECT, TEXT)
    try:
        server = smtplib.SMTP("smtp.gmail.com", 587)
        server.starttls()  # encrypt the connection before logging in
        server.login(gmail_user, gmail_pwd)
        server.sendmail(FROM, TO, message)
        server.quit()
        print('successfully sent the mail')
    except (smtplib.SMTPException, OSError):
        print("failed to send mail")
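To get the alert at the right moment, the notification needs to be triggered once the long-running part of the script has finished (or crashed). Below is a minimal sketch of one way to do this. `run_and_notify` and the `notify` callback are hypothetical names for this example: in practice `notify` would be an email function like the one above, adapted to accept a subject and a body. Here a stand-in that simply records the messages is used, so the example runs without a mail server.

```python
import time


def run_and_notify(task, notify):
    """Run task() and call notify(subject, body) once it finishes or fails."""
    started = time.time()
    try:
        result = task()
    except Exception as error:
        elapsed = time.time() - started
        notify("Script failed", "Error after %.1f s: %r" % (elapsed, error))
        raise  # re-raise so the failure is still visible in the console
    elapsed = time.time() - started
    notify("Script finished", "Completed in %.1f s" % elapsed)
    return result


# Stand-in notifier that records messages instead of emailing them.
sent = []


def record(subject, body):
    sent.append((subject, body))


def long_analysis():
    # Placeholder for the time-consuming analysis.
    return sum(range(1000))


answer = run_and_notify(long_analysis, record)
print(answer, sent)
```

Sending a message on failure as well means you also find out when the script has crashed halfway through, rather than waiting for an email that never comes.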

For more information see and