Dissociable prelexical and lexical contributions to visual word recognition and priming: Evidence from MEG and behavior

Visual word recognition is facilitated by word knowledge (i.e., word familiarity) and predictive context, as reflected in faster reading times and reduced neuronal activation for highly familiar or predictable words. Previous studies could not dissociate whether knowledge- and context-based facilitation requires semantic knowledge or can also stem from prelexical sources of information. Here, we experimentally separate prelexical (i.e., orthographic/phonological) and semantic knowledge in two repetition priming experiments, to investigate their role for knowledge- and context-based facilitation. Experiment 1 investigates repetition suppression effects (i.e., reduced activation for predictable stimuli) in magnetoencephalographic brain responses of human participants (N=38) and Experiment 2 uses response times to investigate behavioral priming effects (N=24). To disentangle prelexical and semantic knowledge, we realized a pseudoword familiarization procedure in both experiments and contrasted familiarized pseudowords with novel pseudowords (unfamiliar, no semantic knowledge) and words (semantics available). In Experiment 2, one further set of pseudowords was additionally associated with semantic information (i.e., objects). We found, in both experiments, a general context effect for all letter strings, which was specifically enhanced when semantic information was available. A knowledge effect for pseudowords was found (familiarized vs. novel pseudowords) but prelexical (i.e., orthographic/phonological) knowledge alone did not enhance context effects. We conclude that knowledge- and context-based facilitation in visual word recognition can be achieved without semantic information processing, i.e., exclusively on the basis of prelexical perceptual knowledge. Semantic knowledge, however, drastically enhances context-based facilitation.

Significance Statement
The goal of reading is the extraction of meaning from script. This highly automatized process relies on facilitation based on word familiarity and text context. Here we use repetition priming to show that context-based facilitation is increased when semantic knowledge is present. This was demonstrated by enhanced context effects for letter strings with semantic associations. Still, earlier context effects (∼80 ms) and orthographic knowledge effects were found irrespective of semantic processing. Our findings highlight the stronger role of semantic knowledge for achieving facilitated visual word recognition in contrast to semantic-free knowledge. Our findings suggest predictive coding as a potential mechanism that underlies efficient visual word recognition.

Introduction
Efficient reading, i.e., fast extraction of semantic information from text, relies on automatized visual word recognition (Rayner, 1998). Central to information extraction from letter strings are orthographic (i.e., decoding of letter combinations), phonological (i.e., translation of orthographic into phonological information), and semantic (i.e., accessing meaning; see Coltheart et al., 2001; Dehaene and Cohen, 2011) processing stages. Efficient reading can be achieved after automatization at all these stages has been accomplished through learning (e.g., Yarkoni et al., 2008; Yates et al., 2004; Yap et al., 2011; respectively). In addition, visual word recognition is facilitated by contextual constraints, i.e., by the predictability of a word depending on its text context (e.g., Kliegl et al., 2004). Visual word recognition, thus, benefits from both pre-existing word knowledge on multiple processing levels and from a constraining context.
Facilitated visual word recognition is typically accompanied by reduced neuronal activation. For example, context-dependent reductions in brain activation elicited by words are typically observed at the N400, a component of the event-related brain potential (ERP) peaking at around 400 ms after stimulus onset that is associated with semantic processing (Kutas and Federmeier, 2011). In addition, facilitatory effects of context on ERPs were found as early as 50 ms post-stimulus onset, during natural reading (Dimigen et al., 2011) or under particularly strong contextual constraints (Dambacher et al., 2009). Similarly, the hypothesized availability of more word knowledge, as indexed by higher word frequency, is reflected in reduced ERP amplitudes to high- as compared to low-frequency words starting at 300 ms, and sometimes as early as 120 ms, after word onset (see Barber and Kutas, 2007, for a review). Context and knowledge, thus, can modulate both early prelexical and later semantic processing stages of word recognition, and semantic information processing accordingly plays a critical role in increasing the efficiency of word recognition.
However, given that investigations of knowledge and context effects on word recognition have so far been strongly confounded with semantics, it is still an open question whether knowledge- and context-based facilitation can be realized without semantic information, i.e., based on orthographic information alone. For example, word frequency, which is often used as a proxy for word knowledge (e.g., Coltheart et al., 2001), is not only associated with orthographic properties (e.g., correlations with orthographic neighborhood around .5; Yap et al., 2012) but also with semantic characteristics of words (e.g., correlations with semantic neighborhood around .75; Goh et al., 2016; Yap et al., 2012). Even more so, contextual effects on word processing typically depend upon semantic information from, e.g., the preceding sentence (e.g., "The mouse eats the … cheese"; cf. Kutas and Hillyard, 1984; Staub, 2015).
To understand the nature of facilitation effects, it is necessary to explicitly dissociate the contributions of prelexical (i.e., orthographic and phonological) and semantic information. A first indication of semantic-free context effects was found in priming investigations (which mimic context-based predictions; see DeLong et al., 2014), demonstrating reliable priming effects for non-words for which prelexical but no semantic information exists. Note, however, that in some studies priming effects are stronger when lexical/semantic information is present, i.e., for words (e.g., Almeida and Poeppel, 2013;Ferrand and Grainger, 1992;Fiebach et al., 2005), while others found no differences between word and non-word priming (Deacon et al., 2004;Laszlo and Federmeier, 2007;Laszlo et al., 2012). Initial evidence for knowledge effects without semantics comes from non-word familiarization tasks, e.g., for left posterior regions using fMRI (Glezer et al., 2015;Xue and Poldrack, 2007) and for late positivities measured with ERPs (Bermúdez-Margaretto et al., 2015). However, to the best of our knowledge, the interaction between non-semantic (i.e., prelexical) and semantic context vs. knowledge effects has so far not been investigated directly.
Here, we examined the distinct contributions of prelexical and semantic processing to both knowledge- and context-based facilitation of word recognition in two repetition priming experiments with words and pseudowords (i.e., pronounceable non-words matched for prelexical properties; Fig. 1), one using magnetoencephalography (MEG) to study the time course of neuronal activation, and one studying behavioral response patterns. Prelexical properties (i.e., orthographic Levenshtein distance 20, estimated based on a psycholinguistic lexicon, and number of syllables) were matched between words and pseudowords. However, prelexical knowledge without semantics was temporarily increased for one set of pseudowords through a familiarization procedure prior to the experiments (compare, e.g., Glezer et al., 2015). In combination with the priming paradigm, this allowed us to investigate the interactive effects of context and prelexical vs. semantic knowledge on word recognition.

Experiment 1: Magnetoencephalography
In the first experiment, we investigated prelexical vs. semantic contributions to context and knowledge effects in visual word recognition at a neuronal level, using MEG. First, we expected facilitatory effects of knowledge irrespective of context. Prelexical properties (e.g., orthographic familiarity based on orthographic Levenshtein distance/OLD20; Yarkoni et al., 2008) were matched between words and both pseudoword conditions (i.e., familiarized vs. novel), so that a priori, comparable levels of prelexical knowledge should lead to similar levels of prelexical activation across all three stimulus groups (e.g., Grainger and Jacobs, 1996). However, the familiarization training temporarily increases the perceptual (i.e., orthographic/phonological) familiarity with the trained pseudowords, so that we expected differences in event-related fields (ERFs) elicited by prelexical processing stages between familiar pseudowords compared to those elicited by novel pseudowords and words. Furthermore, we assumed that effects of semantic knowledge should be reflected in ERF differences between words and both novel and familiar pseudowords.
We also expected to observe interactions between context and knowledge. We hypothesized that if semantic knowledge is a prerequisite for context-based facilitation of word recognition, stronger repetition suppression should occur for words in contrast to both familiar and novel pseudowords. If, however, prelexical knowledge is sufficient to enhance context effects, we expected stronger repetition suppression for familiar compared to novel pseudowords.

Participants
Thirty-eight healthy native speakers of German (26 females, mean age 23.0±2.8 years, range 18-29 years) recruited from university campuses participated in familiarization procedures and MEG recordings. All were right-handed as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971), had normal or corrected-to-normal vision, and normal reading abilities as assessed with an adult version of the Salzburg Reading Screening (unpublished adult version of Mayringer and Wimmer, 2003). Further participants were excluded at different stages of the experimental procedure for the following reasons: low reading skills (i.e., reading test score below the 16th percentile; N = 5), insufficient performance during pseudoword familiarization (i.e., accuracy for to-be-familiarized pseudowords < 50 % in the final learning session; N = 2), self-reported developmental speech disorder (N = 1), technical artifacts during the MEG measurement (N = 4), insufficient number of trials after artifact rejection (i.e., < 15 repetition trials in at least one condition; N = 2), contraindication to MEG measurement (N = 1, participant with a retainer that might cause artifacts in MEG data), or drop-out by choice of participants (N = 4, participants did not complete the experimental procedure). All participants gave written informed consent according to procedures approved by the local ethics committee and received 10 € per hour or course credit as compensation.

Stimuli and presentation procedure
Words and pseudowords consisted of five letters, with the first letter in uppercase following convention for German nouns. Pseudowords were generated by the Wuggy software (Keuleers and Brysbaert, 2010) conserving phonological structure (i.e., sub-syllabic structure was matched to the input words; all pseudowords were pronounceable). Estimates of word frequency and orthographic Levenshtein distance 20 (OLD20; Yarkoni et al., 2008) were based on the SUBTLEX-DE database (Brysbaert et al., 2011). All word and pseudoword stimuli will be made available at [to be filled in].
Sixty German nouns (logarithmic word frequency: mean = 2.14 ± 0.89, range 0.00 to 4.03) and 120 pronounceable pseudowords were presented twice during MEG acquisition. In addition, 80 catch trials were presented (see below). Pseudowords were divided into two sets of 60 items, such that both pseudoword lists and the set of words were matched on orthographic similarity (OLD20; group means ± one standard deviation: 1.825 ± 0.013; 1.717 ± 0.026; 1.743 ± 0.027) and number of syllables (1.817 ± 0.050; 1.95 ± 0.028; 1.933 ± 0.032; see Table 1 for stimulus characteristics). Despite the high similarity of these characteristics between groups, they were included in the statistical models to account for potential confounds (see analysis section for details). Participants were familiarized with 60 pseudowords before the actual repetition priming task was conducted in the MEG (see details of the familiarization procedure below). The second group of pseudowords was never seen by the participants before the MEG experiment. In addition, four further lists of 120 pseudowords were generated as fillers for the familiarization procedure (one list per session).
Stimuli were presented using Experiment Builder software (SR-Research Ltd., Ottawa, Ontario, Canada). Words and pseudowords were presented in black bold Courier New font (14 pt.) in front of a white background. In the behavioral sessions, stimuli were presented on an LCD monitor with a refresh rate of 60 Hz, while during the MEG session, stimuli were projected with a refresh rate of 60 Hz onto a translucent screen.

Pseudoword Familiarization
Participants visited the lab on the two days prior to the MEG experiment, and during each visit completed two familiarization sessions of about 20 min length. The two previous days were chosen to take advantage of sleep consolidation effects (James et al., 2017). Based on the highly similar orthographic familiarity (OLD20) for words and both pseudoword groups, reading models (e.g., MROM: Grainger and Jacobs, 1996) would assume similar activation in orthographic units across all three groups prior to training. Therefore, we implemented pseudoword learning to specifically increase prelexical knowledge (i.e., visual, orthographic, and phonological familiarity of the learned pseudowords) for one group of pseudowords (for similar approaches, see, e.g., Glezer et al., 2015). Each familiarization session started with reading aloud the pseudowords from a printed list. Reading errors were documented (mean across all sessions: 0.7 %). Subsequently, participants performed a computer-based old/new recognition task in which the to-be-familiarized pseudowords were presented two times per session, randomly intermingled with a new set of 120 filler pseudowords for every session (total of 480 filler pseudowords across all four sessions). For every pseudoword, participants had to indicate by button press, as fast and accurately as possible, whether or not it was familiar to them. Pseudowords were preceded by two black vertical bars displayed above and below the center of the screen where participants were asked to fixate (500 ms; Fig. 1a), and presentation was terminated with the button press. Linear mixed model (LMM) analyses with session (centered and scaled) as fixed effect and participant and item as random effects on the intercept were performed with the lme4 package (Bates et al., 2015) in the statistical software package R, version 3.4.1, 2017-06-30 (R Development Core Team, 2008).
All effects with t > 2, reflecting that the effect differs from zero by more than two standard errors, were considered significant; p-values cannot be computed in a reasonable way in the LMM approach (e.g., see Kliegl et al., 2011). Old/new response sensitivity indices d' (Green and Swets, 1966) (Fig. 2b). Based on the strong improvement in sensitivity and the high performance in the final session, we conclude that prelexical familiarization of the trained pseudowords was successful.
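As an illustration of the sensitivity index referred to above, the following minimal sketch computes d' as the difference between the z-transformed hit rate and false-alarm rate of an old/new recognition task. The function name, the example counts, and the log-linear correction for extreme rates are assumptions made for this sketch and are not taken from the original analysis.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption of this sketch, not from the paper):
    # adding 0.5 to each count keeps rates of exactly 0 or 1 finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one session: 120 presentations of familiarized
# pseudowords (60 items shown twice) and 120 filler pseudowords.
example = d_prime(hits=110, misses=10, false_alarms=12, correct_rejections=108)
```

A value of zero corresponds to chance-level discrimination between familiarized and filler pseudowords; larger values indicate better discrimination.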

Figure 1. Experimental procedures. (a) For the pseudoword familiarization procedure of Experiment 1, in each learning session 60 pseudowords were presented until response, intermingled with novel filler pseudowords, in an old/new recognition task. 500 ms before stimulus onset, two vertical bars indicated the center of the screen where participants were asked to fixate. The inter-trial interval was 2,000 ms. (b) During MEG recording, participants performed a repetition priming task. Each trial consisted of a sequence of two letter strings (prime and target) presented for 800 ms each, separated by an interval of 800 ms during which a string of five hash marks was presented. Letter strings could be words, familiarized pseudowords, or novel pseudowords (120 trials each regarding the prime). 75 % of trials were repetition trials, i.e., prime and target were identical (left). The remaining 25 % were non-repetition trials in which two different letter strings were presented (middle). In this case, prime and target could be from the same condition or from two different conditions, with all combinations of conditions appearing equally often. Participants were instructed to silently read presented letter strings and respond only to rare catch trials (right). Before onset of the prime, two black vertical bars presented for 800-1,000 ms indicated the center of the screen where participants were asked to fixate. After presentation of the target, two grey vertical bars were presented for 1,000 ms, indicating a blinking period of 1,500 ms starting from onset of the bars. Before the onset of the next trial, a blank screen was presented for the remainder of the blinking period. (c) In Experiment 2, a paired-association task was used for familiarization of pseudowords with and without semantics. Pseudowords were presented for 800 ms, followed by the presentation of an object image until button press (maximally 1,500 ms).
During the inter-trial interval of 1,000 ms, two vertical bars indicated the center of the screen where participants were asked to fixate. In the semantic condition, there was a reliable association between object and pseudoword. In the familiarization-only condition, in contrast, pseudowords and objects were randomly paired so that each pair occurred only once. (d) In the subsequent naming task, each pseudoword from the familiarization conditions with and without semantic associations was presented once. Participants named the object they associated with each pseudoword, or responded "next" in case they did not associate a meaning with a pseudoword. Before each pseudoword presentation, two vertical bars framing the center of the screen were presented until button press by the experimenter. (e) In each trial, the repetition priming task involved a sequence of two letter strings presented for 800 ms each, separated by an interval of 800 ms during which five hash marks were displayed. The hash mark string was also presented for 800 ms before the onset of the first letter string. Letter strings could be words, familiarized pseudowords with semantics, familiarized pseudowords without semantics, or novel pseudowords (180 trials each regarding the prime). Repetition probability was varied across blocks between 25, 50, and 75 %. Participants were instructed to silently read presented letter strings and to indicate for the target whether or not they had an explicit semantic association with it. During the inter-trial interval of 800-1,200 ms, two vertical bars indicated the center of the screen where participants were asked to fixate.

Repetition Priming
The repetition priming task, realized during MEG recording, included words, familiarized pseudowords, and novel pseudowords. At the start of each trial, participants had to fixate between two vertical black bars presented above and below the center of the screen (analogous to the familiarization procedure; cf. Fig. 1b). Stimulus presentation was initiated once an eye fixation on the cued region was detected by an MEG-compatible eye-tracker (Eyelink CL 1000, SR Research Ltd., Ottawa, ON, Canada). It comprised the successive presentation of two letter strings (prime and target) for 800 ms each, separated by an interval of 800 ms during which a string of five hash marks was shown. Both letter strings had to be read silently; the task served only to maintain attention and required a button press whenever a catch trial (i.e., the word Taste; Engl.: button) was detected in either the first, second, or both positions. The silent reading task was chosen to avoid contaminating the neuronal response to words with motor responses; catch trials were excluded from analysis. In addition, silent reading is the most common mode of reading for adults. The explicit fixation control before stimulus presentation ensured that eyes were open and directed towards the position where the stimulus was presented. Response hands were counterbalanced across participants and responses were recorded using a fiber optic response pad (LUMItouch; Photon Control Inc., Burnaby, BC, Canada). 100 ms after target offset, grey vertical bars were presented for 100 ms, indicating that participants were allowed to blink for a period of 1,000 ms.
Stimuli were presented at a viewing distance of 51 cm yielding horizontal visual angles of about 0.3° per letter.
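The relation between viewing distance, physical stimulus size, and visual angle can be made explicit with a short calculation. The sketch below uses the standard formula for the visual angle of an object centered on the line of sight; the function names are hypothetical, and the physical letter width is derived from the reported values (51 cm, about 0.3° per letter) rather than taken from the source.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) of an object of a given size at a given distance."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg, distance_cm):
    """Physical size (cm) that subtends a given visual angle at a given distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


VIEWING_DISTANCE_CM = 51.0   # reported viewing distance
ANGLE_PER_LETTER_DEG = 0.3   # reported approximate angle per letter

# Derived quantities (not stated in the source text).
letter_width_cm = size_for_angle_cm(ANGLE_PER_LETTER_DEG, VIEWING_DISTANCE_CM)
word_angle_deg = visual_angle_deg(5 * letter_width_cm, VIEWING_DISTANCE_CM)
```

Under these values, a letter is roughly 0.27 cm wide on the screen, and a five-letter string spans about 1.5° horizontally.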
Each pseudoword was presented in two trials, once during each half of the experiment. 440 pseudo-randomized trials were presented in total, 80 of which were catch trials. Of the remaining 360 trials, 75 % (i.e., 90 trials per condition) were repetition trials, in which the same word or pseudoword was presented twice. The high number of repetition trials was included to realize a highly predictable context and enable the investigation of knowledge effects in predictable situations. In the remaining 25 % of trials (i.e., 30 trials per condition), two different letter strings were presented (i.e., non-repetition trials). Out of the total of 90 non-repetition trials, each possible combination of words, familiarized as well as novel pseudowords appeared equally often, i.e., 10 times. The repetition priming task lasted about 40 min, divided into three blocks separated by brief pauses.
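The trial counts given above follow from simple arithmetic, which the following sketch reproduces as a consistency check. All numbers are taken from the text; the variable and condition names are hypothetical labels chosen for this illustration.

```python
from itertools import product

TOTAL_TRIALS = 440
CATCH_TRIALS = 80
CONDITIONS = ("word", "familiar_pseudoword", "novel_pseudoword")

# Trials remaining once catch trials are set aside.
experimental = TOTAL_TRIALS - CATCH_TRIALS

# 75 % repetition trials, 25 % non-repetition trials.
repetition = int(experimental * 0.75)
non_repetition = experimental - repetition

# Repetition and non-repetition trials per prime condition.
rep_per_condition = repetition // len(CONDITIONS)
nonrep_per_condition = non_repetition // len(CONDITIONS)

# Every prime-target pairing of conditions, including same-condition pairs
# (different letter strings from the same condition).
pairings = list(product(CONDITIONS, repeat=2))
trials_per_pairing = non_repetition // len(pairings)
```

The nine prime-target pairings with ten trials each account for all 90 non-repetition trials, and 90 repetition plus 30 non-repetition trials give the 120 trials per prime condition.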

MEG data acquisition
MEG data were acquired in accordance with guidelines for MEG recordings (Gross et al., 2013). Online filtering was performed with fourth-order Butterworth filters with 300 Hz low pass and 0.1 Hz high pass. Head positions of the participants relative to the gradiometer array were recorded continuously by three localization coils, placed at the nasion and above both ear canal entrances using ear-plugs. Additionally, two electrodes placed centrally on each clavicula recorded an electrocardiogram (ECG), while two pairs of electrodes placed distal to the outer canthi of both eyes, and above and below the right eye, respectively, recorded an electrooculogram (EOG). The impedance of each electrode was below 5 kΩ for EOG electrodes and below 20 kΩ for ECG electrodes, measured with an electrode impedance meter (Astro-Med GmbH, Rodgau, Germany).
MEG data were segmented into epochs of 2,600 ms length, lasting from -160 ms to 2,440 ms with respect to the onset of the prime.
Individually for each participant, trials were selected for analysis in which the head position fell within a range of 5 mm (across all blocks) relative to the majority of other trials.
Trials contaminated with sensor jump and muscle artifacts were rejected automatically, using the FieldTrip routine for automatic artifact detection. For jump artifact detection, a 9th-order median filter was applied to the data, while for muscle artifact detection, an 8th-order Butterworth IIR filter between 110 and 140 Hz was applied. The filtered data were z-transformed and averaged across sensors. Trials were rejected if for any time point the z value exceeded a threshold of z = 20 for jump artifacts and z = 6 for muscle artifacts, following standards established for the local measurement characteristics. Trials contaminated with eye blink, eye movement, or heart beat artifacts were cleaned using Independent Component Analysis (ICA; Makeig et al., 1996).
Components whose time courses correlated with EOG and ECG electrodes were rejected, using as threshold a correlation coefficient of r > 0.1, which sufficiently removed artifacts based on visual inspection. After these procedures, an average of 51.1 repetition trials (range 20 to 79) per condition and participant could be retained. Non-repetition trials were averaged across conditions for analysis, with on average 52.6 trials per participant available (range: 29 to 80).
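The logic of the z-value based rejection of jump and muscle artifacts can be sketched as follows. This is a simplified illustration and not the FieldTrip routine itself: it assumes that the data have already been filtered, that each sensor is z-transformed using its mean and standard deviation pooled over all trials and time points, and that a trial is flagged when the sensor-averaged z value exceeds the threshold at any time point. The function name and data layout are hypothetical.

```python
from statistics import mean, pstdev


def flag_artifact_trials(trials, threshold):
    """Flag trials whose sensor-averaged z value exceeds a threshold.

    trials[t][s] is the (already filtered) time series, a list of floats,
    of sensor s in trial t. Returns one boolean per trial.
    """
    n_sensors = len(trials[0])

    # Per-sensor mean and SD pooled over all trials and time points
    # (an assumption of this sketch).
    stats = []
    for s in range(n_sensors):
        pooled = [x for trial in trials for x in trial[s]]
        stats.append((mean(pooled), pstdev(pooled)))

    flagged = []
    for trial in trials:
        # z-transform each sensor, then average across sensors per time point.
        z = [[(x - mu) / sd for x in sensor]
             for sensor, (mu, sd) in zip(trial, stats)]
        z_avg = [mean(column) for column in zip(*z)]
        flagged.append(any(value > threshold for value in z_avg))
    return flagged
```

With the thresholds reported above, such a function would be called once on the median-filtered data with a threshold of 20 (jump artifacts) and once on the band-pass filtered data with a threshold of 6 (muscle artifacts).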
Prior to computation of ERFs, a 20 Hz low pass filter was applied to data epochs.
Original epochs were split into separate epochs for prime and target stimulus, ranging from -110 ms to 800 ms with respect to each stimulus onset. Epochs were baseline corrected by subtracting the average activation between -110 and -10 ms from each time point. For each sensor, we identified the participants for which the recorded magnetic field averaged across repetition trials and time lay outside the range of the mean across all participants ± 3.29 standard deviations. The signal of these noisy sensors (33 sensors in total; one to nine sensors in ten participants), per participant, was approximated by trial-wise interpolation from activation in neighboring sensors.
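The two preprocessing steps described above, baseline correction and identification of noisy sensors, can be sketched as follows. The function names and the data layout are assumptions made for illustration; the baseline window (-110 to -10 ms) and the criterion of 3.29 standard deviations are taken from the text.

```python
from statistics import mean, stdev


def baseline_correct(epoch, times_ms, start_ms=-110, end_ms=-10):
    """Subtract the mean of the baseline window from every time point."""
    baseline = [v for v, t in zip(epoch, times_ms) if start_ms <= t <= end_ms]
    offset = mean(baseline)
    return [v - offset for v in epoch]


def noisy_participants(sensor_means, criterion=3.29):
    """Indices of participants whose value for one sensor is an outlier.

    sensor_means holds one value per participant: the magnetic field of a
    given sensor averaged across repetition trials and time.
    """
    mu, sd = mean(sensor_means), stdev(sensor_means)
    return [i for i, v in enumerate(sensor_means)
            if abs(v - mu) > criterion * sd]
```

For a participant flagged in this way, the signal of the affected sensor would then be replaced by an interpolation from neighboring sensors, which is not part of this sketch.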
ERFs were then calculated for each subject and condition (repeated words, non-repeated words, repeated familiar pseudowords, non-repeated familiar pseudowords, repeated novel pseudowords, non-repeated novel pseudowords), separately for prime and target, by averaging the epochs across all trials. ERFs were compared between conditions using cluster-based permutation tests (Maris and Oostenveld, 2007) for dependent samples, corrected for multiple comparisons across time points (-110 to 800 ms) and sensors at cluster level. Clusters were defined as spatially and temporally adjacent samples with F-values exceeding an uncorrected α-level of 0.001. The cluster-level statistic was calculated using the standard approach, i.e., taking the sum of F-values within a cluster (Maris and Oostenveld, 2007). Cluster-level statistics were compared to the distribution of cluster-level statistics obtained from Monte Carlo simulations with 5,000 permutations, in which condition labels were randomly exchanged within each subject. Resulting cluster p-values were multiplied by 3 to account for the computed contrasts (one interaction and two main effects). Original cluster-level statistics larger than the 95th percentile of the distribution of cluster-level statistics obtained in the permutation procedure were considered to be significant. To compute interaction statistics, we used the permANOVA functions by Helbling (2015; https://github.com/sashel/permANOVA/).
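The principle of the cluster-based permutation test can be illustrated with a strongly simplified sketch. In contrast to the analysis described above, it assumes a single sensor, two conditions, paired t-values instead of F-values, and clusters defined by temporal adjacency only; it is neither the FieldTrip nor the permANOVA implementation, and all function names are hypothetical. It does retain the key steps: summing the test statistic within supra-threshold clusters, exchanging condition labels within subjects, comparing the observed cluster statistic to the permutation distribution of maximal cluster statistics, and multiplying the p-value by the number of contrasts.

```python
import random
from statistics import mean, stdev


def paired_t(diffs):
    """t-value of per-subject condition differences at one time point."""
    sd = stdev(diffs)
    if sd == 0:  # degenerate case, treated as "no effect" in this sketch
        return 0.0
    return mean(diffs) / (sd / len(diffs) ** 0.5)


def max_cluster_mass(diff_by_subject, t_threshold):
    """Largest sum of |t| over a run of adjacent supra-threshold time points."""
    best = current = 0.0
    for time_point in zip(*diff_by_subject):
        t = abs(paired_t(time_point))
        current = current + t if t > t_threshold else 0.0
        best = max(best, current)
    return best


def cluster_permutation_p(cond_a, cond_b, t_threshold=2.0, n_perm=1000,
                          n_contrasts=1, seed=0):
    """Return (observed cluster mass, corrected p-value).

    cond_a[s][t] and cond_b[s][t] hold the ERF of subject s at time point t.
    """
    rng = random.Random(seed)
    diffs = [[a - b for a, b in zip(sa, sb)] for sa, sb in zip(cond_a, cond_b)]
    observed = max_cluster_mass(diffs, t_threshold)

    exceed = 0
    for _ in range(n_perm):
        # Exchanging condition labels within a subject flips the sign
        # of that subject's difference wave.
        permuted = []
        for subject in diffs:
            sign = rng.choice((1.0, -1.0))
            permuted.append([sign * d for d in subject])
        if max_cluster_mass(permuted, t_threshold) >= observed:
            exceed += 1

    p = (exceed + 1) / (n_perm + 1)
    # Correction for the number of computed contrasts, capped at 1.
    return observed, min(1.0, p * n_contrasts)
```

Because only the maximal cluster statistic of each permutation enters the reference distribution, the procedure controls the family-wise error rate across time points without a separate correction per sample.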
First, as a general check of our experimental manipulation, we assessed the repetition effect by computing a 2x2 interaction between the experimental factors repetition congruency (repetition vs. non-repetition trials, reflecting whether context-based processing was indeed possible; referred to as CR) and stimulus sequence (prime vs. target, reflecting the absence vs. presence of a preceding context; referred to as CS). As the low number of non-repetition trials did not allow separate analyses of these effects for the different conditions, data were pooled across knowledge conditions (words, familiar, and novel pseudowords). Within each knowledge condition, the number of repetition trials was randomly stratified to match the number of non-repetition trials.
In the second analysis, we examined how knowledge (words/semantic knowledge vs. familiar pseudowords/prelexical knowledge vs. novel pseudowords) and stimulus sequence (prime vs. target) interact, restricting ourselves to repetition trials. In this analysis, all stimuli were repeated and we examined the effects of different knowledge types on the neuronal repetition effect.
To determine the nature of significant interaction effects, we performed post hoc linear mixed model analyses for pairwise differences between relevant conditions. All post hoc tests were based on participant- and condition-specific ERF values averaged across sensors and time points from the respective significant cluster, and included participant and item as random effects on the intercept. Since not all trials entered the analyses due to exclusion of artefactual trials, which might have affected the matching across letter string conditions, OLD20 and number of syllables, both scaled and centered, were entered as additional fixed effects.
To rule out the possibility that our baseline correction approach, i.e., using separate baselines for ERFs elicited by prime and target stimulus in a trial, created artificial effects due to the presentation of hash marks only before the target, we performed the analyses of repetition congruency by stimulus sequence and knowledge by repetition a second time, using the period before the prime as a common baseline for correction of ERFs to both stimuli. Of the 25 clusters that were significant in the analyses after separate baselining, 13 were also significant in the analysis after common baselining. Therefore, in the results and discussion sections, we will focus on those clusters replicated with the common baseline approach. A comparison of significant clusters from both analyses can be found in Fig. 3-1 and 4-2.
As a further sanity check of the separate baselines approach, we additionally report a peak-to-peak analysis for the repetition by knowledge interactions as well as main effects. For this analysis, the positive (in case of right sensors) and negative (in case of left sensors) peaks of the ERFs were identified per participant, condition, and sensor (restricted to the time window ±150 ms around the peak latency of grand average ERFs, and additionally restricted to the range between 0 and 500 ms). In case of central sensors close to the midline (sensors MZC01, MZC03, MZC04, MZF01, MZF02, MZF03, MZO01, MZO02, MZO03, and MZP01), we separately decided whether to select the positive or negative peak, depending on which of the two peaks was larger in absolute value in the ERFs averaged across participants. We decided against taking this approach in the majority of sensors because the ERFs typically declined during later time windows, in many cases reaching a value larger in absolute terms than the actual peak. Therefore, selecting the positive peak in the case of right sensors, and the negative peak in the case of left sensors, was the best compromise between automatic peak determination and avoiding that the actual peak value is mistaken for a value that falls within the time range of decline of the ERF. We then subtracted the preceding peak value of the respective other polarity (between stimulus onset and detected peak) from the already defined peak value. Statistical analyses were then performed on the absolute peak difference, using the cluster-based permutation procedure as described above, defining clusters solely based on spatial adjacency between sensors due to the lack of the temporal dimension. Given its independence from the pre-stimulus baseline, hash mark strings presented prior to the target cannot influence this analysis. However, a limitation of this analysis is that it cannot detect significant differences occurring at time ranges prior to and after the peak.
Therefore, the results of this analysis did not influence whether a specific cluster was interpreted or not.
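The peak-to-peak measure described above can be illustrated with a minimal sketch for a single ERF time course (one participant, condition, and sensor). The function name, argument names, and data layout are hypothetical and not taken from the analysis scripts; the sketch only assumes that amplitudes and latencies are available as equally long lists and that the polarity of the peak has already been chosen per sensor.

```python
def peak_to_peak(erf, times, polarity, grand_peak_latency,
                 half_window=150, t_min=0, t_max=500):
    """Absolute difference between a peak and the preceding opposite peak.

    erf: amplitudes; times: latencies in ms (same length as erf);
    polarity: +1 selects the positive peak, -1 the negative peak;
    grand_peak_latency: peak latency of the grand average ERF in ms.
    """
    # Search window: +/- half_window around the grand-average peak latency,
    # additionally restricted to the range t_min..t_max.
    lo = max(t_min, grand_peak_latency - half_window)
    hi = min(t_max, grand_peak_latency + half_window)
    window = [i for i, t in enumerate(times) if lo <= t <= hi]

    # Main peak: most extreme sample of the requested polarity in the window.
    peak_idx = max(window, key=lambda i: polarity * erf[i])

    # Preceding peak of the other polarity, searched between stimulus onset
    # (0 ms) and the latency of the main peak.
    before = [i for i, t in enumerate(times) if 0 <= t <= times[peak_idx]]
    prev_idx = max(before, key=lambda i: -polarity * erf[i])

    return abs(erf[peak_idx] - erf[prev_idx])
```

Because the measure is a difference between two post-stimulus samples, adding a constant offset to the whole time course (as a different baseline would) leaves the result unchanged.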

Results
During the MEG measurement, participants correctly identified 94.74 % of catch trials, indicating that they were attending to the presented letter strings.
Repetition suppression phenomenon. As a manipulation check, we investigated the interaction between stimulus sequence (prime vs. target) and repetition congruency (repetition vs. non-repetition), combined over all knowledge conditions. Repetition trials (  4ab, see also Table 3 for a post hoc statistic controlling for OLD20 and number of syllables).
Post hoc LMMs revealed that semantic but not prelexical knowledge reliably modulated repetition suppression, which was thus stronger for words than for pseudowords.
Knowledge effects. We had assumed that effects of prelexical knowledge should be reflected in ERF differences between familiar pseudowords (since familiarization had temporarily increased prelexical knowledge for these pseudowords only) and both novel pseudowords and words, while effects of semantic knowledge should be reflected in ERF differences between words and both novel and familiar pseudowords. Differences between the knowledge conditions, averaged across prime and target (i.e., representing main effects of knowledge), occurred at two topographic clusters: At left posterior sensors between 290 and 380 ms, familiar pseudowords elicited a larger negative ERF amplitude than both words and novel pseudowords (Fig. 4def; Tables 4 and 4-1). At left frontal sensors between 330 and 380 ms, words elicited the largest negative-going ERF amplitude, followed by novel pseudowords, while familiar pseudowords elicited the smallest ERF amplitude (Fig. 4ghi; Tables 4 and 4-1). Only the frontal cluster was qualified by a significant interaction with context effects, as revealed by the strong spatiotemporal overlap with the interaction cluster described above. Even in the post hoc analysis on the posterior cluster, no significant interaction between context and prelexical knowledge was found (Table 4).
Context effects. We here refer to the main effect of repetition based on the analysis of repetition trials only (i.e., unpredictable prime vs. predictable target) as reflecting pure context effects. Such effects were found in multiple time windows, spanning time ranges from 80 to 690 ms after stimulus onset (Fig. 4jklm). More specifically, cluster C1 was significant from 80 to 170 ms at occipito-central sensors, cluster C2 from 150 to 180 ms at right central sensors, cluster C3 from 210 to 590 ms at bilateral frontal sensors, cluster C4 from 380 to 420 ms at bilateral temporo-central sensors, cluster C5 from 470 to 530 ms at right temporo-central sensors, and cluster C6 from 590 to 690 ms at occipital sensors. Within all observed clusters, the absolute amplitudes of ERFs were significantly reduced from prime to target presentation (see exemplary time course in Fig. 4k and box plot in Fig. 4l) with no differentiation of the knowledge conditions with the exception of cluster C3, which strongly overlapped in time and space with the interaction cluster (CxK; Fig. 4ab). See Fig. 4m for the scalp localizations of clusters C2-6.
[Fig. 4 caption, partially recovered: ... and main effects of context (j-l; repetition trials only). Respective cluster topographies of the peak-to-peak analysis are shown in Fig. 4-1. A detailed overview of all clusters obtained with common baseline, separate baselines, and peak-to-peak analysis can be found in Fig. 4]

Control analysis for baseline effects.
To evaluate the robustness of context by knowledge interaction effects against different choices of baselines, we performed an additional peak-to-peak analysis (cf. Methods section for further details). Significant results from the peak-to-peak analysis strongly support the interaction between context and knowledge at left frontal sensors, the main effects of knowledge at left posterior and left frontal sensors, as well as main effects of context at bilateral frontal sensors, resembling the effects of clusters CxK, K1, K2, and C3 in Fig. 4 (see Fig. 4-1 and 4-2, including also further clusters from the peak-to-peak analysis). Due to the high similarity between the standard baseline-corrected ERF analysis and the peak-to-peak analysis, we conclude that the presented results can be reproduced with a different analysis strategy and are therefore not artificially introduced via baseline correction.

Interim discussion
The current MEG data demonstrate that context and knowledge effects in visual word recognition are present irrespective of semantic information, i.e., also in the comparison of familiar vs. novel pseudowords. We found semantic-free context effects as early as 80 ms post stimulus onset. In the N400 time window, however, context effects were greatest for words, suggesting that semantic knowledge strongly enhances context effects. Semantic-free knowledge effects were found at left posterior sensors shortly before the context by semantic knowledge interaction at bilateral frontal sensors. In addition, at the same left posterior cluster no difference between novel pseudowords and words was found. We interpret this combined pattern as reflecting (a) the a priori comparable orthographic similarity between pseudowords and words (due to the stimulus matching procedure) and (b) that the orthographic-phonological familiarization procedure temporarily altered processing in these prelexical word recognition systems. Surprisingly, the additional visual, orthographic, and phonological information that was learned in the familiarization procedure did not result in an increased context effect in the brain responses.
Nevertheless, in this first experiment we cannot rule out one further potentially confounding influence, i.e., that words and pseudowords differ not only in semantic knowledge but also in their word status. That is, words are by necessity different since they were encountered prior to the experiment and thus rely on a lifelong, in contrast to a recent two-day, familiarization process. To systematically examine the role of word status and to obtain complementary behavioral data, a second repetition priming experiment was run.

Experiment 2
In addition to the knowledge manipulations of Experiment 1, we included a third group of pseudowords for which semantic associations were learned using a paired-association task. Note that both types of familiar pseudowords, i.e., with and without semantic associations, were visually/perceptually familiarized to the same degree during the learning period, so that these two experimental conditions only varied with respect to whether or not meaning could be associated with the pseudoword. Including this additional knowledge condition allowed us to examine potentially different roles of word status and the presence of semantic associations.
Response times were measured, after pseudoword familiarization, in a repetition priming paradigm. Participants had to indicate whether or not a letter string had a semantic association, which was true for words and for familiarized pseudowords with semantic associations, but not for novel and only perceptually familiarized pseudowords. As an additional manipulation, the probability of repetition (i.e., probability of prime and target being the same letter string) was varied across blocks to investigate if the priming effect increases when the context across trials allows predicting that the prime is highly likely identical to the target. This was shown previously (e.g., Olkkonen et al., 2017) and indicates that repetition effects can be explained best by top-down optimization in contrast to neuronal fatigue (Grill-Spector et al., 2006; Summerfield et al., 2008; 2011; Grotheer and Kovács, 2014; see Discussion for additional details), as both local prime-target and global repetition probability can be integrated at a higher-level system to facilitate upcoming processing.

Participants
Six variants of the familiarization task were prepared, across which the assignment of the three pseudoword sets to the familiarized, i.e., familiar vs. semantic, as well as to the novel condition was varied (see Table 5). In addition, the assignment of the two object image sets to the familiarized pseudowords with and without semantic associations was varied. Note that for 18 of the 24 participants, the six experimental versions as well as the order of blocks and response hands in the repetition priming task (see below) were counterbalanced. In addition, six participants were included from the pilot investigation in which this was not the case (all had the same response hands and the initial block had a repetition probability of 25 %). Results did not differ qualitatively when these participants were included or not. Stimulus presentation procedures were identical to those of behavioral sessions of Experiment 1 (Fig. 1a), with the exception that the background was set to grey.

Pseudoword Familiarization
Participants performed five pseudoword familiarization sessions in the course of three consecutive days, i.e., two sessions each on day 1 and 2, and one session on day 3 (before the repetition priming task). Each session lasted about one hour, and participants could take a short break after the first half, as well as a mandatory one-hour break before the next session. Each session consisted of reading aloud each pseudoword (mean error rate across sessions: 1.4 %), a computer-based paired-association task including pseudowords and object images, and a naming task. While one set of pseudowords was familiarized prelexically as in Experiment 1, i.e., merely through repeated exposure ('familiar pseudowords'), one set was additionally associated with semantic information ('semantic pseudowords'). The paired-association procedure was adapted from previous studies (e.g., Breitenstein and Knecht, 2002;Breitenstein et al., 2007;Dobel et al., 2009), however using visual instead of auditory pseudowords and naturalistic photographs of objects instead of line drawings (see above). Furthermore, we used an explicit instead of an implicit learning instruction in order to establish strong associations between pseudowords and the assigned meanings.
Pseudowords were presented in random order for 800 ms, followed by an object image (horizontal and vertical visual angles 15.8°) for 1,500 ms or until response (Fig. 1c). During the ITI of 1,000 ms, two vertical black bars indicating the center of the screen where participants were asked to fixate were presented. Each pseudoword was presented four times in the first and four times in the second half of each session (960 trials in total per session). Semantic pseudowords were arbitrarily but consistently (i.e., six out of eight presentations) matched with object images, so that participants could learn to associate their meaning over the course of the familiarization sessions. This ratio was chosen so that despite successful learning, false alarms could be investigated which provide important information on participants' sensitivity. In contrast, familiar pseudowords were followed by a different object image in each trial.
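As an illustration of the pairing scheme just described, the following sketch builds one session's trial list. It is not the presentation script used in the study: the function and argument names are hypothetical, the split into session halves is not modeled, and a single pool of object images is assumed for simplicity.

```python
import random

def build_pairings(semantic, familiar, objects, assigned,
                   n_present=8, n_consistent=6, seed=0):
    """Return a shuffled list of (pseudoword, object, is_consistent) trials.

    semantic / familiar: lists of pseudoword labels (hypothetical);
    objects: pool of object image labels (hypothetical single pool);
    assigned: dict mapping each semantic pseudoword to its object.
    """
    rng = random.Random(seed)
    trials = []
    for pw in semantic:
        target = assigned[pw]
        # Inconsistent pairings use objects other than the assigned one.
        others = [o for o in objects if o != target]
        foils = rng.sample(others, n_present - n_consistent)
        for obj in [target] * n_consistent + foils:
            trials.append((pw, obj, obj == target))
    for pw in familiar:
        # Familiar pseudowords get a different object on every presentation.
        for obj in rng.sample(objects, n_present):
            trials.append((pw, obj, False))
    rng.shuffle(trials)
    return trials
```

Under the assumption of 60 pseudowords per familiarized set (inferred from the 60 trials per condition in the priming task, not stated explicitly for familiarization), this yields 2 x 60 x 8 = 960 trials, matching the reported session length.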
Participants were asked to learn a meaning for the presented pseudowords based on the frequency with which the pseudowords were paired with certain object images. They were explicitly informed about the inconsistent pairings for half of the pseudowords. Participants were instructed to silently read the presented pseudowords and to respond as accurately and quickly as possible, whether a presented object image matched the preceding pseudoword or not.
In addition, they were encouraged to guess if unsure. Participants responded by pressing one of two buttons on a keyboard with either the left or right index finger. To prevent potential response biases, the assignment of response hand and response varied from trial to trial (by presenting a red bar indicating non-match on one side and a green bar indicating match on the other side of the object image). In the first familiarization session, participants completed a short practice block of ten trials before the start of the actual paired-association task.
In the naming task (Fig. 1d), each pseudoword from the paired-association task was presented once. Participants were instructed to name its associated object, if an association could be retrieved, or to respond "weiter" (German for "next") whenever this was not possible. The experimenter wrote down the participants' responses and logged the three possible responses (correct, incorrect, next) into the presentation software. Responses were considered correct whenever a name suitable for the corresponding object was provided (e.g., "cabin" instead of "barn"). Participants did not receive feedback.
LMMs ( (Fig. 2d). Importantly, participants also demonstrated high average accuracies of 95.69 % for semantic pseudowords in case they were presented with a non-matching object (Fig. 5b), indicating that their high performance for matching pseudoword-object combinations cannot be attributed to a response bias, i.e., responding "match" whenever a semantic pseudoword was presented.
In the pseudoword naming task, which was administered in the end of each

Repetition Priming
Following the fifth familiarization session on day 3, participants completed a repetition priming experiment after a break of at least one hour. Experimental procedures were analogous to those described for Experiment 1, with the following exceptions: Semantic pseudowords were presented as additional knowledge condition, and no catch trials were presented. The prime stimulus in each trial was preceded by 800 ms of hash mark presentation. The inter-trial interval varied between 800 and 1,200 ms. Furthermore, the repetition probability was varied across the three experimental blocks. 15 participants first completed a block with 25 % repetition probability, followed by 50 % in the second and 75 % in the last block; the remaining nine participants completed the blocks in the reverse order. Participants were informed about the repetition probabilities at the start of each block. Their task was to silently read the presented letter strings and respond as accurately and quickly as possible to the second letter string in each trial, whether they could explicitly associate a meaning or not (button presses on a keyboard with left/right index finger; dominant vs. non-dominant hand for yes-response: 13 vs. 11 participants, respectively). This task was chosen to elicit the same response for semantic pseudowords as for words. Each letter string (i.e., word or pseudoword) was presented once per block, either in the repetition or in the non-repetition condition. In total, 240 trials (60 per condition) were presented in each block. Letter strings were used at maximum twice for non-repetition trials; in this case, they were combined with two different letter strings. Prior to the task, eight practice trials were completed. The total duration of the priming task was around 45 min.

Analyses
Analogous to the analysis of the pseudoword familiarization procedure, behavioral data of the repetition priming task were analyzed using LMMs allowing random effects of both participant and items (prime and target stimulus) on the intercept, as well as analysis of imbalanced data (Baayen et al., 2008). We mainly focused on response times of correct responses, but also investigated accuracies using generalized LMMs with a binomial link function. Response times were log transformed to account for their skewed ex-Gaussian distribution.
We first performed an analysis with the factors repetition congruency (repetition vs. non-repetition trials) and repetition probability (25, 50, or 75 %). To assess prelexical and semantic contributions to behavioral context and knowledge effects, we investigated the four-way interaction between repetition congruency, repetition probability, prelexical knowledge, and semantic knowledge. Knowledge was entered as two factors coding prelexical (0: novel pseudowords and words; 1: familiar pseudowords with and without semantics) and semantic knowledge (0: novel and familiar pseudowords without semantics; 1: semantic pseudowords and words). Since context effects might override knowledge effects (Kretzschmar et al., 2015), we additionally investigated the three-way interaction between prelexical knowledge, semantic knowledge, and repetition probability in non-repetition trials only (i.e., in the absence of valid contextual information). Note that for repetition priming analyses, we set behavioral responses from the first block (i.e., with 75 % repetition probability) of one participant to NA, because she reported a misinterpretation of the task instruction that was clarified for the final two blocks.
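The two-factor knowledge coding described above can be written out as a small lookup table. The condition labels and the helper function are hypothetical names chosen for illustration; the 0/1 values follow the coding given in the text.

```python
# Coding of the four letter string conditions on the two knowledge factors.
KNOWLEDGE_CODING = {
    "novel_pseudoword":    {"prelexical": 0, "semantic": 0},
    "familiar_pseudoword": {"prelexical": 1, "semantic": 0},
    "semantic_pseudoword": {"prelexical": 1, "semantic": 1},
    "word":                {"prelexical": 0, "semantic": 1},
}

def code_knowledge(condition):
    """Return (prelexical, semantic, prelexical x semantic) for a condition."""
    codes = KNOWLEDGE_CODING[condition]
    prelexical, semantic = codes["prelexical"], codes["semantic"]
    # The product term is what an interaction of the two factors is built on.
    return prelexical, semantic, prelexical * semantic
```

With this coding the four conditions fill all cells of a 2 x 2 design, which is what allows prelexical and semantic knowledge to be entered as separate, crossed factors.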
All (generalized) LMMs included the interactions of all fixed effects described so far.
Since not all trials entered the analyses (due to miss trials and for the response time analysis due to exclusion of trials with incorrect responses), which might have affected the match across letter string conditions, OLD20 and number of syllables were included as additional fixed effects. All fixed effects were centered and scaled. For each significant interaction, pairwise differences between conditions were investigated using post hoc linear mixed models including only the relevant conditions. Behavioral data and analysis scripts are published under [to be filled in].
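The preprocessing steps named in this section (log transformation of response times, centering and scaling of fixed effects) can be sketched as follows. This is an illustration only: the function names are hypothetical, and "scaled" is assumed here to mean division by the sample standard deviation, which the text does not specify.

```python
import math
import statistics

def log_transform(rts_ms):
    """Natural-log transform of response times (in ms) to reduce skew."""
    return [math.log(rt) for rt in rts_ms]

def center_and_scale(values):
    """Center on the mean and divide by the sample SD (assumed scaling)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]
```

After this step a covariate such as OLD20 has mean 0 and standard deviation 1, so model estimates for different predictors are expressed on a comparable scale.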

Results
To assess the influence of repetition congruency, repetition probability, and knowledge (prelexical vs. semantic) on behavioral performance, we mainly focused on response times. Still, the response accuracy data are described in the following. In the semantic association judgments of the repetition priming experiment, average accuracies for repetition trials were high across all repetition probabilities (86.9 %, 85.8 %, and 83.9 % for 75 %, 50 %, and 25 % repetition probability, respectively; Fig. 5-1a), as well as across all knowledge conditions with the exception of familiarized pseudowords with semantic associations (90.7 %, 88.1 %, 72.2 %, and 91.1 % for novel pseudowords, familiarized pseudowords without and with semantic associations, and words, respectively; Fig. 5-1b). The lower accuracy for semantic pseudowords reflects that participants did not establish a semantic association with all (but still the majority of) pseudowords, which is also consistent with their performance in the final naming session (see Analyses section and Fig. 2f). As a consequence, we only used correct trials for the response time analysis. Accuracies in non-repetition trials were overall lower (82.6 %) compared to repetition trials (88.5 %; Fig. 5-1a). Statistical analyses of accuracies can be found in Tables 9-1 and 9-2.
Repetition priming. As a first manipulation check, we investigated the influence of repetition probability on priming effects irrespective of knowledge conditions (Fig. 5a and statistics in Table 6). Response times showed a significant interaction between context (i.e., repetition congruency) and repetition probability. That is, the priming effect (difference between repetition and non-repetition trials) was smaller for a repetition probability of 25 vs. 50 % (estimate = -0.040, SE = 0.0055, t = 7.36) and smaller for 50 vs. 75 % (estimate = -0.043, SE = 0.010, t = 4.26; Fig. 5a; Table 6). This finding indicates that context effects increase when they can be expected more reliably.
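For illustration, the priming effect referred to here can be computed descriptively as the difference in mean log response time between non-repetition and repetition trials, separately for each repetition probability. The sketch below is not the reported LMM analysis (it ignores random effects and covariates), and the function name and input layout are hypothetical.

```python
import math
from collections import defaultdict

def priming_effects(trials):
    """Descriptive priming effect per repetition probability.

    trials: iterable of (repetition_probability, is_repetition, rt_ms),
    assumed to contain correct responses only.
    Returns {probability: mean log RT (non-repetition) - mean log RT (repetition)}.
    """
    # cells[(probability, is_repetition)] = [sum of log RTs, trial count]
    cells = defaultdict(lambda: [0.0, 0])
    for prob, is_rep, rt in trials:
        cell = cells[(prob, bool(is_rep))]
        cell[0] += math.log(rt)
        cell[1] += 1

    effects = {}
    for prob in sorted({p for p, _ in cells}):
        rep, non = cells[(prob, True)], cells[(prob, False)]
        if rep[1] and non[1]:  # both cells must contain trials
            effects[prob] = non[0] / non[1] - rep[0] / rep[1]
    return effects
```

A positive value indicates faster responses in repetition trials; an interaction with repetition probability corresponds to these values differing between blocks.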
Knowledge effects. To investigate the influence of prelexical and semantic knowledge in the absence of valid contextual information, we focused on non-repetition trials. In contrast to the MEG analysis, we included knowledge effects related to prelexical familiarity and semantics as two separate factors, since prelexical knowledge was manipulated orthogonally to semantics (cf. Methods). In the following, we report the effects most relevant for our hypotheses, while Tables 7 and 8 provide a detailed overview of all statistical results. Repetition probability did not interact with prelexical or semantic knowledge (all t's < 1, including the 3-way interaction; see Table 7). However, we observed a significant interaction between prelexical and semantic knowledge. Post hoc tests revealed longest response times for pseudowords with semantic associations (all t's > 4 for post hoc contrasts of semantic pseudowords vs. the other three conditions; see Table 8 for details), reflecting the specific difficulty of retrieving semantics for a newly acquired vocabulary, particularly in case of unfulfilled expectations. This notion is also in line with the accuracy data (see Fig. 5-1b). However, faster response times for words compared to novel (estimate = -0.039, SE = 0.0064, t = 6.06) and familiar pseudowords (estimate = -0.022, SE = 0.0063, t = 3.52; Table 8) indicate facilitated processing of letter strings with both fully established semantic associations and word status. In addition, response times were faster for familiar vs. novel pseudowords (estimate = -0.016, SE = 0.0046, t = 3.49).

Combined knowledge and context effects. A significant interaction between repetition congruency and semantic knowledge revealed stronger priming effects for letter strings with semantic associations in comparison to pseudowords without semantic associations (estimate = -0.033, SE = 0.0030, t = 11.13; Table 9). Critical here is that in repetition trials, the response times for pseudowords with associated semantics were lower than for the other pseudoword conditions (semantic vs. novel pseudowords: estimate = -0.048, SE = 0.0059, t = 8.11; semantic vs. familiar pseudowords: estimate = -0.051, SE = 0.0060, t = 8.52; Table 8), which differs from the response time pattern in non-repetition trials. This indicates that the involvement of semantic information increases context effects dramatically, even reversing knowledge effects found in the absence of context-based facilitation.

Discussion
Here, we tested if semantic information processing is a prerequisite for knowledge- and context-based facilitation in reading. To examine the effect of knowledge, we compared the processing of words with pseudowords that were perceptually familiarized (i.e., at the prelexical level involving orthographic and phonological information; Experiment 1, MEG) and with pseudowords familiarized both perceptually and semantically (Experiment 2, behavior). The influence of context on word recognition was examined by using a repetition priming paradigm; context thus always refers to whether or not the identical stimulus was previously seen on the same trial. As expected, in Experiment 1 using MEG, we found strong neuronal repetition suppression (i.e., reduced neuronal activation to repeated events) and, in Experiment 2, strong behavioral priming effects (i.e., reduced response times to repeated events). Interestingly, Experiment 1 showed context-based facilitation, irrespective of knowledge, starting from 80 ms post stimulus onset at multiple posterior-central sensors. In addition, irrespective of context, prelexical knowledge elicited an increased negative response at left posterior sensors (starting at 300 ms) in contrast to novel pseudowords and words. In the behavioral study, we replicated the knowledge-independent context effects (i.e., across all knowledge conditions) as well as prelexical knowledge-based facilitation. Prelexical knowledge effects were found only in non-repetition trials (i.e., prime not equal to the target), in the form of faster response times for familiar compared to novel pseudoword targets. No additional context-based facilitation through prelexical knowledge could be identified in either experiment.
In contrast, when semantic information was present (i.e., for words or semantically familiarized pseudowords) context-based facilitation increased drastically. In Experiment 1, the strongest context effects were found at bilateral frontal sensors between 210 and 590 ms, and semantic information (i.e., available for words) increased the context effect in this time window.
In Experiment 2, context-based facilitation was strongest when semantic information was present (i.e., larger priming effects for pseudowords with semantic association and words). In addition, when context was unreliable (i.e., non-repetition trials), we identified a word status effect as words were recognized faster than pseudowords with semantic associations. Combined, our findings clearly show that semantic-free context and knowledge are implemented to facilitate word recognition. In addition, when semantic knowledge is present, context-based facilitation is even stronger.

Context-based facilitation and its relation to semantic knowledge
We expected that context-based facilitation can be found with and without semantic knowledge present but should be stronger for words, as has been shown in behavioral studies. Both behavioral priming and neuronal repetition suppression were found for pseudowords (see also Deacon et al., 2004; Laszlo and Federmeier, 2007; Laszlo et al., 2012). Most likely, pseudoword priming is based on low-level visual information that is relatively similar across conditions (i.e., all stimuli consisted of the same alphabetic letters; cf., e.g., Grotheer and Kovács, 2014; Kok et al., 2012; Kok et al., 2014). In line with this notion is our finding that this context effect was present early (~80 ms) with a central posterior topography.
Previous studies provided inconsistent results on the interaction of context and knowledge: When studying sentence-level phenomena, some studies did and some did not find this interaction (e.g., Dambacher et al., 2006; Payne et al., 2015; van Petten and Kutas, 1990; 1991 vs. Kretzschmar et al., 2015; Penolazzi et al., 2007, respectively; knowledge was operationalized as word frequency in these studies). The same is true for word/pseudoword priming studies exploring context by knowledge interactions (e.g., Almeida and Poeppel, 2013 vs. Deacon et al., 2004; Laszlo and Federmeier, 2007; Laszlo et al., 2012, respectively). Here, we found a stronger context effect for words, i.e., stronger lexical than prelexical priming.
Previous mixed results from sentence studies may be due to additional ambiguities arising from the gradual increase of semantic context as a sentence unfolds. In addition, priming studies that did not find a context by knowledge interaction (Deacon et al., 2004; Laszlo and Federmeier, 2007; Laszlo et al., 2012), in contrast to the present and other studies finding this interaction (Almeida and Poeppel, 2013; match of orthographic similarity based on bigram frequency), did not explicitly control for the orthographic similarity of words and non-words. Therefore, we claim that priming paradigms such as the one used here allow a more systematic investigation of context by knowledge interactions, as the context manipulation can be stringently controlled and the equalized orthographic similarity controls for a critical confounding variable.
The context by knowledge interaction found in the present study is compatible with the well-established association of the N400 amplitude with context (i.e., neuronal repetition suppression; e.g., van Petten and Kutas, 1990) and semantic knowledge effects (e.g., Kutas and Federmeier, 2011). A novel finding of our study is that the increase of the behavioral priming effect is also present for pseudowords to which semantic knowledge was recently associated.
Interestingly, in the unprimed condition these pseudowords were processed slowest, which might indicate that the access to the newly learned semantics is hard, while accessing the semantic information allows context-based facilitation as strong as for words. This strongly suggests that the increased priming effect for words can be directly related to semantic information processing and not necessarily to the word status. In sum, our findings strongly suggest semantic-free context-based facilitation and an increase of this facilitation when semantic knowledge is present.

Knowledge effects
Prelexical familiarity is critical for the interpretation of our knowledge effects, as we selected the letter strings such that, prior to familiarization training, prelexical familiarity (i.e., OLD20; Yarkoni et al., 2008) and thus also prelexical processing were comparable between the pseudowords and words. This match reflects that, a priori, orthographic processing difficulty was held constant across knowledge conditions and therefore should elicit similar activation strength in left posterior brain areas (as for example shown by Gagl et al., 2016 or Vinckier et al., 2007 using a match with quadrigram frequency). Still, we expected (consistent with Glezer et al., 2015) that pseudoword familiarization, by mere repetition, facilitates prelexical processing of the learned pseudowords. Irrespective of context, we found enhanced MEG responses in left posterior regions (290 to 380 ms) for familiarized as compared to novel pseudowords and words but no differentiation between words and novel pseudowords. At left frontal sensors, 40 ms later, the N400 was lowest for familiarized pseudowords, intermediate for novel pseudowords, and highest for words. This indicates a dissociation between prelexical and semantic knowledge effects in the N400 at frontal sensors. Notably, the posterior and frontal knowledge effects were found in locations and time windows previously described as highly relevant for visual word processing (e.g., Embick et al., 2001; Pylkkänen et al., 2002). In behavior, we found that both prelexical and semantic knowledge facilitate word recognition, demonstrated by faster response times for perceptually familiarized pseudowords and words compared to novel pseudowords in non-repetition trials. It can thus be speculated that the enhanced ERF negativity, at an earlier or later time window, is associated with facilitated recognition of letter strings based on prelexical and semantic knowledge, respectively.
From the sequence of effects and the different locations (earlier posterior and later frontal) we conclude that a succession of multiple computations is implemented. Topography and time window suggest that the posterior knowledge effect might reflect prelexical processing in brain regions near the so-called visual word form area (cf., e.g., Cohen et al., 2000;Vinckier et al., 2007). The first computation, performed by these brain regions, might be an orthographic decoding process similar to the implementation of the MROM model (Grainger and Jacobs, 1996). This model describes how words can be recognized based on the activation of an orthographic code, i.e., the representation of a word in the orthographic lexicon. Crucially, global activation within the orthographic lexicon reflects the orthographic decoding process and is based on activation of orthographic representations of the presented stimulus and its neighbors.
As a consequence, for words and pseudowords with highly similar orthographic characteristics (i.e., matched OLD20 of the words and novel pseudowords) the activation is likely similar since a comparable amount of orthographic neighbors is activated for both.
After the orthographic decoding process, we assume computations that reflect the access to and the activation of word meaning (i.e., lexical access). This is reflected by the later frontal knowledge effect showing enhanced activation for items present in the lexicon (i.e., words) in contrast to pseudowords. In addition, response times in non-repetition trials were fastest for words reflecting knowledge effects without context-based facilitation. For the interpretation of the lower activation for familiar in contrast to novel pseudowords at the same location and time window where the highest activation to words occurred we currently can only speculate since in behavior, in non-repetition trials both words and perceptually familiar pseudowords enabled faster semantic association judgments than novel pseudowords. One plausible explanation could be that the lower N400 for familiar compared to both novel pseudowords and words reflects a sooner termination of the attempt of lexical access. One could also assume that the lexicon search for words is terminated early, since no exhaustive search has to be performed, which should be reflected in lower activation for words vs. novel pseudowords. However, such an effect might be masked by the activation of word meaning, which is reflected by higher N400 activation (cf. e.g., increasing N400 with semantic richness; Rabovsky et al., 2012). This assumption, reflecting multiple parallel processes in the N400 time window, should be reassessed in future studies using e.g. regression-based accounts (e.g., see Laszlo and Federmeier, 2014), which might be able to differentiate the location in time and space of each process. Still, in general, the succession of prelexical orthographic and lexical access including semantic processing is implemented by most reading models (e.g., Coltheart et al., 2001;Dehaene and Cohen, 2011).

Theoretical implication: Predictive processing models
Reading behavior, including context and semantic knowledge effects, can be well described by an optimal Bayesian reader implementing a process that integrates all prior information (Norris, 2006). On the neuronal level, context-based facilitation is reflected in reduced neuronal activation as a consequence of predicting likely upcoming words (e.g., DeLong et al., 2014; Kuperberg and Jaeger, 2016). In general, reductions of neuronal activation as observed in the present study, i.e., repetition suppression, can be explained by multiple accounts (see reviews: Grill-Spector et al., 2006; Gotts et al., 2012). However, we can here discard accounts such as fatigue, which assume that context-based facilitation is a pure bottom-up phenomenon without top-down influences (Grill-Spector et al., 2006), since our findings clearly show top-down influences in the form of the repetition probability effect in the behavioral experiment (see also Grotheer and Kovács, 2014; Mayrhauser et al., 2014; Summerfield, 2008; 2011; Todorović et al., 2011).
In short, predictive coding assumes that an optimal use of neuronal resources is achieved by predicting upcoming sensory signals based on prior information (e.g., strong contextual constraints or prior knowledge). Specifically, bottom-up processing of the predictable portion of the sensory input is suppressed, whereas any (residual) unexpected information within the input is signaled to higher cortical levels for further processing. Thus, only unpredicted signals, including noise, should be reflected in the bottom-up neuronal signal. Sharpening, in contrast, proposes that neuronal representations of expected sensory inputs are sharpened by suppressing the activation of neurons that do not optimally code the input, thus reducing noise and strengthening the signal. In other words, for an expected stimulus, predictive coding reduces the signal, whereas sharpening reduces the noise.
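The contrast between the two accounts can be illustrated with a minimal toy sketch. It is not a model of the present MEG data: the three-unit population, the activation values, the `gain` parameter, and the function names are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: hypothetical numbers, not fitted to any data.

def predictive_coding(response, prediction):
    """Pass on only the unpredicted part of the input (the prediction error)."""
    return [max(r - p, 0.0) for r, p in zip(response, prediction)]

def sharpening(response, prediction, gain=0.5):
    """Leave the unit coding the expected input intact; damp all other units."""
    best = max(prediction)
    return [r if p == best else r * gain for r, p in zip(response, prediction)]

# Unit 1 codes the expected stimulus (signal); units 0 and 2 carry noise.
response = [0.2, 1.0, 0.3]
prediction = [0.0, 0.8, 0.0]

pc = predictive_coding(response, prediction)  # signal unit reduced, noise kept
sh = sharpening(response, prediction)         # noise units reduced, signal kept

print("predictive coding:", pc, "summed activation:", sum(pc))
print("sharpening:", sh, "summed activation:", sum(sh))
```

In this sketch, both schemes lower the summed activation for an expected stimulus, but they differ in which units are reduced, which is why overall activation reductions alone cannot separate the two accounts.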
The observed context effects of the present study (i.e., neuronal and behavioral repetition effects) are in line with both accounts. Nonetheless, the interaction pattern of context and knowledge allows us to differentiate between the two theoretical accounts, because predictive coding assumes that the signal is suppressed. Accordingly, the differential pattern between knowledge conditions observed at the prime (i.e., N400 amplitude lowest for familiarized pseudowords, intermediate for novel pseudowords, and highest for words) should vanish at the target (compare Blank and Davis, 2016; Kok et al., 2012; Richter et al., 2018). As described above, this was the case only when semantic knowledge was included, suggesting that the assumptions of the predictive coding framework hold only when semantic processing is involved. For semantic-free knowledge, the evidence is much weaker, since no significant interaction could be detected. The only clue we currently have comes from the post-hoc analysis of behavioral knowledge effects (Fig. 5b), which shows that the differential pattern between the semantic-free knowledge conditions is present in non-repetition trials, whereas no significant differentiation can be detected in repetition trials. This might weakly indicate a suppression of the effect, consistent with a predictive coding-like process.
The differentiation of semantic and semantic-free knowledge might be explained by another characteristic of predictive coding models, i.e., the hierarchical structure through which predictions for upcoming events are passed down to sensory cortices. As a consequence, the knowledge of each level is integrated into the prediction, thereby increasing the precision of the prediction with each additional source of information. In the present case, all letter strings invoke prelexical processing, but only words and semantically familiarized pseudowords additionally contain semantic information. Accordingly, a prediction informed by two sources (prelexical and semantic) instead of one should be more precise and should therefore yield a stronger reduction of prediction error. Currently, we can only investigate this issue in our behavioral data, as only Experiment 2 contained pseudowords familiarized on the basis of both prelexical and semantic knowledge. A first indication of an additive facilitation by both prelexical and semantic knowledge can be seen in the response time difference between primed and unprimed letter strings. Here, semantic pseudowords were read slowest when not primed but about as fast as words when primed. The difference between unprimed and primed (i.e., the priming effect) was largest for these semantic pseudowords (i.e., semantic pseudowords: 276 ms; all other conditions: 142-198 ms). We interpret this finding as indicating that multiple sources are integrated in order to precisely predict an upcoming stimulus. Based on these findings, we propose a predictive coding-like process involved in facilitating visual word recognition.
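The claim that each additional source increases the precision of a prediction can be made concrete with a minimal sketch. It assumes, purely for illustration, that the prelexical and semantic sources act as independent Gaussian cues, in which case the precisions (inverse variances) add. This assumption, the precision values, and the function names are hypothetical and are not derived from the present data.

```python
# Sketch under an assumed independent-Gaussian model; values are hypothetical.

def combined_precision(precisions):
    """For independent Gaussian sources, the combined precision is their sum."""
    return sum(precisions)

def residual_variance(precisions):
    """Variance left unexplained by the combined prediction (1 / precision)."""
    return 1.0 / combined_precision(precisions)

PRELEXICAL = 2.0  # hypothetical precision of the prelexical source
SEMANTIC = 2.0    # hypothetical precision of the semantic source

one_source = residual_variance([PRELEXICAL])
two_sources = residual_variance([PRELEXICAL, SEMANTIC])

print("prelexical only:", one_source)
print("prelexical + semantic:", two_sources)
```

Under this assumption, the prediction based on two sources leaves less residual variance than the prediction based on one, which corresponds to the stronger prediction error reduction described above.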

Conclusion
The present repetition priming study showed that context- and knowledge-based facilitation of visual word recognition can be achieved without semantic information processing. However, we found that the availability of semantic knowledge strongly increased context-based facilitation of visual word recognition in left frontal brain areas and in behavior. For behavior, we could show that facilitation based on semantic knowledge occurs even for pseudowords with which semantics had only recently been associated. In sum, these results suggest that efficient reading is realized by a predictive process that integrates, most likely, all available sources of information.

Note. Clusters found in both separate and common baseline analysis presented in the results section are marked in bold. For peak-to-peak analysis, the time range row represents the peak latency averaged across all conditions and significant sensors.
Note. Significant effects (i.e., t > 2) are shown in bold numerals. FE = fixed effect estimates, SE = standard error.
Note. Novel = pseudowords first shown in the repetition priming task. Participants refers to the number of participants assigned to each version.
Note. Significant effects (i.e., t > 2) are shown in bold numerals. FE = fixed effect estimates, SE = standard error.
Note. Significant effects (i.e., p < 0.05) are shown in bold numerals. FE = fixed effects, SE = standard error.
Note. Significant effects (i.e., t > 2) are shown in bold numerals. FE = fixed effects, SE = standard error.
Table 9. Results from the linear mixed model analyses investigating repetition congruency, probability, familiarity, and semantics in log transformed response times in the repetition priming task (Experiment 2). Statistical analyses of accuracy data can be found in Tables 9-1 and 9-2.
Note. Significant effects (i.e., p < 0.05) are shown in bold numerals. FE = fixed effects, SE = standard error.