Spot the Lies of the Paper-Generating Software "SCIgen"!! 〈93JKI07〉

Software called "SCIgen," created by researchers at the Massachusetts Institute of Technology (MIT), has been drawing attention. It lets anyone easily auto-generate completely nonsensical papers studded with randomly chosen engineering and scientific terms. Astonishingly, papers produced with this software have actually been accepted, presented at conferences, and even published in academic journals.

What Is "SCIgen"?

SCIgen is software that lets anyone easily produce a superficially plausible paper studded with randomly selected engineering and scientific terms.
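
Under the hood, SCIgen works by randomly expanding a hand-written context-free grammar until only concrete words remain. The Python sketch below illustrates that idea on a miniature scale; these particular production rules are invented for the example, and SCIgen's real grammar is vastly larger and produces whole papers, not single sentences.

import random

# A toy context-free grammar in the spirit of SCIgen's hand-written one.
# These production rules are invented for this illustration only.
GRAMMAR = {
    "SENTENCE": [
        ["In this work, we", "VERB", "NOUN", "."],
        ["Few researchers would disagree with the", "NOUN", "."],
    ],
    "VERB": [["describe"], ["synthesize"], ["visualize"]],
    "NOUN": [["emulation of", "NOUN"], ["Lamport clocks"], ["expert systems"]],
}

def expand(symbol: str) -> str:
    """Recursively expand a grammar symbol by choosing random productions."""
    if symbol not in GRAMMAR:                 # terminal: emit the text as-is
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return " ".join(expand(part) for part in production)

if __name__ == "__main__":
    # Each call yields a different grammatical but meaningless sentence.
    print(expand("SENTENCE").replace(" .", "."))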

It was originally built in 2005 by researchers at the Massachusetts Institute of Technology (MIT) to demonstrate how slipshod the screening of paper submissions at certain academic societies and conferences could be.

When they submitted a nonsense paper generated with the software to a conference called WMSCI 2005, it was formally accepted, and the hoax was later made public. The episode then became a talking point among researchers and programmers.

The software has since been released to the public, used around the world, and become notorious as a fake-paper generator.

On February 24 of this year (2014), the British journal Nature reported that more than 120 nonsense papers generated by SCIgen had been found among the conference-proceedings databases on the commercial platforms of the international publishers Springer and IEEE. These papers have since been removed.

Generating a Paper with "SCIgen"

With SCIgen, you simply enter up to five names of authors to be cited and press the "Generate" button, and out comes a suitably fabricated paper. When I gave it a try, it produced a paper-like document titled "Evaluation of Courseware." My English is not strong, so I cannot judge the details, but on reading it the text appears to be just plausible phrases strung together; even so, the figures, footnotes, and citations all come out looking convincingly real.
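
If you would rather script the process than click through the form, something like the following Python sketch could drive the web form. This is only a rough illustration: the endpoint URL, the use of GET, and the field name "author" are my assumptions about how the public form at pdos.csail.mit.edu works, so verify them against the live page before relying on any of them.

import requests  # third-party library: pip install requests

# Hypothetical endpoint and field name; check the actual SCIgen form first.
SCIGEN_URL = "https://pdos.csail.mit.edu/archive/scigen/scigen.cgi"

authors = [
    "William Forsyth Sharpe",
    "George Arthur Akerlof",
    "Paul Robin Krugman",
    "Thomas John Sargent",
    "Douglass Cecil North",
]

# A list of (key, value) pairs sends one "author" field per name,
# mirroring the five author boxes on the form.
response = requests.get(SCIGEN_URL, params=[("author", a) for a in authors])
response.raise_for_status()

# Save the generated nonsense paper (served as HTML) for reading.
with open("scigen_paper.html", "w", encoding="utf-8") as fh:
    fh.write(response.text)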

Below is a paper I generated with SCIgen. For the five cited authors, I picked economists who had won the Nobel Memorial Prize in Economic Sciences, more or less at random. Presumably, if one instead chose the five names carefully, weighing their schools of thought, lines of argument, and the relationships among their published theories, the resulting "nonsense paper" would look even more convincing.

Selected economists: William Forsyth Sharpe, George Arthur Akerlof, Paul Robin Krugman, Thomas John Sargent and Douglass Cecil North

=============================================

Evaluation of Courseware

Kijidasu

Abstract

The theory solution to rasterization is defined not only by the emulation of multi-processors, but also by the structured need for redundancy. In fact, few leading analysts would disagree with the evaluation of Lamport clocks, which embodies the technical principles of hardware and architecture. In this work, we describe new real-time configurations (PumyYuen), validating that digital-to-analog converters can be made collaborative, read-write, and cacheable.

 

Table of Contents

1) Introduction
2) Related Work
3) PumyYuen Analysis
4) Implementation
5) Results and Analysis
5.1) Hardware and Software Configuration
5.2) Experiments and Results
6) Conclusion

 1.  Introduction

Recent advances in interposable models and interactive algorithms are often at odds with expert systems. In fact, few physicists would disagree with the study of wide-area networks, which embodies the appropriate principles of complexity theory. Furthermore, for example, many applications cache write-back caches. Nevertheless, Boolean logic alone will be able to fulfill the need for interactive modalities.

In this position paper we consider how Smalltalk can be applied to the refinement of 2 bit architectures. Existing signed and autonomous applications use trainable methodologies to construct ubiquitous methodologies. Two properties make this approach ideal: our approach requests the refinement of erasure coding, and also our application observes optimal communication. Two properties make this approach distinct: our heuristic is based on the principles of software engineering, and also PumyYuen prevents stochastic symmetries, without investigating link-level acknowledgements.

The rest of this paper is organized as follows. We motivate the need for IPv4. Further, we place our work in context with the prior work in this area. We demonstrate the visualization of A* search. As a result, we conclude.

 2.  Related Work

The concept of certifiable epistemologies has been explored before in the literature. Similarly, Richard Stearns suggested a scheme for synthesizing Boolean logic, but did not fully realize the implications of the deployment of robots at the time. Similarly, the original approach to this issue by Qian and Bose was satisfactory; contrarily, such a claim did not completely realize this intent. Continuing with this rationale, instead of studying Lamport clocks, we overcome this obstacle simply by improving the synthesis of expert systems that would allow for further study into kernels. Even though we have nothing against the previous method by Raj Reddy, we do not believe that approach is applicable to algorithms.

While we are the first to present Bayesian theory in this light, much related work has been devoted to the analysis of DHTs. Continuing with this rationale, Nehru and Anderson suggested a scheme for constructing reinforcement learning, but did not fully realize the implications of web browsers at the time. A recent unpublished undergraduate dissertation constructed a similar idea for the Turing machine. Although Bose et al. also described this approach, we studied it independently and simultaneously.

 3.  PumyYuen Analysis

Suppose that there exists robots such that we can easily synthesize the construction of DNS. we hypothesize that efficient methodologies can investigate IPv7 without needing to measure autonomous information. This may or may not actually hold in reality. We believe that DHTs can be made pseudorandom, random, and stochastic. This seems to hold in most cases. Therefore, the framework that PumyYuen uses holds for most cases.

Figure 1: The relationship between our method and A* search.

Our solution relies on the compelling architecture outlined in the recent foremost work by Jones in the field of networking. This is crucial to the success of our work. The framework for PumyYuen consists of four independent components: Markov models, wireless symmetries, spreadsheets, and suffix trees. This seems to hold in most cases. Next, we postulate that the foremost atomic algorithm for the emulation of digital-to-analog converters is impossible.

Reality aside, we would like to harness a methodology for how PumyYuen might behave in theory. This may or may not actually hold in reality. Any important study of the visualization of expert systems will clearly require that the acclaimed self-learning algorithm for the understanding of online algorithms is recursively enumerable; our system is no different. Although cyberinformaticians largely estimate the exact opposite, our framework depends on this property for correct behavior. On a similar note, we ran a 8-week-long trace confirming that our model is not feasible. This is a natural property of PumyYuen. Consider the early design by X. Suzuki et al.; our design is similar, but will actually solve this quagmire. We withhold a more thorough discussion for now. We use our previously visualized results as a basis for all of these assumptions.

Figure 2: PumyYuen’s unstable improvement.

 4.  Implementation

After several weeks of arduous coding, we finally have a working implementation of our system. On a similar note, since our heuristic is optimal, programming the collection of shell scripts was relatively straightforward. Our approach requires root access in order to deploy reliable models. It was necessary to cap the hit ratio used by our algorithm to 67 celcius. The hacked operating system and the codebase of 80 x86 assembly files must run in the same JVM.

 5.  Results and Analysis

Systems are only useful if they are efficient enough to achieve their goals. In this light, we worked hard to arrive at a suitable evaluation method. Our overall evaluation seeks to prove three hypotheses: (1) that the location-identity split no longer adjusts performance; (2) that mean power stayed constant across successive generations of Atari 2600s; and finally (3) that voice-over-IP no longer toggles performance. Only with the benefit of our system’s RAM speed might we optimize for scalability at the cost of response time. Our logic follows a new model: performance is king only as long as scalability constraints take a back seat to expected hit ratio. We hope that this section proves to the reader B. White’s analysis of the transistor in 1986.

  5.1  Hardware and Software Configuration

Figure 3: The mean power of our system, as a function of clock speed.

Our detailed performance analysis mandated many hardware modifications. We performed a deployment on the KGB’s mobile telephones to measure the independently game-theoretic nature of unstable models. To start off with, we removed 25kB/s of Ethernet access from our introspective testbed. Second, we added 300 25GHz Pentium IIIs to our network to understand information. We added 200MB/s of Internet access to our reliable overlay network.

PumyYuen does not run on a commodity operating system but instead requires a provably exokernelized version of Microsoft Windows 98 Version 2.5.7, Service Pack 3. our experiments soon proved that automating our Ethernet cards was more effective than making autonomous them, as previous work suggested. All software was hand assembled using Microsoft developer’s studio with the help of Y. Bose’s libraries for opportunistically studying NeXT Workstations. Next, all of these techniques are of interesting historical significance; John McCarthy and David Johnson investigated a similar heuristic in 1993.

Figure 4: The effective block size of PumyYuen, as a function of seek time.
 

  5.2  Experiments and Results

Is it possible to justify the great pains we took in our implementation? Unlikely. We ran four novel experiments: (1) we asked (and answered) what would happen if opportunistically disjoint active networks were used instead of write-back caches; (2) we measured USB key space as a function of floppy disk throughput on an Atari 2600; (3) we ran RPCs on 63 nodes spread throughout the 2-node network, and compared them against neural networks running locally; and (4) we ran symmetric encryption on 25 nodes spread throughout the planetary-scale network, and compared them against von Neumann machines running locally.

We first shed light on the second half of our experiments as shown in Figure 3. The data in Figure 4, in particular, proves that four years of hard work were wasted on this project. Along these same lines, the data in Figure 3, in particular, proves that four years of hard work were wasted on this project. These median energy observations contrast to those seen in earlier work, such as Scott Shenker’s seminal treatise on SCSI disks and observed effective ROM speed.

We have seen one type of behavior in Figures 4 and 3; our other experiments (shown in Figure 4) paint a different picture. Gaussian electromagnetic disturbances in our underwater overlay network caused unstable experimental results. Second, these clock speed observations contrast to those seen in earlier work, such as Charles Bachman’s seminal treatise on interrupts and observed effective optical drive space. These median popularity of I/O automata observations contrast to those seen in earlier work, such as X. Takahashi’s seminal treatise on spreadsheets and observed average instruction rate. Our intent here is to set the record straight.

Lastly, we discuss the first two experiments. Gaussian electromagnetic disturbances in our system caused unstable experimental results. Second, note the heavy tail on the CDF in Figure 4, exhibiting duplicated effective instruction rate. These distance observations contrast to those seen in earlier work, such as U. Kumar’s seminal treatise on checksums and observed effective RAM space.

 6.  Conclusion

In conclusion, we described an analysis of sensor networks (PumyYuen), which we used to confirm that Web services and semaphores are mostly incompatible. We also introduced a novel application for the simulation of Smalltalk. we examined how RPCs can be applied to the understanding of neural networks. Further, we proved that the little-known reliable algorithm for the refinement of voice-over-IP by Deborah Estrin is Turing complete. This is an important point to understand. one potentially limited flaw of our approach is that it might simulate wide-area networks; we plan to address this in future work.

 

References

[1] C. Leiserson, T. J. Sargent, and D. Ritchie, “Analyzing a* search using concurrent archetypes,” Journal of Homogeneous, Encrypted Models, vol. 25, pp. 20-24, Oct. 2001.

[2] E. Feigenbaum, “Journaling file systems considered harmful,” in Proceedings of the Symposium on Optimal, Trainable Symmetries, Apr. 2001.

[3] Z. Takahashi, V. Shastri, C. A. R. Hoare, A. Newell, and G. Balachandran, “Congestion control considered harmful,” Journal of Interposable, Permutable, Trainable Archetypes, vol. 71, pp. 1-17, Oct. 1994.

[4] D. Brown, “Emulating cache coherence using secure models,” in Proceedings of JAIR, Oct. 1998.

[5] V. Ramasubramanian and R. Milner, “Improving access points using concurrent models,” Journal of Relational, Homogeneous Symmetries, vol. 43, pp. 20-24, Apr. 2005.

[6] A. Yao, D. Estrin, C. Lee, M. Gupta, E. Robinson, and J. Backus, “GamyFigment: Exploration of web browsers,” Journal of Constant-Time, Collaborative Methodologies, vol. 58, pp. 54-60, Apr. 1993.

[7] S. Hawking and V. Johnson, “ANT: Real-time, robust technology,” Journal of Reliable, Event-Driven Theory, vol. 65, pp. 47-59, Dec. 2002.

[8] R. Agarwal, “Emulating replication and XML,” in Proceedings of PLDI, Feb. 2002.

[9] D. Jackson, “Lasket: A methodology for the evaluation of e-commerce,” in Proceedings of ASPLOS, July 2002.

[10] a. U. Maruyama, O. Johnson, and O. Ashok, “Metamorphic configurations for 802.11b,” in Proceedings of the Symposium on Wireless, Event-Driven, Decentralized Technology, Aug. 1980.

[11] K. Iverson and I. Maruyama, “A case for SCSI disks,” Journal of Perfect, Adaptive Algorithms, vol. 37, pp. 1-10, Feb. 2004.

[12] V. Jones, “NonplusEnema: Synthesis of IPv6,” Microsoft Research, Tech. Rep. 968/387, Jan. 2005.

[13] H. Zheng and I. Newton, “Anteroom: A methodology for the improvement of spreadsheets,” in Proceedings of the Symposium on Event-Driven Modalities, Apr. 1992.

[14] R. Tarjan, “Randomized algorithms considered harmful,” Journal of Unstable, Lossless Technology, vol. 35, pp. 78-91, July 1999.

[15] M. Welsh, E. Schroedinger, A. Yao, F. Garcia, P. I. Jones, and V. Zhou, “Studying lambda calculus and XML with Truck,” in Proceedings of the Workshop on Signed Epistemologies, June 1999.

[16] T. Ganesan and R. Rivest, “Decoupling e-business from gigabit switches in DHTs,” in Proceedings of ECOOP, Sept. 1997.

[17] K. Thompson, “An evaluation of public-private key pairs that paved the way for the development of 802.11b,” Journal of Wearable, Atomic Information, vol. 48, pp. 58-67, Dec. 2004.

[18] T. Brown and T. J. Sargent, “Deconstructing active networks with AsepticVertex,” NTT Technical Review, vol. 1, pp. 150-194, Mar. 2004.

 

Note: When this post was first written, the graphs and figures were generated automatically as well, but after a broken link occurred, some of them no longer display.
=============================================

Peer Review Called into Question

So how can a nonsense paper produced by SCIgen be spotted?

Papers are normally screened through a process called peer review, in which experts read a manuscript submitted to an academic journal and assess its content.

According to Cyril Labbé, the French researcher who developed a technique for detecting SCIgen-generated papers, some fake papers have their introductions and conclusions rewritten by hand to make them look more authentic. This pushes a SCIgen-generated nonsense paper even closer to the real thing, making it a troublesome item for the reviewers.
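
Labbé's published detector is reportedly based on an inter-textual distance measure computed against known SCIgen output. As a far cruder illustration of the same fingerprinting idea, the toy Python heuristic below simply counts stock SCIgen phrases; the phrase list is mine, drawn from the sample paper above, this is emphatically not Labbé's method, and a hand-rewritten introduction or conclusion would evade exactly this kind of surface check.

# Toy heuristic: count stock SCIgen phrases in a document.
# The phrase list is illustrative, taken from the sample paper above.
STOCK_PHRASES = [
    "would disagree with the",
    "in this position paper",
    "lamport clocks",
    "this may or may not actually hold in reality",
    "seminal treatise",
]

def scigen_score(text: str) -> int:
    """Return how many stock-phrase occurrences the text contains."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in STOCK_PHRASES)

sample = "Few physicists would disagree with the study of Lamport clocks."
print(scigen_score(sample))  # -> 2; any nonzero score merits a human look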

Labbé points out that the appearance of so many SCIgen-generated fake papers in Springer's catalog betrayed the trust placed in the peer-review system; for the experts who conduct reviews, judging whether a paper is genuine has become extremely difficult.

On the difficulty of peer review more generally, SCIgen or no SCIgen, Masahiro Kami, project professor at the University of Tokyo's Institute of Medical Science, put it this way on a TV program: "Peer review is hard. When certain people set out deliberately to avoid getting caught, you simply can't tell by looking."

Incidentally, in 2010 Labbé used SCIgen to generate 102 papers attributed to a fictitious scientist, "Ike Antkare," and registered them with the academic search service Google Scholar.

For a while, Antkare ranked 21st on the list of the world's most-cited scientists, far above the physicist Albert Einstein, who sat at 36th at the time.

Does that say more about how polished SCIgen is, or about how low the standards of those conducting peer review have fallen?

 

According to Springer, the business of publishing scientific literature "is not immune to fraud and mistakes." It is a shame to have to assume the worst of people, but in any case, raising reviewers' skills and instilling the right mindset in those who carry out the screening is an urgent task.

Labbé has also reportedly released a tool that checks whether a given paper was generated by SCIgen. One wonders what would come of running the notorious "STAP cell" papers through it...

- End -

 

【Update】

On July 4, the news site of the journal Nature carried an investigative piece on how the "STAP cell" papers came to be retracted.

It writes that the lessons of past paper-fraud cases went unheeded and that the papers were published after inadequate review.

After the fabrication scandal involving a Seoul National University professor came to light in 2005, the journal adopted a policy of checking high-impact papers more rigorously; even so, it admitted that neither its editors nor its reviewers caught the fatal problems in the papers in question.

The piece also carries the views of outside scientists who charge that the journal "rushed to publish rather than scrutinize the content" and "decided on publication on the strength of the co-authors' reputations."

Before publication, the editors did run the "STAP cell" papers through dedicated software for detecting lifted wording, but because the papers from which text had been taken were not in the software's database, the copied passages went undetected. Images, moreover, were not covered by the check; the journal says it plans to expand the number of images screened in future.

 

 
